A financial services company is developing a question-answering system to help its compliance officers quickly find answers within a large, private corpus of regulatory documents. A critical requirement is that the answers provided by the system must be exact snippets from the source documents to ensure auditability and prevent misinterpretation. Which NLP approach is the most suitable for this use case?
A topic modeling approach using Latent Dirichlet Allocation (LDA).
An abstractive question-answering model.
An extractive question-answering model.
A generative pre-trained transformer (GPT) configured for open-dialogue.
The correct answer is an extractive question-answering model. This approach identifies a contiguous span of text in a source document that answers a given question and returns that span verbatim. It directly satisfies the requirement for 'exact snippets': every answer can be traced back to its location in the source, which supports auditability, and the model never generates new wording that could be misinterpreted.
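To make the idea of span extraction concrete, here is a minimal Python sketch. It is a toy illustration only: it scores sentences by word overlap with the question, whereas a real extractive model would typically use a trained neural reader to predict the start and end of the span. The names `Span`, `tokenize`, and `extract_answer`, and the sample regulatory text, are hypothetical and chosen for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Span:
    text: str   # the verbatim snippet
    start: int  # character offset of the snippet in the source document
    end: int    # exclusive end offset


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_answer(question: str, document: str) -> Optional[Span]:
    """Return the sentence sharing the most words with the question.

    The result is always a slice of `document`, never newly generated text.
    Word overlap is a stand-in for a trained model's span scores.
    """
    q_tokens = tokenize(question)
    best: Optional[Span] = None
    best_score = -1
    # Naive sentence splitter: a run of non-terminators plus one terminator.
    for match in re.finditer(r"[^.!?]+[.!?]?", document):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue
        # Skip leading whitespace so the offsets point at the snippet itself.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        end = start + len(stripped)
        score = len(q_tokens & tokenize(stripped))
        if score > best_score:
            best_score = score
            best = Span(stripped, start, end)
    return best


# Hypothetical sample text, not taken from any real regulation.
document = (
    "Firms must retain trade records for five years. "
    "Reports are filed quarterly with the regulator."
)
question = "How long must firms retain trade records?"
span = extract_answer(question, document)
print(span.text, span.start, span.end)
```

Because the answer is returned together with its character offsets, an auditor can confirm that `document[span.start:span.end]` reproduces the snippet exactly.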
An abstractive question-answering model is incorrect because it generates new text by interpreting and summarizing the source information. Generation often produces more natural-sounding answers, but it introduces the risk of altering the original meaning, which makes it unsuitable for a strict compliance environment.
A topic modeling approach like Latent Dirichlet Allocation (LDA) is also incorrect. LDA is an unsupervised technique used to identify abstract topics within a collection of documents, not to answer specific questions by extracting text.
A generative pre-trained transformer (GPT) configured for open-dialogue is not the best choice. These models are designed to generate creative and conversational text. In a closed-domain context with private documents, they carry a risk of hallucination (generating plausible but incorrect information) and are not inherently designed to restrict their output to verbatim source text, failing the auditability requirement.
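The auditability requirement can be expressed as a simple check: an answer is acceptable only if it occurs word for word in the source. The sketch below assumes a hypothetical helper named `locate_verbatim` and uses made-up sample text. It shows why a paraphrased answer, even an accurate one, would fail such a check while an extracted snippet passes.

```python
from typing import Optional


def locate_verbatim(answer: str, source: str) -> Optional[int]:
    """Return the character offset of `answer` in `source`, or None.

    None means the answer is not an exact snippet of the source and
    therefore cannot be traced back to a specific location.
    """
    if not answer:
        return None
    index = source.find(answer)
    return index if index != -1 else None


# Hypothetical sample text, not taken from any real regulation.
source = "Firms must retain trade records for five years."

extracted = "retain trade records for five years"          # copied from the source
paraphrased = "Records have to be kept for half a decade"  # reworded by a generator

print(locate_verbatim(extracted, source))    # an offset into the source
print(locate_verbatim(paraphrased, source))  # None: not traceable
```

This is exact string matching, so it is deliberately strict: differences in whitespace or punctuation also count as a mismatch.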