You are tasked with adding word sense disambiguation (WSD) to a question-answering system that must operate in a specialized technical domain for which no sense-tagged corpus is available. The method has to assign WordNet-compatible senses immediately, without any supervised training. Which approach best satisfies these constraints?
Build a neural soft-max classifier that predicts senses after supervised training on labeled context-sense pairs for each ambiguous term.
Fine-tune a BERT-based model on thousands of domain sentences labeled with their correct WordNet senses.
Train a decision list classifier from SemCor and additional domain sentences that have been manually tagged with senses.
Use an extended Lesk algorithm that selects the sense whose WordNet gloss and related glosses yield the highest word-overlap with the local context.
The extended Lesk algorithm is a knowledge-based, unsupervised technique. It compares the gloss of each candidate WordNet sense (plus the glosses of related concepts such as hypernyms) against the words in the surrounding context, and chooses the sense with the greatest overlap. Because it relies solely on an external lexical resource, it can be applied even when no sense-tagged examples exist. Decision lists, BERT fine-tuning, and other neural soft-max classifiers are supervised approaches: all of them must be trained on thousands of manually sense-annotated sentences before they can predict senses, so they are unsuitable when labeled data is lacking.
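The gloss-overlap idea can be sketched in a few lines. This is a minimal illustration, not a production implementation: the tiny sense inventory for "bank" below is hypothetical (a real system would pull glosses and hypernym glosses from WordNet, e.g. via NLTK), and no stop-word filtering or stemming is applied.

```python
def tokenize(text):
    """Lowercase and split text into a set of word tokens."""
    return set(text.lower().split())

# Hypothetical sense inventory: each candidate sense bundles its own
# gloss with glosses of related concepts (hypernyms, etc.), as in
# extended Lesk. Real glosses would come from WordNet.
SENSES = {
    "bank.n.01": [
        "sloping land beside a body of water",
        "slope of raised terrain",          # related-concept gloss
    ],
    "bank.n.02": [
        "a financial institution that accepts deposits",
        "an organization providing monetary services",  # related-concept gloss
    ],
}

def extended_lesk(context, senses=SENSES):
    """Return the sense whose combined glosses overlap most with the context."""
    context_words = tokenize(context)

    def overlap(glosses):
        gloss_words = set().union(*(tokenize(g) for g in glosses))
        return len(gloss_words & context_words)

    return max(senses, key=lambda s: overlap(senses[s]))

print(extended_lesk("the financial institution accepts deposits from customers"))
# → bank.n.02
```

Note that counting raw overlap rewards function words like "the" and "a"; practical implementations typically remove stop words and weight longer matching phrases more heavily.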