While preparing word embeddings from a 20-million-sentence corpus, a data scientist decides to use the GloVe algorithm rather than a predictive approach such as skip-gram with negative sampling. Which characteristic of GloVe's learning objective distinguishes it from those purely predictive models?
It relies on hierarchical softmax to approximate the full softmax over a large vocabulary during negative sampling.
It factorizes a TF-IDF term-document matrix with truncated singular value decomposition to obtain low-rank word vectors.
It maximizes the conditional probability of each context word given a target word using a full or sampled softmax output layer.
It minimizes a weighted least-squares loss so that the dot product of a word and a context vector equals the logarithm of their co-occurrence count, thereby preserving ratios of co-occurrence probabilities.
GloVe first constructs a global word-word co-occurrence matrix and then learns two sets of vectors by minimizing a weighted least-squares loss that forces the dot product of a word vector and a context vector to approximate the logarithm of their co-occurrence count (or, equivalently, the log of the co-occurrence probability). Because the logarithm of a ratio is the difference of logarithms, this objective makes the resulting vectors explicitly encode ratios of co-occurrence probabilities, information that is not captured when a model simply maximizes the conditional likelihood of a context word given a target word. Predictive models such as skip-gram rely on softmax-based maximum-likelihood objectives (often approximated with negative sampling or hierarchical softmax) rather than on a global least-squares formulation, and latent semantic analysis (LSA) factorizes a term-document matrix, not a word-context co-occurrence matrix. Therefore, the only statement that correctly identifies what makes GloVe unique is the weighted least-squares formulation that ties vector dot products to the log of co-occurrence counts.
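To make the objective concrete, here is a minimal NumPy sketch of that weighted least-squares loss. The co-occurrence counts, vocabulary size, and embedding dimension below are invented for illustration; the weighting-function constants `x_max = 100` and `alpha = 0.75` are the values reported in the original GloVe paper, but everything else is a toy setup rather than a production implementation.

```python
import numpy as np

# Toy symmetric co-occurrence count matrix X for a 4-word vocabulary
# (values invented for illustration only).
X = np.array([
    [0, 8, 3, 1],
    [8, 0, 5, 2],
    [3, 5, 0, 9],
    [1, 2, 9, 0],
], dtype=float)

V, d = X.shape[0], 16                        # vocabulary size, embedding dim
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, d))       # word vectors
W_ctx = rng.normal(scale=0.1, size=(V, d))   # context vectors
b = np.zeros(V)                              # word biases
b_ctx = np.zeros(V)                          # context biases

def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function f(x): down-weights rare pairs and
    caps the influence of very frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, X):
    """Weighted least-squares objective over observed co-occurrences:
    sum over (i, j) of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    i, j = np.nonzero(X)                     # only pairs that co-occur
    pred = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j]
    err = pred - np.log(X[i, j])
    return np.sum(weight(X[i, j]) * err ** 2)

print(glove_loss(W, W_ctx, b, b_ctx, X))
```

Note that the sum runs only over nonzero entries of X, which is what lets GloVe train on global corpus statistics without ever evaluating a softmax over the full vocabulary, in contrast to the predictive objectives described in the other options.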