A machine learning engineer is developing a predictive model for housing prices using a neural network. The dataset includes a zip_code feature with over 30,000 unique values. The engineer is concerned that using one-hot encoding for this high-cardinality feature will lead to extreme sparsity and the curse of dimensionality. To address this, the engineer implements an embedding layer for the zip_code feature. What is the primary advantage of using embeddings in this specific scenario?
It converts each unique zip_code into a unique integer, preserving all original information in a format that is directly usable by the model.
It replaces each zip_code with a single numerical value representing its frequency or its average target value, reducing dimensionality without a neural network.
It performs principal component analysis (PCA) on the one-hot encoded vectors to reduce the feature space to a smaller set of linear components.
It creates a dense, lower-dimensional vector representation that captures latent relationships between zip codes based on their association with the target variable (housing prices).
The correct answer is that embeddings create a dense, lower-dimensional vector representation that captures latent relationships. For high-cardinality categorical features like zip codes, one-hot encoding results in a very high-dimensional and sparse feature space. An embedding layer, trained as part of the neural network, learns to map each zip code to a dense vector of a much smaller, predefined size. During training, the model adjusts these vectors so that zip codes with similar effects on housing prices are positioned closer to each other in the embedding space, thereby capturing complex, non-linear, and latent relationships (e.g., geographic proximity or similar demographic characteristics).
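The idea above can be sketched in PyTorch. This is an illustrative example, not the engineer's actual model: the embedding dimension (16), the extra numeric feature (square footage), and the layer sizes are all assumed for demonstration. The key point is that `nn.Embedding` stores one trainable dense vector per zip code, which is learned jointly with the rest of the network.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: ~30,000 zip codes mapped to 16-dim vectors.
NUM_ZIP_CODES = 30_000
EMBED_DIM = 16

class HousePriceModel(nn.Module):
    def __init__(self):
        super().__init__()
        # A trainable 30,000 x 16 lookup table: one dense vector per zip code.
        self.zip_embedding = nn.Embedding(NUM_ZIP_CODES, EMBED_DIM)
        self.head = nn.Sequential(
            nn.Linear(EMBED_DIM + 1, 32),  # embedding + 1 numeric feature (sqft)
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, zip_idx, sqft):
        z = self.zip_embedding(zip_idx)           # (batch, 16) dense vectors
        x = torch.cat([z, sqft.unsqueeze(1)], 1)  # join with numeric features
        return self.head(x).squeeze(1)

model = HousePriceModel()
zip_idx = torch.tensor([101, 20543, 7])          # integer-indexed zip codes
sqft = torch.tensor([1200.0, 950.0, 2400.0])
preds = model(zip_idx, sqft)                     # one price prediction per row
```

Because the embedding weights receive gradients from the price-prediction loss, zip codes with similar effects on price drift toward similar vectors, which is exactly the latent-relationship property the correct answer describes.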
Label encoding, which assigns a unique integer to each zip code, is incorrect because it imposes an arbitrary and misleading ordinal relationship that does not exist in the data. Using frequency or target encoding is also incorrect; while these are valid techniques, they differ from embeddings and do not create a learned multi-dimensional representation. Applying Principal Component Analysis (PCA) is a separate, linear dimensionality reduction technique that is not what an embedding layer inherently does.
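The label-encoding pitfall is easy to see concretely. The snippet below hand-rolls the same sorted-integer assignment that standard label encoders perform, using hypothetical zip codes:

```python
# Minimal label encoding by hand: assign integers by sorted order,
# which is what typical label encoders effectively do.
zips = ["90210", "10001", "90211", "10001"]
code_of = {z: i for i, z in enumerate(sorted(set(zips)))}
encoded = [code_of[z] for z in zips]
print(encoded)  # [1, 0, 2, 0]
```

The integers carry only sort order: "90210" and "90211" get codes 1 and 2 while "10001" gets 0, so a model that treats these values as magnitudes would infer an ordering and spacing that has nothing to do with housing prices. An embedding layer avoids this by learning a multi-dimensional vector for each code instead of imposing a single arbitrary scalar.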