A data scientist is tasked with building a regression model to predict customer lifetime value. The dataset contains a large number of features (p > 150), and a preliminary analysis using a correlation matrix and variance inflation factors (VIFs) has revealed significant multicollinearity among several key predictors. The goal is to create a model that is both parsimonious, by performing automatic feature selection, and robust to the effects of these correlated features. Which of the following modeling techniques is most suitable for simultaneously addressing both of these requirements?
The correct answer is Elastic Net regression. This model is specifically designed to handle situations with a high number of features and multicollinearity.
Elastic Net combines the L1 (Lasso) and L2 (Ridge) penalties. The L1 penalty performs feature selection by shrinking some coefficients to exactly zero, creating a parsimonious model. The L2 penalty effectively handles multicollinearity by shrinking the coefficients of correlated predictors together, rather than arbitrarily selecting one. This dual approach directly addresses both requirements outlined in the scenario.
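This dual behavior can be seen in a minimal scikit-learn sketch. The data here is synthetic and the penalty values (`alpha`, `l1_ratio`) are illustrative, not tuned:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: two nearly collinear predictors plus irrelevant noise features
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # highly correlated with x1
noise = rng.normal(size=(n, 3))            # features unrelated to the target
X = np.column_stack([x1, x2, noise])
y = 3 * x1 + 3 * x2 + rng.normal(scale=0.5, size=n)

# l1_ratio blends the penalties: 1.0 = pure Lasso (L1), 0.0 = pure Ridge (L2)
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

print(model.coef_)
# The correlated pair x1/x2 tend to receive similar nonzero coefficients
# (the L2 grouping effect), while the irrelevant noise features are
# shrunk toward or exactly to zero (the L1 selection effect).
```

In practice, `alpha` and `l1_ratio` would be chosen by cross-validation (e.g. with `ElasticNetCV`) rather than set by hand.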
Lasso Regression uses an L1 penalty and is effective for feature selection. However, when faced with a group of highly correlated variables, it tends to arbitrarily select only one and shrink the others to zero, which can lead to model instability and loss of information.
Ridge Regression uses an L2 penalty and is excellent for managing multicollinearity. However, it does not perform feature selection; it shrinks coefficients towards zero but will not set them exactly to zero, meaning all features remain in the model.
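The selection contrast between the two penalties can be demonstrated directly. This is an illustrative sketch on synthetic data with hypothetical penalty values: Lasso's L1 penalty produces exact zeros for irrelevant features, while Ridge's L2 penalty only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first of five features drives the target
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0.0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0.0)))
# Lasso sets the irrelevant coefficients exactly to zero;
# Ridge keeps every feature in the model with small nonzero weights.
```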
Ordinary Least Squares (OLS) Regression does not use any regularization. It is highly sensitive to multicollinearity, which can lead to unstable and unreliable coefficient estimates, and it does not perform any feature selection, making it prone to overfitting with a large number of features.