A data scientist is working with a linear regression model and encounters a severe multicollinearity issue, which causes the XᵀX matrix to be singular or near-singular. To address this, they implement ridge regression. The ridge regression solution for the coefficients (β) is given by the formula:
β_ridge = (XᵀX + λI)⁻¹Xᵀy
What is the primary mathematical role of adding the scaled identity matrix, λI, in this specific context?
It is used to standardize the feature variables, ensuring they are on the same scale before the model coefficients are calculated.
It serves as a placeholder to incorporate the model's intercept term into the matrix calculation.
It ensures the (XᵀX + λI) matrix is invertible by adding a positive value to its diagonal elements, which is necessary when XᵀX is singular or ill-conditioned.
It transforms the XᵀX matrix into an orthogonal matrix, thereby completely removing multicollinearity between the predictors.
The correct answer explains the primary role of the identity matrix in the context of ridge regression. When severe multicollinearity exists, the XᵀX matrix becomes singular (its determinant is zero) or near-singular (ill-conditioned), meaning it cannot be inverted. The Ordinary Least Squares (OLS) solution, which relies on (XᵀX)⁻¹, fails in this case. Ridge regression solves this by adding a small, positive constant (λ) to the diagonal elements of the XᵀX matrix. This is achieved by adding λI, where I is the identity matrix. This addition guarantees that the resulting matrix (XᵀX + λI) is non-singular and therefore invertible, providing a stable solution for the coefficients.
The option suggesting standardization is incorrect. While standardizing features is a crucial preprocessing step for ridge regression, it is a separate operation performed on the X matrix itself, not a function of the λI term in the coefficient formula.
The option about transforming XᵀX into an orthogonal matrix is incorrect. Adding λI does not orthogonalize the matrix; it only makes it invertible. Orthogonalization is a different mathematical process.
The option suggesting λI is a placeholder for the intercept is also incorrect. The intercept term is typically handled by augmenting the design matrix X with a column of ones before these calculations are performed.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What causes the `XᵀX` matrix to be singular or near-singular in linear regression?
Open an interactive chat with Bash
Why does adding `λI` to `XᵀX` make the matrix invertible?
Open an interactive chat with Bash
How does the ridge regression penalty (λ) affect the model's coefficients?