A data scientist develops an Ordinary Least Squares (OLS) regression model to predict housing prices. After fitting the model, a residual plot is generated by plotting the model's residuals against the predicted values. The plot reveals that the variance of the residuals increases as the predicted housing prices increase, forming a distinct cone shape. Which OLS assumption is violated, and what is the primary consequence of this violation?
The assumption of no multicollinearity is violated. The plot indicates that independent variables are highly correlated, leading to unstable coefficient estimates.
The assumption of homoscedasticity is violated. This leads to biased standard errors for the regression coefficients, making hypothesis tests and confidence intervals unreliable.
The assumption of homoscedasticity is violated. The primary consequence is that the coefficient estimates become biased and inconsistent.
The assumption of linearity is violated. The model's coefficients are now biased, consistently over- or underestimating the true population parameters.
The correct answer identifies that the assumption of homoscedasticity is violated and that the primary consequence is unreliable standard errors, which invalidates statistical inference. The scenario describes a residual plot with a cone shape, where the spread of residuals increases with the predicted values. This is a classic visual indicator of heteroscedasticity, which is the violation of the OLS assumption of homoscedasticity (constant variance of errors).
In the presence of heteroscedasticity, the OLS coefficient estimators are still unbiased and consistent. However, they are no longer the Best Linear Unbiased Estimators (BLUE) because they are not efficient (i.e., they do not have the minimum variance). The standard formulas used to calculate the standard errors of the coefficients become biased, typically leading to underestimation. This bias in the standard errors renders t-tests, F-tests, and confidence intervals unreliable.
The other options are incorrect:
Stating that coefficient estimates become biased and inconsistent due to heteroscedasticity is incorrect. OLS estimators remain unbiased under heteroscedasticity.
The assumption of linearity is a different assumption. A violation of linearity would typically appear as a clear pattern, such as a curve, in the residuals, but not the fanning-out cone shape described.
The assumption of no multicollinearity relates to the correlation between independent variables, not the pattern of residuals against predicted values. Multicollinearity is diagnosed using tools like the Variance Inflation Factor (VIF), not a residual plot.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What does homoscedasticity mean in OLS regression?
Open an interactive chat with Bash
How can you detect heteroscedasticity in a regression model?
Open an interactive chat with Bash
What are some techniques for addressing heteroscedasticity in regression models?