A data scientist develops a multiple linear regression model to predict housing prices. Upon evaluation, a plot of the model's residuals versus its fitted values reveals a distinct fan shape, where the vertical spread of the residuals increases as the predicted housing price increases. Which of the following statements describes the most critical implication of this observation for the model's statistical inference?
The model suffers from severe multicollinearity, making it difficult to isolate the individual impact of each predictor variable.
The standard errors of the coefficients are biased, rendering hypothesis tests and confidence intervals unreliable.
The coefficient estimates are biased, leading to a systematic overestimation or underestimation of the true population parameters.
The residuals are not normally distributed, which violates the primary assumption required for the coefficient estimates to be valid.
The correct answer is that the standard errors of the coefficients are biased, which renders hypothesis tests and confidence intervals unreliable. The fan-shaped pattern in the residual plot is a classic indicator of heteroskedasticity, which means the variance of the error term is not constant across all levels of the independent variables. In the presence of heteroskedasticity, Ordinary Least Squares (OLS) coefficient estimates remain unbiased, but they are no longer efficient (i.e., not BLUE - Best Linear Unbiased Estimators). The primary issue for statistical inference is that the formulas used to calculate the variance and standard errors of the coefficients, which assume homoskedasticity, become biased. This bias in the standard errors leads to unreliable t-statistics, p-values, and confidence intervals, potentially causing the analyst to draw incorrect conclusions about the statistical significance of the predictor variables.
The coefficient estimates themselves do not become biased due to heteroskedasticity in an OLS model. Multicollinearity is a separate issue related to high correlation between predictor variables, not the variance of the residuals. While the normality of residuals is another OLS assumption, the fan shape specifically points to non-constant variance (heteroskedasticity), not necessarily a deviation from a normal distribution.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is heteroskedasticity?
Open an interactive chat with Bash
How do you detect heteroskedasticity in a regression model?
Open an interactive chat with Bash
How can you address heteroskedasticity in a model?