A data scientist is comparing two multiple linear regression models, Model A and Model B, to predict customer lifetime value using a dataset containing 500 observations. The goal is to select the model that offers the best balance between goodness-of-fit and parsimony.
Model A was built with 5 explanatory variables and has a maximized log-likelihood of -150.
Model B was built with 8 explanatory variables and has a maximized log-likelihood of -145.
The data scientist calculates both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to aid in the selection. Which of the following statements BEST describes the outcome of this comparison?
AIC will favor Model B, while BIC will favor Model A because BIC's penalty for model complexity is more severe given the sample size.
Both AIC and BIC will favor Model B because its substantially higher log-likelihood outweighs the penalty for additional parameters in both criteria.
AIC will favor Model A, while BIC will favor Model B because AIC is known to prefer simpler models while BIC is more focused on predictive accuracy.
Both AIC and BIC will favor Model A due to the principle of parsimony, as the small improvement in log-likelihood for Model B does not justify its increased complexity.
The correct choice is determined by calculating and comparing the AIC and BIC values for both models. The model with the lower value for a given criterion is preferred.
The formulas are:
AIC = 2k - 2 * ln(L)
BIC = k * ln(n) - 2 * ln(L)
Where:
k = number of parameters
n = number of observations
ln(L) = maximized log-likelihood
For this scenario, n = 500, so ln(500) is approximately 6.21.
For AIC: Model A = 2(5) - 2(-150) = 310 and Model B = 2(8) - 2(-145) = 306. Model B has the lower score, so AIC favors Model B.
For BIC: Model A = 5(6.21) - 2(-150) = 331.05 and Model B = 8(6.21) - 2(-145) = 339.68. Model A has the lower score, so BIC favors Model A.
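The arithmetic can be checked with a short Python sketch. The helper names `aic` and `bic` are illustrative, not taken from any library, and k is set to the number of explanatory variables, as in the explanation. Statistical software often also counts the intercept and error variance, which would raise both models' scores by the same amount and leave the ranking unchanged. Because the sketch uses the unrounded ln(500), its BIC values differ slightly from the rounded figures (about 331.07 and 339.72).

```python
import math

def aic(k, log_lik):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik

def bic(k, n, log_lik):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik

n = 500  # number of observations
# (number of parameters k, maximized log-likelihood) from the question
models = {"A": (5, -150.0), "B": (8, -145.0)}

# Map each model name to its (AIC, BIC) pair
scores = {name: (aic(k, ll), bic(k, n, ll)) for name, (k, ll) in models.items()}

for name, (a, b) in scores.items():
    print(f"Model {name}: AIC = {a:.2f}, BIC = {b:.2f}")

# Lower is better for both criteria
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])
print(f"AIC favors Model {best_aic}; BIC favors Model {best_bic}")
```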
This occurs because BIC's per-parameter penalty, ln(n), exceeds AIC's per-parameter penalty of 2 whenever n is at least 8 (ln(n) > 2 once n exceeds e^2, which is about 7.39). With n = 500, BIC imposes a much stronger penalty on the three additional parameters in Model B, outweighing the benefit of its better log-likelihood. AIC's smaller penalty allows the improved fit of Model B to result in a better overall score.
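The same comparison can be read as a trade-off between the gain in fit and the extra penalty, sketched below in Python. The variable names are illustrative only. Model B lowers the -2 * ln(L) term by 10, and each criterion prefers Model B only if that gain exceeds the additional penalty it charges for the three extra parameters.

```python
import math

n = 500
extra_params = 8 - 5  # Model B has three more parameters than Model A

# Reduction in the -2 ln(L) term when moving from Model A to Model B
fit_gain = (-2 * -150.0) - (-2 * -145.0)  # 300 - 290

# Additional penalty each criterion charges for the extra parameters
aic_extra_penalty = 2 * extra_params
bic_extra_penalty = extra_params * math.log(n)

# A criterion prefers Model B only if the fit gain exceeds its extra penalty
aic_prefers_b = fit_gain > aic_extra_penalty
bic_prefers_b = fit_gain > bic_extra_penalty

# Smallest integer sample size at which ln(n) > 2, i.e. where BIC's
# per-parameter penalty becomes larger than AIC's
threshold_n = math.floor(math.exp(2)) + 1

print(f"Fit gain: {fit_gain:.2f}")
print(f"Extra AIC penalty: {aic_extra_penalty:.2f} -> prefers B: {aic_prefers_b}")
print(f"Extra BIC penalty: {bic_extra_penalty:.2f} -> prefers B: {bic_prefers_b}")
print(f"BIC penalizes harder than AIC from n = {threshold_n}")
```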