A data scientist is building a multiple linear regression model to predict housing prices. The initial model, using only the living area in square feet as a predictor, yields an R-squared value of 0.65. To improve the model, the data scientist adds ten additional predictor variables, including number of bedrooms, number of bathrooms, and age of the house. The new model results in an R-squared value of 0.78. Which of the following is the most critical consideration for the data scientist when interpreting this increase in R-squared?
An R-squared of 0.78 indicates that 78% of the model's predictions for house prices will be correct.
The increase from 0.65 to 0.78 definitively proves that the additional variables have strong predictive power and the new model is superior.
The new R-squared value is high, which invalidates the p-values of the individual coefficients in the model.
The R-squared value will almost always increase when more predictors are added to the model, regardless of their actual significance, potentially leading to overfitting.
The correct answer highlights a key limitation of the R-squared metric. R-squared measures the proportion of the variance in the dependent variable that is explained by the independent variables. A critical characteristic of R-squared is that it will either increase or stay the same whenever a new predictor variable is added to the model, even if that variable has no real relationship with the outcome. This mathematical property means that simply observing an increase in R-squared after adding more variables is not sufficient evidence of a better model. It may indicate that the model is becoming overly complex and fitting to the noise in the training data (overfitting), rather than capturing the true underlying relationships. Therefore, the most critical consideration is recognizing that this increase is expected and could be misleading.
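This property is easy to demonstrate numerically. The sketch below (a minimal simulation; the variable names and simulated data are illustrative, not from the question) fits ordinary least squares first with a single meaningful predictor, then with ten added columns of pure random noise, and shows that the training R-squared still does not decrease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Outcome depends only on living area; every other predictor will be noise.
area = rng.uniform(500, 3500, n)
price = 100 * area + rng.normal(0, 20000, n)

def r_squared(X, y):
    """Fit OLS by least squares and return the training R-squared."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_simple = r_squared(area.reshape(-1, 1), price)

# Append ten predictors that carry no information about price.
noise = rng.normal(size=(n, 10))
r2_padded = r_squared(np.column_stack([area, noise]), price)

# Training R-squared can only go up (or stay equal) as columns are added.
assert r2_padded >= r2_simple
print(r2_simple, r2_padded)
```

The assertion holds by construction: adding a column to an OLS design matrix can never increase the residual sum of squares on the training data, which is exactly why a higher R-squared alone is weak evidence of a better model.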
The other options are incorrect. Stating that the increase definitively proves the new model is superior is a flawed interpretation because it ignores the risk of overfitting and the inherent tendency of R-squared to increase. R-squared does not measure predictive accuracy in terms of the percentage of correct predictions; it measures the proportion of explained variance. A high R-squared value does not invalidate the p-values of the coefficients; these are separate (though related) diagnostic measures of a regression model.
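A common remedy is to report adjusted R-squared alongside R-squared, since it penalizes the metric for the number of predictors. The sketch below applies the standard formula to the R-squared values from the question; the sample size n = 20 is a hypothetical choice for illustration (the question does not state one):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: penalizes R-squared for model size.

    n = number of observations, p = number of predictors.
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical small sample of n = 20 houses, using the R-squared
# values from the question: 0.65 with 1 predictor, 0.78 with 11.
simple = adjusted_r2(0.65, n=20, p=1)   # ~0.631
full = adjusted_r2(0.78, n=20, p=11)    # ~0.478
print(simple, full)
```

With a small sample, the eleven-predictor model's adjusted R-squared falls below the one-predictor model's despite its higher raw R-squared, illustrating how the penalty can flip the comparison when extra predictors add little real explanatory power.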