CompTIA DataX DY0-001 (V1) Practice Question

A data scientist has developed a multiple linear regression model to predict housing prices. After the initial training, the scientist examines the model's performance by creating a residual vs. fitted values plot. The plot reveals that the residuals are not randomly scattered around the zero line; instead, they form a distinct, parabolic (U-shaped) pattern. What is the most likely issue with the model, and what is the most appropriate next step in the model design iteration process?

  • The model is likely overfitting the training data. The next step should be to increase the L2 regularization penalty (e.g., in a Ridge regression) to reduce the model's complexity.

  • The model exhibits non-linearity, indicating it fails to capture the underlying structure of the data. The next step should be to use feature engineering to create polynomial terms for the relevant predictors.

  • The plot shows evidence of heteroscedasticity, meaning the variance of the errors is not constant. The next step should be to apply a Box-Cox transformation to the response variable to stabilize the variance.

  • The plot reveals multicollinearity among the predictor variables. The next step should be to calculate the Variance Inflation Factor (VIF) for each feature and consider removing highly correlated predictors.
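
The residual diagnostic described in the question can be reproduced on synthetic data. The sketch below is illustrative only and is not part of the original question: the data-generating formula, the helper names (`fit_ols`, `curvature_score`, `diagnose`), and the use of a correlation as a numeric stand-in for "looking at the plot" are all assumptions. It fits a straight line to data that contains a squared term, measures how strongly the residuals track the squared, centered fitted values (a rough proxy for a U-shape), and then repeats the check after adding a polynomial term.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(X[0])
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def curvature_score(fitted, resid):
    """Correlation between residuals and squared, centered fitted values.
    Values near +1 or -1 suggest a parabolic residual pattern."""
    m = sum(fitted) / len(fitted)
    return pearson([(f - m) ** 2 for f in fitted], resid)

def diagnose(X, y):
    beta = fit_ols(X, y)
    fitted = predict(X, beta)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, curvature_score(fitted, resid)

# Synthetic data (assumed for illustration): the true relationship is quadratic.
rng = random.Random(0)
x = [i / 40 for i in range(401)]
y = [50 + 4 * xi + 1.5 * xi ** 2 + rng.gauss(0, 2) for xi in x]

# Model 1: intercept + x only. The omitted x^2 term ends up in the residuals.
beta_linear, score_linear = diagnose([[1.0, xi] for xi in x], y)

# Model 2: intercept + x + x^2 (an engineered polynomial feature).
beta_poly, score_poly = diagnose([[1.0, xi, xi ** 2] for xi in x], y)

print("curvature score, linear model:    ", round(score_linear, 3))
print("curvature score, polynomial model:", round(score_poly, 3))
```

In this sketch the first score should be close to 1 and the second close to 0, which mirrors what a residual vs. fitted plot would show before and after the polynomial term is added. In practice the plot itself remains the primary diagnostic; the single-number score is only a convenience for this example.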

Modeling, Analysis, and Outcomes