CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is developing a linear regression model to predict housing prices. An initial analysis of the residuals versus fitted values plot shows a distinct curvilinear pattern with variance increasing as the predicted value increases, indicating the presence of both non-linearity and heteroscedasticity. The dependent variable, house_price, is right-skewed and contains several valid entries that are exactly zero, representing land-only sales. The goal is to transform the house_price variable to better meet the assumptions of linear regression.

Which of the following data transformation techniques is the most appropriate and robust approach in this scenario?

A standard Box-Cox transformation on the dependent variable.
A standard logarithmic transformation on the dependent variable.
A Box-Cox transformation after adding a constant to the dependent variable.
One-hot encoding the dependent variable.

Answer Description

The correct answer is to apply a Box-Cox transformation after adding a constant to the dependent variable. The scenario describes heteroscedasticity (non-constant variance, seen as a 'fan' or 'cone' shape in the residual plot) and a non-linear relationship, both of which power transformations of the response are commonly used to address. The Box-Cox transformation estimates a power parameter, lambda, from the data (typically by maximum likelihood) so that the transformed variable is closer to normal with more stable variance, which often improves linearity as well. However, the standard Box-Cox transformation requires all values to be strictly positive. Because house_price contains valid zeros, a constant (e.g., 1) is added to every observation before the transformation so that all values are positive. The estimated lambda depends on the constant chosen, so the same shift must be applied when transforming new data and reversed when converting predictions back to prices.
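
The shift-then-transform approach can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the synthetic data, the variable names, and the shift of 1 are assumptions made for the example, and it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic, right-skewed prices with a few exact zeros (illustrative data only).
rng = np.random.default_rng(0)
house_price = rng.lognormal(mean=12.0, sigma=0.6, size=500)
house_price[:5] = 0.0  # hypothetical land-only sales recorded as zero

# Assumed shift constant; it only needs to make every value strictly positive.
shift = 1.0
shifted = house_price + shift

# With no lambda supplied, boxcox returns the transformed data and the
# maximum-likelihood estimate of lambda.
transformed, fitted_lambda = stats.boxcox(shifted)

# To get back to the price scale, invert Box-Cox first, then undo the shift.
recovered = inv_boxcox(transformed, fitted_lambda) - shift
```

A regression would then be fitted on `transformed`, and predictions would be mapped back with the same `fitted_lambda` and `shift`. Since the fitted lambda changes with the shift, it is worth re-checking the residual plot after the transformation.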

A standard logarithmic transformation is incorrect because the logarithm of zero is undefined. A log(y + 1) transformation is a possible alternative, but the log is only one member of the Box-Cox family (the case lambda = 0). Box-Cox is more general because it estimates lambda from the data rather than assuming a log transform is best.
A standard Box-Cox transformation is incorrect because it is defined only for strictly positive data, and house_price contains zeros. This choice fails to account for the data constraints mentioned in the scenario.
One-hot encoding is incorrect because it is a technique used to convert categorical variables into a numerical format for modeling. It is not applicable to a continuous dependent variable like house_price in a regression problem.
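
The failure modes of the first two distractors can be checked directly. The sketch below uses made-up prices and hypothetical variable names, and assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats

# Made-up prices, including one valid zero.
prices = np.array([0.0, 120_000.0, 250_000.0, 410_000.0])

# Plain log: zero maps to -inf (NumPy would normally warn about this).
with np.errstate(divide="ignore"):
    log_prices = np.log(prices)

# Standard Box-Cox: SciPy rejects non-positive input with a ValueError.
try:
    stats.boxcox(prices)
    boxcox_failed = False
except ValueError:
    boxcox_failed = True

# log1p computes log(1 + y), so zero maps to 0 and the result stays finite.
log1p_prices = np.log1p(prices)
```

Here `log1p` behaves like the shifted transform with lambda fixed at 0, whereas the shifted Box-Cox approach leaves lambda to be estimated.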

Exam domain: Modeling, Analysis, and Outcomes
