CompTIA Data+ DA0-002 (V2) Practice Question

While preparing a customer-segmentation dataset for a k-means clustering project, a data analyst notices that the variable annual_spend ranges from $0 to $250,000, whereas survey_score ranges only from 1 to 5. To keep the larger-scale variable from dominating the Euclidean distance calculation, the analyst wants to re-express each value as the number of standard deviations it lies from that variable's mean. Which data-transformation technique should be applied before running the algorithm?

Convert every numerical feature to its z-score by subtracting the mean and dividing by the standard deviation.
Apply a base-10 logarithm to each feature to compress its scale.
Rescale each feature to the interval 0-1 using its minimum and maximum values.
Replace extreme values in each feature with the 5th- and 95th-percentile values.
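
For reference, the four transformations named in the options can be sketched in plain Python as below. This is a minimal illustration, not a definitive implementation: the sample values and function names are hypothetical, the population standard deviation is an assumed choice, the logarithm is taken as log10(1 + v) so that a spend of 0 stays defined, and the percentile interpolation method is likewise an assumption.

```python
import math
import statistics

# Hypothetical sample values; they are not taken from the question's dataset.
annual_spend = [1200.0, 8500.0, 43000.0, 97000.0, 250000.0]


def z_scores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale to the interval [0, 1] using the minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log10_transform(values):
    """Base-10 logarithm; log10(1 + v) is an assumption so that 0 is defined."""
    return [math.log10(1 + v) for v in values]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given lower and upper percentiles."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    for name, fn in [("z-score", z_scores), ("min-max", min_max),
                     ("log10", log10_transform), ("winsorize", winsorize)]:
        print(name, [round(x, 3) for x in fn(annual_spend)])
```

Of these, only the z-score output is expressed in units of standard deviations from the mean, so the transformed values have mean 0 and standard deviation 1 regardless of the original scale.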

CompTIA Data+ DA0-002 (V2)

Data Acquisition and Preparation


