While preparing a customer-segmentation dataset for a k-means clustering project, a data analyst notices that the variable annual_spend ranges from $0 to $250,000, whereas survey_score ranges only from 1 to 5. To keep the larger-scale variable from dominating the Euclidean distance calculation, the analyst wants to re-express each value as the number of standard deviations it lies from that variable's mean. Which data-transformation technique should be applied before running the algorithm?
Apply a base-10 logarithm to each feature to compress its scale.
Rescale each feature to the interval 0-1 using its minimum and maximum values.
Convert every numerical feature to its z-score by subtracting the mean and dividing by the standard deviation.
Replace extreme values in each feature with the 5th- and 95th-percentile values.
Standardization (often called z-score scaling) centers each numeric feature at a mean of 0 and rescales it to a standard deviation of 1. Subtracting the feature mean and dividing by the feature's standard deviation expresses every value in standard-deviation units, so variables measured in different units or ranges contribute on a comparable scale to distance-based methods such as k-means.
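The calculation described above can be sketched in a few lines of Python using only the standard library. The sample values and the helper name z_scores are invented for illustration, and the sketch assumes the population standard deviation; the sample standard deviation is an equally common choice.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Express each value as its distance from the mean in standard-deviation units."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population SD; statistics.stdev gives the sample SD
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical sample data, not taken from the question
annual_spend = [0.0, 12_000.0, 48_000.0, 95_000.0, 250_000.0]
survey_score = [1.0, 2.0, 3.0, 4.0, 5.0]

spend_z = z_scores(annual_spend)
score_z = z_scores(survey_score)

# Both features now have mean 0 and standard deviation 1 (up to rounding)
print(round(fmean(spend_z), 6), round(pstdev(spend_z), 6))
print(round(fmean(score_z), 6), round(pstdev(score_z), 6))
```

After this step both features are in the same units (standard deviations), so a Euclidean distance computed on spend_z and score_z is no longer driven by the dollar scale of annual_spend.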
Rescaling to 0-1 (min-max normalization) adjusts the range but does not center the data or guarantee equal variance. A logarithmic transform changes distribution shape rather than aligning means and variances, and winsorization (capping values at the 5th and 95th percentiles) mitigates outliers but leaves variables on their original scales. Therefore, only z-score standardization satisfies the requirement stated in the scenario.
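To make the comparison concrete, the following sketch applies all four transformations to one hypothetical annual_spend sample and reports the resulting mean and standard deviation. The data, the function names, the use of log10(1 + v) to avoid log10(0), and the "inclusive" percentile method are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev, quantiles


def z_score(values):
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log10_plus_one(values):
    # log10(0) is undefined, so this sketch shifts by 1 (an assumption)
    return [math.log10(1 + v) for v in values]


def winsorize_5_95(values):
    # quantiles(..., n=20) returns 19 cut points; the first and last
    # are the 5th and 95th percentiles
    cuts = quantiles(values, n=20, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample data, not taken from the question
annual_spend = [0.0, 12_000.0, 48_000.0, 95_000.0, 250_000.0]

transforms = {
    "z-score": z_score,
    "min-max": min_max,
    "log10": log10_plus_one,
    "winsorize": winsorize_5_95,
}

# Map each technique name to the (mean, standard deviation) of its output
results = {}
for name, fn in transforms.items():
    out = fn(annual_spend)
    results[name] = (fmean(out), pstdev(out))

for name, (mean, std) in results.items():
    print(f"{name:10s} mean={mean:12.3f} std={std:12.3f}")
```

On this sample, only the z-score row comes out with a mean of 0 and a standard deviation of 1; the winsorized values are still expressed in dollars.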
CompTIA Data+ DA0-002 (V2)
Data Acquisition and Preparation