You are preparing a set of numeric customer-behavior features for a k-means clustering model. One of the variables, lifetime_value, is highly right-skewed and contains several extreme outliers that would dominate Euclidean distance calculations if left untreated. You want each feature to contribute proportionally to the distance metric without letting those few large values distort the scale. Which preprocessing technique should you apply before running the clustering algorithm?
Apply a robust scaler that centers on the median and scales by the interquartile range.
Apply min-max scaling to force every feature into a 0-1 range.
Apply Z-score standardization so each feature has mean 0 and standard deviation 1.
Apply a logarithmic transformation followed by min-max scaling.
Robust scaling uses the median as its center and the interquartile range (IQR) as its scale factor. Because both the median and IQR are resistant to extreme values, the transformation keeps most observations on a comparable scale while preventing a handful of very large lifetime_value records from stretching the range of the whole variable.
Min-max scaling rescales the data between fixed bounds such as 0 and 1, but the presence of even a single extreme value pushes all other observations into a narrow interval, so outliers still dominate. Standardization (Z-score) centers data on the mean and scales by the standard deviation; since both statistics are sensitive to outliers, the resulting values can still be overstretched by extreme cases. A log transform followed by min-max scaling can reduce skew but will still tie the upper bound of the scale to the largest remaining value, offering less protection than an IQR-based approach. Therefore, robust scaling is the most appropriate choice for this scenario.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
Why is Robust Scaling particularly suited for handling outliers?
Open an interactive chat with Bash
How does Euclidean distance affect clustering when using raw data with outliers?
Open an interactive chat with Bash
How does Min-Max Scaling compare to Robust Scaling in handling outliers?
Open an interactive chat with Bash
CompTIA Data+ DA0-002 (V2)
Data Acquisition and Preparation
Your Score:
Report Issue
Bash, the Crucial Exams Chat Bot
AI Bot
Loading...
Loading...
Loading...
Pass with Confidence.
IT & Cybersecurity Package
You have hit the limits of our free tier, become a Premium Member today for unlimited access.
Military, Healthcare worker, Gov. employee or Teacher? See if you qualify for a Community Discount.
Monthly
$19.99 $11.99
$11.99/mo
Billed monthly, Cancel any time.
$19.99 after promotion ends
3 Month Pass
$44.99 $26.99
$8.99/mo
One time purchase of $26.99, Does not auto-renew.
$44.99 after promotion ends
Save $18!
MOST POPULAR
Annual Pass
$119.99 $71.99
$5.99/mo
One time purchase of $71.99, Does not auto-renew.
$119.99 after promotion ends
Save $48!
BEST DEAL
Lifetime Pass
$189.99 $113.99
One time purchase, Good for life.
Save $76!
What You Get
All IT & Cybersecurity Package plans include the following perks and exams .