A data analyst is working with a dataset containing customer ages. They notice several missing values and also some extreme outliers in the age column. Which imputation method should the analyst use to fill the missing values while minimizing the influence of the outliers?
The median is the most appropriate choice because it is robust to outliers: it depends only on the middle of the sorted values, so a few extreme ages barely move it. The mean would be pulled toward the extreme values, leading to inaccurate imputations. The mode is typically used for categorical data, not for numeric data like age. Imputing with a constant such as zero would distort the statistical properties of the age distribution and is generally poor practice unless zero has a specific meaning in the context of the data.
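As a rough illustration of the reasoning above, the following minimal sketch compares the mean and the median as fill values. The `ages` list is made-up sample data (not from the question), with `None` standing in for missing values and two deliberately extreme entries; it uses only Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical customer ages: None marks a missing value,
# and 140 and 199 are deliberately extreme outliers.
ages = [23, 31, 35, 42, None, 29, 140, None, 38, 199]

# Compute candidate fill values from the observed (non-missing) ages only.
observed = [a for a in ages if a is not None]
mean_fill = mean(observed)      # pulled upward by the two outliers
median_fill = median(observed)  # middle of the sorted observed values

# Replace each missing entry with the median of the observed ages.
imputed = [median_fill if a is None else a for a in ages]

print(f"mean fill:   {mean_fill}")    # 67.125
print(f"median fill: {median_fill}")  # 36.5
print(imputed)
```

With this sample, the mean-based fill (67.125) is far from any typical age in the list, while the median-based fill (36.5) sits among the bulk of the observed values.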