A data analyst is working with a dataset containing customer ages. They notice several missing values and also some extreme outliers in the age column. Which imputation method should the analyst use to fill the missing values while minimizing the influence of the outliers?
The median is the most appropriate choice because it is robust to outliers: it depends only on the middle of the sorted values, so a few extreme ages do not move it. The mean would be pulled toward the extreme values, leading to inaccurate imputations. The mode is typically used for categorical data, not continuous numeric data like age. Imputing with a constant value like zero would distort the statistical properties of the age distribution and is generally poor practice unless zero has a specific meaning in the context of the data.
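To make the difference concrete, here is a minimal sketch using only Python's standard library. The `ages` list is invented for illustration (it is not from any real dataset), with `None` standing in for missing values and 150 and 200 acting as the outliers.

```python
from statistics import mean, median

# Hypothetical age column: None marks a missing value,
# and 150 and 200 are deliberately extreme outliers.
ages = [23, 25, 27, 29, 31, None, 33, 35, None, 150, 200]

# Compute statistics on the observed (non-missing) values only.
observed = [a for a in ages if a is not None]

mean_age = mean(observed)      # pulled upward by 150 and 200 (about 61.4)
median_age = median(observed)  # middle of the sorted values: 31

# Median imputation: replace each missing value with the median.
imputed = [median_age if a is None else a for a in ages]

print(f"mean={mean_age:.1f}, median={median_age}")
print(imputed)
```

In this made-up sample, the mean lands above every non-outlier age, while the median stays among the typical values, which is why it is the safer fill value here.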