A customer analytics team is cleaning a dataset that contains customer age (fully observed), loyalty tier (fully observed), and total annual spending, of which about 18% of the values are missing. Exploratory analysis shows that customers who are younger and those in the highest loyalty tier are less likely to report spending. However, within any given age-tier combination, the probability that spending is missing is unrelated to the true (unobserved) spending amount. Which description best characterizes the missingness mechanism for the spending variable in this situation?
Missing Completely at Random (MCAR); missingness is unrelated to any observed or unobserved variables.
Missing Completely at Random due to a random data-entry glitch that uniformly deleted 18% of spending values across the dataset.
Missing Not at Random (MNAR); higher or lower spending directly influences the chance that the value is missing, even after accounting for age and tier.
Missing at Random (MAR); the probability of a missing spending value depends only on the observed age and loyalty tier.
The missingness depends on two fully observed variables, age and loyalty tier, but conditional on them it is unrelated to the spending values that are actually missing. This matches the definition of Missing at Random (MAR). Under MAR, the missing-data mechanism is considered ignorable for likelihood-based models or multiple imputation, provided the observed predictors that drive missingness are included in the analysis. The mechanism is not Missing Completely at Random (MCAR) because the probability of missingness varies with age and loyalty tier, and it is not Missing Not at Random (MNAR) because spending itself does not influence whether it is missing once age and tier are taken into account.
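The distinction can be illustrated with a small simulation. The sketch below is not taken from the question: the group proportions, spending formula, and missingness probabilities are all invented for illustration, and age is simplified to a binary "young" flag. It generates data in which missingness depends only on age group and tier, then compares the true spending of missing and observed rows, both overall and within each age-tier cell.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n=20000):
    """Return (young, top_tier, spending, missing) tuples under a MAR mechanism.

    All coefficients below are illustrative assumptions.
    """
    rows = []
    for _ in range(n):
        young = random.random() < 0.5      # age group, fully observed
        top_tier = random.random() < 0.3   # loyalty tier, fully observed
        # Spending depends on age group and tier, plus noise.
        spending = 1000 - 200 * young + 800 * top_tier + random.gauss(0, 150)
        # MAR: the missingness probability uses only observed variables,
        # never the spending value itself.
        p_missing = 0.08 + 0.12 * young + 0.15 * top_tier
        missing = random.random() < p_missing
        rows.append((young, top_tier, spending, missing))
    return rows


def gap(rows):
    """Mean true spending of missing rows minus that of observed rows."""
    lost = [s for _, _, s, m in rows if m]
    kept = [s for _, _, s, m in rows if not m]
    return mean(lost) - mean(kept)


rows = simulate()
overall_rate = sum(m for *_, m in rows) / len(rows)

# Group rows into the four (young, top_tier) cells.
cells = {}
for row in rows:
    cells.setdefault((row[0], row[1]), []).append(row)

miss_rate = {k: sum(r[3] for r in v) / len(v) for k, v in cells.items()}
cell_gaps = {k: gap(v) for k, v in cells.items()}
marginal_gap = gap(rows)

for key in sorted(cells):
    print(key, round(miss_rate[key], 3), round(cell_gaps[key], 1))
print("overall missing rate:", round(overall_rate, 3))
print("marginal gap:", round(marginal_gap, 1))
```

In this simulated setup the missing rate differs clearly across cells, which rules out MCAR, and the marginal gap is large because tier drives both spending and missingness. Within each cell the gap is close to zero, which is what MAR means. With real data the within-cell comparison is impossible because the missing spending values are unknown, so MAR remains an assumption rather than something the data can confirm.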