A consumer-loyalty survey records each customer's self-reported monthly spend. Exploratory analysis shows that the probability a value is missing is not explained by the observed predictors: even after controlling for all of them, it rises with the customer's true monthly spend (observed for a hold-out audit sample) and is highest when that spend is very large. You need unbiased regression coefficients that use the spend variable.
Which description of the missingness mechanism and corresponding modelling strategy is most appropriate in this situation?
Missing values are structural; replace them with zero to retain the full sample.
The data are missing completely at random; simply drop cases with missing spend values.
The data are not missing at random; use a selection or pattern-mixture model that explicitly links spend and the missing-data indicator.
The data are missing at random; multiple imputation based only on observed demographics is sufficient.
Because the chance that a response is missing depends on the unreported spend itself (a value that is unobserved for the non-respondents), the data are not missing at random (MNAR). Under MNAR the missing-data mechanism is non-ignorable, so it must be modelled jointly with the outcome. Selection models, pattern-mixture models, and similar joint-likelihood approaches build this dependency into the model and can deliver unbiased estimates, provided their assumptions about the missingness mechanism are correct. Treating the data as MCAR (listwise deletion) or MAR (standard multiple imputation based only on observed covariates) leaves residual bias, and filling the blanks with zero assumes a structural absence that does not exist in the scenario.
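To see why the choice of mechanism matters, here is a minimal NumPy simulation under assumed parameters: a single observed predictor `x`, a linear spend model with true slope 20, a logistic non-response probability that rises with spend (MNAR), and a 10% random audit sample that reveals true spend. The helpers `simulate`, `ols`, and `compare_estimates` are hypothetical names for illustration. The sketch compares complete-case analysis, MAR-style regression imputation from `x`, and a simple pattern-mixture imputation that models spend for the non-respondent pattern using only the audited non-respondents.

```python
import numpy as np


def ols(x, y):
    """Return [intercept, slope] of a least-squares fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n=200_000, audit_rate=0.10, seed=0):
    """Hypothetical survey with true model spend = 100 + 20*x + noise.

    Non-response depends on spend itself (MNAR); a random audit sample
    reveals the true spend of some customers, including non-respondents.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # assumed observed predictor, e.g. a demographic score
    spend = 100 + 20 * x + rng.normal(scale=30, size=n)
    p_missing = 1 / (1 + np.exp(-(spend - 130) / 10))  # rises with true spend
    missing = rng.random(n) < p_missing
    audited = rng.random(n) < audit_rate
    return x, spend, missing, audited


def compare_estimates(x, spend, missing, audited):
    """Slope of spend on x under three ways of handling the missing values."""
    reported = np.where(missing, np.nan, spend)  # what the survey actually sees
    obs = ~missing

    # 1. Complete-case analysis (listwise deletion; valid only under MCAR).
    cc = ols(x[obs], reported[obs])

    # 2. MAR-style regression imputation: fill blanks with predictions
    #    from the respondents' model of spend on x.
    filled_mar = reported.copy()
    filled_mar[missing] = cc[0] + cc[1] * x[missing]
    mar = ols(x, filled_mar)

    # 3. Pattern-mixture imputation: fit a separate model for the
    #    non-respondent pattern using audited non-respondents only.
    nr_audit = missing & audited
    nr = ols(x[nr_audit], spend[nr_audit])
    filled_pm = reported.copy()
    filled_pm[nr_audit] = spend[nr_audit]  # audit reveals these true values
    rest = missing & ~audited
    filled_pm[rest] = nr[0] + nr[1] * x[rest]
    pm = ols(x, filled_pm)

    return {"complete_case": cc[1], "mar_imputation": mar[1], "pattern_mixture": pm[1]}


if __name__ == "__main__":
    x, spend, missing, audited = simulate()
    print(f"fraction missing: {missing.mean():.2f}")
    print(f"slope with full data: {ols(x, spend)[1]:.2f}")
    for name, slope in compare_estimates(x, spend, missing, audited).items():
        print(f"{name}: {slope:.2f}")
```

With these assumed settings the complete-case and MAR-imputation slopes should fall well below 20. In fact the MAR regression imputation reproduces the complete-case slope exactly, because every filled-in point lies on the respondents' fitted line. The pattern-mixture slope should be close to 20. That recovery relies on the audit sample identifying the non-respondent distribution; without such data, the pattern-specific model rests on untestable assumptions and is usually explored as a sensitivity analysis. A real analysis would also draw imputations with residual noise across multiple data sets, so that standard errors are not understated.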