A data scientist is analyzing a large, multi-site clinical trial dataset. During exploratory data analysis, it's discovered that a number of entries for the 'Resting Heart Rate' variable are missing. Which of the following scenarios provides the strongest evidence that the data for 'Resting Heart Rate' is Missing Completely At Random (MCAR)?
A single data collection device at a high-volume urban clinic was found to be improperly calibrated. All readings from this device were flagged as invalid and subsequently removed during the data cleaning phase.
Due to a software bug, the data collection application failed to save the 'Resting Heart Rate' entry for approximately 5% of patients. The failures occurred unpredictably across all clinical sites and demographic groups.
The study protocol allowed clinicians to skip the 'Resting Heart Rate' measurement for patients whose blood pressure was within a normal range, as it was deemed less critical. Blood pressure data is fully recorded for all patients.
Patients who reported experiencing palpitations, a condition often correlated with high resting heart rates, were more likely to have their measurement postponed by clinicians, leading to missing entries.
Data is considered Missing Completely At Random (MCAR) when the probability of a value being missing is entirely independent of both the observed data and the unobserved (missing) data. In other words, the cause of the missingness is a purely random event.
The correct scenario describes a random software glitch causing data loss unpredictably across all sites and patient groups. This is a classic example of an external, random event that is not correlated with any patient characteristics (observed variables) or the resting heart rate values themselves (unobserved data), fitting the definition of MCAR perfectly.
The scenario where missingness is linked to patient blood pressure is an example of Missing At Random (MAR), because the probability of data being missing depends on another observed variable (blood pressure).
The scenario where patients with palpitations (often linked to high heart rates) have more missing values is an example of Missing Not At Random (MNAR). Here, the missingness is related to the value of the 'Resting Heart Rate' variable itself, even though that value is unobserved.
The scenario involving an improperly calibrated device at a single clinic results in data that is likely MAR, not MCAR. The missingness is systematic to one clinic, so the probability of being missing is dependent on the 'clinic' variable. This is not a random process across the entire dataset.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What distinguishes MCAR (Missing Completely At Random) from MAR (Missing At Random)?
Open an interactive chat with Bash
How does missingness classified as MNAR (Missing Not At Random) differ from MCAR or MAR?
Open an interactive chat with Bash
Why is the scenario involving a software bug considered a classic case of MCAR?