A manufacturing company deployed a gradient-boosted model to predict bearing failures from streaming sensor data. Two weeks after a firmware update changed the calibration of the vibration sensors, the model's precision fell from 0.82 to 0.55, even though the proportion of actual failures in the field remained at 3.4%. Subsequent analysis shows that the mean and variance of several vibration-related features have shifted by more than two standard deviations relative to the training data, but the conditional relationship between those features and the failure label appears unchanged. Which phenomenon is the most likely root cause of the model's performance degradation?
Concept drift because the physical mechanism of bearing failure has evolved
Data drift (covariate shift) caused by the firmware-induced change in input feature distributions
Model over-fitting resulting from excessively high variance during initial training
Data leakage introduced by inadvertently training on target-related features
The firmware update altered the statistical distribution of several input features while the underlying mapping from features to the failure label stayed the same, which is the textbook definition of data drift. Data drift (also called feature drift or covariate shift) occurs when P(X) changes but P(Y|X) remains stationary; the resulting mismatch between the training and production input distributions degrades predictive power even though the concept being modeled has not changed. Concept drift would require the relationship P(Y|X) itself to change, which the analysis rules out. Data leakage involves improperly including future or target-related variables during training, and classic over-fitting or under-fitting problems originate in the training process rather than in a post-deployment shift in feature statistics, so neither explains a drop that begins only after the firmware update.
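To make the "shifted by more than two standard deviations" check concrete, here is a minimal sketch of how such a shift in P(X) might be flagged. It uses only the Python standard library and synthetic data; the feature names (`vibration_rms`, `temperature_c`), the helper functions, and the threshold of 2.0 are illustrative assumptions, not details from the scenario. Note that a check like this only detects a change in the input distribution; confirming that P(Y|X) is unchanged would additionally require labeled production data.

```python
import random
import statistics


def standardized_shift(train_values, prod_values):
    """Return (mean shift in training-std units, production/training variance ratio)."""
    train_mean = statistics.fmean(train_values)
    train_std = statistics.stdev(train_values)
    mean_shift = (statistics.fmean(prod_values) - train_mean) / train_std
    variance_ratio = statistics.variance(prod_values) / statistics.variance(train_values)
    return mean_shift, variance_ratio


def drifted_features(train, prod, threshold=2.0):
    """List features whose production mean moved more than `threshold` training stds."""
    flagged = []
    for name in train:
        mean_shift, _ = standardized_shift(train[name], prod[name])
        if abs(mean_shift) > threshold:
            flagged.append(name)
    return flagged


# Synthetic stand-ins for training and post-firmware production data (assumed values).
rng = random.Random(0)
train = {
    "vibration_rms": [rng.gauss(1.0, 0.2) for _ in range(2000)],
    "temperature_c": [rng.gauss(60.0, 5.0) for _ in range(2000)],
}
prod = {
    # Recalibrated sensor: mean moves about three training stds, spread widens.
    "vibration_rms": [rng.gauss(1.6, 0.3) for _ in range(2000)],
    # Unaffected sensor: same distribution as in training.
    "temperature_c": [rng.gauss(60.0, 5.0) for _ in range(2000)],
}

print(drifted_features(train, prod))
```

In this synthetic setup only the recalibrated vibration feature is flagged. In practice a distribution-level test (for example a two-sample Kolmogorov-Smirnov test or the population stability index) is often preferred, since a mean comparison alone can miss changes in shape.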