A data science team is developing a model to predict customer churn for a subscription service. The historical data reveals a significant class imbalance, with only 5% of customers churning. The team's new deep learning model achieves 95.5% accuracy on the test set. To justify the model's value and demonstrate it has learned a meaningful pattern, which of the following is the most critical benchmark to establish?
Benchmark the deep learning model against a Gradient Boosting Machine (GBM) to evaluate relative performance.
Perform extensive hyperparameter tuning to increase the model's accuracy above 96%.
Compare the model's performance against a baseline that always predicts the majority class ('no churn').
Recalculate the model's performance using F1-score and Area Under the ROC Curve (AUC).
The correct answer is to compare the model against a baseline that always predicts the majority class. In a dataset with a 95%/5% class split, a simple model that always predicts 'no churn' achieves 95% accuracy; this naive strategy is known as the Zero-Rule (ZeroR) baseline. The deep learning model's 95.5% accuracy is only a marginal improvement over it, suggesting the model may have very little practical predictive power. Establishing this benchmark is the most critical first step in determining whether the complex model provides any real value, before investing further resources in tuning or in comparisons with other complex models.

While calculating F1-score and AUC is important for imbalanced data, these metrics are most insightful when they are also compared against the baseline's performance. Benchmarking against another complex model such as a GBM is a subsequent model-selection step, not the initial justification of value over a naive strategy. Hyperparameter tuning is premature if the model has not yet proven its worth against the simplest possible approach.
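The comparison described above can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration (the 95.5% model accuracy is taken from the question; the simulated labels simply mirror the stated 95%/5% split):

```python
# Hypothetical illustration: comparing a model's accuracy against the
# ZeroR (majority-class) baseline on an imbalanced churn dataset.

def zero_rule_accuracy(labels):
    """Accuracy of a baseline that always predicts the most common label."""
    majority = max(set(labels), key=labels.count)
    return sum(1 for y in labels if y == majority) / len(labels)

# Simulated test set mirroring the question's class split:
# 5% churners (1), 95% non-churners (0).
y_test = [1] * 50 + [0] * 950

baseline_acc = zero_rule_accuracy(y_test)   # 0.95 for this split
model_acc = 0.955                           # reported deep learning accuracy

print(f"ZeroR baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy:          {model_acc:.3f}")
print(f"Lift over baseline:      {model_acc - baseline_acc:+.3f}")
```

The tiny lift over the baseline (+0.005) is what makes the 95.5% headline figure unconvincing on its own. Libraries such as scikit-learn provide the same idea out of the box via `DummyClassifier(strategy="most_frequent")`.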