A data scientist develops a classification model to identify fraudulent financial transactions. The test dataset contains 1,000,000 transactions, of which 1,000 (0.1%) are fraudulent. After testing, the model produces the following confusion matrix:
|                   | Predicted: Fraud | Predicted: Not Fraud |
|-------------------|------------------|----------------------|
| Actual: Fraud     | 800 (TP)         | 200 (FN)             |
| Actual: Not Fraud | 500 (FP)         | 998,500 (TN)         |
The primary business objective is to minimize the number of missed fraudulent transactions (False Negatives), even at the cost of flagging some legitimate transactions for review (False Positives). Given this objective and the severe class imbalance, which performance metric provides the most relevant assessment of the model's effectiveness for its intended purpose?
Recall (also called Sensitivity or the True Positive Rate) is the most relevant metric here. It is calculated as TP / (TP + FN) and measures the proportion of actual positive cases that the model correctly identified. In this scenario, Recall = 800 / (800 + 200) = 80%. This metric directly addresses the business objective of minimizing missed fraudulent transactions (False Negatives): a high recall indicates that the model catches the vast majority of actual fraud cases.
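As a quick sanity check, a minimal Python sketch (using the counts taken directly from the confusion matrix above) reproduces that figure:

```python
# Counts from the confusion matrix above
TP, FN = 800, 200

# Recall = TP / (TP + FN): the share of actual fraud cases the model caught
recall = TP / (TP + FN)
print(f"Recall: {recall:.1%}")  # 80.0%
```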
Accuracy is incorrect because it is a misleading metric for datasets with severe class imbalance. It is calculated as (TP + TN) / Total, which in this case is (800 + 998,500) / 1,000,000 = 99.93%. While this number seems very high, a naive model that predicts "Not Fraud" for every transaction would achieve 99.9% accuracy, making it a poor indicator of the model's ability to detect the rare positive class.
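The point is easy to see by comparing the model's accuracy with that of a hypothetical baseline that labels every transaction "Not Fraud" (a sketch, not part of the original scenario):

```python
# Counts from the confusion matrix above
TP, FN, FP, TN = 800, 200, 500, 998_500
total = TP + FN + FP + TN

# Accuracy of the trained model
model_accuracy = (TP + TN) / total        # 99.93%

# A naive baseline that predicts "Not Fraud" for everything gets every
# legitimate transaction right and misses every fraudulent one.
baseline_accuracy = (TN + FP) / total     # 99.90%

print(f"Model accuracy:    {model_accuracy:.2%}")
print(f"Baseline accuracy: {baseline_accuracy:.2%}")
```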
Precision is incorrect in this context. Precision is calculated as TP / (TP + FP) and measures the proportion of positive predictions that were actually correct. Here, Precision = 800 / (800 + 500) = 61.5%. This metric is important when the cost of a False Positive is high. However, the business objective explicitly prioritizes minimizing False Negatives over False Positives, making Recall the more relevant metric.
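For completeness, the same style of sketch reproduces the precision figure quoted above:

```python
# Counts from the confusion matrix above
TP, FP = 800, 500

# Precision = TP / (TP + FP): the share of fraud alerts that were real fraud
precision = TP / (TP + FP)
print(f"Precision: {precision:.1%}")  # ~61.5%
```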
Matthews Correlation Coefficient (MCC) is a sophisticated and generally robust metric for imbalanced datasets because it considers all four cells of the confusion matrix. However, the question asks for the metric that is most relevant to the specific business objective of minimizing False Negatives. While MCC provides a balanced, overall score, Recall is the most direct and explicit measure of the model's performance against that particular goal.
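For reference, MCC can be computed directly from the four confusion-matrix counts using the standard formula MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the value below is not given in the question and is shown only as an illustration:

```python
from math import sqrt

# Counts from the confusion matrix above
TP, FN, FP, TN = 800, 200, 500, 998_500

numerator = TP * TN - FP * FN
denominator = sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
mcc = numerator / denominator

print(f"MCC: {mcc:.2f}")  # roughly 0.70 for these counts
```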