A data science team is tasked with selecting a model for real-time fraud detection. The key business requirement is to identify the maximum number of fraudulent transactions to minimize financial losses, while accepting that this may lead to a higher number of legitimate transactions being flagged for manual review. The test dataset is highly imbalanced.
The final performance measures for the three candidate models are:
Model X: Accuracy: 99.85%, Precision: 0.98, Recall: 0.80, F1-Score: 0.88
Model Y: Accuracy: 99.70%, Precision: 0.82, Recall: 0.95, F1-Score: 0.88
Model Z: Accuracy: 99.90%, Precision: 0.99, Recall: 0.78, F1-Score: 0.87
Given the business requirement, which model represents the best choice based on these final performance measures?
Model Y, because its superior Recall score directly aligns with the primary business objective of catching the most fraudulent transactions.
Model Z, because its overall accuracy is the highest, indicating the best performance on the entire dataset.
Model X, because its F1-Score is tied for the highest, and its superior Precision will reduce the workload on the manual review team.
Either Model X or Model Y, as their identical F1-Scores indicate equivalent overall performance, and the choice depends on further business clarification.
The correct answer is the model with the highest Recall. Recall, also known as sensitivity or the true positive rate, is the fraction of actual positive instances that the model identifies: true positives divided by the sum of true positives and false negatives. In this fraud detection scenario, a 'positive' is a fraudulent transaction. The business objective is to minimize financial losses by catching as many fraudulent transactions as possible, which means minimizing false negatives (fraudulent transactions classified as legitimate). Therefore, maximizing Recall is the primary goal.
Model Y has the highest Recall (0.95), meaning it correctly identifies 95% of all actual fraudulent transactions. This directly aligns with the stated business requirement.
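As a rough illustration of what the Recall gap means in practice, the sketch below converts each model's Recall into an expected number of missed fraudulent transactions. The Recall values come from the question; the figure of 1,000 fraudulent transactions and the helper name `missed_fraud` are assumptions made only for this example.

```python
# Recall values are taken from the question.
recalls = {"Model X": 0.80, "Model Y": 0.95, "Model Z": 0.78}

# Hypothetical number of fraudulent transactions in the test set (assumption).
actual_fraud = 1_000


def missed_fraud(recall: float, positives: int) -> int:
    """Expected false negatives: fraudulent transactions the model fails to flag."""
    return round((1 - recall) * positives)


missed = {name: missed_fraud(r, actual_fraud) for name, r in recalls.items()}
best = min(missed, key=missed.get)  # model with the fewest missed fraud cases

for name, count in missed.items():
    print(f"{name}: about {count} missed out of {actual_fraud}")
print("Fewest missed:", best)
```

Under this assumed volume, Model Y misses roughly a quarter as many fraudulent transactions as Model X, which is the quantity the business requirement asks to minimize.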
Model X has a higher Precision (0.98) than Model Y (0.82), meaning that when it flags a transaction as fraudulent, it is highly likely to be correct. However, its lower Recall (0.80) means it misses more fraudulent transactions than Model Y, which contradicts the primary business goal.
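Model Z has the highest accuracy, but accuracy is a misleading metric on highly imbalanced datasets. A model that predicts 'not fraud' for every transaction would achieve an accuracy equal to the share of legitimate transactions in the data, which is very high here, while catching no fraud at all (a Recall of 0).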
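The accuracy problem can be sketched with a trivial classifier that labels every transaction as legitimate. The dataset size and the 0.2% fraud rate below are hypothetical figures chosen for illustration; the question does not state the actual class balance.

```python
# Hypothetical test set (assumption): 100,000 transactions, 0.2% fraudulent.
total = 100_000
fraud = 200
legitimate = total - fraud

# Confusion-matrix counts for a classifier that always predicts "not fraud".
tp = 0            # no fraud is ever flagged
fn = fraud        # every fraudulent transaction is missed
tn = legitimate   # every legitimate transaction is passed
fp = 0            # nothing is flagged, so there are no false alarms

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2%}")
print(f"Recall:   {recall:.2%}")
```

With these assumed numbers the do-nothing classifier reaches 99.80% accuracy while detecting none of the fraud, so differences in accuracy of a tenth of a percentage point between the candidate models say little about how much fraud each one catches.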
Model X and Model Y have identical F1-Scores, but that does not make them equivalent choices. The F1-Score is the harmonic mean of Precision and Recall and weights the two equally, so it cannot show which of the two a model favors. When there is a specific business priority, such as minimizing false negatives, the metric that directly measures that priority (Recall) should be the deciding factor.
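The following minimal sketch recomputes the F1-Score from the Precision and Recall values given in the question, to show how two models with different trade-offs can land on the same rounded F1. The function name `f1_score` is an illustrative helper defined here, not a reference to any particular library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the question.
models = {
    "Model X": (0.98, 0.80),
    "Model Y": (0.82, 0.95),
    "Model Z": (0.99, 0.78),
}

f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in models.items()}

for name, (p, r) in models.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1[name]:.2f}")
```

Models X and Y both round to an F1 of 0.88 even though their Recall values differ by 0.15, which is why the F1-Score alone cannot settle the choice when the requirement explicitly prioritizes Recall.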