A data scientist develops a classification model to identify fraudulent financial transactions. The test dataset contains 1,000,000 transactions, of which 1,000 (0.1%) are fraudulent. After testing, the model produces the following confusion matrix:
|                   | Predicted: Fraud | Predicted: Not Fraud |
|-------------------|------------------|----------------------|
| Actual: Fraud     | 800 (TP)         | 200 (FN)             |
| Actual: Not Fraud | 500 (FP)         | 998,500 (TN)         |
The primary business objective is to minimize the number of missed fraudulent transactions (False Negatives), even at the cost of flagging some legitimate transactions for review (False Positives). Given this objective and the severe class imbalance, which performance metric provides the most relevant assessment of the model's effectiveness for its intended purpose?
Recall (also called Sensitivity or the True Positive Rate) is the most relevant metric here. It is calculated as TP / (TP + FN) and measures the proportion of actual positive cases that the model correctly identified. In this scenario, Recall = 800 / (800 + 200) = 80%. This metric directly addresses the business objective of minimizing missed fraudulent transactions (False Negatives): a high recall indicates that the model catches the vast majority of actual fraud cases.
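As a quick sanity check, a minimal Python sketch (using the counts taken directly from the confusion matrix above) reproduces that figure:

```python
# Counts from the confusion matrix above
TP, FN = 800, 200

# Recall = TP / (TP + FN): the share of actual fraud cases the model caught
recall = TP / (TP + FN)
print(f"Recall: {recall:.1%}")  # 80.0%
```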
Accuracy is incorrect because it is a misleading metric for datasets with severe class imbalance. It is calculated as (TP + TN) / Total, which in this case is (800 + 998,500) / 1,000,000 = 99.93%. While this number seems very high, a naive model that predicts "Not Fraud" for every transaction would achieve 99.9% accuracy, making it a poor indicator of the model's ability to detect the rare positive class.
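The point is easy to see by comparing the model's accuracy with that of a hypothetical baseline that labels every transaction "Not Fraud" (a sketch, not part of the original scenario):

```python
# Counts from the confusion matrix above
TP, FN, FP, TN = 800, 200, 500, 998_500
total = TP + FN + FP + TN

# Accuracy of the trained model
model_accuracy = (TP + TN) / total        # 99.93%

# A naive baseline that predicts "Not Fraud" for everything gets every
# legitimate transaction right and misses every fraudulent one.
baseline_accuracy = (TN + FP) / total     # 99.90%

print(f"Model accuracy:    {model_accuracy:.2%}")
print(f"Baseline accuracy: {baseline_accuracy:.2%}")
```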
Precision is incorrect in this context. Precision is calculated as TP / (TP + FP) and measures the proportion of positive predictions that were actually correct. Here, Precision = 800 / (800 + 500) = 61.5%. This metric is important when the cost of a False Positive is high. However, the business objective explicitly prioritizes minimizing False Negatives over False Positives, making Recall the more relevant metric.
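For completeness, the same style of sketch reproduces the precision figure quoted above:

```python
# Counts from the confusion matrix above
TP, FP = 800, 500

# Precision = TP / (TP + FP): the share of fraud alerts that were real fraud
precision = TP / (TP + FP)
print(f"Precision: {precision:.1%}")  # ~61.5%
```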
Matthews Correlation Coefficient (MCC) is a sophisticated and generally robust metric for imbalanced datasets because it considers all four cells of the confusion matrix. However, the question asks for the metric that is most relevant to the specific business objective of minimizing False Negatives. While MCC provides a balanced, overall score, Recall is the most direct and explicit measure of the model's performance against that particular goal.
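For reference, MCC can be computed directly from the four confusion-matrix counts using the standard formula MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the value below is not given in the question and is shown only as an illustration:

```python
from math import sqrt

# Counts from the confusion matrix above
TP, FN, FP, TN = 800, 200, 500, 998_500

numerator = TP * TN - FP * FN
denominator = sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
mcc = numerator / denominator

print(f"MCC: {mcc:.2f}")  # roughly 0.70 for these counts
```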