A data science team has developed a binary classification model to predict fraudulent financial transactions. The historical dataset is severely imbalanced, with fraudulent transactions (the positive class) accounting for only 0.1% of all records. The initial model reports an accuracy of 99.9%. The lead data scientist is concerned this metric is misleading and could mask poor performance in identifying actual fraud.
Which of the following metrics would provide the most reliable and balanced evaluation of this classifier's performance, given the severe class imbalance?
The correct answer is the Matthews Correlation Coefficient (MCC). MCC is widely regarded as a reliable and balanced performance metric for binary classification, especially under severe class imbalance. It produces a high score only when the classifier performs well across all four cells of the confusion matrix (True Positives, True Negatives, False Positives, and False Negatives), providing a comprehensive view of the model's performance.
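To make this concrete, here is a minimal sketch using scikit-learn's matthews_corrcoef. The transaction counts and confusion-matrix cells below are made up purely for illustration; MCC itself is defined as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which is why it only rewards models that do well in every cell.

```python
from sklearn.metrics import matthews_corrcoef
import numpy as np

# Hypothetical evaluation set with a 0.1% fraud rate (10 frauds in 10,000 transactions).
y_true = np.array([0] * 9990 + [1] * 10)

# Hypothetical predictions: 5 false positives and 9,985 true negatives on the
# legitimate transactions, then 6 true positives and 4 false negatives on the frauds.
y_pred = np.concatenate([
    np.array([1] * 5 + [0] * 9985),
    np.array([1] * 6 + [0] * 4),
])

# MCC combines TP, TN, FP, and FN into a single score between -1 and +1.
print(matthews_corrcoef(y_true, y_pred))  # ~0.57 for these made-up counts
```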
Accuracy is incorrect because it is highly misleading in imbalanced scenarios. A model that simply predicts the majority class (non-fraudulent) for every transaction would achieve 99.9% accuracy but would be useless as it would fail to identify any fraudulent cases. This is known as the accuracy paradox.
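A quick sketch of the accuracy paradox, again with made-up counts matching the 0.1% fraud rate: a baseline that always predicts "not fraud" scores 99.9% accuracy while catching nothing.

```python
from sklearn.metrics import accuracy_score, recall_score, matthews_corrcoef
import numpy as np

# Hypothetical test set: 10 frauds among 10,000 transactions (0.1%).
y_true = np.array([0] * 9990 + [1] * 10)

# A degenerate "model" that predicts the majority (non-fraudulent) class every time.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))     # 0.999 -- looks excellent
print(recall_score(y_true, y_pred))       # 0.0   -- not a single fraud is caught
print(matthews_corrcoef(y_true, y_pred))  # 0.0   -- MCC exposes the failure
```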
F1 Score is a better choice than accuracy but is not the most reliable option in this context. The F1 score is the harmonic mean of precision and recall and focuses on the positive class. However, it does not include True Negatives in its calculation. In a scenario like fraud detection, correctly identifying non-fraudulent transactions (True Negatives) is also critically important, and their exclusion from the F1 score makes it less balanced than MCC, as the sketch below illustrates.
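The sketch below uses hypothetical confusion-matrix counts and a hypothetical helper, build_labels, to show the point: changing only the number of True Negatives leaves the F1 score untouched, while MCC responds.

```python
from sklearn.metrics import f1_score, matthews_corrcoef
import numpy as np

def build_labels(tn, fp, fn, tp):
    """Expand hypothetical confusion-matrix counts into label arrays."""
    y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
    y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)
    return y_true, y_pred

# Identical TP, FP, and FN; only the number of True Negatives changes.
for tn in (100, 100_000):
    y_true, y_pred = build_labels(tn=tn, fp=50, fn=5, tp=5)
    print(tn, round(f1_score(y_true, y_pred), 3), round(matthews_corrcoef(y_true, y_pred), 3))

# F1 stays at ~0.154 in both runs because True Negatives never enter its formula;
# MCC rises from ~0.08 to ~0.21 as performance on the negative class improves.
```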
Area Under the ROC Curve (AUC) is also a common metric for imbalanced data, as it evaluates a model's ability to discriminate between classes across all classification thresholds. However, some research suggests it can be overly optimistic on imbalanced datasets, and it primarily measures the ranking quality rather than the quality of predictions at a specific threshold. MCC provides a single, balanced score reflecting the quality of the confusion matrix itself, making it a more direct and reliable measure for this specific evaluation task.
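As a rough sketch of the distinction (using synthetic scores generated only for illustration): ROC AUC summarizes how well fraud scores rank above legitimate scores across all thresholds, while MCC evaluates the hard yes/no decisions made at one chosen operating point (an arbitrary 0.5 threshold here).

```python
from sklearn.metrics import roc_auc_score, matthews_corrcoef
import numpy as np

# Synthetic fraud scores: frauds tend to score higher, with some overlap.
rng = np.random.default_rng(0)
y_true = np.array([0] * 9990 + [1] * 10)
scores = np.concatenate([rng.beta(2, 8, size=9990), rng.beta(5, 3, size=10)])

# AUC: threshold-free ranking quality across the whole score range.
print(roc_auc_score(y_true, scores))

# MCC: quality of the actual classifications at a specific threshold.
y_pred = (scores >= 0.5).astype(int)
print(matthews_corrcoef(y_true, y_pred))
```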