A data scientist is building a fraud detection model for a financial institution. The historical transaction dataset is highly imbalanced, with fraudulent transactions (the minority class) accounting for only 0.5% of the data. A baseline model trained on this data shows high accuracy but has an extremely low recall for the fraud class. The scientist needs to apply a mitigation technique to rebalance the training data. Which of the following approaches best addresses the class imbalance by creating new, varied examples for the minority class, thereby reducing the specific risk of overfitting that arises from simple duplication?
Synthetic Minority Oversampling Technique (SMOTE)
Applying L2 regularization to the baseline model
Randomly undersampling the non-fraudulent (majority) class
Randomly oversampling the fraudulent (minority) class by duplication
The correct answer is the Synthetic Minority Oversampling Technique (SMOTE). SMOTE creates new, synthetic data points for the minority class: it selects a minority-class instance, finds its k nearest minority-class neighbors, and generates synthetic instances by interpolating between the selected instance and those neighbors. This is superior to simple oversampling because it produces new, plausible examples rather than duplicates of existing ones, which helps the model generalize and reduces the risk of overfitting.
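As a concrete illustration, the sketch below applies SMOTE using the imbalanced-learn package (imblearn). The dataset is simulated with scikit-learn's make_classification; the sample size, seed, and k_neighbors value are illustrative assumptions, not part of the question.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Simulate a fraud-like dataset: roughly 0.5% fraudulent (class 1).
X, y = make_classification(
    n_samples=20_000, n_features=10,
    weights=[0.995, 0.005], random_state=42,
)
print("Before:", Counter(y))

# k_neighbors is the number of nearest minority-class neighbors
# used when interpolating each new synthetic point.
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))  # minority count now matches the majority
```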
Randomly undersampling the majority class is incorrect because it discards a large number of potentially informative majority-class samples, which can lose information and bias the model.
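For contrast, here is a minimal sketch of random undersampling with imblearn's RandomUnderSampler, under the same illustrative setup as above; note how far the training set shrinks.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(
    n_samples=20_000, n_features=10,
    weights=[0.995, 0.005], random_state=42,
)
# Drop majority-class rows until the classes match in size.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
print(Counter(y_under))  # both classes reduced to the minority count
```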
Randomly oversampling the minority class is less effective because it simply duplicates existing minority-class samples. Since no new information is added, the model can overfit, memorizing the duplicated examples instead of learning the underlying patterns of fraud.
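The duplication is easy to verify. The sketch below uses imblearn's RandomOverSampler (the same tooling assumption as above) and counts the distinct minority rows after resampling.

```python
import numpy as np
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

X, y = make_classification(
    n_samples=20_000, n_features=10,
    weights=[0.995, 0.005], random_state=42,
)
X_dup, y_dup = RandomOverSampler(random_state=42).fit_resample(X, y)
print(Counter(y_dup))

# Every added minority row is an exact copy of an original one,
# so the number of distinct minority rows does not grow.
print("unique minority rows:", np.unique(X_dup[y_dup == 1], axis=0).shape[0])
```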
Applying L2 regularization is incorrect in this context. Regularization combats overfitting by penalizing large model coefficients, but it does nothing to rebalance the training data: the minority class remains just as underrepresented.
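For completeness, here is what L2 regularization looks like in scikit-learn's LogisticRegression; the parameter values and the X_train/y_train names are illustrative placeholders. The penalty shapes the coefficients, not the class distribution.

```python
from sklearn.linear_model import LogisticRegression

# L2 regularization adds a penalty proportional to the squared
# coefficient magnitudes (strength controlled by C, the inverse
# regularization weight). It constrains the model, not the data.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
# clf.fit(X_train, y_train)  # would still see ~200 legitimate
#                            # transactions per fraudulent one
```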