CompTIA DataX DY0-001 (V1) Practice Question

You are tuning a logistic-regression fraud detector trained on 455,000 legitimate and 5,000 fraudulent transactions (≈1% positives). A baseline model built on the imbalanced data yields an average F1 of 0.12 under stratified 5-fold cross-validation (CV). You then apply random oversampling so that each training split is 50/50 positive-to-negative, keeping the validation folds untouched. After retraining, you observe:

  • Training-set F1: 0.93
  • Cross-validated F1: 0.10

Which explanation best accounts for the drop in CV performance despite the much higher training score?
  • Duplicating the same minority transactions through random oversampling caused the model to overfit to those repeats, inflating training F1 but hurting generalization.

  • Oversampling only shifts the decision threshold without affecting learned parameters; the lower CV F1 is expected until you retune the threshold.

  • Oversampling should always lower variance, so the CV drop indicates target leakage between your folds rather than any overfitting problem.

  • The oversampler injected label noise that increases model bias; therefore training F1 should have fallen, so the discrepancy must come from a metric-calculation error.
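The scenario above can be reproduced in a small sketch. Assuming a synthetic stand-in for the transaction data (built with `make_classification`, not the real dataset), the code below oversamples only the training portion of each fold by duplicating minority rows, leaves the validation folds untouched, and compares training F1 against cross-validated F1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical synthetic data with ~1% positives, mimicking the fraud ratio.
X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01],
                           n_informative=5, random_state=0)

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until classes are 50/50."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([neg, pos, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
train_f1, cv_f1 = [], []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Oversample ONLY the training split; the validation fold keeps its true ratio.
    X_bal, y_bal = random_oversample(X[tr], y[tr], rng)
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    train_f1.append(f1_score(y_bal, clf.predict(X_bal)))
    cv_f1.append(f1_score(y[va], clf.predict(X[va])))

print(f"mean training F1: {np.mean(train_f1):.2f}")
print(f"mean CV F1:       {np.mean(cv_f1):.2f}")
```

Because the model repeatedly sees exact copies of the same minority rows, training F1 on the balanced set is flattered, while F1 on the untouched, heavily imbalanced validation folds stays much lower.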
