CompTIA DataX DY0-001 (V1) Practice Question

You are tuning a logistic-regression fraud detector trained on 455,000 legitimate and 5,000 fraudulent transactions (≈1% positives). A baseline model built on the imbalanced data yields an average F1 of 0.12 under stratified 5-fold cross-validation (CV). You then apply random oversampling so that each training split is 50/50 positive-to-negative, keeping the validation folds untouched. After retraining, you observe:

  • Training-set F1: 0.93
  • Cross-validated F1: 0.10

Which explanation best accounts for the drop in CV performance despite the much higher training score?
  • Duplicating the same minority transactions through random oversampling caused the model to overfit to those repeats, inflating training F1 but hurting generalization.

  • Oversampling only shifts the decision threshold without affecting learned parameters; the lower CV F1 is expected until you retune the threshold.

  • Oversampling should always lower variance, so the CV drop indicates target leakage between your folds rather than any overfitting problem.

  • The oversampler injected label noise that increases model bias; therefore training F1 should have fallen, so the discrepancy must come from a metric-calculation error.
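The scenario above can be reproduced in a small sketch. Assuming a synthetic stand-in for the transaction data (built with `make_classification`, not the real dataset), the code below oversamples only the training portion of each fold by duplicating minority rows, leaves the validation folds untouched, and compares training F1 against cross-validated F1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical synthetic data with ~1% positives, mimicking the fraud ratio.
X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01],
                           n_informative=5, random_state=0)

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until classes are 50/50."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([neg, pos, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
train_f1, cv_f1 = [], []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Oversample ONLY the training split; the validation fold keeps its true ratio.
    X_bal, y_bal = random_oversample(X[tr], y[tr], rng)
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    train_f1.append(f1_score(y_bal, clf.predict(X_bal)))
    cv_f1.append(f1_score(y[va], clf.predict(X[va])))

print(f"mean training F1: {np.mean(train_f1):.2f}")
print(f"mean CV F1:       {np.mean(cv_f1):.2f}")
```

Because the model repeatedly sees exact copies of the same minority rows, training F1 on the balanced set is flattered, while F1 on the untouched, heavily imbalanced validation folds stays much lower.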
