CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is working with a binary fraud-detection dataset that contains 1,000,000 observations, of which only 0.2% are labeled as fraud. The model of choice is a gradient-boosted decision tree. The scientist plans to mitigate the extreme class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) and to assess performance with 5-fold stratified cross-validation before evaluating on a separate, untouched test set whose class distribution mirrors production.

Which oversampling procedure is most appropriate in this scenario, strengthening the minority class without introducing optimistic validation bias or excessive overfitting?

  • Run SMOTE on the entire dataset first so that synthetic minority records are present in every cross-validation fold.

  • Inside each cross-validation fold, apply SMOTE solely to the training partition, then train the model on that augmented data and validate on the untouched fold hold-out.

  • Build an ensemble that draws bootstrap samples from the majority class only, keeping each minority instance exactly once in every bootstrap replica.

  • Before cross-validation, duplicate every minority-class record 499 times to obtain a perfectly balanced 1:1 class ratio, then train and validate on this expanded dataset.
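
Of the options above, applying SMOTE only to each fold's training partition (the second option) is the standard way to keep synthetic minority points out of the validation data. A minimal sketch of that workflow, assuming the scikit-learn and imbalanced-learn libraries; the synthetic dataset, model choice, and parameter values below are illustrative stand-ins, not part of the question:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Small synthetic stand-in for the fraud data: 20,000 rows, ~1% positive.
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=42
)

# imblearn's Pipeline applies resampling steps such as SMOTE only while
# fitting on each fold's training partition; the held-out fold passes
# through unchanged, so validation scores are not optimistically biased.
pipeline = Pipeline(steps=[
    ("smote", SMOTE(random_state=42)),
    ("gbdt", HistGradientBoostingClassifier(random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Average precision (area under the precision-recall curve) is far more
# informative than accuracy at this positive rate.
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="average_precision")
print(f"AP per fold: {scores.round(3)}  mean: {scores.mean():.3f}")
```

The same guarantee can be written as an explicit `StratifiedKFold` loop that calls `SMOTE.fit_resample` on the training indices only; the pipeline form is simply harder to get wrong. Either way, the untouched, production-like test set is scored once, at the very end.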
