A data science team is developing a fraud detection model using a highly imbalanced dataset where fraudulent transactions represent only 0.5% of the data. To improve the model's ability to recognize the minority class, the team decides to generate synthetic data. Their chosen method involves selecting an instance from the minority class, identifying its k-nearest neighbors within the same class, and then creating a new data point along the line segments connecting the instance to its neighbors. Which sampling-based technique for synthetic data generation does this process describe?
The correct answer describes the Synthetic Minority Over-sampling Technique (SMOTE). This process is designed specifically to address class imbalance by creating new, synthetic instances of the minority class. It works by selecting a minority class sample, finding its k-nearest minority class neighbors, and generating a new sample at a random point along the line segment connecting the original sample and one of its randomly selected neighbors.
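The neighbor-based interpolation step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the SMOTE idea, not the implementation from any particular library; the function name and the default `k=5` are illustrative choices:

```python
import numpy as np

def smote_sample(X_min, k=5, rng=None):
    """Generate one synthetic sample from minority-class data X_min
    (shape: n_minority x n_features) via SMOTE-style interpolation."""
    rng = np.random.default_rng(rng)
    i = rng.integers(len(X_min))
    x = X_min[i]
    # Distances from x to every minority sample
    d = np.linalg.norm(X_min - x, axis=1)
    # Indices of the k nearest neighbors (skip x itself at distance 0)
    neighbors = np.argsort(d)[1:k + 1]
    j = rng.choice(neighbors)
    # New point at a random position along the segment from x to neighbor j
    gap = rng.random()
    return x + gap * (X_min[j] - x)
```

Because the synthetic point is a convex combination of two existing minority samples, it always lies inside the bounding box of the minority class, which is what distinguishes SMOTE from simple duplication.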
Stratified sampling is incorrect because it is a technique used to partition a dataset (e.g., into training and testing sets) while preserving the original percentage of samples for each class. It does not create new, synthetic data points.
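To make the contrast concrete, here is a minimal sketch of a stratified train/test split (the function name and `test_frac` parameter are illustrative assumptions, not a specific library's API). Note that every row in the output already exists in the input:

```python
import numpy as np

def stratified_split(X, y, test_frac=0.2, rng=None):
    """Split (X, y) into train/test sets while preserving each class's
    proportion. No new data points are created."""
    rng = np.random.default_rng(rng)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

With 90 samples of class 0 and 10 of class 1 and `test_frac=0.2`, the test set contains 18 zeros and 2 ones, mirroring the original 90/10 ratio.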
Bootstrap aggregating, or bagging, is an ensemble learning method that involves creating multiple subsets of the original data through sampling with replacement. While it uses sampling, it duplicates existing data points rather than creating novel synthetic ones through interpolation.
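The bootstrap resampling step can be sketched as follows (a minimal illustration; the function name is an assumption). Each bag is drawn with replacement and contains only duplicated original rows, never interpolated ones:

```python
import numpy as np

def bootstrap_samples(X, n_bags, rng=None):
    """Draw n_bags bootstrap resamples of X (sampling with replacement).
    Every row in every bag is an exact copy of an original row."""
    rng = np.random.default_rng(rng)
    n = len(X)
    return [X[rng.integers(0, n, size=n)] for _ in range(n_bags)]
```

In bagging, one base model would then be trained on each bag and their predictions aggregated, but no step ever synthesizes a point that was not in the original dataset.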
Rejection sampling is a statistical method for generating observations from a target distribution by sampling from a simpler proposal distribution and accepting or rejecting the samples based on a specific criterion. Its mechanism is different from the neighbor-based interpolation described in the scenario.
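For comparison, here is a minimal sketch of rejection sampling (function names and parameters are illustrative). A sample from the proposal is accepted with probability proportional to the ratio of target to scaled proposal density:

```python
import numpy as np

def rejection_sample(n, target_pdf, proposal_sample, proposal_pdf, M, rng=None):
    """Draw n samples from target_pdf by sampling from a proposal and
    accepting/rejecting. Requires target_pdf(x) <= M * proposal_pdf(x)."""
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < n:
        x = proposal_sample(rng)
        # Accept x with probability target_pdf(x) / (M * proposal_pdf(x))
        if rng.random() * M * proposal_pdf(x) <= target_pdf(x):
            out.append(x)
    return np.array(out)
```

For example, sampling the triangular density f(x) = 2x on [0, 1] from a uniform proposal with M = 2 yields draws whose mean approaches 2/3. The accept/reject criterion operates on densities, not on nearest neighbors, which is why it does not match the scenario.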