A data-science team is developing a binary classifier that predicts equipment failure seven days ahead from two years of hourly sensor readings. The engineer follows this workflow:
(1) remove rows that contain any null sensor value;
(2) compute a 24-hour rolling mean for every sensor and append it as a new feature;
(3) randomly split the resulting data into 80% training and 20% test sets;
(4) fit a StandardScaler on the training split and apply the scaler to both splits;
(5) train a gradient-boosting classifier;
(6) evaluate accuracy on the test split.
The offline test accuracy is 0.93, but the model's accuracy on live streaming data drops to 0.64.
Which single step in this workflow is the most likely cause of the data leakage that explains the performance drop, and why?
Step (4) - Scaling the data with StandardScaler fitted on the training split; this is the correct way to scale and does not cause leakage.
Step (3) - Randomly splitting time-stamped data; this puts future observations in the training set and lets the model learn about events that occur after some test instances, creating temporal data leakage.
Step (2) - Computing the 24-hour rolling mean before the split; the feature engineering leaks test values into training features and inflates accuracy.
Step (1) - Eliminating rows with missing readings; this reduces sample size but does not provide the model with information about future failures.
Randomly splitting records that have a natural time order lets examples from the future enter the training set while examples from the past end up in the test set. Because the model is trained on data that chronologically follow some of the test examples, it gains information that would never be available in production. This temporal look-ahead inflates the offline score; once the model is deployed on genuinely unseen future data, performance degrades sharply.
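The remedy is a chronological split. Below is a minimal sketch, assuming the readings live in a pandas DataFrame with a timestamp column; the file name and column name are hypothetical:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])  # hypothetical file

# Leaky: rows from any point in time can land in either split, so the model
# can train on observations recorded after some of the test observations.
train_leaky, test_leaky = train_test_split(df, test_size=0.2, random_state=42)

# Leak-free: sort by time and hold out the most recent 20% as the test set,
# so evaluation always uses data that comes strictly after the training data.
df = df.sort_values("timestamp")
cutoff = int(len(df) * 0.8)
train_df, test_df = df.iloc[:cutoff], df.iloc[cutoff:]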
Rolling means (step 2) are not harmful by themselves as long as each mean is computed only from past observations and the split preserves chronology. Dropping missing rows (step 1) can bias the sample but does not leak target information. Fitting a StandardScaler on the training split only (step 4) is the correct, leakage-free procedure.
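For completeness, here is a sketch of steps (2) and (4) done without leakage, continuing from the chronological split above; the sensor column names are assumed for illustration:

from sklearn.preprocessing import StandardScaler

sensor_cols = ["sensor_1", "sensor_2"]               # assumed column names
mean_cols = [f"{c}_24h_mean" for c in sensor_cols]
feature_cols = sensor_cols + mean_cols

def add_trailing_means(frame):
    """Append a 24-hour rolling mean that uses only strictly past readings."""
    frame = frame.copy()
    for col in sensor_cols:
        # shift(1) excludes the current row, so each mean is computed from
        # the previous 24 hourly readings only.
        frame[f"{col}_24h_mean"] = (
            frame[col].shift(1).rolling(window=24, min_periods=1).mean()
        )
    return frame

# The first row of each split has no past readings and becomes NaN; drop it.
train_df = add_trailing_means(train_df).dropna(subset=mean_cols)
test_df = add_trailing_means(test_df).dropna(subset=mean_cols)

# Step (4) done correctly: fit the scaler on the training split only,
# then apply the same transformation to both splits.
scaler = StandardScaler().fit(train_df[feature_cols])
X_train = scaler.transform(train_df[feature_cols])
X_test = scaler.transform(test_df[feature_cols])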