CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is preparing to hand off a machine-learning pipeline that supports drug-trial decisions to the company's compliance team. The source code lives in Git, and the model artifacts are pushed to an internal registry, but the training script is run with the following commands:

pip install -r requirements.txt     # uses numpy>=1.20, pandas>=1.4
python train.py --data s3://trial-bucket/study_data.csv

Six months later the auditors re-run the same Git commit and obtain different model coefficients because both the S3 object and several Python packages have silently changed. According to data-science life-cycle best practices, which single additional action would have most directly prevented this reproducibility failure?

Record the pseudo-random seed used during training and store it in the model registry metadata.
Increase the hold-out test set from 20% to 30% so that validation scores have lower variance.
Schedule weekly retraining jobs that always pull the newest dataset and latest package versions, overwriting the previous model artifact.
Version the exact training dataset and commit a dependency lock file that pins every package version and its hash alongside the model code.

CompTIA DataX DY0-001 (V1)

Operations and Processes


Answer Description
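
The failure in the scenario has two sources: the S3 object changed, and the loose `>=` constraints let newer package versions install. Versioning the dataset and committing a lock file addresses both, whereas a recorded seed alone cannot help once the inputs or libraries differ. As a minimal sketch of the dataset half, assuming a local copy of the CSV is available, the snippet below records a SHA-256 fingerprint in a small manifest that can be committed next to the code and then checked before training. The helper names and the `data_manifest.json` file name are hypothetical, chosen only for illustration. For the dependency half, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) is one way to enforce a fully pinned, hashed requirements file.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_path, manifest_path):
    """Record the dataset fingerprint so it can be committed with the code.

    The manifest layout is an illustrative assumption, not a standard format.
    """
    manifest = {
        "dataset": Path(data_path).name,
        "sha256": sha256_of_file(data_path),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify_dataset(data_path, manifest_path):
    """Return True only if the file on disk matches the recorded fingerprint."""
    manifest = json.loads(Path(manifest_path).read_text())
    return sha256_of_file(data_path) == manifest["sha256"]
```

A training script could call `verify_dataset(...)` first and refuse to run on a mismatch, so a silently replaced input file is caught before any coefficients are produced.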

