CompTIA DataX DY0-001 (V1) Practice Question

A data scientist is preparing to hand off a machine-learning pipeline that supports drug-trial decisions to the company's compliance team. The source code lives in Git, and the model artifacts are pushed to an internal registry, but the training script is run as follows:

pip install -r requirements.txt     # uses numpy>=1.20, pandas>=1.4
python train.py --data s3://trial-bucket/study_data.csv

Six months later the auditors re-run the same Git commit and obtain different model coefficients because both the S3 object and several Python packages have silently changed. According to data-science life-cycle best practices, which single additional action would have most directly prevented this reproducibility failure?

  • Record the pseudo-random seed used during training and store it in the model registry metadata.

  • Increase the hold-out test set from 20 % to 30 % so that validation scores have lower variance.

  • Schedule weekly retraining jobs that always pull the newest dataset and latest package versions, overwriting the previous model artifact.

  • Version the exact training dataset and commit a dependency lock file that pins every package and hash alongside the model code.
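For context, the dataset-versioning and dependency-pinning ideas named in the options can be sketched in a few lines of Python. This is a minimal illustration only, not part of the scenario's pipeline: the helper names (sha256_of_file, verify_dataset, unpinned_requirements) are hypothetical, and the sketch assumes the dataset has already been downloaded to a local file and that its expected digest was recorded at training time.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_sha256):
    """Raise ValueError if the local dataset copy differs from the recorded digest."""
    actual = sha256_of_file(path)
    if actual != expected_sha256:
        raise ValueError(
            f"dataset digest mismatch: expected {expected_sha256}, got {actual}"
        )
    return actual


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to an exact version with '=='."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip option lines such as --hash=...
        if "==" not in line:
            loose.append(line)
    return loose
```

Applied to the requirements in the scenario, unpinned_requirements(["numpy>=1.20", "pandas>=1.4"]) would flag both lines, since a ">=" specifier lets the installed version drift between runs.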
