CompTIA DataX DY0-001 (V1) Practice Question

Your organization's GitHub repository contains the code for an ML pipeline, while the training data (≈200 GB) lives in an Amazon S3 bucket that is overwritten every week. Compliance rules require that anyone who checks out any past Git commit can automatically restore exactly the dataset that was used for that commit, without bloating the repository or exceeding GitHub file-size limits. Which approach best satisfies these requirements?

  • Package every weekly snapshot as a compressed archive and upload it as a GitHub release asset referenced by a repository tag.

  • Store the full dataset in Git Large File Storage so each commit contains a pointer to the data blobs managed by Git LFS.

  • Track the dataset with DVC: commit the lightweight .dvc pointer files to Git and configure an S3 DVC remote so that "git checkout" followed by "dvc pull" retrieves the exact snapshot.

  • Enable S3 object versioning and save the object version IDs in a YAML configuration file that the pipeline reads at runtime.
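
The pointer-file idea mentioned in the DVC option can be illustrated with a minimal Python sketch. This is a simplified stand-in, not DVC's actual implementation: the `snapshot` and `restore` helpers and the local `store` directory (playing the role of a remote such as an S3 bucket) are hypothetical names introduced only for illustration. The point is that a small hash string can be committed alongside the code, while the large data lives elsewhere under a name derived from its content, so an overwritten source file never destroys an earlier snapshot.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded into memory at once."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def snapshot(data_file: Path, store: Path) -> str:
    """Copy data_file into a content-addressed store and return its pointer.

    The returned digest is the only thing that would be committed to Git
    (hypothetical stand-in for a pointer file).
    """
    digest = file_digest(data_file)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(data_file, target)
    return digest


def restore(pointer: str, store: Path, dest: Path) -> None:
    """Recreate the exact data version that a committed pointer refers to."""
    shutil.copyfile(store / pointer, dest)
```

With DVC itself, the corresponding workflow would typically be `dvc add` on the data, a Git commit of the resulting pointer file, and `dvc push` to the configured remote; checking out an older commit and running `dvc pull` then fetches the data that the pointer in that commit refers to.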

Operations and Processes