Your data-science team maintains a Git repository whose training pipeline is orchestrated with DVC. Developers currently pass learning-rate, batch_size, and other hyperparameters on the command line, so the values are not captured in version control or the DVC lock file, making past results hard to reproduce. You must introduce a method that versions every hyperparameter alongside the code and automatically invalidates the train stage so that CI/CD can trigger dvc repro whenever a parameter value changes. Which implementation best follows industry best practices for hyperparameter version control?
Export the hyperparameters as protected environment variables in the CI system and rely on the build log to retrieve them when needed.
Save the hyperparameters in a MongoDB collection keyed by the current Git commit hash and have the training script query the database at runtime.
Create an annotated Git tag for each experiment and write the hyperparameter values into the tag message so they can be viewed later with git show.
Move all hyperparameters into a params.yaml file, list the relevant keys under the params: section of the train stage in dvc.yaml, and commit both params.yaml and the resulting dvc.lock to Git.
Placing the hyperparameters in a plain-text params.yaml file, declaring the specific keys in the params: section of the train stage, and committing both params.yaml and the generated dvc.lock file to Git satisfies both requirements. DVC treats the listed keys as parameter dependencies; when the value of any listed key changes, DVC records the new value in dvc.lock and marks the stage as out-of-date, so dvc status and CI pipelines can automatically rerun the stage. The parameters are stored in the same Git history as the code, guaranteeing full reproducibility.
The other options fail to meet at least one requirement:
Environment variables captured only in build logs are not tracked by Git and do not participate in DVC's dependency graph, so changes will not trigger dvc repro.
Storing parameters in an external MongoDB collection divorces them from the repository's history and prevents DVC from detecting changes.
Git tags with free-text messages preserve values but do not integrate with DVC's automatic change detection and can be forgotten or overwritten, risking inconsistency.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the `params.yaml` file and why is it important in DVC pipelines?
Open an interactive chat with Bash
How does DVC's parameter dependency feature ensure reproducibility?
Open an interactive chat with Bash
Why are the other solutions unsuitable for hyperparameter version control?