Your team's internal audit checklist for regulated machine-learning projects requires that every transformation or training function be fully reproducible from the information stored in the repository. While reviewing the docstring of the Python helper prepare_features(), you find that it already contains a concise purpose statement, descriptions of all parameters and return values, and an executable usage example. The function performs a stratified sampling step that relies on a pseudorandom number generator. Auditors have flagged the docstring as still missing one piece of information that is critical for deterministic re-runs. Which item should you add to the docstring before the code is merged?
A list of hex color codes used in downstream visualization notebooks.
An ASCII flowchart that illustrates the entire data-processing pipeline.
The Git commit hash where prepare_features() was first introduced.
The fixed random seed or random_state value used by the function.
Documenting the fixed random seed (or random_state) makes the stochastic sampling in prepare_features() reproducible: anyone who reruns the code on the same data obtains exactly the same splits and derived features. Reproducibility guidelines for ML workflows explicitly call out seeding the RNG as a first step toward determinism. Pipeline diagrams, commit hashes, and color palettes may be helpful elsewhere (README, Git log, report notebooks), but none of them controls nondeterministic behavior at execution time, so they do not satisfy the auditors' requirement.
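To make the requirement concrete, here is a minimal sketch of a docstring that would pass the audit. Only prepare_features() and its stratified sampling step come from the question; the pandas-based implementation and the parameter names (label_col, frac, random_state) are illustrative assumptions, not the real function.

```python
import pandas as pd


def prepare_features(
    df: pd.DataFrame,
    label_col: str = "label",
    frac: float = 0.2,
    random_state: int = 42,
) -> pd.DataFrame:
    """Draw a stratified sample of ``df`` for feature preparation.

    Parameters
    ----------
    df : pd.DataFrame
        Raw input data containing the label column.
    label_col : str
        Column whose class proportions the sample must preserve.
    frac : float
        Fraction of rows to draw from each class.
    random_state : int
        Fixed seed for the pseudorandom sampler. Recording this value
        is what makes re-runs deterministic: the same seed on the same
        data always yields the same rows.

    Returns
    -------
    pd.DataFrame
        The stratified sample.

    Examples
    --------
    >>> data = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})
    >>> a = prepare_features(data, frac=0.5, random_state=42)
    >>> b = prepare_features(data, frac=0.5, random_state=42)
    >>> a.equals(b)
    True
    """
    # groupby(...).sample(...) samples each class separately (stratified
    # sampling); passing the documented seed pins the PRNG state.
    return df.groupby(label_col, group_keys=False).sample(
        frac=frac, random_state=random_state
    )
```

With the seed recorded in the docstring, an auditor can rerun the executable example and verify identical output, which is exactly the determinism the checklist demands.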