While preparing a production-ready Python package, a data-science team is refactoring a 300-line Jupyter notebook that performs feature engineering and model training. The notebook is full of hard-coded file paths and hyperparameters, and changing them between experiments routinely breaks unit tests. From a clean code perspective, which single refactoring step will most directly improve both readability and testability of the new codebase?
Move all file paths and hyperparameters into a YAML configuration file that is loaded at start-up and injected into the pipeline functions.
Define the file paths and hyperparameters as module-level global variables so they can be accessed without passing extra arguments.
Delete all function docstrings and rely on inline comments to keep individual functions as short as possible.
Rewrite the entire feature-engineering logic as a single nested list comprehension to minimize the number of functions.
Externalizing tunable values into a structured configuration file cleanly separates configuration from behavior. By loading the YAML (or JSON) file at runtime and passing its contents into functions, the code sheds magic numbers and hidden paths, becomes easier to understand, and can be unit-tested by injecting alternative settings. The other approaches add implicit state, obscure intent, or remove documentation and therefore hinder maintainability and testing.
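The pattern described above can be sketched briefly. This is a minimal illustration, not the team's actual codebase: the config keys (`train_path`, `learning_rate`, `n_estimators`) and function names are hypothetical, and JSON is used here (as the explanation notes, either YAML or JSON works; YAML would simply swap `json.loads` for `yaml.safe_load` from PyYAML).

```python
import json
from dataclasses import dataclass

# Hypothetical configuration schema; field names are illustrative only.
@dataclass
class PipelineConfig:
    train_path: str
    learning_rate: float
    n_estimators: int

def load_config(text: str) -> PipelineConfig:
    """Parse a config document into a typed object at start-up."""
    return PipelineConfig(**json.loads(text))

def train_model(config: PipelineConfig) -> dict:
    """The pipeline receives settings explicitly instead of reading
    globals, so a unit test can inject alternative values."""
    return {"path": config.train_path, "lr": config.learning_rate}

# In production the text would come from a file on disk;
# a unit test injects a literal string instead.
cfg = load_config(
    '{"train_path": "data/train.csv", "learning_rate": 0.1, "n_estimators": 100}'
)
result = train_model(cfg)
```

Because `train_model` depends only on the config object passed to it, a test can construct a `PipelineConfig` with tiny synthetic values and verify behavior without touching real file paths.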