During model evaluation, you compare two neural-network regressors trained on the same 5,000-row tabular dataset using 10-fold cross-validation:
Regressor A: training RMSE = 3.2, validation RMSE = 3.3
Regressor B: training RMSE = 1.1, validation RMSE = 4.4
You decide to keep Regressor B but want to lower its validation error without substantially raising its training error. According to the bias-variance tradeoff, which single change is most likely to improve the model's generalization performance?
Disable dropout and early stopping so the network can train until the training loss is minimal.
Double the number of hidden units and/or layers to let the network capture more complex patterns.
Keep the architecture unchanged but switch from 10-fold to 3-fold cross-validation for evaluation.
Increase the strength of L2 weight-decay regularization (or prune parameters) to constrain the network.
Regressor B shows a large gap between training and validation errors, which is characteristic of low bias and high variance (overfitting). The most direct remedy is to constrain the model so it cannot fit noise in the training data.
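The diagnosis can be reduced to a simple gap check on the numbers from the question (the 1.5x threshold below is an illustrative assumption, not a standard cutoff):

```python
def diagnose(train_rmse: float, val_rmse: float, gap_ratio: float = 1.5) -> str:
    """Crude heuristic: a validation error much larger than the
    training error signals high variance (overfitting)."""
    if val_rmse > gap_ratio * train_rmse:
        return "high variance (overfitting)"
    return "well matched (low variance)"

# Numbers from the question:
print(diagnose(3.2, 3.3))  # Regressor A: errors are close
print(diagnose(1.1, 4.4))  # Regressor B: 4x gap between train and validation
```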
Adding or strengthening L2 (weight-decay) regularization shrinks weights and effectively reduces model complexity. This increases bias slightly but typically cuts variance enough to lower the total expected error, improving validation performance.
Increasing the number of hidden units or layers does the opposite: it raises capacity and variance. Disabling dropout and early stopping removes existing regularizers, again increasing variance. Switching from 10-fold to 3-fold cross-validation only changes how the error is estimated; it does not change the learned weights or address the variance problem, so generalization will not materially improve.
Therefore, raising the L2 regularization strength (or otherwise simplifying the network) is the appropriate intervention.
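The tradeoff can be sketched with closed-form ridge regression as a stand-in for neural-network weight decay (synthetic data and the chosen degree/lambda values are assumptions for illustration, not the exam's setup). As the L2 penalty grows, the weight norm shrinks and training error rises slightly, which is the mechanism the answer relies on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: noisy sine, split into train and validation halves.
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0.0, 0.3, x.size)
x_tr, y_tr = x[:30], y[:30]
x_va, y_va = x[30:], y[30:]

def features(x, degree=10):
    # High-degree polynomial features give the model enough capacity to overfit.
    return np.vander(x, degree + 1)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y.
    # The lam * I term is the L2 (weight-decay) penalty.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def rmse(w, X, y):
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

X_tr, X_va = features(x_tr), features(x_va)
for lam in (0.0, 1e-2, 1e0):
    w = ridge_fit(X_tr, y_tr, lam)
    print(f"lam={lam:<6g} ||w||={np.linalg.norm(w):8.2f} "
          f"train RMSE={rmse(w, X_tr, y_tr):.3f} val RMSE={rmse(w, X_va, y_va):.3f}")
```

The printout shows the guaranteed part of the tradeoff: larger lambda always shrinks the weight norm and never lowers the training error; whether validation error improves at an intermediate lambda depends on the data, which is exactly the bias-variance balance the question tests.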