A data-science team trains a GradientBoostingClassifier with these key settings:
n_estimators = 400
learning_rate = 0.10
max_depth = 4
subsample = 1.0
The model attains an F1-score of 0.96 on the training data but only 0.80 on a held-out validation set. Because of limited compute, the team must reduce overfitting without noticeably increasing training time, and they may adjust exactly one hyperparameter. Which single change is most likely to achieve this goal?
Keep parameters unchanged except doubling n_estimators to 800.
Raise learning_rate to 0.30 and cut n_estimators to 150.
Increase max_depth to 8 to capture higher-order feature interactions.
Lower subsample to 0.7 so each tree is trained on a random 70% of the rows.
Setting subsample below 1.0 turns standard gradient boosting into stochastic gradient boosting, in which each tree is fit on a random subset of the training rows. This row-level randomization acts as a regularizer and lowers variance, and because each tree sees only 70% of the data, training typically gets slightly faster rather than slower. The other options work against the stated constraints: increasing max_depth adds model capacity and will likely widen the train/validation gap; tripling learning_rate makes each boosting step more aggressive, which generally hurts generalization even with fewer trees; and doubling n_estimators both doubles training time and lets the ensemble keep fitting the training set more closely.
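As a rough illustration, the sketch below retrains the model from the question with only subsample changed. The synthetic dataset from make_classification, the train/validation split, and the exact scores it prints are assumptions for demonstration, not part of the original scenario.

```python
# Minimal sketch: compare the baseline (subsample=1.0) against the stochastic
# variant (subsample=0.7). The dataset here is synthetic and hypothetical;
# only the hyperparameter settings come from the question.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def fit_and_report(name, **overrides):
    """Train with the question's base settings plus any overrides,
    then report train vs. validation F1 to expose the generalization gap."""
    params = dict(n_estimators=400, learning_rate=0.10, max_depth=4,
                  subsample=1.0, random_state=0)
    params.update(overrides)
    model = GradientBoostingClassifier(**params).fit(X_train, y_train)
    print(f"{name}: train F1 = {f1_score(y_train, model.predict(X_train)):.2f}, "
          f"val F1 = {f1_score(y_val, model.predict(X_val)):.2f}")

fit_and_report("baseline (subsample=1.0)")
fit_and_report("stochastic (subsample=0.7)", subsample=0.7)
```

On most datasets the stochastic variant shows a smaller train/validation F1 gap at roughly the same or slightly lower wall-clock cost, since each of the 400 trees is fit on 30% fewer rows.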