A data scientist trains a single CART decision-tree classifier that is allowed to grow until every leaf node is pure. The model attains 100 % accuracy on the 2 000-row training set but only 62 % accuracy on a held-out test set. The scientist wants primarily to reduce the model's variance without introducing much additional bias. Which action is most likely to achieve this goal?
Restrict the existing tree's maximum depth to two levels.
Raise the minimum number of samples per leaf from 1 to 2.
Train hundreds of bootstrap-sampled trees with random feature sub-sampling and average their predictions.
Replace the Gini impurity splitting criterion with entropy.
Averaging the predictions of many bootstrap-sampled trees that each split on random subsets of features (a random forest) lowers the variance of a single high-variance decision tree. Each fully grown tree keeps its low bias, and the random feature sub-sampling decorrelates the trees' errors, so averaging cancels much of the variance while the ensemble's bias grows only slightly. Hard-pruning the original tree to depth 2 would also cut variance, but at the cost of a large bias increase. Raising min_samples_leaf from 1 to 2 adds only minimal regularisation and is unlikely to fix severe overfitting. Switching from Gini impurity to entropy rarely affects generalisation because both criteria choose similar splits, so it does not meaningfully address the variance problem.
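The contrast can be seen directly in code. Below is a minimal sketch using scikit-learn that compares a fully grown CART tree with a bagged, feature-sub-sampled ensemble; the synthetic dataset, the 500-tree forest size, and the other hyperparameter values are illustrative assumptions, not part of the original question.

```python
# Illustrative sketch: a fully grown CART tree vs. a random forest.
# Dataset shape and hyperparameters are assumptions for demonstration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Single CART tree grown until every leaf is pure: low bias, high variance.
tree = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1,
                              random_state=0).fit(X_train, y_train)

# Hundreds of bootstrap-sampled trees with random feature sub-sampling,
# aggregated by voting: variance drops while bias stays almost unchanged.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                bootstrap=True, random_state=0,
                                n_jobs=-1).fit(X_train, y_train)

print(f"Tree   train/test accuracy: {tree.score(X_train, y_train):.2f} / "
      f"{tree.score(X_test, y_test):.2f}")
print(f"Forest train/test accuracy: {forest.score(X_train, y_train):.2f} / "
      f"{forest.score(X_test, y_test):.2f}")
```

Under these assumptions the single tree typically scores perfectly on the training split but noticeably lower on the test split, while the forest's train and test scores sit much closer together, which is the variance reduction the question asks about.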