A data-science team is replacing a scikit-learn GradientBoostingClassifier with XGBoost because the original model suffers from severe overfitting on a wide tabular data set. Their primary goal is to tightly control model complexity by directly penalizing large leaf weights (and optionally excessive numbers of leaves) during training. Which built-in XGBoost feature should they rely on to meet this goal?
L1 and L2 regularization terms (reg_alpha / reg_lambda) built into the objective function
Use of second-order (Hessian) information when computing the best split
Training trees with the histogram-based ("hist") tree_method for faster node expansion
Automatic treatment of missing values when selecting split directions
XGBoost augments the standard gradient-boosting objective with both an L1 (reg_alpha) and an L2 (reg_lambda) regularization term that operate on the weights assigned to each leaf and, together with the gamma term (a per-leaf penalty), form an explicit complexity penalty. Increasing these regularization parameters shrinks leaf scores toward zero (L1 can drive some of them exactly to zero), discouraging overly large leaf predictions and thus reducing variance and overfitting.
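For reference, the complexity penalty these parameters control can be written, in the notation used by the XGBoost documentation, as

\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert

where T is the number of leaves of a tree and w_j is the weight of leaf j; \lambda corresponds to reg_lambda, \alpha to reg_alpha, and \gamma to gamma.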
Automatic handling of missing values improves robustness but does not constrain model complexity. Using second-order gradients accelerates and stabilizes optimization but likewise does not penalize tree size or weights. The histogram-based tree builder primarily speeds up training and lowers memory use; it is not a regularization mechanism. Therefore, leveraging XGBoost's explicit L1/L2 regularization is the correct way to address the stated objective.
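The following is a minimal sketch of how these regularization parameters might be set through XGBoost's scikit-learn-compatible XGBClassifier wrapper. The synthetic data set and the specific penalty values are illustrative assumptions, not tuned recommendations.

# Illustrative sketch: enabling XGBoost's built-in L1/L2/gamma regularization.
# Parameter values here are assumptions for demonstration, not recommended defaults.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Wide synthetic tabular data: many features relative to the sample count.
X, y = make_classification(n_samples=2000, n_features=200, n_informative=20,
                           random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25,
                                                      random_state=42)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    reg_alpha=1.0,    # L1 penalty on leaf weights (can zero out leaf scores)
    reg_lambda=5.0,   # L2 penalty shrinking leaf weights toward zero
    gamma=1.0,        # minimum loss reduction required to keep a split (per-leaf penalty)
    eval_metric="logloss",
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
print("Validation accuracy:", model.score(X_valid, y_valid))

In practice, reg_alpha, reg_lambda, and gamma would be tuned jointly (for example with cross-validation) against a held-out metric rather than set to fixed values as above.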