CompTIA DataX DY0-001 (V1) Practice Question

A data science team trains an XGBoost model to predict loan default. The library's default feature-importance plot, which uses the gain metric, ranks the variable Customer_ID highest, while Age appears near the bottom. When the team computes permutation importance on a held-out validation set, Age rises to the top and Customer_ID drops sharply. Which explanation best accounts for the conflicting importance rankings?

  • The conflict arises because permutation importance for classification relies on the Gini impurity formula used in regression trees, which is incompatible with XGBoost models.

  • Gain importance tends to inflate the score of features that have many unique values or potential split points, such as an identifier; permutation importance measures the drop in validation performance and is therefore much less affected by this cardinality bias.

  • Gain importance ignores how frequently a feature is selected for splitting, so variables like Age that create large gains only a few times are hidden from the ranking.

  • Permutation importance is calculated only on the training data, so it undervalues features that generalize well and makes Customer_ID look weaker than it really is.
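The cardinality bias described above can be demonstrated empirically. Below is a minimal sketch using scikit-learn's `RandomForestClassifier` as a stand-in for XGBoost, since its impurity-based `feature_importances_` exhibit the same tendency to over-credit high-cardinality features; the synthetic data (an uninformative unique ID plus an informative `Age` column) is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 70, size=n)
customer_id = np.arange(n)                      # unique per row: no real signal
noise = (rng.random(n) < 0.1).astype(int)       # flip 10% of labels
y = ((age < 30).astype(int) + noise) % 2        # default risk driven by Age only

X = np.column_stack([customer_id, age])
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance (analogous bias to gain): the high-cardinality
# ID column offers many split points and can absorb much of the score.
print(dict(zip(["Customer_ID", "Age"], model.feature_importances_.round(3))))

# Permutation importance on held-out data: shuffling Age hurts validation
# accuracy noticeably, while shuffling Customer_ID barely matters.
perm = permutation_importance(model, X_va, y_va, n_repeats=10, random_state=0)
print(dict(zip(["Customer_ID", "Age"], perm.importances_mean.round(3))))
```

Running this typically shows the ID column claiming a sizeable share of the impurity-based score while its permutation importance on the validation set stays near zero, mirroring the scenario in the question.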

Domain: Machine Learning