A data scientist is building a decision tree classifier to predict customer churn. They are evaluating a potential split on a categorical feature. The parent node contains 100 samples, with 50 belonging to the 'Churn' class and 50 to the 'No Churn' class. The proposed split creates two child nodes:
Child Node 1: 60 samples, with 40 'Churn' and 20 'No Churn'.
Child Node 2: 40 samples, with 10 'Churn' and 30 'No Churn'. To evaluate the quality of this split, what is the weighted Gini impurity?
The correct answer is 0.417. The weighted Gini impurity is found by computing the Gini impurity of each child node and then taking their weighted average, with weights equal to each node's share of the parent's samples. Gini(Child 1) = 1 − (40/60)² − (20/60)² ≈ 0.444; Gini(Child 2) = 1 − (10/40)² − (30/40)² = 0.375; weighted Gini = (60/100)(0.444) + (40/100)(0.375) ≈ 0.417.
0.500 is the Gini impurity of the parent node (1 − (0.5² + 0.5²) = 0.5), not the weighted impurity of the split.
0.083 is the Information Gain (Gini Gain), which is calculated by subtracting the weighted Gini impurity from the parent node's Gini impurity (0.500 - 0.417 = 0.083).
0.410 is an incorrect result, likely obtained by taking a simple average of the two child-node impurities ((0.444 + 0.375) / 2 ≈ 0.410) instead of a sample-weighted average.
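The calculation above can be sketched in a few lines of Python; the `gini` helper below is an illustrative function, not from any particular library:

```python
def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Parent: 50 'Churn', 50 'No Churn'
parent = gini([50, 50])            # 0.5
# Children from the proposed split
child1 = gini([40, 20])            # ~0.444
child2 = gini([10, 30])            # 0.375

# Weight each child's impurity by its share of the parent's 100 samples
n1, n2 = 60, 40
weighted = (n1 * child1 + n2 * child2) / (n1 + n2)
gain = parent - weighted

print(round(weighted, 3))          # 0.417
print(round(gain, 3))              # 0.083
```

Running this reproduces both the weighted impurity (0.417) and the Gini Gain (0.083) discussed above.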