A data scientist is building a decision tree classifier to predict customer churn. At a specific node containing 20 samples, 10 customers have churned and 10 have not. The scientist is evaluating two features, 'Contract Type' and 'Has Tech Support', to determine the optimal split. The results of splitting by each feature are as follows:
Split by 'Contract Type':
Node A ('Month-to-Month'): 12 samples (9 Churn, 3 No Churn)
Node B ('One/Two Year'): 8 samples (1 Churn, 7 No Churn)
Split by 'Has Tech Support':
Node C ('Yes'): 10 samples (3 Churn, 7 No Churn)
Node D ('No'): 10 samples (7 Churn, 3 No Churn)
Given that the algorithm uses entropy to maximize information gain, which of the following conclusions is correct?
The 'Contract Type' feature should be selected because its resulting split has a lower weighted average entropy (approximately 0.705) than the 'Has Tech Support' split (approximately 0.881).
The 'Has Tech Support' feature should be selected because its child nodes are perfectly balanced in size (10 samples each), which maximizes the reduction in impurity.
The 'Has Tech Support' feature should be selected because its resulting split has a lower weighted average entropy than the 'Contract Type' split.
The information gain for both splits is equal, so the Gini index must be calculated to determine the optimal feature.
The correct answer is that 'Contract Type' should be selected because its split results in a lower weighted average entropy. The goal of a decision tree split is to maximize Information Gain, which is equivalent to minimizing the weighted average entropy of the child nodes.
The calculation is as follows:
Step 1: Calculate the entropy of each child node using E = -p * log2(p) - (1 - p) * log2(1 - p), where p is the proportion of churned customers in the node.
Node A (9 of 12 churn, p = 0.75): E ≈ 0.811
Node B (1 of 8 churn, p = 0.125): E ≈ 0.544
Node C (3 of 10 churn, p = 0.3): E ≈ 0.881
Node D (7 of 10 churn, p = 0.7): E ≈ 0.881
Step 2: Weight each child's entropy by its share of the parent node's 20 samples.
'Contract Type': (12/20)(0.811) + (8/20)(0.544) ≈ 0.705
'Has Tech Support': (10/20)(0.881) + (10/20)(0.881) ≈ 0.881
Step 3: Compare the results. The split on 'Contract Type' (≈ 0.705) has a lower weighted average entropy than the split on 'Has Tech Support' (≈ 0.881). Since the parent node is a perfect 10/10 mix (entropy = 1.0), the information gains are roughly 0.30 versus 0.12, so 'Contract Type' yields the higher information gain and is the better split.
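If you want to verify the arithmetic yourself, the following is a minimal Python sketch (the entropy and weighted_entropy helpers are illustrative names, not from any particular library) that reproduces the weighted entropies and information gains for both candidate splits using the sample counts from the question.

import math

def entropy(churn, total):
    """Binary entropy of a node, given its churn count and total samples."""
    p = churn / total
    if p in (0.0, 1.0):  # a pure node has zero entropy
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_entropy(children):
    """Weighted average entropy of child nodes given as [(churn, total), ...]."""
    n = sum(total for _, total in children)
    return sum(total / n * entropy(churn, total) for churn, total in children)

parent = entropy(10, 20)                        # 1.0, since the parent is a 10/10 mix

contract = weighted_entropy([(9, 12), (1, 8)])  # Nodes A and B, ≈ 0.70
tech = weighted_entropy([(3, 10), (7, 10)])     # Nodes C and D, ≈ 0.88

print(f"Contract Type   : weighted entropy {contract:.3f}, gain {parent - contract:.3f}")
print(f"Has Tech Support: weighted entropy {tech:.3f}, gain {parent - tech:.3f}")
# 'Contract Type' wins with a gain of about 0.30 versus about 0.12.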
The other options are incorrect. The 'Has Tech Support' split has a higher weighted entropy, making it the less desirable choice. The balance of sample sizes in the child nodes for 'Has Tech Support' does not guarantee higher information gain; the purity of the classes within those nodes is what matters. Finally, calculating the Gini index is an alternative to entropy, not a necessary tie-breaker.
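On the Gini point specifically: scoring the same two splits with the Gini index, as in the short sketch below (the gini and weighted_gini helpers are again illustrative, not library routines), still ranks 'Contract Type' first, which reinforces that no tie-breaker is needed in this scenario.

def gini(churn, total):
    """Gini impurity of a binary node."""
    p = churn / total
    return 2 * p * (1 - p)

def weighted_gini(children):
    """Weighted average Gini impurity of child nodes given as [(churn, total), ...]."""
    n = sum(total for _, total in children)
    return sum(total / n * gini(churn, total) for churn, total in children)

print(weighted_gini([(9, 12), (1, 8)]))   # Contract Type ≈ 0.313
print(weighted_gini([(3, 10), (7, 10)]))  # Has Tech Support = 0.420
# The Gini index ranks the splits the same way entropy does here.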