You are building an ID3 decision tree on a binary customer-churn data set that contains 20 observations (12 churners, 8 non-churners). Two categorical predictors are being considered for the first split:
ContractType splits the data into "Monthly" (10 churners, 2 non-churners) and "Annual" (2 churners, 6 non-churners).
SupportTier splits the data into "High" (6 churners, 1 non-churner) and "Low" (6 churners, 7 non-churners).
Using Shannon entropy (log base 2) to calculate information gain, which attribute will the ID3 algorithm choose for the root node?
ID3 cannot decide between the two attributes because the class distribution is imbalanced
ContractType, because its information gain is approximately 0.26 bits
Both attributes provide the same information gain, so either could be chosen
SupportTier, because its information gain is approximately 0.12 bits
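Parent entropy: 12 churners and 8 non-churners give H = −0.6 log2 0.6 − 0.4 log2 0.4 ≈ 0.971 bits.
Split on ContractType: Monthly (12/20) entropy ≈ 0.650 bits; Annual (8/20) entropy ≈ 0.811 bits. Weighted child entropy = 0.60 × 0.650 + 0.40 × 0.811 ≈ 0.715 bits. Information gain = 0.971 − 0.715 ≈ 0.26 bits.
Split on SupportTier: High (7/20) entropy ≈ 0.592 bits; Low (13/20) entropy ≈ 0.996 bits. Weighted child entropy = 0.35 × 0.592 + 0.65 × 0.996 ≈ 0.854 bits. Information gain = 0.971 − 0.854 ≈ 0.12 bits.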
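For reference, these figures follow the standard definitions of Shannon entropy and information gain. The notation is assumed here rather than given in the question: S is the full set of 20 observations, p_c is the proportion of class c in S, and S_v is the subset of S where attribute A takes value v.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c,
\qquad
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

H(S) = -\tfrac{12}{20}\log_2\tfrac{12}{20} - \tfrac{8}{20}\log_2\tfrac{8}{20} \approx 0.971 \text{ bits}
```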
Because ContractType yields the larger reduction in entropy (about 0.26 bits versus about 0.12 bits), ID3, which greedily selects the attribute with the highest information gain, will choose ContractType for the first split. The other options are incorrect: SupportTier has the lower gain, the two attributes do not provide equal gain, and class imbalance only affects the parent entropy; it does not prevent ID3 from computing and comparing the gains.
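A minimal Python sketch can double-check this arithmetic and mimic ID3's greedy root choice. It works only from the (churners, non-churners) counts stated in the question; the helper names `entropy` and `information_gain` are hypothetical, and this is not a full ID3 implementation (there is no recursion and no stopping criterion).

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a tuple of class counts (empty classes skipped)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# (churners, non-churners) counts taken from the question
parent = (12, 8)
splits = {
    "ContractType": [(10, 2), (2, 6)],  # Monthly, Annual
    "SupportTier": [(6, 1), (6, 7)],    # High, Low
}

gains = {name: information_gain(parent, kids) for name, kids in splits.items()}

# ID3's greedy step: choose the attribute with the largest information gain
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f} bits")
print("root:", root)
```

Running it should print about 0.256 bits for ContractType and 0.117 bits for SupportTier, and report ContractType as the root. If two gains were exactly equal, `max` would keep the first attribute in dictionary order, an arbitrary tie-break that the question does not address.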