A data scientist is building a decision tree classifier to predict customer churn. At a specific node containing 20 samples, 10 customers have churned and 10 have not. The scientist is evaluating two features, 'Contract Type' and 'Has Tech Support', to determine the optimal split. The results of splitting by each feature are as follows:
Split by 'Contract Type':
Node A ('Month-to-Month'): 12 samples (9 Churn, 3 No Churn)
Node B ('One/Two Year'): 8 samples (1 Churn, 7 No Churn)
Split by 'Has Tech Support':
Node C ('Yes'): 10 samples (3 Churn, 7 No Churn)
Node D ('No'): 10 samples (7 Churn, 3 No Churn)
Given that the algorithm uses entropy to maximize information gain, which of the following conclusions is correct?
The 'Contract Type' feature should be selected because its resulting split has a lower weighted average entropy (approximately 0.705) than the 'Has Tech Support' split (approximately 0.881).
The 'Has Tech Support' feature should be selected because its child nodes are perfectly balanced in size (10 samples each), which maximizes the reduction in impurity.
The 'Has Tech Support' feature should be selected because its resulting split has a lower weighted average entropy than the 'Contract Type' split.
The information gain for both splits is equal, so the Gini index must be calculated to determine the optimal feature.
The correct answer is that 'Contract Type' should be selected because its split results in a lower weighted average entropy. The goal of a decision tree split is to maximize Information Gain, which is equivalent to minimizing the weighted average entropy of the child nodes.
The calculation is as follows:
Step 1: Calculate the entropy of each child node using E = -p * log2(p) - (1 - p) * log2(1 - p), where p is the proportion of churned customers in the node.
Node A (9 of 12 churn, p = 0.75): E ≈ 0.811
Node B (1 of 8 churn, p = 0.125): E ≈ 0.544
Node C (3 of 10 churn, p = 0.3): E ≈ 0.881
Node D (7 of 10 churn, p = 0.7): E ≈ 0.881
Step 2: Weight each child's entropy by its share of the parent node's 20 samples.
'Contract Type': (12/20)(0.811) + (8/20)(0.544) ≈ 0.705
'Has Tech Support': (10/20)(0.881) + (10/20)(0.881) ≈ 0.881
Step 3: Compare the results. The split on 'Contract Type' (≈ 0.705) has a lower weighted average entropy than the split on 'Has Tech Support' (≈ 0.881). Since the parent node is a perfect 10/10 mix (entropy = 1.0), the information gains are roughly 0.30 versus 0.12, so 'Contract Type' yields the higher information gain and is the better split.
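If you want to verify the arithmetic yourself, the following is a minimal Python sketch (the entropy and weighted_entropy helpers are illustrative names, not from any particular library) that reproduces the weighted entropies and information gains for both candidate splits using the sample counts from the question.

import math

def entropy(churn, total):
    """Binary entropy of a node, given its churn count and total samples."""
    p = churn / total
    if p in (0.0, 1.0):  # a pure node has zero entropy
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_entropy(children):
    """Weighted average entropy of child nodes given as [(churn, total), ...]."""
    n = sum(total for _, total in children)
    return sum(total / n * entropy(churn, total) for churn, total in children)

parent = entropy(10, 20)                        # 1.0, since the parent is a 10/10 mix

contract = weighted_entropy([(9, 12), (1, 8)])  # Nodes A and B, ≈ 0.70
tech = weighted_entropy([(3, 10), (7, 10)])     # Nodes C and D, ≈ 0.88

print(f"Contract Type   : weighted entropy {contract:.3f}, gain {parent - contract:.3f}")
print(f"Has Tech Support: weighted entropy {tech:.3f}, gain {parent - tech:.3f}")
# 'Contract Type' wins with a gain of about 0.30 versus about 0.12.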
The other options are incorrect. The 'Has Tech Support' split has a higher weighted entropy, making it the less desirable choice. The balance of sample sizes in the child nodes for 'Has Tech Support' does not guarantee higher information gain; the purity of the classes within those nodes is what matters. Finally, calculating the Gini index is an alternative to entropy, not a necessary tie-breaker.
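On the Gini point specifically: scoring the same two splits with the Gini index, as in the short sketch below (the gini and weighted_gini helpers are again illustrative, not library routines), still ranks 'Contract Type' first, which reinforces that no tie-breaker is needed in this scenario.

def gini(churn, total):
    """Gini impurity of a binary node."""
    p = churn / total
    return 2 * p * (1 - p)

def weighted_gini(children):
    """Weighted average Gini impurity of child nodes given as [(churn, total), ...]."""
    n = sum(total for _, total in children)
    return sum(total / n * gini(churn, total) for churn, total in children)

print(weighted_gini([(9, 12), (1, 8)]))   # Contract Type ≈ 0.313
print(weighted_gini([(3, 10), (7, 10)]))  # Has Tech Support = 0.420
# The Gini index ranks the splits the same way entropy does here.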