A data scientist is building a decision tree classifier to predict customer churn. They are evaluating a potential split on a categorical feature. The parent node contains 100 samples, with 50 belonging to the 'Churn' class and 50 to the 'No Churn' class. The proposed split creates two child nodes:
Child Node 1: 60 samples, with 40 'Churn' and 20 'No Churn'.
Child Node 2: 40 samples, with 10 'Churn' and 30 'No Churn'. To evaluate the quality of this split, what is the weighted Gini impurity?
The correct answer is 0.417. The weighted Gini impurity is found by computing the Gini impurity of each child node and then taking their weighted average, with weights equal to each node's share of the parent's samples. Gini(Child 1) = 1 − (40/60)² − (20/60)² ≈ 0.444; Gini(Child 2) = 1 − (10/40)² − (30/40)² = 0.375; weighted Gini = (60/100)(0.444) + (40/100)(0.375) ≈ 0.417.
0.500 is the Gini impurity of the parent node (1 − (0.5² + 0.5²) = 0.5), not the weighted impurity of the split.
0.083 is the Information Gain (Gini Gain), which is calculated by subtracting the weighted Gini impurity from the parent node's Gini impurity (0.500 - 0.417 = 0.083).
0.410 is an incorrect result, likely obtained by taking a simple average of the two child-node impurities ((0.444 + 0.375) / 2 ≈ 0.410) instead of a sample-weighted average.
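The calculation above can be sketched in a few lines of Python; the `gini` helper below is an illustrative function, not from any particular library:

```python
def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Parent: 50 'Churn', 50 'No Churn'
parent = gini([50, 50])            # 0.5
# Children from the proposed split
child1 = gini([40, 20])            # ~0.444
child2 = gini([10, 30])            # 0.375

# Weight each child's impurity by its share of the parent's 100 samples
n1, n2 = 60, 40
weighted = (n1 * child1 + n2 * child2) / (n1 + n2)
gain = parent - weighted

print(round(weighted, 3))          # 0.417
print(round(gain, 3))              # 0.083
```

Running this reproduces both the weighted impurity (0.417) and the Gini Gain (0.083) discussed above.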