
Question

This problem is giving me fits. I need to compute the entropy of the root node and I got the answer 0. Can you let me know how to go about solving this problem because I don't think I'm doing it correctly?

**Decision Tree Learning with Information Gain and Entropy**

Consider the training set below, with a binary response \(Y\) and three predictors \(X_1, X_2, X_3\). The objective is to learn a decision tree from this training set using the information gain (IG) criterion, with entropy as the impurity measure. Recall the formula for entropy:

\[
I(D) = -\sum_{k=1}^{K} \hat{p}_k \log_2 \hat{p}_k
\]

where \(\hat{p}_k\) represents the proportion of training observations in the data \(D\) that are from the \(k\)-th class.
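As a quick way to check hand calculations, here is a minimal Python sketch of this entropy formula; the helper name `entropy` and the example label lists are illustrative, not part of the problem.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for k in set(labels):
        p = labels.count(k) / n      # proportion of class k in the node
        result -= p * math.log2(p)   # only classes actually present, so p > 0
    return result

# A pure node has entropy 0; a 50/50 binary node has entropy 1.
print(entropy([1, 1, 1, 1]))   # 0.0
print(entropy([0, 0, 1, 1]))   # 1.0
```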

The information gain when splitting on predictor \(X_j\) is given by:

\[
IG(D, X_j) = I(D) - \frac{N_{\text{left}}}{N} I(D_{\text{left}}) - \frac{N_{\text{right}}}{N} I(D_{\text{right}})
\]

In this formula:
- \(I(D)\) is the entropy of the original dataset.
- \(N_{\text{left}}\) and \(N_{\text{right}}\) are the number of observations in the left and right subsets, respectively, after the split.
- \(I(D_{\text{left}})\) and \(I(D_{\text{right}})\) are the entropies of the left and right subsets, respectively.
- \(N\) is the total number of observations in the dataset.

This approach helps in selecting the best predictor to split the data on, reducing impurity and improving the decision tree's predictive performance.
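Continuing the sketch above, the information-gain formula can be coded as follows, assuming a binary predictor that takes values 0 and 1 (which matches the table in this problem). The helper name `information_gain` is illustrative, not something named in the problem.

```python
def information_gain(labels, feature_values):
    """IG(D, X_j): entropy of the node minus the weighted entropies
    of the two subsets induced by splitting on a 0/1 feature."""
    n = len(labels)
    left  = [y for y, x in zip(labels, feature_values) if x == 0]
    right = [y for y, x in zip(labels, feature_values) if x == 1]
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
```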

### Data Table:
| X₁ | X₂ | X₃ | Y |
|----|----|----|---|
| 0  | 1  | 1  | 0 |
| 1  | 1  | 1  | 0 |
| 0  | 0  | 0  | 1 |
| 1  | 1  | 0  | 1 |
| 0  | 1  | 0  | 1 |
| 1  | 0  | 1  | 1 |

### Instructions:
(a) **Compute the entropy of the root node.**

(b) **Compute the information gain if we use predictor X₁ to split the data.**

(c) **Repeat step (b) to compute information gains using predictors X₂ and X₃.** Decide which predictor will be used by the decision tree to split the root node. If two predictors have the same information gain, break ties randomly.
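For reference, here is a hedged sketch (reusing the `entropy` and `information_gain` helpers above) that encodes the six rows of the table and prints the quantities asked for in (a)-(c). Note that the root node contains both classes (two rows with \(Y = 0\) and four with \(Y = 1\)), so its entropy cannot be 0; it comes out to roughly 0.918 bits.

```python
# The six training rows from the table: (X1, X2, X3, Y)
data = [
    (0, 1, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
    (1, 1, 0, 1),
    (0, 1, 0, 1),
    (1, 0, 1, 1),
]

Y = [row[3] for row in data]
print("root entropy:", entropy(Y))   # -(2/6)log2(2/6) - (4/6)log2(4/6) ≈ 0.918

# Information gain for each of the three predictors; the split with the
# largest gain is the one the tree uses at the root (ties broken randomly).
for j in range(3):
    Xj = [row[j] for row in data]
    print(f"IG(D, X{j + 1}):", information_gain(Y, Xj))
```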