Paris-Saclay University

Introduction to Machine Learning

Supervised Learning - Decision Trees & Evaluation

Introduction

Introduction to Decision Trees

  • Definition: A decision tree is a tree-like structure used for decision-making and predictive modeling.
  • Applications:
    • Classification tasks: Predicting discrete labels.
    • Regression tasks: Predicting continuous values.
  • Key Advantage: Intuitive and interpretable for humans.
  • Connection to Previous Topics:
    • Builds upon concepts of supervised learning from Session 2.
    • Introduces an alternative to linear models for non-linear problems.

Understanding Decision Tree Structure

  • Core Components:
    • Nodes: Decision points that test specific features (e.g., "Age > 30").
    • Edges: Possible outcomes from each decision, connecting nodes.
    • Root Node: The starting point, using the most informative feature.
    • Leaf Nodes: Final predictions or outcomes (e.g., class labels).
  • Visual Representation:
    • Hierarchical structure flowing from top to bottom.
    • Each level represents a different feature test.
    • Path from root to leaf shows decision sequence.
  • Practical Aspects:
    • Tools like Scikit-learn and Graphviz enable easy visualization.
    • Tree depth indicates model complexity.
    • Branch patterns reveal feature importance.
Figure: Decision tree structure.
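A minimal sketch of such a visualization with scikit-learn's plot_tree, using the Iris dataset purely for illustration (the dataset and depth limit are our choices, not part of the slides):

```python
# Fit a small tree and visualize it with scikit-learn's plot_tree.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()

print("Tree depth:", clf.get_depth())                     # tree depth indicates model complexity
print("Feature importances:", clf.feature_importances_)   # which features drive the splits
```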

Splitting Criteria and Tree Construction

Key Metrics for Splitting

  • Entropy:
    • Measures the randomness or disorder in the data.
    • Formula: \( H = -\sum p_i \log_2 p_i \), where \( p_i \) is the proportion of class \( i \).
    • Low Entropy: Indicates homogeneous subsets (e.g., all samples in one class).
    • High Entropy: Indicates diverse subsets (e.g., equal distribution of classes).
  • Gini Impurity:
    • Measures the likelihood of incorrect classification for a randomly chosen element.
    • Formula: \( G = 1 - \sum p_i^2 \).
    • Low Gini Impurity: Indicates better separation of classes.
  • Information Gain:
    • Reduction in impurity after splitting.
    • Formula: \( IG = H(\text{parent}) - \sum_k \frac{N_k}{N} H(\text{child}_k) \), i.e., the parent's entropy minus the weighted average entropy of the children.
    • Goal: Maximize information gain to choose the best feature for splitting.
  • Example:
    • For Entropy:
      • Suppose two classes: 50% each → \( H = - (0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \).
      • One class dominates: \( p = 1 \), \( H = 0 \) (pure subset).

    Gini is computed similarly but avoids logarithms.
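A small sketch of these three metrics in Python; the helper names entropy, gini, and information_gain are our own for illustration, not a library API:

```python
import numpy as np

def entropy(labels):
    """H = -sum_i p_i * log2(p_i), over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """G = 1 - sum_i p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent_labels, child_label_groups, impurity=entropy):
    """IG = impurity(parent) - weighted average impurity of the child subsets."""
    n = len(parent_labels)
    weighted = sum(len(c) / n * impurity(c) for c in child_label_groups)
    return impurity(parent_labels) - weighted

# Two balanced classes: H = 1, G = 0.5.  A pure subset: H = 0, G = 0.
print(entropy(["A", "A", "B", "B"]), gini(["A", "A", "B", "B"]))
print(entropy(["A", "A", "A"]), gini(["A", "A", "A"]))
```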

Building a Decision Tree

  • Objective: Construct a tree that optimally separates data by:
    • Maximizing information gain at each split
    • Creating pure (homogeneous) subsets
    • Maintaining model simplicity
  • Construction Process:
    1. Initialize: Begin with complete dataset at root node
      • Calculate initial impurity (entropy or Gini)
      • Evaluate all features as potential first splits
    2. Split Selection:
      • For each feature, calculate impurity metrics for possible splits
      • Compute information gain for each potential split
      • Select feature with highest information gain
    3. Tree Growth:
      • Create child nodes based on best split
      • Recursively apply process to each child node
      • Continue until stopping conditions are met
  • Stopping Criteria:
    • Maximum depth reached
    • Minimum samples per node threshold
    • Pure subset achieved (all samples belong to same class)
    • No remaining valid splits
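An illustrative sketch of this recursive construction loop for categorical features, in the spirit of ID3. The function build_tree is our own illustration, reusing the entropy() and information_gain() helpers sketched above; real libraries implement more refined and much faster variants.

```python
# rows: list of dicts mapping feature name -> value; labels: list of class labels.
from collections import Counter

def build_tree(rows, labels, features, max_depth=5, min_samples=2, depth=0):
    majority = Counter(labels).most_common(1)[0][0]

    # Stopping criteria: pure subset, depth limit, too few samples, or no features left.
    if len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples or not features:
        return majority  # leaf node: predict the majority class

    # Split selection: group labels by each feature's values and score the split.
    def gain(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return information_gain(labels, groups.values())

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority  # no remaining valid split

    # Tree growth: one child per value of the chosen feature, built recursively.
    node = {"feature": best, "children": {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [f for f in features if f != best],
            max_depth, min_samples, depth + 1)
    return node
```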

Comparing Entropy and Gini Impurity

  • Similarities:
    • Both measure impurity and guide the splitting process.
    • Lower values indicate better splits.
  • Differences:
    • Entropy uses logarithms and is more sensitive to class imbalance.
    • Gini is computationally simpler and often preferred in practice.
  • Practical Note: Both metrics often lead to similar splits in real-world data.

Visualizing Tree Construction

  • Root Node: Represents the entire dataset.
  • Branches: Represent splits based on feature values.
  • Leaf Nodes: Represent predictions or outcomes.
Figure: Weather decision tree.

Practical Comparison: Entropy vs. Gini Impurity

  • Key Differences:
    • Entropy: More sensitive to class imbalance, but computationally intensive (uses logarithms).
    • Gini Impurity: Faster to compute, often preferred for efficiency with large datasets.
  • Practical Implications:
    • Choose Gini Impurity for speed, especially on large datasets.
    • Consider Entropy when class proportions matter (e.g., highly imbalanced datasets).
  • Real-World Note:
    • In practice, both metrics often lead to similar splits.
    • The choice between them depends on the dataset size and computational constraints.
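In scikit-learn the choice between the two is a one-line change through the criterion parameter; a quick sketch comparing both on an illustrative dataset (dataset and depth are our choices):

```python
# Compare the two splitting criteria via DecisionTreeClassifier's `criterion` parameter.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{criterion:>7}: mean cross-validated accuracy = {scores.mean():.3f}")
```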

Example

Example: Weather Dataset

Dataset: a single feature, Weather (Sunny, Rainy, Overcast), and a binary label, Play (Yes or No).

  • Step 1: Calculate entropy of the root node.
  • Step 2: Split on "Weather" and calculate entropy for each subset.
  • Step 3: Compute information gain for the split.
  • Step 4: Choose the best split and continue recursively.

Outcome: Demonstrate the best split based on calculated information gain.

Example: Weather Dataset

We use a small dataset to demonstrate decision tree construction.

Weather    Play
Sunny      No
Sunny      No
Overcast   Yes
Rainy      Yes
Rainy      Yes
Rainy      No
Overcast   Yes
Sunny      No
Sunny      Yes
Rainy      Yes
Sunny      Yes
Overcast   Yes
Overcast   Yes
Rainy      No

Step 1: Root Node Entropy

Class distribution at the root node:

  • Yes: 9
  • No: 5

Entropy formula:

$H = -\sum_{i=1}^c p_i \log_2(p_i)$

Root node entropy:

$p_{\text{Yes}} = \frac{9}{14}, \quad p_{\text{No}} = \frac{5}{14}$

$H_{\text{root}} = -\left( \frac{9}{14} \log_2 \left( \frac{9}{14} \right) + \frac{5}{14} \log_2 \left( \frac{5}{14} \right) \right) \approx 0.940$

Step 2: Entropy for Subsets

Subset: Sunny

Weather    Play
Sunny      No
Sunny      No
Sunny      No
Sunny      Yes
Sunny      Yes

Entropy:

$p_{\text{Yes}} = \frac{2}{5}, \quad p_{\text{No}} = \frac{3}{5}$

$H_{\text{Sunny}} = -\left( \frac{3}{5} \log_2\left(\frac{3}{5}\right) + \frac{2}{5} \log_2\left(\frac{2}{5}\right) \right) \approx 0.971$

Subset: Overcast

Weather    Play
Overcast   Yes
Overcast   Yes
Overcast   Yes
Overcast   Yes

Entropy:

$p_{\text{Yes}} = 1, \quad p_{\text{No}} = 0$

$H_{\text{Overcast}} = 0$

Subset: Rainy

Weather    Play
Rainy      Yes
Rainy      Yes
Rainy      No
Rainy      Yes
Rainy      No

Entropy:

$p_{\text{Yes}} = \frac{3}{5}, \quad p_{\text{No}} = \frac{2}{5}$

$H_{\text{Rainy}} = -\left( \frac{3}{5} \log_2\left(\frac{3}{5}\right) + \frac{2}{5} \log_2\left(\frac{2}{5}\right) \right) \approx 0.971$

Step 3: Information Gain

Information Gain formula:

$IG = H_{\text{root}} - \sum_{k=1}^n \frac{N_k}{N} H_k$

For "Weather":

$IG_{\text{Weather}} = 0.940 - \left( \frac{5}{14} \cdot 0.971 + \frac{4}{14} \cdot 0 + \frac{5}{14} \cdot 0.971 \right)$

$IG_{\text{Weather}} = 0.940 - 0.694 \approx 0.246$

Step 4: Choosing the Best Split

Information gain for "Weather" is 0.246.

"Weather" is chosen as the root split.

Next Steps:

  • Repeat the process recursively for each subset (Sunny, Overcast, Rainy).
  • Stop when a stopping criterion is met (e.g., pure subset).

Conclusion

This example demonstrates:

  • Calculating entropy for each subset.
  • Computing information gain.
  • Choosing the best split based on information gain.
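A short sketch that reproduces these numbers with the entropy and information_gain helpers sketched earlier in this session:

```python
# Weather dataset from the table above (14 rows, 9 Yes / 5 No).
weather = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play    = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
           "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

h_root = entropy(play)                                    # ~0.940
subsets = {v: [p for w, p in zip(weather, play) if w == v]
           for v in ("Sunny", "Overcast", "Rainy")}
ig = information_gain(play, subsets.values())             # ~0.246

print(f"H(root) = {h_root:.3f}")
for value, labels in subsets.items():
    print(f"H({value}) = {entropy(labels):.3f}")
print(f"IG(Weather) = {ig:.3f}")
```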

Model Optimization and Tuning

Handling Numerical and Categorical Features

  • Categorical Features: Split by unique values or grouped categories.
  • Numerical Features: Split based on threshold values (e.g., \( x > 50 \)).
  • Decision trees can handle mixed data types, making them versatile.

Example: Splitting a dataset on "Age > 30" or "Job Type" for classification.
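Note that scikit-learn's tree implementation expects numeric inputs, so categorical columns are typically encoded first. A minimal sketch on a made-up toy dataset (the column names Age, JobType, and Approved are illustrative):

```python
# Encode a categorical column, pass the numerical one through, then fit a tree.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "Age":      [22, 35, 47, 51, 29, 63],
    "JobType":  ["Student", "Engineer", "Engineer", "Manager", "Student", "Manager"],
    "Approved": ["No", "Yes", "Yes", "Yes", "No", "Yes"],
})

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["JobType"])],
    remainder="passthrough",  # "Age" passes through; thresholds like Age > 30 are learned by the tree
)

model = Pipeline([("prep", pre), ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))])
model.fit(df[["Age", "JobType"]], df["Approved"])
print(model.predict(pd.DataFrame({"Age": [40], "JobType": ["Engineer"]})))
```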

Hyperparameters in Decision Trees

  • Key Hyperparameters:
    • Max Depth: Maximum number of levels in the tree.
    • Min Samples Split: Minimum samples required to split a node.
    • Min Samples Leaf: Minimum samples required in a leaf node.
    • Max Features: Maximum number of features considered for a split.
  • Impact: Tuning these parameters affects the model's complexity and performance.
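These hyperparameters correspond directly to constructor arguments of scikit-learn's DecisionTreeClassifier; a minimal sketch (the specific values are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,            # maximum number of levels in the tree
    min_samples_split=10,   # minimum samples required to split a node
    min_samples_leaf=5,     # minimum samples required in a leaf node
    max_features="sqrt",    # number of features considered at each split
    random_state=0,
)
```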

Balancing Tree Depth and Minimum Samples

  • Impact of Depth:
    • Shallow Trees (Underfitting): Fail to capture important patterns in data, leading to low training and validation performance.
    • Deep Trees (Overfitting): Capture noise in the training data, reducing generalization to unseen data.
  • Impact of Minimum Samples:
    • Low Minimum Samples (Overfitting): Excessive splits, resulting in overly complex trees.
    • High Minimum Samples (Underfitting): Fewer splits, potentially missing critical details.
  • Optimization Strategies:
    • Start with default values and adjust incrementally based on performance.
    • Use cross-validation to test various combinations of tree depth and minimum samples.
    • Monitor training and validation metrics to ensure a good balance.
  • Key Insight: Optimal hyperparameters minimize the gap between training and validation performance.
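A sketch of how the training/validation gap can be monitored with scikit-learn's validation_curve while sweeping max_depth (the dataset is illustrative):

```python
# Sweep max_depth and compare mean training vs. cross-validated scores.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
depths = list(range(1, 11))

train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.3f}  validation={va:.3f}  gap={tr - va:.3f}")
```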

Grid Search and Cross-Validation

  • Grid Search:
    • A systematic method for hyperparameter tuning.
    • Explores all combinations of predefined hyperparameter values.
    • Example: Testing combinations of maximum depth and minimum samples per leaf.
  • Cross-Validation:
    • Splits the dataset into training and validation subsets multiple times.
    • Averages performance across splits to reduce the risk of overfitting to a single validation set.
    • K-Fold Cross-Validation: Divides data into K subsets; each subset is used as a validation set once.
  • Combined Approach:
    • Grid search evaluates different hyperparameter combinations using cross-validation.
    • Ensures robust selection of hyperparameters based on consistent performance.
  • Tools:
    • Libraries like Scikit-learn offer built-in support for grid search with cross-validation.
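A minimal sketch of grid search combined with 5-fold cross-validation using scikit-learn's GridSearchCV (the parameter grid and dataset are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_leaf": [1, 5, 10],
}

# Every combination in the grid is evaluated with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```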

Balancing Complexity and Performance

  • Trade-Off:
    • Simple models are easier to interpret but may underfit the data.
    • Complex models may capture more patterns but risk overfitting.
  • Techniques for Balance:
    • Pruning: Reduces unnecessary complexity by trimming branches.
    • Regularization: Sets limits on parameters such as maximum depth or minimum samples per leaf.
    • Cross-Validation: Ensures performance generalizes to unseen data.
  • Evaluating Balance:
    • Compare training and validation performance:
      • Overfitting: High training accuracy, low validation accuracy.
      • Underfitting: Low accuracy on both training and validation data.
  • Key Insight:
    • Optimal complexity minimizes the gap between training and validation performance.
    • Combining hyperparameter tuning with validation data ensures better trade-offs.

Overfitting and Pruning

Overfitting in Decision Trees

  • Definition: A model fits the training data too well, capturing noise and losing generalization.
  • Indicators:
    • Perfect accuracy on training data but poor performance on validation/test data.
    • Excessively deep trees with many splits.
  • Solutions:
    • Tree pruning (pre-pruning or post-pruning).
    • Limiting the depth of the tree.
    • Setting a minimum number of samples per leaf.

Tree Pruning Strategies

  • Pre-Pruning: Stop splitting early based on criteria like:
    • Minimum number of samples in a node.
    • Minimum information gain for a split.
  • Post-Pruning: Remove branches from a fully grown tree based on:
    • Validation set performance.
    • Simplifying the tree structure.
  • Objective: Improve generalization and prevent overfitting.
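One concrete form of post-pruning available in scikit-learn is minimal cost-complexity pruning. A sketch that grows a full tree, extracts candidate ccp_alpha values, and selects one on a held-out validation set (dataset and split are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"Best ccp_alpha = {best_alpha:.5f}, validation accuracy = {best_score:.3f}")
```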

Visualizing Overfitting and Underfitting

  • Overfitting: Deep tree capturing noise, complex decision boundaries.
  • Underfitting: Shallow tree with simplistic decision boundaries.
  • Well-tuned Tree: Balanced tree with generalizable patterns.
Figure: Decision boundaries for overfitted, underfitted, and well-tuned decision trees.

Validating Pruning Results

  • Purpose of Validation:
    • Ensure the pruned tree generalizes well to unseen data.
    • Prevent overfitting caused by unnecessary complexity in the tree.
  • Steps to Validate Pruning:
    1. Split Data:
      • Use a validation set distinct from the training set.
    2. Evaluate Pre-Pruned Tree:
      • Measure performance on the validation set (e.g., accuracy, MSE).
    3. Apply Pruning:
      • Trim branches or set stopping criteria (e.g., maximum depth).
    4. Re-Evaluate Pruned Tree:
      • Compare performance before and after pruning using the validation set.
    5. Select the Best Model:
      • Choose the pruned tree that balances simplicity and validation performance.
  • Metrics for Validation:
    • Classification: Accuracy, Precision, Recall, F1-score.
    • Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE).
  • Cross-Validation:
    • Use cross-validation to validate pruning on multiple splits of the data.
    • Ensures pruning decisions are robust across different subsets.
  • Visualization:
    • Compare tree structures before and after pruning to highlight simplified branches.
    • Use diagrams to demonstrate reduced complexity.
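A sketch of this workflow: evaluate an unpruned tree, apply pruning constraints (here via max_depth and min_samples_leaf), and compare both on the same validation split (dataset and thresholds are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                                random_state=0).fit(X_train, y_train)

# Compare complexity and validation performance before and after pruning.
for name, tree in [("full tree", full), ("pruned tree", pruned)]:
    acc = accuracy_score(y_val, tree.predict(X_val))
    print(f"{name:>11}: depth={tree.get_depth():2d}, leaves={tree.get_n_leaves():3d}, "
          f"validation accuracy={acc:.3f}")
```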

Conclusion

Key Takeaways

  • Decision trees are a versatile and interpretable model for both classification and regression tasks.
  • Splitting Criteria: Metrics like entropy, information gain, and Gini impurity guide tree construction by optimizing splits.
  • Tree Structure: Nodes and branches capture decision rules, while leaf nodes represent outcomes.
  • Overfitting and Pruning: Techniques like depth control and pruning are essential to balance complexity and generalization.
  • Hyperparameter Tuning: Parameters such as maximum depth and minimum samples significantly impact model performance and interpretability.
  • Evaluation: Tailor metrics (e.g., accuracy for classification, MSE for regression) to the task for robust model assessment.

When to Use Decision Trees

  • Advantages:
    • Highly interpretable, making them suitable for applications requiring transparency (e.g., healthcare).
    • Handles both numerical and categorical features without extensive preprocessing.
    • Automatic feature selection highlights important variables for decision-making.
  • Limitations:
    • Sensitivity to small changes in data can lead to instability in the tree structure.
    • Prone to overfitting without proper pruning or depth control.
    • Non-smooth, axis-aligned decision boundaries may struggle with complex patterns in data.
  • Best Use Cases:
    • Small-to-medium-sized datasets where interpretability and transparency are key.
    • Applications requiring clear decision rules (e.g., compliance, customer segmentation).