 # Lecture 12: Tree-Based Algorithms¶

### Applied Machine Learning¶

Volodymyr Kuleshov
Cornell Tech

# Part 1: Decision Trees¶

We are now going to see a different way of defining machine learning models called decision trees.

# Review: Components of A Supervised Machine Learning Problem¶

At a high level, a supervised machine learning problem has the following structure:

$$\underbrace{\text{Training Dataset}}_\text{Attributes + Labels} + \underbrace{\text{Learning Algorithm}}_\text{Model Class + Objective + Optimizer } \to \text{Predictive Model}$$

# The UCI Diabetes Dataset¶

To explain what a decision tree is, we are going to use the UCI diabetes dataset that we have been working with earlier.

We can also look at the data directly.
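One way to inspect the data is via the copy of the dataset bundled with scikit-learn (a minimal sketch; the course may load it differently):

```python
# Load the UCI diabetes dataset bundled with scikit-learn.
from sklearn.datasets import load_diabetes

diabetes = load_diabetes(as_frame=True)
X, y = diabetes.data, diabetes.target

# Peek at the first few rows: features are standardized patient
# attributes (age, sex, bmi, blood pressure, serum measurements).
print(X.head())
# The target is a quantitative measure of disease progression.
print(y.head())
```

The `as_frame=True` option returns the features as a pandas DataFrame, which makes the named columns (`bmi`, `bp`, etc.) easy to refer to later.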

# Decision Trees: Intuition¶

Decision trees are machine learning models that mimic how a human would approach this problem.

1. We start by picking a feature (e.g., age).
2. Then we branch on that feature based on its value (e.g., age > 65?).
3. We select and branch on one or more additional features (e.g., is the patient male?).
4. Finally, we return an output that depends on all the features we've seen (e.g., a male patient over 65).
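The steps above can be sketched as nested if/else branches. This is purely illustrative: the age threshold comes from the example in the text, while the risk labels are hypothetical placeholders.

```python
# The step-by-step intuition above, written as nested if/else branches.
# The age-65 split mirrors the example; the returned labels are made up.
def human_style_prediction(patient):
    if patient['age'] > 65:            # steps 1-2: pick a feature, branch on it
        if patient['sex'] == 'male':   # step 3: branch on another feature
            return 'high risk'         # step 4: output for "a man over 65"
        return 'medium risk'
    return 'low risk'

print(human_style_prediction({'age': 70, 'sex': 'male'}))  # 'high risk'
```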

# Decision Trees: Example¶

Let's first see an example on the diabetes dataset.

We will train a decision tree using its implementation in sklearn.
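A minimal version of this might look as follows (a sketch, assuming the sklearn copy of the dataset; the depth limit is chosen only to keep the tree readable):

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor, export_text

diabetes = load_diabetes(as_frame=True)
X, y = diabetes.data, diabetes.target

# Fit a shallow regression tree; max_depth=2 keeps it interpretable.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# export_text prints the learned splits as nested if/else rules.
print(export_text(tree, feature_names=list(X.columns)))
```

The printed output shows the thresholds the tree learned, one rule per internal node, which is exactly the structure formalized in the next sections.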

# Decision Rules¶

Let's now define a decision tree a bit more formally. The first important concept is that of a rule.

• A decision rule $r : \mathcal{X} \to \{\text{true}, \text{false}\}$ is a partition of the feature space into two disjoint regions, e.g.: $$r(x) = \begin{cases}\text{true} & \text{if } x_\text{bmi} \leq 0.009 \\ \text{false} & \text{if } x_\text{bmi} > 0.009 \end{cases}$$
• Normally, a rule applies to only one feature or attribute $x_j$ of $x$.
• If $x_j$ is continuous, the rule normally splits the values of $x_j$ into two disjoint intervals $(-\infty, c]$ and $(c, \infty)$.
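A decision rule of this form is easy to write as a small function. Here is a sketch using the `bmi` threshold from the example above; inputs are represented as dictionaries mapping feature names to values (an assumption made for illustration):

```python
# A decision rule r : X -> {True, False} that thresholds a single
# feature, mirroring the bmi rule above (threshold 0.009 from the text).
def make_rule(feature, threshold):
    """Return a rule that is True when x[feature] <= threshold."""
    def r(x):
        return x[feature] <= threshold
    return r

r = make_rule('bmi', 0.009)
print(r({'bmi': -0.002}))  # True: bmi is below the threshold
print(r({'bmi': 0.05}))    # False: bmi is above the threshold
```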

# Decision Regions¶

The next important concept is that of a decision region.

• A decision region $R\subseteq \mathcal{X}$ is a subset of the feature space defined by the application of a set of rules $r_1, r_2, \ldots, r_m$ and their values $v_1, v_2, \ldots, v_m \in \{\text{true}, \text{false}\}$, i.e.: $$R = \{x \in \mathcal{X} \mid r_1(x) = v_1 \text{ and } \ldots \text{ and } r_m(x) = v_m \}$$
• For example, a decision region in the diabetes problem is: $$R = \{x \in \mathcal{X} \mid x_\text{bmi} \leq 0.009 \text{ and } x_\text{bp} > 0.004 \}$$

# Decision Trees: Definition¶

A decision tree is a model $f : \mathcal{X} \to \mathcal{Y}$ of the form $$f(x) = \sum_{R \in \mathcal{R}} y_R \mathbb{I}\{x \in R\}.$$

• The $\mathbb{I}\{\cdot\}$ is an indicator function (equal to one if its argument is true and zero otherwise), and the values $y_R \in \mathcal{Y}$ are the outputs for that region.
• The set $\mathcal{R}$ is a collection of decision regions. They are obtained by recursive binary splitting.
• The rules defining the regions $\mathcal{R}$ can be organized into a tree, with one rule per internal node and regions being the leaves.
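The definition $f(x) = \sum_{R} y_R \mathbb{I}\{x \in R\}$ can be implemented directly. The regions below are hand-written from the bmi/bp splits in the examples, and the output values $y_R$ are made-up placeholders; since the regions partition the feature space, exactly one indicator fires per input:

```python
# A decision tree as f(x) = sum over regions of y_R * indicator(x in R).
# Regions from the bmi/bp examples above; the y_R values are illustrative.
def predict(x, regions):
    """regions: list of (membership_fn, y_R) pairs covering the space."""
    return sum(y_R * (1 if in_R(x) else 0) for in_R, y_R in regions)

regions = [
    (lambda x: x['bmi'] <= 0.009 and x['bp'] <= 0.004, 100.0),
    (lambda x: x['bmi'] <= 0.009 and x['bp'] > 0.004, 150.0),
    (lambda x: x['bmi'] > 0.009, 200.0),
]
print(predict({'bmi': 0.0, 'bp': 0.01}, regions))  # 150.0
```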

We can also illustrate decision trees via this figure from Hastie et al.