# Lecture 19: Dimensionality Reduction¶

### Applied Machine Learning¶

Volodymyr Kuleshov
Cornell Tech

# Part 1: What is Dimensionality Reduction?¶

Dimensionality reduction is another important unsupervised learning problem with many applications.

We will start by defining the problem and providing some examples.

# Review: Unsupervised Learning¶

We have a dataset without labels. Our goal is to learn something interesting about the structure of the data:

• Clusters hidden in the dataset.
• Outliers: particularly unusual and/or interesting datapoints.
• Useful signal hidden in noise, e.g. human speech over a noisy phone.

# Dimensionality Reduction: Examples¶

Consider a dataset $\mathcal{D} = \{x^{(i)} \mid i = 1,2,...,n\}$ of motorcycles, characterized by a set of attributes.

• Attributes include size, color, maximum speed, etc.
• Suppose that two attributes are closely correlated: e.g., $x^{(i)}_j$ is the speed in mph and $x^{(i)}_k$ is the speed in km/h.
• If the data has $d$ attributes, its real dimensionality is $d-1$!

We would like to automatically identify the right data dimensionality.

Another example comes from the Iris flower dataset.

Consider the petal length and the petal width of the flowers: they are closely correlated.

This suggests that we may reduce the dimensionality of the problem to one dimension: petal size.
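As a quick illustration, here is a minimal sketch (using scikit-learn's built-in copy of the Iris data) that checks the correlation between the two petal attributes and collapses them into a single "petal size" feature:

```python
# A minimal sketch: the petal length/width attributes of Iris are strongly
# correlated, so they can be summarized by a single "petal size" dimension.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data            # columns: sepal length/width, petal length/width
petal = X[:, 2:4]               # petal length (cm), petal width (cm)

# The two petal attributes are strongly correlated (roughly 0.96).
print(np.corrcoef(petal[:, 0], petal[:, 1])[0, 1])

# Project the two correlated attributes onto one dimension: "petal size".
petal_size = PCA(n_components=1).fit_transform(petal)
print(petal_size.shape)         # (150, 1)
```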

# Dimensionality Reduction¶

More generally, a dimensionality reduction algorithm learns from data an unsupervised model $$f_\theta : \mathcal{X} \to \mathcal{Z},$$ where $\mathcal{Z}$ is a low-dimensional representation of the data.

For each input $x^{(i)}$, $f_\theta$ computes a low-dimensional representation $z^{(i)}$.

# Linear Dimensionality Reduction¶

Suppose $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{Z} = \mathbb{R}^p$ for some $p < d$. The transformation $$f_\theta : \mathcal{X} \to \mathcal{Z}$$ is a linear function with parameters $\theta = W \in \mathbb{R}^{d \times p}$ that is defined by $$z = f_\theta(x) = W^\top x.$$ The latent dimension $z$ is obtained from $x$ via a matrix $W$.
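As a minimal sketch, this projection can be written directly with NumPy; the dimensions $d = 4$ and $p = 2$ below are arbitrary choices for illustration:

```python
# A minimal sketch of a linear dimensionality reduction map z = W^T x.
import numpy as np

d, p = 4, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(d, p))     # parameters theta = W, shape (d, p)
x = rng.normal(size=d)          # a single input x in R^d

z = W.T @ x                     # low-dimensional representation, shape (p,)
print(z.shape)                  # (2,)

# Applied to a whole dataset X of shape (n, d), the same map is X @ W.
X = rng.normal(size=(10, d))
Z = X @ W                       # shape (n, p)
print(Z.shape)                  # (10, 2)
```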

# Example: Discovering Structure in Digits¶

Dimensionality reduction can reveal interesting structure in digits without using labels.
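One way to produce this kind of picture is the following minimal sketch, which projects scikit-learn's 64-dimensional digits data down to two dimensions with PCA (the labels are used only to color the plot, never to fit the model):

```python
# A minimal sketch: project the 64-dimensional digits data to 2D with PCA.
# The 2D scatter typically shows clusters corresponding to different digits.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

digits = load_digits()
Z = PCA(n_components=2).fit_transform(digits.data)   # (1797, 64) -> (1797, 2)

plt.scatter(Z[:, 0], Z[:, 1], c=digits.target, cmap="tab10", s=10)
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.colorbar(label="digit")
plt.show()
```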

# Example: DNA Analysis¶

Even linear dimensionality reduction is powerful. Here, it uncovers the geography of European countries from DNA data alone.

# Other Kinds of Dimensionality Reduction¶

We will focus on linear dimensionality reduction this lecture, but there exist many other methods:

• Non-linear methods based on kernels (e.g., Kernel PCA)
• Non-linear methods based on deep learning (e.g., variational autoencoders)
• Non-linear methods based on maximizing signal independence (independent component analysis)
• Probabilistic versions of the above

See the scikit-learn guide for more!
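As a small taste of one of these alternatives, here is a sketch using scikit-learn's KernelPCA on data that no linear projection can untangle; the RBF kernel and its gamma parameter are illustrative choices, not tuned values:

```python
# A minimal sketch of a non-linear method: Kernel PCA with an RBF kernel.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles are not separable by any linear projection.
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
Z = kpca.fit_transform(X)       # non-linear 2D embedding of the data
print(Z.shape)                  # (400, 2)
```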

# Part 2: Principal Component Analysis¶

We will now describe principal component analysis (PCA), one of the most widely used algorithms for dimensionality reduction.

# Components of an Unsupervised Learning Problem¶

At a high level, an unsupervised machine learning problem has the following structure:

$$\underbrace{\text{Dataset}}_\text{Attributes} + \underbrace{\text{Learning Algorithm}}_\text{Model Class + Objective + Optimizer } \to \text{Unsupervised Model}$$

The dataset $\mathcal{D} = \{x^{(i)} \mid i = 1,2,...,n\}$ does not include any labels.

# Review: Linear Dimensionality Reduction¶

Suppose $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{Z} = \mathbb{R}^p$ for some $p < d$. The transformation $$f_\theta : \mathcal{X} \to \mathcal{Z}$$ is a linear function with parameters $\theta = W \in \mathbb{R}^{d \times p}$ that is defined by $$z = f_\theta(x) = W^\top x.$$ The latent dimension $z$ is obtained from $x$ via a matrix $W$.

# Principal Components Model¶

Principal component analysis (PCA) assumes that

• Datapoints $x \in \mathbb{R}^{d}$ live close to a low-dimensional subspace $\mathcal{Z} = \mathbb{R}^p$ of dimension $p<d$
• The subspace $\mathcal{Z} = \mathbb{R}^p$ is spanned by a set of orthonormal vectors $w^{(1)}, w^{(2)}, \ldots, w^{(p)}$
• The data $x$ are approximated by a linear combination $\tilde x$ of the $w^{(k)}$ $$x \approx \tilde x = \sum_{k=1}^p w^{(k)} z_k = W z$$ for some $z \in \mathcal{Z}$ that are the coordinates of $\tilde x$ in the basis $W$.

In this example, the data lives in a lower-dimensional 2D plane within a 3D space.
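The following minimal sketch illustrates this model on synthetic 3D data lying near a 2D plane; note that scikit-learn's PCA centers the data first, so its reconstruction is $\tilde x = W z + \mu$ rather than $W z$ exactly:

```python
# A minimal sketch of the principal components model: approximate each x by
# x_tilde = W z, where the columns of W are orthonormal and z = W^T (x - mu).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 3D data lying close to a 2D plane (purely illustrative).
Z_true = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 3))
X = Z_true @ A + 0.05 * rng.normal(size=(200, 3))

pca = PCA(n_components=2).fit(X)
W = pca.components_.T            # (3, 2), orthonormal columns w^(1), w^(2)
mu = pca.mean_

z = (X - mu) @ W                 # coordinates in the basis W
X_tilde = z @ W.T + mu           # reconstruction, approximately equal to X

print(np.allclose(W.T @ W, np.eye(2), atol=1e-8))    # orthonormality of W
print(np.mean((X - X_tilde) ** 2))                    # small reconstruction error
```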