{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i_f5u2x9nn6I", "slideshow": { "slide_type": "slide" } }, "source": [ "# **Lecture 12: Support Vector Machines**\n", "In this lecture, we are going to cover support vector machines (SVMs), one the most successful classification algorithms in machine learning." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 12.1. Classification Margins\n", "\n", "We start the presentation of SVMs by defining the classification *margin*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 12.1.1. Review and Motivation \n", "\n", "### 12.1.1.1. Review of Binary Classification\n", "\n", "Consider a training dataset $\\mathcal{D} = \\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \\ldots, (x^{(n)}, y^{(n)})\\}$.\n", "Recall that we distinguish between two types of supervised learning problems depending on the targets $y^{(i)}$. \n", "\n", "1. __Regression__: The target variable $y \\in \\mathcal{Y}$ is continuous: $\\mathcal{Y} \\subseteq \\mathbb{R}$.\n", "\n", "2. __Binary Classification__: The target variable $y$ is discrete and takes on one of $K=2$ possible values.\n", "\n", "In this lecture, we focus on binary classification and assume $\\mathcal{Y} = \\{-1, +1\\}$.\n", "\n", "#### Linear Model Family\n", "\n", "In this lecture, we will work with linear models of the form:\n", "\n", "$$\n", "\\begin{align*}\n", "f_\\theta(x) & = \\theta_0 + \\theta_1 \\cdot x_1 + \\theta_2 \\cdot x_2 + ... + \\theta_d \\cdot x_d\n", "\\end{align*}\n", "$$\n", "\n", "where $x \\in \\mathbb{R}^d$ is a vector of features and $y \\in \\{-1, 1\\}$ is the target. 
The $\\theta_j$ are the *parameters* of the model.\n", "We can represent the model in a vectorized form as\n", "\n", "$$\n", "\\begin{align*}\n", "f_\\theta(x) = \\theta^\\top x + \\theta_0.\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 12.1.1.2. Binary Classification Problem and the Iris Dataset\n", "\n", "In this lecture, we will again use the Iris flower dataset. We will transform the original three-class problem into a binary classification task by merging the two non-Setosa flower species into one class.\n", "We use $\\mathcal{Y} = \\{-1, 1\\}$ as the label space.\n", "\n", "A sample of the resulting dataset is shown below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "
\n", " | sepal length (cm) | \n", "sepal width (cm) | \n", "petal length (cm) | \n", "petal width (cm) | \n", "target | \n", "
---|---|---|---|---|---|
0 | \n", "5.1 | \n", "3.5 | \n", "1.4 | \n", "0.2 | \n", "-1 | \n", "
4 | \n", "5.0 | \n", "3.6 | \n", "1.4 | \n", "0.2 | \n", "-1 | \n", "
8 | \n", "4.4 | \n", "2.9 | \n", "1.4 | \n", "0.2 | \n", "-1 | \n", "
12 | \n", "4.8 | \n", "3.0 | \n", "1.4 | \n", "0.1 | \n", "-1 | \n", "
16 | \n", "5.4 | \n", "3.9 | \n", "1.3 | \n", "0.4 | \n", "-1 | \n", "
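The construction described above can be sketched as follows, assuming scikit-learn's bundled copy of the Iris data (`datasets.load_iris`):

```python
import numpy as np
from sklearn import datasets

# A minimal sketch of building the binary Iris dataset: keep all four
# features, and merge the two non-Setosa classes into a single +1 class.
iris = datasets.load_iris()
X = iris.data                          # 150 rows, 4 features each
# Original labels: 0 = Setosa, 1 = Versicolor, 2 = Virginica.
y = np.where(iris.target == 0, -1, 1)  # Setosa -> -1, all others -> +1

print(X.shape)                         # prints: (150, 4)
```

The rows shown in the table above (all Setosa flowers) receive the label $-1$ under this relabeling.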