Suppose you trained an image classifier with 80% accuracy. What's next?

- Add more data?
- Train the algorithm for longer?
- Use a bigger model?
- Add regularization?
- Add new features?

We will next learn how to prioritize these decisions when applying ML.

Overfitting is one of the most common failure modes of machine learning.

- A very expressive model (e.g., a high-degree polynomial) fits the training dataset perfectly.
- However, the model makes wildly incorrect predictions outside this dataset and fails to generalize.

Models that overfit are said to be **high variance**.
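The high-degree-polynomial example can be sketched in a few lines of numpy (the dataset and degree below are illustrative choices, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 10 noisy samples of a smooth function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)

# Noiseless held-out points from the same function.
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test)

# A degree-9 polynomial has enough parameters to pass through all 10 points.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# Near-zero training error but much larger test error: high variance.
```

The model memorizes the noise in the 10 training points, so its curve oscillates wildly between them.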

Underfitting is another common problem in machine learning.

- The model is too simple to fit the data well (e.g., using linear regression when the data follows a high-degree polynomial).
- As a result, the model is not accurate on training data and is not accurate on new data.

Because the model cannot fit the data, we say it's **high bias**.
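A minimal numpy sketch of high bias, using an illustrative nonlinear dataset: a straight line simply cannot capture the shape, so training error stays large no matter how much data we collect.

```python
import numpy as np

# Data generated by a clearly nonlinear function (no noise at all).
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)

# Best-fitting straight line, found by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
train_err = np.mean((slope * x + intercept - y) ** 2)

# Even on its own training data, the linear model has substantial error:
# the model class is too simple, i.e., high bias.
```

Contrast this with the overfitting case: there training error was near zero; here it is large even though the data is noiseless.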

Learning curves show performance as a function of **training set size**.

Learning curves are defined for **fixed hyperparameters**. Observe that dev set error decreases as we give the model more data.

It is often very useful to have a target upper bound on performance (e.g., human accuracy); it can also be visualized on the learning curve.

Once the dev error has plateaued, we know that adding more data will not be useful.

We can further augment this plot with training set performance. Training error typically grows with training set size (a larger dataset is harder to fit exactly) and converges toward the dev error from below.
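The learning-curve procedure can be sketched with numpy on a synthetic task (the task, noise level, and training sizes below are assumptions for illustration): fix the model and hyperparameters, refit on increasing amounts of data, and record train and dev error at each size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression task: y = 3x + Gaussian noise (noise variance 0.25).
def make_data(n):
    x = rng.uniform(0, 1, n)
    return x, 3 * x + 0.5 * rng.standard_normal(n)

# Fixed dev set; the model (degree-1 least squares) is also held fixed.
x_dev, y_dev = make_data(200)

sizes = [5, 20, 80, 320]
train_errs, dev_errs = [], []
for n in sizes:
    x_tr, y_tr = make_data(n)
    w, b = np.polyfit(x_tr, y_tr, deg=1)
    train_errs.append(np.mean((w * x_tr + b - y_tr) ** 2))
    dev_errs.append(np.mean((w * x_dev + b - y_dev) ** 2))

# As n grows, both curves approach the noise floor (~0.25 here):
# dev error falls toward it, training error rises toward it.
```

Plotting `train_errs` and `dev_errs` against `sizes` gives the learning curve; once the two curves meet near the noise floor, more data stops helping.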