Underfitting vs. Overfitting

We can tell that a model is performing poorly by looking at its prediction errors on the training set and the evaluation set. The model may be too simple to capture the underlying pattern (i.e., underfitting the training data) or too complex, so that it fits the noise in the training set (i.e., overfitting the training data). In this post, we will look at why underfitting and overfitting happen and how to deal with each.

These figures are from the "Underfitting vs. Overfitting" example in the scikit-learn documentation. They show how linear regression with polynomial features fits samples generated from a target function (a cosine function in this case). The blue line is the trained polynomial regression model with degree 1, 4, or 15 (left, middle, and right figure, respectively). The green line is the cosine function that generated the samples. Ideally, we want the model to approximate the target function as closely as possible, as in the middle figure.
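The setup in the figures can be reproduced approximately with a short script. This is a minimal sketch modeled on that scikit-learn example; the sample count, noise level, random seed, and the use of 10-fold cross-validation are assumptions, not values stated in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def true_fun(x):
    """Target function that generates the samples (a cosine)."""
    return np.cos(1.5 * np.pi * x)


# Assumed data settings: 30 noisy samples on [0, 1].
np.random.seed(0)
n_samples = 30
x = np.sort(np.random.rand(n_samples))
y = true_fun(x) + np.random.randn(n_samples) * 0.1
X = x[:, np.newaxis]  # scikit-learn expects a 2-D feature matrix

train_mse = {}
cv_mse = {}
for degree in (1, 4, 15):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X, y)
    train_mse[degree] = mean_squared_error(y, model.predict(X))
    # cross_val_score returns the negated MSE, so flip the sign.
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=10)
    cv_mse[degree] = -scores.mean()
    print(f"degree {degree:2d}: train MSE = {train_mse[degree]:.3g}, "
          f"CV MSE = {cv_mse[degree]:.3g}")
```

Under these assumptions, the training error should keep falling as the degree grows, while the cross-validated error is typically lowest for the middle model.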

Underfitting

The left figure shows a model that underfits the training data: a straight line cannot follow the samples. It performs poorly on the training data, so it will also perform poorly on the evaluation and test sets.

An underfitting model has high bias and low variance. Its predictions are far from the target values (i.e., high error), but they change little when the model is trained on a different dataset. So, how can we overcome underfitting?

Adding features: One cause of underfitting is that the model is too simple. Adding features makes it more expressive.
Adding polynomial features: Often, new features are not easy to come by. A simple alternative is to add powers of each existing feature as new features, that is, to increase the degree of the polynomial.
Decreasing the amount of regularization: A model that is regularized too strongly will underfit. In this case, try decreasing the regularization hyperparameter $latex \lambda$. You can find more details about regularization in this post.
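The last two remedies can be sketched with scikit-learn as follows. This is only an illustration under assumed settings: the data is a hypothetical noisy cosine like the one in the figures, ridge regression stands in for "a regularized model", and its `alpha` parameter plays the role of $latex \lambda$. The specific degrees and `alpha` values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: 30 noisy samples of a cosine on [0, 1].
rng = np.random.default_rng(0)
x = rng.random((30, 1))
y = np.cos(1.5 * np.pi * x[:, 0]) + rng.normal(scale=0.1, size=30)


def train_mse(degree, alpha):
    """Fit ridge regression on polynomial features; return the training MSE."""
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        Ridge(alpha=alpha),  # alpha plays the role of lambda
    )
    model.fit(x, y)
    return mean_squared_error(y, model.predict(x))


underfit_simple = train_mse(degree=1, alpha=1e-3)        # too few features
underfit_regularized = train_mse(degree=4, alpha=100.0)  # too much regularization
better_fit = train_mse(degree=4, alpha=1e-3)             # both remedies applied

print(f"degree 1, alpha 0.001: {underfit_simple:.4f}")
print(f"degree 4, alpha 100:   {underfit_regularized:.4f}")
print(f"degree 4, alpha 0.001: {better_fit:.4f}")
```

Training error is the quantity to watch here because underfitting already shows up on the training set; whether the added flexibility also helps on unseen data should still be checked on the evaluation set.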

Overfitting

The right figure shows a model that overfits the training data: the line passes through almost every training sample, following the noise rather than the target function. It has low error on the training data, but it will perform poorly on the evaluation and test sets.

An overfitting model has low bias and high variance. Its accuracy on the training data is high (i.e., low error), but its predictions change a lot when it is trained on a different dataset. The following remedies are generally suggested:

Getting more training examples: In general, adding more training data reduces variance.
Feature selection: Try to reduce the model complexity by decreasing the number of features through feature selection.
Increasing the amount of regularization: Try increasing the regularization hyperparameter $latex \lambda$ in your regularization term. You can find more details about regularization in this post.
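To make "variance" concrete, one option is to train the same model on many freshly drawn datasets and measure how much its predictions move. The sketch below does this for a degree-15 polynomial and covers two of the remedies above (more data and more regularization). Everything specific in it is an assumption for illustration: the noisy cosine data, the helper `prediction_variance`, the sample sizes, and the use of ridge regression with `alpha` in the role of $latex \lambda$.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def true_fun(x):
    return np.cos(1.5 * np.pi * x)


def prediction_variance(regressor, n_samples, degree=15, n_trials=100,
                        noise=0.1, seed=0):
    """Average variance of predictions across models trained on fresh datasets."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 50)[:, np.newaxis]
    preds = np.empty((n_trials, x_grid.shape[0]))
    for t in range(n_trials):
        # Draw a new training set for every trial.
        x = rng.random((n_samples, 1))
        y = true_fun(x[:, 0]) + rng.normal(scale=noise, size=n_samples)
        model = make_pipeline(
            PolynomialFeatures(degree=degree, include_bias=False),
            clone(regressor),
        )
        preds[t] = model.fit(x, y).predict(x_grid)
    # Variance over trials at each grid point, averaged over the grid.
    return preds.var(axis=0).mean()


baseline = prediction_variance(LinearRegression(), n_samples=30)
more_data = prediction_variance(LinearRegression(), n_samples=300)
regularized = prediction_variance(Ridge(alpha=1e-3), n_samples=30)

print(f"degree 15, 30 samples:            {baseline:.3g}")
print(f"degree 15, 300 samples:           {more_data:.3g}")
print(f"degree 15, 30 samples, with ridge: {regularized:.3g}")
```

Both remedies should lower the measured variance relative to the baseline. Regularization does so at the cost of some added bias, so the value of `alpha` is best tuned on the evaluation set.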
