L1 and L2 as Cost Functions
In machine learning, L1 and L2 techniques are widely used both as cost functions and for regularization. It is worth knowing the key differences between L1 and L2 in order to build a better model. These differences are also frequently asked about in job interviews.
Cost Functions
- L1-norm cost function
- L1-norm cost function is the sum of the absolute differences between the estimated values and the corresponding target values.
- L2-norm cost function
- L2-norm cost function is the sum of the squared differences between the estimated values and the corresponding target values.
- Comparisons between L1 and L2
- Robustness: L1 > L2
- Robustness is resistance to outliers: a more robust model is less sensitive to them. The L1-norm is more robust than the L2-norm because the L2-norm squares the errors, so the large errors at outliers dominate the cost. A model trained with the L2-norm is therefore pulled toward the outliers more than a model trained with the L1-norm.
- Stability: L1 < L2
- The L2-norm is more stable than the L1-norm under a small adjustment of a single data point. That is, when one value in the data is changed by a small amount, a regression line trained with L1 can change more than one trained with L2.
- The number of solutions: L1 (multiple solutions) > L2 (one solution)
- L2 is Euclidean distance (the green line), which means that there is only one path between two points: the unique shortest path. L1 can have many solutions (e.g., the red, blue and yellow lines) that are all shortest paths between the two points. The same holds in higher dimensions: minimizing the L2 cost typically yields a single solution, while the L1 cost can have many solutions with the same minimum loss.
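To make the comparison concrete, here is a minimal sketch in plain Python. It assumes the simplest possible model, one that predicts a single constant for every sample, because in that case the L2 cost is minimized by the mean and the L1 cost by the median. The data values and the helper names `l1_cost` and `l2_cost` are made up for illustration.

```python
from statistics import mean, median


def l1_cost(predictions, targets):
    """Sum of the absolute differences between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(predictions, targets))


def l2_cost(predictions, targets):
    """Sum of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


# Hypothetical targets; in the second list the last value is an outlier.
clean = [1.0, 2.0, 3.0, 4.0]
with_outlier = [1.0, 2.0, 3.0, 100.0]

# Robustness: compare how far the best constant prediction moves
# when the outlier is introduced.
l2_shift = abs(mean(with_outlier) - mean(clean))      # 24.0
l1_shift = abs(median(with_outlier) - median(clean))  # 0.0

# Number of solutions: evaluate both costs on the clean data
# for a few candidate constants.
candidates = [2.0, 2.5, 3.0]
l1_costs = [l1_cost([c] * len(clean), clean) for c in candidates]
l2_costs = [l2_cost([c] * len(clean), clean) for c in candidates]

print("L2 fit moved by", l2_shift, "and L1 fit moved by", l1_shift)
print("L1 costs:", l1_costs)  # identical for all three candidates
print("L2 costs:", l2_costs)  # smallest only at 2.5
```

In this toy case the outlier drags the L2 fit far away while the L1 fit stays put, and every constant between 2 and 3 gives the same minimum L1 cost while the L2 cost has a single minimizer. It is only an illustration under these assumptions, not a general proof.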
L1 or L2 cost function for your model?
So, how do you decide which one is right for your model? Generally, the L2-norm is preferred in neural networks because it is differentiable everywhere, which suits backpropagation, while the L1-norm is not differentiable at zero. For most general machine learning algorithms, the L2-norm doesn’t perform well when there are outliers in the dataset. When outliers affect the cost significantly, the L1-norm is preferred, or the L2-norm can work after the outliers are removed.
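The differentiability point can be sketched for a single prediction. The function names below are hypothetical, and returning 0 at the kink of the L1 term is just one common convention (any value in [-1, 1] is a valid subgradient there), not something the L1-norm itself prescribes.

```python
def l2_grad(prediction, target):
    """Derivative of (prediction - target) ** 2 with respect to prediction."""
    return 2.0 * (prediction - target)


def l1_subgrad(prediction, target):
    """A subgradient of abs(prediction - target) with respect to prediction."""
    diff = prediction - target
    if diff > 0:
        return 1.0
    if diff < 0:
        return -1.0
    # The derivative is undefined at diff == 0; 0.0 is an assumed convention.
    return 0.0


# The L2 gradient grows with the size of the error, so an outlier
# produces a much larger update than a typical point.
small_error_l2 = l2_grad(3.0, 1.0)
outlier_error_l2 = l2_grad(100.0, 1.0)

# The L1 subgradient has the same magnitude for both.
small_error_l1 = l1_subgrad(3.0, 1.0)
outlier_error_l1 = l1_subgrad(100.0, 1.0)
```

This also shows the robustness trade-off from the gradient's side: with L2 the size of the update scales with the error, while with L1 every non-zero error pushes with the same strength.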