What is regularization?
Regularization improves a model's performance by decreasing its variance, at the cost of a small increase in bias.
The general shape of a regularized loss function is the following:

$$\mathcal{L}_{reg}(\theta) = \mathcal{L}(\theta, \mathcal{D}) + \lambda R(\theta)$$

Where $\mathcal{D}$ is the dataset on which we compute the loss and $\theta$ is the vector of parameters for our model.
The hyper-parameter $\lambda$ allows us to control the importance of the regularization. When $\lambda = 0$ the regularizer is canceled and we get the unregularized solution. When $\lambda \to \infty$, we get an intercept-only model.
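To make this concrete, here is a minimal sketch (assuming NumPy and synthetic data) of how the parameter vector shrinks as the regularization strength grows, using the closed-form ridge solution $(X^\top X + \lambda I)^{-1} X^\top y$:

```python
import numpy as np

# Synthetic regression problem (illustrative data, not from the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    """Closed-form ridge estimate for a given regularization strength."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# As lambda increases, the norm of the solution shrinks toward zero.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
```

The sweep over `lam` is exactly the curve the note above proposes to graph: the coefficient norm decreases monotonically toward zero as $\lambda$ grows.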
TODO: graph the coefficient curve as lambda varies.
A word of caution
It is important to normalize the features before using regularization. Otherwise, the penalty affects coefficients unevenly: features measured on a large scale get small coefficients that are barely penalized, while features on a small scale get large coefficients that are penalized heavily.
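A minimal sketch (assuming NumPy) of standardizing features to zero mean and unit variance before fitting a regularized model, so a single penalty strength treats every coefficient on a comparable scale:

```python
import numpy as np

# Two features on wildly different scales (illustrative data).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) * np.array([1.0, 1000.0])

# Standardize each column: subtract its mean, divide by its std.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Every column now has mean ~0 and variance ~1, so the regularizer
# penalizes all coefficients on the same footing.
```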
The L2 norm
An ordinary least squares regression with L2 regularization is called ridge regression.
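As a sanity check, here is a minimal sketch (assuming NumPy) showing that the closed-form ridge solution with $\lambda = 0$ reduces to the ordinary least squares solution:

```python
import numpy as np

# Illustrative synthetic data.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=40)

# Ordinary least squares via lstsq.
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge closed form with lambda = 0: (X^T X)^-1 X^T y.
theta_ridge0 = np.linalg.solve(X.T @ X + 0.0 * np.eye(2), X.T @ y)

# The two solutions agree up to numerical precision.
```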
The L1 norm
An ordinary least squares regression with L1 regularization is called lasso regression. Unlike the L2 penalty, the L1 penalty can drive coefficients to exactly zero, producing sparse models.
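A minimal sketch (assuming NumPy) of the soft-thresholding operator, which is the lasso solution in the special case of an orthonormal design matrix: each OLS coefficient is shrunk toward zero and small ones become exactly zero, which is why the L1 penalty yields sparse models.

```python
import numpy as np

def soft_threshold(coef, lam):
    """Shrink each coefficient by lam; zero out anything with magnitude below lam."""
    return np.sign(coef) * np.maximum(np.abs(coef) - lam, 0.0)

# Hypothetical OLS coefficients for illustration.
theta_ols = np.array([2.0, -0.3, 0.05])
theta_lasso = soft_threshold(theta_ols, 0.5)
# theta_lasso == [1.5, 0.0, 0.0]: the two small coefficients are exactly zero.
```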