
# What is ridge regression?

Ridge regression is an ordinary least squares (OLS) regression with an added L2-regularization penalty on the parameters.

The regularized loss function on some dataset $\sets$ is thus:

$$\l_{\text{ridge}}(\sets, \vw) = \l(\sets, \vw) + \lambda\normtwo{\vw}$$

where $\l$ is the OLS loss and $\lambda$ is a hyperparameter that controls the strength of the regularization.
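As a minimal NumPy sketch of this loss (variable names `X`, `y`, `w`, and `lam` are ours, mirroring $\mx$, $\vy$, $\vw$, and $\lambda$; the toy data is illustrative):

```python
import numpy as np

# Toy dataset: X is the design matrix, y the output vector.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, 0.2])
lam = 0.5  # the regularization hyperparameter lambda


def ridge_loss(X, y, w, lam):
    """OLS loss (squared residuals averaged over the dataset)
    plus the L2 penalty lam * ||w||^2."""
    n = X.shape[0]
    residuals = y - X @ w
    return (residuals @ residuals) / (2 * n) + lam * (w @ w)
```

Setting `lam = 0` recovers the plain OLS loss; increasing it penalizes large parameter values more heavily.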

For a complete discussion of the effect of L2-regularization on the parameters of the model, check out our dedicated article: L2-regularization.

Common mistake: it is important to normalize the features before applying regularization. Otherwise, the penalty shrinks the coefficients of features on different scales by different amounts, yielding incoherent regularization behavior.
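A minimal sketch of such a normalization, standardizing each feature to zero mean and unit variance (in practice the statistics should be computed on the training set only and reused for new data):

```python
import numpy as np

# Two features on very different scales: without standardization,
# the L2 penalty would shrink their coefficients very unevenly.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

mu = X.mean(axis=0)      # per-feature mean
sigma = X.std(axis=0)    # per-feature standard deviation
X_std = (X - mu) / sigma # standardized design matrix
```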

## Analytical solution

Let $\trainset$ be the training set and denote by $\mx$ and $\vy$ the corresponding design matrix and output vector.

We can compute the value of the parameter vector that minimizes the regularized loss using differentiation.

$$\grad\l(\trainset, \vw) = - \frac{1}{\card{\trainset}}\mx^{\top}(\vy - \mx\vw)$$

$$\grad\,\lambda\normtwo{\vw} = 2\lambda\vw$$
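These two gradient terms can be checked numerically against finite differences (a sketch with random toy data; names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # design matrix
y = rng.normal(size=5)        # output vector
w = rng.normal(size=3)        # parameter vector
lam = 0.1
n = X.shape[0]


def loss(w):
    """Regularized loss: averaged squared residuals plus L2 penalty."""
    r = y - X @ w
    return (r @ r) / (2 * n) + lam * (w @ w)


# Analytical gradient: sum of the two terms above.
grad = -(X.T @ (y - X @ w)) / n + 2 * lam * w

# Central finite-difference approximation of the gradient.
eps = 1e-6
num = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```

The two should agree up to numerical precision, confirming the analytical expressions.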

Setting all directional derivatives to $0$, we get: