Machine learning · 5 minutes

The bias-variance-noise decomposition

The MSE loss is attractive because the expected prediction error decomposes into the squared bias and the variance of the model, plus the variance of the noise. This is called the bias-variance-noise decomposition. In this article, we introduce this decomposition using the tools of probability theory.

In short, when $Y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, the bias-variance-noise decomposition is:

$$\mathbb{E}\big[(\hat f(x) - Y)^2\big] = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{variance}} + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{noise}}$$
Notations

Let $(X, Y)$ be a pair of random variables on $\mathcal{X} \times \mathbb{R}$.

Assume there exists a $0$-mean random noise $\varepsilon$ and a function $f : \mathcal{X} \to \mathbb{R}$ such that:

$$Y = f(X) + \varepsilon$$
The goal of a regression is to use a sample $(x_1, y_1), \dots, (x_n, y_n)$ to estimate this function:

$$\hat f \approx f$$
For instance, in a linear regression the function $\hat f$ is a linear function with parameter $\hat \beta$:

$$\hat f(x) = \hat \beta^\top x$$
And the regression aims at estimating $\hat \beta$ from the training set, typically by least squares:

$$\hat \beta = \operatorname*{arg\,min}_{\beta} \sum_{i=1}^{n} \big(y_i - \beta^\top x_i\big)^2$$
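This least-squares fit can be sketched in a few lines of Python. The linear target (slope $2$) and the noise level below are made up for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: y = 2*x + noise (f is assumed linear here)
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
X = x.reshape(-1, 1)                         # design matrix with a single feature
y = 2.0 * x + rng.normal(0.0, 0.1, size=n)   # 0-mean Gaussian noise

# Least-squares estimate of the parameter beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def f_hat(x_new):
    """Estimated regression function."""
    return beta_hat[0] * x_new

print(beta_hat[0])  # close to the true slope 2.0
```

With enough samples, `beta_hat` concentrates around the true slope, which is exactly the randomness of $\hat f$ that the variance term of the decomposition will capture.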
Once the function $\hat f$ is estimated, we can measure the error between a prediction $\hat f(x)$ and the true value $y$ with the squared error:

$$\ell\big(\hat f(x), y\big) = \big(\hat f(x) - y\big)^2$$
The expected error in prediction at a point $x$ is then:

$$\mathbb{E}\big[\big(\hat f(x) - Y\big)^2\big]$$

where the expectation is taken over the noise in $Y$ and over the training sample used to build $\hat f$.
Define $\hat y = \hat f(x)$ as a shorthand. Substituting $Y = f(x) + \varepsilon$ and expanding the square:

$$\mathbb{E}\big[(\hat y - Y)^2\big] = \mathbb{E}\big[(\hat y - f(x))^2\big] - 2\,\mathbb{E}\big[(\hat y - f(x))\,\varepsilon\big] + \mathbb{E}\big[\varepsilon^2\big]$$
$\hat y$ does not depend on $\varepsilon$ and $f(x)$ does not depend on $\varepsilon$, so the cross term factorizes:

$$\mathbb{E}\big[(\hat y - Y)^2\big] = \mathbb{E}\big[(\hat y - f(x))^2\big] - 2\,\mathbb{E}\big[\hat y - f(x)\big]\,\mathbb{E}[\varepsilon] + \mathbb{E}\big[\varepsilon^2\big]$$
Recall that $\mathbb{E}\big[\varepsilon^2\big] = \mathrm{Var}(\varepsilon) + \mathbb{E}[\varepsilon]^2$:

$$\mathbb{E}\big[(\hat y - Y)^2\big] = \mathbb{E}\big[(\hat y - f(x))^2\big] - 2\,\mathbb{E}\big[\hat y - f(x)\big]\,\mathbb{E}[\varepsilon] + \mathrm{Var}(\varepsilon) + \mathbb{E}[\varepsilon]^2$$
Since $\varepsilon$ is a $0$-mean noise we have:

$$\mathbb{E}[\varepsilon] = 0$$
Hence:

$$\mathbb{E}\big[(\hat y - Y)^2\big] = \mathbb{E}\big[(\hat y - f(x))^2\big] + \mathrm{Var}(\varepsilon)$$
The remaining term $\mathbb{E}\big[(\hat y - f(x))^2\big]$ is exactly the error in estimation between $\hat f(x)$ and $f(x)$. We can express it using the bias-variance decomposition:

$$\mathbb{E}\big[(\hat y - f(x))^2\big] = \big(\mathbb{E}[\hat y] - f(x)\big)^2 + \mathrm{Var}(\hat y)$$
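This classical identity can be proved in one line by inserting $\mathbb{E}[\hat y]$ and expanding the square (writing $\hat y = \hat f(x)$ for the prediction at $x$):

```latex
\begin{align*}
\mathbb{E}\big[(\hat y - f(x))^2\big]
  &= \mathbb{E}\big[(\hat y - \mathbb{E}[\hat y] + \mathbb{E}[\hat y] - f(x))^2\big] \\
  &= \mathbb{E}\big[(\hat y - \mathbb{E}[\hat y])^2\big]
     + 2\,\underbrace{\mathbb{E}\big[\hat y - \mathbb{E}[\hat y]\big]}_{=\,0}\,\big(\mathbb{E}[\hat y] - f(x)\big)
     + \big(\mathbb{E}[\hat y] - f(x)\big)^2 \\
  &= \big(\mathbb{E}[\hat y] - f(x)\big)^2 + \mathrm{Var}(\hat y)
\end{align*}
```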
Putting everything together, the bias-variance-noise decomposition is:

$$\mathbb{E}\big[\big(\hat f(x) - Y\big)^2\big] = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{variance}} + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{noise}}$$
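The decomposition can be checked numerically. The sketch below uses a made-up setup (linear $f$, Gaussian noise, 1-d least-squares estimator, none of which come from the derivation itself): it repeatedly redraws a training set, refits $\hat f$, and compares the Monte Carlo estimate of the expected prediction error at a fixed point with bias² + variance + noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed experimental setup (illustrative, not from the article):
# f(x) = 2*x, epsilon ~ N(0, 0.5^2), least squares on n = 20 points
def f(x):
    return 2.0 * x

noise_std = 0.5
n_train = 20
n_trials = 20000
x0 = 0.7  # fixed test point

preds = np.empty(n_trials)   # f_hat(x0) across retrainings
errors = np.empty(n_trials)  # (f_hat(x0) - Y)^2 across retrainings

for t in range(n_trials):
    # Draw a fresh training set and refit the estimator
    x = rng.uniform(-1.0, 1.0, size=n_train)
    y = f(x) + rng.normal(0.0, noise_std, size=n_train)
    beta_hat = (x @ y) / (x @ x)  # 1-d least squares, no intercept

    preds[t] = beta_hat * x0
    y0 = f(x0) + rng.normal(0.0, noise_std)  # independent test response
    errors[t] = (preds[t] - y0) ** 2

expected_error = errors.mean()
bias2 = (preds.mean() - f(x0)) ** 2
variance = preds.var()
noise = noise_std ** 2

print(expected_error, bias2 + variance + noise)  # the two should match closely
```

Here the noise term dominates: the least-squares estimator is unbiased for a linear $f$, so bias² is nearly zero and the variance term shrinks as the training set grows, while $\mathrm{Var}(\varepsilon)$ is irreducible.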