Machine learning in 5 minutes

Linear regression in simple terms

Linear regression is a model used to predict the value of a (continuous) variable.

We are given a dataset $\mathcal{D}$ made of $N$ records. Each record contains data about an object:

$$\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$$

Where $x_i$ can be a single variable or a vector of several variables.

The aim is to predict the output value $y$ based on the input vector $x$ using a linear function $f_\theta$:

$$\hat{y} = f_\theta(x)$$

Where the function $f_\theta$ is defined by:

$$f_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_p x_p$$

Before we can make predictions, we need to learn the parameter vector $\theta = (\theta_0, \theta_1, \dots, \theta_p)$. This is called fitting the model.
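As a concrete sketch of the model above (the names `predict` and `theta` are my own, not from the article), the linear function can be written in a few lines of NumPy:

```python
import numpy as np

def predict(theta, X):
    """Linear model: y_hat = theta_0 + theta_1 * x_1 + ... + theta_p * x_p.

    X is an (n, p) matrix of inputs; theta is a vector of p + 1 parameters,
    whose first entry is the intercept theta_0.
    """
    return theta[0] + X @ theta[1:]
```

For example, with `theta = [1, 2]` and a single input `x = 3`, the prediction is `1 + 2 * 3 = 7`.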

Model fitting

We will fit the model using a subset of the dataset, called the trainset:

$$\mathcal{D}_{train} \subset \mathcal{D}$$

We are looking for the value of $\theta$ that minimizes the error between the output values $y_i$ and the predictions $\hat{y}_i = f_\theta(x_i)$.

The total error made on a dataset $\mathcal{D}$ using the parameter $\theta$ is measured by the loss function:

$$\mathcal{L}(\theta, \mathcal{D})$$

Using this (yet undefined) loss function, we can compute the best parameters:

$$\theta^* = \operatorname*{arg\,min}_{\theta} \, \mathcal{L}(\theta, \mathcal{D}_{train})$$

These parameters depend on the trainset used. Different trainsets might yield (slightly) different parameters.
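The article keeps the loss abstract; as an illustration only, under the common squared-error loss $\mathcal{L}(\theta, \mathcal{D}) = \sum_i (y_i - f_\theta(x_i))^2$, the best parameters have a closed-form solution that NumPy's least-squares solver computes directly (a sketch on synthetic data, not the author's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trainset: y = 2 + 3x plus small noise (synthetic data for illustration).
X_train = rng.uniform(0, 10, size=(50, 1))
y_train = 2.0 + 3.0 * X_train[:, 0] + rng.normal(0, 0.1, size=50)

# Prepend a column of ones so theta_0 acts as the intercept.
A = np.hstack([np.ones((len(X_train), 1)), X_train])

# lstsq returns the theta minimizing the squared-error loss
# sum_i (y_i - f_theta(x_i))**2 on the trainset.
theta_best, *_ = np.linalg.lstsq(A, y_train, rcond=None)

print(theta_best)  # close to [2, 3]
```

Rerunning this with a different random trainset would give slightly different parameters, which is exactly the dependence on the trainset noted above.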

Making predictions

Using the best parameters $\theta^*$, we can make predictions on the whole dataset and compute an estimate $\hat{y}$ for the output value $y$:

$$\hat{y} = f_{\theta^*}(x)$$
To assess our model’s performance on new data, we can compute the loss on another subset (the testset), which is disjoint from our trainset:

$$\mathcal{L}(\theta^*, \mathcal{D}_{test}) \quad \text{with} \quad \mathcal{D}_{test} \cap \mathcal{D}_{train} = \emptyset$$
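Sticking with the squared-error loss as an assumed example (the fitted parameters and test records below are made up for illustration), evaluating on a held-out set is just the loss computed on data the model never saw during fitting:

```python
import numpy as np

def squared_error_loss(theta, X, y):
    """Total squared error of the linear model f_theta on a dataset (X, y)."""
    y_hat = theta[0] + X @ theta[1:]
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical fitted parameters and a small disjoint testset.
theta_best = np.array([2.0, 3.0])
X_test = np.array([[1.0], [4.0]])
y_test = np.array([5.1, 13.8])

# (5.1 - 5.0)**2 + (13.8 - 14.0)**2 = 0.05
print(squared_error_loss(theta_best, X_test, y_test))
```

A low trainset loss with a much higher testset loss is the usual sign that the model has overfit its trainset.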