Linear regression in simple terms
A linear regression is a model used to predict the value of a (continuous) variable.
We are given a dataset $\mathcal{D}$ made of $n$ records. Each record $(x_i, y_i)$ contains data about an object:

$$\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$$

where $x_i$ can be a single variable or a vector of several variables.
The aim is to predict the output value $y$ based on the input vector $x$ using a linear function $f_\theta$:

$$\hat{y} = f_\theta(x)$$

where the function $f_\theta$ is defined by:

$$f_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p$$
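As a minimal sketch (names are illustrative, not from the text), the function $f_\theta$ can be written as a dot product, with `theta[0]` playing the role of the intercept $\theta_0$ and the remaining entries the slopes:

```python
import numpy as np

def f(theta, x):
    """Linear function f_theta: intercept plus weighted sum of the inputs."""
    return theta[0] + np.dot(theta[1:], x)

theta = np.array([1.0, 2.0])   # f(x) = 1 + 2x
x = np.array([3.0])
y_hat = f(theta, x)            # 1 + 2*3 = 7.0
```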
Before we can make predictions, we need to learn the parameter vector $\theta = (\theta_0, \theta_1, \dots, \theta_p)$. This is called fitting the model.
We will fit the model using a subset of the dataset, called the trainset:

$$\mathcal{D}_{\text{train}} \subset \mathcal{D}$$
We are looking for the value of $\theta$ that minimizes the error between the output values $y_i$ and the predictions $\hat{y}_i = f_\theta(x_i)$.
The total error made on a dataset using the parameter $\theta$ is measured by the loss function $L$:

$$L(\theta) = \sum_{(x_i, y_i) \in \mathcal{D}_{\text{train}}} \ell\big(y_i, f_\theta(x_i)\big)$$
Using this (yet undefined) loss function, we can compute the best parameters:

$$\theta^* = \arg\min_{\theta} L(\theta)$$
These parameters depend on the trainset used. Different trainsets might yield (slightly) different parameters.
Using the best parameters $\theta^*$, we can make predictions on the whole dataset and compute an estimate $\hat{y}$ for the output value $y$:

$$\hat{y} = f_{\theta^*}(x)$$
To assess our model's performance on new data, we can compute the loss on another subset, the testset, which is disjoint from our trainset:

$$\mathcal{D}_{\text{test}} \subset \mathcal{D}, \qquad \mathcal{D}_{\text{test}} \cap \mathcal{D}_{\text{train}} = \emptyset$$
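The whole procedure can be sketched end to end: fit on a trainset, then measure the loss on a disjoint testset (synthetic data and an 80/20 split, both illustrative assumptions; the loss here is the mean squared error):

```python
import numpy as np

# Noisy linear data: y = 1 + 2x + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, size=100)

# Disjoint split: first 80 records for training, last 20 for testing.
train, test = slice(0, 80), slice(80, 100)

# Fit on the trainset only (squared-error loss, intercept via a ones column).
X1 = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)

# Mean squared error on the held-out test records estimates performance on new data.
test_loss = np.mean((y[test] - X1[test] @ theta) ** 2)
```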