
# Difference between statistics, machine learning and probability

Statistics, probability theory and machine learning are often confused. This article explains the difference between statistics and probability, and how both differ from machine learning. We will see that machine learning can be used as a drop-in replacement for {statistics + probabilistic prediction}, at the expense of interpretability.

### Statistics vs Probability

Statistical theory and probability theory are often conflated because statistics uses the language of probability theory to quantify uncertainty. But the underlying purposes of the two disciplines are different.

#### Statistics: data ⟹ model

Statistical theory aims at obtaining knowledge about a population based on a sample of this population. It uses the language of probability to quantify the certainty of the knowledge gained.

• Process of interest conceptualised as a probability model
• Data viewed as observed outcomes from model
• Use outcomes to learn about the model
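As a minimal sketch of the data ⟹ model direction, the snippet below takes observed coin flips and estimates the parameter of a Bernoulli model, together with an approximate confidence interval that quantifies the uncertainty. The sample, the function name `estimate_bernoulli` and the choice of a normal-approximation interval are illustrative assumptions, not something prescribed by the article.

```python
import math

def estimate_bernoulli(flips):
    """Estimate p = P(heads) from observed flips (1 = heads, 0 = tails)."""
    n = len(flips)
    p_hat = sum(flips) / n                         # maximum-likelihood estimate of p
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
    # Approximate 95% confidence interval (normal approximation)
    return p_hat, (p_hat - 1.96 * std_err, p_hat + 1.96 * std_err)

# Hypothetical sample: 10 flips, 6 of them heads
flips = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p_hat, (low, high) = estimate_bernoulli(flips)
print(f"estimated p = {p_hat:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only ten flips the interval is wide, which is exactly the kind of statement about certainty that statistics is meant to provide.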

#### Probability: model ⟹ outcomes

Probability theory conceptualises a process of interest (“flipping a coin”) as a probability model (“Bernoulli distribution”) and then studies this model to draw conclusions about the possible outcomes (“both outcomes are equally likely”).

• Process of interest conceptualised as a probability model
• Use model to learn about probability of potential outcomes.
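The model ⟹ outcomes direction can be sketched the other way round: the model is given (here a fair coin, an assumption made for illustration) and we only compute probabilities of outcomes from it. The helper `binomial_pmf` is a hypothetical name; it uses the standard binomial formula.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n independent flips with P(heads) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.5  # the model is given, not estimated from data

# Outcomes of a single flip: both are equally likely under this model
single_flip = {"heads": p, "tails": 1 - p}

# Probability of exactly 3 heads in 5 flips
three_heads_in_five = binomial_pmf(3, 5, p)
print(single_flip, three_heads_in_five)
```

No data appears anywhere in this snippet: every number follows from the model alone.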

### Statistics vs Machine Learning

On the one hand, machine learning’s aim is to make predictions based on data. On the other hand, statistics can be combined with probability theory to make predictions too. While the outcome is the same, the process is fundamentally different.

#### Statistics + Probability: data ⟹ model ⟹ outcomes

Statistics can infer a probability model from a sample, and probability theory can be used to make predictions about other samples.

Compared to machine learning practitioners, statisticians must understand how the data was collected, the statistical properties of their estimators and tests (unbiasedness, p-values), the underlying distribution of the population they are studying, and the kinds of properties they would expect if the experiment were repeated many times. They need to know precisely what they are doing and choose parameters that provide predictive power. Statistical modeling techniques are usually applied to low-dimensional data sets.

• Tool: probability model
• Sample used to learn parameters of the model via statistical inference
• Predictions made via probability theory based on the model
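The three steps above can be sketched with Python's standard `statistics.NormalDist`. The measurements are made up, and the choice of a normal model is an assumption the analyst has to justify; it is exactly the kind of prior knowledge the statistical approach relies on.

```python
from statistics import NormalDist

# Hypothetical sample of measurements from the population of interest
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

# Statistical inference: estimate the parameters (mean, standard deviation)
# of an assumed normal model from the sample
model = NormalDist.from_samples(sample)

# Probability theory: use the fitted model to predict how likely it is
# that a new observation exceeds 5.5
p_above = 1 - model.cdf(5.5)
print(f"mean = {model.mean:.2f}, stdev = {model.stdev:.2f}, P(X > 5.5) = {p_above:.4f}")
```

The fitted parameters are interpretable in themselves (a typical value and a spread), which is the interpretability that the statistical route keeps.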

#### Machine Learning: data ⟹ algorithm ⟹ outcomes

The objective of machine learning is to make predictions. A machine learning algorithm is trained on a sample (called the training set). The algorithm will not give insights about the data the way descriptive statistics do, but it will be able to make predictions about another sample (the new data set).

Compared to the statistical approach, machine learning does not require explicit prior assumptions about the underlying relationships between the variables (i.e. no probability model). You throw in all the data you have; the algorithm processes it and discovers patterns, which you can then use to make predictions on a new data set.

• Tool: machine learning algorithm
• Sample used to learn parameters of the algorithm
• Predictions made by running the algorithm
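As a minimal sketch of data ⟹ algorithm ⟹ outcomes, the snippet below trains a one-feature decision stump: it searches for the threshold that misclassifies the fewest training points, then applies it to new data. The stump, the function names `fit_stump` and `predict`, and the data are illustrative choices, not a method named in the article; note that no probability model is written down at any point.

```python
def fit_stump(xs, ys):
    """Learn a threshold t so that predicting (x > t) makes the fewest training errors."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):                      # candidate thresholds: observed values
        errors = sum((x > t) != y for x, y in zip(xs, ys))
        if errors < best_errors:                   # keep the first threshold with fewest errors
            best_t, best_errors = t, errors
    return best_t

def predict(t, x):
    """Run the trained algorithm on a new feature value."""
    return x > t

# Hypothetical training set: one feature value and a boolean label per point
train_x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
train_y = [False, False, False, True, True, True]

t = fit_stump(train_x, train_y)                    # sample used to learn the parameter
new_x = [0.5, 2.2, 4.0]                            # new data set
predictions = [predict(t, x) for x in new_x]       # predictions made by running the algorithm
print(t, predictions)
```

The learned threshold is the only "parameter of the algorithm" here; it is chosen purely for predictive accuracy on the training set.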

The difference can be summarized as follows:

• Statistics: data ⟹ model
• Probability: model ⟹ outcomes
• Statistics + Probability: data ⟹ model ⟹ outcomes
• Machine Learning: data ⟹ algorithm ⟹ outcomes