Primer on stochastic convergence
Types of convergence
Convergence in distribution
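Let $(F_n)_{n \ge 1}$ be a sequence of distribution functions and let $F$ be a distribution function with the same domain. Let $C(F)$ be the set of continuity points of $F$. We say that $F_n$ converges in distribution to $F$:
$$F_n \xrightarrow{d} F$$
when, for every continuity point $x \in C(F)$:
$$\lim_{n \to \infty} F_n(x) = F(x).$$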
Relation to functional analysis: convergence in distribution is pointwise convergence of the distribution functions on the set of continuity points.
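The restriction to continuity points matters. Here is a minimal sketch; the function names `F_n` and `F` are illustrative and not from any library.

Let $F_n$ be the distribution function of a point mass at $1/n$, and let $F$ be that of a point mass at $0$. Then $F_n(x) \to F(x)$ at every $x \neq 0$. However, $F_n(0) = 0$ for all $n$ while $F(0) = 1$. Because $0$ is a discontinuity point of $F$, the definition still gives $F_n \xrightarrow{d} F$.

```python
def F_n(n: int, x: float) -> float:
    """Distribution function of a point mass at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0


def F(x: float) -> float:
    """Distribution function of a point mass at 0 (it jumps at x = 0)."""
    return 1.0 if x >= 0.0 else 0.0


# At continuity points of F (any x != 0), F_n(x) eventually equals F(x).
for x in (-1.0, 1e-3, 0.5):
    print(x, [F_n(n, x) for n in (1, 10, 10_000)], F(x))

# At the discontinuity point x = 0, F_n(0) stays at 0 while F(0) = 1.
print(0.0, [F_n(n, 0.0) for n in (1, 10, 10_000)], F(0.0))
```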
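By abuse of notation, we extend this definition to a sequence of random variables or vectors $X_n$ with distribution functions $F_n$, and a limit $X$ with distribution function $F$:
$$X_n \xrightarrow{d} X \iff F_n \xrightarrow{d} F.$$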
Convergence in probability
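Let $(X_n)$ be a sequence of random vectors. We say that it converges in probability to the random vector $X$:
$$X_n \xrightarrow{p} X$$
when, for all $\varepsilon > 0$, we have:
$$\lim_{n \to \infty} P\big(\|X_n - X\| > \varepsilon\big) = 0.$$
Since a random variable is a random vector of dimension $1$, for random variables the condition is written:
$$\lim_{n \to \infty} P\big(|X_n - X| > \varepsilon\big) = 0.$$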
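Here is a minimal sketch. It assumes $X \sim \mathcal N(0,1)$ and the illustrative sequence $X_n = X + Z/\sqrt{n}$, with $Z \sim \mathcal N(0,1)$ independent of $X$; both choices are ours, for illustration only.

Then $X_n - X = Z/\sqrt n$, so
$$P(|X_n - X| > \varepsilon) = P(|Z| > \varepsilon\sqrt n) = \operatorname{erfc}\big(\varepsilon\sqrt{n/2}\big),$$
which tends to $0$. The code compares this exact value with a Monte Carlo estimate.

```python
import math
import random


def exact_prob(n: int, eps: float) -> float:
    """P(|X_n - X| > eps) = P(|Z| > eps * sqrt(n)) for Z ~ N(0, 1)."""
    return math.erfc(eps * math.sqrt(n / 2))


def mc_prob(n: int, eps: float, reps: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|X_n - X| > eps) with X_n = X + Z / sqrt(n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = rng.gauss(0.0, 1.0)                          # realisation of X
        x_n = x + rng.gauss(0.0, 1.0) / math.sqrt(n)     # realisation of X_n
        hits += abs(x_n - x) > eps
    return hits / reps


eps = 0.1
for n in (1, 10, 100, 1000):
    print(n, round(exact_prob(n, eps), 4), mc_prob(n, eps))
```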
Difference between p and d convergence
- $d$-convergence relates distribution functions. It says that the probabilistic behavior of the sequence $X_n$ becomes more and more like that of the limit $X$.
- $p$-convergence relates random variables. It says that the actual realisations of $X_n$ can be approximated, with higher and higher probability, by those of $X$.
- $p$-convergence implies $d$-convergence.
- $d$-convergence does not imply $p$-convergence.
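Example: let $X \sim \mathcal{N}(0, 1)$ and $X_n = -X$ for all $n$. We have:
$$X_n \sim \mathcal{N}(0, 1) \text{ for all } n, \text{ so } X_n \xrightarrow{d} X, \quad \text{but} \quad P(|X_n - X| > \varepsilon) = P(|X| > \varepsilon / 2) \not\to 0.$$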
There is a partial converse when the limit is a constant $c$:
$$X_n \xrightarrow{d} c \implies X_n \xrightarrow{p} c.$$
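The counterexample with $X \sim \mathcal N(0,1)$ and $X_n = -X$ for every $n$ can be checked numerically.

- By symmetry of the normal distribution, $X_n$ has exactly the law of $X$, so $X_n \xrightarrow{d} X$ trivially.
- But $|X_n - X| = 2|X|$, so $P(|X_n - X| > \varepsilon) = P(|X| > \varepsilon/2)$ does not depend on $n$ and never tends to $0$.

This sketch estimates both facts by simulation; the helper `ecdf` is ours.

```python
import math
import random


def ecdf(sample: list[float], t: float) -> float:
    """Empirical distribution function of `sample` evaluated at t."""
    return sum(v <= t for v in sample) / len(sample)


rng = random.Random(42)
reps = 20_000
xs = [rng.gauss(0.0, 1.0) for _ in range(reps)]  # realisations of X ~ N(0, 1)
xns = [-x for x in xs]                            # X_n = -X, identical for every n

# Same distribution: the empirical CDFs of X and X_n agree up to sampling noise.
for t in (-1.0, 0.0, 1.0):
    print(t, ecdf(xs, t), ecdf(xns, t))

# Different realisations: P(|X_n - X| > eps) = P(|X| > eps / 2), whatever n is.
eps = 1.0
mc = sum(abs(xn - x) > eps for x, xn in zip(xs, xns)) / reps
exact = math.erfc((eps / 2) / math.sqrt(2))
print(mc, exact)  # both close to 0.617
```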
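Bonus: Cramér-Wold device
As a side note, there is a link between univariate and multivariate $d$-convergence:
Let $(X_n)$ be a sequence of random vectors of $\mathbb{R}^k$ and $X$ a random vector of $\mathbb{R}^k$. For any constant vector $t \in \mathbb{R}^k$, the random variable $t^\top X_n$ is univariate. We have:
$$X_n \xrightarrow{d} X \iff t^\top X_n \xrightarrow{d} t^\top X \ \text{ for all } t \in \mathbb{R}^k.$$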
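A typical use of the device is to reduce a multivariate statement to univariate ones. As a sketch, assume the setting of the central limit theorem stated below: i.i.d. $X_i \in \mathbb{R}^k$ with mean $\mu$ and covariance $\Sigma$. Fix $t \in \mathbb{R}^k$ and apply the univariate theorem to the scalars $t^\top X_i$:

$$
\begin{aligned}
&E[t^\top X_i] = t^\top \mu, \qquad \operatorname{Var}(t^\top X_i) = t^\top \Sigma\, t,\\
&\sqrt{n}\,\big(t^\top \bar X_n - t^\top \mu\big) = t^\top\big(\sqrt{n}\,(\bar X_n - \mu)\big) \xrightarrow{d} \mathcal{N}(0,\ t^\top \Sigma\, t),\\
&\text{and } t^\top Z \sim \mathcal{N}(0,\ t^\top \Sigma\, t) \text{ when } Z \sim \mathcal{N}(0, \Sigma).
\end{aligned}
$$

This holds for every $t$, so the device gives $\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma)$. If $t^\top \Sigma\, t = 0$, then $t^\top X_i$ is almost surely constant and the limit is the point mass at $0$.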
Fundamental convergence theorems
Law of large numbers
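Let $(X_n)$ be a sequence of independent random vectors with $E[X_i] = \mu$ and $E\big[\|X_i - \mu\|^2\big] \le \sigma^2 < \infty$ for all $i$. Then:
$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} \mu.$$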
Interpretation: since this is $p$-convergence, it means that as the sample size $n$ increases, the probability that the sample average $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$ lies within any fixed distance $\varepsilon$ of the mean $\mu$ tends to $1$.
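Here is a minimal simulation sketch. It assumes fair six-sided die rolls ($\mu = 3.5$) as the i.i.d. sequence, which is an illustrative choice. For several sample sizes $n$, it estimates $P(|\bar X_n - \mu| > \varepsilon)$ by repeating the experiment many times.

```python
import random


def prob_far_from_mean(n: int, eps: float = 0.25, reps: int = 1_000, seed: int = 1) -> float:
    """Estimate P(|mean of n die rolls - 3.5| > eps) over `reps` independent experiments."""
    rng = random.Random(seed)
    mu = 3.5  # mean of a fair die
    far = 0
    for _ in range(reps):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        far += abs(sample_mean - mu) > eps
    return far / reps


for n in (10, 100, 1000):
    print(n, prob_far_from_mean(n))
```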
But what is the uncertainty associated with this approximation? Under slightly stronger assumptions on the sequence, the following theorem is the answer.
Central limit theorem
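Let $(X_n)$ be an i.i.d. sequence of random vectors with mean $\mu$ and finite covariance matrix $\Sigma$. Then:
$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} \mathcal{N}(0, \Sigma).$$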
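When the dimension is $1$, the covariance matrix reduces to the variance $\sigma^2$ and the theorem reads:
Let $(X_n)$ be an i.i.d. sequence with mean $\mu$ and variance $0 < \sigma^2 < \infty$. Then:
$$\sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1).$$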
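Interpretation: as the sample size $n$ increases, the distribution of the sample average $\bar X_n$ is approximately normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$:
$$\bar X_n \overset{\text{approx.}}{\sim} \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right).$$
Notice that the standard deviation shrinks at rate $1/\sqrt{n}$.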
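Here is a simulation sketch of both statements. It assumes exponential data with mean $\mu = 1$ and standard deviation $\sigma = 1$, a skewed and illustrative choice. Two things should happen:

- The standardized averages $\sqrt n(\bar X_n - \mu)/\sigma$ should land in $[-1.96, 1.96]$ about 95% of the time.
- Quadrupling $n$ should roughly halve the standard deviation of $\bar X_n$.

```python
import math
import random
import statistics


def sample_means(n: int, reps: int, seed: int) -> list[float]:
    """Return `reps` sample averages, each over n i.i.d. Exponential(1) draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)

# CLT: standardized averages are approximately N(0, 1).
n = 200
z = [math.sqrt(n) * (m - mu) / sigma for m in sample_means(n, reps=4_000, seed=7)]
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print("fraction in [-1.96, 1.96]:", coverage)  # close to 0.95

# Rate: the sd of the sample mean is about sigma / sqrt(n), so 4x the data halves it.
sd_25 = statistics.stdev(sample_means(25, reps=4_000, seed=8))
sd_100 = statistics.stdev(sample_means(100, reps=4_000, seed=9))
print(sd_25, sigma / math.sqrt(25), sd_100, sigma / math.sqrt(100))
print("ratio:", sd_25 / sd_100)  # close to 2
```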
Weighted sum central limit theorem
A more general version of the CLT is often useful when combined with the tools presented in the next section.
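Let $(X_i)$ be an i.i.d. sequence of real random variables with common mean $\mu$ and variance $0 < \sigma^2 < \infty$. Let $(w_i)$ be a sequence of real constants.
Use the following notations: $S_n = \sum_{i=1}^n w_i X_i$, and $V_n = \operatorname{Var}(S_n) = \sigma^2 \sum_{i=1}^n w_i^2$.
If, in the limit, every single component contributes a negligible proportion of the total variance, i.e.:
$$\lim_{n \to \infty} \max_{1 \le i \le n} \frac{w_i^2 \sigma^2}{V_n} = \lim_{n \to \infty} \max_{1 \le i \le n} \frac{w_i^2}{\sum_{j=1}^n w_j^2} = 0,$$
then:
$$\frac{S_n - E[S_n]}{\sqrt{V_n}} = \frac{\sum_{i=1}^n w_i (X_i - \mu)}{\sigma \sqrt{\sum_{i=1}^n w_i^2}} \xrightarrow{d} \mathcal{N}(0, 1).$$
Setting $w_i = 1$ for all $i$ yields the previous univariate central limit theorem.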
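The negligibility condition involves only the weights, so it can be checked directly. This minimal sketch computes $\max_{i \le n} w_i^2 / \sum_{j \le n} w_j^2$ for three weight sequences; the helper name `max_variance_share` is ours.

- Constant weights: the ratio is $1/n \to 0$, so the condition holds.
- Linear weights $w_i = i$: the ratio also tends to $0$.
- Geometric weights $w_i = 2^i$: the last term always carries about $3/4$ of the variance, so the condition fails.

```python
from typing import Callable


def max_variance_share(w: Callable[[int], float], n: int) -> float:
    """Largest share of Var(sum_i w_i X_i) carried by a single term, for i = 1..n.

    The common variance sigma^2 cancels, so only the weights matter.
    """
    squares = [w(i) ** 2 for i in range(1, n + 1)]
    return max(squares) / sum(squares)


for n in (10, 100, 1000):
    print(
        n,
        max_variance_share(lambda i: 1, n),       # 1/n -> 0: condition holds
        max_variance_share(lambda i: i, n),       # about 3/n -> 0: condition holds
        max_variance_share(lambda i: 2 ** i, n),  # -> 3/4: condition fails
    )
```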
New approximations from old ones
These theorems are used to approximate complicated distributions by simpler ones. Here are some transformation results that let us obtain new approximations from the old ones.
Continuous mapping theorem
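Let $g$ be a continuous function. Then:
$$X_n \xrightarrow{d} X \implies g(X_n) \xrightarrow{d} g(X), \qquad X_n \xrightarrow{p} X \implies g(X_n) \xrightarrow{p} g(X).$$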
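Here is a small sketch. It assumes Exponential(1) data (mean $1$) and $g(x) = x^2$, both illustrative choices. By the law of large numbers $\bar X_n \xrightarrow{p} 1$, so the continuous mapping theorem predicts $\bar X_n^2 \xrightarrow{p} 1$. The code estimates $P(|g(\bar X_n) - g(1)| > \varepsilon)$ for growing $n$; the helper names are ours.

```python
import random
from typing import Callable


def prob_g_far(
    n: int,
    g: Callable[[float], float],
    limit: float,
    eps: float = 0.2,
    reps: int = 1_000,
    seed: int = 3,
) -> float:
    """Estimate P(|g(mean_n) - g(limit)| > eps), mean_n averaging n Exponential(1) draws."""
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        mean_n = sum(rng.expovariate(1.0) for _ in range(n)) / n
        far += abs(g(mean_n) - g(limit)) > eps
    return far / reps


def square(x: float) -> float:
    return x * x  # continuous on the whole real line


for n in (10, 100, 1000):
    print(n, prob_g_far(n, square, limit=1.0))
```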
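Slutsky's theorem: let $g$ be a continuous function, $(X_n)$ and $(Y_n)$ two sequences of random variables, and $c$ a constant. Then:
$$X_n \xrightarrow{d} X \text{ and } Y_n \xrightarrow{p} c \implies g(X_n, Y_n) \xrightarrow{d} g(X, c).$$
The usual special cases are $X_n + Y_n \xrightarrow{d} X + c$, $X_n Y_n \xrightarrow{d} cX$ and, if $c \neq 0$, $X_n / Y_n \xrightarrow{d} X / c$.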
The continuous mapping theorem would apply directly if the joint distribution of $(X_n, Y_n)$ $d$-converged to that of $(X, c)$. Slutsky's theorem is a stronger result because it only assumes that each sequence converges separately (marginal convergence).
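A classic application is replacing the unknown $\sigma$ in the CLT by the sample standard deviation $s_n$. This sketch assumes Exponential(1) data, so $\mu = \sigma = 1$ (an illustrative choice). The argument has three steps:

- The CLT gives $\sqrt n(\bar X_n - \mu) \xrightarrow{d} \mathcal N(0, \sigma^2)$.
- The law of large numbers and continuous mapping give $s_n \xrightarrow{p} \sigma$.
- The quotient form of Slutsky's theorem (with $c = \sigma \neq 0$) yields $\sqrt n(\bar X_n - \mu)/s_n \xrightarrow{d} \mathcal N(0, 1)$.

The simulation checks the resulting 95% coverage.

```python
import math
import random


def t_stat_coverage(n: int, reps: int = 3_000, seed: int = 11) -> float:
    """Fraction of studentized means sqrt(n) * (mean - mu) / s_n in [-1.96, 1.96]."""
    rng = random.Random(seed)
    mu = 1.0  # true mean of Exponential(1); sigma = 1 is treated as unknown
    inside = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(xs) / n
        # Sample standard deviation s_n (divisor n - 1) replaces the unknown sigma.
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        t = math.sqrt(n) * (mean - mu) / s
        inside += abs(t) <= 1.96
    return inside / reps


print(t_stat_coverage(400))  # close to 0.95
```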
The delta method
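Let $\sqrt{n}\,(X_n - \theta) \xrightarrow{d} \mathcal{N}(0, \Sigma)$ where $X_n \in \mathbb{R}^k$, $\theta \in \mathbb{R}^k$ and $\Sigma$ is a $k \times k$ covariance matrix. Let $g : \mathbb{R}^k \to \mathbb{R}^m$ be continuously differentiable at the point $\theta$, with $m \times k$ Jacobian matrix $J_g(\theta)$. Then:
$$\sqrt{n}\,\big(g(X_n) - g(\theta)\big) \xrightarrow{d} \mathcal{N}\big(0,\ J_g(\theta)\, \Sigma\, J_g(\theta)^\top\big).$$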
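Here is a simulation sketch of the univariate case ($k = m = 1$), where the asymptotic variance reduces to $g'(\theta)^2 \sigma^2$. It assumes exponential data with mean $\theta = 2$ (so $\sigma^2 = \theta^2 = 4$) and $g(x) = x^2$; both are illustrative choices.

Then $g'(\theta) = 2\theta = 4$, and the predicted standard deviation of $\sqrt n\,(g(\bar X_n) - g(\theta))$ is $|g'(\theta)|\,\sigma = 8$.

```python
import math
import random
import statistics

theta = 2.0    # mean of the exponential data (rate 1 / theta)
sigma = theta  # an exponential's standard deviation equals its mean


def g(x: float) -> float:
    return x * x


def g_prime(x: float) -> float:
    return 2 * x


def delta_method_draws(n: int, reps: int = 2_000, seed: int = 5) -> list[float]:
    """Draws of sqrt(n) * (g(mean_n) - g(theta)) over independent samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        mean_n = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        out.append(math.sqrt(n) * (g(mean_n) - g(theta)))
    return out


draws = delta_method_draws(400)
predicted_sd = abs(g_prime(theta)) * sigma  # |g'(theta)| * sigma = 8.0
print(statistics.stdev(draws), predicted_sd)
```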