Limit theorems describe how random variables behave as sample sizes grow. They formalize different types of convergence and provide the theoretical backbone for statistical inference, connecting probability theory to practical data analysis.
This topic covers three modes of convergence (in probability, almost sure, and in distribution), the major theorems associated with each, rates of convergence, and applications to statistical practice.
Convergence in probability
Convergence in probability describes a sequence of random variables getting arbitrarily close to some target value as $n$ grows, in the sense that the probability of a large deviation shrinks to zero. It's weaker than almost sure convergence but still powerful enough to justify many asymptotic results.
Weak law of large numbers
The weak law of large numbers (WLLN) says that the sample mean converges in probability to the population mean. If $X_1, X_2, \ldots$ are i.i.d. with finite mean $\mu$, then for any $\varepsilon > 0$:

$$\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0,$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
This is the formal justification for using sample averages to estimate population means. Roll a fair die many times: the average of your rolls will be close to 3.5 with high probability for large $n$.
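A quick simulation makes this concrete — a minimal sketch using only the standard library, with an illustrative function name and arbitrary sample sizes:

```python
import random

random.seed(0)

def die_average(n):
    """Average of n fair die rolls; the WLLN says this concentrates near 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Larger n pulls the sample mean toward the population mean 3.5.
for n in (100, 10_000, 1_000_000):
    print(n, die_average(n))
```

The deviations shrink roughly like $1/\sqrt{n}$, which is the rate the Berry-Esseen section below quantifies.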
Convergence of random variables
More generally, a sequence $X_n$ converges in probability to a random variable $X$ if, for every $\varepsilon > 0$:

$$\lim_{n \to \infty} P\left(|X_n - X| > \varepsilon\right) = 0.$$

This is denoted $X_n \xrightarrow{P} X$. The key intuition: as $n$ increases, the chance that $X_n$ deviates from $X$ by more than any fixed amount vanishes. For example, the proportion of heads in $n$ fair coin flips converges in probability to 0.5, and the sample variance converges in probability to the population variance.
Continuous mapping theorem
If $X_n \xrightarrow{P} X$ and $g$ is a function that is continuous at the values $X$ takes (more precisely, continuous on a set that $X$ lies in with probability 1), then:

$$g(X_n) \xrightarrow{P} g(X).$$

This theorem lets you "push" convergence through continuous functions. A practical example: if the sample variance $S_n^2$ converges in probability to $\sigma^2$, then the sample standard deviation $S_n$ (which is just $\sqrt{S_n^2}$, a continuous function of $S_n^2$) converges in probability to $\sigma$.
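A small numerical check of exactly this example, using Exponential(1) data as an illustrative choice (its population variance is 1, so both $S_n^2$ and $S_n$ should be near 1):

```python
import math
import random

random.seed(1)

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Exponential(1) has population variance 1. The sample variance converges
# in probability to 1, so its square root (a continuous map) converges to 1.
xs = [random.expovariate(1.0) for _ in range(200_000)]
print(math.sqrt(sample_variance(xs)))  # near 1
```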
Almost sure convergence
Almost sure (a.s.) convergence is a stronger requirement than convergence in probability. A sequence $X_n$ converges almost surely to $X$ if the set of outcomes where $X_n \to X$ has probability one, i.e. $P\left(\lim_{n \to \infty} X_n = X\right) = 1$. Almost sure convergence implies convergence in probability, but the reverse does not hold in general.
Strong law of large numbers
The strong law of large numbers (SLLN) upgrades the WLLN: the sample mean converges to the population mean almost surely, not just in probability. If $X_1, X_2, \ldots$ are i.i.d. with finite mean $\mu$:

$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu.$$
The distinction matters. The WLLN says large deviations become unlikely; the SLLN says that on almost every realization of the sequence, the sample mean eventually stays close to $\mu$ forever. Think of flipping a coin indefinitely: with probability 1, the running proportion of heads converges to 0.5.
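The single-path behavior can be sketched by following one realization of the coin-flip sequence (the snapshot sizes are arbitrary):

```python
import random

random.seed(2)

# Follow ONE realization of the coin-flip sequence; the SLLN says the
# running proportion of heads on this single path converges to 0.5.
heads = 0
snapshots = {}
for n in range(1, 100_001):
    heads += random.random() < 0.5
    if n in (10, 1_000, 100_000):
        snapshots[n] = heads / n
print(snapshots)
```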
Kolmogorov's three-series theorem
This theorem gives necessary and sufficient conditions for the almost sure convergence of a series of independent random variables. The series $\sum_n X_n$ converges a.s. if and only if, for some $A > 0$, all three of the following hold (writing $Y_n = X_n \mathbf{1}_{\{|X_n| \le A\}}$ for the truncated terms):
- $\sum_n P(|X_n| > A)$ converges
- $\sum_n E[Y_n]$ converges
- $\sum_n \operatorname{Var}(Y_n)$ converges
The three conditions control, respectively, the probability of large jumps, the drift of the truncated terms, and the variability of the truncated terms. Applications include determining convergence of random harmonic series and random power series.
Borel-Cantelli lemmas
These two lemmas connect the summability of event probabilities to whether events occur infinitely often (abbreviated "i.o.").
- First Borel-Cantelli lemma: If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$. The events occur only finitely many times, a.s.
- Second Borel-Cantelli lemma: If the events $A_n$ are independent and $\sum_n P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$. The events occur infinitely often, a.s.
The independence condition in the second lemma is critical. Together, these lemmas are a workhorse for proving almost sure results, such as the almost sure recurrence of symmetric random walks on $\mathbb{Z}$.
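A simulation hints at the dichotomy, using independent events with $P(A_n) = 1/n^2$ (summable) versus $P(A_n) = 1/n$ (divergent); the truncation at a finite $N$ is of course only suggestive of the infinite-sequence statement:

```python
import random

random.seed(3)

N = 100_000
# Independent events A_n with P(A_n) = 1/n^2 (summable: only finitely many
# occur, a.s.) versus P(A_n) = 1/n (divergent: infinitely many occur, a.s.,
# by the second Borel-Cantelli lemma).
hits_summable = sum(random.random() < 1 / n**2 for n in range(1, N + 1))
hits_divergent = sum(random.random() < 1 / n for n in range(1, N + 1))
print(hits_summable, hits_divergent)  # a few hits vs a count that keeps growing
```

The divergent case accumulates on the order of $\log N$ hits, and the count keeps growing as $N$ does; the summable case stalls after a handful.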
Convergence in distribution
Convergence in distribution (also called weak convergence) is the weakest of the three modes. A sequence $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{d} X$, if the CDF of $X_n$ converges to the CDF of $X$ at every continuity point of the latter. Convergence in probability implies convergence in distribution, but not vice versa.

Central limit theorem
The CLT is arguably the most important result in probability. If $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, then:

$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
No matter the shape of the underlying distribution, the standardized sample mean is approximately normal for large $n$. This is why the normal distribution appears so often in practice: the sum of many small, independent effects tends toward a Gaussian. It underpins confidence intervals, hypothesis tests, and much of statistical inference.
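A sketch of the CLT in action using heavily skewed Exponential(1) draws (the sample size and repetition count are arbitrary choices): if the standardized mean is approximately standard normal, about 95% of replications should land in $(-1.96, 1.96)$.

```python
import math
import random

random.seed(4)

def standardized_mean(n):
    # Exponential(1) is heavily skewed, with mu = 1 and sigma = 1.
    xs = [random.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (sum(xs) / n - 1.0)

# Fraction of standardized means inside the standard normal's central 95% region.
zs = [standardized_mean(500) for _ in range(2_000)]
coverage = sum(abs(z) < 1.96 for z in zs) / len(zs)
print(coverage)
```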
Characteristic functions
The characteristic function of a random variable $X$ is:

$$\varphi_X(t) = E\left[e^{itX}\right],$$

where $i$ is the imaginary unit. Characteristic functions always exist (unlike moment generating functions) and uniquely determine the distribution.
Their main role in limit theory: pointwise convergence of characteristic functions implies convergence in distribution. That is, if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$, then $X_n \xrightarrow{d} X$. This is often the cleanest way to prove CLT-type results. Classic applications include showing that $\mathrm{Binomial}(n, \lambda/n)$ converges to $\mathrm{Poisson}(\lambda)$, or proving the CLT itself.
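The Binomial-to-Poisson convergence can be checked numerically by comparing the two characteristic functions at a fixed argument (the values of $\lambda$ and $t$ below are arbitrary):

```python
import cmath

# Characteristic functions: Binomial(n, lam/n) has cf (1 - p + p*e^{it})^n,
# Poisson(lam) has cf exp(lam * (e^{it} - 1)). Pointwise convergence of the
# former to the latter gives convergence in distribution.
lam, t = 3.0, 1.3
poisson_cf = cmath.exp(lam * (cmath.exp(1j * t) - 1))
for n in (10, 100, 10_000):
    p = lam / n
    binom_cf = (1 - p + p * cmath.exp(1j * t)) ** n
    print(n, abs(binom_cf - poisson_cf))  # gap shrinks as n grows
```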
Lindeberg-Feller theorem
This theorem generalizes the CLT to independent but not necessarily identically distributed random variables. Let $X_1, X_2, \ldots$ be independent with means $\mu_i$ and variances $\sigma_i^2$, and let $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$.
The Lindeberg condition requires that for every $\varepsilon > 0$:

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} E\left[(X_i - \mu_i)^2 \,\mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}}\right] = 0.$$

Intuitively, this says no single summand dominates the total variance. If the Lindeberg condition holds, then:

$$\frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i) \xrightarrow{d} N(0, 1).$$
The Feller converse states that if the CLT conclusion holds and the individual variances are uniformly negligible ($\max_{i \le n} \sigma_i^2 / s_n^2 \to 0$), then the Lindeberg condition must hold.
Delta method
The delta method translates the asymptotic normality of an estimator into the asymptotic normality of a smooth function of that estimator. If:

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, \sigma^2)$$

and $g$ is differentiable with $g'(\theta) \neq 0$, then:

$$\sqrt{n}\,\big(g(\hat{\theta}_n) - g(\theta)\big) \xrightarrow{d} N\big(0, [g'(\theta)]^2 \sigma^2\big).$$
This is extremely useful in practice. For instance, if you know the sample mean is asymptotically normal, the delta method immediately gives you the asymptotic distribution of smooth transformations such as $\bar{X}_n^2$ or $\log \bar{X}_n$, which you need for constructing confidence intervals for transformed parameters.
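A simulation check for the illustrative choice $g(x) = x^2$ with Exponential(1) data (so $\theta = 1$, $\sigma = 1$): the delta method predicts that $\sqrt{n}\,(\bar{X}_n^2 - 1)$ has asymptotic standard deviation $|g'(1)|\,\sigma = 2$.

```python
import math
import random

random.seed(5)

# Delta method check for g(x) = x^2 with Exponential(1) data:
# sqrt(n) * (Xbar^2 - 1) should be roughly N(0, [g'(1)]^2 * sigma^2) = N(0, 4).
n, reps = 1_000, 2_000
vals = []
for _ in range(reps):
    m = sum(random.expovariate(1.0) for _ in range(n)) / n
    vals.append(math.sqrt(n) * (m ** 2 - 1.0))

mean = sum(vals) / reps
sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / reps)
print(sd)  # the delta method predicts |g'(1)| * sigma = 2
```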
Functional limit theorems
Functional limit theorems extend convergence in distribution from random variables to entire stochastic processes. Instead of a sequence of numbers converging to a number, you have a sequence of random functions converging to a random function, with convergence defined in an appropriate function space.
Donsker's theorem
Donsker's theorem is the functional analogue of the CLT. Let $X_1, X_2, \ldots$ be i.i.d. with distribution function $F$, and let $F_n$ be the empirical distribution function. Then:

$$\sqrt{n}\,(F_n - F) \xrightarrow{d} B \circ F$$

in the space of càdlàg functions, where $B$ is a standard Brownian bridge. The entire rescaled empirical process converges as a process, not just at a single point.
This result is the foundation for distribution-free goodness-of-fit tests and confidence bands for CDFs.
Empirical process theory
Empirical process theory studies the asymptotic behavior of processes built from random samples, such as the empirical distribution function, empirical characteristic function, and empirical moment functions. The goal is to derive limiting distributions for functionals of these processes (suprema, integrals, etc.).
Key applications include:
- The Kolmogorov-Smirnov test, which uses $D_n = \sup_x |F_n(x) - F(x)|$ and relies on the limiting distribution from Donsker's theorem
- The Cramér-von Mises test, which uses the integrated squared difference $n \int (F_n(x) - F(x))^2 \, dF(x)$
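The Kolmogorov-Smirnov machinery can be sketched under the null, with the statistic hand-rolled for Uniform(0,1) data rather than taken from a statistics library; 1.358 is the approximate 95th percentile of the limiting Kolmogorov distribution, so roughly 5% of null samples should exceed it:

```python
import random

random.seed(6)

def ks_statistic(sample):
    """D_n = sup_x |F_n(x) - F(x)| for Uniform(0,1) data, where F(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# Under the null, sqrt(n) * D_n converges to the Kolmogorov distribution,
# whose 95th percentile is about 1.358.
n, reps = 1_000, 1_000
rejections = sum(
    ks_statistic([random.random() for _ in range(n)]) * n ** 0.5 > 1.358
    for _ in range(reps)
)
rate = rejections / reps
print(rate)  # near the nominal 0.05
```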
Brownian motion approximation
Many stochastic processes can be approximated by Brownian motion or related Gaussian processes in the large-sample limit. The key idea is the invariance principle (also called the functional CLT): under suitable conditions, the rescaled partial sum process of a sequence of random variables converges in distribution to Brownian motion.
This extends beyond i.i.d. settings. Versions exist for weakly dependent sequences, martingales (the functional CLT for martingales), and even queueing processes (approximated by reflected Brownian motion). These approximations are central to the study of random walks, diffusion limits, and stochastic simulation.
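A minimal sketch of the invariance principle for a simple $\pm 1$ random walk (the time point and sample sizes below are arbitrary): Brownian motion has $\operatorname{Var}(W_t) = t$, and the rescaled walk should match.

```python
import random

random.seed(7)

def rescaled_walk_at(n, t):
    """W_n(t) = S_{floor(nt)} / sqrt(n) for a simple +/-1 random walk."""
    s = sum(random.choice((-1, 1)) for _ in range(int(n * t)))
    return s / n ** 0.5

# Brownian motion has Var(W_t) = t; check the rescaled walk at t = 0.5.
vals = [rescaled_walk_at(2_000, 0.5) for _ in range(2_000)]
var_est = sum(v * v for v in vals) / len(vals)
print(var_est)  # near 0.5
```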

Rates of convergence
Knowing that a sequence converges is useful, but knowing how fast it converges is often more important in practice. Rates of convergence tell you how good your normal approximation actually is for a given $n$, and they guide the construction of refined approximations.
Berry-Esseen theorem
The Berry-Esseen theorem quantifies the rate of convergence in the CLT. If $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$, variance $\sigma^2$, and finite third absolute moment $\rho = E\left[|X_1 - \mu|^3\right]$, then:

$$\sup_x \left| P\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) - \Phi(x) \right| \le \frac{C \rho}{\sigma^3 \sqrt{n}},$$

where $\Phi$ is the standard normal CDF and $C$ is a universal constant (the best known value is approximately $0.4748$).
The bound is $O(1/\sqrt{n})$, so the normal approximation improves at rate $1/\sqrt{n}$. For skewed distributions (large $\rho/\sigma^3$), the approximation is worse at any given $n$.
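A sketch of how the bound behaves for Exponential(1) data ($\mu = 1$, $\sigma = 1$), estimating $\rho$ by Monte Carlo (the sample count is an arbitrary choice):

```python
import random

random.seed(8)

# Berry-Esseen bound for Exponential(1): mu = 1, sigma = 1, rho = E|X - 1|^3.
# Estimate rho by Monte Carlo, then evaluate the bound C * rho / (sigma^3 * sqrt(n)).
C = 0.4748  # approximately the best known universal constant (i.i.d. case)
samples = 500_000
rho = sum(abs(random.expovariate(1.0) - 1.0) ** 3 for _ in range(samples)) / samples
for n in (10, 100, 10_000):
    print(n, C * rho / n ** 0.5)  # worst-case CDF error shrinks like 1/sqrt(n)
```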
Edgeworth expansions
Edgeworth expansions refine the normal approximation by adding correction terms based on the cumulants of the distribution. The first-order expansion is:

$$P\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) \approx \Phi(x) - \frac{\kappa_3}{6 \sigma^3 \sqrt{n}}\,(x^2 - 1)\,\phi(x),$$

where $\kappa_3$ is the third cumulant (related to skewness) and $\phi$ is the standard normal density.
The $O(1/\sqrt{n})$ correction term captures the leading-order effect of skewness. Higher-order expansions incorporate kurtosis and beyond. These expansions are used to improve the coverage probability of confidence intervals and to justify bootstrap refinements, which achieve accuracy $O(1/n)$ instead of $O(1/\sqrt{n})$.
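A numerical comparison for Exponential(1) data (where $\kappa_3 = 2$ and $\sigma = 1$), evaluated at the illustrative point $x = 0$: the Edgeworth-corrected value should beat the plain normal approximation against a large Monte Carlo estimate of the true probability.

```python
import math
import random

random.seed(9)

# Standardized mean of n = 20 Exponential(1) draws, evaluated at x = 0.
n, reps, x = 20, 200_000, 0.0
hits = sum(
    math.sqrt(n) * (sum(random.expovariate(1.0) for _ in range(n)) / n - 1.0) <= x
    for _ in range(reps)
)
mc_estimate = hits / reps

phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
normal_approx = 0.5                                            # Phi(0)
edgeworth = normal_approx - (2.0 / (6 * math.sqrt(n))) * (x * x - 1) * phi
print(mc_estimate, normal_approx, edgeworth)
```

The skewness correction moves the approximation above 0.5, in the direction the Monte Carlo estimate confirms.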
Sanov's theorem
Sanov's theorem belongs to large deviations theory and describes the exponential rate at which the empirical distribution of i.i.d. random variables deviates from the true distribution. If $X_1, \ldots, X_n$ are i.i.d. with distribution $P$, and $\hat{P}_n$ is the empirical measure, then for a (suitably regular) set $\Gamma$ of probability measures:

$$\frac{1}{n} \log P\big(\hat{P}_n \in \Gamma\big) \to -\inf_{Q \in \Gamma} D(Q \,\|\, P),$$

where $D(Q \,\|\, P)$ is the Kullback-Leibler divergence from $Q$ to $P$.
The probability of the empirical distribution landing in a "wrong" region decays exponentially in $n$, with the rate governed by the KL divergence to the nearest distribution in that region. This has direct applications to error exponents in hypothesis testing and to information-theoretic problems.
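For coin flips, Sanov's theorem reduces to a statement about binomial tails, which can be checked exactly (the threshold 0.7 is an arbitrary illustrative choice):

```python
import math

def binom_tail(n, k0, p=0.5):
    """Exact P(Binomial(n, p) >= k0)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k0, n + 1))

def kl_bernoulli(q, p):
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Sanov for fair coins: P(proportion of heads >= 0.7) decays like
# exp(-n * D(Bernoulli(0.7) || Bernoulli(0.5))).
rate = kl_bernoulli(0.7, 0.5)
for n in (50, 200, 800):
    empirical_rate = -math.log(binom_tail(n, math.ceil(0.7 * n))) / n
    print(n, empirical_rate, rate)  # empirical decay rate approaches the KL rate
```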
Applications of limit theorems
The theorems above are not just theoretical results. They provide the asymptotic machinery that makes most of classical statistics work.
Statistical inference
Limit theorems give the asymptotic distributions of estimators. Maximum likelihood estimators, method of moments estimators, and least squares estimators are all asymptotically normal under regularity conditions, a fact established through the CLT and its extensions. This asymptotic normality is what lets you compute standard errors and build confidence intervals without knowing the exact finite-sample distribution.
Hypothesis testing
Major test procedures rely on limit theorems for their justification:
- The likelihood ratio test statistic is asymptotically $\chi^2$ under the null (Wilks' theorem)
- The Wald test uses the asymptotic normality of the MLE
- The score test uses the asymptotic distribution of the score function
In each case, the CLT or delta method provides the asymptotic null distribution, which determines critical values and p-values.
Confidence intervals
Asymptotic confidence intervals follow directly from the CLT. If $\hat{\theta}_n$ is asymptotically normal with known asymptotic variance $\sigma^2/n$, then an approximate $1 - \alpha$ confidence interval is:

$$\hat{\theta}_n \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$
The delta method extends this to functions of parameters. For example, if you have a confidence interval for a mean $\mu$, the delta method lets you construct one for, say, $\mu^2$ or $e^{\mu}$.
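A sketch of both intervals for simulated Gaussian data with true mean $\mu = 2$, using the illustrative transformation $g(\mu) = \mu^2$ (the function name and sample size are hypothetical choices, not from the text):

```python
import math
import random

random.seed(10)

def ci_for_mean_and_square(xs, z=1.96):
    """Approximate 95% CI for mu, plus a delta-method CI for g(mu) = mu^2."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    se = s / math.sqrt(n)
    ci_mu = (m - z * se, m + z * se)
    se_sq = abs(2 * m) * se          # |g'(m)| * se for g(x) = x^2
    ci_mu_sq = (m * m - z * se_sq, m * m + z * se_sq)
    return ci_mu, ci_mu_sq

xs = [random.gauss(2.0, 1.0) for _ in range(5_000)]
print(ci_for_mean_and_square(xs))  # intervals around mu = 2 and mu^2 = 4
```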
Asymptotic normality
Many statistical methods are built on the assumption that estimators or test statistics are approximately normal for large $n$. The CLT, Lindeberg-Feller theorem, and delta method are the primary tools for establishing this. Once asymptotic normality is proven, you can use standard normal quantiles for inference, which greatly simplifies both computation and interpretation. This applies to MLEs under regularity conditions, to likelihood ratio statistics, and to a wide range of M-estimators and U-statistics.