
Law of large numbers

The law of large numbers (LLN) is a foundational theorem of probability theory stating that, under appropriate conditions, the average of the results obtained from a large number of independent trials of a random experiment converges to the expected value of the quantity being measured. This principle underpins much of statistics by justifying the reliability of sample means as estimators for population parameters when sample sizes grow sufficiently large.

The LLN exists in two primary forms: the weak law of large numbers (WLLN) and the strong law of large numbers (SLLN). The WLLN asserts that the sample mean converges in probability to the expected value μ, meaning that for any ε > 0, the probability that the absolute difference between the sample mean and μ exceeds ε approaches zero as the sample size n tends to infinity; formally, P(|\bar{X}_n - \mu| \geq \varepsilon) \to 0 as n \to \infty. A classic version, due to Chebyshev, applies to uncorrelated random variables with finite variance, where the variance of the sample mean diminishes to zero. In contrast, the SLLN provides a stronger guarantee of convergence almost surely (that is, with probability 1), so \bar{X}_n \to \mu almost surely as n \to \infty. Kolmogorov's 1933 theorem establishes the SLLN for independent random variables with finite expectations, holding when they are identically distributed or when \sum_{n=1}^\infty \frac{\mathrm{Var}(X_n)}{n^2} < \infty.

Historically, the LLN traces its origins to Jacob Bernoulli's seminal 1713 work Ars Conjectandi, where he proved a version of the WLLN for Bernoulli trials, demonstrating that the relative frequency of successes approximates the probability p with high probability for sufficiently large n. Bernoulli's result, often called his "golden theorem," marked the first rigorous limit theorem in probability and addressed both convergence and the "inverse problem" of determining sample sizes for desired precision. Subsequent developments by figures such as Abraham de Moivre (1733), who introduced normal approximations to binomial distributions, Pierre-Simon Laplace (1814), who refined Bayesian applications, and Siméon Denis Poisson (1837), who coined the term "law of large numbers," generalized the theorem to broader settings. In the 19th century, Pafnuty Chebyshev and Irénée-Jules Bienaymé developed key inequalities supporting proofs of the WLLN, while 20th-century contributions by Andrey Markov, Alexander Khinchin, and Andrei Kolmogorov extended the laws to dependent variables and established the modern SLLN framework.

Beyond mathematics, the LLN has profound applications in fields like statistics, economics, physics, and insurance, explaining phenomena such as why repeated measurements yield stable averages and enabling risk assessment in large-scale systems. For instance, in quality control, it assures that defect rates in mass production align closely with true probabilities as production volumes increase. Extensions continue to evolve, incorporating heavy-tailed distributions and non-i.i.d. sequences, ensuring the LLN's enduring relevance in contemporary probability research.

Overview and Intuition

Intuitive Explanation

The law of large numbers describes the phenomenon where, as the number of independent trials of a random experiment increases, the average outcome of those trials tends to approach the experiment's expected value, providing a sense of predictability in repeated processes. This principle is often misconstrued as the "law of averages," which wrongly suggests that short-term deviations, such as a streak of unfavorable outcomes, will quickly balance out; in reality, it applies only to long-run behavior, where fluctuations become relatively insignificant over many repetitions. In practical settings, this leads to greater stability in observed proportions as sample sizes grow. For instance, in election polling, surveying a small group might yield erratic results due to chance, but polling thousands of voters produces a proportion of support for a candidate that closely mirrors the true population preference, with variability diminishing as the sample expands. Similarly, in quality control at a manufacturing facility, inspecting a few items from a production batch may show inconsistent defect rates, but examining hundreds or thousands reveals a stable proportion that reliably indicates overall product quality. These examples highlight how larger samples reduce the impact of random variation, making estimates more trustworthy. The law relies on the trials being independent—meaning the outcome of one does not influence another—and identically distributed, ensuring each has the same underlying probability structure, which allows the collective average to settle near the expected value. Qualitatively, this convergence can be visualized in a graph where the distribution of sample averages starts broad and variable for small numbers of trials but narrows and centers tightly around the true expected value as the number of trials increases, illustrating the increasing reliability of the average. There exist weak and strong forms of the law, differing in how rigorously they guarantee this convergence, as explored in more technical treatments.

Simple Examples

One of the most straightforward illustrations of the law of large numbers is the repeated tossing of a fair coin, where the probability of landing heads is p = 0.5. In this scenario, the expected proportion of heads is 0.5, and as the number of tosses n grows large, the observed proportion of heads in the sample converges to this value, demonstrating how individual random outcomes average out. To see this in practice, consider simulated runs of coin tosses; as the sample size increases, the proportion of heads stabilizes around 0.5, though it may vary in small samples due to chance. Another basic example involves rolling a fair six-sided die, where each face from 1 to 6 is equally likely, yielding an expected value of \mu = 3.5. The average of the outcomes from multiple rolls will tend toward 3.5 as the number of rolls grows, with early rolls showing more fluctuation but later averages settling closer to the mean. For instance, small samples may deviate noticeably, but larger samples yield averages much closer to 3.5. The law also applies to polling, where the goal is to estimate a population proportion, such as the fraction of voters supporting a candidate. By sampling randomly, the sample proportion \hat{p} approaches the true proportion p as the sample size increases; larger polls reduce the impact of sampling variability, providing more reliable estimates. These examples collectively demonstrate how the law of large numbers enables the averaging out of randomness: individual trials may deviate from the expected value due to chance, but with sufficiently many independent repetitions, the sample average reliably approximates the true expectation, providing a foundation for reliable predictions in probabilistic settings.
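To make these examples concrete, the following minimal Python sketch (an illustration added here, not part of any formal treatment) simulates repeated coin tosses and die rolls and prints the running proportion of heads and the running average of die faces at several sample sizes; the seed and sample sizes are arbitrary choices.

```python
# Illustrative simulation of the coin-toss and die-roll examples: running averages
# drift toward the expected values 0.5 and 3.5 as the number of trials grows.
import random

random.seed(42)  # fixed seed so the run is reproducible

def running_average(samples):
    """Return the sequence of running averages of `samples`."""
    total, averages = 0.0, []
    for i, x in enumerate(samples, start=1):
        total += x
        averages.append(total / i)
    return averages

n = 100_000
coin = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]  # 1 = heads
die = [random.randint(1, 6) for _ in range(n)]

coin_avg = running_average(coin)
die_avg = running_average(die)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={k:>6}  heads proportion={coin_avg[k - 1]:.4f}  die average={die_avg[k - 1]:.4f}")
```

Early checkpoints typically wander noticeably, while the later ones cluster near 0.5 and 3.5, mirroring the behavior described above.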

Historical Development

Early Contributions

The origins of the law of large numbers trace back to early explorations of probability in the context of games of chance during the Renaissance. In the 16th century, Italian mathematician Gerolamo Cardano, in his manuscript Liber de ludo aleae (written in the 16th century and published posthumously in 1663), observed that repeated plays of dice or card games tend to produce outcomes that average out to expected values, anticipating the stabilizing effect of large numbers of trials without a formal proof. This intuitive insight arose from practical gambling analysis, where Cardano calculated odds and expectations for various games, laying groundwork for probabilistic reasoning.

Building on such ideas, Dutch mathematician Christiaan Huygens advanced the field in 1657 with De Ratiociniis in Ludo Aleae (On Reasoning in Games of Chance), the first systematic treatise on probability. Huygens focused on expected values in fair games, solving the "problem of points" for dividing stakes in interrupted plays and establishing equivalence of chances through mathematical ratios, which influenced subsequent developments in chance theory without directly addressing convergence in large samples. His work provided a combinatorial foundation that Jacob Bernoulli later expanded.

The first rigorous formulation emerged in Jacob Bernoulli's posthumously published Ars Conjectandi in 1713, where he presented his "golden theorem" as a cornerstone of conjectural arithmetic. Bernoulli proved that in a sequence of independent Bernoulli trials with success probability p, the sample proportion converges in probability to p: for any \varepsilon > 0 and confidence level 1 - 1/(c+1) (where c > 0), there exists a finite n_0(\varepsilon, c) such that for n \geq n_0, the probability of the proportion deviating from p by more than \varepsilon is less than 1/(c+1). This bound, derived via an inequality akin to Markov's (pre-dating Markov), demonstrated that arbitrary precision could be achieved with sufficiently many trials, justifying the use of empirical frequencies to estimate unknown probabilities.

In the 19th century, French mathematicians Pierre-Simon Laplace and Siméon-Denis Poisson generalized and refined these ideas. Laplace, in his Théorie Analytique des Probabilités (first edition 1812), extended Bernoulli's result to inverse problems using Bayesian methods, showing that for p successes in p + q trials, the probability that the true parameter \theta lies within a specified interval around the observed proportion exceeds 1 - \delta for large samples, with error estimates based on normal approximations. Poisson further broadened the theorem in Recherches sur la probabilité des jugements (1837), coining the term "law of large numbers" and proving a version for independent but non-identically distributed trials with bounded expectations, where the average converges in probability to the average of the expected values, including explicit bounds on deviation probabilities for judicial and civil applications. These expansions shifted focus from Bernoulli-trial cases to broader classes of random variables, emphasizing practical error control in large datasets.

Later Formalizations

The formal statements of the law of large numbers evolved significantly in the 19th and early 20th centuries, moving from the Bernoulli-trial setting treated by Jacob Bernoulli in the late 17th century to more general formulations applicable to broader classes of random variables. A pivotal advancement came with Irénée-Jules Bienaymé's 1853 proof of the weak law for independent random variables, using his inequality to show convergence in probability to the expected value. This was followed in 1867 by Pafnuty Chebyshev's generalization, which extended the law beyond Bernoulli trials to independent and identically distributed (i.i.d.) random variables possessing finite variance. In his paper "Des valeurs moyennes," Chebyshev employed what is now known as the Bienaymé-Chebyshev inequality to establish that the sample average converges in probability to the expected value, providing the first rigorous weak law under these conditions. This work marked a shift toward using moment-based inequalities to bound deviations, laying groundwork for subsequent developments.

Andrey Markov contributed further in the early 20th century, with his textbook Ischislenie Veroiatnostei (Calculus of Probabilities) presenting early statements of the weak law and rigorizing related limit theorems, including aspects of Chebyshev's work. Markov's 1906 paper, "Rasširenije zakona bol'šich čisel na veličiny, zavisyjaščie drug ot druga" (Extension of the law of large numbers to quantities depending one upon another), provided the first weak-law versions for dependent variables, showing that convergence in probability can persist under suitably controlled dependence, thus challenging the necessity of full independence. These efforts highlighted the robustness of the law when assumptions on variable interactions are weakened. Aleksandr Lyapunov's 1901 work served as a historical bridge to more advanced limit theorems: his rigorous proof of the central limit theorem, carried out with characteristic functions rather than the method of moments, implicitly supported generalizations of the law of large numbers by demonstrating asymptotic normality for independent variables under conditions on third moments. This connected the law's convergence properties to distributional approximations.

The culmination of these formalizations occurred with Andrey Kolmogorov's contributions, including his 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability), which established the first general strong law of large numbers for i.i.d. random variables with finite mean, proving almost sure convergence of the sample average to the expectation without requiring finite variance. Kolmogorov's earlier 1930 paper had provided sufficient conditions under finite variance assumptions. The 1933 work integrated this result into an axiomatic framework, solidifying the strong law as a cornerstone of modern probability theory.

Formal Statements

Weak Law of Large Numbers

The weak law of large numbers (WLLN) asserts that the sample average of a sequence of independent and identically distributed random variables converges in probability to their common expected value, provided the expectation is finite. Let X_1, X_2, \dots be a sequence of independent and identically distributed random variables, each with finite expected value \mu = \mathbb{E}[X_i]. Define the partial sum S_n = \sum_{i=1}^n X_i and the sample average \bar{X}_n = S_n / n. The weak law states that \bar{X}_n converges in probability to \mu as n \to \infty. Convergence in probability means that for every \varepsilon > 0, \lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0. This implies that the probability of the sample average deviating from the true mean by more than any fixed positive amount \varepsilon approaches zero as the number of observations grows. The key assumptions are that the random variables are independent and identically distributed, with each having finite expectation \mathbb{E}[X_i] = \mu < \infty. No finite variance is required for this general form of the theorem. The term "weak" refers to the fact that the convergence is in probability rather than almost surely, which is a stronger mode of convergence addressed in related results. This result originated with Pafnuty Chebyshev's work in the 19th century, where he established an early version under the additional assumption of finite variance.
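As a rough numerical check of the statement (a sketch under assumed parameters: Exponential(1) variables with μ = 1, ε = 0.1, and 2,000 replications per sample size), one can estimate the deviation probability P(|X̄_n − μ| > ε) by simulation and watch it shrink with n:

```python
# Monte Carlo estimate of P(|sample mean - mu| > eps) for i.i.d. Exponential(1)
# variables; the WLLN predicts this probability tends to 0 as n grows.
import random

random.seed(0)
mu, eps, reps = 1.0, 0.1, 2000

def sample_mean(n):
    # mean of n i.i.d. Exponential(1) draws
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in (10, 100, 1000):
    exceed = sum(abs(sample_mean(n) - mu) > eps for _ in range(reps))
    print(f"n={n:>4}  estimated P(|mean - mu| > {eps}) = {exceed / reps:.3f}")
```

The estimated probabilities fall from well above one half toward essentially zero, which is exactly the convergence in probability asserted by the weak law.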

Strong Law of Large Numbers

The strong law of large numbers provides a stronger guarantee than its weak counterpart by asserting convergence with probability 1, rather than merely in probability. Formally, if \{X_i\}_{i=1}^\infty is a sequence of independent and identically distributed random variables with finite expected value \mu = \mathbb{E}[X_1], then the sample mean \bar{X}_n = S_n / n, where S_n = \sum_{i=1}^n X_i, converges almost surely to \mu as n \to \infty. This is expressed mathematically as P\left( \left\{ \omega \in \Omega : \lim_{n \to \infty} \frac{S_n(\omega)}{n} = \mu \right\} \right) = 1, meaning that the set of outcomes where the limit fails has probability zero. The key assumption for this result is that the random variables are i.i.d. and possess a finite first absolute moment, i.e., \mathbb{E}[|X_1|] < \infty, which ensures the existence of \mu and prevents divergence due to heavy tails. This version of the theorem, often called Kolmogorov's strong law of large numbers, was established by Andrey Kolmogorov, who showed that these conditions are necessary and sufficient for almost sure convergence in the i.i.d. case. The "strong" aspect refers to pathwise convergence: for almost every sample path \omega, the sequence S_n(\omega)/n converges to \mu as n grows, providing certainty that the average stabilizes at the true mean over sufficiently long sequences, barring a negligible set of exceptions. This almost sure convergence implies the weak law, where convergence holds in probability. N. Etemadi showed that the SLLN still holds if the random variables \{X_i\}_{i=1}^\infty are only required to be pairwise independent, not necessarily mutually independent.
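The pathwise character of the strong law can be illustrated with a small simulation (an informal sketch; the Uniform(0,1) distribution with μ = 0.5, the seed, and the checkpoints are arbitrary choices made here): each independently generated sample path of running means should individually settle near 0.5.

```python
# Three independent sample paths of running means of Uniform(0,1) draws; the SLLN
# says almost every such path converges to mu = 0.5.
import random

random.seed(7)
checkpoints = {10, 100, 1_000, 10_000, 100_000}

for path in range(1, 4):
    total, notes = 0.0, []
    for n in range(1, 100_001):
        total += random.random()   # one Uniform(0,1) observation
        if n in checkpoints:
            notes.append(f"n={n}: {total / n:.4f}")
    print(f"path {path}: " + ", ".join(notes))
```

Unlike the weak-law check, which looks at the probability of a deviation at a fixed n, this view follows each individual trajectory and shows it stabilizing, which is what almost sure convergence promises.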

Key Differences and Variants

The weak law of large numbers (WLLN) establishes convergence in probability of the sample mean to the expected value, meaning that for any ε > 0, the probability of the absolute deviation exceeding ε approaches zero as the sample size n increases to infinity. This form is typically easier to prove, often relying on moment conditions and inequalities like Chebyshev's, and it applies more broadly under weaker moment or dependence assumptions. In contrast, the strong law of large numbers (SLLN) requires almost sure convergence, where the sample mean converges to the expected value with probability 1, providing a pathwise guarantee that holds for almost every realization of the sequence. The SLLN implies the WLLN, as almost sure convergence entails convergence in probability, but the reverse does not hold; counterexamples exist where the WLLN is satisfied yet the SLLN fails, particularly with dependent random variables that exhibit occasional large deviations infinitely often, disrupting pathwise stability while still controlling probabilistic deviations.

The uniform law of large numbers extends the classical LLN to families of distributions or functions, asserting that the supremum over a parameter space (e.g., a class of measurable functions) of the difference between the empirical average and the true expectation converges to zero, either in probability or almost surely. This variant is essential in statistical learning and empirical process theory, ensuring uniform consistency across a family of models, such as in non-parametric estimation, under conditions that limit the complexity of the function class. Borel's law of large numbers provides a specialized strong law for independent random variables bounded in [0, 1], guaranteeing almost sure convergence of the sample mean to the expected value without additional finite variance assumptions beyond the inherent finiteness implied by boundedness. This early formulation, proved in the context of Bernoulli trials, laid groundwork for broader SLLN results and applies directly to binary or indicator processes.

Other variants adapt the LLN to structured dependence. For ergodic Markov chains, the SLLN holds for time averages of functions of the state, converging to the stationary distribution's expectation, facilitating analysis in stochastic processes like queueing systems; a simulation illustrating this appears below. Similarly, for martingale sequences with bounded increments or square-integrable differences, strong laws ensure that normalized partial sums converge to zero almost surely, with applications in sequential analysis.
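The Markov-chain variant mentioned above can be illustrated with a minimal two-state example (an illustrative sketch; the transition probabilities 0.3 and 0.6 and the seed are assumptions made here, not taken from the text): the long-run fraction of time spent in state 0 approaches the stationary probability 2/3.

```python
# Time average for an ergodic two-state Markov chain: P(0->1)=0.3, P(1->0)=0.6,
# so the stationary distribution puts mass 0.6/(0.3+0.6) = 2/3 on state 0.
import random

random.seed(1)
p01, p10 = 0.3, 0.6
stationary_0 = p10 / (p01 + p10)

state, visits_to_0 = 0, 0
for step in range(1, 200_001):
    visits_to_0 += (state == 0)
    if state == 0:
        state = 1 if random.random() < p01 else 0
    else:
        state = 0 if random.random() < p10 else 1
    if step in (100, 10_000, 200_000):
        print(f"step={step:>7}  fraction of time in state 0 = {visits_to_0 / step:.4f} "
              f"(stationary value {stationary_0:.4f})")
```

Although consecutive states are clearly dependent, the time average still converges, which is the content of the Markov-chain strong law.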

Proof Techniques

Proofs for the Weak Law

The weak law of large numbers (WLLN) states that for a sequence of independent and identically distributed (i.i.d.) random variables X_1, X_2, \dots with finite mean \mu = \mathbb{E}[X_i], the sample average \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i satisfies \bar{X}_n \xrightarrow{P} \mu as n \to \infty, where convergence in probability means that for every \varepsilon > 0, \mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \to 0.

One standard proof of the WLLN relies on Chebyshev's inequality and assumes that the X_i have finite variance \sigma^2 = \mathbb{E}[(X_i - \mu)^2] < \infty. This condition ensures the variance of the sample average vanishes as n grows. By linearity of expectation, \mathbb{E}[\bar{X}_n] = \mu. The variance is then \text{Var}(\bar{X}_n) = \text{Var}\left( \frac{1}{n} \sum_{i=1}^n X_i \right) = \frac{1}{n^2} \sum_{i=1}^n \text{Var}(X_i) = \frac{\sigma^2}{n}, since the X_i are independent and identically distributed. Applying Chebyshev's inequality, which states that for any random variable Y with finite mean m and variance v, \mathbb{P}(|Y - m| > \varepsilon) \leq v / \varepsilon^2 for \varepsilon > 0, yields \mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n \varepsilon^2}. As n \to \infty, the right-hand side tends to 0, so \mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \to 0, establishing convergence in probability. This proof, originally due to Chebyshev in the context of his inequality, demonstrates the WLLN under the finite variance assumption.

A more general proof of the WLLN, which requires only finite mean \mu without assuming finite variance, uses characteristic functions and Lévy's continuity theorem. The characteristic function of a random variable X is \phi_X(t) = \mathbb{E}[e^{itX}] for t \in \mathbb{R}. For the i.i.d. sequence, the characteristic function of \bar{X}_n is \phi_{\bar{X}_n}(t) = \mathbb{E}\left[ e^{it \bar{X}_n} \right] = \left( \phi_{X_1}\left( \frac{t}{n} \right) \right)^n, since \bar{X}_n is a normalized sum of independent variables and characteristic functions of independent summands multiply. To show \bar{X}_n \xrightarrow{P} \mu, it suffices to prove that \phi_{\bar{X}_n}(t) \to e^{it\mu} pointwise for all t: by Lévy's continuity theorem, pointwise convergence of characteristic functions to that of the constant \mu implies convergence in distribution to the point mass at \mu, and convergence in distribution to a constant is equivalent to convergence in probability. Consider the Taylor expansion of the characteristic function around 0: since \mathbb{E}[|X_1|] < \infty, \phi_{X_1}(u) = 1 + i\mu u + o(u) as u \to 0. Thus, \phi_{X_1}\left( \frac{t}{n} \right) = 1 + i\mu \frac{t}{n} + o\left( \frac{1}{n} \right). Taking the logarithm, \log \phi_{X_1}\left( \frac{t}{n} \right) = i\mu \frac{t}{n} + o\left( \frac{1}{n} \right), and exponentiating the n-fold sum gives \phi_{\bar{X}_n}(t) = \exp\left( n \log \phi_{X_1}\left( \frac{t}{n} \right) \right) = \exp\left( n \left( i\mu \frac{t}{n} + o\left( \frac{1}{n} \right) \right) \right) = \exp\left( i\mu t + o(1) \right) \to e^{i\mu t} as n \to \infty. The o(1/n) remainder is justified by dominated convergence: using the bound |e^{iu} - 1 - iu| \leq \min(2|u|, u^2/2), one has n \, \mathbb{E}\left[ \left| e^{itX_1/n} - 1 - \tfrac{itX_1}{n} \right| \right] \to 0 because the integrand, after scaling by n, is dominated by 2|t||X_1|, which is integrable. This Fourier-analytic approach extends the WLLN to the finite-mean case, a result usually attributed to Khinchin.
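A quick numerical companion to the Chebyshev-based proof (a sketch with assumed ingredients: Uniform(0,1) draws, so μ = 0.5 and σ² = 1/12, with ε = 0.05 and 2,000 replications) compares the bound σ²/(nε²) with a simulated estimate of the deviation probability; the bound is loose but always dominates, and both shrink as n grows.

```python
# Compare the Chebyshev bound sigma^2 / (n * eps^2) with an empirical estimate of
# P(|sample mean - mu| > eps) for i.i.d. Uniform(0,1) variables.
import random

random.seed(3)
mu, var, eps, reps = 0.5, 1.0 / 12.0, 0.05, 2000

for n in (20, 200, 2000):
    exceed = 0
    for _ in range(reps):
        mean_n = sum(random.random() for _ in range(n)) / n
        exceed += abs(mean_n - mu) > eps
    bound = min(var / (n * eps * eps), 1.0)   # probabilities are capped at 1
    print(f"n={n:>4}  empirical={exceed / reps:.3f}  Chebyshev bound={bound:.3f}")
```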

Proof for the Strong Law

The strong law of large numbers, stating that the sample average S_n / n converges almost surely to the expected value \mu = \mathbb{E}[X_1] for independent and identically distributed (i.i.d.) random variables X_1, X_2, \dots with finite first moment \mathbb{E}[|X_1|] < \infty, was proved by Kolmogorov in 1933 using measure-theoretic tools and the Borel-Cantelli lemma. This proof assumes only the finite expectation condition, which is necessary and sufficient for i.i.d. sequences. (Kolmogorov's earlier 1930 result established the SLLN under the additional finite variance assumption.)

Kolmogorov's approach begins by centering the variables so that \mu = 0 (without loss of generality, by considering X_i - \mu) and employs truncation to handle the finite moment assumption. Define the truncated variables Y_n = X_n \mathbf{1}_{\{|X_n| \leq n\}}, where \mathbf{1} is the indicator function, and let S_n^{(Y)} = \sum_{k=1}^n Y_k. The goal is to show S_n^{(Y)} / n \to 0 almost surely and that the truncation error (S_n - S_n^{(Y)}) / n \to 0 almost surely.

To establish that the truncation error vanishes almost surely, note that |S_n - S_n^{(Y)}| / n \leq (1/n) \sum_{k=1}^n |X_k| \mathbf{1}_{\{|X_k| > k\}}. The events A_k = \{ |X_k| > k \} satisfy \sum_{k=1}^\infty P(A_k) = \sum_{k=1}^\infty P(|X_1| > k) < \infty, since the finite expectation implies \mathbb{E}[|X_1|] = \int_0^\infty P(|X_1| > t) \, dt < \infty, and the sum is a Riemann approximation to this integral. By the first Borel-Cantelli lemma, which states that if \sum_k P(A_k) < \infty then P(\limsup_k A_k) = P(A_k \text{ i.o.}) = 0, only finitely many A_k occur almost surely. Thus, for sufficiently large n, the sum involves only finitely many non-zero terms, each fixed, so divided by n it tends to 0 almost surely.

For the truncated sum, \mathbb{E}[Y_n] \to 0 by the dominated convergence theorem, so \mathbb{E}[S_n^{(Y)}] / n \to 0 by Cesàro averaging. To show (S_n^{(Y)} - \mathbb{E}[S_n^{(Y)}]) / n \to 0 almost surely, one standard approach (following modern expositions) considers the series \sum_k (Y_k - \mathbb{E}[Y_k])/k. Since the Y_k - \mathbb{E}[Y_k] are independent and centered, Kolmogorov's convergence criterion for random series applies once \sum_k \mathrm{Var}(Y_k)/k^2 < \infty is verified; a standard computation gives \sum_{k=1}^\infty \frac{\mathrm{Var}(Y_k)}{k^2} \leq \sum_{k=1}^\infty \frac{\mathbb{E}[X_1^2 \mathbf{1}_{\{|X_1| \leq k\}}]}{k^2} \leq C \, \mathbb{E}[|X_1|] < \infty for a universal constant C, by exchanging the order of summation and expectation. The series \sum_k (Y_k - \mathbb{E}[Y_k])/k therefore converges almost surely, and Kronecker's lemma yields (S_n^{(Y)} - \mathbb{E}[S_n^{(Y)}])/n \to 0 almost surely, completing the argument.

An alternative proof, due to Etemadi in 1981, extends the strong law to pairwise independent random variables (not necessarily fully independent) under the same finite mean assumption, using a similar truncation but relying on convergence along geometrically growing subsequences rather than maximal inequalities. This approach simplifies some steps by avoiding full independence while preserving the almost sure convergence.
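The summability step that feeds the Borel-Cantelli argument can be checked numerically in a simple case (an illustrative sketch assuming an Exponential(1) distribution, for which P(X > k) = e^{-k} and E[X] = 1): the series of tail probabilities is finite and bounded by the mean, exactly as the inequality \sum_k P(|X_1| > k) \leq \mathbb{E}[|X_1|] requires.

```python
# Numerical check of sum_k P(X > k) <= E[X] for X ~ Exponential(1):
# P(X > k) = exp(-k), so the series equals 1/(e - 1) ≈ 0.582, below E[X] = 1.
import math

tail_sum = sum(math.exp(-k) for k in range(1, 200))  # truncation; later terms are negligible
print(f"sum_k P(X > k) ≈ {tail_sum:.4f}  <=  E[X] = 1.0")
```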

Limitations and Conditions

Required Assumptions

The weak law of large numbers (WLLN) requires that the random variables X_1, X_2, \dots be independent and identically distributed (i.i.d.) with a finite expected value \mathbb{E}[X_i] = \mu < \infty. A specific version, often proved using Chebyshev's inequality, additionally assumes finite variance \mathrm{Var}(X_i) = \sigma^2 < \infty to bound the probability of deviation from the mean. The strong law of large numbers (SLLN), as formalized by Kolmogorov, holds under the weaker condition of i.i.d. random variables with finite absolute expectation \mathbb{E}[|X_i|] < \infty, without requiring finite variance. Mutual independence among the X_i is crucial, as it ensures that the joint behavior of the partial sums can be controlled through tools such as Kolmogorov's maximal inequality, preventing persistent dependencies that could undermine convergence; weaker conditions, such as pairwise independence or mixing, suffice in some variants but demand additional moment controls. The identical distribution assumption can be relaxed in extensions to stationary ergodic sequences, where the ergodic theorem guarantees convergence of the sample mean to the expected value under finite expectation, provided the process mixes sufficiently to average out initial conditions over time. When the mean is infinite, such as in distributions with heavy tails where \mathbb{E}[|X|] = \infty, the sample average fails to converge to any constant and may diverge or fluctuate indefinitely, as rare but extreme realizations dominate the sum.

Cases of Failure

The law of large numbers fails to hold in several scenarios where its underlying assumptions, such as the existence of a finite mean, independence of random variables, or identical distributions, are violated. These counterexamples illustrate the boundaries of the theorem, showing how deviations from the required conditions can prevent the sample mean from converging to the expected value, either in probability or almost surely.

A prominent case of failure occurs when the random variables have an infinite mean, violating the finite expectation assumption. For independent and identically distributed (i.i.d.) random variables drawn from a standard Cauchy distribution, which has probability density function f(x) = \frac{1}{\pi (1 + x^2)} and no defined mean (since \mathbb{E}[|X|] = \infty), the sample mean \bar{X}_n = S_n / n does not converge in probability to any constant. Remarkably, \bar{X}_n retains the standard Cauchy distribution for every n, as its characteristic function is \mathbb{E}[e^{i t \bar{X}_n}] = e^{-|t|}. Consequently, for any \epsilon > 0, P(|\bar{X}_n| > \epsilon) does not approach 0; for example, P(|\bar{X}_n| > 1) = 0.5 for all n. This exact non-convergence highlights the necessity of the finite mean condition, as the heavy tails of the Cauchy distribution cause persistent large deviations.

Dependence among the random variables can also cause the law to fail, even if individual expectations are finite. Consider a sequence where X_1 is a random variable with finite mean \mu, and X_{n+1} = X_n + 1 for n \geq 1, making the variables strongly dependent. This implies X_n = X_1 + (n-1), so the partial sum is S_n = n X_1 + n(n-1)/2 and the sample mean is \bar{X}_n = X_1 + (n-1)/2. As n \to \infty, \bar{X}_n \to +\infty almost surely, diverging despite the finite (though increasing) expectations \mathbb{E}[X_n] = \mu + n - 1. This constructed dependence structure demonstrates how perfect positive correlation can override the averaging effect required for convergence.

When the random variables are independent but not identically distributed, particularly with variances increasing too rapidly, the weak law fails. A classic counterexample involves X_k = k Z_k for k = 1, \dots, n, where the Z_k are i.i.d. with mean 0 and variance 1 (e.g., standard normal). Each X_k has mean 0, but the variance of the sample mean is \mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{k=1}^n k^2 \mathrm{Var}(Z_k) \approx \frac{n}{3}, which diverges to infinity. In the Gaussian case, \bar{X}_n is normal with variance roughly n/3, so P(|\bar{X}_n| > \epsilon) \to 1 rather than 0, and \bar{X}_n cannot converge in probability to 0. This shows that uniform boundedness or controlled growth of second moments is essential even under independence.

For the strong law, failure can arise via the Borel-Cantelli lemmas when the probabilities of large deviations do not sum to a finite value. Consider independent random variables with X_1 = 0 and, for n \geq 2, X_n = \frac{n}{\epsilon} with probability p_n = \frac{1}{n \log n} and X_n = 0 otherwise, for fixed \epsilon \in (0, 1). Each has mean \mathbb{E}[X_n] = \frac{1}{\epsilon \log n} \to 0, but P(X_n = n/\epsilon) = p_n = \frac{1}{n \log n}, and \sum_{n=2}^\infty p_n = \infty. By the second Borel-Cantelli lemma (since the events are independent), X_n = n/\epsilon occurs infinitely often almost surely. On each such occasion \bar{X}_n \geq X_n / n = 1/\epsilon, so \limsup_{n \to \infty} \bar{X}_n \geq 1/\epsilon > 0 almost surely, preventing almost sure convergence to 0. This underscores the role of tail summability in strong law proofs.
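The Cauchy counterexample can be watched directly in simulation (an illustrative sketch; the inverse-CDF sampler, seed, and checkpoints are choices made here): the running mean never settles down, in contrast with the finite-mean examples earlier in the article.

```python
# Running means of i.i.d. standard Cauchy draws do not converge: the sample mean
# of n Cauchy variables is again standard Cauchy, so large jumps keep occurring.
import math
import random

random.seed(11)

def cauchy():
    # inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0,1)
    return math.tan(math.pi * (random.random() - 0.5))

total = 0.0
for n in range(1, 100_001):
    total += cauchy()
    if n in (10, 1_000, 10_000, 50_000, 100_000):
        print(f"n={n:>6}  running mean = {total / n:+.3f}")
```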

Implications and Extensions

The law of large numbers (LLN) describes the convergence of the sample mean to the population mean, while the central limit theorem (CLT) provides a more refined asymptotic description by specifying that the standardized sample mean, (\bar{X}_n - \mu) scaled by \sqrt{n}/\sigma, converges in distribution to a standard normal N(0,1), or equivalently, \sqrt{n}(\bar{X}_n - \mu) converges to N(0, \sigma^2). The LLN thus captures the deterministic trend of convergence, whereas the CLT quantifies the random fluctuations around the mean, which are of order 1/\sqrt{n}, enabling approximations for the distribution of sums even for non-normal parent distributions under mild moment conditions.

The law of the iterated logarithm (LIL) extends the strong LLN by characterizing the precise order of the almost sure fluctuations of the partial sums S_n = \sum_{i=1}^n X_i beyond the mean n\mu. Specifically, for i.i.d. random variables with mean \mu and finite variance \sigma^2 > 0, \limsup_{n \to \infty} \frac{S_n - n\mu}{\sqrt{2 n \sigma^2 \log \log n}} = 1 \quad \text{almost surely}, with a corresponding liminf of -1, indicating that the deviations oscillate and reach up to the boundary \sqrt{2 n \sigma^2 \log \log n} infinitely often but do not exceed it asymptotically. This result sharpens the LLN by showing that the convergence is not faster than this logarithmic scale, providing the optimal boundary for the growth of deviations.

The ergodic theorem generalizes the strong LLN from i.i.d. sequences to dependent processes that are stationary and ergodic. For a measure-preserving transformation T on a probability space (\Omega, \mathcal{F}, P) and an integrable function f: \Omega \to \mathbb{R}, the theorem states that the time average \frac{1}{n} \sum_{k=0}^{n-1} f(T^k \omega) \to \int f \, dP = \mathbb{E}[f] as n \to \infty for almost every \omega, equating the temporal average along orbits to the spatial (ensemble) average. Thus, under ergodicity, long-run averages match expectations, extending the LLN to dynamical systems and stationary sequences where independence does not hold.

The Glivenko-Cantelli theorem provides a uniform strong law of large numbers for empirical distribution functions. For i.i.d. random variables X_1, \dots, X_n with common cumulative distribution function F, the empirical CDF F_n(x) = n^{-1} \sum_{i=1}^n \mathbf{1}_{\{X_i \leq x\}} satisfies \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0 \quad \text{almost surely} as n \to \infty, regardless of the continuity of F. This uniform convergence implies the pointwise LLN at every point and enables consistent estimation of the entire distribution, forming the basis for nonparametric inference.
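The Glivenko-Cantelli statement can be illustrated numerically (a sketch under the assumption of Uniform(0,1) data, for which F(x) = x on [0, 1]): the largest gap between the empirical CDF and F shrinks as the sample grows.

```python
# Compute sup_x |F_n(x) - F(x)| for Uniform(0,1) samples; for sorted data the
# supremum is attained at the jump points of the empirical CDF.
import random

random.seed(5)

def sup_deviation(n):
    xs = sorted(random.random() for _ in range(n))
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        worst = max(worst, abs(i / n - x), abs((i - 1) / n - x))
    return worst

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>6}  sup |F_n - F| = {sup_deviation(n):.4f}")
```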

Modern Developments

In the latter half of the 20th century, advances in the law of large numbers focused on refining almost sure convergence rates, building on earlier results like Kolmogorov's. The Menshov-Rademacher theorem, originally from the 1920s, saw extensions providing sharper conditions for almost sure convergence of series of orthogonal random variables. A key improvement came through inequalities that quantified the logarithmic factor in the convergence condition, ensuring almost sure convergence when the sums of squared coefficients weighted by log-squared terms are finite. Further refinements in the 1990s introduced new Menshov-Rademacher-type inequalities that enhanced the strong law's applicability to broader classes of random variables, improving error rates in probabilistic approximations.

Extensions of the law of large numbers to dependent sequences emerged prominently post-1950, addressing limitations of the independent and identically distributed (i.i.d.) assumptions. Mixing conditions, such as alpha-mixing, became central for establishing weak and strong laws under dependence, where the dependence between variables decays over time. For alpha-mixing sequences of non-identically distributed random variables, the weak law holds with only finite first moments, covering processes like autoregressive moving averages and near-epoch dependent sequences without requiring rapid decay of the mixing rate. These results filled gaps in non-i.i.d. cases, enabling laws for dependent variables in econometric models and time series analysis.

Functional laws of large numbers extended the classical framework to stochastic processes, providing convergence over function spaces. The functional strong law applies to partial sum processes of i.i.d. random variables with finite first moments, where the scaled and centered process converges to zero uniformly on compact intervals, even without finite variance. Earlier work in the mid-20th century laid groundwork for these results, influencing limit theorems for random functions in ergodic settings.

Computational applications gained traction from the 1980s onward, with methods leveraging the law of large numbers for error estimation in simulations. In Monte Carlo simulation, the sample mean of independent replicates converges to the true expectation, allowing error assessment via the standard deviation scaled by the square root of the number of trials, as justified by the central limit theorem alongside the law; a small sketch of this standard-error calculation follows below. Resampling techniques like the bootstrap and jackknife further estimate this Monte Carlo error, determining the required number of simulations for desired precision in statistical analyses.

Recent developments up to 2025 have explored analogs in quantum probability and justifications in machine learning. In quantum settings, a law of large numbers for random linear operators in Banach spaces establishes convergence for compositions of independent semigroups, extending classical results to p-norms with 1 ≤ p < ∞. In machine learning, the law underpins empirical risk minimization by ensuring the empirical risk over i.i.d. training data converges to the true expected risk, validating model selection and generalization bounds in statistical learning theory.
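A minimal version of the Monte Carlo error estimate described above (an illustrative sketch; the integrand g(u) = u², the Uniform(0,1) sampling, and the sample size are assumptions made for the example) reports both the point estimate and the standard error s/√n:

```python
# Monte Carlo estimate of E[U^2] for U ~ Uniform(0,1) (true value 1/3), together
# with the standard error s / sqrt(n) justified by the LLN and CLT.
import math
import random

random.seed(2)
n = 100_000
samples = [random.random() ** 2 for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / (n - 1)   # sample variance
std_error = math.sqrt(var / n)

print(f"estimate = {mean:.5f}   true value = {1 / 3:.5f}   standard error ≈ {std_error:.5f}")
```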

Applications

In Statistics and Estimation

In statistics, the law of large numbers (LLN) establishes the consistency of the sample mean as an estimator of the population mean. For independent and identically distributed random variables X_1, \dots, X_n with finite expectation E[X_1] = \mu, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i converges in probability to \mu as n \to \infty. This is formalized by the weak LLN, which states that for any \epsilon > 0, P(|\bar{X}_n - \mu| < \epsilon) \to 1 as n \to \infty, provided the variables have finite mean. Consistency ensures that, with large samples, the estimator \bar{X}_n becomes arbitrarily close to the true parameter with high probability, forming a cornerstone of point estimation in statistical inference; the sample mean is also unbiased, since E[\bar{X}_n] = \mu for every n.

The LLN underpins large-sample theory, which justifies the use of asymptotic normality for hypothesis testing and interval estimation. By ensuring the consistency of sample moments, the LLN complements the central limit theorem (CLT) to approximate the distribution of standardized estimators as normal, enabling valid t-tests and other procedures even when exact distributions are unknown. For instance, in testing H_0: \mu = \mu_0, the t-statistic \sqrt{n} (\bar{X}_n - \mu_0)/s_n (where s_n is the sample standard deviation) relies on this large-sample normality for its critical values and p-values.

Bootstrap methods further leverage the LLN for resampling-based inference, approximating the sampling distribution of estimators without parametric assumptions. Introduced by Efron, the bootstrap resamples the observed data with replacement, treating the empirical distribution as a stand-in for the true distribution; the LLN guarantees that this empirical distribution converges to the underlying one as the sample size grows, validating the method's use for standard errors and confidence intervals.

A practical illustration appears in confidence intervals for the mean, where small samples yield wide intervals due to high variability, but the LLN ensures tightening as n increases. For example, consider estimating the lifetime of light bulbs from an exponential distribution with mean \mu \approx 2000 hours; with n = 25 and sample mean \bar{X}_n = 2132 hours, a 95% interval is approximately [1348, 2916] hours, reflecting substantial uncertainty. Increasing the sample to n = 100 narrows it to [1740, 2524] hours, as the standard error scales with 1/\sqrt{n}, demonstrating how larger samples concentrate the estimate around \mu.
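A compact sketch of the bootstrap and confidence-interval ideas above (illustrative only; the exponential data-generating process with mean 2000, the sample size, and the number of resamples are assumptions for the example) computes a percentile bootstrap interval for the mean:

```python
# Percentile bootstrap confidence interval for a mean; its validity rests on the
# empirical distribution approaching the true one as the sample grows (LLN).
import random

random.seed(9)
data = [random.expovariate(1 / 2000) for _ in range(100)]   # simulated lifetimes

def bootstrap_ci(sample, reps=2000, alpha=0.05):
    n = len(sample)
    means = sorted(
        sum(random.choice(sample) for _ in range(n)) / n
        for _ in range(reps)
    )
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = bootstrap_ci(data)
print(f"sample mean = {sum(data) / len(data):.0f}   95% bootstrap CI ≈ [{lo:.0f}, {hi:.0f}]")
```

Repeating the exercise with a larger simulated sample would yield a visibly narrower interval, matching the 1/√n narrowing discussed above.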

In Other Fields

In insurance, the law of large numbers underpins risk pooling by enabling insurers to aggregate a large number of risks, thereby stabilizing the claim experience and allowing premiums to be set close to the expected value of losses. This principle ensures that as the pool of policyholders grows, the variability in actual payouts diminishes, making financial outcomes more predictable and reducing the impact of outliers on overall results. For instance, health insurers rely on this to balance high-cost claims from a few individuals against the majority who incur minimal expenses, justifying competitive premium rates that cover anticipated aggregate claims without excessive reserves.

In physics, particularly statistical mechanics, the law of large numbers facilitates the thermodynamic limit, where macroscopic properties emerge reliably from the collective behavior of vast numbers of particles. This convergence supports the ergodic hypothesis, which posits that time averages of observables in an equilibrium system equal ensemble averages over phase space, justifying the use of statistical ensembles to predict macroscopic quantities such as pressure or temperature. Ergodicity, tied to the strong law of large numbers, ensures that long-term measurements on a single system align with probabilistic expectations, underpinning derivations of thermodynamic laws from microscopic dynamics.

In machine learning, the law of large numbers drives the convergence of empirical risk minimization, where the average loss on a training dataset approximates the true expected risk as sample size increases. This justifies optimizing model parameters via gradient-based methods on the empirical loss, as the procedure minimizes a proxy that reliably reflects population-level performance under suitable assumptions like i.i.d. data. For example, in supervised learning tasks, large datasets ensure that training leads to models whose generalization error is bounded, with guarantees from uniform variants of the law.

In economics, the law of large numbers applies to large markets by promoting the law of one price through aggregated agent behaviors, where deviations in individual valuations average out to enforce arbitrage-free equilibrium pricing. In models of random matching or competitive markets with many participants, it ensures that cross-sectional averages of trades or allocations converge to expected values, stabilizing market outcomes like supply-demand balances. This framework explains why, in sufficiently large economies, heterogeneous preferences lead to uniform pricing across identical assets, mitigating inefficiencies from small-sample fluctuations.

In biology, specifically population genetics, the law of large numbers stabilizes allele frequencies in large populations under random mating and in the absence of selection, mutation, or migration, as deviations due to sampling effects diminish with population size. This underpins the Hardy-Weinberg equilibrium, where genotypic proportions remain constant across generations because finite-sample drift is negligible, allowing allele ratios to reflect long-run probabilistic expectations. For neutral loci, it predicts that frequencies of genetic variants will hover near their initial values in expansive populations, facilitating inferences about evolutionary neutrality from observed stability.

A classic illustration of the law of large numbers in geometric probability is Buffon's needle experiment, where dropping many needles of length l onto a plane with parallel lines spaced d apart (l \leq d) yields a proportion of crossings that converges to \frac{2l}{\pi d}, enabling estimation of \pi through repeated trials. As the number of drops increases, the empirical ratio approaches this theoretical probability, demonstrating how the law transforms a random experiment into a reliable computational tool for estimating constants.
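The Buffon's needle estimate can be reproduced with a short simulation (an illustrative sketch with assumed needle length l = 1, line spacing d = 2, and a fixed seed), in which the crossing proportion approaches 2l/(πd) = 1/π and thereby yields an estimate of π:

```python
# Buffon's needle: a needle of length l crosses a line iff its center lies within
# (l/2)*sin(theta) of the nearest line; the crossing probability is 2l/(pi*d).
import math
import random

random.seed(4)
l, d = 1.0, 2.0
n_drops, crossings = 200_000, 0

for _ in range(n_drops):
    center = random.uniform(0, d / 2)        # distance from needle center to nearest line
    theta = random.uniform(0, math.pi / 2)   # acute angle between needle and the lines
    if center <= (l / 2) * math.sin(theta):
        crossings += 1

prob = crossings / n_drops
print(f"crossing proportion = {prob:.4f}   (theory {2 * l / (math.pi * d):.4f})")
print(f"pi estimate = {2 * l / (prob * d):.4f}")
```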
