Central limit theorem

The Central Limit Theorem (CLT) is a cornerstone of probability theory stating that, under certain conditions, the distribution of the standardized sum (or average) of a large number of independent and identically distributed random variables, each with finite mean \mu and variance \sigma^2 > 0, converges to a standard normal distribution N(0, 1), irrespective of the underlying distribution of the individual variables. Mathematically, for i.i.d. random variables X_1, X_2, \dots, X_n with mean \mu and standard deviation \sigma, the standardized sample mean Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} satisfies \lim_{n \to \infty} P(Z_n \leq z) = \Phi(z), where \Phi(z) is the cumulative distribution function of the standard normal distribution. The approximation becomes accurate for sufficiently large n, with the required sample size depending on the skewness of the original distribution. The theorem's development traces back to the 18th century, with Abraham de Moivre first approximating the distribution of the sum of Bernoulli trials in 1733 using a normal curve for large n. Pierre-Simon Laplace extended this in 1810–1823 by generalizing it to sums of independent random variables through the method of generating functions, providing the first broad version applicable beyond the binomial case. Rigorous proofs emerged in the early 20th century, notably from Aleksandr Lyapunov in 1901, who established the theorem under explicit moment conditions using characteristic functions, and later refinements by Jarl Waldemar Lindeberg, William Feller, and others in the 1920s–1930s incorporated the Lindeberg–Feller conditions for non-identically distributed variables. These advancements shifted the CLT from classical approximations to a foundational element of modern measure-theoretic probability. In statistics, the CLT underpins large-sample inference by justifying the approximate normality of sampling distributions for means, proportions, and other estimators from large samples, allowing the use of z-tests, t-tests, and confidence intervals without assuming population normality. It explains why many phenomena exhibit approximate normality and facilitates Monte Carlo simulations, bootstrap procedures, and normal approximations in various fields. Extensions, such as the Berry–Esseen theorem, quantify the rate of convergence to normality, while generalizations accommodate dependent variables or infinite variance via stable distributions.

Core Statement and Assumptions

Classical Central Limit Theorem

The classical central limit theorem states that if X_1, X_2, \dots, X_n are independent and identically distributed random variables with finite mean \mu and positive finite variance \sigma^2 > 0, then the standardized sum Z_n = \frac{\sum_{i=1}^n (X_i - \mu)}{\sigma \sqrt{n}} converges in distribution to a standard normal random variable N(0,1) as n \to \infty. This result, rigorously established by Aleksandr Lyapunov in 1901 using characteristic functions, provides the foundational case of the theorem for identically distributed variables. Formally, the theorem asserts that \lim_{n \to \infty} P(Z_n \leq x) = \Phi(x) for every x \in \mathbb{R}, where \Phi(x) is the cumulative distribution function of the standard normal distribution. This convergence holds under the i.i.d. assumption and finite second moment, ensuring the normalized sum's distribution approaches the bell-shaped normal curve regardless of the specific form of the common distribution of the X_i. Intuitively, the theorem explains why sums of many independent random variables tend toward normality: each variable contributes small, uncorrelated deviations around the mean, and their collective effect—scaled by the square root of the sample size—smooths into a Gaussian shape due to the additive nature of variances. This holds for a wide range of distributions, such as uniform or Bernoulli, as long as the variance is finite and positive, highlighting the normal distribution's universality in aggregating independent fluctuations. A classic example is the sum of n fair coin flips, where each X_i is Bernoulli with parameter p = 0.5, mean 0.5, and variance 0.25; for large n, the number of heads S_n = \sum X_i is approximately normal with mean n/2 and variance n/4, as originally approximated by Abraham de Moivre in 1733 and later generalized by Pierre-Simon Laplace. Similarly, the sum of dice rolls, modeled as i.i.d. uniform on \{1, 2, \dots, 6\} with mean 3.5 and variance 35/12, yields a near-normal distribution for large n, illustrating the theorem's practical approximation power.
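The coin-flip approximation can be checked directly by simulation. The following minimal sketch (using NumPy; the sample size, number of replications, and seed are illustrative choices, not taken from the text) standardizes the number of heads in n fair coin flips and compares empirical probabilities with the standard normal CDF:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)          # illustrative seed
n, trials = 1000, 100_000               # illustrative sizes

# Number of heads in n fair coin flips, repeated many times.
heads = rng.binomial(n, 0.5, size=trials)

# Standardize using the Bernoulli mean 0.5 and variance 0.25.
z = (heads - n * 0.5) / np.sqrt(n * 0.25)

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))    # standard normal CDF
for q in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(q, (z <= q).mean(), Phi(q))           # empirical vs. limiting probabilities
```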

Multidimensional Central Limit Theorem

The multidimensional central limit theorem generalizes the classical central limit theorem to the case of random vectors in \mathbb{R}^d. Specifically, let \mathbf{X}_1, \dots, \mathbf{X}_n be independent and identically distributed random vectors each with finite mean vector \boldsymbol{\mu} \in \mathbb{R}^d and positive definite covariance matrix \Sigma \in \mathbb{R}^{d \times d}. Then, the normalized sum \mathbf{S}_n = n^{-1/2} \sum_{i=1}^n (\mathbf{X}_i - \boldsymbol{\mu}) converges in distribution to the multivariate normal distribution \mathbf{N}(\mathbf{0}, \Sigma) as n \to \infty. Equivalently, applying the whitening transformation \Sigma^{-1/2} (where \Sigma^{1/2} is the unique positive definite square root of \Sigma) yields \mathbf{Z}_n = \Sigma^{-1/2} \, n^{-1/2} \sum_{i=1}^n (\mathbf{X}_i - \boldsymbol{\mu}) \xrightarrow{d} \mathbf{N}(\mathbf{0}, I_d). The covariance matrix \Sigma plays a central role in characterizing the limiting distribution, as it fully specifies the variances along the principal axes and the correlations between components of the random vector. In the multivariate normal \mathbf{N}(\mathbf{0}, \Sigma), the contours of constant probability density are ellipsoids defined by the quadratic form \mathbf{x}^T \Sigma^{-1} \mathbf{x} = c for constants c > 0, with orientations given by the eigenvectors of \Sigma and semi-axes lengths scaled by the square roots of its eigenvalues. This elliptical structure reflects how linear dependencies in the original vectors propagate to the asymptotic joint behavior. When d=1, the theorem reduces to the classical central limit theorem for scalar random variables. For illustration in the bivariate case (d=2), consider i.i.d. random vectors (\mathbf{X}_i)_1^n built from uniform variables on [0,1] with induced positive correlation, such as \mathbf{X}_i = (U_i, \theta U_i + (1-\theta) V_i) for independent uniforms U_i, V_i \sim \text{Unif}[0,1] and correlation parameter 0 < \theta < 1. The normalized sum \mathbf{S}_n then approximates a bivariate normal distribution \mathbf{N}(\mathbf{0}, \Sigma), where \Sigma incorporates the off-diagonal covariance \theta/12 arising from the shared U_i term.
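A small simulation sketch of the bivariate example (NumPy; the value of \theta, the sample sizes, and the seed are illustrative assumptions) compares the empirical covariance of the normalized sums with the theoretical \Sigma, including the off-diagonal entry \theta/12:

```python
import numpy as np

rng = np.random.default_rng(1)          # illustrative seed
n, reps, theta = 500, 10_000, 0.5       # illustrative choices

U = rng.uniform(size=(reps, n))
V = rng.uniform(size=(reps, n))
X1, X2 = U, theta * U + (1 - theta) * V     # the two components of each random vector

mu1, mu2 = 0.5, 0.5                         # both components have mean 0.5
S = np.stack([(X1 - mu1).sum(axis=1),
              (X2 - mu2).sum(axis=1)], axis=1) / np.sqrt(n)

# Theoretical covariance matrix Sigma of one random vector.
Sigma = np.array([[1 / 12, theta / 12],
                  [theta / 12, (theta**2 + (1 - theta)**2) / 12]])
print(np.cov(S, rowvar=False))              # empirical covariance of the normalized sums
print(Sigma)                                # should be close to the matrix above
```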

Sufficient Conditions for Convergence

Independence and Identical Distribution

The independence and identical distribution (i.i.d.) assumption posits that a sequence of random variables X_1, X_2, \dots, X_n are statistically independent, meaning the joint probability distribution factors into the product of marginals, and identically distributed, sharing the same marginal probability distribution. This dual condition guarantees zero covariance between distinct variables—since independence implies uncorrelatedness for variables with finite second moments—and uniform mean \mu and variance \sigma^2 > 0 across all variables, enabling consistent normalization of the sum S_n = \sum_{i=1}^n X_i as \frac{S_n - n\mu}{\sigma \sqrt{n}}. Under the i.i.d. assumption, the central limit theorem holds because the lack of dependence prevents interference in the accumulation of fluctuations, while identical distributions ensure the normalized sum's distribution converges to the standard normal regardless of the common underlying form, provided finite variance. The Berry–Esseen theorem further elucidates why i.i.d. suffices by bounding the rate of this convergence: the supremum difference between the cumulative distribution function of the normalized sum and that of the standard normal is of order O(1/\sqrt{n}), depending on the third absolute moment of the common distribution. A basic relaxation replaces full mutual independence with pairwise independence, where every pair of distinct variables is independent; under additional moment conditions ensuring controlled higher-order dependencies, this still yields convergence to normality in the central limit theorem. Historically, the i.i.d. framework forms the simplest case foundational to the central limit theorem, originating in the de Moivre–Laplace theorem of 1733–1812, which established local normality for sums of i.i.d. Bernoulli trials by approximating the binomial distribution with the normal curve.

Lyapunov Central Limit Theorem

The Lyapunov central limit theorem, first established by Russian mathematician Aleksandr Lyapunov in 1901, extends the classical central limit theorem to sums of independent random variables that need not be identically distributed, by requiring a condition on the growth of higher-order moments relative to the variances. Let X_1, X_2, \dots, X_n be independent random variables with finite means E[X_i] = \mu_i and positive variances \mathrm{Var}(X_i) = \sigma_i^2. Define s_n^2 = \sum_{i=1}^n \sigma_i^2. For some \epsilon > 0, the Lyapunov coefficient is given by \delta_n = \frac{1}{s_n^{2 + \epsilon}} \sum_{i=1}^n E\left[ |X_i - \mu_i|^{2 + \epsilon} \right]. The theorem states that if \lim_{n \to \infty} \delta_n = 0, then the standardized sum Z_n = \frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} converges in distribution to a standard normal random variable N(0, 1). The Lyapunov condition \delta_n \to 0 guarantees that the contributions of the tails of the individual distributions become negligible in the normalized sum, ensuring uniform asymptotic negligibility of the higher moments compared to the square root of the total variance. This moment bound implies a sufficient control over the deviations, facilitating the convergence of the characteristic function of Z_n to e^{-t^2/2}, the characteristic function of the standard normal distribution. When the random variables are identically distributed with finite (2 + \epsilon)-th moment, the condition holds automatically as a special case. A concrete example arises with independent centered normal random variables X_i \sim N(0, i^2) for i = 1, \dots, n, where the variances grow quadratically as i^2. In this case, s_n^2 \sim n^3 / 3, so s_n \sim n^{3/2}. For normals, the (2 + \epsilon)-th absolute moment satisfies E[|X_i|^{2 + \epsilon}] = C_\epsilon i^{2 + \epsilon} for some constant C_\epsilon > 0 depending on \epsilon. The sum \sum_{i=1}^n E[|X_i|^{2 + \epsilon}] \sim n^{3 + \epsilon}, while s_n^{2 + \epsilon} \sim n^{3 + (3/2)\epsilon}, yielding \delta_n \sim n^{- \epsilon / 2} \to 0. Thus, Z_n converges in distribution to N(0, 1), illustrating how the theorem accommodates rapidly increasing variances under controlled higher moments.
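The decay of the Lyapunov coefficient in this example can be checked numerically. The sketch below (an illustrative choice of \epsilon = 1, i.e., third absolute moments) evaluates \delta_n for X_i \sim N(0, i^2) and exhibits the roughly n^{-1/2} decay derived above:

```python
import numpy as np

# Lyapunov ratio for X_i ~ N(0, i^2) with epsilon = 1, using the exact
# third absolute moment E|X_i|^3 = i^3 * E|Z|^3 for Z ~ N(0, 1).
EZ3 = 2 * np.sqrt(2 / np.pi)              # E|Z|^3 for a standard normal

for n in (10, 100, 1000, 10_000):
    i = np.arange(1, n + 1, dtype=float)
    s2 = np.sum(i**2)                     # s_n^2 = sum of the variances i^2
    delta = np.sum(EZ3 * i**3) / s2**1.5  # Lyapunov coefficient with epsilon = 1
    print(n, delta)                       # decays roughly like n^{-1/2}
```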

Lindeberg Central Limit Theorem

The Lindeberg central limit theorem provides a general sufficient condition for the central limit theorem to hold for sums of independent random variables that may not be identically distributed and can have heterogeneous variances. Consider a sequence of independent random variables X_1, X_2, \dots, X_n with E[X_i] = 0 and finite variances \sigma_i^2 > 0 for each i. Let s_n^2 = \sum_{i=1}^n \sigma_i^2 denote the total variance, and define the normalized sum Z_n = \frac{1}{s_n} \sum_{i=1}^n X_i. The theorem states that if the Lindeberg condition holds—for every \epsilon > 0, \lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^n E\left[X_i^2 I(|X_i| > \epsilon s_n)\right] = 0, where I(\cdot) is the indicator function—then Z_n converges in distribution to a standard normal random variable N(0,1). This condition ensures that no single random variable or small subset dominates the distribution of the sum as n grows large. By focusing on the expected contribution of the tails—specifically, the second moments of X_i truncated beyond \epsilon s_n—the Lindeberg condition controls the influence of extreme values, allowing the collective behavior of the sum to approximate normality even when individual distributions have heavy tails or differing scales, provided the tails do not contribute disproportionately to the overall variance. A key related result is Feller's converse theorem, which establishes the necessity of the Lindeberg condition under additional regularity. Specifically, if the triangular array of random variables satisfies the standard setup (independence within rows, zero means, row variances summing to 1) and uniform asymptotic negligibility—meaning \lim_{n \to \infty} \max_{1 \leq i \leq n} \sigma_i^2 / s_n^2 = 0—then convergence of the normalized sums to N(0,1) implies that the Lindeberg condition holds. This equivalence highlights the condition's sharpness in characterizing asymptotic normality for independent non-identically distributed variables. For an illustration, consider independent Pareto-distributed random variables with shape parameter \alpha > 2, which ensures finite variance (and thus \sigma_i^2 < \infty). In the i.i.d. case, the Lindeberg condition is satisfied because the uniform asymptotic negligibility holds and the tails, while heavy, do not violate the truncation requirement relative to the growing s_n, leading to the normalized sum converging to N(0,1).
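As a complementary illustration with non-identically distributed terms, the Lindeberg ratio can be computed exactly for an array chosen here for convenience (it is not the Pareto example from the text): independent X_i uniform on [-i, i], whose variances i^2/3 grow rapidly but whose bounded supports make the truncated second moments vanish entirely once \epsilon s_n exceeds the largest possible |X_i|:

```python
import numpy as np

# Exact Lindeberg ratio for independent X_i ~ Uniform[-i, i] (variance i^2 / 3).
def lindeberg_ratio(n, eps):
    i = np.arange(1, n + 1, dtype=float)
    s_n = np.sqrt(np.sum(i**2 / 3.0))
    c = eps * s_n
    # E[X_i^2 * 1{|X_i| > c}] = (i^3 - c^3) / (3 i) when c < i, and 0 otherwise.
    tail = np.where(i > c, (i**3 - np.minimum(c, i)**3) / (3.0 * i), 0.0)
    return tail.sum() / s_n**2

for n in (10, 100, 1000, 10_000):
    print(n, lindeberg_ratio(n, eps=0.1))   # becomes exactly 0 once eps * s_n > n
```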

Generalizations for Independent Variables

Central Limit Theorem for Random Sums

The central limit theorem for random sums addresses scenarios where the number of summands is determined by a random variable, such as in renewal processes or sequential sampling. Suppose X_1, X_2, \dots are independent and identically distributed random variables with mean \mu and positive finite variance \sigma^2, and let N_n be a positive integer-valued random index independent of the X_i such that E[N_n] = n and N_n / n \to 1 in probability as n \to \infty. Then the randomly centered sum \frac{S_{N_n} - N_n \mu}{\sigma \sqrt{n}} converges in distribution to a standard normal random variable N(0,1), where S_{N_n} = \sum_{i=1}^{N_n} X_i; if additionally \mathrm{Var}(N_n) = o(n) (or \mu = 0), the same limit holds with the deterministic centering \frac{S_{N_n} - n \mu}{\sigma \sqrt{n}}. This extends the classical central limit theorem by allowing the effective sample size to fluctuate randomly while maintaining asymptotic normality. The condition \mathrm{Var}(N_n) = o(n) ensures that the variance of the random sum aligns with the fixed-sum case under deterministic centering. Specifically, the variance of S_{N_n} is \mathrm{Var}(S_{N_n}) = E[N_n] \sigma^2 + \mathrm{Var}(N_n) \mu^2 = n \sigma^2 + o(n) \mu^2, so \mathrm{Var}(S_{N_n}) \sim n \sigma^2 as n \to \infty. Without this, the term involving \mathrm{Var}(N_n) \mu^2 could be of the same order as n \sigma^2 or larger when \mu \neq 0, altering the scaling and preventing convergence to the standard normal under the stated standardization. Anscombe's theorem provides the foundational uniform continuity condition underpinning this result, particularly for slowly varying random indices. It states that if the partial sums S_k / (\sigma \sqrt{k}) form a process that is uniformly continuous in probability—meaning for any \epsilon > 0 and \delta > 0, there exists \eta > 0 such that P(|S_{k+m} / (\sigma \sqrt{k+m}) - S_k / (\sigma \sqrt{k})| > \delta) < \epsilon whenever |m| < \eta k—and N_n / n \to 1 in probability, then the randomly indexed sum inherits the limiting distribution of the fixed-index process. This theorem, originally developed in the context of sequential estimation, guarantees that moderate fluctuations in N_n do not disrupt asymptotic normality when the X_i satisfy the classical central limit theorem conditions. A representative example occurs when N_n follows a Poisson distribution with mean n, so E[N_n] = n and \mathrm{Var}(N_n) = n; then S_{N_n} is a compound Poisson random sum. Since \mathrm{Var}(N_n)/n^2 \to 0, N_n / n \to 1 in probability, and the randomly centered sum \frac{S_{N_n} - N_n \mu}{\sigma \sqrt{n}} converges to N(0,1); equivalently, because \mathrm{Var}(S_{N_n}) = n(\sigma^2 + \mu^2) for a compound Poisson sum, the deterministically centered sum \frac{S_{N_n} - n \mu}{\sqrt{n(\sigma^2 + \mu^2)}} is asymptotically standard normal. This illustrates how the theorem applies even when the number of terms varies with variance linear in n, as the Poisson fluctuations are sufficiently controlled.
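A short simulation sketch of the Poisson-indexed case (NumPy; the exponential summands, sizes, and seed are illustrative assumptions) checks both standardizations discussed above:

```python
import numpy as np

rng = np.random.default_rng(3)            # illustrative seed
n, reps = 2000, 5000                      # illustrative sizes
mu, sigma = 1.0, 1.0                      # Exp(1) summands: mean 1, variance 1

N = rng.poisson(n, size=reps)             # Poisson number of summands, mean n, variance n
S = np.array([rng.exponential(1.0, size=k).sum() for k in N])

z_random_center = (S - N * mu) / (sigma * np.sqrt(n))         # random centering (Anscombe form)
z_compound = (S - n * mu) / np.sqrt(n * (sigma**2 + mu**2))   # compound-Poisson standardization

for z in (z_random_center, z_compound):
    print(z.mean(), z.std())              # both approximately 0 and 1 for large n
```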

Products of Positive Random Variables

The central limit theorem extends to products of independent and identically distributed positive random variables Y_i > 0, i = 1, \dots, n, through a logarithmic transformation that converts the multiplicative structure into an additive one. Specifically, if \mathbb{E}[\log Y_i] = \mu and \mathrm{Var}(\log Y_i) = \sigma^2 < \infty, then the normalized logarithm of the product P_n = \prod_{i=1}^n Y_i converges in distribution to a standard normal: \frac{\log P_n - n \mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1). This result follows directly from applying the classical central limit theorem to the i.i.d. random variables \log Y_i. For large n, the distribution of P_n is thus approximated by a log-normal distribution, as the exponential of a normal random variable yields a log-normal one. This approximation arises because the sum of the logs behaves normally by the central limit theorem, leading to P_n \approx \exp(n \mu + \sigma \sqrt{n} Z) where Z \sim N(0,1). The finite variance condition on \log Y_i ensures the applicability of the theorem, accommodating original distributions of Y_i that may be highly skewed or heavy-tailed, as long as the logs have well-behaved moments. This framework is particularly relevant in multiplicative growth models, such as the evolution of stock prices, where successive returns are modeled as i.i.d. positive factors, leading to a log-normal approximation for the price after many periods via the normality of cumulative log-returns. Similar principles apply in branching processes, where population sizes grow multiplicatively through offspring distributions, yielding log-normal limits under suitable moment conditions on the logs of reproduction factors.
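A minimal sketch (NumPy; the uniform factor distribution, plug-in moment estimates, sizes, and seed are illustrative assumptions) verifies that the standardized log-product is approximately standard normal and that the product itself is approximately log-normal:

```python
import numpy as np

rng = np.random.default_rng(4)            # illustrative seed
n, reps = 500, 20_000                     # illustrative sizes

# Positive i.i.d. multiplicative factors (e.g. gross returns), uniform on [0.9, 1.1].
Y = rng.uniform(0.9, 1.1, size=(reps, n))
logY = np.log(Y)
mu, sigma = logY.mean(), logY.std()       # plug-in estimates of E[log Y] and sd(log Y)

# Standardized log-product should be approximately N(0, 1).
z = (logY.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
print(z.mean(), z.std())

# Equivalently, the product is approximately log-normal: a Q-Q style check of log P_n
# against independent N(n*mu, n*sigma^2) draws gives a correlation near 1.
P = Y.prod(axis=1)
ref = rng.normal(n * mu, sigma * np.sqrt(n), size=reps)
print(np.corrcoef(np.sort(np.log(P)), np.sort(ref))[0, 1])
```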

Extensions to Dependent Processes

Central Limit Theorem under Weak Dependence

In probability theory, weak dependence refers to conditions under which the dependence between random variables in a sequence diminishes as the temporal separation increases, allowing extensions of the central limit theorem (CLT) beyond independent cases. Common measures include \alpha-mixing, also known as strong mixing, where the coefficient is defined as \alpha(n) = \sup_k \sup_{A \in \mathcal{F}_{-\infty}^k, B \in \mathcal{F}_{k+n}^\infty} |P(A \cap B) - P(A)P(B)|, with \mathcal{F}_I^J denoting the \sigma-algebra generated by the variables from time I to J; the sequence is \alpha-mixing if \alpha(n) \to 0 as n \to \infty. This condition, introduced by Rosenblatt, captures the decay of dependence observed in many dynamical systems. Another measure is \beta-mixing, or absolute regularity, defined via \beta(n) = \sup_k E \left[ \sup_{B \in \mathcal{F}_{k+n}^\infty} |P(B | \mathcal{F}_{-\infty}^k) - P(B)| \right], where the sequence is \beta-mixing if \beta(n) \to 0; this is stronger than \alpha-mixing and often applies to processes with complete asymptotic independence. A simpler form is m-dependence, where random variables separated by more than m lags are independent for some fixed m, implying rapid decay of correlations limited to short ranges. For a stationary sequence \{X_i\} with finite second moments E[X_i^2] < \infty, mean \mu, and positive long-run variance \sigma^2 = \lim_{n \to \infty} \frac{1}{n} \mathrm{Var}(\sum_{i=1}^n X_i) > 0, the CLT holds under weak dependence: for example, if the sequence is \alpha-mixing with E[|X_i|^{2+\delta}] < \infty and \sum_{n} \alpha(n)^{\delta/(2+\delta)} < \infty for some \delta > 0, then the standardized sample mean converges in distribution to a standard normal, i.e., \sqrt{n} \left( \bar{X}_n - \mu \right) / \sigma \xrightarrow{d} \mathcal{N}(0,1). This result accommodates correlations while ensuring the dependence does not hinder asymptotic normality. Similar statements apply to \beta-mixing sequences with \beta(n) \to 0 and m-dependent sequences, where the finite m guarantees the required decay. Proofs of these CLTs often employ the blocking technique (Bernstein's method of large and small blocks), which divides the sequence into large blocks of length b_n \to \infty (growing slowly relative to n) separated by smaller gaps of length g_n where mixing ensures near-independence. The sum over the large blocks approximates a sum of nearly independent random variables, to which the classical CLT applies; contributions from gaps and within-block dependencies are controlled by the mixing coefficients, vanishing as n \to \infty. This approach, refined in modern treatments, leverages the weak dependence to bound covariances. A representative example is the autoregressive process of order one (AR(1)), defined by X_t = \rho X_{t-1} + \varepsilon_t where |\rho| < 1 and \{\varepsilon_t\} are i.i.d. with mean zero and finite variance. This process is stationary, \alpha-mixing with exponential decay \alpha(n) = O(\rho^n), and the sample mean satisfies the CLT after standardization by the long-run variance \sigma^2 = \mathrm{Var}(\varepsilon_t)/(1 - \rho)^2 (as distinct from the stationary variance \mathrm{Var}(\varepsilon_t)/(1 - \rho^2)).
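The AR(1) example can be checked with a short simulation (NumPy; \rho, sizes, and seed are illustrative assumptions), standardizing the sample mean by the long-run variance 1/(1-\rho)^2 for unit-variance innovations:

```python
import numpy as np

rng = np.random.default_rng(5)            # illustrative seed
n, reps, rho = 2000, 5000                 # illustrative choices
n, reps, rho = 2000, 5000, 0.6

eps = rng.standard_normal((reps, n))
x = np.empty((reps, n))
x[:, 0] = eps[:, 0] / np.sqrt(1 - rho**2)   # start each chain in stationarity
for t in range(1, n):
    x[:, t] = rho * x[:, t - 1] + eps[:, t]

lrv = 1.0 / (1.0 - rho)**2                  # long-run variance for unit-variance innovations
z = np.sqrt(n) * x.mean(axis=1) / np.sqrt(lrv)
print(z.mean(), z.std())                    # approximately 0 and 1
```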

Martingale Difference Central Limit Theorem

The martingale difference central limit theorem provides a framework for establishing asymptotic normality in sequences where increments are conditionally mean-zero given past information, extending the classical theorem to dependent processes with a martingale structure. A martingale difference sequence is defined as \xi_i = X_i - \mathbb{E}[X_i \mid \mathcal{F}_{i-1}], where \{\mathcal{F}_i\} is a filtration and \mathbb{E}[\xi_i \mid \mathcal{F}_{i-1}] = 0, with the conditional variance \sigma_i^2 = \mathbb{E}[\xi_i^2 \mid \mathcal{F}_{i-1}]. This setup captures predictability based on prior observations, making it suitable for processes with feedback or adaptation. The theorem states that for a martingale difference array \{\xi_{n,i}\}, the normalized sum S_n / V_n \xrightarrow{d} N(0,1), where S_n = \sum_{i=1}^{k_n} \xi_{n,i} and V_n^2 = \sum_{i=1}^{k_n} \mathbb{E}[\xi_{n,i}^2 \mid \mathcal{F}_{n,i-1}] converges in probability to a positive constant \sigma^2, under the Lindeberg-type condition that for every \epsilon > 0, \frac{1}{V_n^2} \sum_{i=1}^{k_n} \mathbb{E}\left[ \xi_{n,i}^2 I(|\xi_{n,i}| > \epsilon V_n) \mid \mathcal{F}_{n,i-1} \right] \to_p 0. This conditional Lindeberg condition ensures that large jumps do not dominate the sum, analogous to the independent case but adapted to the filtration. A representative example arises in stochastic gradient descent (SGD), where parameter updates \theta_{t+1} = \theta_t - \gamma_t g_t form martingale differences with g_t as noisy gradients conditional on past iterates; the theorem implies that the averaged iterates converge in distribution to a normal around the optimum, quantifying uncertainty in non-convex optimization. This result is pivotal for inference in machine learning, such as confidence intervals for learned parameters. The theorem's advantage lies in handling adaptive sampling or sequential designs, where increments depend on previous outcomes, as seen in stochastic approximation algorithms and recursive filtering procedures like extensions of the Kalman filter in state-space models.
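A minimal sketch of a martingale difference sequence (constructed here purely for illustration, not taken from the text: the conditional scale of each increment is a bounded function of the previous increment) shows the sum normalized by the accumulated conditional variance behaving like a standard normal:

```python
import numpy as np

rng = np.random.default_rng(6)            # illustrative seed
n, reps = 2000, 5000                      # illustrative sizes

eps = rng.standard_normal((reps, n))
S = np.zeros(reps)                        # running martingale sums
V2 = np.zeros(reps)                       # accumulated conditional variances
w = np.ones(reps)                         # conditional scale, a function of the past

for i in range(n):
    xi = w * eps[:, i]                    # E[xi | past] = 0: a martingale difference
    S += xi
    V2 += w**2                            # the conditional variance of this increment is w^2
    w = 1.0 + 0.5 * np.tanh(xi)           # the next scale depends on the current increment

z = S / np.sqrt(V2)                       # normalization by the conditional variance
print(z.mean(), z.std())                  # approximately 0 and 1
```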

Proof Techniques

Proof via Characteristic Functions

The characteristic function of a random variable X is defined as \phi_X(t) = \mathbb{E}[e^{itX}], where i = \sqrt{-1} and t \in \mathbb{R}. This function uniquely determines the distribution of X, and for the standard normal distribution Z \sim \mathcal{N}(0,1), it takes the explicit form \phi_Z(t) = e^{-t^2/2}. Characteristic functions are particularly useful in limit theorems because they transform convolution operations on distributions into products, facilitating analysis of sums of independent random variables. Consider the classical central limit theorem for independent and identically distributed (i.i.d.) random variables X_1, X_2, \dots with mean \mu = 0 and finite variance \sigma^2 > 0. Let S_n = \sum_{k=1}^n X_k, and define the normalized sum Z_n = S_n / (\sigma \sqrt{n}). The characteristic function of each X_k satisfies the expansion \phi_{X_k}(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2) as t \to 0, which follows from the Taylor series of the exponential and moment conditions. The characteristic function of Z_n is then \phi_{Z_n}(t) = \left[ \phi_{X_1}\left( \frac{t}{\sigma \sqrt{n}} \right) \right]^n. Substituting the expansion yields \phi_{X_1}\left( \frac{t}{\sigma \sqrt{n}} \right) = 1 - \frac{t^2}{2n} + o\left( \frac{1}{n} \right), so \phi_{Z_n}(t) = \left[ 1 - \frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right]^n. To establish convergence, take the logarithm: \log \phi_{Z_n}(t) = n \log \left( 1 - \frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right). Using the expansion \log(1 + u) = u + O(u^2) for small u, this simplifies to \log \phi_{Z_n}(t) = n \left( -\frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right) = -\frac{t^2}{2} + o(1). Thus, \log \phi_{Z_n}(t) \to -\frac{t^2}{2} as n \to \infty, and by continuity of the exponential function, \phi_{Z_n}(t) \to e^{-t^2/2} pointwise for all t \in \mathbb{R}. The error terms in these expansions are controlled uniformly for t in compact sets, which justifies taking the limit. Lévy's continuity theorem states that if the sequence of characteristic functions \phi_{Z_n}(t) converges pointwise to a characteristic function \phi(t) that is continuous at t = 0, then Z_n converges in distribution to the random variable with characteristic function \phi(t). Here, \phi(t) = e^{-t^2/2} is the characteristic function of the standard normal distribution and is continuous everywhere, so Z_n \xrightarrow{d} \mathcal{N}(0,1). This completes the proof for the classical case. The characteristic function approach extends naturally to the Lyapunov and Lindeberg central limit theorems for independent but not necessarily identically distributed random variables. In these settings, the expansions of the individual characteristic functions are aggregated under the respective moment or negligibility conditions, leading to the same limiting form e^{-t^2/2} via similar logarithmic arguments.
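The convergence \left[\phi_{X_1}(t/(\sigma\sqrt{n}))\right]^n \to e^{-t^2/2} can also be observed numerically. The sketch below uses a centered Exp(1) variable (mean 0, variance 1), whose characteristic function e^{-it}/(1-it) is available in closed form; the choice of distribution and the grid of t values are illustrative assumptions:

```python
import numpy as np

# Characteristic function of a centered Exp(1) variable (mean 0, variance 1):
#   phi(t) = E[exp(it(X - 1))] = exp(-it) / (1 - it).
def phi(t):
    return np.exp(-1j * t) / (1 - 1j * t)

t = np.linspace(-3, 3, 7)
for n in (1, 10, 100, 1000):
    phi_Zn = phi(t / np.sqrt(n)) ** n                 # characteristic function of Z_n
    print(n, np.max(np.abs(phi_Zn - np.exp(-t**2 / 2))))   # sup distance to e^{-t^2/2}
```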

Proof via Stein's Method

Stein's method offers a probabilistic approach to establishing the central limit theorem by deriving explicit bounds on the approximation error between the distribution of a normalized sum and the standard normal distribution. The core of the method involves solving Stein's equation for the standard normal distribution: f'(x) - x f(x) = h(x) - \mathbb{E}[h(Z)], where Z \sim \mathcal{N}(0,1), h is a bounded test function satisfying \|h\|_\infty \leq 1 and \mathrm{Var}(h(Z)) \leq 1, and f is the unique solution in an appropriate function space. This equation characterizes the normal distribution because \mathbb{E}[Af(Z)] = 0 for the Stein operator A f(x) = f'(x) - x f(x) if Z is standard normal. The solution f to Stein's equation possesses uniform bounds that facilitate error control—for bounded h one has \|f\|_\infty \leq \sqrt{\pi/2}\, \|h - \mathbb{E}[h(Z)]\|_\infty and \|f'\|_\infty \leq 2 \|h - \mathbb{E}[h(Z)]\|_\infty—with more refined estimates available for functions with additional smoothness. These bounds ensure that deviations from normality can be quantified through expectations involving the Stein operator applied to the target random variable. For the central limit theorem applied to the normalized sum Z_n = n^{-1/2} \sum_{i=1}^n X_i of i.i.d. random variables X_i with \mathbb{E}[X_i] = 0, \mathbb{E}[X_i^2] = 1, and finite third moment \beta = \mathbb{E}[|X_1|^3] < \infty, the method yields |\mathbb{E}[h(Z_n)] - \mathbb{E}[h(Z)]| \leq C \frac{\beta}{\sqrt{n}} for some universal constant C > 0, typically on the order of 0.5 to 1 depending on refinements. This bound holds uniformly over the class of test functions h, providing a rate of convergence that depends on the third moment. One prominent example is the derivation of the Berry–Esseen theorem using Stein's method, which establishes a uniform bound on the Kolmogorov distance between the distribution function of Z_n and that of Z, of the form \sup_x |P(Z_n \leq x) - \Phi(x)| \leq C \beta / \sqrt{n}, where \Phi is the standard normal cdf and the constant C can be explicitly computed or bounded (e.g., C \approx 0.4748 in optimized versions). The advantages of Stein's method lie in its ability to deliver these quantitative rates, which are sharper and more explicit than those from classical proofs, while naturally extending to settings beyond i.i.d. cases. For instance, it applies to dependent sequences such as exchangeable random variables through coupling constructions like the exchangeable pair method, yielding similar error bounds under moment conditions on the dependence structure. This flexibility makes it particularly useful for proving central limit theorems in weakly dependent processes without relying on independence assumptions.
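For the Rademacher (\pm 1) case the Kolmogorov distance can be computed exactly from the binomial distribution, which makes the O(1/\sqrt{n}) rate and the Berry–Esseen bound easy to inspect. The sketch below uses illustrative values of n; the constant 0.4748 is the one quoted above, and \beta = 1 is the third absolute moment of a Rademacher variable:

```python
import numpy as np
from math import erf, sqrt, comb

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF

beta, C = 1.0, 0.4748                          # E|X|^3 for Rademacher X; constant quoted above

for n in (10, 100, 1000):
    k = np.arange(n + 1)
    pmf = np.array([comb(n, int(j)) for j in k], dtype=float) / 2.0**n   # Binomial(n, 1/2)
    cdf = np.cumsum(pmf)
    z = (2 * k - n) / np.sqrt(n)               # support of Z_n = (sum of n Rademachers)/sqrt(n)
    Phi_z = np.array([Phi(v) for v in z])
    # Kolmogorov distance: evaluate just before and at each jump of the discrete CDF.
    dist = max(np.max(np.abs(cdf - Phi_z)), np.max(np.abs(cdf - pmf - Phi_z)))
    print(n, dist, C * beta / np.sqrt(n))      # exact distance vs. the Berry-Esseen bound
```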

Relation to the Law of Large Numbers

The law of large numbers (LLN) establishes that for a sequence of independent and identically distributed random variables X_1, X_2, \dots, X_n with finite mean \mu, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i converges in probability to \mu as n \to \infty, known as the weak LLN. Under the same assumption of a finite first absolute moment E[|X_i|] < \infty (which is exactly what finiteness of \mu requires), Kolmogorov's strong LLN shows that \bar{X}_n in fact converges almost surely to \mu. The central limit theorem (CLT) serves as a refinement of the LLN by characterizing the distributional behavior of the deviations from this limit. Specifically, if the random variables also possess finite positive variance \sigma^2, then the normalized sample mean \sqrt{n} (\bar{X}_n - \mu) converges in distribution to a standard normal random variable scaled by \sigma, i.e., N(0, \sigma^2). This result, often referred to as the Lindeberg–Lévy CLT for i.i.d. variables, highlights that the LLN provides convergence to a point, whereas the CLT delineates the scale of fluctuations around that point, which diminish at the rate of n^{-1/2}. The complementary roles of the LLN and CLT have profound implications for statistical practice: the LLN guarantees the absence of bias in large-sample estimators by ensuring consistency, while the CLT governs the variance of these estimators, facilitating asymptotic normality for inference procedures such as hypothesis testing and interval estimation. For instance, in analyzing i.i.d. observations like repeated coin flips with success probability p, the LLN implies the proportion of heads approaches p, but the CLT enables approximation of the probability that this proportion deviates from p by more than a fixed amount, supporting the design of reliable confidence intervals even for small deviations when n is sufficiently large.

Common Misconceptions and Edge Cases

One common misconception about the Central Limit Theorem (CLT) is that it requires the original random variables to follow a normal distribution; in fact, the theorem applies to any independent and identically distributed random variables with finite mean and variance, regardless of their underlying distribution shape. Another frequent error is assuming the CLT holds for small sample sizes; the theorem describes an asymptotic behavior as the sample size n approaches infinity, and while a rule of thumb suggests n > 30 suffices for many symmetric distributions, this threshold can be inadequate for highly skewed or heavy-tailed ones. In edge cases where the variance is infinite, such as with Cauchy distributions, the normalized sums do not converge to a normal distribution but instead to a stable distribution, highlighting the necessity of finite second moments for the standard CLT. Convergence under the CLT can also be notably slow for skewed distributions, like the chi-squared distribution, where the sampling distribution of the mean deviates substantially from normality even at moderate sample sizes due to persistent asymmetry. Recent simulations of heavy-tailed distributions indicate that sample sizes exceeding 100 are often required for the sampling distribution to approximate normality adequately, underscoring the theorem's limitations in such scenarios. The Berry–Esseen theorem quantifies the rate of convergence in the CLT, providing a uniform bound of order O(1/\sqrt{n}) on the difference between the cumulative distribution function of the standardized sum and the standard normal, though the implicit constants depend on the third absolute moments of the variables.
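The Cauchy edge case mentioned above is easy to observe by simulation: the sketch below (NumPy; sizes and seed are illustrative assumptions) shows that the sample mean of Cauchy observations retains the same heavy tails no matter how large n becomes, in contrast to the shrinking normal fluctuations guaranteed by the CLT under finite variance:

```python
import numpy as np

rng = np.random.default_rng(8)            # illustrative seed
reps = 20_000                             # illustrative number of replications

# For standard Cauchy data the sample mean of n observations has the SAME
# standard Cauchy distribution for every n, so no normalization gives a normal limit.
for n in (10, 1000):
    m = rng.standard_cauchy((reps, n)).mean(axis=1)
    # Tail frequency P(|mean| > 4) stays near 0.156 for all n; a normal limit
    # with standard deviation 1/sqrt(n) would make this essentially zero.
    print(n, np.mean(np.abs(m) > 4.0))
```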

Alternative Formulations

In Terms of Density Functions

The local central limit theorem provides a refinement of the classical central limit theorem by establishing pointwise or uniform convergence of the probability density functions of suitably normalized sums of independent random variables to the standard normal density. Specifically, for independent and identically distributed random variables X_1, X_2, \dots with mean \mu, finite variance \sigma^2 > 0, and non-lattice distribution, let S_n = \sum_{i=1}^n (X_i - \mu) and Z_n = S_n / (\sigma \sqrt{n}). Under these conditions, the density p_n(x) of Z_n, if it exists, satisfies \sup_x |p_n(x) - \phi(x)| \to 0 as n \to \infty, where \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} is the standard normal density. This uniform convergence holds when the characteristic function \phi(t) = \mathbb{E}[e^{itX_1}] is absolutely integrable (or integrable to some power), which guarantees that Z_n possesses a bounded continuous density for all sufficiently large n. A related requirement, used for expansions beyond the leading term, is Cramér's condition: \limsup_{|t| \to \infty} |\phi(t)| < 1, which rules out lattice structure that would prevent density convergence. Without such smoothness, the theorem may fail, but under lattice-free assumptions and finite moments up to order three, the approximation is uniform over the real line. For improved accuracy beyond the basic normal approximation, the Edgeworth expansion offers a series correction incorporating higher cumulants. The first-order Edgeworth approximation for the density is p_n(x) = \phi(x) \left(1 + \frac{\kappa_3}{6\sqrt{n}} (x^3 - 3x)\right) + O\left(\frac{1}{n}\right), where \kappa_3 = \mathbb{E}[(X_1 - \mu)^3]/\sigma^3 is the skewness coefficient. This expansion requires finite moments up to the fourth order and Cramér's condition to ensure the remainder term vanishes appropriately. An illustrative example is the convolution of uniform densities on [0,1], where the sum of n i.i.d. uniform random variables, centered at n/2 and scaled by \sqrt{n/12}, has a density that converges uniformly to the standard normal as n increases, demonstrating the smoothing effect toward \phi(x). For small n, such as n=2, the density is triangular, but by n=10, it closely resembles the normal curve, highlighting the theorem's practical convergence.
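The uniform-convolution example admits an exact check, since the Irwin–Hall density of the sum of n uniforms has a closed form. The sketch below (the grid and the values of n are illustrative choices) evaluates the sup distance between the density of the standardized sum and \phi(x):

```python
import numpy as np
from math import comb, factorial, sqrt, pi, exp

def irwin_hall_pdf(x, n):
    """Density of the sum of n i.i.d. Uniform[0, 1] variables."""
    if x < 0 or x > n:
        return 0.0
    return sum((-1)**k * comb(n, k) * max(x - k, 0.0)**(n - 1)
               for k in range(int(x) + 1)) / factorial(n - 1)

phi = lambda z: exp(-z**2 / 2) / sqrt(2 * pi)   # standard normal density

for n in (2, 5, 10, 20):
    zs = np.linspace(-4, 4, 401)
    # Density of Z_n = (S_n - n/2) / sqrt(n/12), by a change of variables.
    pn = [irwin_hall_pdf(n / 2 + z * sqrt(n / 12), n) * sqrt(n / 12) for z in zs]
    print(n, max(abs(p - phi(z)) for p, z in zip(pn, zs)))   # sup distance shrinks with n
```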

In Terms of Variance Calculation

In the independent and identically distributed (i.i.d.) case, the asymptotic variance in the central limit theorem (CLT) for the sum S_n = \sum_{i=1}^n X_i of random variables with common variance \sigma^2 < \infty is given by \sigma^2_n = \sum_{i=1}^n \mathrm{Var}(X_i) = n \sigma^2, such that the normalized sum (S_n - n\mu)/\sqrt{\sigma^2_n} converges in distribution to a standard normal random variable. For the sample mean \bar{X}_n = S_n / n, the asymptotic variance simplifies to \sigma^2 / n, reflecting the scaling that ensures the CLT approximation holds as n \to \infty. Under weak dependence, such as in stationary processes, the asymptotic variance incorporates covariances between observations, yielding the long-run variance formula \sigma^2 = \mathrm{Var}(X_1) + 2 \sum_{k=1}^\infty \mathrm{Cov}(X_1, X_{1+k}) for the normalized sample mean, which replaces the i.i.d. variance in the CLT to account for serial correlation. This adjustment arises in extensions of the CLT to dependent data, where the long-run variance captures the cumulative effect of temporal dependencies on the variability of the estimator. Practical estimation of the asymptotic variance relies on sample-based methods tailored to the dependence structure. In the i.i.d. setting, the plug-in estimator uses the sample variance \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2, which is consistent for \sigma^2 under finite variance assumptions. For dependent cases, heteroskedasticity and autocorrelation consistent (HAC) estimators, such as the Newey-West estimator, construct a positive semi-definite approximation to the long-run variance by weighting sample autocovariances with a kernel function (e.g., Bartlett kernel) and truncating at a bandwidth l_n \to \infty with l_n = o(\sqrt{n}) to ensure consistency. In time series analysis, the long-run variance is particularly relevant for applying the CLT to the sample mean of a stationary process, where the estimator \hat{\sigma}^2_{LR} = \hat{\gamma}_0 + 2 \sum_{k=1}^l w(k/l) \hat{\gamma}_k (with weights w(\cdot) from a kernel and \hat{\gamma}_k the sample autocovariance at lag k) scales the standard error, enabling valid inference even under autocorrelation. Recent advancements in the 2020s have focused on robust variance estimation for high-dimensional CLT applications, where the dimension p grows with sample size n, incorporating techniques like random projection and dependency-aware bounds to handle sparsity and temporal correlations while maintaining consistency rates.
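A minimal sketch of a Bartlett-kernel (Newey–West style) long-run variance estimate, applied to simulated AR(1) data whose true long-run variance is 1/(1-\rho)^2 for unit-variance innovations (the bandwidth rule, sizes, and seed are illustrative assumptions, not prescriptions from the text):

```python
import numpy as np

def long_run_variance(x, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a stationary series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = lambda k: np.dot(xc[:n - k], xc[k:]) / n       # sample autocovariance at lag k
    lrv = gamma(0)
    for k in range(1, bandwidth + 1):
        lrv += 2 * (1 - k / (bandwidth + 1)) * gamma(k)    # Bartlett weights
    return lrv

rng = np.random.default_rng(9)            # illustrative seed
rho, n = 0.5, 20_000                      # illustrative choices
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - rho**2)
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

print(long_run_variance(x, bandwidth=int(n**(1 / 3))), 1 / (1 - rho)**2)   # estimate vs. truth
```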

Applications

In Statistical Regression

In linear regression models, the central limit theorem (CLT) plays a crucial role in establishing the asymptotic normality of the ordinary least squares (OLS) estimator. Consider the standard linear model y = X\beta + \epsilon, where y is an n \times 1 vector of observations, X is an n \times k design matrix of regressors, \beta is the k \times 1 parameter vector, and \epsilon is an n \times 1 vector of errors assumed to be independent and identically distributed (i.i.d.) with mean zero and finite variance \sigma^2. The OLS estimator is given by \hat{\beta} = \left( \frac{X^T X}{n} \right)^{-1} \left( \frac{X^T y}{n} \right). By rewriting \sqrt{n} (\hat{\beta} - \beta) = \left( \frac{X^T X}{n} \right)^{-1} \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \epsilon_i \right) and applying the CLT to the scaled score term \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \epsilon_i, whose summands are i.i.d. mean-zero random vectors under suitable moment conditions, it follows that \sqrt{n} (\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 \Sigma^{-1}), where \Sigma = \operatorname{plim} (X^T X / n) is positive definite. This result holds for the vector \beta, leveraging the multidimensional CLT. The asymptotic normality requires several conditions for validity. Strict exogeneity must hold, meaning E[\epsilon_i | X_i] = 0 for all i, ensuring unbiasedness and consistency. There should be no perfect multicollinearity, so \operatorname{rank}(X) = k and \Sigma is invertible. The errors need to satisfy i.i.d. conditions with finite second moments to invoke the CLT, often via the Lindeberg-Lévy or Lyapunov versions. Homoskedasticity, \operatorname{Var}(\epsilon_i | X_i) = \sigma^2, yields the simple covariance matrix form, but this can be relaxed under heteroskedasticity by using robust variance estimators, preserving asymptotic normality while adjusting the variance to \Sigma^{-1} ( \operatorname{plim} \frac{1}{n} \sum X_i X_i' \epsilon_i^2 ) \Sigma^{-1}. When the errors are normally distributed, the OLS estimator is exactly normally distributed for any sample size n, as the linear combination of normal errors remains normal. For non-normal errors, however, exact normality does not hold, but the CLT ensures the sampling distribution of \hat{\beta} approximates a normal distribution for sufficiently large n, making inference reliable in practice even without normality. This asymptotic normality underpins standard inference procedures in regression analysis. Wald t-tests for individual coefficients, based on (\hat{\beta}_j - \beta_j) / \sqrt{ \widehat{\operatorname{Var}}(\hat{\beta}_j) } \xrightarrow{d} N(0,1), and F-tests for linear restrictions on \beta, whose statistics (scaled by the number of restrictions) are asymptotically chi-squared, derive their large-sample validity directly from the CLT applied to the OLS estimator. These tests are pivotal in hypothesis testing and confidence interval construction.
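A short simulation sketch (NumPy; the design, the skewed error distribution, sizes, and seed are illustrative assumptions) shows the slope t-statistic behaving like a standard normal even though the errors are far from normal:

```python
import numpy as np

rng = np.random.default_rng(10)           # illustrative seed
n, reps = 500, 10_000                     # illustrative sizes
beta = np.array([1.0, 2.0])               # true intercept and slope

t_stats = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.uniform(size=n)])
    eps = rng.exponential(1.0, size=n) - 1.0     # skewed, non-normal errors with mean 0
    y = X @ beta + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)                      # OLS estimate
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)                 # error variance estimate
    se = np.sqrt(s2 * XtX_inv[1, 1])             # standard error of the slope
    t_stats[r] = (b[1] - beta[1]) / se           # t-statistic for the slope

print(t_stats.mean(), t_stats.std())             # approximately 0 and 1 by the CLT
```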

In Other Fields and Illustrations

In physics, the central limit theorem manifests through Donsker's invariance principle, which establishes that a scaled random walk converges in distribution to a Brownian motion process. This functional central limit theorem extends the classical CLT by showing that the trajectory of a simple symmetric random walk on the integers, when appropriately rescaled in space and time, approximates the paths of a standard Brownian motion in the Skorokhod space of continuous functions. Brownian motion thus serves as the limiting diffusion process for aggregated microscopic random fluctuations, such as particle displacements in fluids, underpinning models in statistical mechanics and diffusion theory. In finance, the CLT justifies the normality approximation for portfolio returns over extended horizons, as the logarithmic returns of assets are often modeled as independent and identically distributed random variables. The total log-return of a diversified portfolio can be viewed as the sum of these individual log-returns, leading to a normal distribution by the CLT when the number of periods or assets is large, which facilitates risk assessment via metrics like Value at Risk. This approximation holds under mild conditions on the moments of the log-returns, enabling the use of Gaussian models for pricing derivatives and optimizing allocations despite the non-normality of single-period returns. In machine learning, the martingale central limit theorem applies to the gradient noise in stochastic gradient descent (SGD), where the iterative updates accumulate noise terms that behave like a martingale difference sequence. For SGD applied to convex objectives with bounded variance noise, the properly scaled parameter trajectory converges in distribution to a Gaussian process, providing asymptotic normality for the optimizer's error. This result quantifies the uncertainty in trained models, such as in neural networks, and informs step-size selection to balance bias and variance in the convergence. In numerical simulations, the CLT characterizes the error in Monte Carlo integration, where the estimator for an integral is the average of independent function evaluations, yielding a normal distribution for the error with variance scaling as the reciprocal of the sample size. This asymptotic normality enables confidence intervals for the approximation and guides variance reduction techniques, such as importance sampling or control variates, which exploit correlations to shrink the effective variance while preserving unbiasedness. For instance, in high-dimensional integrals common in physics simulations, the CLT-based error bounds justify the method's reliability for large sample sizes, even when the integrand lacks closed-form moments. Applications in quantum optics leverage CLT variants for photon counting statistics, where the total photon number in multimode Gaussian states follows a normal distribution in the high-intensity limit due to the summation of independent quantum fluctuations. Recent analyses extend this to quantum entropy measures, deriving central limit theorems for the von Neumann entropy of bosonic systems under thermal or coherent driving, which aids in characterizing quantum correlations in optical experiments. For partially distinguishable photons, a quantum CLT describes the convergence of counting distributions to Gaussian forms, impacting protocols in quantum metrology and imaging where photon bunching or antibunching deviates from classical limits.
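The Monte Carlo error claim above can be illustrated with a simple integral whose value is known, here \int_0^1 e^x\,dx = e - 1 (an illustrative choice; sizes and seed are assumptions), by checking the coverage of the CLT-based 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(11)           # illustrative seed
reps, n = 5000, 2000                      # illustrative sizes
true_I = np.e - 1                         # exact value of the integral of exp(x) over [0, 1]

u = rng.uniform(size=(reps, n))
f = np.exp(u)
I_hat = f.mean(axis=1)                            # Monte Carlo estimates of the integral
se_hat = f.std(axis=1, ddof=1) / np.sqrt(n)       # CLT-based standard errors

covered = np.abs(I_hat - true_I) <= 1.96 * se_hat # nominal 95% confidence intervals
print(covered.mean())                             # empirical coverage, close to 0.95
```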

Historical Development

Early Formulations

The early formulations of the central limit theorem arose amid the development of probability theory in the 18th and early 19th centuries, driven by practical needs in analyzing games of chance and astronomical observations. Mathematicians initially explored discrete probability distributions, such as those from dice rolls or card games, to compute fair odds and annuities, as seen in Abraham de Moivre's work on gambling problems. This discrete framework gradually shifted toward continuous approximations to handle larger numbers of trials, motivated by the law of large numbers established by Jacob Bernoulli. In astronomy, the need to quantify measurement errors in celestial data, such as planetary positions, further propelled interest in how sums of independent errors aggregate, bridging probabilistic models with empirical sciences. Abraham de Moivre provided the inaugural formulation in 1733, detailed in the second edition of his The Doctrine of Chances (1738). Focusing on the binomial distribution for fair trials (p = 1/2), de Moivre used Stirling's approximation to the factorial to derive the probability of outcomes near the mean, showing that these probabilities approximate those of a normal distribution as the number of trials increases. This result, known as the de Moivre-Laplace theorem, represented an early central limit theorem for symmetric binomial cases, effectively demonstrating the transition from discrete to continuous distributions in repeated independent events like coin flips. Pierre-Simon Laplace significantly broadened this in 1810, with a refined version in his 1812 Théorie analytique des probabilités. He extended the theorem to sums of independent and identically distributed random variables possessing finite moments, utilizing generating functions to prove that the standardized sum converges in distribution to a normal random variable. Laplace's approach applied directly to error theory in astronomy, where he analyzed deviations in comet orbit measurements, establishing the normal distribution as the limiting form for aggregated independent errors with mean zero and finite variance. Carl Friedrich Gauss contributed to the foundational ideas in 1809 through Theoria motus corporum coelestium in sectionibus conicis solem ambientium, where he posited the normal distribution as the inherent law governing observational errors in astronomical computations. Gauss derived the distribution's properties under the assumption of maximum likelihood for least squares estimation but did not prove a general convergence result for sums of arbitrary independent variables, thus stopping short of a complete central limit theorem. His emphasis on normality in error propagation nonetheless reinforced the probabilistic underpinnings that Laplace and others built upon.

Key Contributors and Modern Refinements

In the late 19th century, Pafnuty Chebyshev initiated rigorous approaches to the central limit theorem using the method of moments, though his 1887 proof was incomplete and required finite moments up to the fourth order. Andrei Markov completed this line of work in 1898 by providing a moment-theoretic proof under conditions including bounded variances, marking a significant step toward generality. Aleksandr Lyapunov advanced the theorem decisively in 1901 with the first fully rigorous proof for sums of independent random variables with finite variances, employing characteristic functions and introducing the Lyapunov condition—a sufficient criterion based on the existence of a δ > 0 such that the sum of E[|X_{i,n}|^{2+δ}] / s_n ^{2+δ} → 0 as n → ∞, where s_n is the standard deviation of the sum. Jarl Waldemar Lindeberg refined Lyapunov's result in 1922 by establishing a weaker sufficient condition, now known as the Lindeberg condition: for every ε > 0, the sum over i of E[ X_{i,n}^2 1_{|X_{i,n}| > ε s_n} ] / s_n^2 → 0 as n → ∞, allowing the theorem to apply to non-identically distributed variables without higher moments beyond the second. This condition proved foundational for broader applications. Georg Pólya coined the term "central limit theorem" in 1920 to emphasize its pivotal role in probability theory. Paul Lévy extended the framework in 1935 by proving the Lindeberg condition's necessity and sufficiency for independent variables, while also initiating work on dependent cases like martingales. Harald Cramér contributed a key lemma in 1936 that facilitated validations of these results using characteristic functions. William Feller, in 1935, established necessary and sufficient conditions via characteristic functions, culminating in the Lindeberg-Feller theorem. Modern refinements focus on convergence rates, higher-order approximations, and extensions to functional settings. The Berry–Esseen theorem, independently developed by A. C. Berry in 1941 and Carl-Gustav Esseen in 1942, quantifies the rate of convergence in the classical CLT, bounding the supremum distance between the cumulative distribution function of the normalized sum and the standard normal by C ρ / √n, where ρ involves the third absolute moment divided by the variance^{3/2}, and C is a universal constant (originally around 7.59, later improved). This provides non-asymptotic error estimates essential for finite-sample approximations. Francis Edgeworth's series expansions, originating in 1904 but refined in the mid-20th century, offer higher-order corrections to the normal approximation, incorporating skewness and kurtosis terms for improved accuracy beyond the leading Gaussian term. In the functional domain, Monroe Donsker's invariance principle (1951) generalizes the CLT to stochastic processes, stating that the rescaled random walk converges in distribution to Brownian motion in the Skorokhod space, enabling limit theorems for empirical processes and time-dependent sums. Subsequent developments, such as Trotter's 1959 recognition of the Lindeberg method's applicability to infinite-dimensional spaces, further broadened these functional extensions. These refinements underpin applications in high-dimensional statistics, bootstrap methods, and non-parametric inference, where precise error control is critical.

References

  1. [1]
    [PDF] Central Limit Theorem and the Law of Large Numbers Class 6 ...
    The central limit theorem says that the sum or average of many independent copies of a random variable is approximately a normal random variable. The CLT goes ...
  2. [2]
    7.4 - Central Limit Theorem | STAT 200
    The Central Limit Theorem states that if the sample size is sufficiently large then the sampling distribution will be approximately normally distributed.Missing: definition | Show results with:definition
  3. [3]
    [PDF] Central Theorems - Stanford University
    • History of the Central Limit Theorem. ▫ 1733: CLT for X ~ Ber(1/2) postulated by. Abraham de Moivre. ▫ 1823: Pierre-Simon Laplace extends de Moivre's work ...
  4. [4]
    A History of the Central Limit Theorem - SpringerLink
    Free delivery 14-day returnsThis study discusses the history of the central limit theorem and related probabilistic limit theorems from about 1810 through 1950.
  5. [5]
    Central limit theorem: the cornerstone of modern statistics - PMC
    The central limit theorem is the most fundamental theory in modern statistics. Without this theorem, parametric tests based on the assumption that sample data ...Missing: authoritative | Show results with:authoritative
  6. [6]
    Central Limit Theorem - Probability Course
    It states that, under certain conditions, the sum of a large number of random variables is approximately normal.Missing: authoritative sources
  7. [7]
    254A, Notes 2: The central limit theorem | What's new - Terry Tao
    Jan 5, 2010 · The central limit theorem (and its variants, which we discuss below) are extremely useful tools in random matrix theory, in particular through the control they ...Missing: source | Show results with:source
  8. [8]
    [PDF] Probability and Measure - Southern Illinois University
    Oct 30, 2025 · This chapter discusses the central limit theorem, convergence in distribution ... . Theorem 4.41: the Multivariate Central Limit Theorem (MCLT).
  9. [9]
    [PDF] Visualizing the Multivariate Normal, Lecture 9 - Stat@Duke
    Sep 15, 2015 · of the multivariate normal distribution are ellipsoids. ▷ The axes of the ellipsoids correspond to eigenvectors of the covariance matrix. ▷ The ...Missing: elliptical | Show results with:elliptical
  10. [10]
    [PDF] arXiv:2212.08921v2 [math.ST] 29 May 2023
    May 29, 2023 · By the central limit theorem, n1/2 ¯g12,g1g2, ¯g1, ¯g2 ... Logistic: Let U, V be two correlated uniform random variables as defined above.
  11. [11]
    [PDF] The Central Limit Theorem - UMD MATH
    We can think of the i.i.d. condition as meaning that the Xi are repeated exper- iments, or alternately random samples, from some given probability distribution.
  12. [12]
    [PDF] Central Limit Theorem - Washington
    Central Limit Theorem. Page 8. What does i.i.d mean? Independent and Identically Distributed (i.i.d). For random variables X ,X ,…,X to be i.i.d., they must.
  13. [13]
    Statistics 5101 (Geyer, Spring 2022) Central Limit Theorem
    Dec 8, 2020 · The Berry-Esseen Theorem says the rate of convergence in the central limit theorem is controlled by skewness. Every other aspect of the ...
  14. [14]
    [PDF] Lecture 01 & 02: the Central Limit Theorem and Tail Bounds
    The Central Limit Theorem (CLT) for i.i.d. random variables can be stated as follows. Theorem 1 (the Central Limit Theorem). Let Z be a standard Gaussian. For ...
  15. [15]
    On the central limit theorem for negatively correlated random ...
    A corollary of our main result is that the central limit theorem holds for pairwise independent jointly symmetric random variables under Lindeberg's condition.On The Central Limit Theorem... · 1. Introduction And Main... · Cf. Chen, 1978, Lemma 1.2<|separator|>
  16. [16]
    A new direct proof of the central limit theorem - Project Euclid
    We provide a brief history of the CLT. The first major contribution to the. CLT was in 1733 by de Moivre. De Moivre proved a version of the CLT for. Bernoulli ...<|separator|>
  17. [17]
    [PDF] Central limit theorem
    Jan 29, 2021 · ... central limit theorem was discerned, when, in 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved ...
  18. [18]
    None
    Summary of each segment:
  19. [19]
    [PDF] CHAPTER 4. LIMIT THEOREMS IN STATISTICS
    Thus, the tails of the sequence of random variables cannot “fatten” too rapidly. The Lindeberg condition allows the variances of the Yk to vary within limits.
  20. [20]
    [PDF] Lecture 10 : Setup for the Central Limit Theorem
    Theorem 10.4 (Lyapounovs Theorem) If a triangular array satisfies the Triangular Array Con- ditions and the Lyapounov condition (10.8), then L(Si) → N(0, 1).
  21. [21]
    [PDF] 9 Sums of Independent Random Variables - Duke Statistical Science
    9.2 Limits of Partial Sums and the Central Limit Theorem ... This \Lindeberg Condition" implies both of ... the Pareto distribution (often used to model in ...<|control11|><|separator|>
  22. [22]
    Large-sample theory of sequential estimation
    LARGE-SAMPLE THEORY OF SEQUENTIAL ESTIMATION. BY F. J. ANSCOMBE. Received 9 April 1952. In a previous large-sample treatment of sequential estimation (l), it ...
  23. [23]
    [PDF] ON THE CENTRAL LIMIT THEOREM FOR THE SUM OF A ...
    In what follows we shall investigate the limiting distribution of the random variables rlVn for n-*-{-oo where vn (n = 1,2,...) is a sequence of.
  24. [24]
    [PDF] IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt Topic for ...
    Now using the CLT for the compound Poisson process, with. √. 612,000 ≈ 782 and 600/782 ≈. 0.767, we obtain the approximating probability. P(X(15) ≤ 6000) ...
  25. [25]
    [PDF] Anscombe's theorem 60 years later - Allan Gut - DiVA portal
    Aug 15, 2011 · Anscombe's theorem, from 1952, concerns limit theorems for randomly indexed processes, where the number of observations is random.
  26. [26]
    [PDF] Central limit theorem and almost sure ... - Indian Academy of Sciences
    ... products of i.i.d. positive, square integrable random variables are asymptotically log-normal. This fact is an immediate consequence of the classical central.
  27. [27]
    [PDF] Lognormal Model for Stock Prices - UCSD Math
    In view of the Central Limit Theorem, under mild additional conditions—for example, if logX1 has finite variance, then. logX1 must have a normal distribution.
  28. [28]
    [PDF] The Fundamentals of Heavy Tails: Properties, Emergence, and ...
    Theorem 6.1 (The multiplicative central limit theorem). Suppose {Yi}i≥1 is an i.i.d. sequence of strictly positive random variables satisfying Var[log Yi] ...
  29. [29]
  30. [30]
    [PDF] Basic Properties of Strong Mixing Conditions. A Survey and Some ...
    [21] R.C. Bradley. A central limit theorem for stationary ρ-mixing sequences with infinite variance. Ann. Probab. 16 (1988) 313-332. MR920274. [22] R.C. Bradley ...
  31. [31]
    [PDF] The functional central limit theorem for strongly mixing processes
    HERRNDORF, A Functional Central Limit Theorem for Strongly Mixing Sequences of ... I. A. IBRAGIMOV, Some Limit Theorems for Stationary Processes, Theor.
  32. [32]
    Dependent Central Limit Theorems and Invariance Principles
    This paper proves central limit theorems for martingales and near-martingales without moments or full Lindeberg condition, and extends them to invariance ...
  33. [33]
    Normal Approximation for Stochastic Gradient Descent via Non ...
    Apr 3, 2019 · A crucial intermediate step is proving a non-asymptotic martingale central limit theorem (CLT), i.e., establishing the rates of convergence ...
  34. [34]
    Central limit theorems for stochastic approximation with controlled ...
    Abstract. This paper provides a Central Limit Theorem (CLT) for a process {θ_n, n ≥ 0} satisfying a stochastic approximation (SA) equation of the form θ_{n+1} ...
  35. [35]
    ASYMPTOTIC NORMALITY OF THE MAXIMUM LIKELIHOOD ...
    These constitute a stationary martingale increment sequence, and hence by a martingale central limit theorem we obtain the stated limit distribution of ...
  36. [36]
    A bound for the error in the normal approximation to the distribution ...
    6.2 | 1972. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Chapter author: Charles Stein.
  37. [37]
    A Tricentenary history of the Law of Large Numbers - Project Euclid
    The Weak Law of Large Numbers is traced chronologically from its inception as Jacob Bernoulli's Theorem in 1713, through De Moivre's Theorem, ...
  38. [38]
    [PDF] An Expectile Strong Law of Large Numbers - Air Force Academy
    Jul 20, 2022 · Kolmogorov's strong law of large numbers (Kolmogorov, 1933) is the most principal theorem in asymptotic statistics. The theorem states that ...
  39. [39]
    The Law of Large Numbers and the Central Limit Theorem
    When pollsters ask a question such as “Do you approve of the job performance of the president?” they usually take large samples.
  40. [40]
    17. LLN and CLT - Quantitative Economics with Julia
    The CLT refines the LLN. The LLN gives conditions under which sample moments converge to population moments as sample size increases. The CLT provides ...
  41. [41]
    [PDF] Chapter 5: The Normal Distribution and the Central Limit Theorem
    The speed of convergence of S_n to the Normal distribution depends upon the distribution of X. Skewed distributions converge more slowly than symmetric.
  42. [42]
    Chapter 5 Simulation of Random Variables
    Aug 5, 2023 · Figure 5.6: The Central Limit Theorem in action for an extremely skew population. Even with a sample size of 1000, the density still fails to be normal, ...
  43. [43]
    [PDF] Berry–Esseen Bounds for Independent Random Variables
    In this chapter we illustrate some of the main ideas of the Stein method by proving the classical Lindeberg central limit theorem and the Berry–Esseen ...
  44. [44]
    [PDF] The Local Limit Theorem and the Almost Sure Local Limit Theorem ∗
    Generally speaking, the local limit theorem describes how the density of a sum of random variables follows the normal curve. Historically the local limit ...
  45. [45]
    [PDF] A Local Limit Theorem - Wharton Faculty Platform
    We remark that (2.5) holds if X_i satisfies Cramér's condition (C), lim sup |φ(z)| < 1 as z → ∞. 3. Nonlattice case with infinite third moment. We now assume ...
  46. [46]
    [PDF] The Bootstrap and Edgeworth Expansion - People @EECS
    It is aimed at a graduate-level audience who have some exposure to the methods of theoretical statistics." This is an authoritative book-length discussion of ...
  47. [47]
    A Note on the Convolution of Uniform Distributions - jstor
    normal distribution function, φ(x) the normal density function, and φ^(i)(x) its ith derivative; then, following Cramér, we have approximately (11) f(x) = φ( ...
  48. [48]
    [PDF] Asymptotic Theory for OLS - Colin Cameron
    Examples of central limit theorems include the following. Theorem A14 (Lindeberg–Lévy CLT): Let {X_i} be iid with E[X_i] = µ and V[X_i] = σ². Then Z_N = √N ...
  49. [49]
    [PDF] A Theory of Robust Long-Run Variance Estimation
    Long-run variance estimation is estimating the scale of a Gaussian process, related to the sum of autocovariances, and is important in time series inference.
  50. [50]
    [PDF] Econ 512: Financial Econometrics Time Series Concepts
    Mar 30, 2009 · The sample size, T, times the asymptotic variance of the sample mean is often called the long-run variance of y_t: lrv(y_t) = T · avar(ȳ) ...
  51. [51]
    [PDF] Central limit theorems for high dimensional dependent data
    In this section, we provide proofs of the high-dimensional CLTs on hyper-rectangles in Section 2.1 under α-mixing (Theorem 1), dependency graph (Theorem 2), and ...
  52. [52]
    Properties of the OLS estimator | Consistency, asymptotic normality
    The OLS estimator has properties such as consistency, meaning it converges to the true value, and asymptotic normality, meaning it is asymptotically ...
  53. [53]
    [PDF] LECTURE NOTES ON DONSKER'S THEOREM
    Then, S_n ⇒ W as n → ∞, where W denotes Brownian motion. The latter is viewed as a random element of (C[0,1], B). It may help to recall that this means ...
  54. [54]
    [PDF] Brownian motion as the limiting distribution of random walks
    Aug 28, 2021 · Donsker's invariance principle, also known as the functional central limit theo- rem, extends the central limit theorem from random variables to ...
  55. [55]
    Full article: Long-horizon asset and portfolio returns revisited
    Jul 24, 2023 · The central limit theorem implies that the virtual continuously compounded returns achieve normal distributions as the time (T) horizon extends.
  56. [56]
    [PDF] Normal Approximation for Stochastic Gradient Descent via Non ...
    In this section, we prove a multivariate martingale central limit theorem (CLT) with explicit rates and constants. Convergence rates of univariate martingale ...
  57. [57]
    [PDF] A Variational Analysis of Stochastic Gradient Algorithms
    Invoking the central limit theorem, we assume that the gradient noise is Gaussian with variance ∝ 1/S: g_S(θ) ≈ g(θ) + (1/√S) ∆g(θ), ∆g(θ) ∼ N(0, C(θ)) ...
  58. [58]
    [PDF] Monte Carlo and Quasi-Monte Carlo Methods - UCLA Mathematics
    The Central Limit Theorem (CLT) (Feller 1971) describes the size and statistical properties of Monte Carlo integration error. Theorem 2.1: For N large, ε_N ...
  59. [59]
    [PDF] Quantum Entropy and Central Limit Theorem - arXiv
    CV quantum information has been widely used in quantum optics and other settings to deal with continuous degrees of freedom [1]. Gaussian states, and processes ...
  60. [60]
    [PDF] A central limit theorem for partially distinguishable bosons - arXiv
    Apr 17, 2024 · The quantum central limit theorem derived by Cushen and Hudson provides the foundations for understanding how subsystems of large bosonic ...
  61. [61]
    [PDF] The Early Development of Mathematical Probability - Glenn Shafer
    On the mathematical side was the method of generating functions, the central limit theorem, and Laplace's techniques for evaluating posterior probabilities. On ...
  62. [62]
    [PDF] History of the Central Limit Theorem - AMS Tesi di Laurea
    The term “Central Limit Theorem”, abbreviated with CLT, indicates a collection of theorems, formulated between 1810 and 1935, regarding the convergence of ...
  63. [63]
    The doctrine of chances: or, a method of calculating the probabilities ...
    Sep 29, 2023 · The doctrine of chances: or, a method of calculating the probabilities of events in play. The second edition, fuller, clearer, and more correct than the first.
  64. [64]
    [PDF] De Moivre on the Law of Normal Probability - University of York
    His own translation, with some additions, was included in the second edition (1738) of The Doctrine of Chances, pages 235–243. This paper gave the first ...
  65. [65]
    Théorie analytique des probabilités - Internet Archive
    Laplace, Pierre Simon, marquis de, 1749–1827. Publication date: 1812.
  66. [66]
    [PDF] Theoria motus corporum coelestium in sectionibus conicis solem ...
  67. [67]
    [PDF] A History of the Central Limit Theorem
    Jul 31, 2019 · Laplace used the generating function T(t) = ∑_{k=−m}^{m} p_k t^k, where P_j is equal to the coefficient of t^j after the multiplication of [T(t)]^n ...
  68. [68]
    The Berry-Esseen Theorem for $U$-Statistics - Project Euclid
    This concludes a series of investigations on the Berry-Esseen theorem for U-statistics by Grams and Serfling, Bickel, and Chan and Wierman.
  69. [69]
    275A, Notes 5: Variants of the central limit theorem - Terence Tao
    Nov 19, 2015 · There are many variants, refinements, and generalisations of the central limit theorem, and the purpose of this set of notes is to present a small sample of ...
  70. [70]
    [PDF] A Review of Basic FCLT's - Columbia University
    Sep 10, 2016 · Abstract. We briefly review Donsker's functional central limit theorem (FCLT), which is a generalization of the classic central limit ...