The Berry–Esseen theorem is a fundamental result in probability theory that quantifies the rate of convergence in the central limit theorem by providing an explicit upper bound on the difference between the cumulative distribution function of a standardized sum of independent random variables and the cumulative distribution function of the standard normal distribution. Specifically, for independent random variables \xi_1, \dots, \xi_n with zero means, variances summing to 1, and finite third absolute moments, the theorem states that \sup_{z \in \mathbb{R}} |P(W \leq z) - \Phi(z)| \leq C \sum_{i=1}^n E[|\xi_i|^3], where W = \sum_{i=1}^n \xi_i, \Phi is the standard normal CDF, and C is a universal constant.[1]

The theorem was independently established by Andrew C. Berry in 1941 and Carl-Gustav Esseen in 1942. Berry's proof appeared in his paper "The accuracy of the Gaussian approximation to the sum of independent variates," published in the Transactions of the American Mathematical Society. Esseen's contribution, "On the Liapounoff Limit of Error in the Theory of Probability," followed shortly thereafter in Arkiv för Matematik, Astronomi och Fysik, extending and refining the bound under Lyapunov conditions.[1] These works built on earlier qualitative results such as the central limit theorem, but introduced the crucial third-moment condition to achieve a rate of order O(1/\sqrt{n}) for identically distributed variables.

Over time, the universal constant C in the bound has been sharpened through subsequent research, starting from Esseen's original estimate of 7.59 and improving to less than 0.469 for identically distributed summands (I. G. Shevtsova, 2013).[1][2] The theorem applies to non-identically distributed variables as long as the third moments are finite, making it versatile for practical approximations in statistics and beyond. Extensions include versions for dependent variables, U-statistics, and Markov chains, enhancing its utility in modern applications such as bootstrap methods and high-dimensional data analysis.[3]
Introduction
Overview
The Berry–Esseen theorem provides a quantitative measure of how closely the distribution of a standardized sum of independent random variables approximates the standard normal distribution, specifically bounding the supremum distance between their cumulative distribution functions.[1] This theorem builds on the central limit theorem, which qualitatively asserts convergence to normality as the number of variables increases, by offering an explicit error estimate for the approximation.[4]

The core implication of the theorem is that it quantifies the rate of convergence in this central limit theorem setting, achieving an order of 1/\sqrt{n} for a sum of n variables under suitable moment conditions.[4] These conditions typically require the random variables to have finite third absolute moments, ensuring the bound remains valid and controls the approximation error effectively.[1]

For instance, in the case of a sum of n independent identically distributed Bernoulli random variables with success probability p, the standardized sum's distribution converges to the standard normal, and the Berry–Esseen theorem supplies a bound on the discrepancy that decreases as O(1/\sqrt{n}), illustrating the theorem's practical utility in assessing normality approximations for discrete data.[5]
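To make the Bernoulli example concrete, the following sketch (in Python, assuming NumPy and SciPy are available; the choices n = 100, p = 0.3, and the constant C = 0.469 cited later in this article are illustrative) computes the Berry–Esseen bound for the standardized binomial sum and compares it with the exact Kolmogorov distance obtained from the binomial CDF.

```python
import numpy as np
from scipy.stats import binom, norm

def bernoulli_be_bound(n, p, C=0.469):
    """Berry-Esseen bound and exact Kolmogorov distance for the standardized
    sum of n i.i.d. Bernoulli(p) variables.  C = 0.469 is the best known
    i.i.d. constant cited in this article."""
    sigma = np.sqrt(p * (1 - p))                 # std. dev. of one centered summand
    rho = p * (1 - p) * (p**2 + (1 - p)**2)      # E|X - p|^3, computed directly
    bound = C * rho / (sigma**3 * np.sqrt(n))

    # Exact sup_x |F_n(x) - Phi(x)|: F_n is a step function jumping at the
    # standardized lattice points x_k = (k - n p) / (sigma sqrt(n)), so the
    # supremum is attained at a jump, approached from the left or the right.
    k = np.arange(n + 1)
    x = (k - n * p) / (sigma * np.sqrt(n))
    right = binom.cdf(k, n, p)                   # F_n(x_k)
    left = np.concatenate(([0.0], right[:-1]))   # F_n(x_k-)
    phi = norm.cdf(x)
    exact = max(np.abs(right - phi).max(), np.abs(left - phi).max())
    return bound, exact

bound, exact = bernoulli_be_bound(n=100, p=0.3)
print(f"Berry-Esseen bound: {bound:.4f}, exact distance: {exact:.4f}")
```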
The central limit theorem (CLT), established in this classical form by Lindeberg (1922) and Lévy (1937), asserts that if X_1, X_2, \dots, X_n are independent and identically distributed random variables with finite mean \mu and variance \sigma^2 > 0, then the standardized sum S_n = \frac{\sum_{i=1}^n (X_i - \mu)}{\sigma \sqrt{n}} converges in distribution to the standard normal random variable Z \sim \mathcal{N}(0, 1) as n \to \infty.[6] This means that the cumulative distribution function (CDF) F_n(x) = P(S_n \leq x) satisfies \lim_{n \to \infty} F_n(x) = \Phi(x) for every continuity point x of the standard normal CDF \Phi(x).[6] Convergence in distribution provides asymptotic normality but offers no quantitative control over the speed of this approximation, leaving practical applications without explicit error estimates.

The Berry–Esseen theorem overcomes this limitation of the CLT by delivering a uniform bound on the discrepancy between F_n(x) and \Phi(x), measured in the supremum norm \|F_n - \Phi\|_\infty = \sup_x |F_n(x) - \Phi(x)|, known as the Kolmogorov distance. In the i.i.d. case, this bound is of the order O(1/\sqrt{n}), explicitly quantifying the rate at which the distribution of S_n approaches normality.[7] Berry (1941) first derived such a bound for sums of independent (not necessarily identical) variates, showing that under finite third absolute moments, the error is at most a constant times a term involving these moments divided by the standard deviation raised to the third power, yielding the O(1/\sqrt{n}) rate for i.i.d. variables.[7] Esseen (1942) refined this result, providing a sharper constant and extending the bound to non-identical distributions while confirming the same order of convergence.[1]

This uniform convergence of CDFs strengthens the mere pointwise convergence in distribution from the CLT, enabling reliable finite-sample approximations in statistics and probability. While the CLT relies solely on finite second moments for its validity, the Berry–Esseen bound requires finite third absolute moments to control the error term effectively. The third moment enters because the proofs, typically via characteristic functions, expand the logarithm of the characteristic function around zero using a Taylor series up to the second order for the CLT's Gaussian limit, with the third-order term bounding the remainder to achieve the O(1/\sqrt{n}) precision.[7] Without finite third moments, the error need not decay at this rate, though analogous bounds with slower rates hold under moment assumptions of order 2 + \delta for 0 < \delta < 1.[1]
History
Berry's Contribution
Andrew C. Berry made a pioneering contribution to the quantification of convergence rates in the central limit theorem through his 1941 paper, "The Accuracy of the Gaussian Approximation to the Sum of Independent Variates," published in the Transactions of the American Mathematical Society.[7] In this work, Berry addressed the error in approximating the distribution of sums of independent random variables by a Gaussian distribution, providing the first explicit uniform bound under finite third-moment conditions.[7]

For the case of independent and identically distributed (i.i.d.) summands X_1, \dots, X_n with mean zero, variance \sigma^2 > 0, and finite third absolute moment E[|X_1|^3] < \infty, Berry established that the supremum norm difference between the cumulative distribution function of the normalized sum S_n / (\sigma \sqrt{n}) and the standard normal distribution function \Phi satisfies

\sup_x \left| P\left( \frac{S_n}{\sigma \sqrt{n}} \leq x \right) - \Phi(x) \right| \leq \frac{C \rho}{\sqrt{n}},

where \rho = E[|X_1|^3] / \sigma^3 and C is a universal constant.[7] This bound, later refined in subsequent works, captures the O(1/\sqrt{n}) rate of convergence, highlighting the role of the third moment in controlling the approximation error.[7]

Berry derived this result using characteristic functions, leveraging the Fourier inversion formula to express the distribution functions and bound their difference through estimates on the decay of the characteristic function of the sum.[7] His approach built on earlier qualitative results like those of Lyapunov and Lindeberg, shifting focus to precise quantitative error terms essential for statistical applications.[7] Independently, Carl-Gustav Esseen developed a similar result shortly thereafter in 1942.
Esseen's Contribution
Carl-Gustav Esseen made pivotal advancements to the Berry–Esseen theorem in his 1942 paper, extending the result beyond identically distributed random variables. In "On the Liapounoff Limit of Error in the Theory of Probability," published in Arkiv för Matematik, Astronomi och Fysik, Esseen proved the theorem for sums of independent random variables that are not necessarily identically distributed, assuming finite third absolute moments. This generalization allowed for broader applicability in central limit theorem approximations, where the error bound is given by

\sup_x \left| F_n(x) - \Phi(x) \right| \le C \frac{\sum_{i=1}^n \mathbb{E}[|X_i|^3]}{\left( \sum_{i=1}^n \mathbb{E}[X_i^2] \right)^{3/2}},

with an explicit constant C \le 7.59.

Esseen's work built upon Berry's 1941 contribution for the i.i.d. case by incorporating Lyapunov's condition for convergence, enabling the handling of heterogeneous distributions while maintaining quantitative error control. His proof relied on moment inequalities and truncation arguments to bound the deviation from normality. This refinement improved upon the implicit constants in prior estimates and established a foundational non-i.i.d. version of the theorem.

In his subsequent 1945 memoir, "Fourier Analysis of Distribution Functions: A Mathematical Study of the Laplace-Gaussian Law," published in Acta Mathematica, Esseen introduced advanced smoothing techniques via characteristic functions to derive sharper bounds. These methods involved convolving distributions with smooth kernels to facilitate Fourier inversion and yield more precise error estimates in the non-i.i.d. setting. Esseen's Fourier-based unification not only refined the constant further but also profoundly influenced modern probability theory, paving the way for multidimensional extensions and optimal constant pursuits.[8]
Statement
Independent Identically Distributed Case
The Berry–Esseen theorem in the independent identically distributed (i.i.d.) case quantifies the rate of convergence in the central limit theorem for sums of i.i.d. random variables under finite third-moment assumptions. Let X_1, X_2, \dots, X_n be i.i.d. random variables with mean \mathbb{E}[X_i] = 0, positive variance \sigma^2 = \mathrm{Var}(X_i) > 0, and finite third absolute moment \rho = \mathbb{E}[|X_i|^3] < \infty. Let S_n = \sum_{i=1}^n X_i denote the sum, and let F_n(x) be the cumulative distribution function (CDF) of the normalized sum S_n / (\sigma \sqrt{n}). Let \Phi(x) be the CDF of the standard normal distribution N(0,1).

The theorem states that

\sup_{x \in \mathbb{R}} |F_n(x) - \Phi(x)| \leq C \frac{\rho}{\sigma^3 \sqrt{n}},

where C is an absolute constant independent of n and the distribution of the X_i.[7][9]

This bound holds because the i.i.d. assumption with finite variance automatically satisfies the Lindeberg condition, which is a key requirement for the central limit theorem in more general settings.[9] The constant C has been refined over time; for instance, early work established C < 7.59, while later improvements yield C \leq 0.7655.[10] More recent estimates for the i.i.d. case achieve C \leq 0.469, with a lower bound of C \geq 0.4097.[11]
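As a numerical illustration of the statement, the following sketch estimates the Kolmogorov distance by Monte Carlo for sums of centered Exp(1) variables (for which \sigma = 1 and \rho = 12/e - 2 \approx 2.41) and compares it with the bound using the constant C = 0.469 quoted above. The sample sizes and replication counts are illustrative choices, and the empirical CDF itself carries Monte Carlo error of order 1/\sqrt{reps}.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
C = 0.469                  # best known i.i.d. constant quoted above
rho = 12 / np.e - 2        # E|X - 1|^3 for X ~ Exp(1); here sigma = 1

reps, block = 200_000, 20_000
for n in [16, 64, 256]:
    # Monte Carlo sample of the standardized sum S_n / sqrt(n), built in blocks
    # to keep memory modest.
    s = np.concatenate([
        (rng.exponential(size=(block, n)).sum(axis=1) - n) / np.sqrt(n)
        for _ in range(reps // block)
    ])
    s.sort()
    # Kolmogorov distance between the empirical CDF and Phi (check both jump sides).
    phi = norm.cdf(s)
    d = max((np.arange(1, reps + 1) / reps - phi).max(),
            (phi - np.arange(reps) / reps).max())
    print(f"n={n:4d}  sup|F_n - Phi| ~ {d:.4f}   bound C*rho/sqrt(n) = {C * rho / np.sqrt(n):.4f}")
```

The printed distances shrink roughly by half each time n quadruples, matching the O(1/\sqrt{n}) rate, and stay below the bound.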
Independent Non-Identically Distributed Case
The independent non-identically distributed case extends the Berry–Esseen theorem to sums of independent random variables X_1, \dots, X_n that may have differing distributions, under moment conditions that aggregate across all variables. Specifically, assume the X_i are independent with E[X_i] = 0, \mathrm{Var}(X_i) = \sigma_i^2 > 0, and finite third absolute moments E[|X_i|^3] = \rho_i < \infty for each i = 1, \dots, n. Let S_n = \sum_{i=1}^n X_i and \sigma_n^2 = \sum_{i=1}^n \sigma_i^2, with \sigma_n > 0. These assumptions ensure the variables are centered and have controlled tails, without requiring identical distributions or uniform bounds on individual moments beyond finiteness.

The theorem provides a uniform bound on the Kolmogorov distance between the distribution function of the normalized sum and the standard normal distribution:

\sup_{x \in \mathbb{R}} \left| P\left( \frac{S_n}{\sigma_n} \leq x \right) - \Phi(x) \right| \leq C \frac{\sum_{i=1}^n \rho_i}{\sigma_n^3},

where \Phi(x) is the cumulative distribution function of the standard normal distribution, and C is a universal absolute constant independent of n and the distributions of the X_i. This inequality holds for any finite n, offering a non-asymptotic quantitative measure of approximation error. The original proof relies on Fourier analysis of characteristic functions, establishing the bound with C = 7.59, though subsequent refinements have sharpened it. The best known value is C \leq 0.5583, with a lower bound of C \geq 0.4097.[12][11]

A key feature of this bound is that it does not require the third-moment ratio \sum \rho_i / \sigma_n^3 \to 0 as n \to \infty or \sigma_n \to \infty; instead, it quantifies the error explicitly in terms of this ratio. However, for the central limit theorem to hold (meaning the distribution of S_n / \sigma_n converges to \Phi), the Lindeberg condition must be satisfied, and the Berry–Esseen condition \sum \rho_i / \sigma_n^3 \to 0 ensures the convergence rate is at most O\left( \sum \rho_i / \sigma_n^3 \right). This condition strengthens the Lyapunov condition for third-order moments and guarantees the error vanishes asymptotically. In the special case of identically distributed variables, the bound simplifies to the classical i.i.d. form, with \sum \rho_i / \sigma_n^3 = \rho / (\sigma^3 \sqrt{n}).[12]
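A minimal sketch of the non-identically distributed bound, using heterogeneous Bernoulli(p_i) summands (a Poisson-binomial sum) as an illustrative example and the constant C = 0.5583 quoted above; the function name and the grid of p_i values are arbitrary choices.

```python
import numpy as np

def lyapunov_ratio(ps):
    """Berry-Esseen third-moment ratio sum(rho_i) / sigma_n^3 for centered
    Bernoulli(p_i) summands with varying p_i (a Poisson-binomial sum)."""
    ps = np.asarray(ps, dtype=float)
    var = ps * (1 - ps)                          # sigma_i^2 for each summand
    rho = ps * (1 - ps) * (ps**2 + (1 - ps)**2)  # E|X_i - p_i|^3, exact
    return rho.sum() / var.sum() ** 1.5

ps = np.linspace(0.05, 0.95, 200)                # 200 heterogeneous summands
ratio = lyapunov_ratio(ps)
# C = 0.5583: best known constant for the non-i.i.d. case cited in this article.
print(f"sum(rho_i)/sigma_n^3 = {ratio:.4f}, Berry-Esseen bound = {0.5583 * ratio:.4f}")
```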
Random Index Case
The random index case extends the Berry–Esseen theorem to sums S_N = \sum_{i=1}^N X_i, where the upper limit N is a non-negative integer-valued random variable independent of the i.i.d. sequence \{X_i\}_{i=1}^\infty. This setting arises in contexts such as renewal processes, where N = N(t) counts the number of events up to time t, or more generally in stopped sums and sequential statistical procedures with random sample sizes. The key assumptions are that the X_i are independent and identically distributed with mean zero, variance \sigma^2 > 0, and finite third absolute moment \mathbb{E}[|X_1|^3] < \infty; N is independent of the X_i, takes values in the non-negative integers, and satisfies \mathbb{E}[N] < \infty along with suitable moment conditions on N (such as \mathbb{E}[N^{3/2}] < \infty or bounded variability relative to \mathbb{E}[N]) to ensure the approximation holds.[13]

Under these conditions, the theorem provides a uniform bound on the deviation between the cumulative distribution function of the randomly normalized sum S_N / (\sigma \sqrt{N}) and the standard normal distribution function \Phi(x). Specifically,

\sup_{x \in \mathbb{R}} \left| \mathbb{P}\left( \frac{S_N}{\sigma \sqrt{N}} \leq x \right) - \Phi(x) \right| \leq C \frac{\mathbb{E}[|X_1|^3]}{\sigma^3 \sqrt{\mathbb{E}[N]}},

where C > 0 is a universal constant (e.g., values around 0.5 to 3.87 have been established in various refinements). This inequality incorporates the expected value \mathbb{E}[N] in the denominator to reflect the scale of the randomness in the index, while additional terms accounting for the variability of N (such as \mathbb{E}|N - \mathbb{E}[N]| / \mathbb{E}[N]) may appear in more general bounds to control the difference from the fixed-index case. For renewal processes, where the interarrival times X_i are positive with mean \mu > 0 and the index N(t) satisfies S_{N(t)} \leq t < S_{N(t)+1}, a closely related bound applies to the distribution of N(t), with the rate O(1/\sqrt{t}) or equivalently O(1/\sqrt{\mathbb{E}[N]}).

This formulation is particularly useful in statistical inference with random sample sizes, such as sequential testing or estimation in processes where the number of observations is determined adaptively, providing quantitative control on the normal approximation even when the effective sample size fluctuates. The presence of \sqrt{\mathbb{E}[N]} in the bound ensures the error decreases as the expected number of terms grows, mirroring the fixed-index case but adjusted for the randomness in N.[13]
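The following Monte Carlo sketch illustrates the setting under simplifying assumptions: Rademacher summands (so \sigma = 1 and \mathbb{E}|X_1|^3 = 1), N \sim Poisson(100), and the event N = 0 ignored (its probability is about e^{-100}); it reports the empirical Kolmogorov distance next to the moment ratio \mathbb{E}[|X_1|^3] / (\sigma^3 \sqrt{\mathbb{E}[N]}), which the universal constant C multiplies in the bound above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
lam, reps = 100, 200_000            # E[N] = lam; N ~ Poisson(lam), independent of the X_i

# X_i: Rademacher signs (+1/-1 with prob 1/2), so sigma = 1 and E|X_1|^3 = 1.
N = np.maximum(rng.poisson(lam, size=reps), 1)   # sketch: ignore N = 0 (prob ~ e^{-100})
S = 2.0 * rng.binomial(N, 0.5) - N               # sum of N Rademacher signs, vectorized
w = np.sort(S / np.sqrt(N))                      # randomly indexed, randomly normalized sum

phi = norm.cdf(w)
d = max((np.arange(1, reps + 1) / reps - phi).max(),
        (phi - np.arange(reps) / reps).max())
print(f"sup|F - Phi| ~ {d:.4f}   vs  E|X|^3/(sigma^3 sqrt(E[N])) = {lam ** -0.5:.4f}")
```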
Multidimensional Case
The multidimensional case of the Berry–Esseen theorem quantifies the rate at which the distribution of the normalized sum of independent identically distributed random vectors in \mathbb{R}^d converges to a multivariate normal distribution. This extension is necessary because standard univariate metrics, such as the Kolmogorov distance, do not directly generalize to higher dimensions without adjustment; instead, metrics like the supremum over probabilities of convex sets or half-spaces (defined by hyperplanes) are used to capture the error uniformly across the space.[14]

The theorem applies under the assumptions that X_1, \dots, X_n are i.i.d. random vectors in \mathbb{R}^d with mean \mathbb{E}[X_i] = 0, positive definite covariance matrix \Sigma, and finite third absolute moment \mathbb{E}[\|X_i\|^3] < \infty, where \|\cdot\| is the Euclidean norm. Let S_n = \sum_{i=1}^n X_i, and denote by F_n the distribution of the normalized sum S_n / \sqrt{n} and by \Phi_\Sigma the distribution of \mathcal{N}(0, \Sigma). The quantity \beta = \mathbb{E}[\|X_1\|^3] measures the third-moment contribution to the error.[14]

One standard formulation bounds the error in the convex distance metric,

\delta(F_n, \Phi_\Sigma) = \sup_{\substack{A \subseteq \mathbb{R}^d \\ A \text{ convex}}} |F_n(A) - \Phi_\Sigma(A)| \le C_d \frac{\beta}{\sqrt{n}},

where C_d is a dimension-dependent constant; an explicit version gives C_d = 42 d^{1/4} + 16 when \Sigma = I_d. Since half-spaces are convex sets, the bound also applies to the half-space distance \delta_H(F_n, \Phi_\Sigma) = \sup |F_n(H) - \Phi_\Sigma(H)| over all half-spaces H = \{ y \in \mathbb{R}^d : \langle y, u \rangle \le t \} with \|u\| = 1 and t \in \mathbb{R}, yielding a similar form \delta_H(F_n, \Phi_\Sigma) \le C_d' \beta / \sqrt{n} with C_d' also growing with d. Equivalent bounds hold in the Lévy metric, which metrizes weak convergence and involves ε-neighborhoods of sets, with the error controlled by a dimension-dependent multiple of \beta / \sqrt{n}.[14][15]

The dependence of C_d on the dimension d highlights a key challenge: as d increases, the constant worsens (e.g., at least linearly in some early bounds but improved to d^{1/4} via Lyapunov-type arguments), making the approximation less sharp in high dimensions due to the complexity of controlling the distribution over more directions and sets. This dimension dependence arises from the need to integrate characteristic function estimates or use Stein-type methods that account for the geometry of \mathbb{R}^d. When d=1, the result specializes to the classical univariate Berry–Esseen theorem with a dimension-free constant.[14][15]
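The dimension dependence can be made tangible with a short sketch that Monte Carlo estimates \beta = \mathbb{E}\|X_1\|^3 and evaluates the explicit bound C_d \beta / \sqrt{n} with C_d = 42 d^{1/4} + 16 quoted above. The centered-exponential coordinates and the values of d and n are illustrative choices, and in higher dimensions the bound is informative only for very large n.

```python
import numpy as np

rng = np.random.default_rng(2)

def convex_distance_bound(d, n, samples=100_000):
    """Bentkus-type bound C_d * beta / sqrt(n) over convex sets for i.i.d.
    vectors with identity covariance; here X has i.i.d. centered-exponential
    coordinates as an illustrative choice."""
    x = rng.exponential(size=(samples, d)) - 1.0       # mean 0, identity covariance
    beta = (np.linalg.norm(x, axis=1) ** 3).mean()     # Monte Carlo E||X||^3
    c_d = 42 * d ** 0.25 + 16                          # explicit constant quoted above
    return c_d * beta / np.sqrt(n)

# beta grows roughly like d^{3/2}, so the bound degrades quickly with d.
for d in [1, 5, 25]:
    print(f"d={d:2d}  bound ~ {convex_distance_bound(d, n=10**8):.4f}")
```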
Proof Techniques
Characteristic Functions Method
The classical proof of the Berry–Esseen theorem in the independent identically distributed (i.i.d.) case employs Fourier analysis via characteristic functions, as originally developed by Berry and subsequently refined by Esseen.[16] This method quantifies the rate of convergence in the central limit theorem by bounding the supremum difference between the cumulative distribution function (CDF) F_n(x) of the standardized sum S_n = n^{-1/2} \sum_{i=1}^n (X_i - \mu) and the standard normal CDF \Phi(x), assuming \mathbb{E}[X_1] = \mu, \mathrm{Var}(X_1) = 1, and finite third absolute moment \rho = \mathbb{E}[|X_1 - \mu|^3] < \infty.

The core approach begins with the Lévy inversion formula, which expresses the difference in CDFs in terms of the characteristic function \phi_n(t) of the standardized sum and \phi(t) = e^{-t^2/2} of the standard normal:

F_n(x) - \Phi(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx} (\phi_n(t) - \phi(t))}{it} \, dt,

where \phi_n(t) = [\psi(t / \sqrt{n})]^n and \psi denotes the characteristic function of a single standardized summand. This integral representation allows the error to be decomposed and bounded using properties of Fourier transforms.[17]

To evaluate the integral, truncation is applied by splitting it into regions where |t| \leq T and |t| > T for some T > 0 chosen to balance the contributions. For the low-frequency part (|t| \leq T), the integrand is controlled using approximations to the characteristic functions. Specifically, for small t, Taylor expansion of \log \psi(u) around zero yields the key inequality |\phi_n(t) - \phi(t)| \leq C \rho |t|^3 / \sqrt{n} for some absolute constant C, leveraging the third-moment condition to ensure the remainder term decays appropriately. This approximation exploits the fact that \log \psi(u) = -u^2/2 + O(u^3) as u \to 0, leading to \log \phi_n(t) = n \log \psi(t / \sqrt{n}) = -t^2/2 + O(\rho |t|^3 / \sqrt{n}), and thus the difference in characteristic functions is of order O(|t|^3 / \sqrt{n}) after exponentiation.[17]

For the high-frequency tail (|t| > T), the integral is bounded using the decay of the characteristic functions. Under the third-moment assumption, |\psi(u)| \leq 1 - c u^2 for small u, and for larger |t| the tail is controlled by estimating |\phi_n(t)| \leq e^{-c t^2 / 2} or similar Gaussian-like decay, combined with the denominator |t| to yield a negligible bound after integration. Choosing T proportional to \sqrt{n} / \rho optimizes the overall error to O(\rho / \sqrt{n}).[17]

The singularity at t = 0 in the integrand and the potential discontinuities in the CDFs are addressed through smoothing techniques. A common refinement, due to Esseen, involves convolving the distributions with a smooth kernel whose characteristic function is supported on [-T, T], such as a triangular kernel with Fourier transform (1 - |t|/T)^+, to replace the sharp cutoff and to bound the original error by the smoothed error plus a term of order 1/T. This smoothing inequality facilitates the truncation and ensures the proof yields a uniform bound.[17]
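The low-frequency estimate can be checked numerically. The sketch below uses illustrative choices: centered Exp(1) summands, for which \psi(t) = e^{-it}/(1 - it), \sigma = 1, and \rho = 12/e - 2, with n = 100; it evaluates |\phi_n(t) - e^{-t^2/2}| against \rho |t|^3 / \sqrt{n} on a grid of t values and prints the largest ratio, which should stay below a modest constant.

```python
import numpy as np

rho = 12 / np.e - 2                        # E|X - 1|^3 for X ~ Exp(1); sigma = 1

def psi_centered_exp(t):
    """Characteristic function of X - 1 for X ~ Exp(1)."""
    return np.exp(-1j * t) / (1 - 1j * t)

n = 100
t = np.linspace(0.1, 3.0, 30)
phi_n = psi_centered_exp(t / np.sqrt(n)) ** n   # cf of the standardized sum S_n / sqrt(n)
gauss = np.exp(-t**2 / 2)                       # cf of the standard normal

# Ratio of the actual cf difference to the claimed majorant rho |t|^3 / sqrt(n).
ratio = np.abs(phi_n - gauss) / (rho * np.abs(t)**3 / np.sqrt(n))
print(f"max |phi_n(t) - exp(-t^2/2)| / (rho |t|^3 / sqrt(n)) = {ratio.max():.3f}")
```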
Modern Approaches
Modern approaches to proving Berry–Esseen-type bounds have shifted from the classical characteristic function methods toward probabilistic techniques that offer simpler derivations, explicit constants, and greater flexibility for dependent variables.[18] These methods, developed primarily in the late 20th and early 21st centuries, leverage tools like Stein's method to characterize normal approximations through differential equations and couplings, enabling bounds that depend on moments such as the third absolute moment of the summands.[18]

Stein's method provides a framework for normal approximation by solving the Stein equation f'(x) - x f(x) = h(x) - \mathbb{E}[h(Z)], where Z \sim \mathcal{N}(0,1) and h is a test function with bounded derivatives.[18] The approximation error for a variable W is then bounded through the quantity |\mathbb{E}[f'(W) - W f(W)]|, which the structure of W (for example, as a sum of independent terms) makes tractable. For sums of independent random variables with zero mean and unit variance, this yields a Berry–Esseen bound of order n^{-1/2} involving the sum of third moments, d_W(W, Z) \leq C \frac{1}{\sigma^3} \sum \mathbb{E}[|X_i|^3], with C an explicit universal constant around 0.56.[18] The method's strength lies in its ability to handle non-smooth test functions and extensions to metrics like the Kolmogorov distance via smoothing.[18]

The zero-bias transformation complements Stein's method by constructing an auxiliary random variable W^z such that \mathbb{E}[W f(W)] = \sigma^2 \mathbb{E}[f'(W^z)] for smooth f, effectively "biasing" the distribution toward the normal.[19] This allows coupling (W, W^z) on a common probability space, where the approximation error is controlled by \mathbb{E}[|W^z - W|], leading to bounds like d_W(W, Z) \leq 2 \mathbb{E}[|W^z - W|].[18] For independent summands with finite third moments, explicit constructions yield rates of O(1/n) for smooth functionals, improving on the classical 1/\sqrt{n} under additional symmetry or moment conditions.[19] This transform is particularly advantageous for deriving sharper constants without Fourier analysis, as demonstrated in applications to sampling without replacement.[19]

Dependency graphs extend these techniques to weakly dependent settings by modeling local interactions via a graph where vertices represent summands and edges denote dependence neighborhoods of size D.[18] In Stein's framework, the approximation error for W = \sum X_i / \sigma satisfies d_W(W, Z) \leq C D^2 \frac{\sum \mathbb{E}[|X_i|^3]}{\sigma^3} + C' \sqrt{D / \sigma^2} \sqrt{\sum \mathbb{E}[X_i^4]}, capturing how sparsity in the graph (small D) mitigates global dependence.[18] For locally dependent data, such as sample quantiles from dependent sequences, this approach converts the problem to sums of indicators and applies Stein's local dependence results, achieving multivariate Berry–Esseen bounds of order (\log n / \sqrt{n}) D.[20] Overall, these methods simplify proofs by avoiding analytic inversion, facilitate extensions to non-i.i.d. and dependent cases, and often yield more tractable constants for practical use.[18]
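The zero-bias coupling admits a compact illustration. For a standardized sum of n Rademacher signs, the zero-bias version of a single sign is Uniform(-1, 1), and the replace-one coupling gives \mathbb{E}|W^z - W| = 1/\sqrt{n} exactly. The sketch below (assuming NumPy and SciPy; the value of n, the replication count, and the Rademacher choice are illustrative) verifies this and compares the resulting bound 2\mathbb{E}|W^z - W| with a Monte Carlo estimate of the Wasserstein distance.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
n, reps = 50, 200_000

# W = sum of n i.i.d. Rademacher signs, scaled to unit variance.
eps = rng.choice([-1.0, 1.0], size=(reps, n))
w = eps.sum(axis=1) / np.sqrt(n)

# Zero-bias coupling for equal-variance summands: pick a uniform index I and
# replace eps_I by its zero-bias version, which for a Rademacher sign is
# Uniform(-1, 1).  Then d_W(W, Z) <= 2 E|W^z - W| (see the bound above).
i = rng.integers(n, size=reps)
u = rng.uniform(-1.0, 1.0, size=reps)
wz = w + (u - eps[np.arange(reps), i]) / np.sqrt(n)

coupling_bound = 2 * np.abs(wz - w).mean()       # exact value is 2 / sqrt(n)
z = rng.standard_normal(reps)
print(f"2 E|W^z - W| ~ {coupling_bound:.4f}  (exact 2/sqrt(n) = {2 / np.sqrt(n):.4f})")
print(f"Monte Carlo d_W(W, Z) ~ {wasserstein_distance(w, z):.4f}")
```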
Bounds and Constants
Uniform Bounds
The uniform bound in the Berry–Esseen theorem provides a quantitative measure of the rate of convergence in the central limit theorem, expressed in the supremum norm over the real line. For independent random variables X_1, \dots, X_n with zero means, finite third absolute moments \rho_i = \mathbb{E}[|X_i|^3], and total variance \sigma_n^2 = \sum_{i=1}^n \mathrm{Var}(X_i), the theorem states that

\sup_{x \in \mathbb{R}} \left| F_n(x) - \Phi(x) \right| \leq C \frac{\sum_{i=1}^n \rho_i}{\sigma_n^3},

where F_n is the cumulative distribution function of the standardized sum S_n / \sigma_n, \Phi is the standard normal cumulative distribution function, and C is a universal constant independent of the distribution of the X_i (provided the moments exist). This bound holds across the independent identically distributed, non-identically distributed, and random index cases, with the constant C representing the key factor determining the bound's sharpness.[1]

The historical pursuit of optimal values for C has seen significant refinements since the theorem's inception. Andrew C. Berry's 1941 work implicitly established a bound with C \approx 1.88 for lattice distributions, while Carl-Gustav Esseen's 1942 generalization to non-lattice cases yielded C = 7.59. Subsequent improvements reduced this value markedly; for instance, P. van Beek obtained C < 0.7975 in 1972, and further advancements by I. S. Shiganov in 1986 gave C < 0.7655. The modern benchmark for the general case of non-identically distributed variables is C < 0.5583, achieved by Irina G. Shevtsova in 2013 through refined estimates involving characteristic functions. For identically distributed variables, sharper constants are available, such as C < 0.4690 by Shevtsova (2013).[11][21]

Regarding optimality, Esseen established in 1956 a lower bound of C \geq \frac{\sqrt{10} + 3}{6 \sqrt{2\pi}} \approx 0.4097, demonstrating that no smaller universal constant suffices for all distributions. This bound arises from constructing specific examples where the supremum deviation approaches this value asymptotically. While the exact value of C remains unknown, it is conjectured that the optimal constant equals this Esseen lower bound, toward which the upper bounds continue to narrow.[22]

In the multidimensional setting, the uniform bound extends to the supremum over \mathbb{R}^d, but the constant C_d now depends on the dimension d and grows with it, reflecting increased complexity in higher dimensions. For example, Victor Bentkus showed in 2003 that C_d \lesssim d^{1/4}, though subsequent works have improved the dependence to sublinear rates under additional moment conditions. This dimension-dependent growth underscores the challenges in multivariate approximations compared to the one-dimensional case.[23]
Non-Uniform Bounds
Non-uniform bounds in the Berry–Esseen theorem offer pointwise estimates for the difference between the cumulative distribution function F_n(x) of the standardized sum and the standard normal cumulative distribution function \Phi(x), where the error term decreases as |x| increases, providing tighter control in the tails compared to uniform bounds. These bounds are particularly valuable for analyzing moderate and large deviation probabilities, where the uniform supremum bound can be overly conservative, as the error diminishes polynomially with |x| while incorporating moment conditions on the underlying random variables.

A classical non-uniform bound, derived under finite third-moment assumptions, takes the form

|F_n(x) - \Phi(x)| \leq \frac{C \rho_n}{1 + |x|^3},

where \rho_n = \sum_{i=1}^n \mathbb{E}[|X_i - \mu_i|^3] / \sigma_n^3, \sigma_n^2 = \sum_{i=1}^n \mathrm{Var}(X_i), and C > 0 is an absolute constant. This inequality integrates the third-moment Lyapunov coefficient \rho_n with a tail factor 1 + |x|^3 in the denominator, ensuring the approximation improves for large |x|. The result originates from Nagaev's work on limit theorems for sums of independent random variables, establishing that the supremum over x of (1 + |x|^3) |F_n(x) - \Phi(x)| is bounded by a constant times \rho_n. Such bounds are sharper than uniform versions for |x| \gtrsim n^{1/6}, making them suitable for scenarios involving tail events in statistical inference.[24]

Petrov extended these ideas to a more general non-uniform framework applicable when a uniform bound \Delta = \sup_x |F(x) - \Phi(x)| is available and small (specifically, 0 < \Delta \leq e^{-1/2}), assuming finite p-th absolute moments for some p > 0. The bound states

|F(x) - \Phi(x)| \leq c(p) \Delta \left( \log \frac{1}{\Delta} \right)^{p/2} + \frac{\lambda_p}{1 + |x|^p},

where c(p) is a positive constant depending only on p, and \lambda_p = \left| \int_{-\infty}^{\infty} |y|^p \, dF(y) - \int_{-\infty}^{\infty} |y|^p \, d\Phi(y) \right| measures the difference in p-th moments between F and \Phi. For p = 1, the denominator simplifies to 1 + |x|, yielding a linear decay in the tails suitable for distributions with lighter moment conditions. This form leverages the uniform error \Delta to control the central region while the moment-difference term dominates in the tails, offering flexibility for non-identically distributed variables.[25]

These non-uniform estimates are especially effective in moderate deviation regimes, where |x| grows like \sqrt{\log n} but remains o(n^{1/6}), as the tail factor reduces the error below the O(1/\sqrt{n}) rate of uniform bounds without requiring higher moments.[26]
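The tail advantage is easy to see numerically. In the sketch below, the uniform and Nagaev-type bounds are compared at several x values using an illustrative ratio \rho_n and, for simplicity, the same hypothetical constant C for both; in practice the constant in the non-uniform bound is larger than its uniform counterpart.

```python
import numpy as np

rho_n, C = 0.25, 1.0      # illustrative Lyapunov ratio and hypothetical shared constant
x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])

uniform = np.full_like(x, C * rho_n)             # uniform Berry-Esseen bound
nonuniform = C * rho_n / (1 + np.abs(x) ** 3)    # Nagaev-type pointwise bound

# The non-uniform bound matches the uniform one at x = 0 but decays like |x|^-3.
for xi, u, nu in zip(x, uniform, nonuniform):
    print(f"x = {xi:4.1f}:  uniform {u:.4f}   non-uniform {nu:.6f}")
```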
Applications

The Berry–Esseen theorem provides quantitative guarantees for the normal approximation in statistical inference, enabling the assessment of error rates in procedures that rely on the central limit theorem (CLT) for finite samples. By bounding the supremum distance between the cumulative distribution function (CDF) of a standardized sum and the standard normal CDF by C \rho / \sqrt{n}, where \rho = E[|X|^3] / \sigma^3 and C \approx 0.5 is a universal constant, it justifies the use of asymptotic methods even for moderate sample sizes when the third moment is finite.

In constructing confidence intervals, the theorem is particularly valuable for small to moderate n, such as in one-sample t-tests for the mean, where the normal approximation to the t-statistic's distribution can be validated. For example, the bound ensures that the probability of coverage deviation from the nominal level (e.g., 95%) is controlled by O(1/\sqrt{n}), allowing statisticians to evaluate when the approximation is sufficiently accurate without exact distribution knowledge. This non-asymptotic control is essential for applications in fields like economics and medicine, where sample sizes may not be large enough for the plain CLT to suffice reliably.

For hypothesis testing, the Berry–Esseen bound quantifies the approximation error in test statistics under the null, such as z-tests for means, by limiting the difference in rejection probabilities compared to exact normal tests. Improved variants, like Berry–Esseen–Chebyshev bounds, extend this to signed-rank tests and other non-parametric procedures, providing computable error terms for p-value accuracy and power assessment in finite samples.[27]

The theorem also enhances bootstrap methods by quantifying the error between the empirical resampling distribution and the true sampling distribution, crucial for bootstrap confidence intervals and percentile methods. In bootstrap procedures for sums of independent variables, higher-order Berry–Esseen inequalities yield uniform bounds on the CDF difference when higher moments are finite; this ensures bootstrap consistency and error rates for quantile estimation in high dimensions when the effective dimension p = o(n).

As an example of higher-order refinement, the Berry–Esseen bound serves as the remainder term in Edgeworth expansions, improving accuracy beyond the CLT by incorporating skewness via corrections like P(\hat{S}_n \leq t) \approx \Phi(t) + \frac{\gamma_3}{6\sqrt{n}} (1 - t^2) \phi(t), where \gamma_3 is the standardized third cumulant and the bound controls the O(1/\sqrt{n}) error for the first-order expansion under suitable regularity (e.g., non-lattice) conditions. This is useful in inference for skewed distributions to derive more precise intervals or tests.

A key limitation is the requirement for finite third absolute moments, E[|X|^3] < \infty, which fails for heavy-tailed distributions (e.g., Pareto with shape parameter below 3), rendering the bound inapplicable and leading to slower CLT convergence. In such cases, alternatives include the Lyapunov CLT with adjustable moment orders or approximations via stable distributions.
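As a practical illustration of this error control, the following sketch inverts the i.i.d. bound to find the smallest sample size at which the Berry–Esseen guarantee meets a target CDF error. The helper name min_n_for_error, the Bernoulli(0.1) example, and the 0.01 tolerance are illustrative choices; C = 0.469 is the i.i.d. constant cited earlier in this article.

```python
import numpy as np

def min_n_for_error(rho, sigma, eps, C=0.469):
    """Smallest n with C * rho / (sigma^3 * sqrt(n)) <= eps, i.e. the sample
    size at which the Berry-Esseen guarantee meets a target CDF error.
    rho and sigma describe one summand; C is the i.i.d. constant cited above."""
    return int(np.ceil((C * rho / (sigma**3 * eps)) ** 2))

# Example: Bernoulli(0.1) summands and a 0.01 target error on the CDF.
p = 0.1
sigma = np.sqrt(p * (1 - p))
rho = p * (1 - p) * (p**2 + (1 - p)**2)   # E|X - p|^3 for a Bernoulli(p) summand
print(min_n_for_error(rho, sigma, eps=0.01))
```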
Dependent Variables and Improvements
The Berry–Esseen theorem extends to dependent random variables under mixing conditions that quantify the decay of dependence, such as α-mixing (strong mixing) and β-mixing (absolute regularity). For stationary α-mixing sequences with finite third moments and a mixing rate α(n) = O(n^{-θ}) for some θ > 1, bounds of order O(1/√n) on the Kolmogorov distance to the normal distribution have been established, matching the independent-case rate up to logarithmic factors.[28] Similarly, for β-mixing sequences, results yield uniform bounds of O(1/√n) under comparable moment and mixing decay assumptions, often via Stein's method or characteristic function approximations tailored to the dependence structure.[29]

Recent advancements in the 2020s have refined constants in these bounds for dependent settings. Emmanuel Rio and collaborators improved Berry–Esseen rates for ρ-mixing sequences (a dependence measure based on maximal correlations, stronger than α-mixing) by deriving explicit constants that depend optimally on the mixing coefficients, achieving sharper O(1/√n) bounds without excessive logarithmic penalties.[30] These improvements have found applications in machine learning, particularly for high-dimensional dependent data in graphical models, where Berry–Esseen bounds facilitate Gaussian approximations for precision matrix estimation under cluster-based dependence, enabling reliable inference in sparse high-dimensional regimes.[31] Further progress includes improved bounds for strong mixing sequences in 2023, estimates for random variables with sparse dependency graphs in 2024, and non-uniform theorems for weakly dependent variables in 2025.[32][33][34]

Extensions to non-stationary settings address triangular arrays of dependent variables, such as martingale differences with varying distributions. For non-stationary ρ-mixing triangular arrays, Dedecker, Merlevède, and Rio established Berry–Esseen bounds of order O(1/√n) under uniform moment conditions and controlled mixing rates across rows, accommodating evolving dependence structures common in time series forecasting.[35]

Despite this progress, open problems persist, notably in attaining optimal constants and rates for strong mixing sequences beyond one dimension, where multidimensional dependence amplifies the challenge of bounding higher-order couplings; this remains an active area of research.[36]