
Lindeberg's condition

Lindeberg's condition is a fundamental criterion in probability theory that guarantees the applicability of the central limit theorem to sums of independent but not necessarily identically distributed random variables with finite variances. Formally, for a triangular array of row-wise independent, zero-mean random variables X_{n,k}, k = 1, \dots, r_n, with variances \sigma_{n,k}^2 = \mathbb{E}[X_{n,k}^2] and row sum of variances s_n^2 = \sum_{k=1}^{r_n} \sigma_{n,k}^2 \to \infty, the condition requires that for every \varepsilon > 0, \lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{r_n} \mathbb{E}\left[ X_{n,k}^2 \mathbf{1}_{\{|X_{n,k}| > \varepsilon s_n\}} \right] = 0. This ensures that no single term dominates the sum, allowing the normalized sum S_n / s_n, where S_n = \sum_{k=1}^{r_n} X_{n,k}, to converge in distribution to a standard normal random variable. Introduced by Finnish mathematician Jarl Waldemar Lindeberg in his 1922 paper, the condition generalized earlier results by Lyapunov and marked a significant advance in understanding asymptotic normality under heterogeneous distributions. Its importance was further solidified by William Feller in 1937, who demonstrated its necessity (along with the uniform asymptotic negligibility condition) for the central limit theorem in the context of independent random variables, forming what is now known as the Lindeberg–Feller theorem. This theorem underpins much of modern asymptotic statistics, particularly in scenarios involving heterogeneous data where variables exhibit varying scales and tails. Extensions of Lindeberg's condition have since been developed for dependent processes, infinite variances, and non-Euclidean spaces, broadening its utility in advanced probabilistic modeling.

Background Concepts

Central Limit Theorem for Identical Distributions

The classical central limit theorem (CLT) addresses the asymptotic distribution of sums of independent and identically distributed (i.i.d.) random variables, providing a foundational result in probability theory. Specifically, consider i.i.d. random variables X_1, X_2, \dots, X_n, each with finite mean \mu and positive finite variance \sigma^2 > 0. The standardized sum \frac{1}{\sqrt{n} \sigma} \sum_{i=1}^n (X_i - \mu) converges in distribution to a standard normal random variable N(0,1) as n \to \infty. This convergence implies that, for large n, the distribution of the sample mean is approximately normal, regardless of the underlying distribution of the X_i, as long as the second moments are finite. The origins of this theorem trace back to Pierre-Simon Laplace's work around 1810, where he approximated the distribution of sums of i.i.d. variables, extending de Moivre's earlier normal approximation to the binomial distribution. The result was rigorously established by Aleksandr Lyapunov in 1901, who proved convergence under the assumption of finite third moments using characteristic functions. Jarl Waldemar Lindeberg contributed in 1920 by refining the conditions to finite second moments for the i.i.d. case, demonstrating that the third moment is not necessary. Formally known as the Lindeberg–Lévy theorem in its i.i.d. form, the result states that if S_n = \sum_{i=1}^n X_i and S_n^* = \frac{S_n - n\mu}{\sqrt{n} \sigma}, then S_n^* \xrightarrow{d} N(0,1) as n \to \infty, provided the variables are i.i.d. with finite variance. This highlights the universality of the normal distribution for sums of identically distributed variables with finite second moments; however, when the variables are independent but not identically distributed, additional conditions, such as those involving triangular arrays, are required to ensure similar asymptotic normality, motivating generalizations like Lindeberg's condition.
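As a quick empirical illustration of this convergence, the following minimal sketch (assuming NumPy is available; the Exponential(1) distribution and the sample sizes are arbitrary illustrative choices, not part of the theorem) simulates the standardized sum and compares its moments and a tail probability with those of N(0,1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: i.i.d. Exponential(1) variables, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n, trials = 1000, 10_000

# Draw `trials` independent copies of S_n* = (S_n - n*mu) / (sqrt(n)*sigma).
samples = rng.exponential(scale=1.0, size=(trials, n))
s_star = (samples.sum(axis=1) - n * mu) / (np.sqrt(n) * sigma)

# Compare empirical summaries with the standard normal's.
print("mean (should be near 0):", s_star.mean())
print("variance (should be near 1):", s_star.var())
print("P(S_n* <= 1.96) (should be near 0.975):", (s_star <= 1.96).mean())
```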

Triangular Arrays of Random Variables

In probability theory, particularly in the study of limit theorems for sums of random variables, a triangular array refers to a double-indexed collection of random variables \{X_{n,k}: 1 \leq k \leq k_n, n \geq 1\}, where the nth row consists of k_n random variables X_{n,1}, \dots, X_{n,k_n}, and k_n is typically non-decreasing in n. The primary objects of interest are the row sums S_n = \sum_{k=1}^{k_n} X_{n,k}, whose asymptotic behavior is examined as n \to \infty. This framework accommodates scenarios where the number of summands grows and the distributions may vary across rows, providing a flexible structure for analyzing non-stationary sequences. Normalization in this setup involves centering each random variable by its expectation, denoted E[X_{n,k}] = \mu_{n,k}, yielding the centered sum S_n - E[S_n] = \sum_{k=1}^{k_n} (X_{n,k} - \mu_{n,k}). The scaling factor is the square root of the total row variance, s_n^2 = \sum_{k=1}^{k_n} \mathrm{Var}(X_{n,k}), which must satisfy s_n^2 \to \infty as n \to \infty to ensure the sums exhibit non-degenerate limiting behavior. The standardized row sum is then given by S_n^* = \frac{S_n - E[S_n]}{s_n}. This standardization facilitates the study of convergence properties by reducing the sums to a common scale. Key assumptions for applying central limit theorems within triangular arrays include row-wise independence of the X_{n,k}, finite second moments for each variable (ensuring \mathrm{Var}(X_{n,k}) < \infty), and the divergence of the total variance s_n^2 \to \infty. These conditions establish a foundational probabilistic structure for sums of independent but potentially non-identically distributed variables. As a special case, the classical central limit theorem for identically distributed variables arises when all rows replicate the same distribution. To illustrate array construction for non-i.i.d. cases, consider building rows where variances differ systematically; for example, early rows might feature a few terms with moderate variances, while later rows include many terms with shrinking individual variances (e.g., on the order of 1/n), so that s_n^2 still grows with n (e.g., on the order of n when k_n is of order n^2), allowing the array to model scenarios like weighted averages or heterogeneous noise processes while preserving independence and finite moments, as in the sketch below.
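A minimal sketch of such a construction, assuming NumPy and a hypothetical variance schedule with k_n = n^2 terms of variance 1/n each (so that s_n^2 = n grows with n):

```python
import numpy as np

def triangular_row(n, rng):
    """Row n of an illustrative triangular array: k_n = n**2 independent
    mean-zero normal terms, each with variance 1/n, so the exact row
    variance is s_n^2 = k_n * (1/n) = n, which diverges as n grows."""
    k_n = n ** 2
    return rng.normal(loc=0.0, scale=np.sqrt(1.0 / n), size=k_n)

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    row = triangular_row(n, rng)
    s_n2 = len(row) * (1.0 / n)  # exact total variance of the row
    print(f"n={n}: k_n={len(row)}, s_n^2={s_n2:.1f}, row sum={row.sum():.2f}")
```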

Formal Definition

Setup and Assumptions

Lindeberg's condition is defined in the context of a triangular array of random variables \{X_{n,k}\}_{1 \leq k \leq k_n,\, n \geq 1}, where k_n is a sequence of positive integers that may increase with n. For each fixed n, the random variables X_{n,1}, \dots, X_{n,k_n} in the nth row are independent, each with zero mean \mathbb{E}[X_{n,k}] = 0 (which holds without loss of generality by centering the original variables if necessary), and finite variance \sigma_{n,k}^2 = \mathrm{Var}(X_{n,k}) < \infty. The total variance of the nth row sum is given by s_n^2 = \sum_{k=1}^{k_n} \sigma_{n,k}^2, and a key assumption is that s_n^2 \to \infty as n \to \infty, ensuring the scale of the sums grows sufficiently large for asymptotic normality to emerge. The object of interest is the normalized row sum S_n^* = s_n^{-1} \sum_{k=1}^{k_n} X_{n,k}, which standardizes the sum to have unit variance. No uniform boundedness on the magnitudes of the X_{n,k} is imposed across the array, allowing for potentially heavy-tailed distributions within rows as long as second moments remain finite for each term; moreover, k_n is permitted to grow arbitrarily with n, accommodating scenarios where the number of summands increases indefinitely. This flexible setup contrasts with stricter i.i.d. frameworks by enabling heterogeneous variances and distributions row-wise. The foundational formulation of this probabilistic structure originates from Jarl Waldemar Lindeberg's 1922 paper, which extended Aleksandr Lyapunov's 1901 central limit theorem by replacing its higher-moment requirement with a condition involving only second moments, suitable for non-identical summands.

The Lindeberg Condition

Lindeberg's condition applies to triangular arrays of independent random variables X_{n,k}, 1 \leq k \leq k_n, n \geq 1, each with mean zero and finite variance, where the row sums S_n = \sum_{k=1}^{k_n} X_{n,k} have variance s_n^2 = \sum_{k=1}^{k_n} \mathrm{Var}(X_{n,k}) satisfying s_n^2 \to \infty as n \to \infty. The condition is formally stated as follows: for every \epsilon > 0, \lim_{n \to \infty} \sum_{k=1}^{k_n} \mathbb{E}\left[ \frac{X_{n,k}^2}{s_n^2} \mathbf{1}_{\{|X_{n,k}| > \epsilon s_n\}} \right] = 0, where \mathbf{1}_A denotes the indicator function of the event A. In interpretive terms, this requires that the expected contribution to the normalized variance from large deviations—specifically, those where an individual |X_{n,k}| exceeds \epsilon times the total scale s_n—vanishes in the limit as n \to \infty. This condition was originally introduced by Jarl Waldemar Lindeberg in his 1922 paper. Equivalent formulations of the condition include one in terms of truncated variances: for every \epsilon > 0, \lim_{n \to \infty} \sum_{k=1}^{k_n} \mathbb{E}\left[ \frac{X_{n,k}^2}{s_n^2} \mathbf{1}_{\{|X_{n,k}| \leq \epsilon s_n\}} \right] = 1, which follows since the total normalized variance sums to 1 and the tail contributions approach zero. Another perspective views the array as uniformly asymptotically negligible, ensuring that the influence of any single term becomes negligible relative to the aggregate. A key property of the condition is that it implies uniform asymptotic negligibility of the individual terms, meaning \max_{1 \leq k \leq k_n} P(|X_{n,k}| > \epsilon s_n) \to 0 as n \to \infty for every \epsilon > 0.
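The condition can be checked numerically for concrete arrays. The following is a minimal Monte Carlo sketch, not a definitive implementation; the helper lindeberg_sum and its sampler interface are hypothetical conveniences, and the row of i.i.d. normal terms is an illustrative choice for which the sum should be small.

```python
import numpy as np

def lindeberg_sum(samplers, eps, n_mc=100_000, rng=None):
    """Monte Carlo estimate of the Lindeberg sum
        (1/s_n^2) * sum_k E[ X_{n,k}^2 * 1{|X_{n,k}| > eps * s_n} ]
    for one row of a triangular array.  `samplers` is a list of functions,
    one per term, each returning `n_mc` mean-zero draws of X_{n,k}."""
    rng = rng or np.random.default_rng(0)
    draws = [sample(n_mc, rng) for sample in samplers]
    s_n2 = sum(d.var() for d in draws)  # estimated total row variance
    s_n = np.sqrt(s_n2)
    tail = sum((d ** 2 * (np.abs(d) > eps * s_n)).mean() for d in draws)
    return tail / s_n2

# Example row: n = 50 i.i.d. N(0,1) terms, so s_n = sqrt(50) and the
# truncation threshold eps * s_n is already large; the sum is near 0.
n = 50
samplers = [lambda m, rng: rng.normal(size=m)] * n
print(lindeberg_sum(samplers, eps=0.5))
```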

Key Theorems

Lindeberg's Central Limit Theorem

Lindeberg's central limit theorem asserts that, for a triangular array of row-wise independent random variables \{X_{n,k}: 1 \leq k \leq k_n\} satisfying the standard setup assumptions (including centered variables with \sum_{k=1}^{k_n} \mathrm{Var}(X_{n,k}) = 1), if the array additionally satisfies Lindeberg's condition, then the normalized row sums S_n^* = \sum_{k=1}^{k_n} X_{n,k} converge in distribution to the standard normal distribution: S_n^* \xrightarrow{d} N(0,1) as n \to \infty. This was proved by Jarl Waldemar Lindeberg in his 1922 paper, where he established the sufficiency of the condition for asymptotic normality in the non-identically distributed case. The result marked a significant generalization of earlier central limit theorems, such as Lyapunov's 1901 version, by relaxing the need for uniform moment bounds beyond the second moment. While later extensions addressed dependent variables (e.g., via martingale methods), the original theorem applies specifically to arrays with independent entries within each row. The theorem's applicability hinges solely on the existence of second moments and the fulfillment of Lindeberg's condition, without requiring higher-order moments or identical distributions across the array. This minimal assumption structure makes it a cornerstone for proving central limit theorems in diverse settings, such as triangular arrays arising in statistical applications. A standard proof relies on characteristic functions, showing that the characteristic function of S_n^* converges to that of the standard normal. Specifically, \mathbb{E}[e^{it S_n^*}] \to e^{-t^2/2} as n \to \infty, for all t \in \mathbb{R}. The argument proceeds by expressing the logarithm of the characteristic function and approximating it via Taylor expansion around zero; the Lindeberg condition controls the truncation error and remainder terms, ensuring the approximation \log \mathbb{E}[e^{it S_n^*}] \approx -\frac{t^2}{2} holds uniformly.
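This convergence of characteristic functions can be observed empirically. Below is a small simulation sketch (assuming NumPy; the row of centered Exponential(1) terms and the values of t are illustrative choices) that estimates \mathbb{E}[e^{it S_n^*}] by Monte Carlo and compares it with e^{-t^2/2}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative row: n i.i.d. centered Exponential(1) terms, each with
# variance 1, normalized by s_n = sqrt(n) so the row variance is 1.
n, trials = 100, 50_000
x = rng.exponential(size=(trials, n)) - 1.0   # mean-zero terms
s_star = x.sum(axis=1) / np.sqrt(n)

for t in (0.5, 1.0, 2.0):
    phi_hat = np.exp(1j * t * s_star).mean()  # empirical E[exp(it S_n*)]
    target = np.exp(-t ** 2 / 2)              # standard normal CF
    print(f"t={t}: |phi_hat - exp(-t^2/2)| = {abs(phi_hat - target):.4f}")
```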

Feller's Converse Theorem

Feller's converse theorem establishes the necessity of Lindeberg's condition for the central limit theorem to hold under certain uniformity assumptions. Specifically, consider a triangular array of independent random variables \{X_{n,k}: 1 \leq k \leq k_n, n \geq 1\} with zero means and finite variances \sigma_{n,k}^2, where the row sums are normalized as S_n = \sum_{k=1}^{k_n} X_{n,k} and s_n^2 = \sum_{k=1}^{k_n} \sigma_{n,k}^2 \to \infty, so that S_n^* = S_n / s_n. If S_n^* \xrightarrow{d} N(0,1) and the array satisfies uniform asymptotic negligibility, given by \max_{1 \leq k \leq k_n} \frac{\sigma_{n,k}^2}{s_n^2} \to 0, then the Lindeberg condition must hold. This uniformity condition, often called the Feller condition, prevents any single term's variance from dominating the total variance s_n^2, ensuring a balanced contribution across the array elements; it is weaker than requiring identical distributions but essential for the converse to hold. The proof proceeds by contradiction, working with the characteristic functions of the partial sums and showing that failure of the Lindeberg condition implies tail contributions large enough to disrupt convergence to the normal distribution. Proved by William Feller in 1937 in his paper "Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II," this result completes the Lindeberg–Feller characterization, demonstrating that under the uniformity assumption, Lindeberg's condition is both necessary and sufficient for asymptotic normality of S_n^*. Together with Lindeberg's sufficiency theorem, it provides an if-and-only-if criterion for the central limit theorem in the setting of triangular arrays of independent random variables with finite variances.
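The Feller condition itself is straightforward to evaluate for a given variance profile. The sketch below (the helper feller_ratio is a hypothetical convenience) contrasts a balanced array, where the ratio vanishes, with an array whose first term's variance grows with n, where it does not.

```python
import numpy as np

def feller_ratio(variances):
    """Feller (uniform asymptotic negligibility) ratio for one row:
    max_k sigma_{n,k}^2 / s_n^2, which must tend to 0 as n grows."""
    v = np.asarray(variances, dtype=float)
    return v.max() / v.sum()

# Balanced array: n unit-variance terms -> ratio 1/n -> 0 (Feller holds).
print([round(feller_ratio(np.ones(n)), 4) for n in (10, 100, 1000)])

# Dominated array: one term of variance n alongside n unit-variance terms
# -> ratio stays at n/(2n) = 1/2 (Feller fails).
print([round(feller_ratio([n] + [1.0] * n), 4) for n in (10, 100, 1000)])
```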

Interpretation and Examples

Intuitive Explanation

Lindeberg's condition serves as a pivotal assumption in extending the central limit theorem to sums of independent but non-identically distributed random variables, intuitively preventing any individual term from exerting a dominant influence on the overall distribution. By ensuring that outliers or variables with potentially heavy tails do not overwhelm the sum, the condition guarantees that the normalized sum behaves asymptotically like a normal random variable, arising from the collective effect of numerous small, balanced contributions rather than erratic large swings. This negligibility of extremes mimics the stabilizing role of averaging in simpler scenarios, where the law of large numbers smooths out irregularities. In contrast to the identical distribution case, where each term shares equally in the variance, non-i.i.d. settings allow varying variances across terms; Lindeberg's condition replicates this equal-share intuition by demanding that no single variance dominate the total, thus preserving the path to normality even when distributions differ. It enforces asymptotic negligibility, such that each normalized term X_{n,k}/s_n \to 0 in probability, while strengthening this through tail control: large deviations must be sufficiently improbable that their variance contribution vanishes relative to the aggregate as the array grows. This tail-focused mechanism ensures that rare extreme events fade in impact, allowing the sum's distribution to converge reliably to the Gaussian without distortion from heaviness in individual tails. Relative to the Lyapunov condition, which imposes stricter uniform bounds on higher moments, such as the third, to verify negligibility, Lindeberg's formulation is more permissive and general, avoiding such moment assumptions in favor of directly assessing the insignificance of large values across the entire array. Both conditions promote the same core principle of distributed contributions, but Lindeberg's broader applicability suits heterogeneous data where moments may not be uniformly controlled, facilitating normal approximations in diverse probabilistic models. The efficacy stems from this tail dilution, which parallels the control of non-normal distortion that underlies quantitative convergence rates such as Berry–Esseen bounds.

Illustrative Example

To illustrate the practical verification of Lindeberg's condition, consider a triangular array of independent mean-zero random variables designed to test the role of tails in asymptotic normality. In one case, the array consists of terms with light tails, demonstrating when the condition holds; in another, a term with heavier tails shows when it fails. These examples allow explicit computation of the key sum in the condition using known distribution properties.

Satisfying Case

Consider the triangular array where X_{n,k} = Y_k / \sqrt{n} for k = 1, \dots, n, with the Y_k i.i.d. standard normal N(0,1). The total variance is s_n^2 = \sum_{k=1}^n \mathrm{Var}(X_{n,k}) = 1. The Lindeberg condition requires that for every \varepsilon > 0, \frac{1}{s_n^2} \sum_{k=1}^n E\left[ X_{n,k}^2 \mathbf{1}_{\{|X_{n,k}| > \varepsilon\}} \right] = n E\left[ \frac{Y_1^2}{n} \mathbf{1}_{\{|Y_1| > \varepsilon \sqrt{n}\}} \right] = E\left[ Y_1^2 \mathbf{1}_{\{|Y_1| > \varepsilon \sqrt{n}\}} \right] \to 0 as n \to \infty, which follows from the dominated convergence theorem since E[Y_1^2] = 1 < \infty and the indicator converges to 0 almost surely. For a numerical check with \varepsilon = 0.1, the tail expectation for n = 100 (threshold a = 1) is approximately 0.80 using the formula 2[a \phi(a) + (1 - \Phi(a))] for the standard normal, and for n = 1000 (threshold a \approx 3.16) it reduces to approximately 0.02, confirming that it vanishes. In this setup, the standardized sum S_n^* = S_n / s_n converges in distribution to N(0,1), approximating the normal distribution well even for moderate n, as can be checked by simulation or moment matching.
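The numerical check above can be reproduced directly from the closed-form tail formula. A minimal sketch using only the Python standard library (the helper name normal_tail_second_moment is a hypothetical convenience):

```python
import math

def normal_tail_second_moment(a):
    """E[Y^2 1{|Y| > a}] for Y ~ N(0,1), via the closed form
    2 * [a * phi(a) + (1 - Phi(a))]."""
    phi = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))          # standard normal cdf
    return 2 * (a * phi + (1 - Phi))

eps = 0.1
for n in (100, 1000, 10_000):
    a = eps * math.sqrt(n)  # truncation threshold eps * sqrt(n)
    print(f"n={n}: E[Y^2 1(|Y| > {a:.2f})] = {normal_tail_second_moment(a):.4f}")
# Prints roughly 0.80 for n=100 and 0.02 for n=1000, vanishing as n grows.
```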

Failing Case

Now consider an array where most terms have light tails but one dominates with heavier tails: X_{n,1} \sim t(3)/\sqrt{3} (Student's t with 3 degrees of freedom, scaled to variance 1), and X_{n,k} \sim N(0, 1/n^2) for k = 2, \dots, n, so s_n^2 = 1 + (n-1)/n^2 \to 1. The Lindeberg sum for \varepsilon > 0 is dominated by the first term: \frac{1}{s_n^2} E\left[ X_{n,1}^2 \mathbf{1}_{\{|X_{n,1}| > \varepsilon s_n\}} \right] + o(1), which converges to \int_{|z| > \varepsilon} z^2 f_Z(z) \, dz > 0, a fixed positive value independent of n, so it does not tend to 0. Here Z = X_{n,1} has the scaled density f_Z(z) = \frac{2}{\pi} (1 + z^2)^{-2}, obtained from the t(3) density f(x) = \frac{2}{\sqrt{3}\pi} (1 + x^2/3)^{-2} by the change of variables z = x/\sqrt{3}. To verify, for \varepsilon = 1 the tail contribution is E\left[ Z^2 \mathbf{1}_{\{|Z| > 1\}} \right] = \frac{1}{2} + \frac{1}{\pi} \approx 0.82, computed via the antiderivative \int z^2 (1+z^2)^{-2} \, dz = \frac{1}{2}\arctan z - \frac{z}{2(1+z^2)}. In this case, S_n^* does not converge to a normal limit; instead it inherits the heavy tails of the t-distribution, which has infinite kurtosis (compared to 3 for the normal), leading to slower tail decay and failure of the normal approximation; since the remaining terms contribute total variance (n-1)/n^2 \to 0, the limiting distribution is that of the scaled t(3) random variable itself.
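The tail contribution can be confirmed by numerical integration against the closed form. A short sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.integrate import quad

# Density of Z = T / sqrt(3), where T ~ Student's t with 3 degrees of
# freedom: f_Z(z) = (2/pi) * (1 + z^2)^(-2), so that Var(Z) = 1.
def f_Z(z):
    return (2.0 / np.pi) * (1.0 + z * z) ** (-2)

# Tail contribution E[Z^2 1{|Z| > 1}]; by symmetry, double one side.
tail, _ = quad(lambda z: z * z * f_Z(z), 1.0, np.inf)
print("numeric    :", 2 * tail)
print("closed form:", 0.5 + 1.0 / np.pi)  # ~0.8183, independent of n
```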

References

  1. "The Central Limit Theorem" (lecture notes). UC Homepages. [PDF]
  2. Lindeberg, J. W. "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung." Mathematische Zeitschrift 15 (1922): 211–225.
  3. Feller, W. "Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. II." Mathematische Zeitschrift 42 (1937): 301–312.
  4. "Lindeberg's central limit theorem à la Hausdorff." ScienceDirect.
  5. "Central Limit Theorem." StatLect.
  6. "A History of the Central Limit Theorem." [PDF]
  7. Le Cam, L. "The Central Limit Theorem Around 1935." Statistical Science. [PDF]
  8. "Lecture 10: Setup for the Central Limit Theorem." [PDF]
  9. Tao, T. "275A, Notes 4: The central limit theorem." Blog post.
  10. "Lindeberg–Feller central limit theorem." MyWeb. [PDF]
  11. Lindeberg, J. W. "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung." EuDML entry for Mathematische Zeitschrift (1922).
  12. Bell, J. "The Lindeberg central limit theorem." May 29, 2015. [PDF]
  13. "A Probabilistic Proof of the Lindeberg–Feller Central Limit Theorem." [PDF]
  14. Lindeberg, J. W. "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung." Springer entry, Mathematische Zeitschrift 15 (1922): 211–225.
  15. "Characteristic functions and the central limit theorem." UBC Statistics. [PDF]
  16. "STAT 7032 Probability CLT part." UC Homepages. [PDF]