In statistics and probability theory, an asymptotic distribution, also known as a limiting distribution, is the hypothetical distribution that a sequence of random variables or estimators converges to as the sample size n approaches infinity.[1] This concept provides a large-sample approximation to the exact finite-sample distribution, enabling practical inference when exact distributions are intractable.[2]

Asymptotic distribution theory underpins key results such as the law of large numbers (LLN), which establishes that sample averages converge in probability (or almost surely under stronger conditions) to the population mean for independent and identically distributed (i.i.d.) random variables with finite expectation.[3] The central limit theorem (CLT) extends this by showing that, under suitable moment conditions such as finite variance, the standardized sample mean \sqrt{n}(\bar{X}_n - \mu)/\sigma converges in distribution to a standard normal N(0,1) distribution, justifying the use of normal approximations for confidence intervals and hypothesis tests in large samples.[1] Complementary tools include Slutsky's theorem, which preserves convergence in distribution under continuous transformations involving sequences converging in probability to constants, and the delta method, which derives the asymptotic normality of nonlinear functions of estimators via Taylor expansion.[2]

These principles are foundational in econometric and statistical modeling, facilitating the analysis of estimators such as ordinary least squares (OLS) under assumptions of consistency and asymptotic normality, even when data are not i.i.d., as in stratified or clustered sampling.[2] Notable applications span hypothesis testing, where test statistics achieve asymptotic distributions like chi-squared or t-distributions, and bootstrap methods that leverage asymptotic validity for resampling-based inference.[3] While powerful, asymptotic results require verification of conditions like the Lindeberg or Lyapunov conditions for CLTs in non-i.i.d. settings to ensure reliability.[2]
Fundamentals
Definition
In statistics, an asymptotic distribution refers to the limiting probability distribution that a sequence of random variables or statistics approaches as the sample size n tends to infinity.[1] This limiting behavior provides a theoretical framework for understanding the approximate distribution of estimators or test statistics in large samples, where finite-sample exact distributions may be intractable.[3]

Formally, consider a sequence of random variables X_n, such as sample means or test statistics derived from n observations. The asymptotic distribution is the probability distribution L of a random variable Z such that d\left( \mathcal{L}\!\left(\frac{X_n - a_n}{b_n}\right), L \right) \to 0 as n \to \infty for some metric d on the space of probability measures, where \mathcal{L}(\cdot) denotes the law of a random variable and a_n and b_n > 0 are deterministic centering and scaling sequences, respectively.[4] Equivalently, \frac{X_n - a_n}{b_n} \xrightarrow{d} Z, where \xrightarrow{d} denotes convergence in distribution (or weak convergence).[5] This convergence means that the cumulative distribution function of the normalized X_n converges pointwise to that of Z at all continuity points of the latter.[6]

Weak convergence underpins the asymptotic regime, differing from stronger modes such as almost sure convergence or convergence in probability in that it concerns only distributional limits, without requiring pathwise agreement.[7] Large sample sizes ensure that the finite-sample distribution of X_n is well approximated by this limiting distribution, enabling practical inference despite deviations in small samples.[1]

The concept traces its origins to Pierre-Simon Laplace's early 19th-century investigations into normal approximations for the binomial distribution, detailed in his 1812 treatise Théorie Analytique des Probabilités.[8] By the 20th century, Ronald A. Fisher and Jerzy Neyman extended asymptotic theory to broader statistical applications, including estimation and testing procedures.[9] The standard notation X_n \xrightarrow{d} X signifies that X_n converges in distribution to the random variable X.[6]

A canonical illustration is the central limit theorem, which establishes the asymptotic normality of standardized sample means of independent, identically distributed variables with finite variance.[3]
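The role of the centering and scaling sequences a_n and b_n is visible beyond the CLT in an extreme-value example: for i.i.d. Exponential(1) variables, the maximum M_n centered by a_n = \log n (with b_n = 1) converges in distribution to the standard Gumbel law. The following is a minimal simulation sketch (Python with numpy assumed; the sample sizes and evaluation points are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 4000

# Maximum of n i.i.d. Exp(1) variables, with centering a_n = log n and
# scaling b_n = 1: (M_n - log n) converges in distribution to the Gumbel law.
samples = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

# Compare the empirical CDF with the Gumbel CDF G(x) = exp(-exp(-x))
# at a few points; the gap should be small for large n.
xs = np.array([-1.0, 0.0, 1.0, 2.0])
ecdf = np.array([(samples <= x).mean() for x in xs])
gumbel = np.exp(-np.exp(-xs))
max_gap = float(np.abs(ecdf - gumbel).max())
```

Here the exact distribution of M_n - \log n is (1 - e^{-x}/n)^n, which is already extremely close to \exp(-e^{-x}) at n = 1000, so the residual gap is mostly Monte Carlo noise.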
Properties
Asymptotic distributions exhibit invariance under continuous transformations, a key property that facilitates their manipulation in probabilistic arguments. Specifically, if a sequence of random variables X_n converges in distribution to a limiting random variable X, denoted X_n \xrightarrow{d} X, and g is a continuous function, then g(X_n) \xrightarrow{d} g(X).[10] This result, known as the continuous mapping theorem, holds because continuity of g ensures that the probability measure induced by g(X_n) converges weakly to that of g(X) in the space of probability measures equipped with a suitable metric, such as the Prohorov metric. A proof sketch relies on the portmanteau theorem for weak convergence: for any closed set F, \limsup P(g(X_n) \in F) \leq \limsup P(X_n \in g^{-1}(F)) \leq P(X \in \overline{g^{-1}(F)}), and the right-hand side equals P(g(X) \in F) whenever X assigns probability zero to the discontinuity set of g.

Asymptotic distributions often preserve or approximate the moments of the limiting distribution under suitable regularity conditions, enabling the transfer of moment-based properties from finite samples to the limit. For instance, if X_n \xrightarrow{d} X and the sequence is uniformly integrable, then the means converge: E[X_n] \to E[X]; variances converge similarly under second-moment uniform integrability.[1] Cumulants, which combine additively for independent summands, likewise approximate those of the limit when higher-order moments exist, providing a tool for analyzing skewness and kurtosis in approximations. The Lindeberg-Feller conditions, which ensure asymptotic normality for sums of independent random variables with stabilizing variances, exemplify this by guaranteeing that the limiting normal distribution's variance matches the normalized sum of individual variances in the large-sample regime.[11]

The rate of convergence to an asymptotic distribution quantifies how quickly the distribution of X_n approaches that of X, often measured via the supremum distance between cumulative distribution functions (CDFs).
The Berry-Esseen theorem provides a uniform bound on this rate for sums of independent random variables converging to normality: if the third moments are finite, then \sup_x |F_n(x) - \Phi(x)| = O(1/\sqrt{n}), where F_n is the CDF of the standardized sum and \Phi is the standard normal CDF, with the constant depending on the third-moment ratio.[12] This O(1/\sqrt{n}) rate establishes the scale of the approximation error, informing the reliability of asymptotic inferences for finite samples.

When an asymptotic distribution exists, it is unique, a consequence of the theory of weak convergence in metric spaces. Weak convergence defines a topology on the space of probability measures, and under metrics such as the Prohorov metric (which metrizes weak convergence on separable metric spaces) or the Skorohod metric (for the Skorohod space of cadlag functions), the limit measure is uniquely determined by the convergent sequence.[13] This uniqueness ensures that different sequences converging to the same limit share an identical asymptotic distribution, underpinning the consistency of asymptotic results across varied probabilistic settings.
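The O(1/\sqrt{n}) rate can be checked numerically for a case where the exact finite-sample CDF is computable. The sketch below (standard-library Python; the choice p = 0.3 and the sample sizes are illustrative) evaluates the exact sup-distance between the CDF of a standardized Binomial(n, p) sum and \Phi, and verifies that \sqrt{n} times this gap stays roughly constant:

```python
import math

def phi_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_gap(n, p=0.3):
    """sup_x |F_n(x) - Phi(x)| for the standardized Binomial(n, p) sum.

    Against a step CDF, the supremum is attained at the jump points,
    so we check both the left and right limits of F_n at each jump.
    """
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        z = (k - mu) / sd
        pk = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        gap = max(gap, abs(cdf - phi_cdf(z)))  # just below the jump
        cdf += pk
        gap = max(gap, abs(cdf - phi_cdf(z)))  # just above the jump
    return gap

gaps = {n: sup_gap(n) for n in (25, 100, 400)}
# Berry-Esseen: the sup-gap is O(1/sqrt(n)), so sqrt(n) * gap is stable.
rates = {n: gaps[n] * math.sqrt(n) for n in gaps}
```

For a lattice distribution like the binomial, the discreteness of F_n forces the sup-distance to decay at exactly the 1/\sqrt{n} scale, so the rescaled gaps hover near a constant.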
Key Theorems
Central Limit Theorem
The Central Limit Theorem (CLT) asserts that if X_1, X_2, \dots, X_n are independent and identically distributed random variables with mean \mu and finite positive variance \sigma^2, then the standardized sample mean \frac{\sqrt{n} (\bar{X}_n - \mu)}{\sigma} converges in distribution to a standard normal random variable as n \to \infty.[14] This result establishes the asymptotic normality of sums of i.i.d. random variables under mild moment conditions, providing a foundational bridge to normal approximations in probability and statistics.[15]

A standard proof of the CLT relies on characteristic functions. The characteristic function of each X_i - \mu is \phi(t) = E[e^{it(X_i - \mu)}], and for the standardized sum it becomes [\phi(t / (\sigma \sqrt{n}))]^n. Under finite variance, \log \phi(t) = -\sigma^2 t^2 / 2 + o(t^2) near zero, so [\phi(t / (\sigma \sqrt{n}))]^n \to e^{-t^2 / 2} as n \to \infty, and the continuity theorem for characteristic functions confirms convergence to N(0,1).[15] Alternative proofs use moment-generating functions when they exist, yielding similar expansions.[14]

The CLT generalizes beyond i.i.d. cases.
The Lyapunov CLT applies to independent random variables X_1, \dots, X_n with means \mu_i and variances \sigma_i^2, where the total variance s_n^2 = \sum_{i=1}^n \sigma_i^2 \to \infty; it requires that for some \delta > 0,

\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n E[|X_i - \mu_i|^{2+\delta}] = 0.

Under this condition, \frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0,1).[16] The Lindeberg CLT further extends to triangular arrays \{X_{n,i}: 1 \leq i \leq k_n\} of independent, centered variables, where the row sums S_n = \sum_{i=1}^{k_n} X_{n,i} have variance s_n^2 and the Lindeberg condition holds: for every \epsilon > 0,

\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{k_n} E[X_{n,i}^2 \mathbf{1}_{|X_{n,i}| > \epsilon s_n}] = 0.

This implies S_n / s_n \xrightarrow{d} N(0,1), enabling applications to non-stationary sequences.[17]

Early examples illustrate the CLT's origins. The de Moivre–Laplace theorem of 1733 approximates the binomial distribution B(n,p) by the normal N(np, np(1-p)) for large n and fixed p \in (0,1), marking the first rigorous normal limit for sums of indicators.[18] This connects to Poisson limits: the binomial converges to a Poisson distribution with parameter \lambda = np as n \to \infty and p \to 0 with np fixed, and for large \lambda the Poisson itself admits a normal approximation via the CLT.[19]
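The de Moivre–Laplace approximation can be examined directly, since the binomial CDF is exactly computable. The following sketch (standard-library Python; p = 0.3 and the sample sizes are illustrative) measures the worst-case error of the normal approximation with the usual continuity correction and shows it shrinking as n grows:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf(n, p, k):
    # Exact CDF of Binomial(n, p) at k.
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def worst_case_error(n, p):
    """Max error of the N(np, np(1-p)) approximation to B(n, p),
    evaluated with the usual continuity correction at k + 1/2."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return max(
        abs(binom_cdf(n, p, k) - norm_cdf((k + 0.5 - mu) / sd))
        for k in range(n + 1)
    )

# The error shrinks as n grows, consistent with the normal limit.
errors = [worst_case_error(n, 0.3) for n in (10, 40, 160)]
```

With p away from 1/2 the residual error is dominated by the skewness term of order 1/\sqrt{n}, which is exactly what the Edgeworth expansion discussed later corrects.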
Delta Method
The delta method is a fundamental tool in asymptotic statistics for obtaining the limiting distribution of a smooth function of an asymptotically normal random variable, building on the normality established by the central limit theorem.[20] Suppose \sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2) for some estimator T_n of a parameter \theta, where \sigma^2 > 0. If g: \mathbb{R} \to \mathbb{R} is continuously differentiable at \theta with g'(\theta) \neq 0, then

\sqrt{n} \bigl( g(T_n) - g(\theta) \bigr) \xrightarrow{d} N \bigl( 0, [g'(\theta)]^2 \sigma^2 \bigr).[20]

The proof follows from a first-order Taylor expansion of g around \theta:

g(T_n) = g(\theta) + g'(\theta) (T_n - \theta) + R_n,

where the remainder satisfies R_n = o_p(|T_n - \theta|). Since T_n - \theta = O_p(n^{-1/2}), it follows that \sqrt{n} R_n = o_p(1), so \sqrt{n} (g(T_n) - g(\theta)) has the same asymptotic distribution as \sqrt{n} g'(\theta) (T_n - \theta), which is normal with the stated variance.[20]

This result extends to the multivariate case. Let \mathbf{T}_n \in \mathbb{R}^k satisfy \sqrt{n} (\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}), where \boldsymbol{\Sigma} is the k \times k asymptotic covariance matrix.
For a continuously differentiable function \mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m with Jacobian matrix \mathbf{J}(\boldsymbol{\theta}) = \nabla \mathbf{g}(\boldsymbol{\theta}) of full rank,

\sqrt{n} \bigl( \mathbf{g}(\mathbf{T}_n) - \mathbf{g}(\boldsymbol{\theta}) \bigr) \xrightarrow{d} N \bigl( \mathbf{0}, \mathbf{J}(\boldsymbol{\theta}) \boldsymbol{\Sigma} \mathbf{J}(\boldsymbol{\theta})^T \bigr).[20]

The proof again uses a multivariate Taylor expansion, with the remainder term vanishing asymptotically under the same order conditions.[20]

A classic example arises in deriving the asymptotic distribution of the sample variance for independent and identically distributed observations X_1, \dots, X_n with mean \mu, variance \sigma^2 > 0, and finite fourth moment. Writing M_n = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2, the unbiased sample variance S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 satisfies S_n^2 = M_n - (\bar{X}_n - \mu)^2 + o_p(n^{-1/2}), and since \sqrt{n}(\bar{X}_n - \mu)^2 = o_p(1), applying the central limit theorem to M_n yields \sqrt{n} (S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4), where \mu_4 = \mathbb{E}[(X_1 - \mu)^4]; the delta method applied to the first two sample moments gives the same result.[21]

Another illustrative application is the asymptotic normality of the log-odds estimator from a binomial proportion. Let \hat{p} = k/n be the sample proportion from n independent Bernoulli trials with success probability p \in (0,1), so \sqrt{n} (\hat{p} - p) \xrightarrow{d} N(0, p(1-p)). For the log-odds function g(\hat{p}) = \log(\hat{p}/(1 - \hat{p})), the derivative g'(p) = 1/(p(1-p)) yields

\sqrt{n} \bigl( g(\hat{p}) - \log\bigl(\tfrac{p}{1-p}\bigr) \bigr) \xrightarrow{d} N \bigl( 0, \tfrac{1}{p(1-p)} \bigr)

by the delta method, providing a basis for confidence intervals in logistic models.[22]
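The log-odds example can be checked by simulation: the empirical variance of \sqrt{n}(g(\hat{p}) - g(p)) should match the delta-method prediction 1/(p(1-p)). A minimal sketch (numpy assumed; p, n, and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 4000, 20000

# Sample proportions from n Bernoulli(p) trials, replicated many times.
phat = rng.binomial(n, p, size=reps) / n

logit = lambda q: np.log(q / (1 - q))
stat = np.sqrt(n) * (logit(phat) - logit(p))

# Delta method prediction: asymptotic variance [g'(p)]^2 p(1-p) = 1/(p(1-p)).
pred_var = 1.0 / (p * (1 - p))
emp_var = float(stat.var())
```

With n = 4000 the higher-order terms of the expansion are negligible, so the empirical variance lands within a few percent of the predicted 1/(p(1-p)) \approx 4.76.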
Slutsky's Theorem
Slutsky's theorem provides a fundamental result in asymptotic theory concerning the limiting distributions of functions involving sequences that converge in distribution and in probability. Specifically, if a sequence of random variables X_n converges in distribution to a random variable X, denoted X_n \xrightarrow{d} X, and another sequence Y_n converges in probability to a constant c, denoted Y_n \xrightarrow{p} c, then the product X_n Y_n converges in distribution to c X, that is, X_n Y_n \xrightarrow{d} c X. Similarly, if c \neq 0, the quotient X_n / Y_n converges in distribution to X / c, so X_n / Y_n \xrightarrow{d} X / c. More generally, for any continuous function g: \mathbb{R}^2 \to \mathbb{R}, the composition g(X_n, Y_n) converges in distribution to g(X, c), yielding g(X_n, Y_n) \xrightarrow{d} g(X, c).[23]

The theorem was developed by Evgeny Slutsky in his seminal 1925 paper, which introduced concepts of stochastic limits and played a key role in bridging early developments in probability theory and mathematical statistics.[24]

A proof of Slutsky's theorem typically proceeds by establishing joint convergence of the pair (X_n, Y_n) in distribution to (X, c): convergence in probability to a constant implies convergence in distribution to that degenerate limit, and joint convergence follows precisely because the limit of the second coordinate is deterministic. The continuous mapping theorem then transfers the result to continuous functions of the pair.

In applications, Slutsky's theorem is essential for deriving asymptotic distributions of normalized or studentized statistics, where a limiting distribution is scaled by an estimator converging in probability to a constant.
For instance, consider the studentized sample mean t_n = \sqrt{n} (\bar{X}_n - \mu) / s_n, where \bar{X}_n is the sample mean, \mu is the population mean, and s_n is the sample standard deviation from an i.i.d. sample with finite variance \sigma^2 > 0. The central limit theorem implies \sqrt{n} (\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2), while the law of large numbers yields s_n \xrightarrow{p} \sigma. Applying Slutsky's theorem, it follows that t_n \xrightarrow{d} N(0, 1). This result underpins the asymptotic validity of many hypothesis tests and confidence intervals in large samples. Slutsky's theorem complements the delta method by accommodating random sequences in the scaling factor that converge in probability.[25]
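The studentized-mean argument can be illustrated numerically with a skewed population: even though s_n is random, t_n behaves like a standard normal for large n. A minimal simulation sketch (numpy assumed; the Exponential(1) population, n, and the evaluation points are illustrative):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, reps = 1000, 5000

x = rng.exponential(size=(reps, n))   # mean 1, standard deviation 1
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)             # s_n ->p sigma = 1 by the LLN

# Slutsky: sqrt(n)(xbar - mu)/s_n ->d N(0, 1) even though s_n is random.
t = sqrt(n) * (xbar - 1.0) / s

phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
points = [-1.645, 0.0, 1.645]
max_gap = max(abs(float((t <= z).mean()) - phi(z)) for z in points)
```

The residual gap combines Monte Carlo noise with an O(1/\sqrt{n}) skewness term, both small at n = 1000.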
Applications and Expansions
Statistical Inference
In statistical inference, asymptotic distributions provide the foundation for constructing approximate confidence intervals in large samples, particularly through the Wald interval, which leverages the normal approximation to the sampling distribution of an estimator. For instance, the Wald confidence interval for a population mean \mu based on the sample mean \bar{X}_n is \bar{X}_n \pm z_{\alpha/2} \, \sigma / \sqrt{n}, where z_{\alpha/2} is the (1 - \alpha/2)-quantile of the standard normal distribution and \sigma is the population standard deviation (in practice replaced by a consistent estimate such as the sample standard deviation); as the sample size n increases, the coverage probability of this interval approaches 1 - \alpha. This approach relies on the central limit theorem and the delta method to establish the underlying asymptotic normality of the estimator.

Asymptotic distributions also enable hypothesis tests for large samples, such as the z-test for means or proportions, where the test statistic converges in distribution to a standard normal under the null hypothesis, allowing rejection regions based on normal critical values. Similarly, the chi-squared goodness-of-fit test uses Pearson's statistic, a quadratic form in observed minus expected frequencies, which asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of categories minus one under the null; this facilitates testing whether categorical data conform to a specified distribution.

For point estimation, asymptotic normality underpins the large-sample properties of estimators such as those from the method of moments and maximum likelihood. Method-of-moments estimators equate sample moments to population moments and inherit asymptotic normality from the central limit theorem applied to the sample moments.
Maximum likelihood estimators (MLEs) achieve asymptotic normality under regularity conditions, with \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), where I(\theta) denotes the Fisher information matrix; this ensures the MLE is consistent, asymptotically unbiased, and asymptotically efficient in large samples.

The nonparametric bootstrap further justifies and approximates asymptotic distributions for complex statistics by resampling the data with replacement to mimic sampling variability, producing empirical distributions that converge to the true asymptotic limits, as proposed by Efron.[26] This method is particularly useful when analytical asymptotic forms are intractable, allowing inference for statistics beyond simple means or proportions.
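The coverage claim for the Wald interval can be checked by simulation. The sketch below (numpy assumed; the Exponential(1) population, n = 200, and the replication count are illustrative choices) estimates the coverage of the nominal 95% interval with \sigma replaced by the sample standard deviation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, z975 = 200, 5000, 1.959964

# Coverage of the large-sample Wald interval for the mean of an
# Exponential(1) population (true mean mu = 1, sigma estimated by s_n).
x = rng.exponential(size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
half = z975 * s / np.sqrt(n)
coverage = float(((xbar - half <= 1.0) & (1.0 <= xbar + half)).mean())
```

For this skewed population the finite-sample coverage sits slightly below the nominal 0.95 at n = 200, converging to it as n grows, exactly the large-sample guarantee described above.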
Edgeworth Expansion
The Edgeworth expansion provides a refinement of the central limit theorem by incorporating higher-order cumulants to achieve a more accurate approximation of the distribution of standardized sums in finite samples. It extends the basic normal approximation by adding correction terms that account for skewness, kurtosis, and other shape characteristics of the underlying distribution.[27][28]

For a standardized sum Z_n = \frac{S_n - n \mu}{\sigma \sqrt{n}}, where S_n = \sum_{i=1}^n X_i and the X_i are independent and identically distributed with mean \mu, variance \sigma^2, and cumulants \kappa_r for r \geq 3, the cumulative distribution function (CDF) admits the asymptotic expansion

F_n(z) = \Phi(z) - \phi(z) \sum_{s=1}^m n^{-s/2} P_s(z) + O(n^{-(m+1)/2}),

where \Phi(z) and \phi(z) are the standard normal CDF and probability density function (PDF), respectively, and the polynomials P_s(z), of degree 3s - 1, are constructed from the standardized cumulants \lambda_r = \kappa_r / \sigma^r using Hermite polynomials He_k(z). For instance, the leading correction term of order n^{-1/2} is P_1(z) = \frac{\lambda_3}{6} He_2(z) = \frac{\lambda_3}{6} (z^2 - 1), capturing skewness effects.[27]

The derivation begins with the characteristic function of Z_n, \psi_n(t) = E[e^{it Z_n}], which expands as \psi_n(t) = e^{-t^2/2} \left[ 1 + \sum_{r=3}^\infty \frac{\kappa_r (it)^r}{r! \, \sigma^r n^{(r-2)/2}} + o(n^{-(m-1)/2}) \right] using the cumulant-generating function, assuming finite moments up to order 2m + 1. The PDF or CDF is then obtained via the inverse Fourier transform, yielding the series after integration by parts, with terms expressed in orthogonal Hermite polynomials for efficiency.[29][30]

Validity requires the existence of moments up to at least the third order for the leading skewness correction, with higher moments needed for subsequent terms; for example, a finite fourth moment ensures the O(1/n) remainder.
These conditions enable the expansion to improve upon the Berry–Esseen theorem's uniform error bound of O(1/\sqrt{n}) for distributions exhibiting skewness or excess kurtosis, reducing the approximation error to O(1/n) in many cases.[31][28]

A representative example is the binomial distribution B(n, p) with small p, where the standardized sum has positive skewness \lambda_3 = (1-2p)/\sqrt{np(1-p)} \approx 1/\sqrt{np} > 0, so the distribution is right-skewed. The Edgeworth correction adjusts the normal CDF to better match the discrete mass and tail probabilities than the plain normal approximation does.[32]
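The one-term skewness correction can be compared against the plain normal approximation on an exactly computable case. This sketch (standard-library Python; n = 40 and p = 0.2 are illustrative) applies the correction \Phi(z) - \phi(z)\,\lambda_3 (z^2 - 1)/6 to the standardized Binomial(n, p) CDF, evaluated with a continuity correction at the midpoints k + 1/2:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

n, p = 40, 0.2
mu, sd = n * p, math.sqrt(n * p * (1 - p))
lam3 = (1 - 2 * p) / sd                   # standardized third cumulant

cdf, err_normal, err_edgeworth = 0.0, 0.0, 0.0
for k in range(n + 1):
    cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)   # exact P(X <= k)
    z = (k + 0.5 - mu) / sd                              # continuity correction
    plain = norm_cdf(z)
    # One-term Edgeworth correction: subtract phi(z) * lambda_3/6 * (z^2 - 1).
    corrected = plain - norm_pdf(z) * (lam3 / 6.0) * (z * z - 1.0)
    err_normal = max(err_normal, abs(cdf - plain))
    err_edgeworth = max(err_edgeworth, abs(cdf - corrected))
```

The corrected approximation removes most of the O(1/\sqrt{n}) skewness error, leaving a substantially smaller worst-case discrepancy than the plain normal CDF.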
Local Asymptotic Normality
Local asymptotic normality (LAN) is a fundamental property in asymptotic statistical theory that describes the local behavior of a sequence of parametric statistical experiments around a true parameter value. A sequence of experiments is said to satisfy LAN at \theta if the log-likelihood ratio statistic takes the form

\Lambda_n(\theta, \theta + h/\sqrt{n}) = h^T \Delta_n - \frac{1}{2} h^T I(\theta) h + o_p(1),

where \Delta_n converges in distribution to N(0, I(\theta)), I(\theta) is the Fisher information matrix, and h is a fixed vector in \mathbb{R}^p. This expansion approximates the experiment locally by a Gaussian shift experiment, enabling precise asymptotic analysis of inference procedures in parametric models.

Under LAN, the maximum likelihood estimator (MLE) \hat{\theta}_n exhibits desirable asymptotic properties, including \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), which establishes its asymptotic normality and efficiency relative to the Cramér-Rao lower bound. Moreover, hypothesis tests based on the likelihood ratio or score statistics achieve asymptotic optimality, attaining the power envelope derived from the limiting Gaussian experiment. These implications extend the central limit theorem to likelihood-based inference by providing a uniform approximation over local parameter neighborhoods of order 1/\sqrt{n}.

Le Cam's foundational theory links LAN to the concept of contiguity, where sequences of measures under local alternatives are contiguous with respect to the null sequence, ensuring that asymptotic distributions and moments can be interchanged across nearby parameters. This framework allows robust derivations of limiting distributions for estimators and tests, applicable beyond independent identically distributed (i.i.d.) settings to dependent data under regularity conditions.
LAN thus facilitates higher-order asymptotic refinements and the equivalence of experiments in Le Cam's sense.

A canonical example of LAN occurs in estimating the mean \mu of i.i.d. normal observations X_i \sim N(\mu, \sigma^2) with known \sigma^2. Here, the log-likelihood ratio equals exactly

\Lambda_n(\mu, \mu + h/\sqrt{n}) = h \Delta_n - \frac{1}{2} h^2 \sigma^{-2}, \quad \Delta_n = \sigma^{-2} \sqrt{n} (\bar{X}_n - \mu),

where \bar{X}_n is the sample mean. This matches the LAN form with I(\mu) = \sigma^{-2} and no remainder term, demonstrating that the normal location model satisfies the expansion exactly for every n.
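The exactness of the LAN expansion in the normal location model can be verified numerically: the log-likelihood ratio computed from the data agrees with h \Delta_n - h^2/(2\sigma^2) up to floating-point error. A minimal sketch (numpy assumed; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, h = 2.0, 1.5, 500, 0.7

x = rng.normal(mu, sigma, size=n)
delta = h / np.sqrt(n)

def loglik(m):
    # Gaussian log-likelihood with known sigma (constants cancel in ratios).
    return -0.5 * float(np.sum((x - m) ** 2)) / sigma**2

# Exact log-likelihood ratio between mu + h/sqrt(n) and mu ...
lr_exact = loglik(mu + delta) - loglik(mu)

# ... versus the LAN form h*Delta_n - (1/2) h^2 I(mu), with
# Delta_n = sqrt(n)(xbar - mu)/sigma^2 and I(mu) = 1/sigma^2.
Delta_n = float(np.sqrt(n) * (x.mean() - mu)) / sigma**2
lr_lan = h * Delta_n - 0.5 * h**2 / sigma**2
```

Expanding the quadratic in the Gaussian log-likelihood shows the two quantities are algebraically identical, so the only discrepancy is rounding.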
Consistency
In statistics, an estimator \hat{\theta}_n of a parameter \theta is said to be consistent if \hat{\theta}_n \xrightarrow{p} \theta as n \to \infty, where \xrightarrow{p} denotes convergence in probability; this means that for every \epsilon > 0, P(|\hat{\theta}_n - \theta| > \epsilon) \to 0.[33] Consistency ensures that the estimator converges to the true parameter value in probability as the sample size grows, providing a foundational property for reliable inference.[34]

Two common types of consistency are mean-squared consistency and uniform consistency. An estimator is mean-squared consistent if the mean squared error E[(\hat{\theta}_n - \theta)^2] \to 0 as n \to \infty, which implies consistency in probability by Chebyshev's inequality.[35] Uniform consistency strengthens pointwise consistency by requiring convergence uniformly over the parameter space: for every \epsilon > 0, \lim_{n \to \infty} \sup_{\theta} P(|\hat{\theta}_n - \theta| > \epsilon) = 0, ensuring the estimator performs well across the entire space rather than at a single point.[34]

Proofs of consistency often rely on probabilistic tools such as Chebyshev's inequality, which bounds the probability of deviation for estimators with controlled variance; for instance, if an unbiased estimator has variance tending to zero, Chebyshev's inequality directly yields consistency.[33] Slutsky-type arguments extend this to functions of consistent estimators: if \hat{\theta}_n \xrightarrow{p} \theta and g is continuous at \theta, then g(\hat{\theta}_n) \xrightarrow{p} g(\theta).[1]

Classic examples illustrate these concepts.
The sample mean \bar{X}_n is consistent for the population mean \mu under finite variance, as established by the weak law of large numbers, which follows from Chebyshev's inequality applied to the variance \sigma^2/n \to 0.[33] Similarly, the maximum likelihood estimator (MLE) is consistent under identifiability of the parameter (unique determination by the distribution) and regularity conditions on the likelihood (such as continuity and compactness of the parameter space), as proven in Wald's theorem.[36]
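The defining probability P(|\bar{X}_n - \mu| > \epsilon) \to 0 can be observed directly by simulation. A minimal sketch (numpy assumed; the Exponential(1) population, \epsilon = 0.1, and the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
eps, reps = 0.1, 5000

# P(|Xbar_n - mu| > eps) for Exponential(1) data (mu = 1) shrinks toward
# zero as n grows, as the weak law of large numbers asserts; Chebyshev's
# inequality bounds it by sigma^2 / (n * eps^2) = 1 / (n * 0.01).
probs = {}
for n in (50, 200, 800):
    xbar = rng.exponential(size=(reps, n)).mean(axis=1)
    probs[n] = float((np.abs(xbar - 1.0) > eps).mean())
```

The observed probabilities decay much faster than the Chebyshev bound, reflecting the exponential concentration available for this population.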
Asymptotic Efficiency
In asymptotic statistics, an estimator \hat{\theta}_n of a parameter \theta is said to be asymptotically efficient if it achieves the Cramér-Rao lower bound in the limit as the sample size n tends to infinity, meaning \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), where I(\theta) denotes the Fisher information matrix. This property implies that the estimator converges at the optimal rate of O_p(1/\sqrt{n}) and attains the minimal possible asymptotic variance among all regular estimators.

The asymptotic Cramér-Rao bound arises in the framework of local asymptotic normality (LAN), a condition under which the log-likelihood ratio behaves approximately like that of a normal experiment, allowing efficient estimators to saturate the bound. LAN, introduced by Le Cam, ensures that the information bound is locally achievable, providing a foundation for efficiency in multiparameter and non-i.i.d. settings.

Under standard regularity conditions, such as differentiability of the log-likelihood and identifiability of \theta, the maximum likelihood estimator (MLE) is asymptotically efficient, as its asymptotic distribution matches the inverse Fisher information. In contrast, the method of moments estimator is generally asymptotically inefficient, exhibiting a larger asymptotic variance unless the population moments are linear functions of the parameters, in which case it coincides with the MLE.[37]

Super-efficiency occurs in rare scenarios where an estimator beats the Cramér-Rao bound at isolated parameter values; the classic case is Hodges' estimator, which modifies the MLE to achieve zero asymptotic variance at a single point while performing worse in shrinking neighborhoods of that point. Relatedly, the James-Stein estimator in high-dimensional normal means problems dominates the MLE in total mean squared error, though it does not improve the pointwise asymptotic variance at every parameter value. Asymptotic efficiency presupposes consistency, as inconsistent estimators cannot attain the required convergence rate.
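The efficiency gap between the MLE and a method-of-moments estimator is easy to exhibit in the Laplace location model, where I(\theta) = 1 for scale b = 1: the MLE is the sample median (asymptotic variance 1/n), while the sample mean (a method-of-moments estimator) has variance 2/n, for a relative efficiency of only 1/2. A simulation sketch (numpy assumed; n and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 4000

# Laplace(theta = 0, b = 1) location model: Fisher information I(theta) = 1.
# MLE = sample median, with n * variance -> 1/I = 1.
# Method of moments = sample mean, with n * variance -> Var(X) = 2b^2 = 2.
x = rng.laplace(loc=0.0, scale=1.0, size=(reps, n))
var_median = float(n * np.median(x, axis=1).var())
var_mean = float(n * x.mean(axis=1).var())
```

The scaled variances land near 1 and 2 respectively, matching the inverse Fisher information for the efficient estimator and the strictly larger moment-based variance.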
Bahadur Efficiency
Bahadur efficiency provides a criterion for comparing the asymptotic performance of hypothesis tests by examining the exponential rate at which the attained significance level (the p-value) tends to zero under a fixed alternative. For tests of the null hypothesis H_0: \theta = \theta_0 against the one-sided alternative H_1: \theta > \theta_0, the Bahadur exact slope of a test statistic is defined as c(\theta) = -2 \lim_{n \to \infty} n^{-1} \log L_n(\theta), where L_n(\theta) denotes the attained level of the test under the alternative parameter \theta > \theta_0.[38] Larger values of the slope indicate superior efficiency, as they correspond to a faster decay of the p-value to zero.[38]

Under suitable regularity conditions, the likelihood ratio test attains the maximal Bahadur slope, which equals twice the Kullback-Leibler divergence between the distribution under the alternative and that under the null.[39] This optimality highlights the likelihood ratio test's role in achieving the best possible error exponent.[38]

A representative example is the comparison between the z-test and the t-test for the mean of a normal distribution. When the population variance is known, the z-test yields a higher Bahadur slope than the t-test, which relies on a sample variance estimate; however, the relative efficiency of the t-test approaches 1 as the alternative approaches the null.[40] In goodness-of-fit testing for contingency tables, the likelihood ratio chi-squared statistic exhibits greater Bahadur efficiency than Pearson's chi-squared statistic for certain alternatives, reflecting its closer alignment with the Kullback-Leibler divergence.[41]

The concept of Bahadur efficiency was introduced by R. R. Bahadur in 1960 and has proved useful in non-regular statistical problems, such as change-point detection, where conventional asymptotic normality assumptions fail.[38]
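As a concrete instance of the slope formula (a worked sketch, not drawn from the cited sources): for the one-sided z-test of H_0: \theta = \theta_0 in the N(\theta, \sigma^2) model with known \sigma^2, the optimal slope equals twice the Kullback-Leibler divergence between alternative and null:

```latex
% Kullback-Leibler divergence between the alternative and the null:
K\bigl(N(\theta,\sigma^2)\,\big\|\,N(\theta_0,\sigma^2)\bigr)
  = \frac{(\theta-\theta_0)^2}{2\sigma^2},
% so the maximal (exact) Bahadur slope, attained here by the z-test, is
c(\theta) = 2K = \frac{(\theta-\theta_0)^2}{\sigma^2}.
% Sketch: the p-value is L_n = 1 - \Phi\bigl(\sqrt{n}(\bar{X}_n-\theta_0)/\sigma\bigr),
% and under \theta the argument grows like \sqrt{n}(\theta-\theta_0)/\sigma, so
% n^{-1}\log L_n \to -\tfrac{1}{2}(\theta-\theta_0)^2/\sigma^2 almost surely.
```

The quadratic growth of the slope in \theta - \theta_0 makes precise the sense in which distant alternatives are exponentially easier to detect.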