
Asymptotic distribution

In statistics and probability theory, an asymptotic distribution, also known as a limiting distribution, refers to the hypothetical distribution that a sequence of random variables or estimators converges to as the sample size n approaches infinity. This concept provides a large-sample approximation to the exact finite-sample distribution, enabling practical inference when exact distributions are intractable. Asymptotic distribution theory underpins key results such as the law of large numbers (LLN), which establishes that sample averages converge in probability (or almost surely under stronger conditions) to the population mean for independent and identically distributed (i.i.d.) random variables with finite mean. The central limit theorem (CLT) extends this by showing that, under suitable moment conditions such as finite variance, the standardized sample mean \sqrt{n}(\bar{X}_n - \mu)/\sigma converges in distribution to a standard normal N(0,1) random variable, justifying the use of normal approximations for confidence intervals and hypothesis tests in large samples. Complementary tools include Slutsky's theorem, which preserves convergence in distribution under continuous transformations involving sequences converging in probability to constants, and the delta method, which derives the asymptotic normality of nonlinear functions of estimators via Taylor expansion. These principles are foundational in econometric and statistical modeling, facilitating the analysis of estimators like ordinary least squares (OLS) under assumptions of consistency and asymptotic normality, even when data are not i.i.d., such as in stratified or clustered sampling. Notable applications span hypothesis testing, where test statistics achieve asymptotic distributions like chi-squared or t-distributions, and bootstrap methods that leverage asymptotic validity for resampling-based inference. While powerful, asymptotic results require verification of conditions such as the Lindeberg or Lyapunov conditions for CLTs in non-i.i.d. settings to ensure reliability.

Fundamentals

Definition

In statistics, an asymptotic distribution refers to the limiting probability distribution that a sequence of random variables or statistics approaches as the sample size n tends to infinity. This limiting behavior provides a theoretical framework for understanding the approximate distribution of estimators or test statistics in large samples, where finite-sample exact distributions may be intractable. Formally, consider a sequence of random variables X_n, such as sample means or test statistics derived from n observations. The asymptotic distribution is the probability distribution L of a random variable Z such that d\left( \frac{X_n - a_n}{b_n}, L \right) \to 0 as n \to \infty for some metric d on the space of probability measures, where a_n and b_n > 0 are deterministic centering and scaling sequences, respectively. Equivalently, \frac{X_n - a_n}{b_n} \xrightarrow{d} Z, where \xrightarrow{d} denotes convergence in distribution (or weak convergence). This convergence implies that the cumulative distribution function of the normalized X_n converges pointwise to that of Z at all continuity points of the latter. Weak convergence underpins the asymptotic regime, differing from stronger modes such as almost sure convergence or convergence in probability, as it focuses solely on distributional limits without requiring pathwise agreement. Large sample sizes ensure that the finite-sample distribution of X_n is well approximated by this limiting distribution, enabling practical inference despite deviations in small samples. The concept traces its origins to Pierre-Simon Laplace's early 19th-century investigations into normal approximations for the binomial distribution, detailed in his 1812 treatise Théorie Analytique des Probabilités. By the early 20th century, Ronald A. Fisher and others extended asymptotic theory to broader statistical applications, including estimation and testing procedures. The standard notation X_n \xrightarrow{d} X signifies that X_n converges in distribution to the random variable X.
A canonical illustration is the Central Limit Theorem, which demonstrates the asymptotic normality of standardized sample means from independent, identically distributed variables with finite variance.
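The definition can be made concrete numerically. The sketch below (assuming NumPy and SciPy are available; sample sizes are illustrative) draws standardized sample means of skewed Exponential(1) data, for which \mu = \sigma = 1, and checks that their empirical CDF is close to the standard normal CDF \Phi at a grid of continuity points, as convergence in distribution requires.

```python
# Numerical sketch (assumes numpy/scipy): the standardized sample mean of
# skewed Exponential(1) data (mu = sigma = 1) converges in distribution to
# N(0, 1), so its empirical CDF should approach Phi pointwise.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0  # standardized means

grid = np.linspace(-3.0, 3.0, 13)                    # continuity points of Phi
ecdf = np.array([(z <= t).mean() for t in grid])
max_gap = np.abs(ecdf - norm.cdf(grid)).max()
print(f"sup gap between empirical CDF and Phi: {max_gap:.4f}")
```

The gap shrinks further as n grows, in line with the Berry–Esseen rate discussed below.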

Properties

Asymptotic distributions exhibit invariance under continuous transformations, a key property that facilitates their manipulation in probabilistic arguments. Specifically, if a sequence of random variables X_n converges in distribution to a limiting random variable X, denoted X_n \xrightarrow{d} X, and g is a continuous function, then g(X_n) \xrightarrow{d} g(X). This result, known as the continuous mapping theorem, holds because continuity of g ensures that the probability measure induced by g(X_n) converges weakly to that of g(X) in the space of probability measures equipped with a suitable metric, such as the Prohorov metric. A proof sketch relies on the portmanteau theorem for weak convergence: for any closed set F disjoint from the discontinuity set of g, \limsup P(g(X_n) \in F) \leq P(g(X) \in F), and similar bounds for open sets establish the convergence. Asymptotic distributions often preserve or approximate the moments of the limiting distribution under suitable regularity conditions, enabling the transfer of moment-based properties from finite samples to the limit. For instance, if X_n \xrightarrow{d} X and the sequence satisfies uniform integrability, then the means converge: E[X_n] \to E[X], and similarly for variances under second-moment uniform integrability. Cumulants, which are additive for sums of independent random variables, likewise approximate those of the limit when higher-order moments exist, providing a tool for analyzing skewness and kurtosis in approximations. The Lindeberg-Feller conditions, which ensure asymptotic normality for sums of independent random variables with stabilizing variances, exemplify this by guaranteeing that the limiting normal distribution's variance matches the normalized sum of individual variances in the large-sample regime. The rate of convergence to an asymptotic distribution quantifies how quickly the distribution of X_n approaches that of X, often measured via the supremum distance between cumulative distribution functions (CDFs).
The Berry-Esseen theorem provides a uniform bound on this rate for sums of independent random variables converging to normality: if the third moments are finite, then \sup_x |F_n(x) - \Phi(x)| = O(1/\sqrt{n}), where F_n is the CDF of the standardized sum and \Phi is the standard normal CDF, with the constant depending on the third-moment ratio. This O(1/\sqrt{n}) rate establishes the scale of approximation error, informing the reliability of asymptotic inferences for finite samples. When an asymptotic distribution exists, it is unique, a consequence of the theory of weak convergence in metric spaces. Weak convergence defines a topology on the space of probability measures, and under metrics like the Prohorov metric (which metrizes weak convergence on separable metric spaces) or the Skorohod metric (for Skorohod space of cadlag functions), the limit measure is uniquely determined by the convergent sequence. This uniqueness ensures that different sequences converging to the same limit share the identical asymptotic distribution, underpinning the consistency of asymptotic results across varied probabilistic settings.
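The O(1/\sqrt{n}) rate can be observed directly in a case with a computable exact distribution. The sketch below (assuming NumPy and SciPy; the choice p = 0.1 is an illustrative skewed case) evaluates \sup_x |F_n(x) - \Phi(x)| for standardized Binomial(n, p) sums, taking the supremum over both sides of each jump of the lattice CDF, and watches the gap shrink as n quadruples.

```python
# Sketch (assumes numpy/scipy): sup-norm distance between the CDF of a
# standardized Binomial(n, 0.1) sum and Phi, shrinking roughly like 1/sqrt(n).
import numpy as np
from scipy.stats import binom, norm

p, gaps = 0.1, []
for n in (100, 400, 1600):
    k = np.arange(n + 1)
    z = (k - n * p) / np.sqrt(n * p * (1 - p))       # standardized jump points
    F = binom.cdf(k, n, p)
    # approach each jump from the right (F at z_k) and from the left (F - pmf)
    gap = max(np.abs(F - norm.cdf(z)).max(),
              np.abs(F - binom.pmf(k, n, p) - norm.cdf(z)).max())
    gaps.append(gap)
print([f"{g:.4f}" for g in gaps])
```

Quadrupling n roughly halves the gap, consistent with the O(1/\sqrt{n}) Berry-Esseen scale.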

Key Theorems

Central Limit Theorem

The central limit theorem (CLT) asserts that if X_1, X_2, \dots, X_n are independent and identically distributed random variables with mean \mu and finite positive variance \sigma^2, then the standardized sample mean \frac{\sqrt{n} (\bar{X}_n - \mu)}{\sigma} converges in distribution to a standard normal random variable as n \to \infty. This result establishes the asymptotic normality of sums of i.i.d. random variables under mild moment conditions, providing a foundational bridge to normal approximations in probability and statistics. A standard proof of the CLT relies on characteristic functions. The characteristic function of each X_i - \mu is \phi(t) = E[e^{it(X_i - \mu)}], and for the standardized sum, it becomes [\phi(t / (\sigma\sqrt{n}))]^n. Under finite variance, \log \phi(t) = - \sigma^2 t^2 / 2 + o(t^2) near zero, so [\phi(t / (\sigma\sqrt{n}))]^n \to e^{-t^2 / 2} as n \to \infty by the continuity theorem for characteristic functions, confirming convergence to N(0,1). Alternative proofs use moment-generating functions when they exist, yielding similar expansions. The CLT generalizes beyond i.i.d. cases. The Lyapunov CLT applies to independent random variables X_1, \dots, X_n with means \mu_i and variances \sigma_i^2, where the total variance s_n^2 = \sum_{i=1}^n \sigma_i^2 \to \infty; it requires that for some \delta > 0, \lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n E[|X_i - \mu_i|^{2+\delta}] = 0. Under this condition, \frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0,1). The Lindeberg CLT further extends to triangular arrays \{X_{n,i}: 1 \leq i \leq k_n\}, where row sums S_n = \sum_{i=1}^{k_n} X_{n,i} have variance s_n^2 \to \infty, and the Lindeberg condition holds: for every \epsilon > 0, \lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{k_n} E[X_{n,i}^2 \mathbf{1}_{|X_{n,i}| > \epsilon s_n}] = 0. This implies S_n / s_n \xrightarrow{d} N(0,1), enabling applications to non-stationary sequences. Early examples illustrate the CLT's origins.
The de Moivre–Laplace theorem of 1733 approximates the binomial distribution B(n,p) by the normal N(np, np(1-p)) for large n and fixed p \in (0,1), marking the first rigorous normal limit for sums of indicator variables. A related limit is the Poisson approximation, in which the binomial converges to a Poisson distribution with parameter \lambda as n \to \infty and p \to 0 with np \to \lambda; for large \lambda, the Poisson distribution itself admits a normal approximation via the CLT.
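A quick numerical check of the de Moivre–Laplace approximation (a sketch assuming SciPy; the interval and parameters are arbitrary illustrations): the exact binomial probability of an interval around the mean is compared with the normal approximation, including the usual continuity correction.

```python
# Sketch (assumes scipy): P(280 <= S <= 320) for S ~ B(1000, 0.3),
# exactly versus the de Moivre-Laplace normal approximation.
import numpy as np
from scipy.stats import binom, norm

n, p = 1000, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))
lo, hi = 280, 320
exact = binom.cdf(hi, n, p) - binom.cdf(lo - 1, n, p)
# continuity correction: treat the lattice point k as the interval [k-1/2, k+1/2]
approx = norm.cdf((hi + 0.5 - mu) / sd) - norm.cdf((lo - 0.5 - mu) / sd)
print(f"exact = {exact:.5f}, normal approx = {approx:.5f}")
```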

Delta Method

The delta method is a fundamental tool in asymptotic statistics for obtaining the limiting distribution of a smooth function of an asymptotically normal random variable, building on the normality established by the central limit theorem. Suppose \sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2) for some estimator T_n of a parameter \theta, where \sigma^2 > 0. If g: \mathbb{R} \to \mathbb{R} is continuously differentiable at \theta with g'(\theta) \neq 0, then \sqrt{n} \bigl( g(T_n) - g(\theta) \bigr) \xrightarrow{d} N \bigl( 0, [g'(\theta)]^2 \sigma^2 \bigr). The proof follows from a Taylor expansion of g around \theta: g(T_n) = g(\theta) + g'(\theta) (T_n - \theta) + R_n, where the remainder R_n = o_p(|T_n - \theta|). Since T_n - \theta = O_p(n^{-1/2}), it holds that \sqrt{n} R_n = o_p(1), so \sqrt{n} (g(T_n) - g(\theta)) has the same asymptotic distribution as \sqrt{n} g'(\theta) (T_n - \theta), which is normal with the stated variance. This result extends to the multivariate case. Let \mathbf{T}_n \in \mathbb{R}^k satisfy \sqrt{n} (\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma}), where \boldsymbol{\Sigma} is the k \times k asymptotic covariance matrix. For a continuously differentiable function \mathbf{g}: \mathbb{R}^k \to \mathbb{R}^m with Jacobian matrix \mathbf{J}(\boldsymbol{\theta}) = \nabla \mathbf{g}(\boldsymbol{\theta}) of full rank, \sqrt{n} \bigl( \mathbf{g}(\mathbf{T}_n) - \mathbf{g}(\boldsymbol{\theta}) \bigr) \xrightarrow{d} N \bigl( \mathbf{0}, \mathbf{J}(\boldsymbol{\theta}) \boldsymbol{\Sigma} \mathbf{J}(\boldsymbol{\theta})^T \bigr). The proof again uses a multivariate Taylor expansion, with the remainder term vanishing asymptotically under the same order conditions. A classic example arises in deriving the asymptotic distribution of the sample variance for independent and identically distributed observations X_1, \dots, X_n with mean \mu, variance \sigma^2 > 0, and finite fourth moment.
The sample mean \bar{X}_n satisfies \sqrt{n} (\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2) by the central limit theorem. The unbiased sample variance S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 can be written as S_n^2 = \frac{n}{n-1} \bigl[ \hat{m}_2 - (\bar{X}_n - \mu)^2 \bigr], where \hat{m}_2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2; since (\bar{X}_n - \mu)^2 = O_p(n^{-1}) = o_p(n^{-1/2}), it follows that \sqrt{n} (S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4), the limit of \sqrt{n}(\hat{m}_2 - \sigma^2), with \mu_4 = \mathbb{E}[(X_1 - \mu)^4]. Another illustrative application is the asymptotic distribution of the log-odds from a sample proportion. Let \hat{p} = k/n be the sample proportion from n independent Bernoulli trials with success probability p \in (0,1), so \sqrt{n} (\hat{p} - p) \xrightarrow{d} N(0, p(1-p)). For the log-odds function g(\hat{p}) = \log(\hat{p}/(1 - \hat{p})), the derivative g'(p) = 1/(p(1-p)) yields \sqrt{n} \bigl( g(\hat{p}) - \log\bigl(\tfrac{p}{1-p}\bigr) \bigr) \xrightarrow{d} N \bigl( 0, \tfrac{1}{p(1-p)} \bigr) by the delta method, providing a basis for confidence intervals in logistic models.
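The log-odds example can be verified by simulation (a sketch assuming NumPy; the sample size and p are arbitrary illustrations): the empirical variance of \log(\hat{p}/(1-\hat{p})) should match the delta-method prediction [g'(p)]^2 \, p(1-p)/n = 1/(n p (1-p)).

```python
# Sketch (assumes numpy): delta-method variance of the log-odds versus the
# empirical variance over many simulated proportions.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 5_000, 0.3, 20_000
k = rng.binomial(n, p, size=reps)
p_hat = k / n                                  # never 0 or 1 at this n and p
logit = np.log(p_hat / (1 - p_hat))

predicted = 1.0 / (n * p * (1 - p))            # [g'(p)]^2 * p(1-p)/n
empirical = logit.var()
print(f"empirical var = {empirical:.3e}, delta-method var = {predicted:.3e}")
```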

Slutsky's Theorem

Slutsky's theorem provides a fundamental result in asymptotic theory concerning the limiting distributions of functions involving sequences that converge in distribution and in probability. Specifically, if a sequence of random variables X_n converges in distribution to a random variable X, denoted X_n \xrightarrow{d} X, and another sequence Y_n converges in probability to a constant c, denoted Y_n \xrightarrow{p} c, then the product X_n Y_n converges in distribution to c X, that is, X_n Y_n \xrightarrow{d} c X. Similarly, if c \neq 0, the quotient X_n / Y_n converges in distribution to X / c, so X_n / Y_n \xrightarrow{d} X / c. More generally, for any continuous function g: \mathbb{R}^2 \to \mathbb{R}, the sequence g(X_n, Y_n) converges in distribution to g(X, c), yielding g(X_n, Y_n) \xrightarrow{d} g(X, c). The theorem was developed by Evgeny Slutsky in his seminal 1925 paper, which introduced concepts of stochastic limits and played a key role in bridging early developments in probability theory and mathematical statistics. A proof of Slutsky's theorem typically proceeds by establishing joint convergence of the pair (X_n, Y_n) in distribution to (X, c), leveraging the fact that convergence in probability to a constant implies convergence in distribution to that degenerate limit; tightness of the sequence Y_n follows from this convergence and provides the necessary control for the limiting behavior under continuous mappings. The continuous mapping theorem then extends the result to continuous functions of the sequences. In applications, Slutsky's theorem is essential for deriving asymptotic distributions of normalized or studentized statistics, where a limiting distribution is scaled by an estimator converging in probability to a constant. For instance, consider the studentized sample mean t_n = \sqrt{n} (\bar{X}_n - \mu) / s_n, where \bar{X}_n is the sample mean, \mu is the population mean, and s_n is the sample standard deviation from an i.i.d. sample with finite variance \sigma^2 > 0. The central limit theorem implies \sqrt{n} (\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2), while the law of large numbers yields s_n \xrightarrow{p} \sigma.
Applying Slutsky's theorem, it follows that t_n \xrightarrow{d} N(0, 1). This result underpins the asymptotic validity of many hypothesis tests and confidence intervals in large samples. Slutsky's theorem complements the delta method by accommodating random scaling factors that converge in probability to constants.
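The studentized-mean example can be illustrated numerically (a sketch assuming NumPy; skewed Exponential(1) data with \mu = 1 is an illustrative choice): despite the random denominator s_n, the two-sided tail probability P(|t_n| > 1.96) approaches the nominal standard-normal value 0.05.

```python
# Sketch (assumes numpy): Slutsky's theorem in action for the studentized
# mean of skewed Exponential(1) data; t_n is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1_000, 10_000
x = rng.exponential(1.0, size=(reps, n))                # true mean mu = 1
t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)

tail = (np.abs(t) > 1.96).mean()                        # nominal value: 0.05
print(f"P(|t_n| > 1.96) ~ {tail:.4f}")
```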

Applications and Expansions

Statistical Inference

In statistical inference, asymptotic distributions provide the foundation for constructing approximate confidence intervals in large samples, particularly through the Wald interval, which leverages the normal approximation to the sampling distribution of an estimator. For instance, the Wald confidence interval for a population mean \mu based on the sample mean \bar{X}_n is given by \bar{X}_n \pm z_{\alpha/2} \sigma / \sqrt{n}, where z_{\alpha/2} is the (1 - \alpha/2)-quantile of the standard normal distribution and \sigma is the population standard deviation (or a consistent estimate of it); as the sample size n increases, the coverage probability of this interval approaches 1 - \alpha. This approach relies on the central limit theorem and delta method to establish the underlying asymptotic normality of the estimator. Asymptotic distributions also enable the development of hypothesis tests for large samples, such as the z-test for means or proportions, where the test statistic converges in distribution to a standard normal under the null hypothesis, allowing rejection regions based on normal critical values. Similarly, the chi-squared goodness-of-fit test uses Pearson's statistic, a sum of squared standardized differences between observed and expected frequencies, which asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of categories minus one under the null; this facilitates testing whether categorical data conform to a specified distribution. For point estimation, asymptotic normality underpins the large-sample properties of estimators such as those from the method of moments and maximum likelihood. Method-of-moments estimators equate sample moments to population moments and inherit asymptotic normality from the central limit theorem and delta method applied to the sample moments. Maximum likelihood estimators (MLEs) achieve asymptotic normality through the information equality, where \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}) and I(\theta) denotes the Fisher information matrix, ensuring the MLE is asymptotically unbiased, consistent, and efficient in large samples.
The nonparametric bootstrap further justifies and approximates asymptotic distributions for complex statistics by resampling the data with replacement to mimic sampling variability, providing empirical distributions that converge to the true asymptotic limits, as proposed by Efron. This method is particularly useful when analytical asymptotic forms are intractable, allowing inference for statistics beyond simple means or proportions.
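The Wald interval's asymptotic coverage can be checked by Monte Carlo (a sketch assuming NumPy; normal data and a 95% level are illustrative choices, and the plug-in of s_n for \sigma is justified by Slutsky's theorem):

```python
# Sketch (assumes numpy): empirical coverage of the large-sample Wald interval
# xbar +/- 1.96 * s_n / sqrt(n), which should be close to 0.95.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps, z = 5.0, 2.0, 400, 10_000, 1.96
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
half = z * x.std(axis=1, ddof=1) / np.sqrt(n)   # s_n plugged in for sigma
coverage = ((xbar - half <= mu) & (mu <= xbar + half)).mean()
print(f"empirical coverage: {coverage:.4f}")
```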

Edgeworth Expansion

The Edgeworth expansion provides a refinement to the central limit theorem by incorporating higher-order cumulants to achieve a more accurate approximation of the distribution of standardized sums in finite samples. It extends the basic normal approximation by adding correction terms that account for skewness, kurtosis, and other shape characteristics of the underlying distribution. For a standardized sum Z_n = \frac{S_n - n \mu}{\sigma \sqrt{n}}, where S_n = \sum_{i=1}^n X_i and the X_i are independent and identically distributed with mean \mu, variance \sigma^2, and cumulants \kappa_r for r \geq 3, the cumulative distribution function (CDF) admits the asymptotic expansion F_n(z) = \Phi(z) - \phi(z) \sum_{s=1}^m n^{-s/2} P_s(z) + O(n^{-(m+1)/2}), where \Phi(z) and \phi(z) are the standard normal CDF and probability density function (PDF), respectively, and the polynomials P_s(z) are of degree 3s - 1, constructed from the standardized cumulants \lambda_r = \kappa_r / \sigma^r using Hermite polynomials He_k(z). For instance, the leading correction term of order n^{-1/2} in the CDF expansion is P_1(z) = \frac{\lambda_3}{6} He_2(z) = \frac{\lambda_3}{6} (z^2 - 1), capturing skewness effects (the corresponding density correction involves He_3(z) = z^3 - 3z). The derivation begins with the characteristic function of Z_n, \psi_n(t) = E[e^{it Z_n}], which expands as \psi_n(t) = e^{-t^2/2} \left[ 1 + \sum_{r=3}^\infty \frac{\kappa_r (it)^r}{r! \sigma^r n^{(r-2)/2}} + o(n^{-(m-1)/2}) \right] using the cumulant-generating function, assuming finite moments up to order 2m + 1. The PDF or CDF is then obtained via the inverse Fourier transform, collecting terms in powers of n^{-1/2} and expressing them through Hermite polynomials, which are orthogonal with respect to the normal density. Validity requires the existence of moments up to at least the third order for the leading skewness correction, with higher moments needed for subsequent terms; for example, a finite fourth moment ensures the O(1/n) remainder.
These conditions enable the expansion to improve upon the Berry–Esseen theorem's uniform error bound of O(1/\sqrt{n}) for distributions exhibiting skewness or excess kurtosis, reducing the approximation error to O(1/n) in many cases. A representative example is the binomial distribution B(n, p) with small p, where the standardized sum has positive skewness \lambda_3 = (1-2p)/\sqrt{np(1-p)} \approx 1/\sqrt{np} > 0, leading to a right-skewed distribution. The Edgeworth correction adjusts the CDF to better match the probability mass and tail probabilities compared to the plain normal approximation.
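For sums of Exp(1) variables the exact distribution is Gamma(n, 1), so the first-order skewness correction F(z) \approx \Phi(z) - \phi(z)\,\lambda_3 (z^2 - 1)/(6\sqrt{n}) with \lambda_3 = 2 can be compared directly against both the exact CDF and the plain normal approximation (a sketch assuming NumPy and SciPy; n = 20 is an illustrative small sample):

```python
# Sketch (assumes numpy/scipy): first-order Edgeworth correction for the
# standardized sum of n Exp(1) variables, whose exact law is Gamma(n, 1).
import numpy as np
from scipy.stats import gamma, norm

n, lam3 = 20, 2.0                       # Exp(1): kappa_3 = 2, sigma = 1
z = np.linspace(-3.0, 3.0, 121)
exact = gamma.cdf(n + np.sqrt(n) * z, a=n)   # CDF of the standardized sum
clt = norm.cdf(z)                            # plain normal approximation
edge = clt - norm.pdf(z) * lam3 * (z**2 - 1) / (6 * np.sqrt(n))

err_clt = np.abs(exact - clt).max()
err_edge = np.abs(exact - edge).max()
print(f"max error: CLT {err_clt:.4f}, Edgeworth {err_edge:.4f}")
```

The skewness correction reduces the worst-case error by a substantial factor at this small n.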

Local Asymptotic Normality

Local asymptotic normality (LAN) is a fundamental property in asymptotic statistical theory that describes the local behavior of a sequence of statistical experiments around a true parameter value. A sequence of experiments is said to satisfy LAN at \theta if the log-likelihood ratio statistic takes the form \Lambda_n(\theta, \theta + h/\sqrt{n}) = h^T \Delta_n - \frac{1}{2} h^T I(\theta) h + o_p(1), where \Delta_n converges in distribution to N(0, I(\theta)), I(\theta) is the Fisher information matrix, and h is a fixed vector in \mathbb{R}^p. This expansion approximates the experiment locally by a Gaussian shift experiment, enabling precise asymptotic analysis of statistical procedures in regular parametric models. Under LAN, the maximum likelihood estimator (MLE) \hat{\theta}_n exhibits desirable asymptotic properties, including \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), which establishes its asymptotic normality and efficiency relative to the Cramér-Rao lower bound. Moreover, tests based on the likelihood ratio or score statistic achieve asymptotic optimality, attaining the power envelope derived from the limiting Gaussian experiment. These implications extend the central limit theorem to likelihood-based inference by providing a uniform approximation over local parameter neighborhoods of order 1/\sqrt{n}. Le Cam's foundational theory links LAN to the concept of contiguity, where sequences of measures under local alternatives are contiguous with respect to the null sequence, ensuring that asymptotic distributions and moments can be interchanged across nearby parameters. This framework allows for robust derivations of limiting distributions for estimators and tests, applicable beyond independent identically distributed (i.i.d.) settings to dependent data under regularity conditions. LAN thus facilitates higher-order asymptotic refinements and equivalence of experiments in Le Cam's sense. A canonical example of LAN occurs in estimating the mean \mu of i.i.d. normal observations X_i \sim N(\mu, \sigma^2) with known \sigma^2.
Here, the log-likelihood ratio \Lambda_n(\mu, \mu + h/\sqrt{n}) equals exactly h \sqrt{n} (\bar{X}_n - \mu) \sigma^{-2} - \frac{1}{2} h^2 \sigma^{-2}, where \bar{X}_n is the sample mean, matching the LAN form with \Delta_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma^2, I(\mu) = \sigma^{-2}, and no remainder term, demonstrating that the normal location model satisfies the LAN expansion exactly for every n.
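The exactness follows from expanding the squares in the normal log-likelihood; a short derivation:

```latex
\Lambda_n\!\Bigl(\mu,\ \mu+\tfrac{h}{\sqrt{n}}\Bigr)
  = \sum_{i=1}^{n}
    \frac{(X_i-\mu)^2-\bigl(X_i-\mu-\tfrac{h}{\sqrt{n}}\bigr)^2}{2\sigma^2}
  = \frac{h}{\sigma^2}\,\sqrt{n}\,(\bar{X}_n-\mu)\;-\;\frac{h^2}{2\sigma^2},
```

since each summand equals \frac{2 (h/\sqrt{n})(X_i - \mu) - h^2/n}{2\sigma^2}. Thus \Delta_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma^2 \sim N(0, \sigma^{-2}) = N(0, I(\mu)) exactly, with a remainder that is identically zero rather than merely o_p(1).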

Consistency

In statistics, an estimator \hat{\theta}_n of a parameter \theta is said to be consistent if \hat{\theta}_n \xrightarrow{p} \theta as n \to \infty, where \xrightarrow{p} denotes convergence in probability; this means that for every \epsilon > 0, P(|\hat{\theta}_n - \theta| > \epsilon) \to 0. Consistency ensures that the estimator converges to the true parameter value in probability as the sample size grows, providing a foundational property for reliable inference. Two common strengthenings are mean-squared consistency and uniform consistency. An estimator is mean-squared consistent if the mean squared error E[(\hat{\theta}_n - \theta)^2] \to 0 as n \to \infty, which implies consistency in probability by Chebyshev's inequality. Uniform consistency strengthens pointwise consistency by requiring supremum convergence over the parameter space: for every \epsilon > 0, \lim_{n \to \infty} \sup_{\theta} P(|\hat{\theta}_n - \theta| > \epsilon) = 0, ensuring the estimator performs well across the entire space rather than at a single point. Proofs of consistency often rely on probabilistic tools like Chebyshev's inequality, which bounds the probability of deviation for estimators with controlled variance; for instance, if an unbiased estimator has variance tending to zero, Chebyshev's inequality directly yields consistency. Slutsky-type arguments extend this to functions of consistent estimators: if \hat{\theta}_n \xrightarrow{p} \theta and g is continuous at \theta, then g(\hat{\theta}_n) \xrightarrow{p} g(\theta). Classic examples illustrate these concepts. The sample mean \bar{X}_n is consistent for the population mean \mu under finite variance, as established by the weak law of large numbers, which follows from Chebyshev's inequality applied to the variance \sigma^2/n \to 0. Similarly, the maximum likelihood estimator (MLE) is consistent under identifiability of the parameter (unique determination of the distribution by the parameter) and regularity conditions on the likelihood (such as continuity and compactness of the parameter space), as proven in Wald's consistency theorem.
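A small simulation (a sketch assuming NumPy; \epsilon = 0.1 and standard normal data are illustrative choices) makes the definition concrete for the sample mean: P(|\bar{X}_n - \mu| > \epsilon) shrinks toward zero as n grows, consistent with the Chebyshev bound \sigma^2/(n\epsilon^2).

```python
# Sketch (assumes numpy): consistency of the sample mean, shown by the
# shrinking deviation probability P(|xbar - mu| > eps) as n grows.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 5_000
probs = []
for n in (50, 200, 800):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    probs.append((np.abs(xbar - mu) > eps).mean())
    # for comparison, the Chebyshev bound is sigma**2 / (n * eps**2)
print([f"{p:.4f}" for p in probs])
```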

Asymptotic Efficiency

In asymptotic statistics, an estimator \hat{\theta}_n of a parameter \theta is said to be asymptotically efficient if it achieves the Cramér-Rao lower bound in the limit as the sample size n tends to infinity, meaning \sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I(\theta)^{-1}), where I(\theta) denotes the Fisher information matrix. This property implies that the estimator converges at the optimal rate of O_p(1/\sqrt{n}) and attains the minimal possible asymptotic variance among all regular estimators. The asymptotic Cramér-Rao bound arises in the framework of local asymptotic normality (LAN), a condition under which the log-likelihood ratio behaves approximately like that of a normal experiment, allowing efficient estimators to saturate the bound. LAN, introduced by Le Cam, ensures that the information bound is locally achievable, providing a foundation for efficiency in multiparameter and non-i.i.d. settings. Under standard regularity conditions, such as differentiability of the log-likelihood and positive definiteness of the Fisher information, the maximum likelihood estimator (MLE) is asymptotically efficient, as its asymptotic variance matches the inverse Fisher information. In contrast, the method of moments is generally asymptotically inefficient, exhibiting a larger asymptotic variance except in special cases where it coincides with the MLE. Super-efficiency occurs in rare scenarios where an estimator outperforms the Cramér-Rao bound at specific parameter values, as seen with the James-Stein estimator in high-dimensional means problems, which dominates the MLE in total risk but fails to achieve pointwise asymptotic efficiency at every parameter value. Asymptotic efficiency presupposes consistency, as inconsistent estimators cannot attain the required convergence rate.
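Efficiency comparisons can be made concrete by simulation (a sketch assuming NumPy; Laplace data with unit scale is an illustrative model). For Laplace(0, b) observations with known scale b, the MLE of the location is the sample median, with asymptotic variance 1/I = b^2 on the \sqrt{n} scale, while the method-of-moments estimator, the sample mean, has asymptotic variance \operatorname{Var}(X_1) = 2b^2, so it is only half as efficient.

```python
# Sketch (assumes numpy): MLE (median) versus method of moments (mean) for
# the Laplace location; scaled variances should approach b^2 = 1 and 2b^2 = 2.
import numpy as np

rng = np.random.default_rng(5)
n, reps, b = 1_000, 8_000, 1.0
x = rng.laplace(0.0, b, size=(reps, n))
var_mle = n * np.median(x, axis=1).var()   # MLE of location = sample median
var_mom = n * x.mean(axis=1).var()         # method of moments = sample mean
print(f"n*Var(median) ~ {var_mle:.3f}, n*Var(mean) ~ {var_mom:.3f}")
```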

Bahadur Efficiency

Bahadur efficiency provides a criterion for comparing the asymptotic performance of tests by examining the exponential rate at which the attained significance level (p-value) diminishes under a fixed alternative. For tests of the null hypothesis H_0: \theta = \theta_0 against the one-sided alternative H_1: \theta > \theta_0, the Bahadur exact slope of a test statistic is defined as c(\theta) = -2 \lim_{n \to \infty} n^{-1} \log L_n(\theta), where L_n(\theta) denotes the attained level under the parameter \theta > \theta_0. Larger values of the slope indicate superior efficiency, as they correspond to a faster decay of L_n(\theta) to zero. Under conditions of local asymptotic normality (LAN), the likelihood ratio test attains the maximal Bahadur slope, which equals twice the Kullback-Leibler divergence between the distribution under the alternative and that under the null. This optimality highlights the likelihood ratio test's role in achieving the best possible error exponent among tests of the same level. A representative example is the comparison between the z-test and the t-test for the mean of a normal population. When the population variance is known, the z-test yields a higher Bahadur slope than the t-test, which relies on a sample variance estimate; however, the relative efficiency of the t-test approaches 1 as the alternative approaches the null. In goodness-of-fit testing for contingency tables, the likelihood ratio chi-squared statistic demonstrates greater Bahadur efficiency than Pearson's chi-squared statistic for certain alternatives, reflecting its closer alignment with the Kullback-Leibler divergence. The concept of Bahadur efficiency was introduced by R. R. Bahadur in 1960, and it has also been applied to non-regular statistical problems such as change-point detection, where conventional asymptotic assumptions fail.
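A simulation sketch (assuming NumPy and SciPy, with illustrative parameters) estimates the slope of the one-sided z-test for H_0: \theta = 0 with \sigma = 1 at the fixed alternative \theta = 0.5. For normal location, the Kullback-Leibler divergence between N(\theta, 1) and N(0, 1) is \theta^2/2, so the slope should approach \theta^2 = 0.25.

```python
# Sketch (assumes numpy/scipy): estimating -2 n^{-1} log(p-value) of the
# one-sided z-test under theta = 0.5; the limit is theta^2 = 0.25.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
theta, reps = 0.5, 2_000
slopes = []
for n in (500, 2_000):
    xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    logp = norm.logsf(np.sqrt(n) * xbar)   # log p-value; logsf avoids underflow
    slopes.append((-2.0 * logp / n).mean())
print([f"{s:.4f}" for s in slopes])
```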