Anderson–Darling test
The Anderson–Darling test is a statistical goodness-of-fit procedure designed to assess whether an observed sample of data is drawn from a specified probability distribution, such as the normal distribution, by comparing the empirical distribution function of the sample to the theoretical cumulative distribution function (CDF) of the hypothesized distribution.[1][2] Introduced by Theodore W. Anderson and Donald A. Darling in their 1952 paper on asymptotic theory for goodness-of-fit criteria based on stochastic processes, the test builds upon earlier methods like the Cramér–von Mises statistic but incorporates a weighting function that emphasizes deviations in the tails of the distribution, making it more powerful for detecting departures from the null hypothesis in those regions.[1][3] The test statistic, denoted A^2, is computed as

A^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln F(y_i) + \ln \left(1 - F(y_{n+1-i})\right) \right],
where n is the sample size, y_1 \leq y_2 \leq \cdots \leq y_n are the ordered observations, and F is the CDF of the hypothesized distribution; under the null hypothesis, A^2 follows a known asymptotic distribution, with critical values often approximated or tabulated for finite samples.[2][1] Subsequent modifications by M. A. Stephens in the 1970s and 1980s provided practical tables of critical values and adjustments for specific distributions (e.g., normal, exponential, Weibull), enhancing its applicability for one-sample, two-sample, and k-sample tests in fields like quality control, reliability engineering, and environmental statistics.[2] Compared to the Kolmogorov–Smirnov test, which treats all deviations equally, the Anderson–Darling test's tail-weighting scheme offers greater sensitivity to outliers and non-normality in the extremes, though it requires distribution-specific computations and can be computationally intensive for large datasets.[2][3]
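In practice the test is available in standard statistical software. For example, SciPy's `scipy.stats.anderson` implements the one-sample test with distribution parameters estimated from the data; the sketch below uses an illustrative normal sample (the sample size and seed are arbitrary choices for this example):

```python
import numpy as np
from scipy import stats

# Illustrative sample: 200 draws from a normal distribution (seed arbitrary).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

# One-sample Anderson-Darling test for normality; parameters are
# estimated from the data, so Stephens-adjusted critical values apply.
result = stats.anderson(sample, dist="norm")
print(result.statistic)           # the A^2 statistic
print(result.critical_values)     # critical values for the levels below
print(result.significance_level)  # significance levels in percent: 15, 10, 5, 2.5, 1
```

The null hypothesis of normality is rejected at a given level when `result.statistic` exceeds the corresponding entry of `result.critical_values`.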
Background and History
Origins and Development
The Anderson–Darling test is a statistical goodness-of-fit procedure designed to assess whether a sample of independent and identically distributed observations arises from a specified continuous probability distribution.[1] It evaluates the alignment between the empirical distribution of the sample and the hypothesized distribution, placing particular emphasis on detecting discrepancies in both the tails and the central regions of the distribution.[2] The test was proposed by Theodore W. Anderson and Donald A. Darling in 1952, marking a significant advancement in the asymptotic theory of goodness-of-fit criteria derived from stochastic processes.[1] Their work built upon earlier foundational tests, introducing a framework that incorporates flexible weight functions to enhance the test's discriminatory power.[1]

The primary motivation for developing the Anderson–Darling test stemmed from the limitations of prior methods, such as the chi-squared test, Kolmogorov–Smirnov test, and Cramér–von Mises test, which often treated all deviations equally without emphasizing regions where mismatches are most critical.[1] By employing a weight function that assigns greater importance to the tails, specifically ψ(t) = 1/[t(1-t)] for t in [0,1], the test addresses these shortcomings, providing improved sensitivity to deviations in extreme values while also capturing central differences more effectively than unweighted alternatives.[1][2]

In its early stages, the Anderson–Darling test found primary application within theoretical statistics, focusing on univariate continuous distributions to validate distributional assumptions in probabilistic modeling.[1] This theoretical emphasis laid the groundwork for subsequent extensions and practical implementations in fields requiring robust hypothesis testing.[2]

Key Publications
The foundational publication on the Anderson–Darling test appeared in 1952, when T. W. Anderson and D. A. Darling introduced a general class of goodness-of-fit criteria based on stochastic processes in their paper "Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes," published in the Annals of Mathematical Statistics. This work derived the asymptotic theory for statistics measuring discrepancies between empirical and hypothesized cumulative distribution functions, with the Anderson–Darling statistic emerging as a weighted integral that emphasizes tail behavior; it built directly on the Cramér–von Mises criterion proposed by Harald Cramér in 1928 in Skandinavisk Aktuarietidskrift and further developed by Richard von Mises, as well as Karl Pearson's chi-squared goodness-of-fit test from 1900 in Philosophical Magazine.[1][4]

In a subsequent paper, "A Test of Goodness of Fit," published in 1954 in the Journal of the American Statistical Association, Anderson and Darling elaborated on practical implementation, providing asymptotic distributions under the null hypothesis and numerical tables of critical values, particularly for testing normality. This follow-up addressed computational aspects and specific applications, solidifying the test's utility for empirical distribution function-based inference.[5] These two publications laid the groundwork for the Anderson–Darling test's widespread adoption in statistical practice, establishing its sensitivity to deviations in distribution tails compared to earlier methods like the Cramér–von Mises and chi-squared tests.

Single-Sample Goodness-of-Fit Test
Test Statistic Definition
The Anderson–Darling test is a goodness-of-fit procedure applied to a single sample of size n drawn from an independent and identically distributed (i.i.d.) continuous distribution with cumulative distribution function (CDF) F.[1] The test assesses whether the sample conforms to the specified distribution F, which may be fully known (completely specified, as in the non-parametric case) or estimated from the data (parametric case).[1] The test statistic A_n^2, often denoted simply as A^2, is defined for ordered observations x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} as

A_n^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln F(x_{(i)}) + \ln \left(1 - F(x_{(n+1-i)})\right) \right].[2]

This expression provides an exact computational form for the statistic under the assumption of a continuous F.[2] In terms of the empirical CDF F_n(x), the Anderson–Darling statistic can be expressed as the integral

A_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2 \, w(F(x)) \, dF(x),

where the weighting function is w(u) = 1 / [u(1 - u)] for u \in (0,1).[1] This form highlights its relation to the Cramér–von Mises statistic, which uses a uniform weight w(u) = 1; the Anderson–Darling weighting gives greater emphasis to discrepancies in the tails of the distribution (near u = 0 and u = 1), making it more sensitive to deviations there.[1]

Computation and Interpretation
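The exact computational form of the statistic translates directly into code. The following is a minimal sketch for a fully specified continuous F (the function name `anderson_darling_a2` is our own, not a library API):

```python
import numpy as np

def anderson_darling_a2(x, cdf):
    """A_n^2 for a fully specified continuous CDF (illustrative helper)."""
    x = np.sort(np.asarray(x, dtype=float))  # order the observations
    n = x.size
    z = cdf(x)  # z_i = F(x_(i)); must lie strictly inside (0, 1)
    i = np.arange(1, n + 1)
    # Exact computational form; z[::-1] supplies F(x_(n+1-i)).
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))

# Example: testing data against the Uniform[0, 1] CDF, F(t) = t.
a2 = anderson_darling_a2([0.2, 0.9, 0.4, 0.7], lambda t: t)
```

Note that `np.mean` implements the 1/n factor, and `np.log1p(-z)` computes ln(1 - z) with better numerical behavior for z near 1.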
To compute the Anderson–Darling test statistic A_n^2 for a single sample of size n, begin by sorting the observations in ascending order to obtain Y_1 \leq Y_2 \leq \cdots \leq Y_n. Next, evaluate the cumulative distribution function (CDF) F of the hypothesized distribution at each ordered observation, yielding z_i = F(Y_i) for i = 1, \dots, n. The statistic is then calculated as

A_n^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln(z_i) + \ln(1 - z_{n+1-i}) \right],

where the sum involves weighted logarithmic terms that emphasize discrepancies in the tails of the distribution.[2][6] This computation assumes continuous distributions without ties. For data with ties or from discrete distributions, the standard formula can lead to biased results due to the non-uniqueness of ranks; adaptations are necessary, such as applying a continuity correction by evaluating the CDF at adjusted points (e.g., F(Y_i + c), where c is a small constant like 0.5/n) or using randomized tie-breaking procedures to approximate the continuous case. Scholz and Stephens (1987) provide detailed modifications for discrete settings, including versions that account for ties by averaging contributions across tied observations.[7]

In practice, large values of A_n^2 indicate a poor fit to the hypothesized distribution F, as the statistic measures the weighted deviation between the empirical and theoretical CDFs, with greater sensitivity in the tails. The null hypothesis H_0 (that the data are drawn from F) is rejected at significance level \alpha if A_n^2 exceeds the corresponding critical value from the null distribution.[2]

Consider a hypothetical example with a small sample of size n = 4 purportedly from a uniform distribution on [0,1]: 0.2, 0.4, 0.7, 0.9. The sorted values are Y_1 = 0.2, Y_2 = 0.4, Y_3 = 0.7, Y_4 = 0.9, and the CDF evaluations are z_1 = 0.2, z_2 = 0.4, z_3 = 0.7, z_4 = 0.9.

- For i=1: (2\cdot1-1) [\ln(0.2) + \ln(1-0.9)] = 1 \cdot [-1.60944 + (-2.30259)] = -3.91203
- For i=2: (2\cdot2-1) [\ln(0.4) + \ln(1-0.7)] = 3 \cdot [-0.91629 + (-1.20397)] = -6.36079
- For i=3: (2\cdot3-1) [\ln(0.7) + \ln(1-0.4)] = 5 \cdot [-0.35667 + (-0.51083)] = -4.33750
- For i=4: (2\cdot4-1) [\ln(0.9) + \ln(1-0.2)] = 7 \cdot [-0.10536 + (-0.22314)] = -2.29953

Summing these terms gives -3.91203 - 6.36079 - 4.33750 - 2.29953 = -16.90985, so

A_4^2 = -4 - \tfrac{1}{4}(-16.90985) \approx 0.2275.

This small value is far below the asymptotic 5% critical value of about 2.492 for a fully specified distribution, so the null hypothesis of uniformity is not rejected.
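The arithmetic in the worked example above can be reproduced with a few lines of NumPy (a verification sketch, computing each summand and the resulting statistic):

```python
import numpy as np

# Ordered sample; hypothesized Uniform[0, 1], so z_i = F(Y_i) = Y_i.
z = np.array([0.2, 0.4, 0.7, 0.9])
n = z.size
i = np.arange(1, n + 1)

# Summands (2i - 1)[ln(z_i) + ln(1 - z_{n+1-i})]; z[::-1] reverses the order.
terms = (2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))
print(terms)         # approx [-3.91202, -6.36079, -4.33750, -2.29953]

a2 = -n - terms.sum() / n
print(round(a2, 4))  # 0.2275
```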