
Anderson–Darling test

The Anderson–Darling test is a statistical goodness-of-fit test designed to assess whether an observed sample of data is drawn from a specified probability distribution, such as the normal distribution, by comparing the empirical distribution function of the sample to the theoretical cumulative distribution function (CDF) of the hypothesized distribution. Introduced by Theodore W. Anderson and Donald A. Darling in their 1952 paper on asymptotic theory for goodness-of-fit criteria based on stochastic processes, the test builds upon earlier methods like the Cramér–von Mises statistic but incorporates a weighting function that emphasizes deviations in the tails of the distribution, making it more powerful for detecting departures from the hypothesized distribution in those regions. The test statistic, denoted as A^2, is computed as
A^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln F(y_i) + \ln \left(1 - F(y_{n+1-i})\right) \right],
where n is the sample size, y_1 \leq y_2 \leq \cdots \leq y_n are the ordered observations, and F is the CDF of the hypothesized distribution; under the null hypothesis, A^2 follows an asymptotic distribution, with critical values often approximated or tabulated for finite samples.
Subsequent modifications by M. A. Stephens in the 1970s and 1980s provided practical tables of critical values and adjustments for specific distributions (e.g., normal, exponential, Weibull), enhancing its applicability for one-sample, two-sample, and k-sample tests in fields such as reliability engineering and environmental statistics. Compared to the Kolmogorov–Smirnov test, which treats all deviations equally, the Anderson–Darling test's tail-weighting scheme offers greater sensitivity to outliers and non-normality in the extremes, though it requires distribution-specific computations and can be computationally intensive for large datasets.

Background and History

Origins and Development

The Anderson–Darling test is a statistical goodness-of-fit test designed to assess whether a sample of independent and identically distributed observations arises from a specified continuous probability distribution. It evaluates the alignment between the empirical distribution of the sample and the hypothesized distribution, placing particular emphasis on detecting discrepancies in both the tails and the central regions of the distribution. The test was proposed by Theodore W. Anderson and Donald A. Darling in 1952, marking a significant advancement in the asymptotic theory of goodness-of-fit criteria derived from stochastic processes. Their work built upon earlier foundational tests, introducing a framework that incorporates flexible weight functions to enhance the test's discriminatory power. The primary motivation for developing the Anderson–Darling test stemmed from the limitations of prior methods, such as the chi-squared, Kolmogorov–Smirnov, and Cramér–von Mises tests, which often treated all deviations equally without emphasizing regions where mismatches are most critical. By employing a weight function that assigns greater importance to the tails, specifically of the form ψ(t) = 1/[t(1-t)] for t in [0,1], the test addresses these shortcomings, providing improved sensitivity to deviations in extreme values while also capturing central differences more effectively than unweighted alternatives. In its early stages, the Anderson–Darling test found primary application within theoretical statistics, focusing on univariate continuous distributions to validate distributional assumptions in probabilistic modeling. This theoretical emphasis laid the groundwork for subsequent extensions and practical implementations in fields requiring robust hypothesis testing.

Key Publications

The foundational publication on the Anderson–Darling test appeared in 1952, when T. W. Anderson and D. A. Darling introduced a general class of goodness-of-fit criteria based on stochastic processes in their paper "Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes," published in the Annals of Mathematical Statistics. This work derived the asymptotic theory for statistics measuring discrepancies between empirical and hypothesized cumulative distribution functions, with the Anderson–Darling statistic emerging as a weighted integral that emphasizes tail behavior; it built directly on the Cramér–von Mises criterion proposed by Harald Cramér in 1928 in Skandinavisk Aktuarietidskrift and further developed by Richard von Mises, as well as Karl Pearson's chi-squared goodness-of-fit test from 1900 in Philosophical Magazine. In a subsequent paper, "A Test of Goodness of Fit," published in 1954 in the Journal of the American Statistical Association, Anderson and Darling elaborated on practical implementation, providing asymptotic distributions under the null hypothesis and numerical tables of critical values, particularly for fully specified hypothesized distributions. This follow-up addressed computational aspects and specific applications, solidifying the test's utility for empirical distribution function-based goodness-of-fit testing. These two publications laid the groundwork for the Anderson–Darling test's widespread adoption in statistical practice, establishing its sensitivity to deviations in distribution tails compared to earlier methods like the Cramér–von Mises and chi-squared tests.

Single-Sample Goodness-of-Fit Test

Test Statistic Definition

The Anderson–Darling test is a goodness-of-fit procedure applied to a single sample of n independent and identically distributed (i.i.d.) observations from a continuous distribution with cumulative distribution function (CDF) F. The test assesses whether the sample conforms to the specified distribution F, which may be fully known (completely specified, as in the non-parametric case) or estimated from the data (parametric case). The test statistic A_n^2, often denoted simply as A^2, is defined for ordered observations x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} as A_n^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln F(x_{(i)}) + \ln \left(1 - F(x_{(n+1-i)})\right) \right]. This expression provides an exact computational form for the statistic under the assumption of a continuous F. In terms of the empirical CDF F_n(x), the Anderson–Darling statistic can be expressed as the integral A_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2 w(F(x)) \, dF(x), where the weighting function is w(u) = 1 / [u(1 - u)] for u \in (0,1). This form highlights its relation to the Cramér–von Mises statistic, which uses the weight w(u) = 1; the Anderson–Darling weighting gives greater emphasis to discrepancies in the tails of the distribution (near u = 0 and u = 1), making it more sensitive to deviations there.
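As a concrete illustration of the computational form, the following minimal Python sketch (not a library routine; the function name and vectorized-CDF argument are expository choices made here) evaluates A_n^2 for a fully specified continuous F:

```python
import numpy as np

def anderson_darling_A2(x, cdf):
    """A_n^2 for a fully specified continuous CDF, via the
    order-statistic formula (assumes no ties and 0 < F(x) < 1)."""
    x = np.sort(np.asarray(x, dtype=float))   # x_(1) <= ... <= x_(n)
    n = x.size
    z = cdf(x)                                 # z_i = F(x_(i))
    i = np.arange(1, n + 1)
    # -n - (1/n) * sum_i (2i - 1) [ln z_i + ln(1 - z_{n+1-i})]
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
```

For instance, anderson_darling_A2([0.2, 0.4, 0.7, 0.9], lambda t: t) reproduces the uniform-distribution example worked out in the next subsection.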

Computation and Interpretation

To compute the Anderson–Darling test statistic A_n^2 for a single sample of size n, begin by sorting the observations in ascending order to obtain Y_1 \leq Y_2 \leq \cdots \leq Y_n. Next, evaluate the cumulative distribution function (CDF) F of the hypothesized distribution at each ordered observation, yielding z_i = F(Y_i) for i = 1, \dots, n. The statistic is then calculated as A_n^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i - 1) \left[ \ln(z_i) + \ln(1 - z_{n+1-i}) \right], where the sum involves weighted logarithmic terms that emphasize discrepancies in the tails of the distribution. This computation assumes continuous distributions without ties. For data with ties or from discrete distributions, the standard formula can lead to biased results due to the non-uniqueness of ranks; adaptations are necessary, such as applying a continuity correction by evaluating the CDF at adjusted points (e.g., F(Y_i + c) where c is a small constant like 0.5/n) or using randomized tie-breaking procedures to approximate the continuous case. Scholz and Stephens (1987) provide detailed modifications for discrete settings, including versions that account for ties by averaging contributions across tied observations. In practice, large values of A_n^2 indicate a poor fit to the hypothesized F, as the statistic measures the weighted squared deviation between the empirical and theoretical CDFs, with greater weight in the tails. The null hypothesis H_0 (that the data are drawn from F) is rejected at significance level \alpha if A_n^2 exceeds the corresponding critical value from its null distribution. Consider a hypothetical example with a small sample of size n=4 purportedly from a uniform distribution on [0,1]: 0.2, 0.4, 0.7, 0.9. The sorted values are Y_1=0.2, Y_2=0.4, Y_3=0.7, Y_4=0.9, and the CDF evaluations are z_1=0.2, z_2=0.4, z_3=0.7, z_4=0.9.
  • For i=1: (2\cdot1-1) [\ln(0.2) + \ln(1-0.9)] = 1 \cdot [-1.60944 + (-2.30259)] = -3.91203
  • For i=2: (2\cdot2-1) [\ln(0.4) + \ln(1-0.7)] = 3 \cdot [-0.91629 + (-1.20397)] = -6.36079
  • For i=3: (2\cdot3-1) [\ln(0.7) + \ln(1-0.4)] = 5 \cdot [-0.35667 + (-0.51083)] = -4.33750
  • For i=4: (2\cdot4-1) [\ln(0.9) + \ln(1-0.2)] = 7 \cdot [-0.10536 + (-0.22314)] = -2.29953
The sum is -3.91203 - 6.36079 - 4.33750 - 2.29953 = -16.90985. Thus, A_4^2 = -4 - \frac{1}{4} (-16.90985) = -4 + 4.22746 = 0.22746. This small value suggests a reasonable fit to the uniform distribution, consistent with the sample's even spacing.
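The term-by-term arithmetic can be checked in a few lines of NumPy (a direct transcription of the worked example above, assuming the uniform CDF F(t) = t):

```python
import numpy as np

z = np.array([0.2, 0.4, 0.7, 0.9])   # F(Y_i) for the sorted sample
n = z.size
i = np.arange(1, n + 1)
terms = (2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))
print(terms)                  # [-3.91202..., -6.36079..., -4.33750..., -2.29953...]
print(-n - terms.sum() / n)   # A_4^2 = 0.22746...
```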

Null Distribution and Inference

Asymptotic Properties

Under the null hypothesis, as the sample size n approaches infinity, the Anderson–Darling statistic A_n^2 converges in distribution to \int_0^1 \frac{B(t)^2}{t(1-t)} \, dt, where B(t) denotes a standard Brownian bridge process on [0, 1]. This limiting distribution lacks a simple closed form but admits an infinite series representation as a weighted sum of chi-squared random variables with one degree of freedom: \sum_{j=1}^\infty \frac{Z_j^2}{j(j+1)}, where the Z_j are independent standard normal random variables. The original tabulations of this distribution, essential for practical inference, were provided by Anderson and Darling for direct computation of critical values and p-values. The derivation of this asymptotic result stems from the functional central limit theorem applied to the empirical process. Specifically, \sqrt{n} (F_n(x) - F_0(x)) converges weakly to the Brownian bridge B(F_0(x)) in the Skorokhod space D[0,1], and the continuous mapping theorem extends this convergence to the weighted integral functional defining A_n^2, with the denominator F_0(x)(1 - F_0(x)) inducing the specific form of the limit. This weighting scheme, which places greater emphasis on discrepancies near the distribution tails (where t(1-t) is small), enhances the test's sensitivity to tail deviations compared to unweighted statistics. The Anderson–Darling test is consistent against all fixed continuous alternatives, such that the probability of rejecting the null hypothesis tends to 1 as n \to \infty whenever the true distribution differs from F_0. This property arises from the divergence of A_n^2 to infinity under any such alternative, a general feature of empirical distribution function-based goodness-of-fit tests. The asymptotic distribution provides a reliable approximation for sample sizes n \geq 20, but biases in small samples can lead to conservative or liberal inference, often requiring adjusted critical values or simulation-based methods for n < 20.
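The series representation suggests a simple consistency check by simulation; the sketch below (illustrative, with arbitrarily chosen sample size, replication count, and truncation level) compares upper quantiles of finite-sample draws of A_n^2 under a fully specified null with draws from the truncated series:

```python
import numpy as np

rng = np.random.default_rng(0)

def a2_uniform(u):
    """A_n^2 when the null CDF has already been applied (z_i = F(x_i))."""
    z = np.sort(u)
    n, i = z.size, np.arange(1, z.size + 1)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))

n, reps, terms = 100, 5000, 200
finite = np.array([a2_uniform(rng.uniform(size=n)) for _ in range(reps)])

# Truncated series sum_j Z_j^2 / (j(j+1)); the omitted tail has mean ~1/201.
j = np.arange(1, terms + 1)
series = (rng.standard_normal((reps, terms)) ** 2 / (j * (j + 1))).sum(axis=1)

# Both 95% quantiles should sit near the tabulated asymptotic
# critical value (approximately 2.49 at the 5% level).
print(np.quantile(finite, 0.95), np.quantile(series, 0.95))
```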

Critical Values and p-Values

The critical values for the Anderson–Darling test statistic in the single-sample case are typically obtained from tabulated values or approximations, particularly for common distributions like the normal. For the normal distribution, Anderson and Darling (1954) provided initial tables of critical values based on asymptotic theory and simulations for various significance levels, such as 0.05 and 0.01, which can be interpolated for other distributions or sample sizes when exact matches are unavailable. For arbitrary cumulative distribution functions F, software implementations often employ Monte Carlo simulation to generate empirical critical values by simulating large numbers of samples (e.g., 10,000 or more) under the null hypothesis and determining the quantiles of the resulting statistic distribution. p-values for the test are computed either through simulation methods or asymptotic approximations to assess the evidence against the null hypothesis. A common simulation approach involves generating replicates from the hypothesized distribution F under the null, computing the Anderson–Darling statistic for each, and estimating the p-value as the proportion of simulated statistics exceeding the observed value; this is particularly useful for non-standard distributions or small sample sizes. For the normal distribution, Stephens (1974) derived asymptotic approximations for p-values, enabling direct computation without simulation for large samples. When the parameters of the hypothesized distribution F are unknown and must be estimated from the sample (e.g., mean and variance for normality), the standard test statistic is adjusted to account for the degrees of freedom lost in estimation, yielding a modified statistic A^*. For the normal distribution with both parameters estimated, Stephens (1974) recommends A^* = A \left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right) for sample size n > 4, after which critical values or p-values are referenced to the asymptotic distribution of A. Similar adjustments exist for other distributions with one or two estimated parameters, often involving multiplicative correction factors derived from simulation studies. In practice, critical values and p-values are most efficiently obtained using statistical software packages that automate these computations. The R package nortest provides the ad.test() function for normality testing, incorporating Stephens' modifications and p-value approximations. Similarly, Python's SciPy library implements the anderson() function in scipy.stats, which returns the test statistic and critical values at fixed significance levels for normal, exponential, logistic, and Gumbel distributions, with other choices of F handled via custom simulation code.
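For instance, with SciPy the built-in routine covers the common parametric cases, while the simulation recipe described above handles a fully specified F; the helper function a2 below is a hypothetical sketch written for this illustration, not part of SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=50)

# Built-in test: statistic plus critical values at fixed significance
# levels (15%, 10%, 5%, 2.5%, 1% for dist='norm'), parameters estimated.
res = stats.anderson(x, dist='norm')
print(res.statistic, res.critical_values, res.significance_level)

# Simulation-based p-value for a fully specified null (standard uniform).
def a2(u):
    z = np.sort(u)
    n, i = z.size, np.arange(1, z.size + 1)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))

obs = a2(rng.uniform(size=40))
null = np.array([a2(rng.uniform(size=40)) for _ in range(10_000)])
print((null >= obs).mean())   # proportion of simulated statistics >= observed
```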

Applications to Specific Distributions

Normality Testing

The Anderson–Darling test for normality evaluates the goodness-of-fit of a sample to a normal distribution by comparing the empirical cumulative distribution function (ECDF) of the data to the cumulative distribution function (CDF) of the normal distribution, with parameters estimated from the sample. To perform the test, compute the sample mean \bar{x} and standard deviation s from the data. Standardize the ordered observations x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} to obtain z-scores z_i = \frac{x_{(i)} - \bar{x}}{s} for i = 1, \dots, n. The test statistic A^2 is then calculated as A^2 = -n - \frac{1}{n} \sum_{i=1}^n (2i-1) \left[ \ln \Phi(z_i) + \ln \left(1 - \Phi(z_{n+1-i})\right) \right], where \Phi denotes the standard normal CDF. This formulation weights discrepancies more heavily in the tails of the distribution, enhancing sensitivity to departures from normality in extreme values. When the mean and variance (k=2 parameters) are estimated from the sample, the null distribution of A^2 differs from the fully specified case, requiring adjusted critical values and p-value approximations. Stephens (1974) provides tables of critical values for A^2 specifically for normality testing with estimated parameters, applicable for sample sizes up to n=1000 and significance levels such as 0.01, 0.025, 0.05, and 0.10; for example, the 5% critical value is approximately 0.779 for n=40. For larger samples or precise p-values, approximations like A^{2*} = A^2 \left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right) can be used to better align with the asymptotic distribution. These adjustments account for the estimation uncertainty, ensuring valid inference under the null hypothesis of normality. The test demonstrates strong power characteristics, particularly outperforming the Kolmogorov–Smirnov test in detecting deviations from normality, such as skewness or heavy tails in the extremes, due to its [F(1-F)]^{-1} weighting scheme that emphasizes tail regions. While Shapiro–Wilk may have higher overall power for moderate sample sizes under central deviations, Anderson–Darling's tail sensitivity makes it preferable for applications where outliers or heavy tails are of concern, as evidenced in simulation studies across various non-normal alternatives. As an illustrative example, consider birth weights (in grams) of 44 newborns from a hospital on December 18, 1997: 3837, 3334, 3554, 3838, 3625, ..., 3103 (full dataset yields sample mean \bar{x} \approx 3276 and s \approx 548). After standardization to z-scores and computation of the statistic, A^2 \approx 1.717. The adjusted A^{2*} \approx 1.748 yields a p-value of approximately 0.0002, which is less than 0.05; thus, the null hypothesis of normality is rejected, indicating the data exhibit non-normal characteristics, likely due to right-skewness in lower weights.
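Putting the steps together, a compact sketch of the normality test with estimated parameters might look as follows; the closing p-value line uses the D'Agostino–Stephens tail approximation (an assumption made here, valid only for roughly A^{2*} \geq 0.6, which is the regime of the birth-weight example):

```python
import numpy as np
from scipy.stats import norm

def ad_normality(x):
    """Anderson-Darling normality test with mean and sd estimated
    from the sample, returning (A^2, adjusted A^2*, approximate p)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = norm.cdf((x - x.mean()) / x.std(ddof=1))   # Phi(z_i)
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # D'Agostino-Stephens upper-tail approximation (for A^2* >= 0.6).
    p = np.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    return a2, a2_star, p
```

Applied to the birth-weight data, this recipe would reproduce the values quoted above (A^2 \approx 1.717, A^{2*} \approx 1.748, p \approx 0.0002).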

Exponential and Other Distributions

The Anderson–Darling test can be applied to assess goodness-of-fit for the exponential distribution by substituting the CDF F(x) = 1 - e^{-\lambda x} (for x \geq 0) into the test statistic, where the parameter \lambda is typically estimated from the sample as \hat{\lambda} = 1/\bar{x}, with \bar{x} denoting the sample mean. This estimation alters the null distribution of the statistic, yielding a modified statistic whose distribution under the null hypothesis differs from the completely specified case. Critical values for this test with estimated parameters have been derived through simulations, as tabulated in Stephens (1979) for sample sizes up to 5,000 and significance levels of 0.01, 0.025, 0.05, 0.10, and 0.25. For shape-scale parametric families such as the Weibull and lognormal distributions, the Anderson–Darling test is adapted by estimating the shape and scale parameters (e.g., via maximum likelihood) and incorporating the corresponding CDF into the statistic computation, which highlights deviations particularly in the tails due to the test's weighting scheme. In the Weibull case, with CDF F(x) = 1 - e^{-(x/\beta)^\alpha} for shape \alpha and scale \beta, the test shows heightened sensitivity to alternatives with heavier tails, such as the lognormal, as demonstrated in power studies where it outperforms the Kolmogorov–Smirnov test against such departures. Similarly, for the lognormal distribution, parameter estimation involves fitting the underlying normal parameters, and the test's tail emphasis makes it effective for detecting skewness or tail mismatches, with critical values provided via simulation-based adjustments in Stephens (1976). Applications to the uniform and gamma distributions illustrate contrasts between fully specified and estimated-parameter scenarios. For the standard uniform distribution on [0,1], which is fully specified with no parameters to estimate, the test uses the identity CDF F(x) = x, and exact critical values are available from asymptotic approximations or tables without adjustments. In contrast, for the gamma distribution with shape k and scale \theta estimated from the data, the test employs the incomplete gamma CDF, and Monte Carlo-derived critical values account for parameter-estimation effects, showing superior power against specific alternatives like the Weibull (with shape >1) compared to the Cramér–von Mises test in targeted simulations. Power comparisons indicate that the Anderson–Darling test generally exhibits higher detection rates for Weibull and gamma alternatives deviating toward lighter tails, as quantified in Monte Carlo evaluations across sample sizes from 20 to 100. When precomputed critical value tables are unavailable for a given distribution or parameter configuration, simulation-based methods, such as generating large numbers of samples under the null hypothesis to empirically derive p-values, are recommended for inference, aligning with standard practices for EDF statistics.
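The simulation-based calibration for the exponential case with estimated \lambda can be sketched as follows (illustrative sample sizes and replication counts; the key points are that \hat{\lambda} = 1/\bar{x} is re-estimated inside every null replicate, and that the estimated-\lambda statistic is scale-free, so unit-rate exponentials suffice under the null):

```python
import numpy as np

rng = np.random.default_rng(2)

def a2_exponential(x):
    """A^2 against Exp(lambda) with lambda estimated as 1/mean(x)."""
    x = np.sort(np.asarray(x, dtype=float))
    n, i = x.size, np.arange(1, x.size + 1)
    z = 1.0 - np.exp(-x / x.mean())          # F(x) = 1 - exp(-x/mean)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))

x = rng.weibull(0.7, size=60)                # heavier-tailed alternative
obs = a2_exponential(x)

# Null calibration: the statistic with estimated lambda is scale
# invariant, so simulating unit-rate exponentials is sufficient.
null = np.array([a2_exponential(rng.exponential(size=60))
                 for _ in range(5000)])
print(obs, (null >= obs).mean())             # statistic and p-value
```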

Multisample and Non-Parametric Tests

k-Sample Test Statistic

The k-sample Anderson–Darling test extends the single-sample goodness-of-fit test to assess the homogeneity of multiple independent samples drawn from continuous distributions, testing the null hypothesis H_0 that all samples originate from the same unspecified continuous distribution function F. Under H_0, the test assumes independent samples with positive sizes n_i for i = 1, \dots, k and no ties in the observations, ensuring the distributions are continuous. The test statistic A_{kN}^2, where N = \sum_{i=1}^k n_i is the total sample size, is defined as A_{kN}^2 = N \int_{B_N} \left[ \sum_{i=1}^k \frac{n_i}{N} \left( F_{n_i}(x) - H_N(x) \right)^2 \right] \frac{1}{H_N(x) \left( 1 - H_N(x) \right)} \, dH_N(x), with B_N = \{ x \in \mathbb{R} : H_N(x) < 1 \}, F_{n_i}(x) denoting the empirical cumulative distribution function (ECDF) of the ith sample, and H_N(x) the pooled ECDF of all k samples. This formulation weights the squared deviations between each group's ECDF and the overall pooled ECDF by the factor 1/[H_N(x)(1 - H_N(x))], which emphasizes discrepancies in the tails of the distributions, analogous to the weighting in the single-sample Anderson–Darling test. For computation with a pooled ordered sample Z_1 < \dots < Z_N (assuming no ties), the integral is evaluated by the discrete form A_{kN}^2 = \frac{1}{N} \sum_{i=1}^k \frac{1}{n_i} \sum_{j=1}^{N-1} \frac{\left( N m_{ij} - j n_i \right)^2}{j (N - j)}, where m_{ij} is the number of observations in the ith sample less than or equal to Z_j. This approach pools all samples, ranks the combined observations, and accumulates the weighted squared differences between each group's counts and their expected values under homogeneity, providing a nonparametric measure sensitive to differences across groups. The statistic was proposed by Scholz and Stephens in 1987 as a generalization of earlier two-sample versions.
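A direct transcription of this computational form into Python might read as follows (a sketch assuming no ties; the function and variable names are choices made here for exposition):

```python
import numpy as np

def ad_ksample(samples):
    """k-sample Anderson-Darling statistic A_kN^2 (no ties assumed),
    via the discrete Scholz-Stephens computational form."""
    samples = [np.sort(np.asarray(s, dtype=float)) for s in samples]
    z = np.sort(np.concatenate(samples))   # pooled ordered sample Z_1..Z_N
    N = z.size
    j = np.arange(1, N)                     # j = 1, ..., N-1
    total = 0.0
    for s in samples:
        n_i = s.size
        # m_ij: number of sample-i observations <= Z_j
        m = np.searchsorted(s, z[:-1], side='right')
        total += ((N * m - j * n_i) ** 2 / (j * (N - j))).sum() / n_i
    return total / N

# Under homogeneity the statistic has mean k - 1 (here, about 2).
rng = np.random.default_rng(3)
print(ad_ksample([rng.normal(size=30) for _ in range(3)]))
```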

Applications and Comparisons

The k-sample Anderson–Darling test finds practical application in survival analysis for comparing survival distributions across multiple groups, particularly when dealing with right-censored data and testing for non-proportional hazards, such as evaluating differences between treatment groups where short-term adverse effects may contrast with long-term benefits. In interlaboratory studies, it is used to assess homogeneity of measurements from different sources, for instance, testing whether paper smoothness data from multiple laboratories originate from the same distribution to determine if samples can be pooled for further analysis. Power studies demonstrate that the k-sample Anderson–Darling test often outperforms the Kruskal–Wallis test in detecting location-scale differences, as evidenced by simulations where it achieved higher power against various alternatives, including scale changes. The test is implemented in statistical software such as R through the kSamples package, where the ad.test function computes the statistic and p-values for k independent samples, accommodating unequal sample sizes via weighted empirical distribution functions. While direct built-in support in SAS is limited, custom implementations are feasible using procedures like PROC IML to replicate the rank-based computation. Despite its strengths, the k-sample Anderson–Darling test is sensitive to outliers due to its emphasis on tail regions in the weighting scheme, potentially leading to inflated statistics in the presence of extreme values. It may also exhibit reduced power for deviations concentrated in the central portion of the distribution, where alternatives like the Kruskal–Wallis test perform better.
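In Python, SciPy provides the same rank-based k-sample test as scipy.stats.anderson_ksamp; note that SciPy reports a standardized version of the statistic and caps the interpolated significance level at the ends of its tabulated range:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, size=40)
b = rng.normal(0.5, 1.0, size=55)    # location shift
c = rng.normal(0.0, 2.0, size=35)    # scale change

res = stats.anderson_ksamp([a, b, c])
print(res.statistic)                 # standardized k-sample statistic
print(res.critical_values)           # at fixed significance levels
print(res.significance_level)        # interpolated, capped p-value
```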

Advantages, Limitations, and Extensions

Comparison to Other Tests

The Anderson–Darling (AD) test exhibits greater power than the Kolmogorov–Smirnov (KS) test, particularly in detecting deviations in the tails of the distribution, due to its weighting function that emphasizes extreme values through the term [F(x)(1 - F(x))]^{-1}, whereas the KS test relies on the supremum distance of the empirical distribution function and is more sensitive to central deviations. Monte Carlo simulations demonstrate that the AD test requires smaller sample sizes to achieve equivalent power levels; for instance, in testing against a standardized deviation of 0.5, the AD test needs only 43 samples for 80% power at α=0.05, compared to 52 for the KS test. While the KS test is simpler in formulation and distribution-free, the AD test's tail sensitivity makes it preferable for applications where extreme discrepancies are critical. In comparison to the Cramér–von Mises (CvM) test, the AD test applies a non-uniform weighting that places heavier emphasis on the tails, enhancing its ability to detect extreme deviations, whereas the CvM test uses uniform weighting across the distribution for a more balanced assessment. Power studies for sample sizes of 15 and 25 show the AD test outperforming the CvM test against alternatives such as the Weibull distribution, with empirical power values for the AD statistic reaching 0.888 versus 0.301 for CvM at α=0.05 in certain cases. This tail-focused design renders the AD test more sensitive overall, though the CvM test may suffice for central deviation detection. For normality testing, the AD test is generally more suitable for larger samples (n ≥ 50), where it provides robust power comparable to leading methods, while the Shapiro–Wilk (SW) test is optimal for small samples (n ≤ 50) due to its correlation-based approach that excels in detecting outliers and skewness. Simulations across sample sizes from 10 to 2000 confirm the SW test as the most powerful overall, followed closely by the AD test for n ≥ 100, with both outperforming the KS test; however, all tests, including AD, exhibit low power (<40%) for very small n ≤ 30. The AD test offers omnibus power, being consistent against a broad range of alternatives without parametric restrictions, which supports its versatility in goodness-of-fit assessments. A key disadvantage is the absence of a closed-form distribution for its statistic, necessitating distribution-specific critical values or approximations, which increases computational demands compared to simpler tests like the KS test. One prominent adaptation of the Anderson–Darling test addresses right-censored data, prevalent in survival analysis and reliability studies, where observations may be incomplete due to censoring mechanisms. In this variant, the statistic is modified by incorporating the Kaplan–Meier estimator for the empirical distribution function and adjusting the weighting scheme to reflect the censoring probabilities, ensuring the test remains sensitive to deviations in the tails while accounting for incomplete observations. This approach, building on foundational work for related statistics, enables goodness-of-fit assessment under random right censoring without assuming the censoring distribution. Multivariate extensions of the Anderson–Darling test focus on assessing multivariate normality, often via methods that reduce dimensionality by projecting data onto univariate directions and aggregating the resulting statistics. One approach computes an Anderson–Darling-type statistic based on order statistics from the multivariate sample, providing an approximate test for the composite hypothesis of multivariate normality with estimated mean vector and covariance matrix.
More recent projection-based variants average univariate Anderson–Darling statistics over random or principal directions, enhancing robustness in moderate dimensions while approximating an omnibus multivariate test. Post-2000 developments include bootstrap-enhanced versions of the Anderson–Darling test to improve accuracy in small samples, where asymptotic approximations may falter. These methods resample the data to estimate the null distribution of the statistic, yielding more reliable p-values and critical values, especially for testing with n < 50. Additionally, integrations with machine learning techniques, such as kernel embeddings or dimension reduction via projections, extend the test to high-dimensional settings by embedding distributions into reproducing kernel Hilbert spaces and applying weighted discrepancy measures akin to Anderson–Darling. Recent adaptations as of 2024 include discrete versions for distributions like the geometric, further broadening applicability.
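Power comparisons of the kind cited in this section are straightforward to approximate by simulation; the sketch below (with arbitrary choices of sample size, alternative, and replication count, so only the qualitative ordering should be read into it) estimates rejection rates of the AD and KS tests for a fully specified standard normal null against a heavy-tailed t alternative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def a2(u):
    z = np.sort(u)
    n, i = z.size, np.arange(1, z.size + 1)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))

n, reps = 40, 2000
# 5% critical value for the fully specified AD null, by simulation.
crit = np.quantile([a2(rng.uniform(size=n)) for _ in range(reps)], 0.95)

ad_rej = ks_rej = 0
for _ in range(reps):
    x = rng.standard_t(3, size=n)            # heavy-tailed alternative
    ad_rej += a2(stats.norm.cdf(x)) > crit   # AD on the uniform scale
    ks_rej += stats.kstest(x, 'norm').pvalue < 0.05
print(ad_rej / reps, ks_rej / reps)          # AD tends to win in the tails
```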

References

  1. Anderson, T. W., and D. A. Darling. "Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes." Annals of Mathematical Statistics, June 1952.
  2. "1.3.5.14. Anderson-Darling Test." NIST/SEMATECH e-Handbook of Statistical Methods, Information Technology Laboratory.
  3. "Approximation of modified Anderson–Darling test statistics for ..." August 30, 2013.
  4. Pearson, Karl. "X. On the criterion that a given ..." University College, London; online publication date July 1, 1900 (PDF via McGill University).
  5. Anderson, T. W., and D. A. Darling. "A Test of Goodness of Fit." Taylor & Francis Online.
  6. "A Test of Goodness of Fit." JSTOR.
  7. Scholz, F. W. "K-Sample Anderson-Darling Tests" (PDF). June 28, 2007.
  8. Anderson, T., and D. Darling. "A Test of Goodness of Fit." Journal of the American Statistical Association, December 1, 1954.
  9. "EDF Statistics for Goodness of Fit and Some Comparisons."
  10. "anderson — SciPy v1.16.2 Manual."
  11. "7.2.1.3. Anderson-Darling and Shapiro-Wilk tests." NIST/SEMATECH e-Handbook of Statistical Methods.
  12. "One Sample Anderson-Darling Test." Real Statistics Using Excel.
  13. "Testing for normality in regression models: mistakes abound (but ...)." April 30, 2025.
  15. "Anderson-Darling Test for Normality." SPC for Excel.
  16. "The Anderson-Darling Statistic." DTIC.
  17. "A review of tests for exponentiality with Monte Carlo comparisons." NIH.
  18. "K-Sample Anderson–Darling Tests." Taylor & Francis Online, March 12, 2012.
  20. "An extension of the Anderson-Darling k-sample test to arbitrary ..." August 6, 2025.
  21. "kSamples: K-Sample Rank Tests and their Combinations" (PDF).
  22. "adtest - Anderson-Darling test." MATLAB documentation, MathWorks.
  23. "The Two-Sample Anderson-Darling Test as an ..." (PDF).
  24. "Determining the Statistical Power of the Kolmogorov-Smirnov and ..." (PDF). December 20, 2016.
  25. "Anderson-Darling and Cramer-Von Mises Based Goodness-of-Fit ..." (PDF).
  26. "Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors ..." (PDF).
  27. "Tests for multivariate normality—a critical review with emphasis on ..." December 1, 2020.
  28. "Bootstrap And Other Tests For Goodness Of Fit" (PDF).
  29. "Two Sample Testing in High Dimension via Maximum Mean ..." (PDF).