
Kolmogorov–Smirnov test

The Kolmogorov–Smirnov test (KS test) is a nonparametric statistical test used to determine whether a sample follows a specified continuous distribution (one-sample KS test) or whether two independent samples are drawn from the same continuous distribution (two-sample KS test). The test evaluates the maximum vertical distance between the empirical cumulative distribution function (ECDF) of the sample(s) and the theoretical cumulative distribution function (CDF), providing a measure of discrepancy that is sensitive to differences anywhere in the distribution. Developed independently by Russian mathematicians Andrey Nikolaevich Kolmogorov and Nikolai Vasilyevich Smirnov, the one-sample version was introduced by Kolmogorov in 1933 as a method to assess the goodness-of-fit of empirical distributions to theoretical ones. Smirnov extended the framework in 1939 to the two-sample case, allowing comparison of distributions from two populations without assuming a specific form.

The test statistic, denoted D, is defined for the one-sample case as D = \sup_x |F_n(x) - F(x)|, where F_n is the ECDF and F is the reference CDF, and for the two-sample case as D = \sup_x |F_{1,n}(x) - F_{2,m}(x)|, with F_{1,n} and F_{2,m} the ECDFs of the two samples. Under the null hypothesis of no difference, the distribution of D does not depend on the underlying continuous distribution, enabling exact p-values via tabulated critical values or asymptotic approximations for large samples.

Key advantages of the KS test include its distribution-free nature under the null hypothesis, its applicability to any continuous distribution without prior assumptions about its form, and its ability to detect deviations in location, scale, or shape across the entire support of the distribution. However, it assumes continuous distributions and can be less powerful against specific alternatives than tests designed for them; it is also sensitive to the presence of ties in the data and requires the reference distribution to be fully specified. Critical values and significance levels have been extensively tabulated, with asymptotic results facilitating computation for sample sizes n > 40.

In practice, the KS test is widely applied in goodness-of-fit testing, such as verifying normality or uniformity in datasets. For instance, it is commonly used to assess whether observed data conform to an expected model before applying parametric methods. It has applications in fields such as astronomy, hydrology, and the biological sciences. Extensions and modifications, including weighted versions and multivariate analogs, have been developed to address limitations in higher dimensions or specific scenarios.

Introduction

Purpose and variants

The Kolmogorov–Smirnov test serves as a nonparametric hypothesis test primarily used to evaluate goodness-of-fit in the one-sample case, where it determines whether a sample's empirical distribution aligns with a specified theoretical distribution, and in the two-sample case, where it assesses whether two independent samples arise from the same underlying distribution. The test quantifies discrepancies between cumulative distribution functions without requiring parametric assumptions about the form of the distributions, making it suitable for broad applications in distributional inference. As a nonparametric procedure, the test imposes minimal assumptions on the data-generating process, relying only on the continuity of the distributions for the validity of its null distribution; it does not presuppose membership in any specific parametric family, such as normality or exponentiality. This distribution-free property under the null enables its use across diverse datasets where parametric forms are unknown or suspect. Key variants include the standard one-sample test, which compares a single empirical distribution to a fully specified reference, and the two-sample test, which compares two empirical distributions directly; a notable adaptation is the Lilliefors test, which modifies the one-sample version to account for estimated parameters, such as the mean and variance in normality testing. Under the null hypothesis, the distributions are identical, while the alternative posits a difference in their cumulative distribution functions. Developed in the 1930s, the test emerged as a foundational tool for distribution-free inference, with the one-sample form introduced by Kolmogorov in 1933 and the two-sample extension by Smirnov in 1939.

Historical development

The Kolmogorov–Smirnov test emerged in the 1930s as a nonparametric method for assessing the fit between empirical and theoretical distributions. In 1933, Andrey Kolmogorov introduced the one-sample variant in his paper "Sulla determinazione empirica di una legge di distribuzione," where he defined the test for continuous distributions and established its limiting distribution, laying the groundwork for modern goodness-of-fit testing. This contribution was contemporaneous with and influenced by the Glivenko–Cantelli theorem, proved by Valery Glivenko in the same year, which demonstrates the almost sure uniform convergence of the empirical distribution function to the true distribution function as the sample size increases. Nikolai V. Smirnov built upon Kolmogorov's foundation with key extensions in the late 1930s and 1940s. In 1939, Smirnov developed the two-sample test in his work "On the Estimation of the Discrepancy Between Empirical Curves of Distribution for Two Independent Samples," enabling comparisons between distributions from two independent samples without assuming a specific form. He further advanced the framework in 1948 by publishing tables for estimating the goodness of fit of empirical distributions, which provided practical computational aids for applying the test in finite samples. These developments formalized the test's variants, shifting it from abstract probability theory toward applied statistics. After the Second World War, the test gained broader acceptance as statistical computing advanced, with its incorporation into software packages by the 1960s enhancing accessibility for researchers. Refinements in the early 1950s, notably by F. J. Massey, included detailed tables of critical values for the goodness-of-fit statistic, improving the test's usability for small to moderate sample sizes. By the 1970s, the Kolmogorov–Smirnov test had become a standard practical tool, widely applied in quality control for verifying process distributions and in astronomy for comparing observational data sets to theoretical models.

One-sample test

Statistic definition

The one-sample Kolmogorov–Smirnov test assesses whether a random sample is drawn from a specified continuous distribution by comparing its empirical cumulative distribution function (ECDF) to the theoretical cumulative distribution function (CDF). Let X_1, \dots, X_n be a random sample from a distribution with hypothesized CDF F. The ECDF for the sample is defined as F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{X_i \leq x\}, where \mathbf{1}\{\cdot\} is the indicator function. The test statistic D_n is the supremum of the absolute difference between the ECDF and the hypothesized CDF, D_n = \sup_x |F_n(x) - F(x)|, which quantifies the maximum vertical distance between the ECDF and the theoretical CDF. This statistic, introduced by Kolmogorov in 1933, measures overall discrepancies in the distribution without assuming any parametric form beyond the specified F. The null hypothesis is that the sample arises from the distribution with CDF F, with the alternative that it does not. To compute D_n, sort the observations to obtain the order statistics X_{(1)} \leq \dots \leq X_{(n)}, then evaluate the differences at the points just before and after each X_{(i)}, as the supremum occurs at these jumps in the ECDF. Specifically, compute D_n^+ = \max_{1 \leq i \leq n} \left( \frac{i}{n} - F(X_{(i)}) \right), \quad D_n^- = \max_{1 \leq i \leq n} \left( F(X_{(i)}) - \frac{i-1}{n} \right), and D_n = \max(D_n^+, D_n^-). For illustration, consider a sample of 1000 observations generated from a standard normal distribution and tested against the standard normal CDF (α = 0.05, critical value ≈ 0.043). A computed value such as D_n \approx 0.024 does not exceed the critical value, so the null hypothesis of normality is not rejected. In contrast, a sample of the same size from a distribution that departs strongly from the null, such as a uniform distribution on [0, 1], yields a much larger statistic (D_n \approx 0.535), leading to rejection.
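
A minimal NumPy sketch of this calculation, mirroring the D_n^+ and D_n^- formulas above (function names and the generated samples are illustrative):

import numpy as np
from scipy.stats import norm

def ks_statistic(sample, cdf):
    x = np.sort(np.asarray(sample, dtype=float))   # order statistics X_(1) <= ... <= X_(n)
    n = x.size
    f = cdf(x)                                      # F evaluated at the order statistics
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - f)                      # D_n^+
    d_minus = np.max(f - (i - 1) / n)               # D_n^-
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
crit = 1.36 / np.sqrt(1000)                         # asymptotic 5% critical value, about 0.043
print(ks_statistic(rng.standard_normal(1000), norm.cdf), crit)   # small D_n: do not reject
print(ks_statistic(rng.uniform(0, 1, 1000), norm.cdf), crit)     # D_n near 0.5: reject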

Asymptotic distribution

Under the null hypothesis that a sample of size n is drawn from the continuous distribution with CDF F, the one-sample Kolmogorov–Smirnov statistic D_n satisfies \sqrt{n} \, D_n \xrightarrow{d} \sup_{t \in [0,1]} |B(t)|, where B is a standard Brownian bridge on [0,1]. This convergence holds as n \to \infty. The cumulative distribution function of this limiting random variable is given by the Kolmogorov distribution: K(\lambda) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 \lambda^2}, \quad \lambda \geq 0. This limit is independent of the underlying CDF F. For inference at level \alpha = 0.05, the asymptotic critical value of \sqrt{n} \, D_n is approximately 1.36; thus, the critical value for D_n itself is about 1.36 / \sqrt{n}. Critical values for finite samples can be obtained from tables. In finite samples, particularly when n < 40, the asymptotic approximation may be inaccurate, and exact p-values are instead computed from the exact null distribution via enumeration or via Monte Carlo simulation with at least 10,000 replicates. The proof relies on Donsker's invariance principle: the empirical process \sqrt{n} (F_n - F) converges in distribution to a Brownian bridge B (after the time change t = F(x)), so the supremum norm of the empirical process converges to that of |B|.
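
The limiting series can be summed directly to approximate p-values; a minimal sketch, truncating the series (names are illustrative):

import numpy as np

def kolmogorov_sf(lam, terms=100):
    # P(sqrt(n) * D_n > lam) ~ 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2)
    if lam <= 0:
        return 1.0
    k = np.arange(1, terms + 1)
    s = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(min(1.0, max(0.0, s)))

def ks_asymptotic_pvalue(d_n, n):
    return kolmogorov_sf(np.sqrt(n) * d_n)

print(ks_asymptotic_pvalue(0.024, 1000))   # well above 0.05: consistent with not rejecting
print(1.36 / np.sqrt(1000))                # the asymptotic 5% critical value for D_n itself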

Parameter estimation effects

When the parameters of the hypothesized distribution, such as the mean μ and standard deviation σ of a normal distribution, are estimated from the sample data, the null distribution of the Kolmogorov–Smirnov statistic D_n shifts. Specifically, the test statistic D_n^* = \sup_x |F_n(x) - F(x; \hat{\theta})|, where \hat{\theta} denotes the estimated parameters, tends to be smaller under the null hypothesis because the fitted distribution is adjusted to better match the empirical distribution function F_n. If the standard critical values assuming known parameters are used, the test becomes conservative, resulting in Type I error rates lower than the nominal level.

The Lilliefors test addresses this issue for testing normality with estimated mean and variance. It modifies the one-sample KS test by employing critical values obtained from Monte Carlo simulations of the null distribution under parameter estimation, rather than relying on the standard Kolmogorov tables. These simulated critical values are smaller than the standard ones to account for the bias introduced by estimation, ensuring the test maintains the desired significance level; for instance, for a sample size of 20 and α = 0.05, the Lilliefors critical value is approximately 0.190, compared with 0.294 for known parameters. Lilliefors (1967) provided tables of critical values for sample sizes up to 300 at common significance levels.

For more general cases, particularly location-scale families, the test statistic is computed using the estimated parameters, but appropriate critical values must be derived via Monte Carlo simulations tailored to the specific distribution and number of estimated parameters. This approach simulates samples from the fitted distribution and computes the empirical distribution of D_n^* to obtain p-values or thresholds that correct for the estimation effect. The Pelz–Good algorithm offers an efficient computational method for approximating the cumulative distribution function of the standard KS statistic (with known parameters), which can be integrated into parametric bootstrap procedures for p-value estimation when parameters are unknown. By generating simulations from the estimated distribution and applying the algorithm to compute tail probabilities, it accounts for the reduced degrees of freedom due to parameter estimation, enabling accurate inference without exhaustive full simulations. Pelz and Good (1976) developed this series expansion-based approximation for the lower tail areas, improving efficiency for moderate to large sample sizes.

As a practical guideline, the asymptotic Kolmogorov distribution provides a reasonable approximation for the test with estimated parameters when the sample size n > 50, but for smaller n, simulated or tabulated critical values are essential to avoid excessive conservatism. For example, in testing an exponential distribution with estimated rate parameter \hat{\lambda} = 1 / \bar{X}, the critical value for D_n at α = 0.05 and n = 20 is approximately 0.22–0.25, a decrease of about 10–20% compared with the value of 0.294 for known λ, reflecting the need for adjustment to achieve nominal Type I error control.
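
A sketch of the Monte Carlo calibration for the normal case with estimated mean and standard deviation, in the spirit of the Lilliefors test (function names and replication counts are illustrative):

import numpy as np
from scipy.stats import norm

def lilliefors_stat(x):
    # KS statistic against a normal CDF whose mean and sd are estimated from the data.
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    f = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    i = np.arange(1, n + 1)
    return max(np.max(i / n - f), np.max(f - (i - 1) / n))

def simulated_critical_value(n, alpha=0.05, reps=20000, seed=0):
    # Monte Carlo critical value of the statistic when mu and sigma are estimated.
    rng = np.random.default_rng(seed)
    stats = np.array([lilliefors_stat(rng.standard_normal(n)) for _ in range(reps)])
    return np.quantile(stats, 1 - alpha)

# For n = 20 this lands near the tabulated Lilliefors value of roughly 0.19,
# well below the 0.294 critical value that assumes known parameters.
print(simulated_critical_value(20))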

Discrete case adjustments

The asymptotic theory underlying the Kolmogorov–Smirnov (KS) test assumes a continuous null cumulative distribution function (CDF) F, under which the scaled empirical process converges to a Brownian bridge on [0,1], enabling the use of Kolmogorov's limiting distribution for p-values. For discrete null distributions, however, ties in the data cause F_n(x) - F(x) to exhibit jumps and plateaus, preventing the supremum from achieving the full range of the continuous case and resulting in a null distribution for D_n = \sup_x |F_n(x) - F(x)| that is stochastically dominated by its continuous counterpart. This leads to conservative p-values when the continuous tables are used: the test has lower power to detect deviations, with actual Type I error rates below the nominal level (e.g., observed rejection rates of about 2–3% at the 5% level for discrete data).

To address these issues, modifications to the statistic or its null distribution have been proposed. One approach uses a modified statistic D_n^* = \sup_x |F_n(x^-) - F(x)|, where x^- denotes the left-hand limit, focusing on deviations just before jumps in the discrete CDF to better approximate continuous behavior. Alternatively, a continuity correction can be applied by adjusting the empirical ranks, such as comparing (i - 0.5)/n to F(X_{(i)}) for ordered observations X_{(i)}, which shifts the ECDF steps to the midpoints of the intervals and reduces the discreteness-induced conservatism, though it may slightly alter the test's size properties. These adjustments maintain the nonparametric character of the test while improving its applicability to discrete data.

For exact inference in the discrete case, the null hypothesis posits that the sample follows a multinomial distribution with cell probabilities given by the jumps in F. An exact p-value is then P(D_n \geq D_{n,\mathrm{obs}} \mid H_0), computed by enumerating or recursively summing over all possible multinomial outcomes with statistics at least as extreme as the one observed. Conover's method efficiently calculates this for one-sided tests with n \leq 30 by exploiting the structure of the partial sums, providing exact critical values; for two-sided tests, conservative bounds are used because of symmetry complications. This method has been implemented in statistical software for small samples, ensuring proper control of the Type I error.

In cases of mixed continuous-discrete null distributions, where F has both jumps and continuous parts, techniques such as binning the continuous components into fine intervals or using weighted deviations can approximate the test. More precisely, recursive methods compute the distribution of D_n by integrating over the continuous segments and handling the point masses separately, often incorporating a continuity correction at the jump points. These approaches yield p-values without relying on asymptotic approximations, though they require the full CDF to be specified.

For example, consider testing a sample of counts against a Poisson null distribution with parameter \lambda = 2. The supremum D_n is evaluated at the integer points where jumps occur, but to adjust for the flat intervals between integers, one averages deviations over each plateau or applies the continuity correction to the ranks before computing F_n. Using Conover's exact method on a small sample (e.g., n = 20) might yield a p-value of 0.12 for an observed D_n = 0.15, whereas the unadjusted continuous approximation would conservatively report around 0.20. This highlights the adjustment's role in restoring power for count data, which are common in Poisson testing.

Despite these advances, exact methods remain computationally intensive for large n (e.g., beyond 100) or distributions with many categories, as the multinomial sample space grows combinatorially. Approximations such as simulation under the null can mitigate this but introduce sampling variability; for highly discrete cases with sparse categories, the chi-squared goodness-of-fit test is often recommended as a more efficient alternative owing to its asymptotic chi-squared distribution and better power in binned settings.
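
As a sketch of the simulation approach for a fully specified discrete null, the following computes a Monte Carlo p-value for a Poisson(2) hypothesis; function names are illustrative, and the statistic is evaluated only at the integer jump points:

import numpy as np
from scipy.stats import poisson

def ks_stat_poisson(sample, lam):
    # D_n = sup_x |F_n(x) - F(x)| for a Poisson(lam) null. Both functions are step
    # functions that jump only at integers, so the supremum is attained at an
    # integer no larger than the sample maximum.
    sample = np.sort(np.asarray(sample))
    n = sample.size
    k = np.arange(0, sample.max() + 1)
    f_n = np.searchsorted(sample, k, side="right") / n
    return np.max(np.abs(f_n - poisson.cdf(k, lam)))

def mc_pvalue(sample, lam, reps=10000, seed=0):
    # Monte Carlo p-value under the fully specified Poisson(lam) null hypothesis.
    rng = np.random.default_rng(seed)
    d_obs = ks_stat_poisson(sample, lam)
    d_null = [ks_stat_poisson(rng.poisson(lam, len(sample)), lam) for _ in range(reps)]
    return d_obs, float(np.mean(np.asarray(d_null) >= d_obs))

counts = np.random.default_rng(1).poisson(2.0, 20)
print(mc_pvalue(counts, lam=2.0))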

Two-sample test

Statistic definition

The two-sample Kolmogorov–Smirnov test assesses whether two independent samples are drawn from the same continuous distribution by comparing their empirical cumulative distribution functions (ECDFs). Let X_1, \dots, X_{n_1} be a random sample from an unknown distribution with CDF F, and Y_1, \dots, Y_{n_2} an independent random sample from another unknown distribution with CDF G. The ECDF for the first sample is defined as F_{n_1}(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} \mathbf{1}\{X_i \leq x\}, where \mathbf{1}\{\cdot\} is the indicator function, and similarly for the second sample, F_{n_2}(x) = \frac{1}{n_2} \sum_{j=1}^{n_2} \mathbf{1}\{Y_j \leq x\}. The test statistic D_{n_1,n_2} is the supremum of the absolute difference between these ECDFs, D_{n_1,n_2} = \sup_x |F_{n_1}(x) - F_{n_2}(x)|, which quantifies the maximum vertical distance between the two step functions. This statistic, introduced by Smirnov in 1939, measures overall discrepancies in distribution shape, location, and scale without assuming a specific parametric form. For asymptotic approximations, an effective sample size m = \frac{n_1 n_2}{n_1 + n_2} is used to scale the statistic as \sqrt{m} D_{n_1,n_2}, which converges in distribution to the Kolmogorov distribution under the null hypothesis as n_1, n_2 \to \infty. To compute D_{n_1,n_2}, combine and sort the observations from both samples to form the pooled order statistics, then evaluate the ECDF differences at these points, as the supremum occurs at jumps in either ECDF. For illustration, consider a sample of size 8, \{2, 4, 6, 8, 10, 12, 14, 16\}, and a sample of size 7, \{1, 3, 5, 7, 9, 11, 13\} (two evenly spaced sequences chosen purely for demonstration); evaluating the two ECDFs at the pooled order statistics yields D_{8,7} = 0.25 as the maximum absolute difference, attained at x = 13. The null hypothesis is that the two samples arise from the same continuous distribution (F = G), with the alternative that the distributions differ (F \neq G).
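
A minimal sketch of the pooled-order-statistics computation described above (function names are illustrative):

import numpy as np

def ks_two_sample_stat(x, y):
    # D = sup_t |F_{n1}(t) - F_{n2}(t)|, evaluated at the pooled order statistics,
    # where the jumps of either ECDF occur.
    x, y = np.sort(np.asarray(x, dtype=float)), np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    f1 = np.searchsorted(x, grid, side="right") / x.size
    f2 = np.searchsorted(y, grid, side="right") / y.size
    return np.max(np.abs(f1 - f2))

x = [2, 4, 6, 8, 10, 12, 14, 16]
y = [1, 3, 5, 7, 9, 11, 13]
print(ks_two_sample_stat(x, y))   # 0.25, attained at x = 13, for this pair of samples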

Asymptotic distribution

Under the null hypothesis that two samples of sizes n_1 and n_2 are drawn from the same continuous distribution F, the two-sample Kolmogorov–Smirnov statistic D_{n_1,n_2}, scaled by the effective sample size m = n_1 n_2 / (n_1 + n_2), satisfies \sqrt{m} \, D_{n_1,n_2} \xrightarrow{d} \sup_{t \in [0,1]} |B(t)|, where B is a standard Brownian bridge on [0,1] arising as the appropriately weighted difference of the two independent limiting bridges B_1 and B_2. This convergence holds as \min(n_1, n_2) \to \infty. The cumulative distribution function of this limiting random variable is given by the Kolmogorov distribution: K(\lambda) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 \lambda^2}, \quad \lambda \geq 0. Perhaps surprisingly, this limit is identical to that of the one-sample Kolmogorov–Smirnov statistic, despite involving two independent samples. For inference at significance level \alpha = 0.05, the asymptotic critical value of \sqrt{m} \, D_{n_1,n_2} is approximately 1.36; thus, the critical value for D_{n_1,n_2} itself is about 1.36 / \sqrt{m}. For unequal sample sizes, critical values can be obtained from tables based on the effective sample size m. In finite samples, particularly when n_1 + n_2 < 40, the asymptotic approximation may be inaccurate, and exact p-values are computed using the permutation distribution under the null, which enumerates all possible label assignments to the combined sample. For larger but still moderate samples, Monte Carlo simulations with at least 10,000 replicates provide reliable p-values by approximating the null distribution. The proof relies on the convergence of the empirical processes \sqrt{n_1} (F_{n_1} - F) and \sqrt{n_2} (F_{n_2} - F) to independent Brownian bridges, so that their appropriately scaled difference converges to a Gaussian process whose supremum has the Kolmogorov distribution; specifically, the law of \sup |B_1 - B_2| (after variance adjustment) coincides with that of \sup |B| for a single Brownian bridge B.
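
If SciPy is available, its scipy.special.kolmogorov function (the survival function of the limiting Kolmogorov distribution) can be used for the asymptotic p-value; a sketch using the D = 0.25, n_1 = 8, n_2 = 7 example from the previous subsection:

import numpy as np
from scipy.special import kolmogorov   # survival function of the Kolmogorov distribution

def ks_2samp_asymptotic_pvalue(d, n1, n2):
    # Approximate p-value from the limiting distribution, scaling by the
    # effective sample size m = n1 * n2 / (n1 + n2).
    m = n1 * n2 / (n1 + n2)
    return kolmogorov(np.sqrt(m) * d)

# For D = 0.25 with n1 = 8 and n2 = 7, sqrt(m) * D is far below the 5% asymptotic
# critical value of about 1.36, so the null is not rejected; with samples this
# small, however, the exact permutation distribution is preferred.
print(ks_2samp_asymptotic_pvalue(0.25, 8, 7))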

Theoretical foundations

Kolmogorov distribution properties

The Kolmogorov distribution arises as the limiting distribution of the Kolmogorov–Smirnov statistic in large samples, specifically as the distribution of the random variable K = \sup_{0 \le t \le 1} |B(t)|, where B(t) = W(t) - t W(1) is a standard Brownian bridge and W is a standard Wiener process. The cumulative distribution function of K is given explicitly by the infinite series K(\lambda) = \Pr(K \le \lambda) = \sum_{k=-\infty}^{\infty} (-1)^k e^{-2 k^2 \lambda^2}, \quad \lambda > 0. This series converges rapidly for \lambda > 0 because of the exponential decay of its terms, allowing practical truncation after a few terms for numerical evaluation. The first two moments of K are \mathbb{E}[K] = \sqrt{\pi/2}\,\ln 2 \approx 0.8687 and \mathrm{Var}(K) = \pi^2/12 - (\pi/2)(\ln 2)^2 \approx 0.068, with higher moments obtainable by integrating the density derived from the series. For large \lambda, the tail probability satisfies \Pr(K > \lambda) \sim 2 e^{-2 \lambda^2}, reflecting the dominant contribution of the first term of the series and characterizing the large-deviation behavior. The Kolmogorov distribution coincides with that of the supremum of the absolute value of a centered Gaussian process on [0,1] with covariance function \mathbb{E}[B(s)B(t)] = \min(s,t) - st, and it arises as the limit of suitably scaled tied-down random walks or empirical processes in discrete approximations. Numerical evaluation of the CDF employs recursive relations or direct summation of the series, with precomputed tables available for critical values corresponding to significance levels as low as \alpha = 0.001.

Convergence and limiting behavior

The Glivenko–Cantelli theorem provides the foundational result for the uniform consistency of the empirical distribution function. Specifically, for independent and identically distributed (i.i.d.) random variables X_1, \dots, X_n drawn from a distribution with CDF F, the supremum \sup_x |F_n(x) - F(x)| \to 0 almost surely as n \to \infty, where F_n is the empirical distribution function. This strong uniform convergence holds under the i.i.d. assumption without further restrictions on F. Building on this, Donsker's invariance principle describes the weak convergence of the properly scaled empirical process to a Gaussian limit. Under the i.i.d. condition, the process \sqrt{n} (F_n(x) - F(x)) converges in distribution to a Brownian bridge B^0 evaluated at F(x), where B^0(t) = B(t) - t B(1) and B is a standard Brownian motion, with convergence taking place in the Skorokhod space D[0,1] equipped with the Skorokhod topology. This functional central limit theorem underpins the asymptotic distribution of the Kolmogorov–Smirnov statistic D_n = \sup_x |F_n(x) - F(x)|, which satisfies \sqrt{n} D_n \Rightarrow \sup_t |B^0(t)|. The validity of these asymptotic results requires i.i.d. samples from F for the one-sample case; for the two-sample test, the samples must be independent of one another and drawn from the same underlying F. Additionally, continuity of F ensures the exact limiting Kolmogorov distribution without adjustments for ties or discontinuities. Stronger quantitative bounds on the approximation are given by the Komlós–Major–Tusnády strong approximation theorem, which constructs a coupling with a sequence of Brownian bridges for which |\sqrt{n} D_n - \sup_t |B^0(t)|| = O\left( \frac{\log n}{\sqrt{n}} \right) with probability approaching 1 as n \to \infty, under the i.i.d. assumption with finite variance. This rate quantifies the error in approximating the KS statistic by its Brownian bridge limit and improves upon weaker results. Extensions to non-i.i.d. settings, such as dependent data in time series, require modifications like blocking techniques or related corrections to restore approximate independence and enable asymptotic validity, though these require case-specific conditions on the dependence structure. Regarding rates of convergence, for a fixed significance level \alpha the quantity n D_n^2 converges in distribution to a limit related to the squared supremum of the Brownian bridge, and the resulting approximation error decays more slowly than the O(1/n) rates achievable by parametric tests.
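
A quick simulation can illustrate this convergence: under a Uniform(0,1) null, the rejection rate of \sqrt{n} D_n at the asymptotic 5% critical value should approach 0.05. A sketch using SciPy's kolmogi (the inverse survival function of the Kolmogorov distribution); sample sizes and seeds are illustrative:

import numpy as np
from scipy.special import kolmogi

rng = np.random.default_rng(0)
n, reps = 500, 5000
crit = kolmogi(0.05)                       # 5% point of the Kolmogorov distribution, about 1.358
i = np.arange(1, n + 1)

exceed = 0
for _ in range(reps):
    u = np.sort(rng.uniform(size=n))       # Uniform(0,1) sample, so F(x) = x on [0,1]
    d = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    if np.sqrt(n) * d > crit:
        exceed += 1
print(exceed / reps)                       # close to the nominal 0.05 for large n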

Extensions

Confidence bands for distributions

The Kolmogorov–Smirnov test exhibits a duality with confidence bands for the cumulative distribution function (CDF). Specifically, failing to reject the null hypothesis H_0: F = F_0 (where F is the true CDF and F_0 is a specified CDF) at significance level \alpha implies, with confidence 1 - \alpha, that the true CDF F satisfies \sup_x |F(x) - F_0(x)| \leq d_{\alpha,n}, where d_{\alpha,n} is the level-\alpha critical value of the KS statistic D_n. This inversion of the test provides a nonparametric confidence set consisting of all CDFs G such that \sup_x |G(x) - F_n(x)| \leq d_{\alpha,n}.

For an unknown distribution without a specified F_0, the duality yields confidence bands centered on the empirical CDF F_n(x). The (1 - \alpha) confidence band is given by \left[ \max\left(0, F_n(x) - \frac{c_\alpha}{\sqrt{n}}\right), \min\left(1, F_n(x) + \frac{c_\alpha}{\sqrt{n}}\right) \right], where c_\alpha is the (1 - \alpha) quantile of the Kolmogorov distribution (the limiting distribution of \sqrt{n} D_n). For example, c_{0.05} \approx 1.36 provides an asymptotic 95% confidence band. Smirnov's method constructs these bands with uniform width for the one-sample case assuming known parameters, relying on the exact distribution of the KS statistic under a continuous null. When parameters are estimated from the data (e.g., via maximum likelihood), the null distribution of the KS statistic changes, requiring the band to be widened by a simulation-derived factor to maintain coverage; this adjustment, often implemented via resampling methods, accounts for the estimation uncertainty and is detailed in approaches like the Lilliefors modification. In survival analysis, these bands set nonparametric limits on reliability functions (survival curves, 1 - F(x)), aiding inference on failure probabilities without parametric assumptions.

The bands provide simultaneous coverage over all x \in \mathbb{R}, ensuring the true CDF lies within the band with probability 1 - \alpha. For known F_0 and continuous distributions, coverage is exact; with unknown distributions, it is asymptotic as n \to \infty. For discrete distributions, the bands tend to be conservative due to ties in the empirical CDF, leading to inflated critical values; adjustments such as continuity corrections mitigate this. Alternatives, such as Hall–Wellner bands, weight deviations by \sqrt{F_n(x)(1 - F_n(x))} to improve uniformity and power in the tails, though they require more computation.
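
A sketch of constructing such a band around the ECDF under the asymptotic approximation (function names are illustrative):

import numpy as np
from scipy.special import kolmogi

def ks_confidence_band(sample, alpha=0.05):
    # Asymptotic (1 - alpha) simultaneous band for the CDF, centred on the ECDF.
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    ecdf = np.arange(1, n + 1) / n
    eps = kolmogi(alpha) / np.sqrt(n)      # c_alpha / sqrt(n); about 1.36 / sqrt(n) at alpha = 0.05
    return x, np.clip(ecdf - eps, 0, 1), np.clip(ecdf + eps, 0, 1)

x, lower, upper = ks_confidence_band(np.random.default_rng(0).standard_normal(200))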

Multivariate versions

The multivariate Kolmogorov–Smirnov test extends the univariate test to compare probability distributions in d \geq 2 dimensions, addressing scenarios where data points are vectors in \mathbb{R}^d. The multivariate empirical cumulative distribution function (ECDF) for a sample of n i.i.d. observations X_1, \dots, X_n \in \mathbb{R}^d is defined as F_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d I(X_{i,j} \leq x_j), where \mathbf{x} = (x_1, \dots, x_d) and I(\cdot) is the indicator function; this counts the proportion of points whose components are all less than or equal to the corresponding components of \mathbf{x}, analogous to the univariate case but defined over orthants. For the one-sample goodness-of-fit test against a hypothesized distribution with CDF F, the test statistic is D_n = \sup_{\mathbf{x} \in \mathbb{R}^d} |F_n(\mathbf{x}) - F(\mathbf{x})|, measuring the maximum deviation between the ECDF and the target CDF. However, computing this supremum exactly is difficult because of the infinite domain \mathbb{R}^d and the need to evaluate the difference everywhere, which becomes prohibitive even for moderate d.

Practical implementations approximate the supremum by evaluating the statistic over a finite grid of points or at orthants anchored at the data points themselves, as developed in early multidimensional extensions. For instance, Fasano and Franceschini proposed a computational approach for the two-sample case in two or three dimensions that sorts the data and checks deviations across the orthants defined by the combined sample points, avoiding full grid searches. An alternative is the Cramér–von Mises analog, which integrates the squared deviations over the space rather than taking the supremum, though it requires similar approximations in higher dimensions. In the two-sample setting, with samples of sizes n_1 and n_2 having ECDFs F_{n_1} and F_{n_2}, the statistic is D_{n_1,n_2} = \sup_{\mathbf{x} \in \mathbb{R}^d} |F_{n_1}(\mathbf{x}) - F_{n_2}(\mathbf{x})|, again evaluated over orthants to detect differences in multidimensional distributions. Under the null hypothesis of identical distributions, the asymptotic distribution involves a multivariate Brownian bridge, but no closed-form expression exists for d > 1, complicating exact p-value computation.

Key challenges in multivariate applications include the curse of dimensionality, where the effective sample size diminishes rapidly as d increases, leading to low power and unreliable approximations even for d = 3. Consequently, such tests often rely on bootstrap or Monte Carlo methods to estimate p-values, resampling from the combined or hypothesized distributions to simulate the null distribution, though this adds significant computational cost. For example, to test bivariate normality on 2D scatterplot data from a sample of size n = 100, one might compute D_n by discretizing a grid over the observed range of each dimension (e.g., 50 × 50 points), evaluating deviations from the bivariate normal CDF, and then using simulation for the p-value; significant deviations would indicate non-normality, such as clustering or outliers in specific orthants.
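
A rough sketch of the grid-based approximation for the bivariate example above, calibrating the p-value by simulating from the hypothesized independent standard bivariate normal (a simple alternative to bootstrap resampling; names, grid, and replication counts are illustrative):

import numpy as np
from scipy.stats import norm

def mv_ecdf(data, grid):
    # Multivariate ECDF: fraction of observations with every component <= the grid point.
    return np.mean(np.all(data[None, :, :] <= grid[:, None, :], axis=2), axis=1)

def ks_stat_bivariate_normal(data, grid):
    # Approximate sup |F_n - F| over a finite grid, for an independent standard normal null.
    f0 = norm.cdf(grid[:, 0]) * norm.cdf(grid[:, 1])
    return np.max(np.abs(mv_ecdf(data, grid) - f0))

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 2))
g = np.linspace(-3, 3, 50)
grid = np.array([(a, b) for a in g for b in g])    # 50 x 50 evaluation grid

d_obs = ks_stat_bivariate_normal(data, grid)
d_null = [ks_stat_bivariate_normal(rng.standard_normal((100, 2)), grid) for _ in range(500)]
p_value = np.mean(np.asarray(d_null) >= d_obs)     # simulated p-value under the null
print(d_obs, p_value)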

Higher-dimensional generalizations

Higher-dimensional generalizations of the Kolmogorov–Smirnov (KS) test extend beyond finite-dimensional vectors to infinite-dimensional or non-Euclidean spaces, such as function spaces or graph structures, by adapting the supremum distance to measure discrepancies between empirical and target distributions. In functional data analysis, where observations are curves or functions in a Hilbert space H, the test compares projected empirical distribution functions to a reference distribution. The statistic is typically defined as T_{n,\pi} = \sup_t |F_{n,\pi}(t) - F_{0,\pi}(t)|, where \pi: H \to \mathbb{R} is a projection (e.g., onto principal components derived from a Karhunen–Loève expansion), F_{n,\pi} is the empirical CDF of the projected data, and F_{0,\pi} is the projected null distribution; this approach leverages finite random projections or all projections to capture multidimensional variability while maintaining computational tractability. Such tests assess goodness-of-fit for functional models by aggregating projection-based KS statistics, with asymptotic properties derived from empirical process theory.

For network or graph distributions, the KS test is adapted by treating graphs as realizations of random variables over edge probabilities or subgraph counts, with the statistic computed as the supremum of differences in the empirical cumulative distributions of these features. This measures similarity between network ensembles, such as testing whether two sets of graphs arise from the same generative model, and has applications in network analysis for detecting structural shifts. The approach embeds graph properties into a distributional framework, enabling nonparametric comparisons without assuming specific parametric forms. As an alternative in high dimensions, the energy distance provides a related distance-based measure that avoids the curse of dimensionality plaguing grid-based extensions; defined as d^2(P, Q) = 2\mathbb{E}[\|X - Y\|] - \mathbb{E}[\|X - X'\|] - \mathbb{E}[\|Y - Y'\|], it quantifies differences via expected pairwise distances and offers fast computation through direct sample estimates without projections or kernels. This makes it suitable for high-dimensional settings where traditional KS variants lose power.

Theoretically, these generalizations can embed distributions into a reproducing kernel Hilbert space (RKHS) \mathcal{H}_k, mapping a measure P to its kernel mean embedding \mu_P = \int k(\cdot, y) \, dP(y), with a supremum-type statistic d_{k,\Lambda}(P_n, Q_m) = \sup_{\lambda \in \Lambda} \|\mu_{P_n} - \mu_{Q_m}\|_{\mathcal{H}_{k_\lambda}} taken over a family of kernels; under the null hypothesis the statistic obeys functional Donsker theorems, ensuring that \sqrt{nm/(n+m)} \, d_{k,\Lambda}(P_n, Q_m) converges to the supremum of a Gaussian process over the RKHS unit ball. This framework unifies infinite-dimensional two-sample testing with universal Donsker classes for empirical processes. Post-2010 developments include bootstrap methods for p-value computation in streaming scenarios, where KS statistics monitor evolving distributions online by resampling from sliding windows to approximate null distributions under non-stationarity. In machine learning, these tests detect distribution shifts, such as in traffic-control applications, where KS distances between empirical CDFs of vehicle flows quantify deviations associated with performance drops (e.g., a 0.02 increase linked to a 3.7% throughput loss).
An illustrative example is testing uniformity on the hypersphere S^{d-1} for d \geq 3, where projections onto random directions yield one-dimensional statistics K_n = \sup_x |F_n(x) - F_0(x)| for the projected values, aggregated over multiple directions to assess global deviations; angular measures, such as the pairwise angles \cos^{-1}(U_i^\top U_j), further adapt the supremum to the spherical geometry for rotation-invariant testing.
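
A minimal sketch of the projection idea for two samples in \mathbb{R}^d, aggregating one-dimensional KS p-values over random directions with a conservative Bonferroni correction (an illustrative aggregation choice, not the calibration used in the literature above):

import numpy as np
from scipy.stats import ks_2samp

def projected_ks_two_sample(x, y, n_proj=20, seed=0):
    # Compare two samples in R^d via one-dimensional KS tests on random projections,
    # aggregated with a (conservative) Bonferroni correction over the projections.
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    pvals = []
    for _ in range(n_proj):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                     # random unit direction
        pvals.append(ks_2samp(x @ u, y @ u).pvalue)
    return min(1.0, n_proj * min(pvals))

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 50))
y = rng.standard_normal((200, 50)) + 0.25          # mean shift in every coordinate
print(projected_ks_two_sample(x, y))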

Practical considerations

Implementation algorithms

The Kolmogorov–Smirnov (KS) statistic for the one-sample test is computed by first sorting the sample data in ascending order to obtain the order statistics X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}. The empirical cumulative distribution function (ECDF) is then evaluated at these points, and the test statistic D_n is the supremum of the absolute differences between the ECDF and the hypothesized cumulative distribution function F, specifically D_n = \sup_x |F_n(x) - F(x)|, where F_n(x) = i/n for X_{(i)} \leq x < X_{(i+1)}. In practice, this supremum is achieved at the order statistics, so D_n is calculated as the maximum of |i/n - F(X_{(i)})| and |(i-1)/n - F(X_{(i)})| over i = 1, \dots, n. For the two-sample test, the samples from the two groups are sorted separately to form order statistics, and the ECDFs F_{n_1} and F_{n_2} are computed for samples of sizes n_1 and n_2. The statistic D_{n_1,n_2} is the supremum of |F_{n_1}(x) - F_{n_2}(x)| over x, which can be found efficiently by merging the two sorted lists and evaluating the differences at the combined order statistics, accounting for jumps in each ECDF. This merged approach avoids redundant computations by traversing the combined sequence once.

The time complexity of computing the KS statistic is dominated by the initial sorting step, which requires O(n \log n) operations for a sample of size n in the one-sample case, or O((n_1 + n_2) \log (n_1 + n_2)) in the two-sample case. After sorting, the search for the supremum difference proceeds in linear time, O(n), by iterating over the order statistics.

P-values for the KS test are calculated differently depending on sample size. For large n, asymptotic approximations to the Kolmogorov distribution K(\lambda) are used, where \lambda = D \sqrt{n} (adjusted for the two-sample case as \lambda = D \sqrt{n_1 n_2 / (n_1 + n_2)}), providing the tail probability P(K > \lambda). For smaller samples or when parameters are estimated, exact p-values can be obtained via dynamic programming methods, such as the Durbin matrix algorithm, which computes the distribution recursively in O(n^2) time. An efficient approximation is the Pelz–Good asymptotic series, which expands the distribution function using a series of correction terms for improved accuracy over the basic asymptotics, particularly for moderate n. When exact or asymptotic methods are computationally intensive, simulation offers a flexible alternative for p-value estimation. This involves generating B (typically 10^4 to 10^5) independent samples under the null hypothesis, computing the KS statistic for each, and estimating the p-value as the proportion of simulated statistics exceeding the observed D. The process is parallelizable across simulations to reduce computation time.

Special handling is required for edge cases to ensure numerical stability and validity. In discrete distributions with ties, the standard KS test assumes continuity and may produce biased results; ties can be addressed by averaging the ECDF ranks at tied points or by adding small random noise to break ties while preserving the distribution. Numerical stability issues arise when F(x) is near 0 or 1, potentially causing floating-point errors in the differences; these are mitigated by using adjusted ECDF evaluations, such as (i - 0.5)/n instead of i/n, to avoid boundary artifacts. The following Python-style function illustrates the computation of the one-sample KS statistic:
def ks_one_sample_statistic(sample, F):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)| for a fully specified CDF F."""
    x_sorted = sorted(sample)            # O(n log n)
    n = len(x_sorted)
    D = 0.0
    for i, x in enumerate(x_sorted, start=1):
        f = F(x)
        d1 = abs(i / n - f)              # deviation just after the ECDF jump
        d2 = abs((i - 1) / n - f)        # deviation just before the ECDF jump
        d3 = abs((i - 0.5) / n - f)      # optional midpoint evaluation (continuity adjustment)
        D = max(D, d1, d2, d3)
    return D                             # O(n) after sorting

Software and libraries

The Kolmogorov–Smirnov (KS) test is implemented in various statistical software packages and programming libraries, facilitating its application in data analysis workflows across different environments. In the R programming language, the base stats package provides the ks.test() function, which supports both one-sample and two-sample KS tests. For the one-sample case, it compares an empirical distribution to a specified cumulative distribution function (CDF), such as the normal distribution; adjusting for estimated parameters requires a Lilliefors-type correction available in add-on packages. The syntax is exemplified as ks.test(x, y = "pnorm", mean = 0, sd = 1), where x is the data vector and y specifies the theoretical distribution. For advanced handling of discrete distributions, the dgof package extends KS testing with adjustments for ties and lattice distributions, while the boot package enables simulation-based p-value estimation through bootstrapping.

Python's SciPy library offers robust implementations via the scipy.stats module, including kstest() for one-sample tests against a specified or empirical CDF, and ks_2samp() for two-sample comparisons. These functions operate on NumPy arrays, supporting efficient computation on large datasets, and allow custom CDFs for flexible testing. For instance, kstest(rvs, 'norm', args=(0, 1)) tests the data in rvs against a normal distribution with mean 0 and standard deviation 1. Starting with version 1.10.0, enhancements to SciPy's bootstrap methods improve the accuracy of p-value calculations in complex scenarios. MATLAB includes the built-in functions kstest() and kstest2() in its Statistics and Machine Learning Toolbox, performing one- and two-sample KS tests, respectively, with access to precomputed tables for asymptotic distributions. These tools output test statistics, p-values, and critical values, and are suitable for both interactive and scripted analysis.

Other environments provide specialized support: Julia's HypothesisTests.jl package implements KS tests through types such as ExactOneSampleKSTest for the one-sample case and ApproximateTwoSampleKSTest for the two-sample case, with an emphasis on performance. In SAS, the PROC NPAR1WAY procedure conducts KS tests as part of its nonparametric analyses. For users of Microsoft Excel, third-party add-ins such as the Real Statistics Resource Pack enable basic one-sample KS testing via menu-driven interfaces or VBA functions. Considerations for implementation include the balance between open-source options such as R, Python, and Julia, which offer extensive customization and community support, and proprietary tools such as MATLAB and SAS, which provide integrated environments with validated compliance for enterprise use. Version-specific updates, such as those in SciPy, underscore the importance of maintaining current installations to leverage algorithmic refinements.
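
A brief runnable sketch of the SciPy calls mentioned above (sample data are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.uniform(-2.0, 2.0, 150)

# One-sample test against a fully specified N(0, 1); if the parameters were
# estimated from x instead, a Lilliefors-style correction would be needed.
print(stats.kstest(x, "norm", args=(0, 1)))

# Two-sample (Smirnov) test comparing the two empirical distributions.
print(stats.ks_2samp(x, y))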

Power and limitations

The Kolmogorov–Smirnov (KS) test demonstrates moderate statistical power for detecting location shifts in continuous distributions, where it outperforms the chi-squared test because the latter relies on binning, which reduces effectiveness for small sample sizes and continuous data. Simulation studies show that power varies with sample size, the alternative distribution, and the significance level. However, the KS test has low power against changes in variance or deviations involving heavy tails, as it is primarily sensitive to discrepancies near the center of the distribution rather than the extremes. In comparisons with other goodness-of-fit tests, the KS statistic exhibits lower power than the Anderson–Darling test, particularly for tail deviations, where the latter weights observations more heavily in the tails and requires smaller sample sizes (e.g., more observations than the roughly 17 needed by Anderson–Darling for 80% power at a standardized shift of δ = 0.9). Relative to the chi-squared test, the KS approach is preferable for small n (e.g., n < 50) because it avoids arbitrary binning and provides exact distribution-free critical values under continuity assumptions. Work from the 1980s, such as the studies by Stephens, quantified these power curves through extensive tables and simulations for tests of exponentiality and normality, confirming the KS test's relative strengths and weaknesses across alternatives.

Key limitations include high sensitivity to ties in discrete data, which renders the test underpowered and requires conservative adjustments or specialized implementations to maintain validity. The test assumes independent and identically distributed (i.i.d.) observations, leading to invalid results and inflated Type I errors for clustered or dependent data, such as time series. Additionally, when distribution parameters are estimated from the data, the test becomes conservative, necessitating simulation-based critical values rather than asymptotic approximations. Common pitfalls involve misinterpreting p-values without verifying the assumptions, or over-relying on asymptotic distributions for small samples (n < 20), where exact methods or simulation are recommended to avoid underpowered inferences.

To address these issues, modified variants of the KS test, such as Kuiper's test for circular data, improve power against specific alternatives like rotational shifts by distributing sensitivity more evenly across the distribution's support. Bootstrap resampling enhances overall performance by incorporating parameter uncertainty and providing more accurate p-values, particularly for finite samples or non-i.i.d. settings. Recent studies in the 2020s highlight significant power loss in high-dimensional extensions of the KS test, where the curse of dimensionality dilutes sensitivity, prompting developments in sliced or projected variants for multivariate applications.
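
Power against a particular alternative can be estimated directly by Monte Carlo; a sketch for a pure location shift (sample sizes, replication counts, and the alternative are illustrative):

import numpy as np
from scipy.stats import ks_2samp

def ks_power(shift, n=50, reps=2000, alpha=0.05, seed=0):
    # Monte Carlo power of the two-sample KS test against a pure location shift.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = rng.standard_normal(n) + shift
        if ks_2samp(x, y).pvalue < alpha:
            rejections += 1
    return rejections / reps

for delta in (0.0, 0.5, 1.0):
    print(delta, ks_power(delta))   # size near 0.05 at delta = 0, power rising with the shift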
