Wilcoxon signed-rank test
The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test designed to evaluate whether the median of the differences between two related samples, matched pairs, or repeated measurements on a single sample is significantly different from zero.[1] Developed by the American chemist and statistician Frank Wilcoxon in 1945, it provides a robust alternative to the parametric paired t-test, particularly when the data violate assumptions of normality or when dealing with ordinal or non-normal continuous data.[2] The test is especially useful for small sample sizes and works with the ranks of the absolute differences rather than their raw values, making it less sensitive to outliers.[3]
The test assumes that the observations are paired and drawn randomly from the same population, that the dependent variable is continuous or ordinal, and that the distribution of the differences is symmetric around the median.[4][5] It does not require normality or equal variances. The two measurements within a pair may be dependent, but the pairs themselves are assumed to be independent of one another. This makes the test applicable in fields like medicine, psychology, and environmental science for analyzing pre-post treatment effects or matched experimental designs.[3][6]
Background
Definition and Purpose
The Wilcoxon signed-rank test is a non-parametric statistical method used to assess whether the median of a population of paired differences, or of a single sample, equals zero or another specified value, particularly when the data do not meet the normality assumption required by parametric tests. It serves as a rank-based alternative to the paired t-test, applicable when the underlying distribution is symmetric around the median, and it allows robust hypothesis testing in the presence of outliers or non-normal data.[1] Because it uses the ranks of the absolute differences rather than only their signs, it is generally more powerful than simpler methods like the sign test, which considers only the direction of each difference.[7]
The core purpose of the test is to determine whether there is a significant shift in the median for paired observations, such as before-and-after measurements on the same subjects, or for a single sample against a hypothesized median, without relying on strong distributional assumptions beyond symmetry.[8] The test ranks the absolute deviations from the hypothesized median and attaches the sign of each difference to its rank to compute a test statistic, thereby capturing both the magnitude and the direction of the deviations in a manner that is distribution-free under the null hypothesis. The symmetry assumption ensures that, under the null hypothesis of no median shift, each rank is equally likely to carry a positive or a negative sign.[9] Introduced by Frank Wilcoxon in his seminal 1945 paper, the test has become a standard tool in statistical analysis for ordinal or continuous data that depart from normality.[2]
Historical Development
The Wilcoxon signed-rank test was developed by Frank Wilcoxon, an American chemist employed at the American Cyanamid Company, and introduced in his 1945 paper "Individual Comparisons by Ranking Methods," published in the Biometrics Bulletin.[2] This work presented the test as a non-parametric alternative to the paired t-test, particularly suited for analyzing differences in paired observations where normality assumptions might not hold, with an original emphasis on applications in agricultural yield comparisons and industrial quality control experiments.[2][10] Building on earlier non-parametric concepts like the sign test, which dated back to the 18th century but ignored the magnitude of differences, Wilcoxon's method incorporated ranks of the absolute differences to enhance sensitivity and power.[11] The test's design reflected the growing need for robust statistical tools in experimental sciences during the mid-20th century, where data often deviated from parametric ideals.[10]
Following its inception, the Wilcoxon signed-rank test saw rapid adoption and integration into the expanding field of non-parametric statistics. Sidney Siegel's influential 1956 textbook Nonparametric Statistics for the Behavioral Sciences significantly popularized the procedure, standardizing its notation (using T for the test statistic) and embedding it within comprehensive frameworks for rank-based inference across disciplines like psychology and biology.[12][10] By the late 1950s, refinements such as improved tables for small-sample exact distributions and normal approximations for larger samples further solidified its role as a cornerstone of non-parametric hypothesis testing.[11]
Applications and Hypotheses
One-Sample Case
The one-sample Wilcoxon signed-rank test evaluates, from a single sample, whether the population median equals a hypothesized value μ₀, serving as a nonparametric alternative to the one-sample t-test when normality assumptions are violated.[1] The null hypothesis H₀ posits that the population median is exactly μ₀, while the alternative hypothesis can be two-sided (median ≠ μ₀) or one-sided (median > μ₀ or median < μ₀), depending on the research question.[13] This formulation allows testing for shifts in central tendency without relying on parametric assumptions about the shape of the underlying distribution, beyond continuity and symmetry.[14]
To apply the test, the observed data points x_i are first transformed into differences d_i = x_i - μ₀ for i = 1 to n, centering the sample around the hypothesized median and reducing the problem to assessing whether the median of these differences is zero.[15] The procedure then ranks the absolute values |d_i| of the nonzero differences and attaches the original sign of each d_i to its rank, forming signed ranks that capture both the magnitude and the direction of deviations from μ₀.[16] Because it relies on the ordinal structure of the deviations, the test is more robust to outliers than mean-based methods.[1]
A key prerequisite for the one-sample Wilcoxon signed-rank test is that, under the null hypothesis, the population distribution is symmetric around the median μ₀, ensuring that the test validly assesses the location parameter of interest.[17] Without this symmetry assumption, the test may instead evaluate a different location measure, such as the pseudo-median, rather than the true median.[1] This setup parallels the paired data application of the test, which similarly centers on differences but derives them from within-pair observations rather than a fixed reference value.[2]
Paired Data Case
The Wilcoxon signed-rank test for paired data evaluates whether two related samples exhibit a median difference of zero, making it suitable for assessing location shifts in non-normal distributions.[1] This application focuses on paired observations, such as measurements from the same subjects under two conditions, where the paired differences d_i = x_i - y_i are analyzed for evidence of systematic change.[18]
Under the null hypothesis H_0, the median of the paired differences is zero; combined with the symmetry assumption, this means the distribution of differences is symmetric around zero.[1][19] Alternative hypotheses include a two-sided test where the median difference ≠ 0, or one-sided tests where the median difference > 0 or < 0.[18] This framework assumes the differences follow a continuous distribution symmetric about the median but does not require normality, providing robustness against outliers compared to parametric alternatives like the paired t-test.[1]
The test is commonly applied in before-after studies or matched-pairs designs, such as evaluating treatment effects through pre- and post-intervention measurements on the same individuals.[20][16] In these contexts, it detects shifts in central tendency without distributional assumptions beyond symmetry, as originally proposed for comparing paired observations via ranked differences.[2]
Conceptually, the test statistic arises from ranking the absolute differences |d_i| (excluding zeros), retaining the signs of the original d_i, and computing the sum of ranks for positive differences (W^+) and negative differences (W^-).[1] Depending on the convention, either W^+ or the smaller of the two sums serves as the test statistic, reflecting deviations from the expected balance under H_0.[18] This approach can be seen as an extension of the one-sample case, treating the within-pair differences as deviations from a constant value of zero.[19]
Procedure
Calculating the Test Statistic
The Wilcoxon signed-rank test statistic is computed from paired observations or one-sample data centered at a hypothesized value, following a structured procedure that ranks the absolute differences while preserving the direction of each difference. This method was originally proposed by Wilcoxon for analyzing matched pairs to detect shifts in location without assuming normality.[21]
The calculation begins by computing the differences d_i for each pair (or the deviations from the hypothesized median in the one-sample case), where i = 1, 2, \dots, N and N is the total number of observations. Zero differences are discarded, as they provide no information on the direction of shift, leaving n non-zero differences (the treatment of zeros is discussed further under Handling Zeros and Ties). The absolute values |d_i| are then ranked in ascending order from 1 to n, where tied values receive the average of the ranks they would occupy; for example, if two |d_i| tie for ranks 3 and 4, each is assigned rank 3.5.[22][1]
Next, the ranks r_i are signed according to the original sign of d_i: positive if d_i > 0, negative if d_i < 0. The test statistic W is typically the sum of the ranks belonging to positive differences, W = \sum_{i: d_i > 0} r_i, where r_i is the rank of |d_i|. Some implementations instead report the minimum of the sum of positive ranks W^+ and the sum of negative ranks W^-, which is convenient for two-sided tests that use tables of lower critical values. Because W^+ + W^- = \frac{n(n+1)}{2}, either sum determines the other, so the two conventions carry the same information. This signed ranking captures both the magnitude and the direction of the differences, providing a robust measure of evidence against the null hypothesis of symmetry around zero.[22][1]
For tied absolute differences, assigning average ranks avoids an arbitrary ordering and keeps the total rank sum at \frac{n(n+1)}{2}, though the adjustments that ties require for inference are addressed separately.
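The steps just described (form differences, drop zeros, rank the absolute values with average ranks for ties, then sum the ranks by sign) can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; the function names `average_ranks` and `signed_rank_sums` and the sample data are hypothetical, chosen only for this example.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order from 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block for as long as the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = ((start + 1) + (end + 1)) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def signed_rank_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Return (W_plus, W_minus, n) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    nonzero = [d for d in diffs if d != 0]          # zero differences are discarded
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return w_plus, w_minus, len(nonzero)


# Hypothetical before/after measurements: differences are 5, -2, 0, 9, 5, 3.
before = [125, 130, 118, 140, 135, 128]
after = [120, 132, 118, 131, 130, 125]
print(signed_rank_sums(before, after))  # (14.0, 1.0, 5)
```

Here the zero difference is dropped (n = 5), the two differences of 5 share rank 3.5, and W^+ + W^- = 15 = n(n+1)/2. For the one-sample case, the same function can be called with `y` set to a list that repeats the hypothesized median.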
This ranking approach prioritizes relative magnitudes over raw values, making the statistic distribution-free under the null.[22]
Handling Zeros and Ties
In the Wilcoxon signed-rank test, pairs with zero differences (d_i = 0) are excluded from the ranking procedure, as they contribute neither positive nor negative ranks to the test statistic. This exclusion reduces the effective sample size n to the number of non-zero differences, and it is this reduced n that is used with distribution tables or approximations for inference. In the context of paired data, such zeros represent instances of no change between observations; the direction of change among the remaining non-zero pairs can also be examined separately with the sign test.[23]
Tied values among the absolute differences (|d_i|) are handled by assigning average ranks to the affected observations. The average rank for a group of k tied values occupying positions from j to j + k - 1 is given by (j + (j + k - 1))/2, or equivalently, the sum of the consecutive ranks divided by k. For example, three values tied for positions 4 to 6 each receive rank 5. This approach distributes the ranks evenly and avoids arbitrary assignments that could bias the test.[24]
Excluding zeros diminishes the test's power, as fewer observations inform the rank sum, potentially leading to less sensitive detection of deviations from the null hypothesis. Ties reduce the variance of the test statistic relative to untied data, so the variance used for inference must be adjusted to account for the averaged ranks. Researchers are advised to report the adjusted n after exclusions to clarify the effective sample contributing to the results.[25]
Inference
Null Distribution
Under the null hypothesis that the distribution of the paired differences is symmetric about zero (or a specified median), the Wilcoxon signed-rank test statistic W (the sum of the ranks assigned to the positive differences) follows a discrete probability distribution. This distribution arises because, under the null, the signs of the differences are independent of their absolute values and ranks, with each non-zero difference equally likely to be positive or negative. Consequently, the possible values of W are symmetric around the mean, reflecting the equal probability of mirrored rank sums for positive and negative signs.[26]
The expected value of W under the null is E(W) = \frac{n(n+1)}{4}, where n denotes the number of non-zero differences. The variance is \operatorname{Var}(W) = \frac{n(n+1)(2n+1)}{24} when there are no ties among the absolute differences. In the presence of ties, the variance is reduced to account for the lower variability of averaged ranks; the usual correction subtracts \sum_j (t_j^3 - t_j)/48, where t_j is the number of observations in the j-th group of tied absolute differences, as detailed in standard nonparametric references.[26][27]
For small sample sizes, typically up to n \approx 20 to 25, the exact null distribution is computed by enumerating all 2^n possible sign assignments to the fixed ranks of the absolute differences, each with probability 1/2^n, and determining the resulting distribution of W. These distributions have been tabulated in early works to facilitate critical value determination, providing the foundation for exact p-value calculations without relying on approximations.[2]
Exact and Approximate Tests
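The enumeration over sign assignments that defines the exact null distribution can be illustrated with a short Python sketch. It is a brute-force version intended only for small n (it visits all 2^n assignments); the function names `exact_null_distribution` and `exact_p_values` are hypothetical, and the observed value used in the example is made up for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple


def exact_null_distribution(ranks: Sequence[float]) -> Dict[float, float]:
    """Return {w: P(W = w)} by enumerating all 2^n equally likely sign assignments."""
    counts: Counter = Counter()
    for signs in product((0, 1), repeat=len(ranks)):
        # W is the sum of the ranks whose difference is assigned a positive sign.
        counts[sum(r for r, s in zip(ranks, signs) if s)] += 1
    total = 2 ** len(ranks)
    return {w: c / total for w, c in sorted(counts.items())}


def exact_p_values(w_obs: float, ranks: Sequence[float]) -> Tuple[float, float, float]:
    """Return (P(W <= w_obs), P(W >= w_obs), two-sided p-value)."""
    dist = exact_null_distribution(ranks)
    p_le = sum(p for w, p in dist.items() if w <= w_obs)
    p_ge = sum(p for w, p in dist.items() if w >= w_obs)
    return p_le, p_ge, min(1.0, 2 * min(p_le, p_ge))


ranks = [1, 2, 3, 4, 5]  # n = 5 untied ranks
dist = exact_null_distribution(ranks)
mean = sum(w * p for w, p in dist.items())
var = sum((w - mean) ** 2 * p for w, p in dist.items())
print(mean, var)                  # matches n(n+1)/4 = 7.5 and n(n+1)(2n+1)/24 = 13.75
print(exact_p_values(14, ranks))  # upper tail 2/32 = 0.0625, two-sided 0.125
```

Passing the actual (possibly averaged) ranks rather than 1..n means the same enumeration also covers tied data, at the cost of exponential running time in n.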
The exact test for the Wilcoxon signed-rank test computes the p-value directly from the null distribution of the test statistic W, which is obtained by enumerating all possible sign assignments to the ranks under the null hypothesis of symmetry about zero. For small sample sizes, this permutation-based approach yields exact probabilities: the one-sided p-value is P(W \leq w) for the lower tail or P(W \geq w) for the upper tail, where w is the observed value, and the two-sided p-value is calculated as 2 \times \min(P(W \leq w), P(W \geq w)), capped at 1.[1][28]
For larger samples, an approximate test uses the normal distribution to assess significance, standardizing the test statistic as Z = (W - \mu)/\sigma, where \mu = n(n+1)/4 is the expected value and \sigma = \sqrt{n(n+1)(2n+1)/24} is the standard deviation under the null hypothesis. The p-value is then obtained from the standard normal distribution using this Z-score, often via tables or software. A continuity correction can optionally be applied to account for the discreteness of W: the numerator W - \mu is moved 0.5 toward zero before dividing by \sigma, giving Z' = (W - \mu \mp 0.5)/\sigma, with the minus sign used when W > \mu and the plus sign when W < \mu.[1]
The exact test is preferred for small n (typically n \leq 20) to avoid errors from approximation, while the normal approximation is suitable and computationally efficient for larger n, with modern software capable of handling both methods seamlessly.[1][28]
Practical Aspects
Effect Size
The effect size for the Wilcoxon signed-rank test quantifies the magnitude of the median difference between paired samples, providing a measure of practical significance that is less dependent on sample size than the p-value. A commonly used effect size is the standardized rank sum, denoted as r, calculated as r = \frac{Z}{\sqrt{n}}, where Z is the standardized test statistic from the normal approximation and n is the number of pairs (excluding zero differences). Like Cohen's d in parametric tests, it expresses the effect on a standardized scale, although r is read on a correlation-like scale rather than in standard deviation units. The interpretation of r follows guidelines similar to those for correlation coefficients: an absolute value |r| < 0.3 indicates a small effect, 0.3 \leq |r| < 0.5 a medium effect, and |r| \geq 0.5 a large effect.[29] The sign of r reflects the direction of the median shift, positive if the first sample tends to exceed the second. This effect size is computed post-test using the observed Z value, making it straightforward to obtain after performing the Wilcoxon signed-rank procedure.
An alternative effect size is the Vargha-Delaney A statistic, which estimates the probability that a randomly selected observation from the first sample exceeds its paired counterpart in the second sample, derived from the ranks of the absolute differences.[30] For the paired case, A is calculated as the sum of the ranks for positive differences divided by the total sum of ranks for non-zero differences, providing a non-parametric measure of stochastic superiority robust to the distribution shape. Values of A range from 0 to 1, with 0.5 indicating no effect; interpretations include small effects for A near 0.56, medium near 0.64, and large near 0.71 or higher.[30]
These effect sizes are inherently non-parametric, offering robustness to outliers and non-normal distributions by relying on ranks rather than raw data moments.
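Both effect sizes follow directly from W^+ and n, as the short Python sketch below illustrates. It assumes the normal-approximation Z without a continuity correction, and it optionally applies a common form of the tie adjustment (subtracting the sum of (t^3 - t)/48 over tie groups from the variance). The function names, the `tie_sizes` parameter, and the example values W^+ = 48 and n = 10 are hypothetical.

```python
import math
from typing import Sequence, Tuple


def normal_approx_z(w_plus: float, n: int, tie_sizes: Sequence[int] = ()) -> float:
    """Standardize W+ using its null mean and (optionally tie-corrected) variance."""
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    # Assumed tie correction: each group of t tied |d_i| lowers the variance.
    var -= sum(t ** 3 - t for t in tie_sizes) / 48
    return (w_plus - mu) / math.sqrt(var)


def effect_sizes(w_plus: float, n: int, tie_sizes: Sequence[int] = ()) -> Tuple[float, float, float]:
    """Return (Z, r, A) with r = Z / sqrt(n) and A = W+ / (n(n+1)/2)."""
    z = normal_approx_z(w_plus, n, tie_sizes)
    r = z / math.sqrt(n)
    a = w_plus / (n * (n + 1) / 2)
    return z, r, a


# Hypothetical result: 10 non-zero differences with positive rank sum 48.
z, r, a = effect_sizes(48, 10)
print(round(z, 3), round(r, 3), round(a, 3))
```

With these inputs, r falls in the range labeled large by the thresholds given in the text, and A is the share of the total rank sum (55) carried by the positive differences.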
Both measures emphasize the practical importance of median shifts in applications like before-after studies or matched pairs designs, complementing the test's p-value focused on statistical significance.
Software Implementations
The Wilcoxon signed-rank test is implemented in various statistical software packages, facilitating its application in data analysis workflows. These implementations typically support both exact permutation-based p-values for small samples and normal approximations for larger datasets, with options to handle paired data and alternative hypotheses.
In the R programming language, the wilcox.test() function from the base stats package performs the Wilcoxon signed-rank test for paired samples when the paired = TRUE argument is set.[31] It computes the test statistic (reported as V in the output) as the sum of ranks for positive differences and provides options for exact p-values via the exact = TRUE parameter (used by default for small samples without ties), confidence intervals with conf.int = TRUE, and specification of alternative hypotheses such as "two.sided", "greater", or "less".[31]
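Python's SciPy library implements the test through the scipy.stats.wilcoxon() function, which takes two arrays of paired observations (or a single array of differences) and returns the test statistic and p-value.[32] For the default two-sided test, the reported statistic is the smaller of W^+ and W^-. Key parameters include alternative for one- or two-sided tests and zero_method to handle zero differences (options: "wilcox", "pratt", or "zsplit"; the default "wilcox" discards zero differences, "pratt" includes them when ranking but then drops their ranks, and "zsplit" splits the ranks of zeros evenly between the positive and negative sums).[32] By default, the function uses exact computation for small samples (n ≤ 50 in recent versions, and only when the data allow it, for example when there are no zero differences) and switches to a normal approximation otherwise.[32]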
In IBM SPSS Statistics, the Wilcoxon signed-rank test is available under Nonparametric Tests > Related Samples, where users select paired variables for analysis. The procedure outputs the test statistic W, asymptotic p-value, and an effect size estimate (such as r), with automatic handling of ties by averaging ranks. For small samples, exact tests can be requested via the Exact Tests module.
SAS implements the test in PROC UNIVARIATE, which by default includes the Wilcoxon signed-rank test under "Tests for Location" for one-sample data or differences of paired samples.[33] The procedure computes the signed rank statistic S and provides both exact and asymptotic p-values, with ties handled automatically by averaging ranks across tied values.[33] PROC NPAR1WAY can be used for related extensions but is primarily for independent samples.[34]
Defaults for switching between exact and approximate p-values differ between packages, with cutoffs on the effective sample size (after excluding zeros) commonly falling between 20 and 50. Because the normal approximation can be inaccurate for small samples, users should confirm which method was applied and request the exact method explicitly where the software does not select it.[35]