
Shapiro–Wilk test

The Shapiro–Wilk test is a statistical hypothesis test designed to determine whether a given sample of data is drawn from a normally distributed population, with the null hypothesis stating that the data follow a normal distribution. Developed by Samuel S. Shapiro and Martin B. Wilk in 1965, it evaluates goodness-of-fit by comparing the ordered sample values to the expected values of order statistics from a standard normal distribution. The test produces a statistic W that ranges between 0 and 1, where values close to 1 are consistent with normality and smaller values suggest deviations such as skewness or heavy tails; rejection of the null hypothesis occurs if W falls below a critical value at a chosen significance level (e.g., 0.05).

The test statistic is computed as W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, where x_{(i)} are the ordered observations from smallest to largest, \bar{x} is the sample mean, and the coefficients a_i are pre-determined constants derived from the means, variances, and covariances of order statistics from standard normal samples of size n. These constants and the corresponding critical values were originally tabulated for sample sizes up to 50, with later extensions allowing use for larger n, up to 5,000 in modern software implementations. The method is designed for complete, independent samples and, because W is invariant to location and scale, the population mean and variance need not be specified under the null hypothesis.

One of the key strengths of the Shapiro–Wilk test is its superior statistical power compared with other normality tests, such as the Kolmogorov–Smirnov and Anderson–Darling tests, especially for detecting departures from normality in small to moderate sample sizes (typically n < 50). It excels at identifying non-normality due to asymmetry or heavy tails, making it a preferred choice in fields like biostatistics, engineering, and quality control, where parametric assumptions underpin analyses such as t-tests or analysis of variance. Empirical studies have confirmed its strong performance across various distributions, often outperforming competitors in power while maintaining appropriate type I error rates.

Despite its advantages, the Shapiro–Wilk test has limitations, including limited power for very small samples, where it may fail to detect non-normality reliably, and increased sensitivity in large samples (n > 200), where even minor deviations from perfect normality can lead to rejection of the null hypothesis for practically unimportant departures. The coefficient calculations are also computationally demanding for large n, though this is mitigated by statistical software such as R, SAS, and SPSS. For multivariate data or tied observations, extensions or alternative tests may be necessary.

Overview

Definition and Purpose

The Shapiro–Wilk test is a statistical hypothesis test designed to assess whether a given random sample is drawn from a normally distributed population. It operates by comparing the ordered sample values to the expected values of order statistics from a standard normal distribution, quantifying the degree of agreement through a single statistic, W. The method was introduced as an analysis-of-variance-based approach specifically for complete samples, making it particularly sensitive to deviations in skewness and kurtosis.

The primary purpose of the Shapiro–Wilk test is to serve as a diagnostic tool in statistical analysis, verifying the normality assumption that underpins many parametric procedures, such as t-tests, analysis of variance (ANOVA), and linear regression. By identifying non-normal distributions early, it helps researchers decide whether to proceed with parametric methods or opt for non-parametric alternatives, robust procedures, or data transformations. The test is especially valuable for preliminary data screening in fields like biostatistics, engineering, and the social sciences, where normality is often assumed but rarely guaranteed.

At its core, the test evaluates two hypotheses: the null hypothesis (H₀) posits that the population from which the sample is drawn follows a normal distribution, while the alternative hypothesis (H₁) asserts that it does not. Rejection of H₀ occurs when the p-value falls below a chosen significance level (typically 0.05), indicating evidence of non-normality. Developed for univariate continuous data, the test performs optimally for small to moderate sample sizes ranging from n = 3 to 50, though extensions such as the Royston formulation allow reliable application up to n = 2000.

History and Development

The Shapiro–Wilk test was developed in 1965 by Samuel S. Shapiro, affiliated with the General Electric Company, and Martin B. Wilk, affiliated with Bell Telephone Laboratories, Inc., as a response to the limitations of prior tests, such as the chi-squared goodness-of-fit method, which often lacked power for small sample sizes. Their approach sought to create a more sensitive test by deriving an optimal linear combination of order statistics, leveraging the expected values, variances, and covariances of normal order statistics to better detect deviations from normality in complete samples.

The test was formally introduced in their seminal paper, "An Analysis of Variance Test for Normality (Complete Samples)," published in Biometrika, Volume 52, Issues 3–4, pages 591–611. In this work, Shapiro and Wilk presented the W statistic alongside precomputed coefficients and critical-value tables for sample sizes ranging from 3 to 50, a practical necessity given the computational limitations of the time, which precluded real-time calculation of the required normal order statistics.

Subsequent extensions addressed the original test's scope limitations. In 1982, J. P. Royston proposed an algorithm extending the test to larger samples, up to n = 2000, using approximations to maintain accuracy while enabling broader applicability. Royston further refined these approximations in 1992, providing efficient computational methods that facilitated the test's integration into statistical software packages and its widespread adoption by the 1990s.

Theoretical Foundation

Test Statistic

The Shapiro–Wilk test statistic, denoted W, quantifies the degree of normality in a sample by comparing the ordered observations to their expected values under a normal distribution. It is formally defined as W = \frac{\left( \sum_{i=1}^n a_i x_{(i)} \right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}, where x_{(1)} \leq \cdots \leq x_{(n)} are the ordered sample values, \bar{x} is the sample mean, and a_i (for i = 1, \dots, n) are predetermined constants derived from the properties of normal order statistics.

The derivation of W relies on the high correlation between the ordered sample from a normal population and the corresponding expected order statistics. Specifically, the coefficients a_i are the weights of the best linear unbiased estimator of the normal scale parameter based on the order statistics, normalized so that \sum_i a_i^2 = 1: \mathbf{a} = \mathbf{V}^{-1} \mathbf{m} / (\mathbf{m}^\top \mathbf{V}^{-1} \mathbf{V}^{-1} \mathbf{m})^{1/2}, where \mathbf{m} is the vector of expected values of the order statistics from a standard normal distribution, and \mathbf{V} is the covariance matrix of those order statistics. This construction ensures that the numerator is, up to a constant, the square of the best linear estimate of the standard deviation under normality, while the denominator is the total sum of squared deviations, maximizing the test's power to detect departures from normality. The a_i are precomputed for sample sizes up to n = 50 (and approximated for larger n) to facilitate practical application.

Under the null hypothesis of normality, W approaches 1, as the ordered sample closely matches the expected normal pattern; values substantially less than 1 indicate inconsistencies, such as skewness, heavy tails, or outliers, with W bounded between 0 and 1. The optimality of the a_i coefficients for the normal distribution enhances the test's sensitivity compared to moment-based methods, which rely on skewness or kurtosis estimates and are less efficient for small samples.
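To make the construction of the coefficients concrete, the following sketch (an illustration only, not the procedure used to produce the published tables) estimates \mathbf{m} and \mathbf{V} by simulating many sorted standard normal samples of a small size and then normalizes \mathbf{V}^{-1}\mathbf{m} as above; the sample size, simulation size, and random seed are arbitrary choices, and Python with NumPy is assumed.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 5, 200_000

    # Each row is one sorted sample of size n from the standard normal distribution.
    sims = np.sort(rng.standard_normal((reps, n)), axis=1)
    m = sims.mean(axis=0)           # estimated expected order statistics
    V = np.cov(sims, rowvar=False)  # estimated covariance matrix of the order statistics

    # a = V^{-1} m, normalized to unit length so that sum(a_i^2) = 1.
    c = np.linalg.solve(V, m)
    a = c / np.linalg.norm(c)

    print("estimated coefficients a:", np.round(a, 4))
    print("sum of a_i (antisymmetry check):", round(float(a.sum()), 4))

Because the expected order statistics are antisymmetric about zero, the resulting coefficients sum to approximately zero and mirror each other around the middle of the sample.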

Hypotheses and Assumptions

The Shapiro–Wilk test evaluates the hypothesis that a given sample originates from a normal distribution. The null hypothesis H_0 posits that the observations are drawn from a normal distribution with unspecified mean \mu and variance \sigma^2. The alternative hypothesis H_1 states that the sample does not come from such a distribution, indicating some form of deviation from normality without specifying the nature of the departure (e.g., skewness or excess kurtosis). The test rejects H_0 in favor of H_1 when the statistic provides sufficient evidence against normality at a chosen significance level.

For the test to be valid, the sample must consist of independent and identically distributed (i.i.d.) random variables from the target population, ensuring that the observations are randomly selected without systematic biases or dependencies. The data are required to be univariate and continuous, as the test relies on order statistics that assume distinct values; ties or heavily rounded values can distort the results, and missing values are not accommodated in the standard formulation. Additionally, the sample size n must be at least 3, with optimal performance and exact critical values available for n between 3 and 50; larger samples require approximations, and the test becomes highly sensitive to minor deviations; some implementations limit computation to n \leq 5000 (e.g., shapiro.test in R).

Under H_0, the test is distribution-free with respect to the specific parameters \mu and \sigma^2, as the sampling distribution of the test statistic depends only on normality, not on location or scale. However, the procedure assumes the absence of outliers arising from data-collection errors, as such anomalies can mimic non-normality; it is also particularly sensitive to violations of the independence assumption, which can arise from clustered or serially correlated data.

Implementation

Calculation Steps

The computation of the Shapiro–Wilk test statistic W proceeds through a series of steps that transform the original sample into ordered values and apply specialized coefficients to assess deviation from normality. These steps are designed for algorithmic implementation, though manual execution is practical only for small sample sizes because of the difficulty of generating the required coefficients; a worked sketch in code follows the list below.
  1. Sort the sample data: Arrange the n observations in non-decreasing order to obtain the ordered sample x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}. This step aligns the data with the expected order statistics under normality.
  2. Compute the sample mean and sum of squared deviations: Calculate the sample mean \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i and the total sum of squared deviations from the mean, \sum_{i=1}^n (x_i - \bar{x})^2, which serves as the denominator in the test statistic and measures the total sample variability.
  3. Obtain the coefficients a_i: The coefficients a_i (for i = 1, \dots, n) are determined from the expected values and covariance structure of order statistics from a standard normal distribution. For sample sizes n \leq 50, they can be sourced directly from the precomputed tables provided in the original formulation. For larger n, the coefficients must be calculated algorithmically, typically by solving for the vector a as a = V^{-1} m / \sqrt{m^\top V^{-1} V^{-1} m}, where m is the vector of expected normal order statistics and V is their covariance matrix; this process is computationally intensive for large n due to the O(n^3) cost of working with V directly, though optimizations such as factoring V = LL^\top and solving the resulting triangular systems reduce the cost substantially in practice.
  4. Calculate the test statistic W: Form the numerator as \left( \sum_{i=1}^n a_i x_{(i)} \right)^2 and divide by the sum of squared deviations computed in step 2 to yield W = \frac{\left( \sum_{i=1}^n a_i x_{(i)} \right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2}. The resulting W lies between 0 and 1, with values near 1 indicating closer conformity to normality. Note that the coefficients satisfy \sum_i a_i = 0, so the numerator can equivalently be written as \left( \sum_{i=1}^n a_i (x_{(i)} - \bar{x}) \right)^2.
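As a concrete illustration of these four steps, the following Python sketch (assuming NumPy and SciPy are available) computes W by hand for a toy sample of size three, where the only nonzero coefficient takes the tabulated value \sqrt{2}/2 \approx 0.7071, and cross-checks the result against scipy.stats.shapiro; the data values are invented for the example.

    import numpy as np
    from scipy import stats

    x = np.array([4.1, 5.6, 7.9])        # toy data, n = 3

    # Step 1: order the sample.
    x_ord = np.sort(x)

    # Step 2: sample mean and total sum of squared deviations (denominator of W).
    ss = np.sum((x - x.mean()) ** 2)

    # Step 3: coefficients for n = 3 (antisymmetric and summing to zero).
    a = np.array([-np.sqrt(0.5), 0.0, np.sqrt(0.5)])

    # Step 4: W = (sum a_i x_(i))^2 / sum (x_i - xbar)^2.
    w_manual = np.dot(a, x_ord) ** 2 / ss
    print("manual W:", round(float(w_manual), 4))

    # Cross-check against SciPy's implementation; the values should agree closely.
    print("scipy  W:", round(float(stats.shapiro(x).statistic), 4))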

Critical Values and Software

The Shapiro–Wilk test relies on precomputed critical values for the test statistic W to determine significance at common levels such as \alpha = 0.05 and 0.01, with tables available for sample sizes up to n = 50 as originally provided in the foundational work by Shapiro and Wilk. These tables include coefficients for computing W and corresponding critical thresholds, allowing manual comparison in which a value of W below the critical value indicates rejection of normality. For larger samples, exact critical values become impractical due to computational demands, so approximations or simulation-based methods are employed to derive p-values instead.

P-value computation involves comparing the observed W to its distribution under the null hypothesis of normality, using exact methods for very small n and approximations for larger samples. A key advancement is Royston's algorithm, which extends the test to n up to 5000 by approximating the null distribution of W through a normalizing transformation, enabling efficient calculation without exhaustive tabulation. Post-1995 refinements to this algorithm, implemented in various packages, support reliable p-values for samples as large as 5000 via these approximations. The decision rule is to reject the null hypothesis if the p-value is less than the chosen significance level \alpha, such as 0.05, indicating evidence of non-normality.

Implementations are widely available in statistical software, facilitating practical use. In R, the shapiro.test() function takes a numeric vector as input and returns the W statistic along with the p-value, with support for n between 3 and 5000 using Royston's method. In Python, scipy.stats.shapiro() from the SciPy library accepts an array-like input and outputs the test statistic and p-value; for samples larger than about 5000 it issues a warning that the p-value may not be accurate. SAS implements the test via PROC UNIVARIATE with the NORMAL option, providing W, the p-value, and diagnostic plots for datasets of varying sizes. In SPSS, the Shapiro–Wilk test is accessible through the Explore procedure or syntax for small to moderate samples (typically n < 50), outputting W and the p-value, but SPSS switches to alternative normality tests such as Kolmogorov–Smirnov for larger n.
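For instance, a minimal Python usage sketch with SciPy, applying the decision rule above to simulated data (the data and the 0.05 threshold are arbitrary illustrations):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    sample = rng.normal(loc=10.0, scale=2.0, size=40)   # simulated, roughly normal data

    result = stats.shapiro(sample)
    print(f"W = {result.statistic:.4f}, p = {result.pvalue:.4f}")

    # Reject normality only if the p-value falls at or below the significance level.
    alpha = 0.05
    print("reject normality" if result.pvalue <= alpha else "fail to reject normality")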

Interpretation and Application

Evaluating Results

The Shapiro-Wilk test evaluates normality through its test statistic W, which ranges from 0 to 1, and an associated p-value. A W value close to 1 is consistent with normality, as W reflects the agreement between the ordered sample and the expected normal order statistics. The p-value assesses the null hypothesis of normality: if p > α (commonly α = 0.05), one fails to reject the null hypothesis, suggesting the data appear normally distributed, whereas if p ≤ α, the null hypothesis is rejected, indicating significant deviation from normality and prompting consideration of non-parametric alternatives.

Decision rules for interpreting results emphasize integrating the test with visual diagnostics for robust confirmation. For instance, a Q-Q plot should be examined alongside the test output: points aligning closely with the reference line support normality when combined with a non-significant test result, while deviations in the plot may highlight issues even if the p-value is borderline. This combined approach mitigates reliance on the test alone, particularly for small samples where power may be limited.

In practice, the Shapiro-Wilk test is frequently applied as a pre-test for normality in parametric methods, for example to verify assumptions in regression by assessing residuals, or before hypothesis tests such as t-tests on biological or experimental data. In analyzing residuals from a linear regression, a non-significant result allows proceeding with ordinary least squares estimation, whereas rejection might lead to transformations or robust alternatives. Similarly, for a sample of 20 exam scores yielding W = 0.95 and p = 0.12 (at α = 0.05), normality is assumed, permitting the use of a t-test for comparing group means.

The test controls the Type I error rate at the chosen α level under the null hypothesis of normality. When it is applied repeatedly across multiple datasets or variables, adjustments such as the Bonferroni correction (dividing α by the number of tests) are recommended to maintain overall error control and reduce false positives.
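The following Python sketch illustrates this Bonferroni-adjusted screening across several variables; the variable names and simulated data are invented for the example, and SciPy is assumed.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    variables = {
        "height": rng.normal(170, 8, 60),
        "income": rng.lognormal(10, 0.5, 60),   # deliberately skewed
        "score": rng.normal(75, 10, 60),
    }

    alpha = 0.05
    alpha_adj = alpha / len(variables)  # Bonferroni: divide alpha by the number of tests

    for name, values in variables.items():
        w, p = stats.shapiro(values)
        decision = "reject normality" if p <= alpha_adj else "fail to reject"
        print(f"{name:>6}: W = {w:.3f}, p = {p:.4f} -> {decision}")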

Limitations and Considerations

The Shapiro–Wilk test is most reliable for small sample sizes, roughly 3 to 50 observations, where exact critical values and p-values can be computed without approximation. For samples larger than 50, the test relies on approximations; modern implementations, such as those in R and SciPy, use extensions valid up to n = 5000. However, as sample size increases beyond moderate levels (e.g., n > 200), the test becomes highly sensitive to even minor deviations from normality, frequently rejecting the null hypothesis for practically insignificant departures. For very large samples (n > 5000), visual methods or alternative tests such as the Kolmogorov–Smirnov test, which may be less sensitive, are often preferred.

The test exhibits high sensitivity to outliers, which can disproportionately influence the test statistic and lead to rejections of normality even when the bulk of the data is approximately normal. It is also notably affected by ties in the data; a substantial number of tied values often causes the test to reject the null hypothesis irrespective of the data's overall conformity to a normal distribution. While effective at detecting departures such as heavy tails or skewness, the test assumes independent and identically distributed observations without extreme anomalies that could mask such features.

As a univariate procedure, the Shapiro–Wilk test assesses normality only for a single variable and cannot directly evaluate multivariate distributions. Furthermore, it provides no insight into the specific nature of non-normality, such as whether deviations stem from skewness, kurtosis, or other characteristics. P-values from the test can become unreliable if key assumptions, such as independence of the observations, are violated.

Practical considerations include pairing the Shapiro–Wilk test with visual diagnostics, such as quantile-quantile (Q-Q) plots or histograms, to confirm findings and mitigate the risk of over-interpretation. In large samples, where the test's sensitivity may flag trivial issues, reliance on the test alone should be avoided, as many parametric methods remain robust to mild violations of normality.
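A small simulation can make this large-sample sensitivity tangible. The sketch below (illustrative settings only) applies the test to samples from a t distribution with 20 degrees of freedom, a mild and visually near-normal departure, and compares how often it is flagged at two sample sizes; exact rejection rates will vary with the settings chosen.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    reps, alpha = 200, 0.05

    for n in (100, 5000):
        rejections = sum(
            stats.shapiro(rng.standard_t(df=20, size=n)).pvalue <= alpha
            for _ in range(reps)
        )
        # The same mild departure tends to be flagged far more often at the larger n.
        print(f"n = {n:5d}: rejection rate ~ {rejections / reps:.2f}")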

Performance and Comparisons

The power of the Shapiro–Wilk test refers to the probability of correctly rejecting the null hypothesis of normality when the data are generated from a non-normal distribution. This property is typically evaluated through Monte Carlo simulations, which generate numerous samples from specified non-normal distributions and compute the rejection rate at a given significance level, such as α = 0.05, across thousands of replications. Such simulations have frequently shown that the Shapiro–Wilk test possesses high power among common normality tests for small sample sizes ranging from 3 to 50, though rankings vary across studies, with particularly strong performance against symmetric alternatives, including both short-tailed (platykurtic) and heavy-tailed distributions. For instance, in simulations using 10,000 replications and sample sizes of 5 to 100, the test demonstrated superior detection rates for symmetric short-tailed deviations compared to alternatives such as the Kolmogorov-Smirnov or Anderson-Darling tests.

Several factors influence the test's power, including sample size, with limited power for very small samples (n < 5) and a diminishing relative advantage for large samples (n > 100); it is therefore recommended primarily for small to moderate samples. Power also varies with the type of deviation, with good performance against both symmetric and asymmetric alternatives such as the exponential or chi-squared distributions in many cases. Quantitative results from these simulations indicate that, for moderate deviations from normality at n = 20 and α = 0.05, power ranges from roughly 0.20 to 0.53 depending on the alternative distribution, and approaches 1.0 for stronger deviations as sample size increases. Comparative studies further indicate that the Shapiro–Wilk test outperforms the chi-squared goodness-of-fit test for small samples against symmetric non-normal distributions, with higher power levels, though the exact differences vary. Exhaustive comparisons note this variability, with some tests, such as Hosking's, outperforming it in certain small-sample scenarios across distributions.
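A minimal version of such a power simulation is sketched below in Python; the alternative distributions, sample size, and replication count are illustrative choices rather than those of any particular published study, so the estimated rates are only indicative.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, reps, alpha = 20, 2000, 0.05

    # Alternatives to normality from which samples of size n are drawn.
    alternatives = {
        "uniform(0, 1)": lambda: rng.uniform(0.0, 1.0, n),
        "exponential(1)": lambda: rng.exponential(1.0, n),
    }

    for name, draw in alternatives.items():
        rejections = sum(stats.shapiro(draw()).pvalue <= alpha for _ in range(reps))
        print(f"{name:>15}: estimated power = {rejections / reps:.2f}")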

Comparisons to Other Tests

The Shapiro–Wilk test demonstrates superior power compared to the Kolmogorov–Smirnov test for detecting non-normality, especially in small sample sizes (n < 50), because of its optimal use of sample order statistics in comparing observed data against expected normal quantiles. The Kolmogorov–Smirnov test, which assesses the maximum difference between the empirical and theoretical cumulative distribution functions, is better suited to larger samples with fully specified parameters, since it is distribution-free under those conditions; when parameters are estimated from the data, the Lilliefors variant with adjusted critical values is used instead.

Like the Shapiro–Wilk test, the Anderson–Darling test exhibits high power in simulation studies, but it assigns greater emphasis to deviations in the tails of the distribution through its weighted empirical distribution function, enhancing sensitivity to extreme values. The Shapiro–Wilk test is often preferred when exact critical values are wanted for small samples (n ≤ 50), where precomputed tables ensure accurate p-values without approximation.

The Jarque–Bera test, which evaluates normality through the sample skewness and kurtosis moments, shows lower power than the Shapiro–Wilk test for small to moderate samples (n < 50) but outperforms it for large samples (n > 200), leveraging asymptotic properties for efficient detection of moment-based deviations. Overall, simulation-based comparisons rank the Shapiro–Wilk test among the most powerful general-purpose normality tests across various distributions and sample sizes up to 2000, particularly recommending it for n < 50, while suggesting ensemble approaches that combine multiple tests to enhance detection reliability in practice.

Extensions and Approximations

Large Sample Approximations

The original Shapiro–Wilk test faces computational challenges for sample sizes beyond n = 50, primarily due to the need to invert a large covariance matrix to derive the optimal coefficients a_i. To address this, Royston (1982) developed an extension that uses polynomial approximations to the moments of the distribution of the W statistic, enabling reliable computation for n up to 2000. This method, implemented in Algorithm AS 181 (also Royston, 1982), estimates the a_i coefficients without performing a full matrix inversion, significantly improving efficiency for moderate to large samples.

For larger samples, asymptotic approximations of the null distribution become essential. Royston's implementation computes the p-value by transforming W to an approximately standard normal deviate, z \approx \frac{(1 - W)^{1/4} - \mu}{\sigma}, where \mu and \sigma are mean and standard deviation parameters approximated by polynomials in \log n; the p-value is then the upper-tail standard normal probability at z. This approach yields accurate p-values for n roughly between 100 and 2000.

Further refinements came with Royston (1992), who introduced a simpler, direct approximation for the a_i coefficients applicable to any sample size n ≥ 3, eliminating the need for precomputed tables or complex iterations and extending usability in software to samples as large as 5000.
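The normalizing-transformation idea can be illustrated numerically without Royston's published polynomial constants: under the null hypothesis, estimate the mean and standard deviation of the transformed statistic (1 - W)^{1/4} by simulation for the sample size at hand, then convert an observed W into a z-score and an upper-tail p-value. The sketch below does exactly that; the sample size, exponent, and simulation settings follow the formula quoted above and are purely illustrative, so this is a conceptual demonstration rather than the published algorithm.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    n, reps = 200, 2000

    # Null distribution of y = (1 - W)^(1/4), estimated by simulating normal samples.
    y_null = np.array([
        (1 - stats.shapiro(rng.standard_normal(n)).statistic) ** 0.25
        for _ in range(reps)
    ])
    mu, sigma = y_null.mean(), y_null.std(ddof=1)

    # Convert an observed W from a (here deliberately non-normal) sample of the same
    # size into a z-score; large y, i.e. small W, is evidence against normality.
    sample = rng.standard_t(df=4, size=n)
    w_obs = stats.shapiro(sample).statistic
    z = ((1 - w_obs) ** 0.25 - mu) / sigma
    print(f"W = {w_obs:.4f}, z = {z:.2f}, approximate p = {stats.norm.sf(z):.4g}")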

Multivariate and Other Variants

The multivariate generalization of the Shapiro–Wilk test proposed by Villaseñor Alva and González Estrada in 2009 extends the univariate W statistic to assess normality in p-dimensional data. The approach computes an empirical correlation between ordered transformed values of the sample observations (their squared Mahalanobis distances) and the expected ordered values derived from the chi-squared distribution with p degrees of freedom, yielding a test statistic that detects deviations from multivariate normality while preserving the power advantages of the original test. The method is particularly effective for moderate dimensions but demands careful handling of the sample covariance matrix to avoid singularity, typically requiring sample sizes substantially exceeding the dimension p for reliable performance.

A specialized variant for multivariate skew-normal distributions was introduced by González-Estrada, Villaseñor Alva, and Acosta-Pech in 2022, adapting the framework to test against skewed alternatives. This procedure involves an initial transformation of the data to approximate a normal form, followed by application of the generalized statistic to evaluate goodness-of-fit for the multivariate skew-normal model, offering improved detection of skewness in higher dimensions compared with standard normality tests.

Robust extensions address sensitivity to outliers, as demonstrated in Coin's 2008 modification inspired by the forward-search paradigm of Atkinson, Riani, and Cerioli. By incrementally incorporating observations while monitoring the Shapiro–Wilk statistic along the search trajectory, this variant isolates the influence of contaminants, enhancing robustness without sacrificing power against non-normal departures in clean data. Such approaches are valuable for contaminated datasets, where traditional tests may falsely reject normality because of isolated extreme values.

Other applications of Shapiro–Wilk variants include testing the normality of residuals in time-series analysis, where the test is applied after model fitting (e.g., ARIMA) to validate the assumption of Gaussian errors, and exploratory adaptations for circular data that project angular observations onto a linear scale before testing. These extensions maintain the test's efficiency for small to moderate samples but face challenges with high-dimensional or structured data.

Despite their conceptual strengths, multivariate and robust Shapiro–Wilk variants have seen limited adoption owing to computational demands, particularly in estimating Mahalanobis distances and order statistics for p > 5. Software implementations are available in R, such as the mvShapiroTest package for the 2009 generalization and the MVN package, which supports Royston's related multivariate extension based on Shapiro–Wilk principles.

References

  1. 7.2.1.3. Anderson-Darling and Shapiro-Wilk tests.
  2. Shapiro, S. S. & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611.
  3. Normality Tests for Statistical Analysis: A Guide for Non-Statisticians.
  4. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests (PDF).
  5. Descriptive Statistics and Normality Tests for Statistical Data (PMC).
  6. An Analysis of Variance Test for Normality (Complete Samples) (JSTOR).
  7. Normality Tests (Shapiro-Wilk, Shapiro-Francia, Royston). StatsDirect.
  8. An Analysis of Variance Test for Normality (Complete Samples) (PDF).
  9. Appendix D: Statistical Tables. Coefficients a_i for the Shapiro-Wilk test for normality (source: Shapiro and Wilk, 1965).
  10. Royston, P. (1992). Approximating the Shapiro-Wilk W-test for non-normality.
  11. 6.3 - Tests for Error Normality. STAT 462.
  12. Tables for Shapiro–Wilk W statistic according to Royston approximation (PDF).
  13. Shapiro-Wilk Table. Real Statistics Using Excel.
  14. Royston, J. P. (1982). Algorithm AS 181: The W Test for Normality. Applied Statistics.
  15. Shapiro-Wilk Normality Test. R documentation.
  16. shapiro.test function. RDocumentation.
  17. shapiro. SciPy v1.16.2 Manual.
  18. How to Perform a Shapiro-Wilk Test in SAS. Statology.
  19. SPSS Shapiro-Wilk Test: The Ultimate Guide.
  20. Shapiro-Wilk test is missing in SPSS. SPSS Statistics.
  21. Testing for Normality using SPSS Statistics.
  22. What should I check for normality: raw data or residuals? Cross Validated.
  23. Multiple comparisons with normality tests. Cross Validated.
  24. Royston, J. P. (1982). An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Applied Statistics, 31, 115–124.
  25. Assessing the Assumption of Normality.
  26. An Approximate Analysis of Variance Test for Normality.
  27. To test or not to test: Preliminary assessment of normality when ...
  28. Testing for normality in regression models: mistakes abound (but ...).
  29. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests.
  30. An Exhaustive Power Comparison of Normality Tests. MDPI.
  31. A comparison of normality testing methods by empirical power and ...
  32. Comparison of Some Common Tests for Normality.
  33. Razali, N. & Wah, Y. (2011). Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics.
  34. Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov and Jarque-Bera tests.
  35. Remark on Algorithm AS 181: The W-Test for Normality.
  36. A Generalization of Shapiro–Wilk's Test for Multivariate Normality.
  37. Testing normality in the presence of outliers.