
Chi-squared test

The Chi-squared test, also known as Pearson's chi-squared test, is a non-parametric statistical hypothesis test that determines whether there is a significant association between categorical variables or whether observed categorical data frequencies deviate substantially from those expected under a specified theoretical distribution. Developed by the mathematician Karl Pearson in 1900, it provides a criterion for evaluating the fit of sample data to a theoretical model without assuming normality, marking a foundational advancement in modern statistics. The test computes a statistic based on the formula \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, where O_i represents observed frequencies and E_i expected frequencies; under the null hypothesis this statistic approximately follows a chi-squared distribution for large sample sizes. Pearson's innovation addressed the need for a general method to test deviations in correlated systems of variables, originating from his work on random sampling and probable errors in biological and social data. Initially introduced for goodness-of-fit analysis—assessing whether data conform to a hypothesized distribution such as the normal or binomial—the test has since expanded to include the test of independence, which examines associations between two categorical variables in a contingency table, and the test of homogeneity, which compares distributions across multiple populations. For the independence test, degrees of freedom are calculated as (r-1)(c-1), where r and c are the number of rows and columns in the table, enabling p-value computation to reject or retain the null hypothesis of no association. Key assumptions include random sampling, independence of observations, and sufficiently large expected frequencies (typically at least 5 per cell to ensure the chi-squared approximation holds, as per Cochran's rule). Violations, such as small sample sizes, may necessitate alternatives like Fisher's exact test. Widely used in fields like biology, sociology, and medicine for analyzing survey data, genetic inheritance, and clinical trials, the chi-squared test remains a cornerstone of categorical data analysis due to its simplicity and robustness.

Introduction

Definition and Purpose

The chi-squared test is a statistical hypothesis test that employs the chi-squared statistic to assess the extent of discrepancies between observed frequencies and expected frequencies in categorical data. It evaluates whether these differences are likely due to random variation or indicate a significant deviation from the null hypothesis. Under the null hypothesis, the statistic follows an asymptotic chi-squared distribution, allowing for the computation of p-values to determine statistical significance. The primary purposes of the chi-squared test are to examine associations between two or more categorical variables in contingency tables and to test the goodness-of-fit of observed data to a specified theoretical distribution. In the test of independence, it determines whether the distribution of one variable depends on the levels of another, such as assessing associations in survey responses across demographic groups. For goodness-of-fit, it verifies whether empirical frequencies align with expected proportions under models like uniformity or specific probability distributions. The test statistic is given by \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, where O_i represents the observed frequencies and E_i the expected frequencies for each category i. Developed in early 20th-century statistics for analyzing categorical data, the chi-squared test is non-parametric, imposing no assumptions on the underlying distribution of the data itself, but relying on the asymptotic chi-squared distribution of the statistic under the null hypothesis. This makes it versatile for applications where data are counts or proportions without normality requirements.
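As a minimal numerical illustration of this formula (the counts below are made up for demonstration), the statistic can be evaluated directly:
python
import numpy as np

# Hypothetical observed and expected frequencies for four categories
observed = np.array([18, 22, 30, 30])
expected = np.array([25, 25, 25, 25])

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all categories
chi2_stat = np.sum((observed - expected) ** 2 / expected)
print(chi2_stat)  # 4.32 for these made-up counts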

Assumptions and Prerequisites

The chi-squared test requires that observations are independent, meaning each data point is collected without influencing others, to ensure the validity of the underlying sampling model. This assumption holds when the sample is drawn as a simple random sample from the population, avoiding any systematic dependencies or clustering in the data. Additionally, the test is designed for categorical data, where variables are discrete or binned into mutually exclusive categories, rather than continuous measurements that have not been discretized. A critical assumption concerns sample size adequacy: the expected frequencies in at least 80% of the cells should be 5 or greater, with no expected frequencies less than 1, to justify the asymptotic approximation to the chi-squared distribution under the null hypothesis. Violations of this rule, particularly in small samples, can lead to unreliable p-values, necessitating alternatives such as exact tests like Fisher's exact test. Prior to applying the chi-squared test, users should possess foundational knowledge in probability and statistics, including concepts like expected values and distributions, as well as the hypothesis testing framework—encompassing null and alternative hypotheses, test statistics, and interpretation of p-values at chosen significance levels (e.g., α = 0.05). These prerequisites enable proper setup of the test for applications such as assessing independence in contingency tables.
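A quick way to screen data against the sample-size guideline above is to compute the expected counts from the table margins and check the 80%-of-cells-at-least-5 and none-below-1 conditions. The sketch below uses hypothetical counts and plain NumPy:
python
import numpy as np

# Hypothetical 2x3 table of observed counts
observed = np.array([[12, 5, 3],
                     [20, 9, 6]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()

# Cochran-style screening: at least 80% of cells with E >= 5, none below 1
prop_at_least_5 = np.mean(expected >= 5)
print(expected.round(2))
print(f"cells with E >= 5: {prop_at_least_5:.0%}, minimum E: {expected.min():.2f}")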

Historical Development

Karl Pearson's Formulation

In 1900, Karl Pearson introduced the chi-squared test in a paper published in the Philosophical Magazine, presenting it as a criterion to determine whether observed deviations from expected probabilities in a system of correlated variables could reasonably be ascribed to random sampling. This formulation addressed key limitations in prior approaches to analyzing categorical data, particularly in biological contexts where earlier methods struggled to quantify discrepancies between empirical observations and theoretical expectations. Pearson's work was motivated by the need for a robust tool to evaluate patterns in genetics, building on challenges posed by datasets like those from Gregor Mendel's experiments on pea plant inheritance, which highlighted inconsistencies in fitting discrete distributions to observed frequencies. Pearson derived the test statistic as a sum of squared deviations between observed and expected frequencies, divided by the expected frequencies to account for varying scales across categories; this measure captured the overall discrepancy in a single value, inspired by the summation of squared standardized normals from multivariate normal theory. He symbolized the statistic with the Greek letter χ²—pronounced "chi-squared"—reflecting its connection to the squared form of the character χ, a notation that has persisted in statistical literature. Initially, Pearson applied the test to biological data on inheritance patterns, such as trait ratios in genetic crosses, enabling researchers to assess whether empirical results aligned with hypothesized Mendelian proportions under random variation. A pivotal aspect of Pearson's contribution was establishing the asymptotic distribution of the χ² statistic under the null hypothesis of good fit, linking it to a chi-squared distribution with k degrees of freedom, where k equals the number of categories minus the number of parameters estimated from the data. This theoretical foundation allowed for probabilistic inference, with larger values of the statistic indicating poorer fit and lower probabilities of the observed deviations arising by chance alone. By formalizing this approach, Pearson provided the first systematic method for goodness-of-fit testing in categorical settings, profoundly influencing the development of modern hypothesis testing in statistics and beyond.

Subsequent Contributions and Naming

Following Karl Pearson's initial formulation, Ronald A. Fisher advanced the chi-squared test in the 1920s by rigorously establishing its asymptotic distribution under the null hypothesis and extending its application to testing independence in contingency tables. In his 1922 paper, Fisher derived the appropriate degrees of freedom for the test of independence—(r-1)(c-1) for an r × c contingency table—correcting earlier inconsistencies in Pearson's approach and enabling more accurate calculations for assessing deviations from independence. The nomenclature distinguishes "Pearson's chi-squared test" as the statistical procedure itself, crediting its originator, from the "chi-squared distribution," which describes the limiting distribution of the test statistic. This distinction arises from Pearson's adoption of the symbol χ² (chi squared) for the statistic, while Fisher provided the foundational proof of its convergence to the chi-squared distribution with the correct degrees of freedom, solidifying the theoretical basis. In the 1930s, the chi-squared test became integrated into the Neyman-Pearson framework for hypothesis testing, which emphasized specifying alternative hypotheses, controlling both Type I and Type II error rates, and using p-values to quantify evidence against the null hypothesis. This incorporation elevated the test's role in formal inferential procedures, aligning it with broader developments in statistical inference. By the 1930s, the chi-squared test had achieved widespread recognition in genetics, as seen in Fisher's 1936 application to evaluate the goodness-of-fit of Gregor Mendel's experimental ratios to Mendelian expectations, revealing improbably precise results suggestive of data adjustment. In social sciences, it facilitated analysis of associations in categorical survey data, with standardization occurring through its prominent inclusion in influential textbooks like E.F. Lindquist's 1940 Statistical Analysis in Educational Research, which exemplified its use in fields such as education and psychology.

The Pearson Chi-squared Statistic

Mathematical Formulation

The chi-squared test evaluates hypotheses concerning the distribution of categorical data. The null hypothesis H_0 asserts that the observed frequencies conform to expected frequencies under a specified theoretical distribution (goodness-of-fit test) or that categorical variables are independent (test of independence), while the alternative hypothesis H_A posits deviation from this fit or presence of dependence. The Pearson chi-squared statistic is given by \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, where the sum is over all categories i, O_i denotes the observed frequency in category i, and E_i is the expected frequency under H_0. This formulation measures the discrepancy between observed and expected values, normalized by the expected frequencies to account for varying category sizes. The statistic was originally proposed by Karl Pearson in 1900 as a measure of goodness of fit for frequency distributions. The statistic derives from the multinomial likelihood under H_0, where the data follow a multinomial distribution with probabilities yielding the expected frequencies E_i. The log-likelihood ratio test statistic G^2 = 2 \sum_i O_i \log(O_i / E_i) provides an alternative measure, but for large samples a second-order (quadratic) Taylor expansion of the log-likelihood around the null yields the Pearson form \chi^2 asymptotically. For the goodness-of-fit test, the expected frequencies are E_i = n p_i, where n is the sample size and p_i are the theoretical probabilities for each category under H_0. In the test of independence for an r \times c contingency table, the expected frequency for cell (i,j) is E_{ij} = (r_i c_j) / N, where r_i is the marginal total for row i, c_j the marginal total for column j, and N the grand total. Under H_0 and large sample sizes, \chi^2 approximately follows a chi-squared distribution with the appropriate degrees of freedom, and the p-value is P(\chi^2_{df} > \chi^2_{\text{obs}}), where \chi^2_{\text{obs}} is the computed value of the statistic.
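The relationship between the Pearson statistic and the likelihood-ratio statistic G^2 can be seen numerically. The sketch below, using made-up counts, computes both for a 2×3 table from the expected frequencies E_{ij} = r_i c_j / N:
python
import numpy as np

observed = np.array([[30, 10, 20],
                     [20, 25, 15]], dtype=float)  # hypothetical counts

# Expected frequencies under independence from the margins
N = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / N

# Pearson chi-squared and likelihood-ratio (G^2) statistics
pearson = np.sum((observed - expected) ** 2 / expected)
g2 = 2 * np.sum(observed * np.log(observed / expected))

print(f"Pearson chi-squared: {pearson:.3f}")
print(f"Likelihood ratio G^2: {g2:.3f}")  # close to the Pearson value in large samples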

Asymptotic Properties and Degrees of Freedom

Under the null hypothesis and for sufficiently large sample sizes, the Pearson chi-squared statistic converges in distribution to a central chi-squared distribution with a specified number of degrees of freedom, providing the theoretical basis for significance testing. This asymptotic property, established by Pearson in his foundational work, allows the use of chi-squared critical values to assess the significance of observed deviations from expected frequencies. The degrees of freedom for the statistic depend on the test context. In the test of independence for an r \times c contingency table, the degrees of freedom are (r-1)(c-1), reflecting the number of independent cells after accounting for row and column marginal constraints. For the goodness-of-fit test involving k categories where the expected frequencies are fully specified, the degrees of freedom are k - 1; if m parameters of the hypothesized distribution are estimated from the data, this adjusts to k - 1 - m. This asymptotic chi-squared distribution arises from the central limit theorem applied to the multinomial sampling model underlying the test. Under the null hypothesis, the standardized differences (O_i - E_i)/\sqrt{E_i} for each category i are approximately standard normal for large expected frequencies E_i, so their squares sum approximately to a chi-squared random variable, with the marginal constraints reducing the effective degrees of freedom. To conduct the test, the observed chi-squared statistic is compared to the critical value \chi^2_{\alpha, df} from the chi-squared distribution with df degrees of freedom at significance level \alpha; the null hypothesis is rejected if the statistic exceeds this value. Critical values are available in standard tables or computed via statistical software functions. The validity of the chi-squared approximation strengthens as the expected frequencies increase, typically recommended to be at least 5 in most cells to ensure reliable inference.
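Degrees of freedom, critical values, and p-values can be obtained from the chi-squared distribution in software; the sketch below uses scipy.stats.chi2 with hypothetical numbers:
python
from scipy.stats import chi2

r, c = 3, 4                      # rows and columns of a hypothetical table
df = (r - 1) * (c - 1)           # degrees of freedom for a test of independence
alpha = 0.05

critical_value = chi2.ppf(1 - alpha, df)   # reject H0 if the statistic exceeds this
observed_stat = 14.2                       # hypothetical computed chi-squared value
p_value = chi2.sf(observed_stat, df)       # upper-tail probability

print(f"df = {df}, critical value = {critical_value:.3f}, p = {p_value:.4f}")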

Primary Applications

Test of Independence for Categorical Data

The chi-squared test of independence assesses whether there is a statistically significant association between two categorical variables, using an r × c contingency table that displays observed frequencies O_{ij} for each combination of row category i (where i = 1 to r) and column category j (where j = 1 to c). This setup arises from cross-classifying a sample of N observations into the table cells based on their values for the two variables. The null hypothesis H_0 posits that the row variable and column variable are independent, implying that the distribution of one variable does not depend on the levels of the other; the alternative hypothesis H_a suggests dependence or association between them. Under the null hypothesis, expected frequencies E_{ij} for each cell are computed as the product of the row total for i and the column total for j, divided by the overall sample size N: E_{ij} = \frac{(\sum_j O_{ij}) \times (\sum_i O_{ij})}{N}. These expected values represent what would be anticipated if the variables were truly independent, preserving the marginal totals of the observed table. The test then evaluates deviations between observed and expected frequencies using the Pearson chi-squared statistic, which approximately follows a chi-squared distribution with (r-1)(c-1) degrees of freedom for sufficiently large samples (typically when all expected frequencies exceed 5). Interpretation involves computing the p-value from the chi-squared distribution of the test statistic; if the p-value is less than the chosen significance level α (commonly 0.05), the null hypothesis is rejected in favor of the alternative, indicating evidence of dependence between the variables. To quantify the strength of any detected association beyond mere statistical significance, measures such as Cramér's V can be applied, defined as the square root of the chi-squared statistic divided by N times the minimum of (r-1) and (c-1), yielding a value between 0 (no association) and 1 (perfect association). This test is particularly common in analyzing survey data, such as examining the relationship between a demographic characteristic (rows) and a stated preference (columns) in opinion studies. If the test rejects independence, post-hoc analysis of cell contributions aids in identifying which specific combinations drive the result. Pearson residuals, calculated as (O_{ij} - E_{ij}) / \sqrt{E_{ij}}, highlight deviations; residuals with absolute values exceeding about 2 (corresponding to a roughly 5% tail probability under the null) suggest cells where observed frequencies differ markedly from expectations, signaling localized associations. These residuals follow an approximate standard normal distribution under the null, facilitating targeted interpretation while accounting for varying expected cell sizes.
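The association measure and cell-level diagnostics described above follow directly from the observed and expected counts. The sketch below, with hypothetical survey counts, derives Cramér's V and the Pearson residuals:
python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[25, 30, 45],
                     [35, 30, 35]])          # hypothetical survey cross-tabulation

chi2_stat, p, df, expected = chi2_contingency(observed, correction=False)

# Cramer's V: sqrt(chi^2 / (N * min(r-1, c-1)))
N = observed.sum()
r, c = observed.shape
cramers_v = np.sqrt(chi2_stat / (N * min(r - 1, c - 1)))

# Pearson residuals highlight which cells deviate most from independence
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2_stat:.3f}, p = {p:.3f}, Cramer's V = {cramers_v:.3f}")
print(residuals.round(2))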

Goodness-of-Fit Test

The chi-squared goodness-of-fit test evaluates whether the distribution of observed categorical data aligns with a predefined theoretical distribution, providing a measure of discrepancy between observed and expected frequencies. This test is particularly valuable when assessing whether sample outcomes conform to expected probabilities derived from theoretical models, such as uniform, binomial, or multinomial distributions. Introduced as part of the broader chi-squared framework by Karl Pearson, it serves as a foundational tool in statistics for distribution validation. In the standard setup, the data is partitioned into k mutually exclusive categories, yielding observed counts O_i for each category i = 1, 2, ..., k. The corresponding expected counts are then calculated as E_i = n p_i, where n is the total sample size and p_i represents the theoretical probability for category i, often set to 1/k for uniformity or derived from a parametric model such as the binomial or Poisson distribution. For instance, in testing dice fairness, each of the six faces would have an expected probability of 1/6 under the assumption of uniformity. The test proceeds by computing the chi-squared statistic from these frequencies, as detailed in the mathematical formulation section. The null hypothesis (H_0) asserts that the observed data arises from the specified theoretical distribution, implying no significant deviation between observed and expected frequencies. The alternative hypothesis (H_a) posits that the data does not follow this distribution, indicating a mismatch that could arise from systematic biases or non-random processes. Unique applications include verifying uniformity in random number generators or gaming devices like dice, as well as checking adherence to multinomial models in fields such as genetics or market research. When the theoretical probabilities p_i involve parameters estimated directly from the sample data—such as the mean in a Poisson fit—the degrees of freedom must be adjusted to account for this estimation, given by df = k - 1 - m, where m is the number of parameters fitted. This adjustment ensures the test's validity by reducing the effective freedom to reflect the information used in parameter estimation. The chi-squared goodness-of-fit test is commonly applied in quality control to assess process consistency, such as verifying that defect rates or product categorizations match expected distributional norms in manufacturing.
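For a uniform null hypothesis such as the dice example, the computation can be sketched with scipy.stats.chisquare, which defaults to equal expected probabilities and whose ddof argument can account for estimated parameters (counts below are hypothetical):
python
from scipy.stats import chisquare

# Hypothetical counts from 60 rolls of a six-sided die
observed = [8, 12, 9, 13, 7, 11]

# Null hypothesis: all six faces equally likely (expected 10 each)
result = chisquare(observed)          # uniform expected frequencies by default
print(result.statistic, result.pvalue)

# If m parameters had been estimated from the data, ddof=m reduces the
# degrees of freedom from k - 1 to k - 1 - m when computing the p-value
result_adjusted = chisquare(observed, ddof=1)
print(result_adjusted.pvalue)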

Computational Methods

Step-by-Step Calculation

To perform a manual calculation of the chi-squared test statistic, begin by organizing the data into a contingency table for tests of independence or a frequency table for goodness-of-fit tests, recording the observed frequencies O_i in each cell or category. Next, compute the expected frequencies E_i for each cell or category, which depend on the specific application: for a test of independence in categorical data, these are derived from the marginal totals and overall sample size as outlined in the mathematical formulation; for a goodness-of-fit test, they are obtained by multiplying the total sample size by the hypothesized proportions for each category. Then, calculate the test statistic using the formula \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, where the sum is taken over all cells or categories, providing a measure of deviation between observed and expected frequencies. Determine the degrees of freedom (df) based on the test type—for independence, df = (r - 1)(c - 1) where r is the number of rows and c is the number of columns in the contingency table; for goodness-of-fit, df = k - 1 - m where k is the number of categories and m is the number of parameters estimated from the data—and use a chi-squared distribution table or software to find the critical value at a chosen significance level (e.g., α = 0.05) or the p-value, referencing the asymptotic properties of the statistic for large samples. Compare the computed \chi^2 to the critical value: if \chi^2 exceeds the critical value (or if the p-value < α), reject the null hypothesis H_0 of independence or good fit; otherwise, fail to reject it. Report the results in the standard format, such as "\chi^2(df) = value, p = value," to summarize the finding. Note that the chi-squared approximation is reliable only when expected frequencies meet certain conditions, such as E_i \geq 5 in at least 80% of cells with no E_i < 1, as smaller values can lead to inaccurate p-values and may require alternative tests.
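The manual steps above can be mirrored in code. This sketch works through a small hypothetical 2×2 independence test from observed counts to the reported result:
python
import numpy as np
from scipy.stats import chi2

# Step 1: organize observed frequencies into a contingency table (hypothetical data)
observed = np.array([[24, 16],
                     [18, 32]], dtype=float)

# Step 2: expected frequencies from the marginal totals
N = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / N

# Step 3: Pearson chi-squared statistic
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# Step 4: degrees of freedom and asymptotic p-value
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_stat, df)

# Steps 5-6: compare to alpha and report in the standard format
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi2({df}) = {chi2_stat:.2f}, p = {p_value:.4f} -> {decision}")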

Software and Implementation Notes

The chi-squared test is implemented in various statistical software packages, facilitating both tests of independence and goodness-of-fit. In R, the chisq.test() function from the base stats package handles both types of tests on count data. For a contingency table test of independence, users provide a matrix of observed frequencies, optionally applying Yates's continuity correction via the correct parameter (default: TRUE for 2x2 tables). For goodness-of-fit, the function accepts a vector of observed counts and a vector of expected probabilities via the p parameter. An example for independence is:
r
observed <- matrix(c(10, 20, 30, 40), nrow=2)
result <- chisq.test(observed, correct=FALSE)
print(result)
This outputs the chi-squared statistic, p-value, degrees of freedom, and components like observed and expected counts. In Python, the SciPy library provides scipy.stats.chi2_contingency() for tests of independence on contingency tables, returning the statistic, p-value, degrees of freedom, and expected frequencies. The function applies Yates's correction by default but allows disabling it; since version 1.11.0, a method parameter supports Monte Carlo simulation or permutation tests for improved accuracy with small samples. For goodness-of-fit, scipy.stats.chisquare() compares observed frequencies to expected ones under the null hypothesis of equal probabilities (or user-specified via f_exp). An example for independence is:
python
import numpy as np
from scipy.stats import chi2_contingency
observed = np.array([[10, 20], [30, 40]])
stat, p, dof, expected = chi2_contingency(observed)
print(f'Statistic: {stat}, p-value: {p}')
This assumes observed and expected frequencies are at least 5 for asymptotic validity. SPSS implements the test through the Crosstabs procedure (Analyze > Descriptive Statistics > Crosstabs) for independence, where users select row and column variables, then enable Chi-square under Statistics; it outputs the statistic, p-value, and optionally residuals. For goodness-of-fit, use Nonparametric Tests > Legacy Dialogs > Chi-Square, specifying test proportions. The software requires expected frequencies ≥1 with no more than 20% of cells <5. In Microsoft Excel, the CHISQ.TEST() function computes the p-value for a goodness-of-fit or independence test by comparing actual and expected ranges; the right-tail probability for a given statistic and degrees of freedom can be obtained separately with CHISQ.DIST.RT(). For example: =CHISQ.TEST(A1:B2, C1:D2) where A1:B2 holds observed and C1:D2 expected values. Software implementations often issue warnings for low expected counts, as the chi-squared approximation may be unreliable if more than 20% of cells have expected frequencies <5 or any <1. In R, chisq.test() explicitly warns if expected values are <5. Similarly, SciPy notes potential inaccuracy for small frequencies and recommends alternatives like exact tests. Residuals, useful for identifying influential cells, are accessible in outputs: R provides Pearson residuals (result$residuals) and standardized residuals (result$stdres); SciPy allows computation from returned expected frequencies as (observed - expected) / sqrt(expected); SPSS includes them in Crosstabs tables when selected. As of 2025, modern software includes simulation-based options such as Monte Carlo resampling or bootstrapping for p-values in small samples to enhance accuracy beyond the asymptotic approximation. In R, set simulate.p.value=TRUE with B replicates for simulated p-values. SciPy's chi2_contingency supports Monte Carlo and permutation methods via its method parameter. The SPSS Exact Tests module offers simulation for exact p-values in the Crosstabs dialog. These approaches resample the data to estimate the null distribution, mitigating issues with low counts.
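The resampling idea behind these simulation options can also be sketched by hand: expand the table into individual observations, repeatedly shuffle one variable's labels to mimic independence, and compare the resulting chi-squared values to the observed one. The code below is a rough Monte Carlo illustration with hypothetical counts, not a substitute for the built-in routines:
python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
observed = np.array([[8, 3], [2, 7]])          # small hypothetical 2x2 table

def pearson_stat(table):
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return np.sum((table - expected) ** 2 / expected)

# Expand the table into paired row/column labels, one pair per observation
rows, cols = np.nonzero(observed >= 0)
row_labels = np.repeat(rows, observed.ravel())
col_labels = np.repeat(cols, observed.ravel())

obs_stat = pearson_stat(observed)
n_sims, exceed = 5000, 0
for _ in range(n_sims):
    shuffled = rng.permutation(col_labels)     # break any row/column association
    table = np.zeros_like(observed)
    np.add.at(table, (row_labels, shuffled), 1)
    if pearson_stat(table) >= obs_stat:
        exceed += 1

print(f"permutation p-value ~ {exceed / n_sims:.3f}")
print(f"asymptotic p-value  = {chi2_contingency(observed, correction=False)[1]:.3f}")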

Yates's Correction for Continuity

Yates's correction for continuity is a modification to the standard Pearson chi-squared statistic designed specifically for 2×2 contingency tables involving small sample sizes. Introduced by Frank Yates in 1934, it adjusts the test to better account for the discrete nature of categorical count data when approximating the continuous chi-squared distribution, thereby improving the accuracy of the p-value estimation. The corrected statistic is computed as \chi^2 = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i}, where O_i denotes the observed frequency in cell i, E_i the expected frequency under the null hypothesis of independence, and the subtraction of 0.5 serves as the continuity correction to mitigate the discontinuity between discrete observations and the continuous approximation. This adjustment reduces the value of the chi-squared statistic compared to the uncorrected version, making it less likely to reject the null hypothesis and thus lowering the risk of Type I error inflation in small samples. The correction is recommended for application in 2×2 tables when all expected cell frequencies are at least 1 and at least one is less than 5, as these conditions indicate potential inadequacy of the chi-squared approximation without adjustment. However, Yates explicitly advised against its use for tables larger than 2×2, where the correction has minimal impact and may unnecessarily complicate computations. Despite its historical utility, the routine use of Yates's correction remains debated among statisticians, with critics arguing that it is overly conservative, particularly in modern contexts where exact tests are computationally feasible, potentially reducing statistical power without substantial benefits in controlling error rates. Influential analyses, such as those by Agresti, highlight that the correction is often unnecessary given advancements in exact methods and simulation-based approaches.
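The effect of the correction is easy to see numerically; the sketch below computes the uncorrected and Yates-corrected statistics for a hypothetical small 2×2 table:
python
import numpy as np
from scipy.stats import chi2

observed = np.array([[7, 2],
                     [3, 8]], dtype=float)    # hypothetical 2x2 counts

expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

uncorrected = np.sum((observed - expected) ** 2 / expected)
yates = np.sum((np.abs(observed - expected) - 0.5) ** 2 / expected)

df = 1
print(f"uncorrected: chi2 = {uncorrected:.3f}, p = {chi2.sf(uncorrected, df):.4f}")
print(f"Yates:       chi2 = {yates:.3f}, p = {chi2.sf(yates, df):.4f}")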

Fisher's Exact Test and Binomial Test as Alternatives

Fisher's exact test provides an exact alternative to the chi-squared test of independence for 2×2 contingency tables, particularly when sample sizes are small and the chi-squared approximation may be unreliable. Developed by Ronald A. Fisher, the test computes the probability of observing the given table (or one more extreme) under the null hypothesis of independence, assuming fixed marginal totals, using the hypergeometric distribution. The p-value is obtained by summing the hypergeometric probabilities of all tables with the same margins that are as or less probable than the observed table. This test is especially recommended for 2×2 tables where one or more expected cell frequencies are less than 5, as the chi-squared test's asymptotic approximation performs poorly in such cases, potentially leading to inaccurate p-values. Computationally, Fisher's exact test traditionally relies on enumerating all possible tables consistent with the fixed margins, though for larger tables this becomes intensive; modern implementations use efficient network algorithms to optimize the summation over the probability space. Despite these challenges for tables beyond 2×2, the test is routinely available in statistical software for practical use. For even simpler cases, such as testing a single proportion in a 2×1 table (e.g., comparing observed successes to an expected rate under the null), the binomial test serves as an exact alternative. This test evaluates deviations from the hypothesized proportion using the exact binomial distribution, calculating the p-value as the cumulative probability of outcomes as extreme as or more extreme than observed. Like Fisher's exact test, it is preferred when expected counts are small (e.g., fewer than 5 successes or failures), avoiding reliance on normal approximations inherent in large-sample methods. The p-value is computed directly from the binomial cumulative distribution function, which is straightforward and efficient even for moderate sample sizes.
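Both exact alternatives are available in SciPy; the sketch below applies fisher_exact and binomtest to hypothetical small-sample data:
python
import numpy as np
from scipy.stats import fisher_exact, binomtest

# Fisher's exact test on a hypothetical 2x2 table with small counts
table = np.array([[1, 9],
                  [11, 3]])
odds_ratio, p_fisher = fisher_exact(table, alternative='two-sided')
print(f"Fisher's exact test: p = {p_fisher:.4f}")

# Exact binomial test: 3 successes in 12 trials against a hypothesized rate of 0.5
result = binomtest(k=3, n=12, p=0.5, alternative='two-sided')
print(f"Binomial test: p = {result.pvalue:.4f}")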

Chi-squared Test for Variance

Formulation for Normal Populations

The chi-squared test for variance assesses whether the variance of a normally distributed population equals a specified hypothesized value, applying specifically to continuous data rather than the categorical data addressed by the tests of independence or goodness-of-fit. This test evaluates the null hypothesis H_0: \sigma^2 = \sigma_0^2, where \sigma^2 is the population variance and \sigma_0^2 is the hypothesized value. The test statistic is formulated as \chi^2 = \frac{(n-1) s^2}{\sigma_0^2}, where n denotes the sample size and s^2 represents the sample variance, calculated as s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 with \bar{x} as the sample mean. Assuming the population is normally distributed, under the null hypothesis this statistic follows an exact chi-squared distribution with n-1 degrees of freedom, eliminating the need for asymptotic approximations unlike in categorical applications. For hypothesis testing, a two-sided alternative (H_a: \sigma^2 \neq \sigma_0^2) rejects H_0 if the p-value is below the chosen significance level or, equivalently, if the test statistic falls outside the critical values from the chi-squared distribution; one-sided alternatives (H_a: \sigma^2 > \sigma_0^2 or H_a: \sigma^2 < \sigma_0^2) use the appropriate tail. This formulation emerged in the 1920s through Ronald A. Fisher's development of inference methods for normal distributions, distinct from Karl Pearson's earlier work on chi-squared for discrete data.
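A sketch of this computation with made-up measurements, using scipy.stats.chi2 for the reference distribution (doubling the smaller tail for the two-sided p-value is one common convention):
python
import numpy as np
from scipy.stats import chi2

# Hypothetical sample of measurements and hypothesized population variance
sample = np.array([10.2, 9.8, 10.5, 10.1, 9.7, 10.4, 10.0, 9.9])
sigma0_sq = 0.04                       # hypothesized variance under H0

n = len(sample)
s_sq = np.var(sample, ddof=1)          # sample variance with n - 1 denominator
stat = (n - 1) * s_sq / sigma0_sq      # chi-squared statistic with n - 1 df

# Two-sided p-value: twice the smaller tail probability
df = n - 1
p_two_sided = 2 * min(chi2.cdf(stat, df), chi2.sf(stat, df))
print(f"chi2({df}) = {stat:.3f}, two-sided p = {p_two_sided:.4f}")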

Interpretation and Limitations

The chi-squared test for variance involves computing the test statistic \chi^2 = \frac{(n-1)s^2}{\sigma_0^2}, where n is the sample size, s^2 is the sample variance, and \sigma_0^2 is the hypothesized population variance under the null hypothesis H_0: \sigma^2 = \sigma_0^2. Under H_0 and assuming normality, this statistic follows a chi-squared distribution with n-1 degrees of freedom. For a two-sided test at significance level \alpha, reject H_0 if \chi^2 > \chi^2_{1-\alpha/2, n-1} (upper critical value) or \chi^2 < \chi^2_{\alpha/2, n-1} (lower critical value); one-sided alternatives adjust the critical region accordingly. A p-value is obtained by comparing the observed \chi^2 to the chi-squared distribution, with rejection if p < \alpha. A (1-\alpha) \times 100\% confidence interval for the population variance \sigma^2 is given by \left( \frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}}, \frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}} \right), where the quantiles are from the chi-squared distribution with n-1 degrees of freedom; this interval contains the true variance \sigma^2 with probability 1-\alpha under normality. If the interval excludes \sigma_0^2, it supports rejection of H_0. For the standard deviation \sigma, take square roots of the interval bounds. This test is often paired with the t-test for the population mean in normal theory inference, where both assess aspects of a normal distribution: the t-test evaluates the mean, while the chi-squared test addresses the variance. The test relies on the strict assumption that the population is normally distributed, making it highly sensitive to departures from normality, such as skewness, kurtosis, or outliers, which can distort the \chi^2 distribution and lead to invalid p-values or coverage probabilities for confidence intervals. For non-normal data, robust alternatives such as Bonett's method or bootstrap approaches are recommended over the chi-squared procedure. Additionally, the test exhibits low power for detecting deviations from H_0, especially with small to moderate sample sizes, often requiring large n (e.g., >30) to achieve adequate sensitivity to variance changes. The chi-squared test for a single variance can be generalized to compare two population variances using the F-test, where the statistic F = s_1^2 / s_2^2 follows an F-distribution with n_1-1 and n_2-1 degrees of freedom under normality and H_0: \sigma_1^2 = \sigma_2^2.
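The confidence-interval formula above translates directly into code; a minimal sketch with hypothetical summary statistics:
python
from scipy.stats import chi2

# Hypothetical summary statistics
n, s_sq, alpha = 25, 4.8, 0.05
df = n - 1

# (1 - alpha) confidence interval for the population variance
lower = df * s_sq / chi2.ppf(1 - alpha / 2, df)
upper = df * s_sq / chi2.ppf(alpha / 2, df)
print(f"95% CI for variance: ({lower:.2f}, {upper:.2f})")
print(f"95% CI for std dev:  ({lower**0.5:.2f}, {upper**0.5:.2f})")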

Illustrative Examples

Contingency Table Example

To illustrate the chi-squared test for independence, consider a hypothetical 2×2 contingency table examining the association between smoking status and lung cancer diagnosis in a sample of 100 patients. The observed frequencies are as follows:
             Lung Cancer   No Lung Cancer   Total
Smokers           40             10            50
Non-Smokers        5             45            50
Total             45             55           100
Under the null hypothesis of independence, the expected frequency E_{ij} for each cell is calculated as E_{ij} = \frac{(\text{row total}_i \times \text{column total}_j)}{\text{grand total}}. Thus, the expected values are:
             Lung Cancer   No Lung Cancer   Total
Smokers         22.5           27.5            50
Non-Smokers     22.5           27.5            50
Total             45             55           100
The chi-squared test statistic is then \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, yielding \chi^2 = \frac{(40-22.5)^2}{22.5} + \frac{(10-27.5)^2}{27.5} + \frac{(5-22.5)^2}{22.5} + \frac{(45-27.5)^2}{27.5} = 49.49. With 1 degree of freedom (df = (rows-1) \times (columns-1)), the p-value is much less than 0.001, leading to rejection of the null hypothesis of independence. To identify which cells contribute most to the significant result, Pearson residuals are computed as r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}. These are approximately 3.69 for smokers with lung cancer, -3.34 for smokers without lung cancer, -3.69 for non-smokers with lung cancer, and 3.34 for non-smokers without lung cancer. The large positive residual for smokers with lung cancer (and the corresponding negative residual for non-smokers with lung cancer) highlights the key association driving the departure from independence. Applying Yates's correction for continuity, suitable for 2×2 tables with moderate sample sizes, adjusts the statistic to \chi^2 = \sum \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}} = 46.71, which remains highly significant (p < 0.001). This correction reduces the chi-squared value slightly but does not alter the conclusion. In summary, these results provide strong evidence of an association between smoking status and lung cancer diagnosis in this hypothetical dataset, with smokers overrepresented among those diagnosed.
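The figures in this example can be reproduced with SciPy, with and without the continuity correction:
python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[40, 10],
                     [5, 45]])   # smokers / non-smokers vs. lung cancer / no lung cancer

stat, p, df, expected = chi2_contingency(observed, correction=False)
print(f"uncorrected: chi2 = {stat:.2f}, df = {df}, p = {p:.2e}")   # ~49.49

stat_c, p_c, _, _ = chi2_contingency(observed, correction=True)
print(f"Yates-corrected: chi2 = {stat_c:.2f}, p = {p_c:.2e}")      # ~46.71

# Residuals (O - E) / sqrt(E) used to locate the cells driving the association
print(((observed - expected) / np.sqrt(expected)).round(2))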

Goodness-of-Fit Example

A classic illustration of the chi-squared goodness-of-fit test involves assessing the fairness of a six-sided die, where the null hypothesis states that each face appears with equal probability of \frac{1}{6}. Consider an experiment with 30 rolls, yielding the observed frequencies shown in the table below.
Face   Observed Frequency (O_i)
1      3
2      7
3      5
4      10
5      2
6      3
The expected frequency for each face under the null hypothesis is E_i = \frac{30}{6} = 5. The test statistic is computed as \chi^2 = \sum_{i=1}^{6} \frac{(O_i - 5)^2}{5} = 9.2. With 5 degrees of freedom (number of categories minus 1), the corresponding p-value is approximately 0.101, which exceeds the common significance level of 0.05, so there is insufficient evidence to reject the null hypothesis of a fair die. In contrast, a biased die would produce observed frequencies that lead to rejection of the null. For instance, in a large-scale experiment rolling 12 dice 26,306 times (totaling 315,672 face outcomes), the observed frequencies were 53,222 for 1, 52,118 for 2, 52,465 for 3, 52,338 for 4, 52,244 for 5, and 53,285 for 6, compared to an expected 52,612 each. This yielded \chi^2 = 24.74 with 5 degrees of freedom and a p-value of approximately 0.00016, providing strong evidence to reject uniformity and conclude bias (attributed to uneven die dimensions). When the hypothesized probabilities are not fully specified but estimated from the sample (e.g., estimating the parameters of a distribution such as the Poisson or normal), the degrees of freedom must be adjusted downward to account for the estimation; specifically, df = k - 1 - m, where k is the number of categories and m is the number of estimated parameters.
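Both die experiments can be checked with scipy.stats.chisquare:
python
from scipy.stats import chisquare

# 30 rolls of a die: expected 5 per face under fairness
small = chisquare([3, 7, 5, 10, 2, 3])
print(small.statistic, small.pvalue)       # ~9.2, p ~ 0.10

# Weldon's dice data: 315,672 outcomes, expected 52,612 per face
large = chisquare([53222, 52118, 52465, 52338, 52244, 53285])
print(large.statistic, large.pvalue)       # ~24.7, p ~ 0.00016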

Broader Applications and Considerations

Applications in Various Fields

The chi-squared test finds extensive application in population genetics, particularly for assessing Hardy-Weinberg equilibrium, where it serves as a goodness-of-fit test to evaluate whether observed genotype and allele frequencies in a population conform to expected proportions under assumptions of random mating and no selection, mutation, migration, or genetic drift. This test is routinely used in population-genetic studies to detect deviations that may indicate evolutionary forces at play, such as in analyses of single nucleotide polymorphisms (SNPs) in genomic data. In the social sciences, the chi-squared test is commonly employed to analyze contingency tables from survey data, testing for association between categorical variables, such as the relationship between education level and income brackets. For instance, researchers use it to determine if observed crosstabulations of responses to attitudinal questions and demographic factors significantly differ from what would be expected under independence, informing sociological theories on social stratification. Market research leverages the chi-squared goodness-of-fit test to assess whether consumer preferences for products align with anticipated market shares, such as evaluating if brand choices among surveyed customers deviate from proportional expectations based on historical sales data. This application helps marketers identify significant shifts in behavior, guiding decisions on product positioning and marketing strategies. In physics and engineering, the chi-squared test for variance is applied in quality control processes to verify whether the variability in measurements from instruments or experimental apparatus matches a specified target variance, assuming normality, thereby ensuring process stability in settings like manufacturing or materials testing. For example, it tests whether the variance in component dimensions conforms to design tolerances, flagging potential issues in production lines. The chi-squared test is integrated into machine learning pipelines for evaluating independence between categorical features and target variables in datasets, aiding feature selection to enhance model performance in classification tasks. This usage supports preprocessing in areas like marketing analytics for customer segmentation, where it identifies non-redundant categorical inputs.
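As an illustration of the Hardy-Weinberg use case, the sketch below tests hypothetical genotype counts against the expected p², 2pq, q² proportions, with the allele frequency estimated from the sample so that one degree of freedom is lost for the estimated parameter:
python
from scipy.stats import chi2

# Hypothetical genotype counts: AA, Aa, aa
obs_AA, obs_Aa, obs_aa = 280, 480, 240
n = obs_AA + obs_Aa + obs_aa

# Allele frequency of A estimated from the sample
p = (2 * obs_AA + obs_Aa) / (2 * n)
q = 1 - p

# Expected counts under Hardy-Weinberg equilibrium
exp_AA, exp_Aa, exp_aa = n * p**2, n * 2 * p * q, n * q**2

stat = sum((o - e) ** 2 / e for o, e in
           zip((obs_AA, obs_Aa, obs_aa), (exp_AA, exp_Aa, exp_aa)))

# df = k - 1 - m = 3 - 1 - 1 = 1 (one parameter, the allele frequency, estimated)
print(f"chi2(1) = {stat:.3f}, p = {chi2.sf(stat, 1):.3f}")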

Common Pitfalls and Extensions

One common pitfall in applying the chi-squared test is ignoring cells with low expected frequencies, which can lead to invalid p-values due to the poor performance of the asymptotic approximation. Specifically, the test assumes that expected counts are at least 5 in at least 80% of cells, with none less than 1, to ensure the chi-squared distribution adequately approximates the distribution of the test statistic; violations often result in inflated Type I error rates. Another frequent error is failing to apply corrections for multiple testing when conducting several chi-squared tests simultaneously, such as in post-hoc analyses of contingency tables, which increases the familywise error rate; methods like the Bonferroni adjustment, dividing the significance level by the number of tests, are recommended to control this. Additionally, interpreting a significant chi-squared result as evidence of causation rather than mere association confuses statistical dependence with directional influence, as the test only assesses whether categorical variables are independent under the null hypothesis. Over-reliance on the asymptotic chi-squared distribution in small samples exacerbates these issues, as the test's validity diminishes when expected frequencies are low, potentially leading to inaccurate p-values; in such cases, exact alternatives like Fisher's exact test should be considered. Extensions of the chi-squared test address limitations in handling ordered categories or complex scenarios. For ordinal data, where categories have a natural order (e.g., low, medium, high satisfaction levels), the standard chi-squared test of independence may overlook trends; an ordinal chi-squared test incorporates scores for the ordered levels to detect monotonic associations more powerfully, such as through the Cochran-Armitage trend test. In cases with sparse data or violations of asymptotic assumptions, simulation-based p-values provide a robust alternative by generating the null distribution via resampling consistent with the observed margins, yielding more accurate inference without relying on the chi-squared approximation. For multi-way contingency tables beyond two dimensions, log-linear models extend the chi-squared framework by modeling the logarithm of expected cell frequencies as a linear function of main effects and interactions, allowing hierarchical assessment of associations via likelihood ratio tests that parallel chi-squared goodness-of-fit evaluations. Recent developments include Bayesian analogs to the chi-squared test, which incorporate prior information using Dirichlet priors on multinomial probabilities to compute posterior probabilities of independence, offering advantages in small samples or when eliciting expert priors.
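The Bonferroni adjustment mentioned above is simple to apply; a minimal sketch with hypothetical p-values from several follow-up chi-squared tests:
python
# Hypothetical p-values from post-hoc pairwise chi-squared comparisons
p_values = [0.012, 0.034, 0.210, 0.008]
alpha = 0.05

# Bonferroni: compare each p-value to alpha divided by the number of tests
threshold = alpha / len(p_values)
for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < threshold else "not significant"
    print(f"test {i}: p = {p:.3f} vs. threshold {threshold:.4f} -> {verdict}")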
