Fisher's exact test
Fisher's exact test is a statistical significance test employed to assess the independence of two categorical variables in a 2×2 contingency table by calculating the exact probability of observing the given data or more extreme outcomes under the null hypothesis, conditioning on the fixed marginal totals.[1] This method relies on the hypergeometric distribution to derive p-values without approximations, making it particularly suitable for small sample sizes where asymptotic tests like the chi-squared test may produce inaccurate results.[2] Developed by British statistician Ronald A. Fisher, the test originated in the context of his famous "lady tasting tea" experiment, which illustrated randomized experimental design and hypothesis testing principles.[3] First presented in the fifth edition of Statistical Methods for Research Workers (1934) and elaborated in Fisher's 1935 book The Design of Experiments, the test addresses the limitations of approximate methods by providing an exact conditional inference for contingency tables, ensuring validity even with sparse data. In practice, it evaluates whether the observed distribution of categories across groups deviates significantly from what would be expected under independence, often yielding a two-sided or one-sided p-value based on the sum of probabilities for all tables as or more extreme than the observed one.[4] Unlike the Pearson chi-squared test, which uses a normal approximation and requires expected cell frequencies of at least 5 for reliability, Fisher's exact test is preferred in medical, biological, and social science research involving rare events or limited observations to avoid inflated Type I error rates.[5] For instance, in analyzing clinical trial outcomes with few successes, the test computes the probability as P = \frac{\binom{a+b}{a} \binom{c+d}{c}}{\binom{n}{a+c}}, where a, b, c, d represent cell counts and n is the total sample size, enumerating all possible tables with the same margins.[6] The test's computational demands historically limited its use to manual calculations or
permutations for small tables, but modern software implementations, such as in R or Python's SciPy library, enable efficient application to larger datasets via algorithms like those by Clarkson et al. for network-based computation.[7] Despite its exact nature, controversies persist regarding one-sided versus two-sided interpretations and the choice between Fisher's test and alternatives like Barnard's test for certain scenarios, though Fisher's remains the standard for 2×2 tables in randomized experiments.[8] Its enduring relevance lies in providing rigorous inference for categorical data analysis, underpinning fields from genetics to epidemiology where precise significance assessment is critical.[9]
Introduction and Background
Purpose and Scope
Fisher's exact test is a nonparametric statistical procedure designed to assess the independence of two categorical variables in a 2×2 contingency table under the condition of fixed marginal totals. It evaluates the null hypothesis that there is no association between the variables by computing the exact probability of observing the given table or one more extreme, assuming the row and column margins are fixed.[2] This approach relies on the hypergeometric distribution as the underlying probability model for the cell frequencies.[5] The test is particularly valuable for examining associations between two binary variables, such as treatment versus control groups and success versus failure outcomes in clinical trials with limited sample sizes.[10] For instance, it is commonly applied in randomized phase II cancer studies to compare binomial proportions between groups when data sparsity might invalidate approximate methods.[10] In regulatory contexts, such as FDA evaluations of therapeutic efficacy, Fisher's exact test is used to detect differences in response proportions while maintaining exact control over type I error rates.[11] While primarily scoped to 2×2 tables due to computational feasibility, the test can be extended to larger contingency tables through generalized exact methods, though these are less routinely available in standard software packages and require more intensive enumeration of possible tables.[5] Fisher's exact test is favored over asymptotic approximations like the chi-squared test when one or more expected cell frequencies are small, generally less than 5, as the normal approximation underlying chi-squared becomes unreliable in such scenarios, potentially yielding misleading inference.[2] This exact nature ensures precise p-values without relying on large-sample assumptions, making it the method of choice for small-sample categorical analyses.[12]
Historical Context
Fisher's exact test was developed by the British statistician and geneticist Ronald A. Fisher in 1934, as part of his broader contributions to exact methods of statistical inference, particularly for handling small sample sizes in categorical data analysis.[13] Fisher introduced the test in the fifth edition of his influential book Statistical Methods for Research Workers, where he outlined its application to contingency tables with fixed marginal totals, emphasizing the need for precise probability calculations when approximations like the chi-square test were unreliable.[14] The test's origin is closely tied to the celebrated "Lady Tasting Tea" experiment, inspired by a conversation at Rothamsted Experimental Station. Muriel Bristol, a colleague of Fisher, asserted that she could discern whether milk had been poured into a cup before or after the tea infusion, challenging the common assumption that the order made no sensory difference. To test her claim rigorously, Fisher devised an experiment involving eight cups of tea—four prepared with milk first and four with tea first—presented in random order; success required correctly identifying all eight, which translated to a specific 2×2 contingency table configuration under fixed row and column totals.[15] Fisher elaborated on this experiment and the underlying exact test in his 1935 book The Design of Experiments, using it to illustrate principles of randomization, null hypothesis testing, and the evaluation of significance through exact probabilities rather than approximations.[15] This publication solidified the test's foundational role in experimental design. 
Following its introduction, Fisher's exact test saw rapid early adoption in biostatistics, where it became a standard tool for analyzing small-sample categorical data in fields like genetics and clinical research, offering a non-approximate method to assess independence in 2×2 tables and addressing limitations of earlier distributional assumptions.[13]
Mathematical Foundations
Derivation from Hypergeometric Distribution
Fisher's exact test is derived under the null hypothesis that there is no association between the row and column classifications in a contingency table, meaning the observed cell frequencies arise randomly given fixed marginal totals. This setup models the data generation process as sampling without replacement from a finite population, leading to a multivariate hypergeometric distribution for the joint cell probabilities.[5] Consider a 2×2 contingency table with cell entries a, b, c, and d, where the row totals are fixed at n = a + b and m = c + d, and the column totals are fixed at k = a + c and N - k = b + d, with N = n + m being the grand total. Under the null hypothesis of independence, the probability of observing any particular table configuration with these fixed margins follows the multivariate hypergeometric distribution, as the table entries represent the outcomes of distributing N items into cells without regard to order beyond the margins.[16] The marginal distribution of a single cell entry, such as a, is then hypergeometric. Specifically, a can be viewed as the number of "successes" (e.g., row 1 and column 1 assignments) in n draws without replacement from a population of N items, where k items are of the success type. The probability mass function is P(A = a) = \frac{\dbinom{k}{a} \dbinom{N - k}{n - a}}{\dbinom{N}{n}}, for a = \max(0, n + k - N), \dots, \min(n, k). This formula arises directly from the combinatorial nature of the hypergeometric model, enumerating the ways to achieve a successes while respecting the fixed totals.[17] Fisher justified conditioning on the observed marginal totals using his principle of conditionality, which posits that inference should be based on the conditional distribution given sufficient ancillary statistics to eliminate nuisance parameters. 
Here, the marginal totals are ancillary for the association parameter under the null, as their distribution does not depend on the odds ratio, making the conditional hypergeometric distribution the appropriate basis for exact inference.[16][18]
Probability Mass Function and Test Statistic
The probability mass function underlying Fisher's exact test derives from the hypergeometric distribution, which models the probability of observing a specific configuration in a 2×2 contingency table under the null hypothesis of independence, conditional on fixed marginal totals.[19] For a table with observed cell counts a, b, c, and d, where row totals are n = a + b and m = c + d, column totals are r = a + c and s = b + d, and grand total N = n + m, the probability of the observed table is P(A = a) = \frac{n! \, m! \, r! \, s!}{N! \, a! \, b! \, c! \, d!}. This formula gives the probability mass for the exact observed configuration, with b = n - a, c = r - a, and d = s - b.[2] The possible values of a range from max(0, n + r - N) to min(n, r), and the probabilities sum to 1 over this support. The test statistic in Fisher's exact test is typically the observed cell value a (or equivalently, the odds ratio, though a is used directly for ordering extremeness).[19] The p-value is computed as the sum of the hypergeometric probabilities for all possible tables that are as extreme as or more extreme than the observed table in the direction of deviation from independence. For a one-sided test assessing positive association (e.g., larger a indicating association), the p-value is p = \sum_{i = a}^{\min(n, r)} P(A = i), summing the tail probabilities from the observed a to the maximum possible value.[2] The one-sided test in the opposite direction sums the lower tail analogously. For the two-sided test, Fisher's original approach sums the probabilities of all tables whose probability mass is less than or equal to that of the observed table: p = \sum_{\{ i \,:\, P(A = i) \leq P(A = a) \}} P(A = i), where the sum runs over all feasible values i of the first cell given the fixed margins.[19] This method, often called the probability-based two-sided p-value, does not necessarily double the one-sided value and can yield conservative results for small samples.
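These formulas are easy to verify numerically. The sketch below (standard library only) computes the factorial-form table probability and the probability-based two-sided p-value for an arbitrary 2×2 table with cells a, b, c, d:

```python
from math import factorial

def table_prob(a, b, c, d):
    # P(A = a) = n! m! r! s! / (N! a! b! c! d!) for fixed margins
    n, m = a + b, c + d            # row totals
    r, s = a + c, b + d            # column totals
    N = n + m
    num = factorial(n) * factorial(m) * factorial(r) * factorial(s)
    den = (factorial(N) * factorial(a) * factorial(b)
           * factorial(c) * factorial(d))
    return num / den

def two_sided_p(a, b, c, d):
    # Sum the probabilities of all tables with the same margins whose
    # probability does not exceed that of the observed table.
    n, r, N = a + b, a + c, a + b + c + d
    p_obs = table_prob(a, b, c, d)
    total = 0.0
    for i in range(max(0, n + r - N), min(n, r) + 1):
        p_i = table_prob(i, n - i, r - i, N - n - r + i)
        if p_i <= p_obs * (1 + 1e-9):   # tolerance for floating-point ties
            total += p_i
    return total
```

For example, two_sided_p(7, 3, 2, 8) evaluates to 12892/184756 ≈ 0.0698; the loop bounds and tie handling follow the support and ordering defined above.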
An alternative mid-p adjustment, which adds half the probability of the observed table to the sum of more extreme probabilities, is sometimes used to reduce conservativeness: p_{\text{mid}} = \frac{1}{2} P(A = a) + \sum_{\{ \text{tables more extreme} \}} P(A = i). This adjustment improves performance in certain scenarios but is not part of Fisher's original formulation.[20]
Computation Methods
Exact Calculation Algorithm
The exact calculation algorithm for Fisher's exact test in a 2×2 contingency table relies on enumerating all possible tables consistent with the observed row and column margins and summing the hypergeometric probabilities of those tables deemed as extreme or more extreme than the observed table, thereby avoiding the approximation errors associated with the chi-squared test that can arise in small samples.[17] This approach ensures precise p-value computation under the null hypothesis of independence, leveraging the conditional distribution of the table given fixed margins.[21] The procedure, often implemented via a simple recursive or network-based traversal for efficiency, begins by fixing the marginal totals from the observed table: denote the row sums as r_1 and r_2, column sums as c_1 and c_2, and grand total N = r_1 + r_2 = c_1 + c_2. Starting from the observed cell entry a in the (1,1) position, the algorithm traverses possible values of this entry, denoted k, by adjusting the remaining cells b = r_1 - k, c = c_1 - k, and d = r_2 - c to maintain the fixed margins. The range for k is from \max(0, r_1 + c_1 - N) to \min(r_1, c_1).[2] For each feasible k, the hypergeometric probability is computed as the likelihood of that table under the null.[17] Next, the probability of the observed table is calculated using the same formula. The one-sided p-value is then obtained by summing the probabilities over all tables where the entry k is at or beyond the observed a in the direction of the alternative hypothesis (e.g., k \geq a for a right-tailed test). For the two-sided p-value, the sum includes all tables whose probabilities are less than or equal to that of the observed table. 
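The traversal just described can be sketched in Python; the recursive update multiplies P(k) by the ratio P(k+1)/P(k) of adjacent hypergeometric probabilities instead of recomputing factorials at every step (an illustration of the idea, not any particular library's code):

```python
from math import comb

def enumerate_probs(r1, c1, N):
    # All feasible values k of the (1,1) cell for fixed margins, with the
    # hypergeometric probability updated by one ratio per step.
    lo, hi = max(0, r1 + c1 - N), min(r1, c1)
    p = comb(c1, lo) * comb(N - c1, r1 - lo) / comb(N, r1)   # P(K = lo)
    probs = {}
    for k in range(lo, hi + 1):
        probs[k] = p
        if k < hi:
            # P(k+1)/P(k) = (r1 - k)(c1 - k) / ((k + 1)(N - r1 - c1 + k + 1))
            p *= (r1 - k) * (c1 - k) / ((k + 1) * (N - r1 - c1 + k + 1))
    return probs

def right_tail_p(a, b, c, d):
    # One-sided exact p-value P(K >= a) for the observed table.
    r1, c1, N = a + b, a + c, a + b + c + d
    return sum(p for k, p in enumerate_probs(r1, c1, N).items() if k >= a)
```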
This enumeration-based method, akin to a network traversal along the possible cell values, directly yields the exact conditional p-value without relying on asymptotic approximations.[21] The computational complexity of this algorithm for 2×2 tables is O(N) in the worst case, since the number of possible tables is at most on the order of N and each table can be processed in constant time using recursive probability updates; exact computation for 2×2 tables therefore remains feasible at any practical sample size, while extensions such as the network algorithm become essential for larger r \times c generalizations.[21] By design, this exact method circumvents the discrete nature and low expected frequencies that can lead to inaccurate p-values from chi-squared approximations, providing reliable inference in sparse data settings.[17]
Software and Implementation
Fisher's exact test is implemented in various statistical software packages, providing users with accessible tools to compute exact p-values for 2x2 contingency tables without relying on large-sample approximations.[22][3] In R, the base function fisher.test() performs the test on a contingency table specified as a matrix or two vectors.[22] The input is typically a 2x2 matrix where rows represent groups and columns represent outcomes, such as matrix(c(a, b, c, d), nrow=2) with a, b, c, and d as cell counts.[22] Options include alternative for one-sided ("less" or "greater") or two-sided testing, and conf.int to compute confidence intervals for the odds ratio.[22] For example, the syntax fisher.test(matrix(c(5, 1, 8, 9), nrow=2)) yields the odds ratio, p-value, and 95% confidence interval for the odds ratio.[22]
Python's SciPy library offers scipy.stats.fisher_exact() in the stats module, which takes a 2x2 contingency table as a NumPy array or list of lists.[3] The function supports the alternative parameter for two-sided, less, or greater hypotheses, defaulting to two-sided.[3] It returns a tuple containing the odds ratio and the p-value; for instance, from scipy.stats import fisher_exact; oddsratio, pvalue = fisher_exact([[5, 1], [8, 9]]) computes these for the given table.[3]
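A complete, runnable version of that call follows; the printed values hold for this particular table, and the odds ratio SciPy reports is the sample value ad/bc:

```python
from scipy.stats import fisher_exact

table = [[5, 1], [8, 9]]
oddsratio, pvalue = fisher_exact(table, alternative='two-sided')
print(oddsratio)   # 5.625, i.e. (5*9)/(1*8)
print(pvalue)      # about 0.179
```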
In SAS, PROC FREQ computes Fisher's exact test for contingency tables, with the test requested either through the FISHER option on the TABLES statement or through a separate EXACT statement, as in PROC FREQ; TABLES row*col; EXACT FISHER; RUN;. Input data are raw observations or precomputed frequencies in a dataset, and output includes the p-value, the Freeman-Halton generalization for larger tables if needed, and odds ratio measures.
SPSS implements the test through the CROSSTABS procedure with the Exact Tests module, using syntax like CROSSTABS /TABLES=var1 BY var2 /STATISTICS=CHISQ /METHOD=EXACT. The input is categorical variables forming the 2x2 table, and results provide exact p-values for one- or two-tailed tests, along with the odds ratio when selected.
For large samples where exact enumeration becomes computationally infeasible due to excessive table sizes, many implementations offer simulation-based approximations. In R, the simulate.p.value = TRUE parameter enables Monte Carlo simulation with B replicates (default 2000), generating random tables under fixed marginals to estimate the p-value.[22] SAS PROC FREQ supports Monte Carlo estimation via the MC option in the EXACT statement, such as EXACT FISHER / MC N=10000 SEED=12345;, sampling tables to approximate the exact p-value with specified iterations. Similarly, the SPSS Exact Tests module provides Monte Carlo simulation for Fisher's test on large tables, yielding unbiased p-value estimates based on sampled reference sets, configurable through the dialog or syntax. These options balance accuracy and efficiency, with simulation p-values approaching exact results as the number of iterations increases.
Applications and Examples
Lady Tasting Tea Experiment
The Lady Tasting Tea experiment, devised by Ronald A. Fisher, provided the foundational example for applying what became known as Fisher's exact test to assess claims of sensory discrimination. In this setup, a lady asserted that she could determine, by taste alone, whether the milk or the tea had been poured into the cup first. Fisher proposed an experiment involving 8 cups prepared such that 4 had milk poured first and 4 had tea poured first, presented to her in random order without labels. She was required to identify correctly which 4 cups had the milk added first. Assuming she succeeds in identifying all 4 correctly, the data are summarized in a 2×2 contingency table with the observed value a = 4 for the cell representing correct identifications of milk-first cups, while the expected value under the null hypothesis of random guessing is 2.[23] The contingency table is defined with rows corresponding to the lady's guesses ("Guessed milk first" and "Guessed tea first") and columns to the actual preparations ("Actual milk first" and "Actual tea first"), yielding fixed marginal totals of 4 for each row and each column. The observed table thus appears as:

|  | Actual milk first | Actual tea first | Total |
|---|---|---|---|
| Guessed milk first | 4 | 0 | 4 |
| Guessed tea first | 0 | 4 | 4 |
| Total | 4 | 4 | 8 |
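Because the perfect score a = 4 is the most extreme table achievable with these margins, the one-sided exact p-value is simply the hypergeometric probability of that single table; in Python:

```python
from math import comb

# P(all four milk-first cups identified correctly by chance)
p = comb(4, 4) * comb(4, 0) / comb(8, 4)
print(p)   # 1/70, about 0.0143
```

At the conventional 0.05 level a perfect score is significant, whereas a single mistake (a = 3) gives a one-sided p-value of 17/70 ≈ 0.24, which is why Fisher's design required all cups to be classified correctly.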
Clinical Trial Example
In clinical trials, Fisher's exact test is commonly applied to evaluate the association between a binary treatment variable and a binary outcome, particularly when sample sizes are small and the chi-square approximation may be unreliable. Consider a hypothetical randomized controlled trial involving 20 patients equally allocated to two treatments, A and B (10 patients each), to assess the efficacy of a new drug in achieving treatment success (defined as a positive clinical response). The observed outcomes yield the following 2×2 contingency table:

| Treatment | Success | Failure | Total |
|---|---|---|---|
| A | 7 | 3 | 10 |
| B | 2 | 8 | 10 |
| Total | 9 | 11 | 20 |
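Assuming the (hypothetical) counts above, the test runs in one line with SciPy; both the two-sided and a one-sided p-value are shown:

```python
from scipy.stats import fisher_exact

table = [[7, 3], [2, 8]]   # rows: treatments A, B; columns: success, failure
oddsratio, p_two = fisher_exact(table, alternative='two-sided')
_, p_one = fisher_exact(table, alternative='greater')   # A: higher success odds
print(oddsratio)   # about 9.33 (sample odds ratio 7*8 / (3*2))
print(p_two)       # about 0.070
print(p_one)       # about 0.035
```

Here the one-sided p-value falls below 0.05 while the two-sided one does not, which is exactly the situation in which the direction of the alternative must be justified in advance.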
P-value Interpretation
The p-value obtained from Fisher's exact test represents the probability of observing the contingency table data (or a table more extreme) under the null hypothesis of independence between the row and column variables.[2] A small p-value, typically below a pre-specified significance level such as 0.05, indicates that the observed data are unlikely under the null hypothesis, providing evidence to reject independence and infer an association between the variables.[24] This threshold is a convention for decision-making but does not measure the strength or practical importance of the association.[25] To contextualize the p-value, it is essential to pair it with measures of effect size, such as the odds ratio (OR), which quantifies the magnitude of the association in 2×2 tables.[5] An OR greater than 1 suggests a positive association (e.g., higher odds of an outcome in one category relative to another), while values below 1 indicate a negative association; an OR of 1 implies no association. A 95% confidence interval for the OR provides a range of plausible values for the true effect in the population, aiding interpretation of precision and clinical relevance. In scenarios involving multiple contingency tables or tests, such as genome-wide association studies, p-values from Fisher's exact test require adjustment to control the family-wise error rate and reduce false positives.[26] Common methods include the Bonferroni correction, which divides the significance level by the number of tests performed.[27] Misinterpretations of p-values can lead to erroneous conclusions; notably, the p-value is not the probability that the null hypothesis is true or that the alternative hypothesis is false, but rather the probability of the data assuming the null is true.[28] It also does not indicate the probability of replicating the result in future studies. 
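The Bonferroni correction mentioned above is simple to apply directly; a minimal sketch (the three p-values are invented for illustration):

```python
def bonferroni_adjust(pvalues):
    # Multiply each p-value by the number of tests, capping at 1.
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

raw = [0.001, 0.02, 0.04]
adjusted = bonferroni_adjust(raw)            # about [0.003, 0.06, 0.12]
significant = [p < 0.05 for p in adjusted]   # [True, False, False]
```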
To mitigate these risks, findings should be validated through replication in independent datasets, emphasizing that a single significant p-value does not establish causal or generalizable effects.[25]
Assumptions, Limitations, and Controversies
Key Assumptions
Fisher's exact test is a statistical procedure for testing the independence of two categorical variables in a 2×2 contingency table, and its validity depends on specific statistical prerequisites. These assumptions ensure that the exact probability calculations under the null hypothesis accurately reflect the underlying data-generating process. The test conditions on fixed marginal totals, treating the observed row and column sums as ancillary statistics that do not influence the inference about association. This conditioning arises because, under the null hypothesis of independence, the distribution of the table entries given the margins follows a hypergeometric distribution, allowing exact computation without relying on approximations.[2] A fundamental requirement is the independence of observations, meaning that each data point is drawn without influence from others, with no clustering, repeated measures, or correlation among observations. This ensures that the joint probability of the table entries can be properly modeled under the null hypothesis.[5] The data must result from random sampling, typically under a multinomial or product multinomial model, where the overall sample sizes for rows (or columns) are fixed, and the cell probabilities are products of independent marginal probabilities. This sampling framework justifies the use of the conditional hypergeometric distribution for exact inference.[29] Although the test provides exact p-values for any sample size, it is particularly suitable for small samples where the chi-squared approximation may be unreliable, such as when more than 20% of the cells have expected frequencies less than 5 or any cell has an expected frequency less than 1. In such cases, the exact nature of the test avoids the biases introduced by asymptotic methods.[5]
Limitations and Common Misuses
For larger contingency tables (beyond 2×2), Fisher's exact test can be computationally intensive, as the number of possible tables consistent with the observed marginal totals grows exponentially with the table dimensions, often requiring Monte Carlo simulations or approximations in software implementations. However, for 2×2 tables, exact computation remains efficient and feasible for any practical sample size. For large samples, Fisher's exact test tends to be conservative, meaning its actual type I error rate is lower than the nominal significance level (e.g., <0.05 for α=0.05), resulting in reduced statistical power compared to alternatives like the chi-squared test.[30] This conservatism arises partly from the discreteness of the hypergeometric distribution and the conditioning on fixed margins, which may not align with experimental designs where row or column totals are random (e.g., independent binomial sampling in observational studies). Consequently, for large N, more powerful unconditional tests are preferable, as the exact test's advantages in small samples diminish while its drawbacks persist. Common misuses include applying the test to data without fixed marginal totals, such as in independent samples where rows and columns represent separate randomizations rather than conditioned constraints, leading to invalid conditional inference.[31] Another frequent error is relying solely on the p-value without assessing effect size, such as odds ratios or phi coefficients, which obscures the practical significance of any detected association.
Debates on One- vs. Two-Sided Testing
Ronald Fisher originally defined the two-sided p-value for his exact test as the sum of the probabilities of all contingency tables that are as extreme as or more extreme than the observed table, based on their probabilities under the null distribution of the hypergeometric model, rather than relying on symmetric tail probabilities.[32] This approach avoids assuming symmetry in the distribution and focuses on the rarity of the observed outcome relative to all possible outcomes with equal or lower probability. In response to perceived conservativeness in Fisher's conditional test, modern alternatives have emerged, including Barnard's unconditional exact test, which conditions only on one margin and maximizes power over nuisance parameters, often yielding higher power than Fisher's test for 2x2 tables.[33] Another adjustment is the mid-p method, proposed by Lancaster, which subtracts half the probability of the observed table from the standard p-value to reduce the test's conservativeness while approximately maintaining the nominal Type I error rate.[34] A key controversy surrounds the choice between one-sided and two-sided testing in Fisher's exact test, particularly for directional hypotheses; one-sided tests are advocated when theory predicts a specific direction (e.g., a treatment improving outcomes), offering greater power to detect such effects, whereas two-sided tests are preferred for exploring any association, though they may lack power for strictly directional claims.
This debate highlights tensions between hypothesis specification and exploratory analysis, with one-sided tests criticized for potentially inflating Type I errors if the direction is misspecified.[35] As of 2025, statistical reforms, including updates influenced by the American Statistical Association's 2016 statement, increasingly emphasize reporting effect sizes and confidence intervals alongside p-values from tests like Fisher's exact, to better convey practical significance and uncertainty rather than relying solely on dichotomous significance.[36][37] These guidelines underscore that p-values do not measure effect magnitude, promoting a shift toward integrated inference in fields like clinical trials.[38]
Alternatives and Extensions
Comparison with Chi-Square Test
The Pearson chi-square test for independence in 2×2 contingency tables provides an asymptotic approximation to the exact hypergeometric distribution that underlies Fisher's exact test.[5] The chi-square test statistic is calculated as \chi^2 = \sum \frac{(O - E)^2}{E}, where O denotes observed cell frequencies and E denotes expected frequencies under the null hypothesis of independence, with the test following a chi-square distribution with 1 degree of freedom.[39] This approximation relies on large sample sizes to ensure the validity of the chi-square distribution, but it can lead to inaccurate p-values and inflated Type I error rates when expected frequencies are small.[40] Fisher's exact test is preferred over the chi-square test in scenarios involving small sample sizes, particularly when any expected cell frequency is less than 5, as the asymptotic assumptions of the chi-square test break down, potentially overestimating statistical significance.[9] In such cases, the exact nature of Fisher's test ensures precise p-value computation without relying on approximations, making it the method of choice when exact inference is required, such as in clinical trials with sparse data.[40] Conversely, the chi-square test is suitable for larger samples where the approximation holds well, offering a computationally efficient alternative.[5] To mitigate the chi-square test's limitations in 2×2 tables with moderate sample sizes, Yates' continuity correction adjusts the statistic to account for the discreteness of the data: \chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}, which subtracts 0.5 from the absolute difference between observed and expected values before squaring, yielding a closer match to the hypergeometric probabilities.[41] This correction improves the approximation's accuracy when expected frequencies are between 5 and 10 but is generally less precise than Fisher's exact test for very small cells, where the exact method remains superior.[42] Empirically, guidelines 
such as Cochran's rules of thumb recommend the chi-square test (with or without Yates' correction) when the total sample size exceeds 40 and all expected frequencies are at least 5, as the approximation then provides reliable results without excessive Type I error inflation.[42] Fisher's exact test, while always exact, requires enumeration of all possible contingency tables weighted by hypergeometric probabilities; for 2×2 tables this cost is modest, but it grows quickly for larger tables, and in large-sample settings the chi-square test is favored for its speed and sufficiency.[5]
Generalizations to Larger Contingency Tables
The Freeman-Halton extension generalizes Fisher's exact test to contingency tables with r rows and c columns, modeling the cell counts under the null hypothesis of independence as a sample from a multivariate hypergeometric distribution with fixed marginal totals. The exact p-value is obtained by summing the probabilities of all possible tables that exhibit a deviation from independence at least as extreme as the observed table, typically ordered by the probability of the observed table or a suitable test statistic such as the Pearson chi-squared value. This approach preserves the conditional exactness of the original 2×2 test while accommodating multiple categories in rows or columns. For larger r×c tables, complete enumeration of the reference set of tables becomes computationally prohibitive, as the number of possible configurations grows factorially with table size and marginal sums. Practical algorithms therefore rely on Monte Carlo simulation to approximate the p-value, generating a large sample of tables from the multivariate hypergeometric distribution and estimating the tail probability empirically. Markov chain Monte Carlo (MCMC) methods enhance efficiency by constructing a Markov chain that samples from the conditional distribution, ensuring rapid mixing for moderate-sized tables. In R, the fisher.test function implements direct Monte Carlo simulation, sampling tables with the observed margins, when the simulate.p.value = TRUE option is specified for tables exceeding feasible enumeration limits.
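The simulation idea can be sketched with NumPy alone: expanding the table into individual (row, column) observations and permuting the column labels preserves both margins, so the permuted tables are draws from the conditional null distribution. This is a sketch of the idea, not any package's actual algorithm; production implementations sample tables directly.

```python
import numpy as np

def chi2_stat(obs):
    # Pearson chi-squared statistic, used only to order tables by extremeness.
    obs = np.asarray(obs, dtype=float)
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    return ((obs - exp) ** 2 / exp).sum()

def mc_independence_p(table, n_sim=2000, seed=1):
    table = np.asarray(table)
    # One row label and one column label per individual observation.
    rows = np.repeat(np.arange(table.shape[0]), table.sum(axis=1))
    cols = np.repeat(np.tile(np.arange(table.shape[1]), table.shape[0]),
                     table.ravel())
    observed = chi2_stat(table)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        sim = np.zeros_like(table)
        np.add.at(sim, (rows, rng.permutation(cols)), 1)   # rebuild the table
        if chi2_stat(sim) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)   # add-one correction keeps the test valid
```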
In cases involving more than two categorical variables—yielding multidimensional contingency tables—exact inference extends to testing specific hypotheses within log-linear models, conditioning on the sufficient statistics for the nuisance parameters to isolate effects of interest. This involves enumerating or simulating tables that match the observed marginals for higher-order interactions while varying those for the term under test, such as conditional independence in three-way tables. Software like the exactLoglinTest package in R facilitates these computations for small multi-way tables by network algorithms or MCMC.
These extensions are applied in genetics to assess associations in multi-allelic or multi-locus contingency tables, such as testing Hardy-Weinberg equilibrium deviations across multiple genotypes. In ecology, they analyze sparse multi-species distribution data, evaluating independence between environmental factors and species occurrences across categories.[43] Despite these uses, exact generalizations remain rare in practice owing to their computational demands, becoming infeasible for tables with more than about 20 cells where simulation accuracy also degrades without extensive runtime.