
Fisher's method

Fisher's method is a statistical technique for combining p-values obtained from multiple independent hypothesis tests to produce an overall assessment of significance, particularly useful when testing the same hypothesis across different datasets or experiments. The method computes a test statistic defined as -2 \sum_{i=1}^{k} \ln(p_i), where p_i are the individual p-values and k is the number of tests; under the null hypothesis, this statistic follows a chi-squared distribution with 2k degrees of freedom. Developed by British statistician and geneticist Sir Ronald A. Fisher, the method was first suggested in his influential book Statistical Methods for Research Workers, where he proposed using the product of the individual probabilities to obtain "a single test of the significance of the aggregate." Fisher elaborated on the approach in a 1948 article in The American Statistician, emphasizing its application to independent tests and the chi-squared approximation for determining the combined significance level. The procedure assumes that the tests are independent and that the p-values are uniformly distributed under the null hypothesis, making it particularly powerful for detecting subtle effects when multiple lines of evidence converge against the null. In practice, Fisher's method transforms small individual p-values into an even smaller combined p-value, enhancing the ability to detect signals in scenarios where no single test reaches conventional significance thresholds like 0.05. It has been shown to be asymptotically optimal among common methods under certain conditions, such as when effect sizes are equal across tests, due to its high Bahadur relative efficiency. However, its performance can degrade with dependent tests or highly unequal effect sizes, prompting extensions like weighted versions or adaptations for correlation structures. The method finds broad application in fields requiring evidence synthesis, including meta-analyses of clinical trials, genomic studies for gene enrichment, and bioinformatics for integrating multi-omics data. 
For instance, in large-scale genomic studies, it helps identify pathways by pooling p-values from association tests across traits or datasets. Implementations are available in statistical software such as R's poolr package and Python's SciPy library, facilitating its routine use while accounting for nuances like one-sided versus two-sided p-values. Despite its strengths, users must verify assumptions, as violations can inflate type I error rates, and alternative methods like Stouffer's Z-score method may be preferable for dependent data.

Introduction

Definition and Purpose

Fisher's method is a statistical technique used in meta-analysis to aggregate p-values from multiple independent hypothesis tests, each providing evidence against a common null hypothesis. It combines the p-values p_1, p_2, \dots, p_k from k such tests into a single test statistic, enabling a more powerful assessment of the overall null hypothesis than any individual test alone. This approach is particularly valuable when individual tests may yield non-significant results due to limited sample sizes or effect magnitudes, yet collectively suggest a stronger signal. The primary purpose of Fisher's method is to enhance statistical power for detecting shared effects across studies or experiments, making it suitable for fields such as genomics, where thousands of tests are performed to identify associations between genetic variants and traits, and epidemiology, where evidence from multiple cohorts or endpoints is pooled to evaluate risk factors. By focusing solely on p-values, the method is non-parametric, requiring no assumptions about the underlying effect sizes, test distributions, or parametric forms beyond the uniformity of p-values under the null hypothesis. This flexibility allows its application to diverse data types and test statistics without needing raw data or standardized effect measures. Named after Ronald A. Fisher, the method was developed to combine probabilities in the context of experimental design, as introduced in his seminal work on statistical methods for research.

Historical Background

Ronald A. Fisher introduced the method for combining p-values in his seminal 1925 book Statistical Methods for Research Workers, where he proposed using the product of probabilities from independent tests to assess overall significance in replicated experiments. This approach allowed researchers to aggregate evidence from multiple similar tests, transforming individual p-values into a single chi-squared statistic under the null hypothesis. Fisher elaborated on the inferential principles underlying this technique in his 1956 work Statistical Methods and Scientific Inference, emphasizing its role in inductive inference for scientific discovery. The method emerged amid the foundational debates on statistical inference in the 1920s and 1930s, particularly the Neyman-Pearson framework versus Fisher's significance testing paradigm. Fisher advocated combining p-values specifically for synthesizing results from homogeneous replicated experiments, contrasting with Neyman and Pearson's focus on power and error rates in hypothesis testing. This period of contention shaped modern statistical practice, with Fisher's method positioning evidence accumulation as central to rejecting null hypotheses based on improbability alone. During his tenure at the Rothamsted Experimental Station from 1919 to 1943, Fisher applied the method to analyze agricultural field trials and biological assays, where vast datasets from long-term experiments required integrating multiple significance tests to draw robust conclusions about crop yields and treatments. These early uses demonstrated its practicality in handling variability in experimental data, influencing the station's adoption of randomized designs and significance testing protocols. The method gained wider prominence in the mid-20th century alongside the development of meta-analysis techniques, particularly through extensions by statisticians like William Cochran, though Fisher consistently stressed its suitability for scenarios assuming homogeneous effects across studies rather than heterogeneous ones. 
This emphasis underscored its original intent for controlled, replicated scientific inquiries rather than broad syntheses of diverse evidence.

Mathematical Formulation

For Independent Test Statistics

Fisher's method applies specifically when the test statistics from multiple tests are independent, providing a way to combine evidence against a common null hypothesis across the tests. The core of the method is the computation of a combined test statistic from the individual p-values obtained from these tests. This statistic leverages the logarithmic transformation of the p-values to produce a quantity that follows a known distribution under the null hypothesis, enabling a unified assessment of significance. The test statistic is defined as \chi^2 = -2 \sum_{i=1}^k \ln(p_i), where p_i denotes the p-value from the i-th independent test, for i = 1, \dots, k, and k is the number of tests. This formula arises from the property that, under the null hypothesis, each p_i is uniformly distributed on [0, 1], implying that -2 \ln(p_i) follows a chi-squared distribution with 2 degrees of freedom. Since the tests are independent, the sum of these independent chi-squared random variables yields \chi^2 distributed as chi-squared with 2k degrees of freedom. The derivation thus transforms the product of p-values into a tractable chi-squared form. Key assumptions underpinning the method include the independence of the underlying test statistics, ensuring no correlation between the p-values, and the uniformity of each p_i under the null hypothesis for continuous test statistics. Additionally, the method assumes a homogeneous alternative, where the direction and strength of evidence against the null hypothesis are consistent across tests, to maintain optimal power; p-values should derive from one-sided or two-sided tests as appropriate to the research context, with two-sided p-values commonly used when the direction of effect is unspecified. Violations of independence can distort the reference distribution, though the method is robust to moderate departures under certain conditions. To apply the method, first collect the p-values p_i from each of the k independent tests, ensuring they are computed consistently (e.g., all two-sided if applicable). Next, compute the test statistic \chi^2 using the summation formula. 
Finally, compare \chi^2 to the critical value from the chi-squared distribution with 2k degrees of freedom at the desired significance level \alpha, or derive the combined p-value as the survival function of this distribution evaluated at \chi^2; reject the null if the combined p-value is below \alpha. This procedure integrates the evidence from all tests into a single decision rule, enhancing detection power when multiple lines of evidence align against the null.
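The three steps above can be sketched in Python using SciPy's chi-squared survival function; the function name and example p-values are illustrative:

```python
import numpy as np
from scipy import stats

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns the statistic -2 * sum(log p_i) and the combined p-value
    from the chi-squared distribution with 2k degrees of freedom.
    """
    p = np.asarray(pvalues, dtype=float)
    chi2_stat = -2.0 * np.sum(np.log(p))
    combined_p = stats.chi2.sf(chi2_stat, df=2 * len(p))  # survival function
    return chi2_stat, combined_p

# Three tests, none individually significant at 0.05, combine to significance:
chi2_stat, combined_p = fisher_combine([0.08, 0.06, 0.09])
```

Here the combined p-value falls below 0.05 even though every individual p-value exceeds it, illustrating the pooling of evidence described above.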

Distribution and Computation

Under the null hypothesis, assuming the individual p-values are independent and uniformly distributed on [0,1], the statistic -2 \sum_{i=1}^k \log p_i follows a chi-squared distribution with 2k degrees of freedom exactly. This result holds for p-values derived from continuous test statistics, providing a precise distributional basis for inference without reliance on asymptotic approximations. The combined p-value is computed as the survival function of the chi-squared distribution evaluated at the observed test statistic, that is, \text{p-value} = P(\chi^2_{2k} > -2 \sum_{i=1}^k \log p_i), which can be obtained directly from the cumulative distribution function (CDF) of the chi-squared distribution. Since the \chi^2_{2k} distribution is equivalent to a gamma distribution with shape parameter k and scale parameter 2, the p-value may alternatively be expressed using the regularized upper incomplete gamma function for numerical evaluation. For small k (such as 2 to 10), exact critical values and p-values can be referenced from published tables of the chi-squared distribution. Simulation methods, involving repeated generation of uniform p-values and recomputation of the statistic, offer an additional approximate approach for verification when k is small, though direct CDF evaluation is typically sufficient and more efficient. Numerical computation requires care with very small p-values, as \log(0) is undefined and products of small p-values can cause underflow. To address this, the test statistic is calculated in logarithmic space by summing -2 \log p_i, and any zero p-values are conventionally replaced with a small positive value (e.g., machine epsilon or 10^{-16}) to ensure stability. For large k, standard statistical software employs optimized algorithms for the chi-squared CDF, maintaining accuracy without excessive computational cost. The distributional result is exact for all k under ideal conditions (continuous, independent tests); for discrete p-values or minor deviations from uniformity, the chi-squared form is approximate, with practical accuracy improving as k grows.
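A minimal sketch of the numerically careful computation described above, clipping zero p-values to a 10^{-16} floor and showing the gamma-distribution equivalence (the function name and inputs are illustrative):

```python
import numpy as np
from scipy import stats

def fisher_combine_stable(pvalues, eps=1e-16):
    """Fisher's method with guards against log(0) and underflow.

    Works entirely in log space and clips p-values to a small positive
    floor (eps = 1e-16 here, following the convention described above).
    """
    p = np.clip(np.asarray(pvalues, dtype=float), eps, 1.0)
    chi2_stat = -2.0 * np.sum(np.log(p))
    k = len(p)
    # chi2 with 2k df equals gamma(shape=k, scale=2); both give the same tail.
    p_chi2 = stats.chi2.sf(chi2_stat, df=2 * k)
    p_gamma = stats.gamma.sf(chi2_stat, a=k, scale=2.0)
    return chi2_stat, p_chi2, p_gamma

# A zero p-value is clipped rather than passed to log():
chi2_stat, p_chi2, p_gamma = fisher_combine_stable([0.0, 0.2, 0.3])
```

The chi-squared and gamma evaluations agree, and the statistic stays finite despite the zero input.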

Applications

In Meta-Analysis

Fisher's method plays a central role in meta-analysis by aggregating p-values from multiple independent studies that test the same null hypothesis, thereby increasing the statistical power to detect an overall effect where individual studies may lack sufficient evidence alone. This approach is particularly valuable in systematic reviews, such as those evaluating drug efficacy, where it synthesizes evidence from disparate trials without requiring access to raw data beyond the reported p-values. By transforming and combining these p-values, the method produces a single test statistic that follows a known distribution under the null hypothesis, enabling a unified assessment of significance across the body of research. One key advantage of Fisher's method in meta-analysis is its simplicity, as it requires only the p-values from each study and does not necessitate estimates of effect sizes or their variances, making it computationally straightforward and applicable even when detailed study data are unavailable. It is especially suitable for scenarios involving homogeneous effects across studies, where the assumption of a common effect holds, allowing for robust detection of subtle signals that might be obscured in single analyses. This efficiency has made it a preferred choice over more complex methods in resource-limited settings, though its performance shines in balanced datasets without extreme outliers. The method finds widespread application in fields like genomics, where it is routinely used to combine signals from genome-wide association studies (GWAS) to identify genetic variants associated with traits or diseases. In clinical trials, it supports the integration of results from randomized controlled trials to assess treatment outcomes across therapeutic areas. Similarly, in environmental science, Fisher's method aids in synthesizing evidence from ecological studies, for instance, evaluating the impact of pollutants on biodiversity across multiple sites. These applications leverage the method's ability to handle diverse datasets while assuming study independence. 
In practice, the workflow for applying Fisher's method in meta-analysis begins with selecting relevant studies that provide independent data and report p-values for the hypothesis of interest, ensuring exclusion of overlapping samples to maintain the independence assumption. Researchers then extract these p-values, compute the combined test statistic as described in the mathematical formulation, and derive the overall p-value from its chi-squared distribution. The final step involves reporting the combined p-value alongside sensitivity analyses to confirm robustness, providing a clear summary of the aggregated evidence for decision-making in policy or further research.
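As an illustration of this workflow, the following sketch combines hypothetical p-values from five independent trials and runs a leave-one-out sensitivity analysis; the study names and values are made up:

```python
import numpy as np
from scipy import stats

def fisher_p(pvalues):
    """Combined p-value for a list of independent p-values."""
    chi2_stat = -2.0 * np.sum(np.log(pvalues))
    return stats.chi2.sf(chi2_stat, df=2 * len(pvalues))

# Hypothetical p-values from five independent trials (illustrative only):
studies = {"trial_A": 0.04, "trial_B": 0.11, "trial_C": 0.03,
           "trial_D": 0.20, "trial_E": 0.07}

overall = fisher_p(list(studies.values()))

# Leave-one-out sensitivity analysis: recombine with each study removed
# to check that no single study drives the conclusion.
sensitivity = {name: fisher_p([p for s, p in studies.items() if s != name])
               for name in studies}
```

In this toy dataset the combined result stays significant at 0.05 no matter which study is dropped, the kind of robustness check the workflow above recommends reporting.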

Modern Examples

In genomics research, Fisher's method has been applied to combine p-values from multiple genome-wide association studies (GWAS) to enhance the detection of rare variants associated with complex diseases. For instance, a 2021 study evaluated Fisher's method alongside other combination techniques for identifying incomplete associations in rare variant analyses, finding it robust but demonstrating that weighted variants like wFisher showed superior power in scenarios with moderate effects across diverse genomic datasets, such as those from large cohorts. This approach allows researchers to aggregate evidence from independent GWAS on cancer susceptibility, improving statistical power for rare variants that might otherwise be overlooked in individual studies. In epidemiology, particularly during the COVID-19 pandemic, Fisher's method facilitated the aggregation of statistical signals from surveillance tests across geographic regions to detect outbreaks without centralizing sensitive data. A 2023 study on federated disease surveillance applied Fisher's method to combine p-values from local tests on hospitalization counts reported to the U.S. Department of Health and Human Services (HHS), achieving high detection rates (up to 99.2% true positives detected at the week of the true surge) in semi-synthetic data spanning multiple regions. This enabled timely identification of hospitalization trends while preserving privacy, outperforming uncombined local analyses in unevenly distributed data scenarios. Microbiome analysis has leveraged Fisher's method for meta-analyses that integrate p-values from diverse statistical tests to assess the differential abundance of taxa in gut microbiome studies. A 2022 investigation compared Fisher's method with alternative combination approaches for pooling p-values from methods such as ANCOM-BC and Wilcoxon tests on gut microbiome datasets, noting challenges with type I error control due to p-value correlations and recommending alternatives like the Cauchy combination test for better performance in heterogeneous samples. This application has supported efforts to identify key microbial signatures in gut microbiomes across multiple cohorts, though with caveats on independence assumptions. 
In federated analysis frameworks, weighted variants of Fisher's method have been employed for privacy-preserving surveillance in distributed healthcare data. The aforementioned 2023 federated surveillance study extended Fisher's method through site-specific weighting to combine p-values from local models on HHS hospitalization data, detecting trends with 99.2% true positive rates (at the week of the surge) during surges across decentralized data custodians. This weighted approach mitigated biases from varying regional sample sizes, enabling robust trend detection in real time without data sharing, and has implications for broader applications in multi-institutional health monitoring.

Limitations and Assumptions

Independence Requirement

The independence requirement in Fisher's method stipulates that the individual hypothesis tests must be statistically independent, such that the resulting p-values are uncorrelated under the null hypothesis. This condition holds when the tests do not share underlying data or variables that could induce dependence, such as overlapping samples or common covariates across analyses. The rationale for this assumption lies in its role in deriving the null distribution of the combined test statistic. Specifically, independence guarantees that the quantity -2 \sum_{i=1}^k \ln(p_i) follows a \chi^2 distribution with 2k degrees of freedom, enabling accurate computation of the combined p-value; any dependence among the p-values disrupts this property and leads to an incorrect reference distribution. To assess whether the independence assumption is met prior to applying the method, researchers can compute the correlation matrix of the underlying test statistics and verify that off-diagonal elements are negligible or zero, or conduct empirical simulations to evaluate the joint behavior of the p-values under the null. Fisher's method is particularly suitable for scenarios involving disjoint datasets, where tests draw from completely separate samples, such as independent clinical trials evaluating the same intervention across different populations or laboratory experiments replicated under non-overlapping conditions.

Consequences of Violations

Violations of the independence assumption in Fisher's method, particularly positive dependence among the test statistics or p-values, lead to an anti-conservative test in which the Type I error rate is inflated beyond the nominal level, resulting in combined p-values that are too small and an increased likelihood of false positives. For instance, when p-values exhibit positive correlation, the standard chi-squared reference distribution underestimates the true variability of the combined statistic, causing the method to reject the null hypothesis more frequently than intended. In contrast, negative dependence renders the test conservative, producing combined p-values that are larger than they would be under independence, thereby reducing statistical power and making it harder to detect true effects. Empirical simulations demonstrate the severity of these violations, with pre-2020 studies showing that under moderate positive correlations (e.g., 0.3 to 0.5), the Type I error rate can approximately double the nominal 0.05 level, especially at small thresholds. To address such issues, preliminary mitigation can involve sensitivity analyses that evaluate the robustness of results to assumed dependence structures, though comprehensive remedies require specialized extensions beyond the standard method.
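The inflation under positive dependence can be demonstrated with a small simulation, here using equicorrelated normal test statistics (ρ = 0.5) as one illustrative dependence structure; the constants are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n_sim, rho = 5, 20000, 0.5

# Equicorrelated standard normal test statistics under the null.
cov = np.full((k, k), rho) + (1.0 - rho) * np.eye(k)
z = rng.multivariate_normal(np.zeros(k), cov, size=n_sim)
p = stats.norm.sf(z)          # one-sided p-values, Uniform(0,1) marginally

# Naive Fisher combination that (wrongly) assumes independence:
chi2_stat = -2.0 * np.log(p).sum(axis=1)
combined = stats.chi2.sf(chi2_stat, df=2 * k)

# Empirical Type I error at the nominal 0.05 level:
error_rate = float(np.mean(combined < 0.05))
```

With this dependence structure the empirical rejection rate is well above the nominal 0.05, matching the anti-conservative behavior described above.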

Extensions

Handling Dependent Statistics

When the test statistics in Fisher's method are dependent, the standard chi-squared distribution no longer holds under the null hypothesis, leading to inflated type I error rates if independence is assumed. Adaptations extend the method by accounting for the dependence structure, typically through adjustments to the test statistic's reference distribution or empirical estimation of its null behavior. These approaches maintain the core idea of combining transformed p-values but modify the procedure to preserve validity. Brown's method addresses known dependence structures by assuming the underlying test statistics follow a multivariate normal distribution with a specified covariance matrix. It approximates the null distribution of the combined statistic, -2 ∑ log(p_i), as a scaled chi-squared distribution in which the scale factor and degrees of freedom are adjusted based on the covariance. Specifically, the degrees of freedom are modified using a Satterthwaite-type approximation that incorporates the covariance matrix of the transformed p-values, ensuring the first two moments match those of the approximated distribution. This method is particularly suitable when the dependence is fully specified, such as in designed experiments with known correlations. Kost's approach extends Brown's method to cases with unknown covariance matrices, providing analytical approximations for the scale and degrees of freedom via polynomial regressions on the correlation coefficients. It derives the covariance of the log-transformed p-values through numerical integration and matches moments to a scaled chi-squared, offering improved accuracy for moderate to high positive correlations among test statistics. For highly correlated datasets, such as those from genomic studies, Kost's approximations perform well. Weighted versions of Fisher's method incorporate correlations by generalizing the combining function with weights derived from the dependence structure. 
For instance, the weighted inverse chi-square method modifies the statistic to ∑ w_i (-2 log(p_i)), where the weights w_i are chosen based on the estimated correlations to adjust for non-independence, approximating the null distribution as a weighted sum of chi-squared variables. This allows flexibility in emphasizing tests with varying reliability or correlation levels, often using data-derived weights from the covariance matrix. Generalized forms, such as the weighted sum of transformed p-values, further adapt the approach for arbitrary dependence while preserving asymptotic properties. These extensions are commonly applied in scenarios with overlapping samples, such as multi-omics analyses integrating genomic and transcriptomic data where features share biological pathways, or multi-site clinical trials with correlated outcomes across locations. In multi-omics, for example, they combine p-values from gene set enrichment across omics layers, where dependencies arise from shared experimental conditions. Computational demands rise with the number of tests k, as covariance estimation requires O(k^2) operations, making them feasible for moderate k (up to hundreds) but challenging for very large-scale data without approximations. In terms of performance, these methods maintain nominal type I error rates under dependence, unlike the standard Fisher's approach, which can exceed 5% error for correlations ρ > 0.2. However, they often exhibit reduced power compared to the independent case, with losses proportional to the average correlation; for instance, Empirical Brown's method (an adaptation for estimated covariances), introduced in 2016, maintains type I error control and outperforms unadjusted Fisher's method in correlated settings. Permutation-based resampling can complement analytical methods by empirically deriving the null distribution under observed dependence, though it increases computational cost for large k.
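A sketch of the Brown-style moment matching described above, assuming the covariance matrix of the transformed values w_i = -2 ln p_i is supplied by the caller (in practice it is derived from the known correlation structure, or estimated from data as in the Empirical Brown's method); the function name and example values are illustrative:

```python
import numpy as np
from scipy import stats

def brown_combine(pvalues, cov_w):
    """Brown-style combination of dependent p-values.

    Matches the first two moments of T = sum(-2 log p_i) to a scaled
    chi-squared c * chi2_f; cov_w is the k x k covariance matrix of
    w_i = -2 log p_i under the null (diagonal entries equal 4).
    """
    p = np.asarray(pvalues, dtype=float)
    k = len(p)
    T = -2.0 * np.sum(np.log(p))
    mean_T = 2.0 * k                  # E[w_i] = 2 under the null
    var_T = float(np.sum(cov_w))      # variance of the sum, incl. covariances
    c = var_T / (2.0 * mean_T)        # scale factor
    f = 2.0 * mean_T ** 2 / var_T     # adjusted degrees of freedom
    return stats.chi2.sf(T / c, df=f)

# With a diagonal covariance (Var[w_i] = 4) the formula reduces to the
# standard Fisher test: c = 1 and f = 2k.
p_indep = brown_combine([0.02, 0.04, 0.3], 4.0 * np.eye(3))
```

Positive off-diagonal entries in cov_w inflate var_T, shrinking the effective degrees of freedom and yielding a larger, dependence-corrected combined p-value.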

Alternative Combination Methods

While Fisher's method remains a cornerstone for combining p-values under the assumptions of independence and homogeneity, several alternative approaches have been developed to address scenarios involving heterogeneity, dependence, or different sensitivity to p-value distributions. These methods aggregate p-values in distinct ways, offering robustness or power advantages depending on the data characteristics. The harmonic mean p-value (HMP) provides a robust alternative, particularly for combining dependent tests where correlations are unknown. It computes the harmonic mean by weighting each p-value inversely (as 1/p), emphasizing smaller p-values while downweighting larger ones, and has been shown to control the family-wise error rate more powerfully than conservative corrections like Bonferroni in dependent settings. This method, introduced by Daniel Wilson in 2019, is especially useful in genomic studies with correlated signals. Edgington's method offers a straightforward additive approach, summing the individual p-values directly and comparing the total to its null distribution (the sum of k uniform variables, ranging from 0 to k). Proposed by Edgington in 1972, it treats all p-values more equally than Fisher's logarithmic transformation, making it less sensitive to outliers and suitable for balanced contributions across studies. Tippett's method, dating back to 1931, focuses on the most extreme evidence by basing the combined test on the minimum p-value, with combined p-value 1 - (1 - p_min)^k, where k is the number of tests; this is particularly effective when signals are sparse or extreme but less powerful for diffuse effects. Selection among these methods depends on the underlying assumptions: Fisher's method excels with homogeneous, independent tests, whereas alternatives like Edgington's or Tippett's are preferable for heterogeneous effects, and the HMP for potential dependence. Recent trends in the literature emphasize hybrid methods for large-scale data, such as the Cauchy combination test, which transforms p-values via the Cauchy distribution and sums them for analytic p-value computation under arbitrary dependence, enhancing power in applications like genomics.
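For illustration, Tippett's method and the Cauchy combination test are simple enough to sketch directly; the example p-values depict a sparse-signal scenario and are made up:

```python
import numpy as np

def tippett(pvalues):
    """Tippett's method: combined p-value 1 - (1 - min p)^k."""
    p = np.asarray(pvalues, dtype=float)
    return 1.0 - (1.0 - p.min()) ** len(p)

def cauchy_combination(pvalues):
    """Cauchy combination test: average of Cauchy-transformed p-values,
    with an analytic tail that tolerates arbitrary dependence."""
    p = np.asarray(pvalues, dtype=float)
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(t) / np.pi

pvals = [0.001, 0.6, 0.7, 0.8]     # one strong, sparse signal
p_tip = tippett(pvals)
p_cct = cauchy_combination(pvals)
```

Both methods return a small combined p-value here, reflecting their sensitivity to a single extreme result that the larger p-values cannot wash out.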

Interpretation

Assessing Combined Significance

The combined p-value from Fisher's method is typically evaluated against a significance threshold of α = 0.05 to determine whether there is sufficient evidence to reject the overall null hypothesis. When performing multiple such combined tests, adjustments for multiple testing, such as the Bonferroni correction (dividing α by the number of tests), are recommended to control the family-wise error rate and reduce the risk of false positives. This conservative approach ensures robustness in applications like meta-analyses involving numerous hypotheses. Power analysis for Fisher's method reveals that its statistical power to detect true effects depends on the individual effect sizes and sample sizes of the constituent tests; it generally offers higher power than single tests, particularly for detecting small effects across multiple studies by aggregating subtle evidence. Simulations indicate that the method excels when effect sizes are equal across tests but may underperform relative to weighted alternatives if effect sizes vary substantially. Due to the -2 log transformation, Fisher's method is highly sensitive to small individual p-values, which can dominate the combined statistic and drive significance even if only a few tests show strong evidence, while large p-values contribute minimally. This asymmetry necessitates careful interpretation, accounting for the quality, heterogeneity, and potential outliers among the input studies to avoid overemphasizing isolated strong results. In reporting results, it is standard to include the number of combined tests k, a summary of the individual p-values (such as their range, median, or minimum), the combined χ² statistic, its degrees of freedom (2k), and the final combined p-value to provide transparency and allow replication. A key risk of misinterpretation is assuming that combined significance implies a large effect size or causal relationship; in reality, it only assesses the joint probability under the null hypothesis and requires separate evaluation of effect magnitudes and study designs for broader inferences.
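The dominance of small p-values under the -2 log transform can be seen in a short example; the values are illustrative:

```python
import numpy as np
from scipy import stats

def fisher_p(pvalues):
    """Combined p-value for a list of independent p-values."""
    chi2_stat = -2.0 * np.sum(np.log(pvalues))
    return stats.chi2.sf(chi2_stat, df=2 * len(pvalues))

# One extreme p-value among otherwise unremarkable results still yields
# a highly significant combined result:
dominated = fisher_p([1e-8, 0.5, 0.6, 0.5, 0.7])
# Softening that single value to 0.05 removes the combined significance:
softened = fisher_p([0.05, 0.5, 0.6, 0.5, 0.7])
```

The contrast shows why a significant combined p-value warrants inspecting the individual inputs before drawing conclusions.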

Practical Implementation

Fisher's method is commonly implemented in statistical software packages that facilitate p-value combination for meta-analysis. In R, the 'metap' package provides a straightforward interface for Fisher's method, allowing users to input p-values and obtain the combined statistic and significance level. Similarly, Python's SciPy library includes the scipy.stats.combine_pvalues function, which supports Fisher's method via vectorized calculations for efficient computation. For bioinformatics applications, the Bioconductor suite in R offers specialized tools, with p-value combination integrated into packages such as 'limma' or 'topGO', enabling scalable analysis of high-throughput data. Practical application begins with data preparation, where individual p-values from independent tests must be extracted and verified to be uniformly distributed under the null hypothesis. The computation step involves applying the formula to obtain the combined -2 ∑ log(p_i) statistic and referencing it against a chi-squared distribution with 2k degrees of freedom, where k is the number of tests. Visualization aids interpretation, such as forest plots in meta-analysis software to display individual and combined effect sizes alongside p-values. Recent advancements include applications of Fisher's method in federated analysis for privacy-preserving computation, such as in epidemic surveillance where p-values are combined across distributed datasets without sharing raw data. Best practices emphasize documenting the independence assumption and conducting diagnostics, such as correlation tests, to check for potential dependencies among p-values. Researchers should report the full set of input p-values, the combined statistic, degrees of freedom, and combined p-value to ensure reproducibility. Open-source implementations have been widely accessible since the early 2010s, with ongoing updates in packages like 'metap' (version 1.5 released in 2021; current version 1.12 as of 2025) to handle large-scale data through vectorization and optimized algorithms.
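A minimal usage example of SciPy's built-in implementation; the p-values and the Stouffer weights are made up:

```python
from scipy.stats import combine_pvalues

pvals = [0.03, 0.12, 0.07, 0.2]

# Fisher's method (the default):
chi2_stat, p_fisher = combine_pvalues(pvals, method='fisher')

# Stouffer's method with (hypothetical) sample-size weights, for comparison:
z_stat, p_stouffer = combine_pvalues(pvals, method='stouffer',
                                     weights=[100, 50, 80, 60])
```

The Fisher branch returns the -2 ∑ log(p_i) statistic and its chi-squared tail probability; the Stouffer branch returns a weighted Z-statistic and its normal tail probability.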

Stouffer's Z-Score Method

Stouffer's Z-score method provides a parametric alternative to Fisher's method for combining evidence from multiple tests by transforming p-values into standard normal deviates, or Z-scores, and then computing a weighted sum. The combined statistic is calculated as Z = \frac{\sum_{i=1}^k w_i Z_i}{\sqrt{\sum_{i=1}^k w_i^2}}, where Z_i = \Phi^{-1}(1 - p_i) for the i-th p-value p_i, \Phi^{-1} is the inverse cumulative distribution function of the standard normal distribution, and w_i are non-negative weights (often set to \sqrt{n_i}, the square root of the sample size in the i-th study, to reflect precision). Under the null hypothesis of no effect in all tests, Z follows a standard normal distribution N(0,1). This approach was originally proposed by Samuel A. Stouffer and colleagues in 1949 within their sociological analysis of U.S. Army personnel attitudes, serving as a practical tool for aggregating survey-based probabilities in social sciences where Fisher's chi-squared method was less intuitive. The method was later extended by Lipták in 1958 to explicitly include weights, enhancing its utility in meta-analyses with varying study qualities. In comparison to Fisher's non-parametric method, which combines p-values via the sum of -2 \ln p_i to yield a chi-squared statistic and assumes only uniformity of p-values under the null without weights, Stouffer's approach is explicitly parametric, relying on the normality of the Z-scores derived from the tests, and naturally accommodates unequal precision through weighting. Fisher's method remains unweighted and better suited to raw p-values from diverse test types, but it lacks the flexibility to prioritize larger or more precise studies inherent in Stouffer's framework. Stouffer's method is particularly advantageous when the direction of effects is anticipated (e.g., using one-sided p-values to preserve sign in Z-scores) or in weighted meta-analyses accounting for study-specific sample sizes, scenarios where Fisher's uniform treatment of p-values may dilute evidence from stronger studies. 
Conversely, Fisher's method excels with purely uniform p-values from exploratory or two-sided tests without prior directional assumptions. Simulation-based empirical comparisons prior to 2020 indicate that the weighted Z-score method often exhibits greater statistical power than Fisher's under effect homogeneity across studies, especially when sample sizes vary—for instance, achieving power levels up to 0.807 in pooled-like homogeneous settings at \alpha = 0.05 compared to Fisher's lower values in similar simulations—due to its efficient incorporation of weights. However, reliance on Z-score transformations renders it more sensitive to outliers, such as extreme deviations from normality in individual test statistics, potentially reducing robustness in highly heterogeneous data relative to Fisher's non-parametric resilience in pre-2020 evaluations of meta-analytic performance.
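A sketch of Stouffer's weighted combination following the formula above, assuming one-sided p-values and \sqrt{n_i} weights; the helper name and study sizes are illustrative:

```python
import numpy as np
from scipy import stats

def stouffer_combine(pvalues, weights=None):
    """Stouffer's weighted Z-score combination of one-sided p-values."""
    p = np.asarray(pvalues, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    z = stats.norm.isf(p)                          # Z_i = Phi^{-1}(1 - p_i)
    z_comb = np.sum(w * z) / np.sqrt(np.sum(w ** 2))
    return stats.norm.sf(z_comb)                   # combined one-sided p-value

# A large study (n = 400) and a small one (n = 25); sqrt(n) weights let
# the larger, more precise study dominate the combination:
p_weighted = stouffer_combine([0.03, 0.40], weights=[20.0, 5.0])
p_unweighted = stouffer_combine([0.03, 0.40])
```

With these inputs the weighted version is significant at 0.05 while the unweighted one is not, illustrating how weighting prioritizes the stronger, more precise study.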

Other Techniques

In meta-analysis, effect size-based techniques aggregate standardized measures of association, such as Cohen's d, across studies using fixed- or random-effects models to combine evidence while accounting for variability. In fixed-effects models, all studies are assumed to estimate a single true effect size, with weights assigned inversely proportional to each study's variance to emphasize more precise estimates; this approach minimizes the variance of the pooled effect. Random-effects models, in contrast, incorporate between-study heterogeneity by adding a variance component, allowing for diverse true effects, and still use inverse-variance weighting for aggregation. These methods provide interpretable summaries of overall impact, such as a weighted average Cohen's d, but require effect size estimates and their standard errors from each study. Bayesian alternatives to Fisher's method focus on combining posterior probabilities from multiple tests, often employing hierarchical priors to flexibly model uncertainty and dependence in hypothesis testing. For instance, spiked priors enable nonparametric treatment of random effects in multiple hypothesis testing, updating prior beliefs with observed data to yield posterior probabilities for each hypothesis while controlling the false discovery rate. This hierarchical approach allows incorporation of prior knowledge about effect sizes or dependencies, producing joint posteriors that quantify evidence strength more probabilistically than p-value combinations. Such methods are particularly useful in high-dimensional settings, like genomics, where Dirichlet priors facilitate clustering of similar tests. In machine learning, ensemble methods integrate Fisher-based scoring within feature selection pipelines to combine statistical evidence across subsets or models, enhancing discovery in high-dimensional data. 
The 2021 FRL algorithm, for example, combines Fisher scores (for ranking features by discriminative power) with recursive feature elimination in an ensemble framework, iteratively selecting and validating features while implicitly aggregating evidence from multiple iterations to identify cancer genomic biomarkers. This integration leverages Fisher's principles for scoring but extends them through ensemble learning, improving robustness in predictive modeling tasks. Compared to Fisher's method, these techniques offer greater flexibility in handling heterogeneous effects or prior information but often demand larger datasets and more computational resources; Fisher's approach excels in simplicity and rapid aggregation for uniform tests. Emerging applications include using Fisher's method in AI ethics to combine fairness tests across large language models, as in analyses of communication biases where it aggregates p-values from multiple demographic checks to detect systemic trends (e.g., P < 10^{-16} after Benjamini–Hochberg correction).
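As a rough illustration of the Fisher-score ranking step that FRL builds on (a sketch of the classical two-class Fisher score on synthetic data, not the FRL algorithm itself):

```python
# Fisher score for ranking features by discriminative power, as used in
# filter-style feature selection. Data and labels are synthetic.
import numpy as np

rng = np.random.default_rng(0)
# 100 samples x 3 features; feature 0 carries class signal, 1 and 2 are noise
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[y == 1, 0] += 2.0                    # inject a class-mean shift into feature 0

def fisher_score(X, y):
    """Ratio of between-class to within-class scatter, per feature."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2   # between-class scatter
        den += len(Xc) * Xc.var(axis=0)                # within-class scatter
    return num / den

scores = fisher_score(X, y)
print("Fisher scores:", np.round(scores, 3))
print("ranking (best first):", np.argsort(scores)[::-1])
```

The discriminative feature receives a far higher score than the noise features, which is what makes the score usable as a ranking criterion inside an iterative selection loop.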

References

  1. [1]
    Choosing an Optimal Method to Combine P-values - PMC
    Fisher [1925] was the first to suggest a method of combining the p-values obtained from several statistics and many other methods have been proposed since then.
  2. [2]
    Combining Independent Tests of Significance
    It is shown that no single method of combining independent tests of significance ... It is shown that for such problems Fisher's method and a method proposed by ...
  3. [3]
    Powerful p-value combination methods to detect incomplete ... - Nature
    Mar 26, 2021 · The Fisher's method has been the most commonly used to combine p-values. The following test statistic T has chi-squared distribution with DF of ...
  4. [4]
    The Generalized Fisher's Combination and Accurate P-Value ...
    Feb 18, 2022 · The paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating.
  5. [5]
    Combining dependent P-values with an empirical adaptation of ...
    Fisher R.A. (1948) Answer to question 14 on combining independent tests of significance. Am. Statistician, 2, 30–31. [Google Scholar]; Friedman N. et al ...
  6. [6]
    combine_pvalues — SciPy v1.16.2 Manual
    Combine p-values from independent tests that bear upon the same hypothesis. These methods are intended only for combining p-values from hypothesis tests based ...
  7. [7]
    Fisher's method of combining dependent statistics using ... - NIH
    A classical approach to combine independent test statistics is Fisher's combination of p-values, which follows the chi-squared distribution.
  8. [8]
    Combining p-values in large scale genomics experiments - PMC
    If all p-values are smaller than 1/e, the combined pc goes to zero, and if they are larger that 1/e, the pc goes to one. Thus, 1/e provides a threshold point ...
  9. [9]
    Statistical methods and scientific inference : Fisher, Ronald Aylmer ...
    Aug 13, 2019 · Statistical methods and scientific inference ; Publication date: 1956 ; Topics: Logic, Symbolic and mathematical, Mathematical statistics, ...
  10. [10]
    Using History to Contextualize p-Values and Significance Testing
    Ronald A. Fisher and his contemporaries formalized these methods in the early twentieth century and Fisher's 1925 Statistical Methods for Research Workers ...
  11. [11]
    An historical perspective on meta-analysis: dealing quantitatively ...
    Fisher's influence on meta-analysis is hard to exaggerate. For instance, one of the earliest publications warning about preferential publication of studies ...
  12. [12]
    Fisher's method of combining dependent statistics using ...
    Oct 29, 2013 · One approach is to combine the p-values of one-sided tests using Fisher's method (Fisher, 1932), referred to here as the Fisher's combination ...
  13. [13]
    P-value evaluation, variability index and biomarker categorization ...
    In this paper, we develop an importance sampling scheme with spline interpolation to increase the accuracy and speed of the P-value calculation.
  14. [14]
    21 Meta-analysis in environmental statistics - ScienceDirect.com
    This chapter reviews standard methods for such synthesis, including combining p-values, effect sizes, and methods for combining contingency tables. Recent ...
  15. [15]
    Meta-analysis based on weighted ordered P-values for genomic ...
    We consider weighted versions of classical procedures such as Fisher's method and Stouffer's method where the weight for each p-value is based on its order ...
  16. [16]
    [PDF] arXiv:2307.02616v2 [stat.AP] 13 Sep 2024
    Sep 13, 2024 · This paper explores federated epidemic surveillance, using hypothesis tests and meta-analysis to detect outbreaks without sharing data, ...
  17. [17]
    Combining p-values from various statistical methods for ... - Frontiers
    The most common method is Fisher's method that uses a chi-square distribution to calculate the combined value of p (Fisher, 1925). The method using the minimum ...
  18. [18]
    A modified generalized Fisher method for combining probabilities ...
    Feb 19, 2014 · In this work, we propose modifications to the Lancaster procedure by taking the correlation structure among p-values into account.
  20. [20]
    Evaluating statistical significance in a meta-analysis by using ...
    We applied Fisher's ( p F ) and our ( p N ) methods to combine the two p-values from the two TWASs. There were 3,175 statistical tests (because of 3,175 genes ...
  22. [22]
    [PDF] Choosing Between Methods of Combining p-values - arXiv
    Dec 14, 2017 · Tippett's and Fisher's methods are clearly more sensitive to the smallest p-value, ... Then the optimal p-value combination method is ST .
  23. [23]
    The harmonic mean p-value for combining dependent tests - PNAS
    In this paper, I introduce the harmonic mean p-value (HMP), a simple to use and widely applicable alternative to Bonferroni correction motivated by Bayesian ...
  24. [24]
    An Additive Method for Combining Probability Values from ...
    (1972). An Additive Method for Combining Probability Values from Independent Experiments. The Journal of Psychology: Vol. 80, No. 2, pp. 351-363.
  25. [25]
    Combining probability from independent tests: the weighted Z ...
    Aug 25, 2005 · Fisher's method is asymmetrically sensitive to small P-values compared to large P-values. The undesirability of this result can be seen when we ...
  26. [26]
    The Effect Size: Beyond Statistical Significance - PMC - NIH
    The effect size is considered an essential complement of the statistical significance test when a significant difference is found.
  27. [27]
    Optimally weighted Z-test is a powerful method for combining ... - NIH
    The weighted Z-test is a method for combining P-values in meta-analysis, superior to Fisher's method, and uses weights to improve power.
  28. [28]
    Full article: Combining independent p-values in replicability analysis
    Via simulations, we find that the Stouffer method works well if the null p-values are uniformly distributed and the signal strength is low, and the Fisher ...
  30. [30]
    FRL: An Integrative Feature Selection Algorithm Based on the Fisher ...
    This paper proposes an integrative feature selection algorithm named FRL to explore potential cancer genomic biomarkers on cancer subsets.
  31. [31]
    AI–AI bias: Large language models favor communications ... - NIH
    Jul 29, 2025 · Using Fisher's method (BH-corrected, α = 0.05), the combined P-value confirms a highly significant overall trend: P < 10−16. Our findings ...