
F-test

The F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is commonly used to test the equality of variances from two or more populations by comparing the ratio of sample variances, which follows the F-distribution under the null hypothesis of equal variances. The F-statistic is the ratio of two independent estimates of variance, with degrees of freedom corresponding to the numerator and denominator. Developed by British statistician Sir Ronald A. Fisher in the 1920s as part of his work on the analysis of variance, the test and its associated distribution were later tabulated and formally named in Fisher's honor by American statistician George W. Snedecor in 1934. The F-test plays a central role in several inferential statistical methods, particularly in analysis of variance (ANOVA), where it compares the variance between group means to the variance within groups to determine whether observed differences in means are statistically significant. In multiple linear regression, an overall F-test assesses the joint significance of all predictors by testing the null hypothesis that all regression coefficients (except the intercept) are zero, comparing the model's explained variance to the residual variance. It is also employed in nested model comparisons to evaluate whether adding more parameters significantly improves model fit. Key assumptions for the validity of the F-test include that the underlying data or errors are normally distributed and that samples are independent, though robust variants exist for violations of normality. The test's critical value is derived from F-distribution tables or statistical software, with rejection of the null hypothesis indicating significant differences in variances or model effects at a chosen significance level, such as 0.05.

Definition and Background

Definition

The F-test is a statistical procedure used to test hypotheses concerning the equality of variances across populations or the relative explanatory power of statistical models by comparing explained and unexplained variation. At its core, the test statistic is constructed as the ratio of two independent scaled chi-squared random variables, each divided by its respective degrees of freedom, which under the null hypothesis follows an F-distribution. This framework enables inference about population parameters when data are assumed to follow a normal distribution, forming a key component of parametric statistical analysis.

Named after the British statistician Sir Ronald A. Fisher, the F-test originated in the 1920s as a variance ratio method developed during his work on experimental design for agricultural research at Rothamsted Experimental Station. Fisher introduced the approach in his 1925 book Statistical Methods for Research Workers to facilitate the analysis of experimental data in biology and agriculture, where comparing variability between treatments was essential. The term "F" was later coined in honor of Fisher by George W. Snedecor in the 1930s.

In the hypothesis testing framework, the F-test evaluates a null hypothesis (H_0) positing equal variances (for variance comparisons) or no significant effect (for model assessments) against an alternative hypothesis (H_a) indicating inequality or the presence of an effect. The procedure relies on the sampling distribution of the test statistic to compute p-values or critical values, allowing researchers to assess evidence against the null at a chosen significance level. This makes the F-test foundational in parametric inference, particularly under normality assumptions, for drawing conclusions about population variability or model adequacy.

F-distribution

The F-distribution, also known as Snedecor's F-distribution, is defined as the distribution of the ratio of two independent chi-squared random variables, each scaled by its respective degrees of freedom. Specifically, if U \sim \chi^2_{\nu_1} and V \sim \chi^2_{\nu_2} are independent, with \nu_1 and \nu_2 degrees of freedom, then the random variable F = \frac{U / \nu_1}{V / \nu_2} follows an F-distribution with parameters \nu_1 (numerator degrees of freedom) and \nu_2 (denominator degrees of freedom). This distribution is central to testing involving variances, as it models the ratio of sample variances from normally distributed populations.

The probability density function of the F-distribution is f(x; \nu_1, \nu_2) = \frac{\Gamma\left( \frac{\nu_1 + \nu_2}{2} \right) \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1 / 2} x^{(\nu_1 / 2) - 1} }{ \Gamma\left( \frac{\nu_1}{2} \right) \Gamma\left( \frac{\nu_2}{2} \right) \left( 1 + \frac{\nu_1 x}{\nu_2} \right)^{(\nu_1 + \nu_2)/2} } for x > 0 and \nu_1, \nu_2 > 0, where \Gamma is the gamma function. Here, \nu_1 influences the shape of the density near the origin, while \nu_2 affects the tail behavior; both parameters must be positive real numbers, though integer values are common in applications.

Key properties of the F-distribution include its right-skewed shape, which becomes less pronounced as \nu_1 and \nu_2 increase. As \nu_2 \to \infty, the distribution approaches a chi-squared distribution with \nu_1 degrees of freedom, scaled by 1/\nu_1. The mean exists for \nu_2 > 2 and is given by \frac{\nu_2}{\nu_2 - 2}. The variance exists for \nu_2 > 4 and is \frac{2 \nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}. The F-distribution relates to other distributions in special cases; notably, when \nu_1 = 1, an F(1, \nu_2) random variable is the square of a Student's t random variable with \nu_2 degrees of freedom. Critical values for the F-distribution, which define rejection regions in tests at significance levels such as \alpha = 0.05, are obtained from tables or computed using statistical software, as the distribution lacks a closed-form quantile function. These values depend on \nu_1, \nu_2, and \alpha, with critical thresholds generally decreasing as either degrees-of-freedom parameter increases.
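The moments and critical values above are straightforward to evaluate numerically. The following minimal Python sketch, using SciPy's scipy.stats.f (the degrees of freedom chosen here are illustrative assumptions, not values from any particular study), checks the stated mean and variance formulas and the relationship to Student's t:

```python
import numpy as np
from scipy import stats

nu1, nu2 = 5, 10  # illustrative degrees of freedom

# Mean (requires nu2 > 2) and variance (requires nu2 > 4) from the text
mean = nu2 / (nu2 - 2)
var = (2 * nu2**2 * (nu1 + nu2 - 2)) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))
assert np.isclose(mean, stats.f.mean(nu1, nu2))
assert np.isclose(var, stats.f.var(nu1, nu2))

# Upper-tail critical value at alpha = 0.05 (no closed form; computed numerically)
alpha = 0.05
crit = stats.f.ppf(1 - alpha, nu1, nu2)
print(f"F critical value ({nu1}, {nu2} df): {crit:.3f}")

# Special case: F(1, nu2) is the square of a t variable with nu2 df, so the
# one-sided F critical value equals the squared two-sided t critical value
assert np.isclose(stats.f.ppf(1 - alpha, 1, nu2),
                  stats.t.ppf(1 - alpha / 2, nu2) ** 2)
```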

Assumptions and Interpretation

Key Assumptions

The F-test relies on several fundamental statistical assumptions to ensure its validity and the reliability of its inferences. These assumptions underpin the derivation of the test statistic under the null hypothesis and must hold for the statistic to follow the expected F-distribution. Primarily, they include normality of the underlying populations or errors, independence of observations, homoscedasticity (equal variances) in contexts where it is not the hypothesis being tested, and random sampling from the populations of interest. Violations of these can compromise the test's performance, leading to distorted results.

Normality assumes that the data or error terms are drawn from normally distributed populations. For the F-test comparing two variances, both populations must be normally distributed, as deviations from normality can severely bias the test statistic. In applications like analysis of variance (ANOVA), the residuals (errors) are assumed to follow a normal distribution, enabling the F-statistic to follow the F-distribution under the null. This assumption is crucial because the F-test's exact sampling distribution depends on it, particularly in small samples.

Independence requires that observations within and across groups are independent, meaning the value of one observation does not influence another. This is essential for the additivity of variances in the F-statistic and prevents autocorrelation or clustering effects that could inflate variance estimates. Random sampling further ensures that the samples are representative and unbiased, drawn independently from the target populations without systematic selection bias, which supports the generalizability of the test's conclusions.

Homoscedasticity, or equal variances across groups, is a key assumption for F-tests in ANOVA and regression contexts, where the null hypothesis posits no group differences in means under equal spread. However, in the specific F-test for equality of two variances, homoscedasticity is the hypothesis under scrutiny rather than a prerequisite, though normality and independence still apply. Breaches here can lead to unequal variances that skew the test toward false positives or negatives.

Violations of these assumptions can have significant consequences, including inflated Type I error rates, reduced statistical power, and invalid p-values. For instance, non-normal data, especially with heavy tails or skewness, often cause the actual rejection rate to exceed the nominal level (e.g., more than 5% rejections under the null hypothesis), distorting decisions. Heteroscedasticity may similarly bias the F-statistic, leading to overly liberal or conservative inferences depending on the direction of variance inequality. Independence violations, such as in clustered data, can underestimate standard errors and overstate significance.

To verify these assumptions before applying the F-test, diagnostic methods are recommended. Normality can be assessed using the Shapiro-Wilk test, which evaluates whether sample data deviate significantly from a normal distribution and is particularly powerful for small samples (n < 50). For homoscedasticity, Levene's test serves as a robust alternative to the F-test itself, checking equality of variances by comparing absolute deviations from group means and being less sensitive to non-normality. These checks help identify potential issues, allowing researchers to consider transformations, robust alternatives, or non-parametric methods if assumptions fail.
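As a sketch of how these diagnostic checks look in practice, the following Python fragment applies the Shapiro-Wilk and Levene tests from SciPy to two simulated samples (the data and the choice of SciPy are assumptions made for illustration; any statistics package offers equivalents):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(10, 2, size=30)  # simulated group samples
g2 = rng.normal(12, 2, size=30)

# Shapiro-Wilk normality check per group (small p suggests non-normality)
for name, g in (("group 1", g1), ("group 2", g2)):
    W, p = stats.shapiro(g)
    print(f"Shapiro-Wilk {name}: W = {W:.3f}, p = {p:.3f}")

# Levene's test of equal variances, less sensitive to non-normality
stat, p = stats.levene(g1, g2, center="mean")
print(f"Levene: stat = {stat:.3f}, p = {p:.3f}")
```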

Interpreting Results

The F-test statistic, denoted F, represents the ratio of two variances or mean squares, where a larger value indicates a greater discrepancy between the compared variances or a stronger difference in model fit relative to the expected variability under the null hypothesis. For instance, in contexts like ANOVA, an F value substantially exceeding 1 suggests that between-group variability dominates within-group variability. This interpretation holds provided the underlying assumptions of normality and homogeneity of variances are met, ensuring the validity of the F-distribution as the reference.

The p-value associated with the F-statistic is the probability of observing an F value at least as extreme as the calculated one, assuming the null hypothesis of equal variances (or no effect) is true. Researchers typically compare this p-value to a significance level \alpha, such as 0.05; if p < \alpha, the null hypothesis is rejected, indicating statistically significant evidence against equality of variances or presence of an effect. This decision rule quantifies the risk of Type I error but does not measure the probability that the null hypothesis is true.

Confidence intervals for the ratio of two population variances can be constructed using quantiles of the F-distribution. Specifically, for samples with variances s_1^2 and s_2^2 and degrees of freedom \nu_1 and \nu_2, a (1 - \alpha) \times 100\% interval is given by: \left( \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2, \nu_1, \nu_2}}, \quad \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2, \nu_2, \nu_1} \right) where F_{\gamma, a, b} denotes the upper-\gamma critical value of the F-distribution with a and b degrees of freedom. If the interval excludes 1, it provides evidence against the null hypothesis of equal variances at level \alpha.

Beyond significance, effect size measures quantify the magnitude of the variance ratio or effect, independent of sample size. In ANOVA applications of the F-test, eta-squared (\eta^2) serves as a generalized effect size, calculated as the proportion of total variance explained by the between-group (or model) component. Values of \eta^2 around 0.01, 0.06, and 0.14 are conventionally interpreted as small, medium, and large effects, respectively, though these benchmarks vary by field.

Common interpretive errors include equating statistical significance (a low p-value) with practical importance, overlooking that large samples can yield significant results for trivial effects. Another frequent mistake is failing to adjust for multiple F-tests, which inflates the family-wise error rate; corrections such as Bonferroni's are recommended in that setting.

Software outputs for F-tests, such as those of R's anova() function or SPSS's ANOVA tables, typically display the F-statistic, the associated degrees of freedom (numerator and denominator), and the p-value in a structured summary. For example, an R output might show "F = 4.56, df = 2, 27, p = 0.019," indicating rejection of the null at \alpha = 0.05 based on the p-value. Similarly, SPSS tables report these alongside sums of squares and mean squares, facilitating quick assessment of the test statistic's magnitude relative to error variance.
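The confidence interval above translates directly into code. A minimal Python sketch (SciPy assumed; the sample variances and sizes below are hypothetical inputs) is:

```python
from scipy import stats

def variance_ratio_ci(s1_sq, s2_sq, n1, n2, alpha=0.05):
    """(1 - alpha) confidence interval for sigma1^2 / sigma2^2,
    using the F-quantile formula given in the text."""
    nu1, nu2 = n1 - 1, n2 - 1
    ratio = s1_sq / s2_sq
    lower = ratio / stats.f.ppf(1 - alpha / 2, nu1, nu2)  # F_{alpha/2, nu1, nu2}
    upper = ratio * stats.f.ppf(1 - alpha / 2, nu2, nu1)  # F_{alpha/2, nu2, nu1}
    return lower, upper

# Hypothetical samples: s1^2 = 25 with n1 = 10, s2^2 = 9 with n2 = 12
lo, hi = variance_ratio_ci(s1_sq=25.0, s2_sq=9.0, n1=10, n2=12)
print(f"95% CI for the variance ratio: ({lo:.2f}, {hi:.2f})")
# An interval excluding 1 is evidence against equal variances at level alpha.
```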

Calculation Methods

General Test Statistic

The F-test statistic provides a general framework for testing hypotheses about variances or model parameters in settings assuming normality of errors. In its universal form, the statistic is expressed as the ratio of two mean squares (MS), which are unbiased estimates of variance components: F = \frac{\text{MS}_\text{numerator}}{\text{MS}_\text{denominator}} = \frac{\text{SS}_\text{numerator} / \nu_1}{\text{SS}_\text{denominator} / \nu_2}, where \text{SS}_\text{numerator} and \text{SS}_\text{denominator} denote the sums of squares associated with the numerator and denominator components, respectively, and \nu_1 and \nu_2 are their corresponding degrees of freedom. Alternatively, it can be viewed as the ratio of two independent variance estimates, \hat{\sigma}_1^2 / \hat{\sigma}_2^2, under the null hypothesis where both estimate the same population variance \sigma^2.

The derivation of this statistic stems from the properties of the normal distribution. Under normality assumptions, sums of squares in linear models or variance comparisons follow scaled chi-squared distributions. Specifically, if U \sim \chi^2(\nu_1) and V \sim \chi^2(\nu_2) are independent chi-squared random variables (arising from quadratic forms of normal deviates), then the ratio F = \frac{U / \nu_1}{V / \nu_2} follows an F-distribution with \nu_1 and \nu_2 degrees of freedom under the null hypothesis. This decomposition often arises from partitioning the total sum of squares into components attributable to the hypothesis of interest and residual error, each proportional to \sigma^2 times a central chi-squared variable when the null holds. Equivalently, in the context of normal linear models, the F-statistic is a monotonic transformation of the likelihood ratio statistic \Lambda for nested models, where -2 \log \Lambda = n \log\left(1 + F \frac{\nu_1}{\nu_2}\right), with n the sample size, confirming its optimality under normality.

To compute the F-statistic, follow these steps: (1) identify and calculate the relevant sums of squares based on the data and hypothesis, such as through model fitting or variance pooling; (2) determine the degrees of freedom \nu_1 for the numerator (e.g., number of parameters or groups minus 1) and \nu_2 for the denominator (e.g., total observations minus parameters); (3) divide each sum of squares by its degrees of freedom to obtain the mean squares; (4) form the ratio F = \text{MS}_\text{numerator} / \text{MS}_\text{denominator}, ensuring the numerator reflects the larger expected variance under the alternative to maintain a right-tailed test. For instance, \nu_1 might equal the number of groups minus 1, while \nu_2 equals the total sample size minus the number of groups.

Under the null hypothesis, the sampling distribution of the F-statistic is the central F-distribution with parameters \nu_1 and \nu_2, denoted F \sim F(\nu_1, \nu_2). This distribution is used to obtain critical values or p-values for hypothesis testing, with rejection of the null occurring for large values of F.
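These four steps reduce to a few lines of code. A hedged Python sketch (SciPy assumed; the sums of squares and degrees of freedom below are hypothetical inputs, not drawn from any dataset in this article):

```python
from scipy import stats

def general_f_test(ss_num, df_num, ss_den, df_den):
    """F statistic and right-tailed p-value from sums of squares and df."""
    ms_num = ss_num / df_num           # numerator mean square
    ms_den = ss_den / df_den           # denominator mean square
    F = ms_num / ms_den
    p = stats.f.sf(F, df_num, df_den)  # P(F(df_num, df_den) >= F)
    return F, p

# Hypothetical partition: hypothesis SS = 48 on 3 df, residual SS = 120 on 20 df
F, p = general_f_test(48.0, 3, 120.0, 20)
print(f"F = {F:.2f}, p = {p:.3f}")
```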

Equality of Two Variances

The F-test for the equality of two variances assesses whether two independent samples are drawn from normal populations with equal population variances. The null hypothesis states that the variances are equal, H_0: \sigma_1^2 = \sigma_2^2, while the alternative can be two-tailed, H_a: \sigma_1^2 \neq \sigma_2^2, or one-sided, such as H_a: \sigma_1^2 > \sigma_2^2. The test statistic is the ratio of the sample variances, with the larger variance in the numerator for the two-tailed case: F = \frac{s_1^2}{s_2^2}, where s_1^2 > s_2^2 and s_i^2 denotes the sample variance from group i. Under H_0, F follows an F-distribution with degrees of freedom \nu_1 = n_1 - 1 and \nu_2 = n_2 - 1.

Consider hypothetical data from two samples: one with n_1 = 10 and sample standard deviation s_1 = 5 (so s_1^2 = 25), the other with n_2 = 12 and s_2 = 3 (so s_2^2 = 9). The test statistic is F = 25 / 9 \approx 2.78, with degrees of freedom 9 and 11; the p-value is obtained by comparing this to the critical values or cumulative distribution of the F(9, 11) distribution.

This test generally exhibits relatively low power for detecting small differences in variances compared to some robust alternatives, limiting its sensitivity to subtle departures from H_0. For more than two groups, Bartlett's test is preferred as an alternative, as it generalizes the variance comparison to k samples under normality. One of the earliest uses of the variance ratio test was by Fisher in 1924, in developing methods for comparing variances in experimental data.
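The worked example above can be reproduced with a short Python sketch; SciPy does not, to my knowledge, ship a dedicated two-variance F-test, so the p-value is computed directly from the F-distribution:

```python
from scipy import stats

# Example from the text: s1 = 5 (n1 = 10), s2 = 3 (n2 = 12)
s1_sq, n1 = 25.0, 10
s2_sq, n2 = 9.0, 12

F = s1_sq / s2_sq                  # larger sample variance in the numerator
nu1, nu2 = n1 - 1, n2 - 1          # 9 and 11 degrees of freedom
p_upper = stats.f.sf(F, nu1, nu2)  # one-sided: H_a is sigma1^2 > sigma2^2
p_two_sided = 2 * min(p_upper, stats.f.cdf(F, nu1, nu2))

print(f"F = {F:.2f}, df = ({nu1}, {nu2}), "
      f"one-sided p = {p_upper:.3f}, two-sided p = {p_two_sided:.3f}")
```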

Applications in Analysis of Variance

One-way ANOVA

One-way analysis of variance (ANOVA) utilizes the F-test to assess whether the means of three or more independent groups differ significantly by comparing the ratio of between-group variance to within-group variance. Developed by Ronald A. Fisher in the early 1920s for analyzing agricultural experiments, this method partitions the total observed variability into components attributable to differences between groups and random variation within groups.

In a one-way ANOVA setup, observations are collected from k independent groups, where each group corresponds to a level of a single categorical factor. The null hypothesis (H_0) posits that all population means are equal (\mu_1 = \mu_2 = \dots = \mu_k), while the alternative hypothesis (H_a) states that at least one mean differs. The test assumes independent observations, normality within each group, and equal variances across groups.

The total sum of squares (SST) measures overall variability and decomposes as SST = SSB + SSW, where SSB is the between-group sum of squares reflecting variation due to group differences, and SSW is the within-group sum of squares capturing residual variation. SSB is computed as \sum_{i=1}^k n_i (\bar{y}_i - \bar{y})^2, with n_i as the size of group i, \bar{y}_i as its mean, and \bar{y} as the grand mean; SSW is \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2, summing squared deviations from each group mean. The mean squares are then MSB = SSB / (k - 1) and MSW = SSW / (N - k), where N is the total sample size. The test statistic is F = MSB / MSW, distributed as F(k-1, N-k) under H_0. A large F value suggests greater between-group variance, leading to rejection of H_0 if the p-value (from the F-distribution) is below the significance level.

For a worked example, consider three groups (k = 3) with five observations each (n = 5, N = 15), such as yields from different fertilizers: Group 1: 10, 12, 11, 13, 14 (\bar{y}_1 = 12); Group 2: 13, 14, 15, 16, 17 (\bar{y}_2 = 15); Group 3: 16, 17, 18, 19, 20 (\bar{y}_3 = 18). The grand mean is \bar{y} = 15, SSW = 30 (each group contributes 10 to the sum of squared within-group deviations), and SSB = 90. Thus, MSB = 45, MSW = 2.5, and F = 18 with df_1 = 2 and df_2 = 12. The p-value is approximately 0.0002 (far below \alpha = 0.05), rejecting H_0 and indicating significant mean differences. This calculation follows standard procedures for balanced designs.

A significant F-test signals overall differences but does not specify which groups differ, necessitating post-hoc analyses for pairwise comparisons. One key advantage of one-way ANOVA over multiple t-tests is its control of the family-wise error rate, making it more efficient and appropriate for comparing more than two groups without inflating Type I error.
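The fertilizer example can be checked against SciPy's built-in one-way ANOVA, scipy.stats.f_oneway, which computes the same MSB/MSW ratio from the raw observations:

```python
from scipy import stats

# Yields from the worked example (three fertilizer groups, n = 5 each)
g1 = [10, 12, 11, 13, 14]
g2 = [13, 14, 15, 16, 17]
g3 = [16, 17, 18, 19, 20]

res = stats.f_oneway(g1, g2, g3)
print(f"F = {res.statistic:.1f}, p = {res.pvalue:.4f}")
# Matches the hand calculation: F = 18 on (2, 12) df, p ≈ 0.0002
```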

Multiple Comparisons in ANOVA

In analysis of variance (ANOVA), a significant overall F-test indicates that at least one group mean differs from the others, but it does not specify which pairs differ. Performing multiple unplanned pairwise t-tests without adjustment inflates the family-wise error rate (FWER), defined as the probability of committing at least one Type I error across the family of comparisons. This inflation occurs because each t-test is conducted at the nominal significance level (e.g., \alpha = 0.05), leading to an experiment-wise error rate approaching 1 - (1 - \alpha)^m for m comparisons under the null hypothesis of no differences.

To address this, F-protected multiple comparison procedures condition pairwise tests on a significant overall ANOVA F-test, thereby controlling the FWER at the desired level while enhancing power compared to unconditional methods. These approaches use the overall F-test to gate subsequent comparisons, ensuring that Type I error protection is maintained only when evidence of overall differences exists. Common F-protected tests include Tukey's honestly significant difference (HSD) and Scheffé's method, both of which extend the ANOVA framework for post-hoc analysis.

Tukey's HSD procedure, introduced by John Tukey, controls the FWER for all pairwise comparisons among group means by using the studentized range distribution, which is closely related to the F-distribution (the square root of an F statistic with 1 and \nu degrees of freedom is distributed as the absolute value of a t statistic with \nu degrees of freedom, the two-group case). The test statistic for the range between two means is q = \frac{|\bar{Y}_i - \bar{Y}_j|}{\sqrt{\frac{\text{MSW}}{n}}}, where \bar{Y}_i and \bar{Y}_j are the sample means, MSW is the mean square within from the ANOVA, and n is the sample size per group (assuming equal sizes). This q is compared to a critical value q_{\alpha, k, N-k} from the studentized range distribution, where k is the number of groups and N - k is the error degrees of freedom; significant differences occur if q exceeds the critical value. The method is conservative for non-pairwise comparisons but well suited to all-pairs testing under balanced designs.

Scheffé's method, developed by Henry Scheffé, provides a more flexible F-based approach for testing any linear contrast among means, controlling the FWER for the entire set of possible contrasts. After a significant overall ANOVA F-test, a contrast \psi = \sum c_i \mu_i (with \sum c_i = 0 and \sum c_i^2 = 1 for normalization) is tested via the statistic F = \frac{\hat{\psi}^2}{\text{MSW} \cdot \sum (c_i^2 / n_i)}, compared to (k-1) F_{\alpha, k-1, N-k}; the contrast is significant if F > (k-1) F_{\alpha, k-1, N-k}, ensuring simultaneous coverage for all contrasts. This procedure is less powerful than Tukey's HSD for pairwise tests but superior for complex, unplanned contrasts involving more than two groups.

For illustration, consider a one-way ANOVA on yields from three fertilizer treatments (A, B, C) with n = 10 per group and a significant overall F-test (p < 0.05), means 20, 25, and 30 units, and MSW = 25. Post-hoc analysis might involve three pairwise comparisons: using Tukey's HSD, q_{0.05, 3, 27} \approx 3.51, the standard error is \sqrt{25/10} \approx 1.58, and the critical HSD is about 3.51 \times 1.58 \approx 5.55; the differences A-B = 5 and B-C = 5 fall below 5.55 (not significant), but A-C = 10 exceeds it (significant). Scheffé's method could instead test a contrast such as \psi = (\mu_A + \mu_B)/2 - \mu_C, with coefficients normalized so that \sum c_i^2 = 1, potentially indicating significance for such complex comparisons depending on the exact values and critical threshold.
Tukey's HSD is preferred for all pairwise comparisons in balanced designs where the goal is to identify differing pairs without preconceived contrasts, while Scheffé's method suits exploratory analyses with arbitrary linear combinations, such as subset means or trends, despite its conservatism. Both are applied only after a significant ANOVA F-test to maintain FWER control.
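As an illustration of the Tukey calculation in the fertilizer example, the following Python sketch computes the HSD threshold from the summary statistics given above; it assumes SciPy (scipy.stats.studentized_range, available in recent versions) and the example's hypothetical group means:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Summary statistics from the example: k = 3 groups, n = 10 per group,
# MSW = 25, error df = N - k = 27, group means 20 (A), 25 (B), 30 (C)
k, n, msw, df_err = 3, 10, 25.0, 27
means = {"A": 20.0, "B": 25.0, "C": 30.0}

q_crit = stats.studentized_range.ppf(0.95, k, df_err)  # ~3.51
hsd = q_crit * np.sqrt(msw / n)                        # ~5.55
print(f"q crit = {q_crit:.2f}, HSD = {hsd:.2f}")

for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{a} vs {b}: |diff| = {diff:.0f} -> {verdict}")
```

With raw observations rather than summary statistics, recent SciPy versions also provide scipy.stats.tukey_hsd, which performs the same comparisons directly.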

Applications in Regression

Overall Model Significance

In regression analysis, the F-test for overall model significance evaluates whether the fitted model accounts for a statistically significant portion of the variance in the response variable, beyond what would be expected under a null model containing only the intercept. The null hypothesis H_0 posits that all slope coefficients are zero (\beta_1 = \beta_2 = \dots = \beta_p = 0), implying that none of the predictor variables are useful for explaining the response, while the alternative hypothesis H_a states that at least one \beta_i \neq 0. This test is foundational in multiple linear regression, as it determines whether there is evidence of a linear relationship between the predictors and the response before exploring individual effects.

The test statistic follows an F-distribution under the null hypothesis and is computed as F = \frac{\text{SSR}/p}{\text{SSE}/(n - p - 1)}, where SSR is the regression sum of squares (measuring explained variance), SSE is the sum of squared errors (measuring unexplained variance), p is the number of predictor variables, and n is the sample size. Equivalently, it can be expressed using the coefficient of determination R^2 (the proportion of total variance explained by the model) as F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}, with degrees of freedom df_1 = p for the numerator and df_2 = n - p - 1 for the denominator. The p-value is obtained by comparing the calculated F to the critical value from the F-distribution table or via software, with rejection of H_0 at a chosen significance level (e.g., 0.05) indicating model significance.

This formulation directly tests whether R^2 exceeds what would be expected by random chance, as a high F-value reflects a large ratio of explained to unexplained variance relative to their degrees of freedom. For instance, consider a simple linear regression (p = 1) with n = 20 observations, SSR = 100, and SSE = 200; the statistic is F = (100 / 1) / (200 / 18) = 9, yielding a p-value less than 0.01 and rejecting H_0 at the 1% level, confirming the predictor explains significant variation. In practice, a significant overall F-test establishes the model's basic utility, justifying further analysis of individual coefficients, though it does not identify which specific predictors contribute.
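The R^2 form of the test is easy to compute directly. A minimal Python sketch (SciPy assumed) reproducing the simple-regression example above:

```python
from scipy import stats

def overall_f_test(r2, n, p):
    """Overall F-test for H0: all slope coefficients are zero."""
    df1, df2 = p, n - p - 1
    F = (r2 / df1) / ((1 - r2) / df2)
    return F, stats.f.sf(F, df1, df2)

# Example from the text: SSR = 100, SSE = 200, n = 20, p = 1,
# so R^2 = SSR / (SSR + SSE) = 1/3 and F = (100/1) / (200/18) = 9
F, p = overall_f_test(r2=1/3, n=20, p=1)
print(f"F = {F:.2f}, p = {p:.4f}")  # p < 0.01, so reject H0
```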

Comparing Nested Models

In linear regression analysis, the F-test for comparing nested models assesses whether a full model with additional predictors provides a significantly better fit to the data than a reduced (simpler) nested model. The reduced model contains p_1 parameters, while the full model includes p_2 > p_1 parameters, where the extra parameters correspond to the additional predictors. The null hypothesis H_0 posits that the coefficients \beta of these additional predictors are all zero, implying no improvement from including them.

The test statistic follows an F-distribution under H_0 and is given by F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}}) / (p_2 - p_1)}{SSE_{\text{full}} / (n - p_2 - 1)}, where SSE denotes the sum of squared errors (residual sum of squares), n is the sample size, the numerator degrees of freedom are \text{df}_1 = p_2 - p_1, and the denominator degrees of freedom are \text{df}_2 = n - p_2 - 1. Here, p_1 and p_2 represent the number of predictors (excluding the intercept). An equivalent formulation uses the coefficients of determination: F = \frac{(R^2_{\text{full}} - R^2_{\text{reduced}}) / (p_2 - p_1)}{(1 - R^2_{\text{full}}) / (n - p_2 - 1)}.

This test is particularly useful in hierarchical model building, such as when adding interaction terms between existing predictors or incorporating subsets of new variables (e.g., testing whether quadratic terms enhance a linear model of economic growth). If the computed F-statistic exceeds the critical value from the F-distribution at a chosen significance level (e.g., \alpha = 0.05), the null hypothesis is rejected, supporting the inclusion of the additional predictors.

For illustration, consider a reduced model with 2 predictors yielding R^2 = 0.3 and a full model with 4 predictors yielding R^2 = 0.45, based on n = 50 observations. Substituting into the R^2-based formula gives F = \frac{(0.45 - 0.3) / 2}{(1 - 0.45) / 45} = \frac{0.075}{0.0122} \approx 6.14 with \text{df} = (2, 45). Since 6.14 exceeds the critical value of approximately 3.20 for \alpha = 0.05, the result is significant, indicating the two additional predictors meaningfully improve the model fit.

The assumptions mirror those of the general F-test in regression: linearity of the relationship, independence of errors, homoscedasticity (constant error variance), and normality of the error terms, with the added requirement that the models are properly nested (the full model encompasses all parameters of the reduced model). Violations, such as non-normality, can inflate Type I error rates. This approach includes the overall model significance test as a special case, in which the reduced model is the intercept-only null.
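The nested-model comparison can be sketched in a few lines of Python (SciPy assumed), reproducing the R^2-based example above:

```python
from scipy import stats

def nested_f_test(r2_full, r2_reduced, n, p_full, p_reduced):
    """F-test comparing nested regressions via their R^2 values."""
    df1 = p_full - p_reduced   # extra predictors in the full model
    df2 = n - p_full - 1       # residual df of the full model
    F = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return F, stats.f.sf(F, df1, df2)

# Example from the text: reduced R^2 = 0.30 (2 predictors),
# full R^2 = 0.45 (4 predictors), n = 50 observations
F, p = nested_f_test(0.45, 0.30, n=50, p_full=4, p_reduced=2)
print(f"F = {F:.2f} on (2, 45) df, p = {p:.4f}")  # F ≈ 6.14, p < 0.05
```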

Limitations and Extensions

Limitations

The F-test is sensitive to violations of its normality assumption, as the test statistic deviates from the F-distribution under non-normal conditions, potentially leading to inflated or deflated Type I error rates. For instance, meta-analyses of simulation studies have indicated that skewness in the data distribution tends to have a greater impact than kurtosis. Additionally, the F-test exhibits low statistical power when sample sizes are small or when the variances being compared are nearly equal, making it difficult to detect true differences reliably.

As an omnibus test, the F-test in ANOVA only assesses whether there is any overall difference among group means but does not specify which groups differ, necessitating follow-up post-hoc analyses to identify pairwise differences. This broad nature limits its interpretative utility in isolation, particularly when multiple groups are involved.

In high-dimensional settings where the number of variables exceeds the sample size (p > n), the traditional F-test becomes inapplicable, as the degrees-of-freedom requirements cannot be satisfied and the residual variance estimate degenerates, leading to unreliable inference. Historically, Fisher's original formulation of the F-test in the 1920s assumed homogeneity of variances across groups, an idealization that overlooked common heterogeneity in real-world data; post-1950s critiques, including those emphasizing alternative models for variance instability, highlighted how this assumption often fails in practice, prompting awareness of the test's incomplete applicability to heterogeneous datasets.

Robust Alternatives

The standard F-test for equality of variances assumes normality; robust alternatives like Levene's test address this limitation by using absolute deviations from the group mean to construct an F-statistic, making it less sensitive to non-normality. Levene's test, proposed in 1960, performs an ANOVA on these absolute deviations to test the null hypothesis of equal variances. A modification, the Brown-Forsythe test, replaces the mean with the median in the deviation calculation, further enhancing robustness against outliers and skewed distributions. Bootstrap methods offer another non-parametric approach, resampling the data to estimate the distribution of a variance ratio statistic under the null hypothesis of homogeneity, which is particularly useful when sample sizes are small or distributions are unknown.

In the context of ANOVA, Welch's test extends the F-test to handle unequal variances by adjusting the degrees of freedom and weighting groups inversely by their variances, providing a more reliable assessment of mean differences without assuming homoscedasticity. For non-parametric settings, the Kruskal-Wallis test ranks the data and applies a chi-squared statistic to compare medians across groups, bypassing assumptions of normality and equal variances entirely.

For regression, likelihood ratio tests in generalized linear models (GLMs) compare nested models by evaluating the difference in deviance, offering a robust alternative to the F-test when errors are non-normal or heteroscedastic. Permutation tests randomize residuals or predictors to generate an empirical null distribution for the test statistic, suitable for small samples or complex dependencies. Additionally, robust F-tests using heteroscedasticity-consistent (sandwich) covariance estimators adjust standard errors for heteroscedasticity and clustering, preserving the F-statistic's form while correcting inference.

These alternatives are preferred in scenarios with small sample sizes, non-normal errors, or heteroscedasticity, where the F-test may inflate Type I errors; for instance, Levene's test often exhibits higher power than the F-test when group means are equal but variances differ under non-normal conditions. Emerging extensions include Bayesian F-tests for ANOVA, which compute Bayes factors to quantify evidence for equal versus unequal effects using default priors, providing probabilistic interpretations beyond p-values. In machine learning contexts, analogs like permutation-based feature importance tests mimic F-test logic for model comparison in high-dimensional settings, as seen in implementations post-2000.
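A brief Python sketch contrasts the classical variance-ratio F-test with the Levene and Brown-Forsythe variants on skewed data (the exponential samples and SciPy usage are illustrative assumptions; results vary with the random seed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.exponential(scale=1.0, size=25)  # skewed, smaller spread
g2 = rng.exponential(scale=2.0, size=25)  # skewed, larger spread

# Classical variance-ratio F-test (sensitive to the skewness here)
F = np.var(g1, ddof=1) / np.var(g2, ddof=1)
df1, df2 = len(g1) - 1, len(g2) - 1
p_f = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))

# Levene (center='mean') and Brown-Forsythe (center='median') variants
_, p_levene = stats.levene(g1, g2, center="mean")
_, p_bf = stats.levene(g1, g2, center="median")

# Welch's unequal-variance comparison of means, for contrast
_, p_welch = stats.ttest_ind(g1, g2, equal_var=False)

print(f"F-test p = {p_f:.3f}, Levene p = {p_levene:.3f}, "
      f"Brown-Forsythe p = {p_bf:.3f}, Welch t p = {p_welch:.3f}")
```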
