
G-test

The G-test, also known as the log-likelihood ratio test (often denoted as G²), is a statistical method used to evaluate whether the observed frequencies in categorical data conform to the expected frequencies under a null hypothesis of a specified distribution. It is commonly applied to test goodness-of-fit for a single nominal variable with multiple categories or independence between two nominal variables, serving as a robust alternative to Pearson's chi-squared test. The test statistic is computed as G = 2 ∑ Oᵢ ln(Oᵢ / Eᵢ), where Oᵢ represents the observed count in category i and Eᵢ the corresponding expected count, with the sum taken over all categories; under the null hypothesis and for sufficiently large samples, this statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of categories minus one (or adjusted for estimated parameters). This formulation derives from the general likelihood-ratio testing framework, originally formalized by Samuel S. Wilks in 1938 to assess composite hypotheses in large samples. In practice, the G-test is favored in fields such as biology, ecology, and genetics for analyzing contingency tables, as it provides additive results across hierarchical models and performs well with moderate sample sizes, though exact tests are recommended if any Eᵢ < 5. For instance, it can test whether observed proportions of categories (e.g., species distributions or allele frequencies) deviate significantly from theoretical expectations such as a uniform or Mendelian ratio, yielding a p-value used to reject or retain the null hypothesis. Compared to Pearson's chi-squared test (χ² = ∑ (Oᵢ - Eᵢ)² / Eᵢ), the G-test often yields similar p-values but aligns more directly with likelihood-based inference and is less sensitive to individual extreme deviations, though both statistics approximate the same chi-squared distribution asymptotically. Its widespread adoption stems from computational ease in modern software and its alignment with maximum likelihood estimation principles.

Overview

Definition

The G-test, also known as the likelihood ratio test or G² test, is a statistical method employed to assess the goodness of fit between observed counts and expected frequencies across categories of a nominal variable in categorical data analysis. It evaluates whether the observed data conform to a specified theoretical distribution, such as one derived from a biological model or population proportions. As an alternative to Pearson's chi-squared test, the G-test is grounded in maximum likelihood estimation and utilizes the logarithm of the ratio of likelihoods to measure discrepancies, offering similar results to the chi-squared test but with advantages in additivity for nested models. The null hypothesis tested by the G-test states that the observed frequencies match the expected frequencies under a multinomial distribution, implying no significant deviation from the hypothesized proportions. Under this null hypothesis, the G-test statistic asymptotically follows a chi-squared distribution.

Historical Development

The principles of the G-test originate from the foundational work on likelihood-based inference introduced by R.A. Fisher in the 1920s, particularly through his development of maximum likelihood as a method for parameter estimation in statistical models. This laid the groundwork for comparing observed data to expected distributions under multinomial assumptions, enabling tests of goodness-of-fit and independence. The formal likelihood-ratio framework upon which the G-test is based was further refined by Jerzy Neyman and Egon Pearson in the 1930s as part of their contributions to hypothesis-testing theory. The G-test itself, often denoted as the likelihood ratio statistic G², was formalized for applications in multinomial contexts during the mid-20th century, with its asymptotic chi-squared distribution established via Wilks' theorem in 1938, providing a rigorous basis for inference in contingency table analysis. It gained practical prominence in biological and ecological statistics through the 1981 edition of Biometry by Robert R. Sokal and F. James Rohlf, who recommended it as a superior alternative to traditional methods for testing nominal variables. In the 1990s, Alan Agresti played a key role in popularizing the G-test within the broader field of categorical data analysis, emphasizing its utility for log-linear models and contingency tables in his seminal textbook, where it was presented as a core tool for assessing model fit and associations. This integration helped establish the G-test as a standard in social sciences, epidemiology, and related disciplines. Post-2000, the G-test experienced wider adoption amid growing critiques of the chi-squared test's performance with small expected frequencies or sparse data, favoring likelihood-based approaches for their additivity and accuracy in complex designs. By the early 2020s, improvements in computational efficiency, such as algorithms that decompose the G statistic into reusable joint entropy terms, have extended its applicability to big-data scenarios in causal inference and feature selection.

Mathematical Formulation

General Formula

The G-test statistic, also known as the likelihood ratio test statistic for categorical data, is computed as G = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right), where O_i represents the observed frequency in category i, E_i is the expected frequency under the null hypothesis for the same category, \ln denotes the natural logarithm, and the sum is taken over all k categories. Here, the O_i values are the empirical counts obtained from the sample data, while the E_i values are derived from the hypothesized probability model, such as a uniform distribution where E_i = n / k for total sample size n, or a Poisson distribution for modeling count data under specified rates. Categories with O_i = 0 pose no difficulty: the contribution O_i \ln(O_i / E_i) is conventionally defined as 0 by taking the limit as O_i approaches 0 from above, provided E_i > 0. If any E_i = 0 while the corresponding O_i > 0, then G becomes infinite, immediately rejecting the null hypothesis. Under the null hypothesis of no difference between observed and expected frequencies, G asymptotically follows a chi-squared distribution (detailed in the Asymptotic Distribution section).
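
The formula translates directly into code. The following minimal Python sketch, using hypothetical counts and a uniform null purely for illustration, applies the 0·ln 0 = 0 convention for empty categories.
python
import numpy as np

def g_statistic(observed, expected):
    """Compute G = 2 * sum(O_i * ln(O_i / E_i)), treating 0 * ln(0) as 0."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mask = observed > 0  # categories with zero observed counts contribute 0 by convention
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

# Hypothetical counts over k = 4 categories with a uniform null (E_i = n / k)
obs = np.array([18, 25, 30, 27])
exp = np.full(4, obs.sum() / 4)
print(g_statistic(obs, exp))  # compare against a chi-squared distribution with k - 1 = 3 df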

Derivation from Likelihood Ratio

The G-test statistic arises as the likelihood-ratio test statistic for testing goodness-of-fit in a multinomial model. Consider observed frequencies O_1, \dots, O_k from a multinomial distribution with total sample size n = \sum O_i, and a null hypothesis that the category probabilities are fixed at \tilde{p}_1, \dots, \tilde{p}_k, yielding expected frequencies E_i = n \tilde{p}_i. The likelihood under the null hypothesis is L(\tilde{p}) = \frac{n!}{\prod_{i=1}^k O_i!} \prod_{i=1}^k \tilde{p}_i^{O_i}. The maximum likelihood estimates under the unrestricted alternative are \hat{p}_i = O_i / n, giving the maximized likelihood L(\hat{p}) = \frac{n!}{\prod_{i=1}^k O_i!} \prod_{i=1}^k \left( \frac{O_i}{n} \right)^{O_i}. The likelihood ratio is then \Lambda = \frac{L(\tilde{p})}{L(\hat{p})} = \prod_{i=1}^k \left( \frac{E_i}{O_i} \right)^{O_i}. Taking the natural logarithm yields \ln \Lambda = \sum_{i=1}^k O_i \ln (E_i / O_i), so the test statistic is G = -2 \ln \Lambda = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right). This form follows directly from the log-likelihood ratio, where the multinomial constant terms cancel in the ratio. The conventional factor of 2 in G standardizes the statistic for use in likelihood-ratio testing frameworks, aligning it with twice the difference in log-likelihoods between the restricted and unrestricted models. This derivation positions the G-test as a specific instance of the general likelihood-ratio principle applied to discrete categorical data.
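
The equivalence can be checked numerically. The sketch below, using hypothetical counts, evaluates the multinomial log-likelihoods under the null and unrestricted models with SciPy and compares twice their difference to the direct formula.
python
import numpy as np
from scipy.stats import multinomial

obs = np.array([18, 25, 30, 27])             # hypothetical observed counts
n = obs.sum()
p_null = np.array([0.25, 0.25, 0.25, 0.25])  # fully specified null probabilities
p_hat = obs / n                              # unrestricted maximum likelihood estimates

# G = -2 ln(Lambda) = 2 * (log L(p_hat) - log L(p_null)); the multinomial constants cancel
loglik_null = multinomial.logpmf(obs, n, p_null)
loglik_alt = multinomial.logpmf(obs, n, p_hat)
G_from_lrt = 2 * (loglik_alt - loglik_null)

G_direct = 2 * np.sum(obs * np.log(obs / (n * p_null)))
print(G_from_lrt, G_direct)  # the two values agree up to floating-point error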

Statistical Properties

Asymptotic Distribution

Under the null hypothesis, the G-test statistic G = -2 \ln \Lambda, where \Lambda is the likelihood ratio, asymptotically follows a chi-squared distribution with df equal to the number of categories minus one minus the number of parameters estimated under the null (often df = k - 1 for fully specified null probabilities), as the sample size n \to \infty. This result stems from Wilks' theorem, which establishes the large-sample distribution of the likelihood ratio statistic for testing composite hypotheses. The chi-squared approximation becomes valid when expected frequencies are sufficiently large (typically at least 5 per cell) and sample sizes are moderate to large, ensuring the approximation applies to the multinomial (or related) sampling framework underlying the test. Critical values from the chi-squared distribution with the appropriate degrees of freedom are then used to construct rejection regions; for example, at a significance level of \alpha = 0.05, the test rejects the null if G > \chi^2_{df, 1-\alpha}, providing a basis for p-value computation in large samples. Monte Carlo simulations confirm the reliability of this asymptotic approximation, particularly for sample sizes exceeding 1,000 observations, where the type I error rates of the G-test closely match nominal levels under the null hypothesis. In smaller samples (n \leq 1,000), simulations reveal inflated type I error rates for the G-test compared to Pearson's chi-squared test, highlighting limitations of the approximation when expected frequencies are low, though both tests converge for larger n. As of 2025, computational advances in exact inference, such as Monte Carlo simulation of p-values, have supplemented the asymptotic approach for small samples but have not altered the foundational chi-squared approximation for the G-test. The validity of the approximation requires standard regularity conditions, such as identifiable parameters and positive expected frequencies, as detailed in the assumptions section.
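
A small Monte Carlo sketch, with an arbitrarily chosen uniform null (k = 5, n = 200), illustrates how the chi-squared critical value yields approximately the nominal type I error rate when expected counts are large:
python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, n, alpha, reps = 5, 200, 0.05, 20_000
p0 = np.full(k, 1 / k)
expected = n * p0
crit = chi2.ppf(1 - alpha, df=k - 1)  # chi-squared critical value with k - 1 df

rejections = 0
for _ in range(reps):
    obs = rng.multinomial(n, p0)      # data generated under the null
    mask = obs > 0
    G = 2 * np.sum(obs[mask] * np.log(obs[mask] / expected[mask]))
    rejections += G > crit

print(rejections / reps)  # empirical rejection rate; should be close to 0.05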

Assumptions and Conditions for Use

The G-test requires independent observations, ensuring that the outcome of one trial does not affect others, as this underpins the validity of the multinomial model used in the test. Data must also follow multinomial sampling, where a fixed total number of trials results in counts across mutually exclusive categories, with the null hypothesis specifying the expected probabilities for each category. To ensure the chi-squared approximation to the test statistic's distribution is reliable, expected frequencies E_i should meet certain thresholds: generally, E_i \geq 5 for most cells, with no cell having E_i < 1 and no more than 20% of cells having E_i < 5, per guidelines adapted from Cochran's rules for categorized data tests. These conditions help maintain the asymptotic properties of the test, though violations may necessitate adjustments or alternative approaches. For small samples, defined as total sample size n < 1000 or sparse data with many low expected frequencies, the G-test's large-sample approximation can be inaccurate, leading to unreliable p-values. In such cases, exact tests like Fisher's exact test, or simulation-based methods, are recommended over the G-test. For moderately small samples, Williams' correction can be applied, which scales the test statistic by a factor involving the degrees of freedom and sample size to better approximate the chi-squared distribution. The G-test is preferable to the Pearson chi-squared test in likelihood-based frameworks, as it directly computes the likelihood ratio, aligning naturally with maximum likelihood estimation and providing additivity for complex designs. It also performs better than the chi-squared test when category probabilities are unequal (leading to disparate variances proportional to expected counts) or when the number of categories is large but the distribution is concentrated, such as in Poisson-like scenarios.
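
The expected-count guidelines and Williams' correction can be screened programmatically. The following sketch uses a hypothetical one-way table and the goodness-of-fit form of Williams' factor described later in this article; treat it as a rule-of-thumb check rather than a definitive procedure.
python
import numpy as np

def check_expected_counts(expected):
    """Rules of thumb for the chi-squared approximation:
    no E_i below 1, and at most 20% of cells with E_i below 5."""
    expected = np.asarray(expected, dtype=float)
    return expected.min() >= 1 and np.mean(expected < 5) <= 0.20

def williams_corrected_g(G, n, k):
    """Williams' correction for a one-way table with k categories:
    divide G by q = 1 + (k^2 - 1) / (6 * n * (k - 1))."""
    q = 1 + (k**2 - 1) / (6 * n * (k - 1))
    return G / q

expected = np.array([3.8, 6.1, 9.7, 10.4])   # hypothetical expected counts
print(check_expected_counts(expected))        # False: 25% of cells fall below 5
print(williams_corrected_g(G=7.2, n=30, k=4))  # corrected statistic for a hypothetical G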

Relations to Other Concepts

Relation to Chi-Squared Test

The Pearson chi-squared test, developed by Karl Pearson in 1900, computes the test statistic as
X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},
where O_i denotes the observed frequency in category i and E_i the expected frequency under the null hypothesis. This statistic measures deviations between observed and expected counts, scaled by the expected values.
The G-test statistic, G = 2 \sum_i O_i \ln \left( \frac{O_i}{E_i} \right), approximates X^2 when sample sizes are large and the observed counts lie close to their expected values. This equivalence arises from a second-order Taylor expansion of the natural logarithm around 1:
\ln \left( \frac{O_i}{E_i} \right) \approx \frac{O_i - E_i}{E_i} - \frac{1}{2} \left( \frac{O_i - E_i}{E_i} \right)^2.
Substituting into the G-test formula and simplifying, noting that \sum_i (O_i - E_i) = 0, yields G \approx X^2.
Despite their asymptotic equivalence and shared reference to the chi-squared distribution under the null hypothesis, the tests differ in finite-sample performance. The G-test is less sensitive to extreme deviations due to its logarithmic form. In biomedical contexts with n \leq 40 and over 20% of cells having E_i \leq 5, Pearson's test has inadequate performance with inflated type I error rates, whereas the G-test (often with Williams' correction) is more robust. These advantages hold particularly for unequal expectations where Pearson's quadratic form amplifies outliers.
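
The numerical closeness of the two statistics is easy to verify. The sketch below, with hypothetical counts, computes both via SciPy's power_divergence by switching the lambda_ family parameter.
python
import numpy as np
from scipy.stats import power_divergence

obs = np.array([18, 25, 30, 27])     # hypothetical counts
exp = np.full(4, obs.sum() / 4)      # uniform expectation

pearson = power_divergence(obs, exp, lambda_="pearson")         # X^2 statistic
g_test = power_divergence(obs, exp, lambda_="log-likelihood")   # G statistic
print(pearson.statistic, g_test.statistic)  # similar values when counts are close to expectation
print(pearson.pvalue, g_test.pvalue)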

Relation to Likelihood-Ratio Test

The likelihood-ratio test (LRT) is a general framework for hypothesis testing that compares the goodness-of-fit of two nested statistical models: a full (unrestricted) model and a reduced (restricted) model corresponding to the null hypothesis. With \Lambda = L_{\text{reduced}} / L_{\text{full}} denoting the ratio of maximized likelihoods under the reduced and full models, the test statistic is -2 \ln \Lambda = 2 \ln \left( \frac{L_{\text{full}}}{L_{\text{reduced}}} \right); under the null hypothesis and suitable regularity conditions, this statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the difference in the number of free parameters between the models. The G-test, also known as the log-likelihood ratio test, is a specific application of this LRT framework to multinomial models for categorical data, such as in goodness-of-fit or independence testing for contingency tables. In these settings, the observed counts are modeled as multinomial random variables, and the G-test statistic G^2 takes the form G^2 = 2 \sum_i o_i \ln (o_i / e_i), where o_i are the observed frequencies and e_i the expected frequencies under the null; this is exactly -2 \ln \Lambda evaluated for the multinomial likelihood, providing a measure of discrepancy between observed and expected categorical distributions. A key feature of the G-test within the LRT paradigm is the clear nesting of models: the reduced model imposes the null hypothesis restrictions (e.g., specified proportions for goodness-of-fit or independence constraints for contingency tables), while the full model is the saturated multinomial model, which estimates a separate probability for each category and thus fits the observed data perfectly, with maximized likelihood proportional to \prod_i (o_i / n)^{o_i}. This structure ensures the test's asymptotic chi-squared distribution is well-defined, with the exact nesting facilitating straightforward computation. In categorical data analysis, this application of the LRT via the G-test offers advantages over other discrepancy measures, as the precise model nesting leads to interpretable degrees of freedom equal to the number of free parameters in the saturated model minus the number of free parameters in the reduced (null) model; for instance, in a simple multinomial goodness-of-fit test with fully specified proportions, df = k - 1 for k categories, and for an r \times c independence test, df = (r-1)(c-1). This interpretability enhances the G-test's utility in assessing model adequacy for discrete data without requiring ad hoc adjustments.

Relation to Kullback-Leibler Divergence

The G-test statistic provides an information-theoretic measure of the discrepancy between observed and expected categorical data through its direct connection to the Kullback-Leibler (KL) divergence. The KL divergence between two discrete probability distributions P and Q over categories i is defined as D_{\text{KL}}(P \parallel Q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right), where the logarithm is base e, yielding a measure in units of nats. For the G-test, let O_i denote the observed counts and E_i the expected counts under the null hypothesis, with total sample size n = \sum_i O_i. The empirical probabilities are \hat{p}_i = O_i / n and the hypothesized probabilities are p_{0i} = E_i / n. The G statistic is then given by G = 2n \, D_{\text{KL}}(\hat{p} \parallel p_0) = 2 \sum_i O_i \ln \left( \frac{O_i}{E_i} \right). This formulation interprets G as twice the KL divergence between the empirical and hypothesized distributions, scaled by the sample size, highlighting how the test assesses the "information loss" incurred when approximating the observed distribution by the expected one. Under the null hypothesis, where the true distribution matches the hypothesized p_0, the statistic G follows an asymptotic chi-squared distribution with degrees of freedom equal to the number of categories minus one minus the number of estimated parameters. This convergence arises because, for large n, the scaled KL divergence 2n \, D_{\text{KL}}(\hat{p} \parallel p_0) behaves like a quadratic form in the deviations between \hat{p} and p_0, aligning with the chi-squared approximation from likelihood ratio test theory.
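
This identity can be verified directly. The sketch below, with hypothetical counts, computes the KL divergence of the empirical distribution from the hypothesized one (via scipy.stats.entropy, which uses natural logarithms by default) and rescales it by 2n.
python
import numpy as np
from scipy.stats import entropy   # entropy(pk, qk) returns the KL divergence in nats

obs = np.array([18, 25, 30, 27])  # hypothetical observed counts
exp = np.full(4, obs.sum() / 4)   # expected counts under a uniform null
n = obs.sum()

kl = entropy(obs / n, exp / n)    # D_KL(p_hat || p_0)
G_from_kl = 2 * n * kl
G_direct = 2 * np.sum(obs * np.log(obs / exp))
print(G_from_kl, G_direct)        # identical up to floating-point error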

Relation to Mutual Information

The mutual information I(X; Y) between two discrete random variables X and Y taking values in finite sets quantifies the amount of information one variable contains about the other, defined as the Kullback-Leibler divergence between the joint distribution and the product of the marginals: I(X; Y) = \sum_{i=1}^r \sum_{j=1}^c p_{ij} \ln \left( \frac{p_{ij}}{p_{i \cdot} p_{\cdot j}} \right), where p_{ij} = P(X=i, Y=j), p_{i \cdot} = \sum_j p_{ij}, p_{\cdot j} = \sum_i p_{ij}, and the table has r rows and c columns. For the G-test of independence applied to an observed r \times c contingency table with cell counts n_{ij} and total sample size n = \sum_i \sum_j n_{ij}, the test statistic equals twice the sample size times the empirical mutual information: G = 2n \hat{I}(X; Y) = 2 \sum_{i=1}^r \sum_{j=1}^c n_{ij} \ln \left( \frac{n_{ij} n}{n_{i \cdot} n_{\cdot j}} \right), where \hat{I}(X; Y) = \frac{1}{n} \sum_i \sum_j n_{ij} \ln \left( \frac{n_{ij}/n}{(n_{i \cdot}/n)(n_{\cdot j}/n)} \right) uses the empirical joint and marginal probabilities. Under the null hypothesis of independence, I(X; Y) = 0, so \hat{I}(X; Y) converges to 0 in large samples, and the statistic G asymptotically follows a chi-squared distribution with (r-1)(c-1) degrees of freedom, providing a basis for p-value computation and hypothesis testing. This connection frames the G-test within information theory, where it assesses statistical dependence by measuring the reduction in entropy of one variable given knowledge of the other, beyond the information from the marginal distributions alone.
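
The relationship can be checked numerically. The following sketch, using a hypothetical 2×2 table, computes the empirical mutual information from the joint and marginal proportions and compares 2n times that value with the G statistic returned by SciPy.
python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],   # hypothetical 2x2 contingency table
                  [20, 40]])
n = table.sum()

# Empirical mutual information (in nats) from joint and marginal proportions
p_joint = table / n
p_row = p_joint.sum(axis=1, keepdims=True)
p_col = p_joint.sum(axis=0, keepdims=True)
nonzero = p_joint > 0
mi_hat = np.sum(p_joint[nonzero] * np.log((p_joint / (p_row * p_col))[nonzero]))

G_from_mi = 2 * n * mi_hat
G_scipy, p, dof, _ = chi2_contingency(table, lambda_="log-likelihood", correction=False)
print(G_from_mi, G_scipy)     # both give the same G statistic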

Applications

Goodness-of-Fit Testing

The G-test of goodness-of-fit evaluates whether the observed frequencies of a categorical variable align with those expected under a specified theoretical distribution, such as a uniform distribution or binned probabilities from a fitted parametric model such as the Poisson. In this setup, hypothesized probabilities p_i are defined for each of the k categories, and expected frequencies are computed as E_i = n p_i, where n is the total sample size. For instance, under a uniform hypothesis, each p_i = 1/k; for a binned Poisson model, p_i derives from the probability mass function evaluated at the category values using the estimated rate parameter. The procedure involves calculating the test statistic G = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right), where O_i denotes the observed frequency in category i. This statistic measures the deviation between observed and expected counts via the log-likelihood ratio. Under the null hypothesis of a good fit, G follows an approximate chi-squared distribution with k-1 degrees of freedom for large samples (reduced by one for each parameter estimated from the data). To conduct the test, compute G and either compare it to the critical value from the \chi^2_{k-1} table at the desired significance level (e.g., 0.05) or derive the p-value; rejection occurs if G exceeds the critical value or the p-value is sufficiently low. A small-sample correction, such as Williams' adjustment G_w = G / \left(1 + \frac{k^2 - 1}{6n(k-1)}\right), may be applied when expected frequencies are low to improve accuracy. In genetics, the G-test is commonly applied to assess Hardy-Weinberg equilibrium by comparing observed genotype counts (e.g., AA, Aa, aa) against expected frequencies under random mating assumptions, aiding detection of population structure or selection pressures. For example, with allele frequency p for A and q = 1 - p for a, expected proportions are p^2, 2pq, and q^2, enabling tests on biallelic markers in large genomic datasets.
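
As an illustration of such a goodness-of-fit application, the sketch below tests hypothetical genotype counts against Hardy-Weinberg proportions, reducing the degrees of freedom by one for the estimated allele frequency via the ddof argument.
python
import numpy as np
from scipy.stats import power_divergence

# Hypothetical genotype counts for a biallelic marker: AA, Aa, aa
obs = np.array([290, 480, 230])
n = obs.sum()

p = (2 * obs[0] + obs[1]) / (2 * n)   # estimated allele frequency of A
q = 1 - p
exp = n * np.array([p**2, 2 * p * q, q**2])  # Hardy-Weinberg expected counts

# ddof=1 reduces the degrees of freedom to k - 1 - 1 = 1 for the estimated parameter
res = power_divergence(obs, exp, ddof=1, lambda_="log-likelihood")
print(res.statistic, res.pvalue)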

Test of Independence

The G-test of independence assesses whether two categorical variables exhibit an association within an r × c contingency table, where observed frequencies O_{ij} are cross-classified by row and column categories. Under the null hypothesis of independence, expected frequencies are derived from the marginal totals as E_{ij} = \frac{O_{i \cdot} O_{\cdot j}}{n}, with O_{i \cdot} denoting the i-th row total, O_{\cdot j} the j-th column total, and n the overall sample size. This setup tests the fit of the independence model against the observed data, providing a likelihood-based measure of deviation. The test statistic is computed as G = 2 \sum_{i=1}^r \sum_{j=1}^c O_{ij} \ln \left( \frac{O_{ij}}{E_{ij}} \right), where the summation occurs over all cells, and terms with O_{ij} = 0 are conventionally omitted or handled via continuity corrections for small samples. Asymptotically, G follows a \chi^2 distribution with (r-1)(c-1) degrees of freedom, enabling p-value calculation to evaluate evidence against independence; the test performs well even with moderate sample sizes, often outperforming the Pearson chi-squared test for sparse tables due to its multiplicative structure. Extensions to stratified contingency tables involve pooling G statistics across strata to test for overall or partial association, adjusting for confounding factors by fitting hierarchical log-linear models that compare nested hypotheses via likelihood ratios. For ordinal data, the G-test accommodates ordered categories through score-based parameters in generalized linear models, allowing detection of monotonic trends while maintaining the likelihood ratio framework. In epidemiology, the G-test is applied in studies of patient outcomes with missing values, where it integrates with multiple imputation to test associations in contingency tables; for example, as of 2025, it has been used to assess associations between race/ethnicity and flu vaccination.

Practical Aspects

Illustrative Examples

To illustrate the application of the G-test for goodness-of-fit, consider testing whether a six-sided die is fair based on 30 rolls. The observed frequencies for each face are as follows:
Face | Observed (O)
1 | 3
2 | 7
3 | 5
4 | 10
5 | 2
6 | 3
Under the null hypothesis of a fair die, the expected frequency E_i for each face is 30 / 6 = 5. The G-test statistic is calculated as G = 2 \sum O_i \ln(O_i / E_i), yielding contributions (O_i \ln(O_i / E_i)) of approximately -1.53 for face 1, 2.35 for face 2, 0 for face 3, 6.93 for face 4, -1.83 for face 5, and -1.53 for face 6, for a total sum of ≈4.39 and thus G \approx 8.78. With 5 degrees of freedom (6 categories minus 1), the p-value is approximately 0.118, failing to reject the null hypothesis at \alpha = 0.05. To assess the contribution of individual categories to the test statistic, standardized residuals are computed as (O_i - E_i) / \sqrt{E_i}. These are approximately -0.89 for face 1, 0.89 for face 2, 0 for face 3, 2.24 for face 4, -1.34 for face 5, and -0.89 for face 6. The largest absolute residual (2.24 for face 4) indicates this category deviates most from expectation and is the only one exceeding the threshold of about 2 in absolute value typically associated with notable contributions, though the overall test remains non-significant at this small sample size. For an example of the G-test of independence in a 2×2 contingency table, consider data on bicycle helmet use and type of cyclist injury from a study of 6,745 cyclists involved in crashes. The observed counts are:
Helmet Use | Head Injury | Other Injury | Total
Yes | 372 | 4,715 | 5,087
No | 267 | 1,391 | 1,658
Total | 639 | 6,106 | 6,745
Expected values are calculated as row total × column total / grand total, giving approximately 482 for helmeted head injuries, 4,605 for helmeted other injuries, 157 for non-helmeted head injuries, and 1,501 for non-helmeted other injuries. The G-test statistic is G = 2 \sum O_{ij} \ln(O_{ij} / E_{ij}), resulting in G \approx 101.5 with 1 degree of freedom ((2-1)×(2-1)). The p-value is approximately 7 \times 10^{-24}, rejecting the null hypothesis of independence and indicating helmet use is associated with lower head injury rates. Standardized residuals for this table are approximately -5.0 for helmeted head injuries, 1.6 for helmeted other injuries, 8.8 for non-helmeted head injuries, and -2.8 for non-helmeted other injuries. The large absolute residuals (exceeding 2) for head injuries highlight the primary source of deviation, with non-helmeted cyclists showing a substantially higher proportion of head injuries.
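
Both worked examples can be reproduced with SciPy. The sketch below uses the counts from the tables above and disables the default Yates adjustment so the raw G statistic is reported.
python
import numpy as np
from scipy.stats import power_divergence, chi2_contingency

# Die example: 30 rolls against a uniform null
obs_die = np.array([3, 7, 5, 10, 2, 3])
res_die = power_divergence(obs_die, np.full(6, 5.0), lambda_="log-likelihood")
print(res_die.statistic, res_die.pvalue)        # about 8.78 and p about 0.118

# Helmet-use example: 2x2 table of helmet use by injury type
table = np.array([[372, 4715],
                  [267, 1391]])
G, p, dof, expected = chi2_contingency(table, lambda_="log-likelihood", correction=False)
print(G, p, dof)                                 # about 101.5, p about 7e-24, 1 df
print((table - expected) / np.sqrt(expected))    # standardized residuals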

Limitations and Alternatives

The G-test encounters significant limitations when expected frequencies E_i are small, typically less than 5, as the asymptotic chi-squared approximation becomes unreliable, leading to inaccurate p-values and reduced validity of the test. In cases involving zero observed counts, the logarithmic terms in the formula, specifically O_i \log(O_i / E_i), require the mathematical convention that \lim_{x \to 0^+} x \log x = 0, allowing computation by omitting such terms or treating them as zero contributions; however, this does not fully mitigate the approximation's poor performance in sparse tables. Additionally, the G-test is sensitive to model misspecification, such as incorrect assumptions about the underlying multinomial distribution, which can inflate type I error rates or diminish power if the fitted model deviates substantially from the true data-generating process. For small samples or sparse data, alternatives like Fisher's exact test provide non-asymptotic exact p-values by enumerating all possible tables under the null, avoiding reliance on approximations. In 2×2 contingency tables, unconditional exact tests such as Barnard's test offer higher power than Fisher's exact test while maintaining control over the type I error rate, particularly when marginal totals are not fixed. Bootstrap methods serve as a robust non-parametric alternative for generating empirical distributions of the test statistic, suitable when asymptotic assumptions fail. The G-test should be avoided in highly sparse datasets, where many cells have low expected counts, or when extending to non-categorical data like continuous variables discretized ad hoc, as these violate the multinomial assumptions and exacerbate approximation errors.
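
When the asymptotic approximation is in doubt, a simulation-based p-value is a simple alternative. The sketch below, assuming a hypothetical sparse one-way table, estimates the p-value by resampling counts from the null multinomial distribution.
python
import numpy as np

def g_stat(obs, exp):
    """G statistic with the 0 * ln(0) = 0 convention."""
    mask = obs > 0
    return 2 * np.sum(obs[mask] * np.log(obs[mask] / exp[mask]))

rng = np.random.default_rng(1)
obs = np.array([1, 0, 3, 2, 9])    # hypothetical sparse observed counts
p0 = np.full(5, 0.2)               # null proportions
n = obs.sum()
exp = n * p0

g_obs = g_stat(obs, exp)
sims = np.array([g_stat(rng.multinomial(n, p0), exp) for _ in range(10_000)])
p_mc = (np.sum(sims >= g_obs) + 1) / (len(sims) + 1)   # Monte Carlo p-value
print(g_obs, p_mc)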

Software Implementations

In R and Python

The G-test is readily implemented in R using specialized packages for categorical data analysis. The vcd package provides the assocstats function, which computes the likelihood ratio chi-squared statistic (G²) alongside the Pearson chi-squared for contingency tables, offering a direct way to perform the test of independence. For goodness-of-fit tests, the goodfit function in the same package calculates G² by fitting discrete distributions to count data. Alternatively, the DescTools package offers the GTest function for both contingency tables and goodness-of-fit scenarios, returning the G statistic, degrees of freedom, and p-value. An example for a 2x2 contingency table in R using vcd::assocstats is as follows:
r
library(vcd)
cont_table <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
assoc_result <- assocstats(cont_table)
print(assoc_result$chisq_tests["Likelihood Ratio", "X^2"])       # G² statistic
print(assoc_result$chisq_tests["Likelihood Ratio", "P(> X^2)"])  # p-value
This yields the G² value and associated p-value, confirming or rejecting independence. For manual computation without packages, one can use base R functions like log and summation over observed and expected frequencies, though packages are recommended for accuracy and for handling edge cases such as zero counts. The base stats::chisq.test with correct = FALSE computes the Pearson chi-squared but not the G-test directly. In Python, the G-test for contingency tables is supported in SciPy via the chi2_contingency function with the lambda_='log-likelihood' parameter, which computes the likelihood ratio statistic and p-value. For goodness-of-fit, scipy.stats.power_divergence with the same lambda option applies the test to observed counts against expected frequencies. Manual implementation is straightforward using NumPy for the core formula, enabling customization. Here is an example for the same 2x2 contingency table in Python:
python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 20], [30, 40]])
# correction=False skips the Yates adjustment applied to 2x2 tables by default
g_stat, p_value, dof, expected = chi2_contingency(obs, lambda_='log-likelihood', correction=False)
print(f"G² statistic: {g_stat}")
print(f"p-value: {p_value}")
For a manual NumPy-based computation:
python
def g_test_manual(observed, expected):
    mask = observed > 0
    return 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

expected = np.array([[12, 18], [28, 42]])  # expected counts under independence: row total * column total / n
g_manual = g_test_manual(obs, expected)
print(f"Manual G²: {g_manual}")
When dealing with zero counts, which can lead to undefined logarithms, a standard correction adds 0.5 to all cells in observed and expected tables (Haldane-Anscombe adjustment) before computation; this is implemented in some functions or applied manually to improve stability for small samples. As of 2025, libraries like StatsModels offer optimized tools for categorical data analysis, including methods for contingency tables, which can complement SciPy's G-test implementations for larger datasets.

In SAS, Stata, and Other Tools

In SAS, the G-test is implemented through the FREQ procedure with the CHISQ option, which computes the likelihood ratio chi-square statistic as G^2 = 2 \sum O \ln(O/E), where O are observed frequencies and E are expected frequencies under the null hypothesis. This statistic is labeled "Likelihood Ratio Chi-Square" in the output and follows a chi-square distribution asymptotically, providing p-values for goodness-of-fit or independence tests. For example, the syntax PROC FREQ DATA=dataset; TABLES var1*var2 / CHISQ; RUN; generates the test alongside Pearson's chi-square for comparison. In Stata, the G-test is available via the tabulate command for two-way tables, using the lrchi2 option to output the likelihood-ratio chi-squared statistic, equivalent to G^2. This tests for independence between categorical variables, with results stored in r(chi2_lr) and r(p_lr). For instance, tabulate var1 var2, lrchi2 produces the statistic and its p-value, suitable for tables without weights. The command supports survey data adjustments via svy: tabulate for design-based corrections. In other statistical software, such as SPSS, the G-test appears as the "Likelihood Ratio" statistic in the chi-square output of the Crosstabs procedure, computed similarly as 2 \sum O \ln(O/E) for tests of independence or goodness-of-fit. Users select it via Analyze > Descriptive Statistics > Crosstabs, then Statistics > Chi-square, yielding the statistic, degrees of freedom, and p-value. In MATLAB, while no built-in function exists specifically for categorical G-tests, the general lratiotest function supports likelihood ratio comparisons for nested models, adaptable for categorical analysis via custom log-likelihood computation.
