G-test
The G-test, also known as the likelihood-ratio test or log-likelihood ratio test (often denoted as G²), is a statistical method used to evaluate whether the observed frequencies in categorical data conform to expected frequencies under a null hypothesis of a specified distribution.[1] It is commonly applied to test goodness-of-fit for a single nominal variable with multiple categories or independence between two nominal variables, serving as a robust alternative to the chi-squared test.[2] The test statistic is computed as G = 2 ∑ Oᵢ ln(Oᵢ / Eᵢ), where Oᵢ represents the observed count in category i and Eᵢ the corresponding expected count, with the sum taken over all categories; under the null hypothesis and for sufficiently large samples, this statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the number of categories minus one (or adjusted for estimated parameters).[3] This formulation derives from the general likelihood-ratio testing framework, originally formalized by Samuel S. Wilks in 1938 to assess composite hypotheses in large samples.[4] In practice, the G-test is favored in fields such as biology, genetics, and linguistics for analyzing contingency tables, as it provides additive results across hierarchical models and performs well with moderate sample sizes, though exact tests are recommended if any expected count Eᵢ falls below 5.[2] For instance, it can test whether observed proportions of categories (e.g., species distributions or allele frequencies) deviate significantly from theoretical expectations such as a uniform or Mendelian ratio, yielding a p-value to reject or retain the null.[1] Compared to Pearson's chi-squared test (χ² = ∑ (Oᵢ - Eᵢ)² / Eᵢ), the G-test often yields similar p-values but is theoretically superior for likelihood-based inference and handling outliers or skewed data, though both approximate the same distribution asymptotically.[3] Its widespread adoption stems from computational ease in modern software and its alignment with maximum likelihood estimation principles.[2]
Overview
Definition
The G-test, also known as the likelihood ratio test or G² test, is a statistical method employed to assess the goodness of fit between observed counts and expected frequencies across categories of a nominal variable in categorical data analysis.[3] It evaluates whether the observed data conform to a specified theoretical distribution, such as one derived from a biological model or population proportions.[3] As an alternative to Pearson's chi-squared test, the G-test is grounded in maximum likelihood estimation and utilizes the logarithm of the ratio of likelihoods to measure discrepancies, offering similar results to the chi-squared test but with advantages in additivity for nested models.[3][5] The null hypothesis tested by the G-test states that the observed frequencies match the expected frequencies under a multinomial distribution, implying no significant deviation from the hypothesized proportions.[3] Under this null hypothesis, the G-test statistic asymptotically follows a chi-squared distribution.[3]
Historical Development
The principles of the G-test originate from the foundational work on likelihood-ratio testing introduced by R.A. Fisher in the 1920s, particularly through his development of maximum likelihood estimation as a method for parameter estimation in statistical models, including multinomial distributions. This laid the groundwork for comparing observed data to expected distributions under multinomial assumptions, enabling tests of goodness-of-fit and independence. The formal likelihood ratio test framework, upon which the G-test is based, was further refined by Jerzy Neyman and Egon Pearson in the 1930s as part of their contributions to hypothesis testing theory. The G-test itself, often denoted as the likelihood ratio statistic G^2, was formalized for applications in multinomial contexts during the mid-20th century, with its asymptotic chi-squared distribution established via Wilks' theorem in 1938, providing a rigorous basis for inference in contingency table analysis. It gained practical prominence in biological and ecological statistics through the 1981 edition of Biometry by Robert R. Sokal and F. James Rohlf, who recommended it as a superior alternative to traditional methods for testing nominal variables. In the 1990s, Alan Agresti played a key role in popularizing the G-test within the broader field of categorical data analysis, emphasizing its utility for log-linear models and contingency tables in his seminal textbook, where it was presented as a core tool for assessing model fit and associations. This integration helped establish the G-test as a standard in social sciences, epidemiology, and related disciplines. Post-2000, the G-test experienced wider adoption amid growing critiques of the chi-squared test's performance with small expected frequencies or sparse data, favoring likelihood-based approaches for their additivity and accuracy in complex designs.[3] By the early 2020s, enhancements in computational efficiency, such as algorithms that decompose the G-statistic into reusable joint entropy terms, have extended its applicability to big data scenarios in causal inference and feature selection.[6]
Mathematical Formulation
General Formula
The G-test statistic, also known as the likelihood ratio test statistic for categorical data, is computed as G = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right), where O_i represents the observed frequency in category i, E_i is the expected frequency under the null hypothesis for the same category, \ln denotes the natural logarithm, and the sum is taken over all k categories. Here, the O_i values are the empirical counts obtained from the sample data, while the E_i values are derived from the hypothesized probability model, such as a uniform distribution where E_i = n / k for total sample size n, or a Poisson distribution for modeling count data under specified rates.[2] The logarithm is undefined when O_i = 0; in that case the contribution O_i \ln(O_i / E_i) is conventionally defined as 0, its limit as O_i approaches 0 from above, provided E_i > 0. If any E_i = 0 while the corresponding O_i > 0, then G becomes infinite, immediately rejecting the null hypothesis.[3] Under the null hypothesis of no difference between observed and expected frequencies, G asymptotically follows a chi-squared distribution (detailed in the Asymptotic Distribution section).
Derivation from Likelihood Ratio
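As a concrete illustration, the statistic can be computed in a few lines; the sketch below uses hypothetical counts with a uniform null and handles the zero-count convention explicitly (all names and numbers are illustrative, not taken from a source):

```python
import numpy as np

def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)); zero observed cells contribute 0."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mask = observed > 0  # 0 * ln(0 / E) is taken as 0 by convention
    return 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

# Hypothetical counts over k = 4 categories with a uniform null (E_i = n / k).
obs = [18, 25, 32, 25]
exp = [sum(obs) / 4] * 4
print(g_statistic(obs, exp))
```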
The G-test statistic arises as the likelihood-ratio test statistic for testing goodness-of-fit in a multinomial model.[7] Consider observed frequencies O_1, \dots, O_k from a multinomial distribution with total sample size n = \sum O_i and null hypothesis that the probabilities are fixed at \tilde{p}_1, \dots, \tilde{p}_k, yielding expected frequencies E_i = n \tilde{p}_i.[8] The likelihood under the null hypothesis is L(\tilde{p}) = \frac{n!}{\prod_{i=1}^k O_i!} \prod_{i=1}^k \tilde{p}_i^{O_i}. The maximum likelihood estimates under the unrestricted alternative are \hat{p}_i = O_i / n, giving the maximized likelihood L(\hat{p}) = \frac{n!}{\prod_{i=1}^k O_i!} \prod_{i=1}^k \left( \frac{O_i}{n} \right)^{O_i}. The likelihood ratio is then \Lambda = \frac{L(\tilde{p})}{L(\hat{p})} = \prod_{i=1}^k \left( \frac{E_i}{O_i} \right)^{O_i}. Taking the natural logarithm yields \ln \Lambda = \sum_{i=1}^k O_i \ln (E_i / O_i), so the test statistic is G = -2 \ln \Lambda = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right). This form follows directly from the log-likelihood ratio, where the multinomial constant terms cancel in the ratio.[7][8] The conventional factor of 2 in G standardizes the statistic for use in likelihood-ratio testing frameworks, aligning it with twice the difference in log-likelihoods between the restricted and unrestricted models.[8] This derivation positions the G-test as a specific instance of the general likelihood-ratio principle applied to discrete categorical data.[7]
Statistical Properties
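The identity G = -2 \ln \Lambda can also be checked numerically; the sketch below (illustrative counts, SciPy's multinomial log-pmf) evaluates both sides and confirms they agree:

```python
import numpy as np
from scipy.stats import multinomial

obs = np.array([18, 25, 32, 25])              # hypothetical observed counts
n = obs.sum()
p_null = np.array([0.25, 0.25, 0.25, 0.25])   # fully specified null probabilities
expected = n * p_null

log_lik_null = multinomial.logpmf(obs, n, p_null)   # ln L(p~) under the null
log_lik_mle = multinomial.logpmf(obs, n, obs / n)   # ln L(p^) at the unrestricted MLE
g_from_ratio = -2 * (log_lik_null - log_lik_mle)    # -2 ln Lambda
g_direct = 2 * np.sum(obs * np.log(obs / expected))
print(g_from_ratio, g_direct)                       # agree up to floating-point error
```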
Asymptotic Distribution
Under the null hypothesis, the G-test statistic G = -2 \ln \Lambda, where \Lambda is the likelihood ratio, asymptotically follows a chi-squared distribution with degrees of freedom df equal to the number of categories minus one minus the number of parameters estimated under the null hypothesis (often df = k - 1 for fully specified null probabilities), as the sample size n \to \infty. This result stems from Wilks' theorem, which establishes the large-sample distribution of the likelihood ratio test statistic for testing composite hypotheses in maximum likelihood estimation. The chi-squared approximation becomes valid when expected frequencies are sufficiently large (typically at least 5 per cell) and sample sizes are moderate to large, ensuring the central limit theorem applies to the multinomial or Poisson sampling framework underlying the G-test.[9] Critical values from the chi-squared distribution with the appropriate degrees of freedom are then used to construct rejection regions; for example, at a significance level of \alpha = 0.05, the test rejects the null if G > \chi^2_{df, 1-\alpha}, providing a basis for p-value computation in large samples.[9] Monte Carlo simulations confirm the reliability of this asymptotic approximation, particularly for sample sizes exceeding 1,000 observations, where the type I error rates of the G-test closely match nominal levels under the chi-squared distribution. In smaller samples (n \leq 1,000), simulations reveal inflated type I error rates for the G-test compared to Pearson's chi-squared test, highlighting limitations of the approximation when expected frequencies are low, though both tests converge for larger n. As of 2025, computational advances in exact inference, such as Monte Carlo simulations for p-values, have supplemented the asymptotic approach for small samples but have not altered the foundational chi-squared theory for the G-test. The validity of the asymptotic distribution requires standard regularity conditions, such as identifiable parameters and positive expected frequencies, as detailed in the assumptions section.
Assumptions and Conditions for Use
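In practice the decision rule reduces to comparing G with a chi-squared quantile or converting it to a p-value; a minimal sketch with illustrative numbers:

```python
from scipy.stats import chi2

G = 9.2           # illustrative G statistic
df = 5            # e.g., six fully specified categories: df = k - 1
p_value = chi2.sf(G, df)          # upper-tail probability under the null
critical = chi2.ppf(0.95, df)     # rejection threshold at alpha = 0.05
print(p_value, G > critical)      # reject the null only if G exceeds the critical value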
The G-test requires independent observations, ensuring that the outcome of one trial does not affect others, as this underpins the validity of the multinomial model used in the test.[10] Data must also follow multinomial sampling, where a fixed total number of trials results in counts across mutually exclusive categories, with the null hypothesis specifying the expected probabilities for each category.[2] To ensure the chi-squared approximation to the test statistic's distribution is reliable, expected frequencies E_i should meet certain thresholds: generally, E_i \geq 5 for most cells, with no cell having E_i < 1 and no more than 20% of cells having E_i < 5, per guidelines adapted from Cochran's rules for categorized data tests.[11] These conditions help maintain the asymptotic properties of the test, though violations may necessitate adjustments or alternative approaches. For small samples, defined as total sample size n < 1000 or sparse data with many low expected frequencies, the G-test's large-sample approximation can be inaccurate, leading to unreliable p-values.[10] In such cases, exact tests like Fisher's exact test or simulation-based methods are recommended over the G-test. For moderately small samples, Williams' correction can be applied, which scales the test statistic by a factor involving the degrees of freedom and sample size to better approximate the chi-squared distribution.[13] The G-test is preferable to the Pearson chi-squared test in likelihood-based frameworks, as it directly computes the likelihood ratio, aligning naturally with maximum likelihood estimation and providing additivity for complex designs.[14] It also performs better than the chi-squared test when category probabilities are unequal (leading to disparate variances proportional to expected counts) or when the number of categories is large but the distribution is concentrated, such as in Poisson-like scenarios.[15]
Relations to Other Concepts
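A sketch of Williams' correction applied by hand follows; it assumes the common one-way goodness-of-fit form of the correction factor, q = 1 + (k² − 1)/(6n(k − 1)), which reduces to 1 + (k + 1)/(6n) — the exact form varies slightly across sources, so treat this as an assumption of the sketch, with illustrative counts:

```python
import numpy as np

def williams_corrected_g(observed, expected):
    """G statistic divided by Williams' correction factor (one-way goodness-of-fit form)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    n, k = observed.sum(), observed.size
    mask = observed > 0                         # zero cells contribute nothing
    g = 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))
    q = 1 + (k**2 - 1) / (6 * n * (k - 1))      # Williams' correction factor
    return g / q

obs = [6, 9, 14, 11]                            # hypothetical small-sample counts
exp = [10, 10, 10, 10]
print(williams_corrected_g(obs, exp))
```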
Relation to Chi-Squared Test
The Pearson chi-squared test, developed by Karl Pearson in 1900, computes the test statistic as X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},
where O_i denotes the observed frequency in category i and E_i the expected frequency under the null hypothesis. This statistic measures deviations between observed and expected counts, scaled by the expected values. The G-test statistic, G = 2 \sum_i O_i \ln \left( \frac{O_i}{E_i} \right), approximates X^2 under conditions of large sample sizes and roughly equal expected frequencies. This equivalence arises from a second-order Taylor expansion of the natural logarithm around 1:
\ln \left( \frac{O_i}{E_i} \right) \approx \frac{O_i - E_i}{E_i} - \frac{1}{2} \left( \frac{O_i - E_i}{E_i} \right)^2.
Substituting into the G-test formula and simplifying, noting that \sum_i (O_i - E_i) = 0, yields G \approx X^2. Despite their asymptotic equivalence and shared reference to the chi-squared distribution under the null hypothesis, the tests differ in finite-sample performance. The G-test is less sensitive to extreme deviations due to its logarithmic form. In biomedical contexts with n \leq 40 and over 20% of cells having E_i \leq 5, Pearson's test has inadequate performance with inflated type I error rates, whereas the G-test (often with Williams' correction) is more robust.[16] These advantages hold particularly for unequal expectations where Pearson's quadratic form amplifies outliers.
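The near-equivalence for well-behaved data is easy to see numerically; the sketch below computes both statistics on the same hypothetical counts using SciPy (chisquare for Pearson's X², power_divergence with the log-likelihood option for G):

```python
import numpy as np
from scipy.stats import chisquare, power_divergence

obs = np.array([18, 25, 32, 25])        # hypothetical counts
exp = np.full(4, obs.sum() / 4)         # equal expected frequencies

x2_stat, x2_p = chisquare(obs, f_exp=exp)                                 # Pearson X^2
g_stat, g_p = power_divergence(obs, f_exp=exp, lambda_="log-likelihood")  # G statistic
print(x2_stat, g_stat)                  # close, but not identical, values
```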
Relation to Likelihood-Ratio Test
The likelihood-ratio test (LRT) is a general framework for hypothesis testing that compares the goodness-of-fit of two nested statistical models: a full (unrestricted) model and a reduced (restricted) model corresponding to the null hypothesis. The likelihood ratio is \Lambda = L_{\text{reduced}} / L_{\text{full}}, where L_{\text{full}} and L_{\text{reduced}} are the maximized likelihoods under the full and reduced models, respectively; under the null hypothesis and suitable regularity conditions, the test statistic -2 \ln \Lambda = 2 \ln \left( \frac{L_{\text{full}}}{L_{\text{reduced}}} \right) asymptotically follows a chi-squared distribution with degrees of freedom equal to the difference in the number of free parameters between the models. The G-test, also known as the log-likelihood ratio test, is a specific application of this LRT framework to multinomial models for categorical data, such as in goodness-of-fit or independence testing for contingency tables. In these settings, the observed counts are modeled as multinomial random variables, and the G-test statistic G^2 takes the form G^2 = 2 \sum_i o_i \ln (o_i / e_i), where o_i are the observed frequencies and e_i the expected frequencies under the null; this is exactly -2 \ln \Lambda evaluated for the multinomial likelihood, providing a measure of discrepancy between observed and expected categorical distributions. A key feature of the G-test within the LRT paradigm is the clear nesting of models: the reduced model imposes the null hypothesis restrictions (e.g., specified proportions for goodness-of-fit or independence constraints for contingency tables), while the full model is the saturated multinomial model, which estimates a separate parameter for each category and thus fits the observed data perfectly, yielding L_{\text{full}} = \prod_i (o_i^{o_i} / o_i!) up to constants. This structure ensures the test's asymptotic chi-squared distribution is well-defined, with the exact nesting facilitating straightforward computation. In categorical data analysis, this application of the LRT via the G-test offers advantages over other discrepancy measures, as the precise model nesting leads to interpretable degrees of freedom given by df = (k - 1) - p, where k is the number of categories and p is the number of free parameters estimated in the reduced (null) model; for instance, in a simple multinomial goodness-of-fit test with fixed proportions, df = k - 1, and for an r \times c independence test, df = (r-1)(c-1). This interpretability enhances the G-test's utility in assessing model adequacy for discrete data without requiring ad hoc adjustments.
Relation to Kullback-Leibler Divergence
The G-test statistic provides an information-theoretic measure of the discrepancy between observed and expected categorical data through its direct connection to the Kullback-Leibler (KL) divergence. The KL divergence between two discrete probability distributions P and Q over categories i is defined as D_{\text{KL}}(P \parallel Q) = \sum_i p_i \ln \left( \frac{p_i}{q_i} \right), where the logarithm is base e, yielding a measure in units of nats.[17] For the G-test, let O_i denote the observed counts and E_i the expected counts under the null hypothesis, with total sample size n = \sum_i O_i. The empirical probabilities are \hat{p}_i = O_i / n and the hypothesized probabilities are p_{0i} = E_i / n. The G statistic is then given by G = 2n \, D_{\text{KL}}(\hat{p} \parallel p_0) = 2 \sum_i O_i \ln \left( \frac{O_i}{E_i} \right). This formulation interprets G as twice the total KL divergence between the empirical and hypothesized distributions, scaled by the sample size, highlighting how the test assesses the "information loss" when approximating the observed distribution by the expected one.[18] Under the null hypothesis, where the true distribution matches the hypothesized p_0, the statistic G follows an asymptotic chi-squared distribution with degrees of freedom equal to the number of categories minus one, reduced further by the number of parameters estimated under the null. This convergence arises because, for large n, the scaled KL divergence 2n \, D_{\text{KL}}(\hat{p} \parallel p_0) behaves like a quadratic form in the deviations between \hat{p} and p_0, aligning with the chi-squared approximation from likelihood ratio test theory.[18]
Relation to Mutual Information
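This identity can be verified directly; the sketch below (hypothetical counts, natural-log KL divergence via scipy.stats.entropy) compares 2n·D_KL with the G statistic:

```python
import numpy as np
from scipy.stats import entropy

obs = np.array([18, 25, 32, 25])          # hypothetical observed counts
n = obs.sum()
p0 = np.array([0.25, 0.25, 0.25, 0.25])   # hypothesized probabilities

p_hat = obs / n
kl_nats = entropy(p_hat, p0)              # D_KL(p_hat || p0) with natural logarithm
g_from_kl = 2 * n * kl_nats
g_direct = 2 * np.sum(obs * np.log(obs / (n * p0)))
print(g_from_kl, g_direct)                # identical up to floating-point error
```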
The mutual information I(X; Y) between two discrete random variables X and Y taking values in finite sets quantifies the amount of information one variable contains about the other, defined as the Kullback-Leibler divergence between the joint distribution and the product of the marginals: I(X; Y) = \sum_{i=1}^r \sum_{j=1}^c p_{ij} \ln \left( \frac{p_{ij}}{p_{i \cdot} p_{\cdot j}} \right), where p_{ij} = P(X=i, Y=j), p_{i \cdot} = \sum_j p_{ij}, p_{\cdot j} = \sum_i p_{ij}, and the table has r rows and c columns. For the G-test of independence applied to an observed r \times c contingency table with cell counts n_{ij} and total sample size n = \sum_i \sum_j n_{ij}, the test statistic equals twice the sample size times the empirical mutual information: G = 2n \hat{I}(X; Y) = 2 \sum_{i=1}^r \sum_{j=1}^c n_{ij} \ln \left( \frac{n_{ij} n}{n_{i \cdot} n_{\cdot j}} \right), where \hat{I}(X; Y) = \frac{1}{n} \sum_i \sum_j n_{ij} \ln \left( \frac{n_{ij}/n}{(n_{i \cdot}/n)(n_{\cdot j}/n)} \right) uses the empirical joint and marginal probabilities.[19] Under the null hypothesis of independence, I(X; Y) = 0, so \hat{I}(X; Y) converges to 0 in large samples, and the G-statistic asymptotically follows a chi-squared distribution with (r-1)(c-1) degrees of freedom, providing a basis for p-value computation and hypothesis testing. This connection frames the G-test within information theory, where it assesses statistical dependence by measuring the reduction in entropy of one variable given knowledge of the other, beyond the information from the marginal distributions alone.
Applications
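The same identity holds for the independence statistic; the sketch below (hypothetical 2 × 2 counts with no empty cells) compares G with 2n times the empirical mutual information:

```python
import numpy as np

table = np.array([[10, 20],
                  [30, 40]])                      # hypothetical contingency table
n = table.sum()
row = table.sum(axis=1, keepdims=True)            # n_i.
col = table.sum(axis=0, keepdims=True)            # n_.j
expected = row @ col / n                          # E_ij = n_i. * n_.j / n

G = 2 * np.sum(table * np.log(table / expected))
p_joint = table / n
mi_hat = np.sum(p_joint * np.log(p_joint / ((row / n) @ (col / n))))  # empirical MI in nats
print(G, 2 * n * mi_hat)                          # the two quantities coincide
```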
Goodness-of-Fit Testing
The G-test of goodness-of-fit evaluates whether the observed frequencies of a categorical variable align with those expected under a specified theoretical distribution, such as a uniform distribution or binned probabilities from a Poisson process. In this setup, hypothesized probabilities p_i are defined for each of the k categories, and expected frequencies are computed as E_i = n p_i, where n is the total sample size. For instance, under a uniform hypothesis, each p_i = 1/k; for binned Poisson data, p_i derives from the probability mass function evaluated at category midpoints using the estimated rate parameter.[1][2] The procedure involves calculating the test statistic G = 2 \sum_{i=1}^k O_i \ln \left( \frac{O_i}{E_i} \right), where O_i denotes the observed frequency in category i. This statistic measures the deviation between observed and expected counts via the log-likelihood ratio. Under the null hypothesis of a good fit, G follows an approximate chi-squared distribution with k-1 degrees of freedom for large samples. To conduct the test, compute G and either compare it to the critical value from the \chi^2_{k-1} table at the desired significance level (e.g., 0.05) or derive the p-value; rejection occurs if G exceeds the critical value or the p-value is sufficiently low. A small-sample correction, such as Williams' adjustment G_w = G / q with q = 1 + \frac{k^2 - 1}{6n(k-1)}, may be applied when expected frequencies are small to improve accuracy.[1][3][2] In genetics, the G-test is commonly applied to assess Hardy-Weinberg equilibrium by comparing observed genotype counts (e.g., AA, Aa, aa) against expected frequencies under random mating assumptions, aiding detection of population structure or selection pressures. For example, with allele frequency p for A and q = 1 - p for a, expected proportions are p^2, 2pq, and q^2, enabling tests on biallelic markers in large genomic datasets.[20][21]
Test of Independence
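A sketch of the Hardy-Weinberg application follows, using hypothetical genotype counts; because the allele frequency is estimated from the data, one additional degree of freedom is subtracted:

```python
import numpy as np
from scipy.stats import chi2

obs = np.array([30, 50, 20])                  # hypothetical counts of AA, Aa, aa
n = obs.sum()
p = (2 * obs[0] + obs[1]) / (2 * n)           # estimated frequency of allele A
q = 1 - p
expected = n * np.array([p**2, 2 * p * q, q**2])

G = 2 * np.sum(obs * np.log(obs / expected))
df = 3 - 1 - 1                                # categories - 1 - estimated parameters
print(G, chi2.sf(G, df))
```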
The G-test of independence assesses whether two categorical variables exhibit an association within an r × c contingency table, where observed frequencies O_{ij} are cross-classified by row and column categories. Under the null hypothesis of independence, expected frequencies are derived from the marginal totals as E_{ij} = \frac{O_{i \cdot} O_{\cdot j}}{n}, with O_{i \cdot} denoting the i-th row total, O_{\cdot j} the j-th column total, and n the overall sample size. This setup tests the fit of the independence model against the observed data, providing a likelihood-based measure of deviation. The test statistic is computed as G = 2 \sum_{i=1}^r \sum_{j=1}^c O_{ij} \ln \left( \frac{O_{ij}}{E_{ij}} \right), where the summation occurs over all cells, and terms with O_{ij} = 0 are conventionally omitted or handled via continuity corrections for small samples. Asymptotically, G follows a \chi^2 distribution with (r-1)(c-1) degrees of freedom, enabling p-value calculation to evaluate evidence against independence; the test performs well with moderate sample sizes and is sometimes preferred over the Pearson chi-squared test for sparse tables because of its likelihood-based form. Extensions to stratified contingency tables involve pooling G statistics across strata to test for overall or partial association, adjusting for confounding factors by fitting hierarchical log-linear models that compare nested hypotheses via likelihood ratios. For ordinal data, the G-test accommodates ordered categories through score-based parameters in generalized linear models, allowing detection of monotonic trends while maintaining the likelihood ratio framework. In epidemiology, the G-test is applied in studies of patient outcomes with missing values, where it integrates with multiple imputation to test associations in contingency tables; for example, as of 2025, it has been used to assess associations between race/ethnicity and flu vaccination.[22]
Practical Aspects
Illustrative Examples
To illustrate the application of the G-test for goodness-of-fit, consider testing whether a six-sided die is fair based on 30 rolls. The observed frequencies for each face are as follows (a worked computation follows the table):
| Face | Observed (O) |
|---|---|
| 1 | 3 |
| 2 | 7 |
| 3 | 5 |
| 4 | 10 |
| 5 | 2 |
| 6 | 3 |
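Under the fair-die null hypothesis each expected count is 30/6 = 5 and df = 6 − 1 = 5; a minimal sketch of the computation:

```python
import numpy as np
from scipy.stats import chi2

obs = np.array([3, 7, 5, 10, 2, 3])       # observed counts from the table above
expected = np.full(6, obs.sum() / 6)      # fair die: 30 / 6 = 5 per face

G = 2 * np.sum(obs * np.log(obs / expected))
p_value = chi2.sf(G, df=5)
print(round(G, 2), round(p_value, 3))     # G stays below the 0.05 critical value of 11.07
```

Because G falls short of the 5% critical value, the null hypothesis of a fair die is retained. A second example applies the G-test of independence to a 2 × 2 contingency table cross-classifying helmet use with injury type: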
| Helmet Use | Head Injury | Other Injury | Total |
|---|---|---|---|
| Yes | 372 | 4,715 | 5,087 |
| No | 267 | 1,391 | 1,658 |
| Total | 639 | 6,106 | 6,745 |
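A sketch of the corresponding independence test, with expected counts formed from the row and column totals (df = (2 − 1)(2 − 1) = 1):

```python
import numpy as np
from scipy.stats import chi2

obs = np.array([[372, 4715],
                [267, 1391]])                                # counts from the table above
n = obs.sum()
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n    # E_ij = row total * column total / n

G = 2 * np.sum(obs * np.log(obs / expected))
df = 1
print(round(G, 1), chi2.sf(G, df))                           # large G on 1 df: p far below 0.001
```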
Limitations and Alternatives
The G-test encounters significant limitations when expected frequencies E_i are small, typically less than 5, as the asymptotic chi-squared approximation becomes unreliable, leading to inaccurate p-values and reduced validity of the test.[3] In cases involving zero observed or expected counts, the logarithmic terms in the formula, specifically O_i \log(O_i / E_i), require the mathematical convention that \lim_{x \to 0^+} x \log x = 0, allowing computation by omitting such terms or treating them as zero contributions; however, this does not fully mitigate the approximation's poor performance in sparse tables.[24] Additionally, the G-test is sensitive to model misspecification, such as incorrect assumptions about the underlying multinomial distribution, which can inflate type I error rates or diminish power if the fitted model deviates substantially from the true data-generating process.[25] For small samples or sparse data, alternatives like Fisher's exact test provide non-asymptotic exact p-values by enumerating all possible tables under the null, avoiding reliance on approximations.[26] In 2×2 contingency tables, Barnard's test offers higher power than Fisher's exact test while maintaining control over the type I error rate, particularly when marginal totals are not fixed. Bootstrap methods serve as a robust non-parametric alternative for generating empirical distributions of the test statistic, suitable when asymptotic assumptions fail. The G-test should be avoided in highly sparse datasets, where many cells have low expected counts, or when extending to non-categorical data like continuous variables discretized ad hoc, as these violate the multinomial assumptions and exacerbate approximation errors.
Software Implementations
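When the asymptotic approximation is doubtful, a simulation-based p-value is straightforward to obtain; the sketch below resamples multinomial counts under the null (function names and counts are illustrative):

```python
import numpy as np

def g_stat(observed, expected):
    mask = observed > 0                        # zero cells contribute nothing to the sum
    return 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

def monte_carlo_g_pvalue(observed, p_null, n_sim=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed)
    n = observed.sum()
    expected = n * np.asarray(p_null)
    g_obs = g_stat(observed, expected)
    sims = rng.multinomial(n, p_null, size=n_sim)            # resample tables under the null
    g_sim = np.array([g_stat(s, expected) for s in sims])
    return (np.sum(g_sim >= g_obs) + 1) / (n_sim + 1)        # add-one p-value estimate

# Sparse hypothetical counts where the chi-squared approximation may be unreliable.
print(monte_carlo_g_pvalue([1, 0, 2, 9], [0.25, 0.25, 0.25, 0.25]))
```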
In R and Python
The G-test is readily implemented in R using specialized packages for categorical data analysis. The vcd package provides the assocstats function, which computes the likelihood ratio chi-squared statistic (G²) alongside the Pearson chi-squared for contingency tables, offering a direct way to perform the test of independence. For goodness-of-fit tests, the goodfit function in the same package calculates G² by fitting discrete distributions to count data. Alternatively, the DescTools package offers the GTest function for both contingency tables and goodness-of-fit scenarios, returning the G statistic, degrees of freedom, and p-value.[27]
An example for a 2x2 contingency table in R using vcd::assocstats is as follows:
```r
library(vcd)
cont_table <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
assoc_result <- assocstats(cont_table)
print(assoc_result$chisq_tests)  # the "Likelihood Ratio" row gives the G² statistic and its p-value
```
This yields the G² value and associated p-value, confirming or rejecting independence. For manual computation without packages, one can use base R functions like log and summation over observed and expected frequencies, though packages are recommended for accuracy and handling. The base stats::chisq.test with correct = FALSE computes the Pearson chi-squared but not the G-test directly.
In Python, the G-test for contingency tables is supported in SciPy via the chi2_contingency function with the lambda_='log-likelihood' parameter, which computes the likelihood ratio statistic and p-value. For goodness-of-fit, scipy.stats.power_divergence with the same lambda option applies the test to observed counts against expected frequencies. Manual implementation is straightforward using NumPy for the core formula, enabling customization.[28]
Here is an example for the same 2x2 contingency table in Python:
```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 20], [30, 40]])
g_stat, p_value, dof, expected = chi2_contingency(obs, lambda_='log-likelihood')
print(f"G² statistic: {g_stat}")
print(f"p-value: {p_value}")
```
For a manual NumPy-based computation:
```python
def g_test_manual(observed, expected):
    mask = observed > 0  # cells with zero observed counts contribute 0
    return 2 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

expected = np.array([[12, 18], [28, 42]])  # expected counts under independence (row total * column total / n)
g_manual = g_test_manual(obs, expected)
print(f"Manual G²: {g_manual}")
```
When dealing with zero counts, which can lead to undefined logarithms, a standard correction adds 0.5 to all cells in observed and expected tables (Haldane-Anscombe adjustment) before computation; this is implemented in some functions or applied manually to improve stability for small samples. As of 2025, libraries like StatsModels offer optimized tools for contingency table analysis, including tests for independence, which can complement SciPy's G-test implementations for larger datasets.[13]
In SAS, Stata, and Other Tools
In SAS, the G-test is implemented through the FREQ procedure with the CHISQ option, which computes the likelihood ratio chi-square statistic as G^2 = 2 \sum O \ln(O/E), where O are observed frequencies and E are expected frequencies under the null hypothesis.[29] This statistic is labeled "Likelihood Ratio Chi-Square" in the output and follows a chi-square distribution asymptotically, providing p-values for goodness-of-fit or independence tests.[30] For example, the syntax PROC FREQ DATA=dataset; TABLES var1*var2 / CHISQ; RUN; generates the test alongside Pearson's chi-square for comparison.[31]
In Stata, the G-test is available via the tabulate command for two-way tables, using the lrchi2 option to output the likelihood-ratio chi-square statistic, equivalent to G^2.[32] This tests for independence between categorical variables, with results stored in r(chi2_lr) and r(p_lr). For instance, tabulate var1 var2, lrchi2 produces the statistic and its p-value, suitable for contingency tables without weights.[32] The command supports survey data adjustments via svy: tabulate for design-based corrections.[33]
In other statistical software, such as SPSS, the G-test appears as the "Likelihood Ratio" under the Chi-square statistics in the Crosstabs procedure, computed similarly as 2 \sum O \ln(O/E) for tests of independence or goodness-of-fit. Users select it via Analyze > Descriptive Statistics > Crosstabs, then Statistics > Chi-square, yielding the statistic, degrees of freedom, and p-value. In MATLAB, while no built-in function exists specifically for categorical G-tests, the general lratiotest function supports likelihood ratio comparisons for nested models, adaptable for contingency table analysis via custom log-likelihood computation.[34]