Binomial test
The binomial test, also known as the exact binomial test, is a nonparametric statistical hypothesis test used to determine whether the observed proportion of successes in a fixed number of independent binary trials significantly differs from a specified hypothesized probability of success.[1][2] It serves as an exact method for testing the null hypothesis that the true population proportion equals a predefined value, particularly when sample sizes are small and normal approximations may not apply.[3][1] The test relies on the binomial probability distribution, which models the number of successes in n independent trials, each with two possible outcomes (success or failure) and a constant probability p of success under the null hypothesis.[2][3] Key assumptions include binary outcomes for each observation, independence between trials, a fixed sample size, and identical success probability across all trials.[2] It is especially suitable for scenarios involving dichotomous data, such as evaluating fairness in coin tosses or assessing response rates in binary survey questions, where the expected proportion is theoretically known (e.g., 0.5 for a fair coin).[1][3] To conduct the test, one calculates the p-value as the probability of observing the sample number of successes (or a more extreme value) under the null hypothesis, using the binomial probability mass function: P(X = k) = \binom{n}{k} p^k (1-p)^{n-k},
where n is the number of trials, k is the observed successes, and \binom{n}{k} is the binomial coefficient.[2][3] The test can be one-sided (testing if the proportion is greater or less than the hypothesized value) or two-sided, with the latter summing probabilities from both tails of the distribution.[1] If the p-value falls below a chosen significance level (e.g., 0.05), the null hypothesis is rejected in favor of the alternative.[3] Notable applications include quality control in manufacturing (e.g., defect rates), clinical trials for binary endpoints like treatment success, and nonparametric alternatives such as the sign test for paired data, where it evaluates if the direction of differences deviates from random (null probability of 0.5).[3] The test's exact nature avoids reliance on large-sample approximations like the normal or chi-squared tests, making it robust for small n (e.g., n < 30), though computational intensity increases with larger samples.[1][2]
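As an illustrative sketch (not drawn from any particular library), the p-value calculation described above can be carried out directly from the probability mass function; the fair-coin setting (14 heads in 20 flips, hypothesized p = 0.5) is assumed here for concreteness:

```python
from math import comb

# Fair-coin example: n = 20 trials, k = 14 observed successes, p0 = 0.5.
n, k, p0 = 20, 14, 0.5

def pmf(i, n, p):
    """Binomial probability mass function P(X = i)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

# One-sided (upper-tail) p-value: P(X >= k).
upper = sum(pmf(i, n, p0) for i in range(k, n + 1))

# Two-sided p-value: sum the probabilities of all outcomes no more
# likely under the null than the observed one (both tails here).
obs = pmf(k, n, p0)
two_sided = sum(pmf(i, n, p0) for i in range(n + 1) if pmf(i, n, p0) <= obs)

print(round(upper, 4))      # 0.0577
print(round(two_sided, 4))  # 0.1153
```

With the 0.05 significance level mentioned above, the two-sided p-value of about 0.115 would not lead to rejection of the null hypothesis.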
Fundamentals
Definition
The binomial test is an exact statistical procedure used to evaluate hypotheses concerning the success probability p in a sequence of n independent Bernoulli trials, where each trial results in either a success or failure.[2] This test assesses whether the observed number of successes deviates significantly from what would be expected under a specified probability p_0, making it particularly suitable for binary outcome data with a fixed number of trials.[4] Under the null hypothesis H_0: p = p_0, the number of successes X follows a binomial distribution, denoted as X \sim \text{Binomial}(n, p_0).[5] The foundations of the binomial test trace back to early developments in probability theory during the 18th century, where the binomial distribution emerged as a model for repeated independent events, notably through the work of Abraham de Moivre on approximating sums of Bernoulli trials.[6] The exact binomial test, as a method of inference, builds on these probabilistic foundations and is commonly classified as a nonparametric test, although it specifies the binomial distribution family and tests a specific parameter p.[7] This distinction highlights its role in exact testing for discrete data, contrasting with approximation-based methods like the normal test for proportions.[4]
Assumptions and Prerequisites
The binomial test is valid under specific assumptions that ensure the underlying sampling model aligns with the binomial distribution. The trials must be independent, such that the outcome of any one trial does not affect the others, preventing issues like correlation or contagion that could bias results. Each trial constitutes a Bernoulli trial, featuring exactly two mutually exclusive outcomes—typically labeled as success or failure—with no intermediate possibilities. The number of trials, denoted as n, must be fixed and predetermined before data collection begins. Additionally, the probability of success, p, remains constant across all trials, avoiding scenarios where this probability varies due to external factors or learning effects.[8][9] Prerequisites for applying the binomial test include data in the form of a simple count of successes k out of the fixed n trials, making it suitable for discrete, categorical outcomes without requiring continuous measurements or large sample sizes for initial applicability. Unlike tests such as the t-test, no assumption of normality in the data distribution is needed, as the test leverages the exact discrete nature of the binomial probabilities. This focus on count data distinguishes it from parametric methods that demand more stringent distributional forms.[9][10] The test assumes absence of overdispersion, where the observed variability in successes exceeds the expected binomial variance of np(1-p), or clustering that undermines trial independence through grouped dependencies. Violations of these conditions, such as in clustered sampling designs or heterogeneous populations, can lead to underestimated standard errors and inflated Type I error rates; in such cases, alternatives like the chi-squared goodness-of-fit test for proportions or overdispersion-adjusted models (e.g., beta-binomial) may be more appropriate. 
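The overdispersion concern can be illustrated with a small sketch: given success counts from several equally sized groups, the between-group variance is compared with the variance implied by a common binomial model. All data here are invented for illustration:

```python
import statistics

# Hypothetical data: success counts from six groups of m = 50 trials each.
m = 50
counts = [21, 30, 14, 35, 9, 27]

p_hat = sum(counts) / (m * len(counts))   # pooled success proportion
binom_var = m * p_hat * (1 - p_hat)       # variance expected under a common binomial model
sample_var = statistics.variance(counts)  # observed between-group variance

# A sample variance far above the binomial variance suggests overdispersion,
# in which case the plain binomial test may understate uncertainty.
print(sample_var > 2 * binom_var)  # True for these illustrative counts
```

The factor of 2 used in the comparison is an arbitrary illustrative threshold; formal overdispersion tests or a beta-binomial model would be used in practice.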
The exact binomial test remains feasible for any sample size n, providing precise inference without approximation, though computational demands escalate with large n due to the intensive summation of binomial coefficients and probabilities required for exact p-value computation.[11][12][13]
Hypothesis Testing Framework
Null and Alternative Hypotheses
The binomial test evaluates hypotheses concerning the success probability p in a sequence of independent Bernoulli trials, where each trial results in either success or failure. The null hypothesis, denoted H_0, posits that this probability equals a predetermined value p_0, formally stated as H_0: p = p_0. This specification assumes that under the null, the expected proportion of successes aligns precisely with p_0, serving as the baseline for assessing deviations in observed data.[14][15] The alternative hypothesis H_a contrasts with the null and can take one of three forms depending on the research question. In one-sided tests, H_a: p > p_0 investigates whether the success rate exceeds the hypothesized value, while H_a: p < p_0 examines if it falls below. For scenarios where deviation in either direction is of interest, a two-sided alternative H_a: p \neq p_0 is employed. These formulations allow the test to detect directional or nondirectional differences in the underlying probability.[16][17][15] The choice of p_0 is guided by theoretical or contextual expectations, such as p_0 = 0.5 when testing the fairness of a symmetric process like a coin flip, where equal likelihood of success or failure is anticipated. The binomial test's interpretation centers on determining whether the observed success rate significantly deviates from this expected rate under H_0, thereby providing inferential evidence about the true probability p. This framework relies on the binomial model's assumptions of fixed trial count and independence, ensuring the hypotheses align with the probabilistic structure of the data.[18][17][15]
Test Statistic
The test statistic for the binomial test is the observed number of successes X, where X = k represents the count of successes in n independent Bernoulli trials, each with success probability p.[19][20] Under the null hypothesis H_0: p = p_0, the test statistic X follows a binomial distribution with parameters n and p_0, denoted X \sim \text{Binomial}(n, p_0).[20][1] The probability mass function of this distribution is P(X = k) = \binom{n}{k} p_0^k (1 - p_0)^{n-k}, where \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient.[19][1] This statistic forms the foundation for exact inference methods in the binomial test, as its discrete distribution under H_0 allows direct computation of probabilities without requiring transformations, in contrast to statistics derived from continuous distributions like the normal.[19][21] In one-sided tests, the directionality aligns with the alternative hypothesis H_a; for instance, if H_a: p > p_0, the relevant tail probabilities are those for X \geq k.[21][1]
Exact Inference Methods
P-value Calculation
The p-value in the exact binomial test quantifies the probability of observing a test statistic at least as extreme as the one obtained, assuming the null hypothesis H_0: p = p_0 is true, where the test statistic X follows a binomial distribution with parameters n (number of trials) and p_0 (hypothesized success probability).[22] For a one-sided alternative hypothesis H_a: p > p_0, the p-value is the sum of the binomial probabilities from the observed number of successes k to n: P(X \geq k) = \sum_{i=k}^{n} \binom{n}{i} p_0^i (1 - p_0)^{n-i}. This represents the right-tail probability under the null distribution.[1][23] Similarly, for H_a: p < p_0, the p-value is the left-tail probability: P(X \leq k) = \sum_{i=0}^{k} \binom{n}{i} p_0^i (1 - p_0)^{n-i}. These one-sided p-values can be computed directly using the probability mass function of the binomial distribution.[24] For the two-sided alternative H_a: p \neq p_0, the exact p-value is typically calculated by summing the probabilities of all outcomes whose probability under the null is less than or equal to that of the observed outcome k. This method accounts for the asymmetry of the binomial distribution when p_0 \neq 0.5 and avoids simply doubling the one-sided p-value, which may not accurately reflect the tails.[22][25] For small values of n, these p-values can be calculated by hand using the binomial probability formula. For larger n, computation relies on the cumulative distribution function (CDF) of the binomial distribution, available in statistical software, to efficiently sum the relevant tail probabilities.[23]
Critical Values and Decision Rules
In the exact binomial test, critical values are determined for a specified significance level \alpha (commonly 0.05) to define the rejection region under the null hypothesis H_0: p = p_0, where p_0 is the hypothesized success probability and the test statistic X follows a binomial distribution with parameters n (number of trials) and p_0. For an upper-tailed test (H_1: p > p_0), the critical value c is the smallest integer such that P(X \geq c \mid p = p_0) \leq \alpha, ensuring the probability of Type I error (falsely rejecting H_0) does not exceed \alpha.[26][27] Similarly, for a lower-tailed test (H_1: p < p_0), the critical value c is the largest integer such that P(X \leq c \mid p = p_0) \leq \alpha. These values are typically obtained from binomial cumulative distribution tables for small n (e.g., n \leq 20) or computed exactly using statistical software for larger n, as the discrete nature of the binomial distribution requires precise tail probabilities.[26][27] For two-tailed tests (H_1: p \neq p_0), critical values consist of a lower bound c_L and an upper bound c_U, selected such that P(X \leq c_L \mid p = p_0) + P(X \geq c_U \mid p = p_0) \leq \alpha, often allocating \alpha/2 to each tail approximately due to the discreteness complicating exact symmetry. The decision rule is to reject H_0 if the observed number of successes k falls within the rejection region: k \geq c for upper-tailed, k \leq c for lower-tailed, or k \leq c_L or k \geq c_U for two-tailed. This fixed-\alpha approach contrasts with p-value methods by pre-specifying the rejection threshold before observing data, providing a clear decision boundary.[26][27] The exact binomial test controls the Type I error rate precisely at or below \alpha, as the critical values are derived directly from the null distribution without approximation, avoiding the conservative or liberal biases that can occur in asymptotic methods. 
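The search for an upper-tailed critical value described above can be sketched as a simple scan of the null distribution; the n = 10, p_0 = 0.05 setting is assumed here for concreteness:

```python
from math import comb

# Upper-tailed test: find the smallest integer c with P(X >= c | p = p0) <= alpha.
n, p0, alpha = 10, 0.05, 0.05

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(c, n + 1))

# Scan c = 0, 1, 2, ... until the tail probability first drops to alpha or below.
c = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= alpha)

print(c, round(upper_tail(c, n, p0), 4))  # 3 0.0115
```

The scan yields c = 3 with a tail probability of 0.0115, so the achieved Type I error rate sits below the nominal 0.05 because of the distribution's discreteness.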
For instance, in an upper-tailed test with n=10, p_0=0.05, and \alpha=0.05, the critical value c=3 yields P(X \geq 3 \mid p=0.05) = 0.0115 \leq 0.05.[28] Regarding power—the probability of correctly rejecting H_0 when H_1 is true—this increases with sample size n and the magnitude of deviation |p - p_0|, as larger n or greater separation amplifies the test statistic's shift from the null, though exact power computation requires evaluating the alternative distribution.[27]
Approximations for Large Samples
Normal Approximation
For large sample sizes n, the binomial distribution under the null hypothesis H_0: p = p_0 can be approximated by a normal distribution, facilitating efficient computation in the binomial test. Specifically, the number of successes X \sim \text{Binomial}(n, p_0) is approximately distributed as \text{Normal}(\mu = n p_0, \sigma^2 = n p_0 (1 - p_0)), where \mu is the expected value and \sigma^2 is the variance. This approximation leverages the central limit theorem, which ensures that the distribution of the sum of independent Bernoulli trials converges to normality as n increases.[29][30] The standardized test statistic for the binomial test is derived from this normal approximation as Z = \frac{k - n p_0}{\sqrt{n p_0 (1 - p_0)}}, where k is the observed number of successes. Under H_0, Z follows approximately a standard normal distribution N(0, 1), allowing for straightforward comparison to critical values or p-value calculation. This z-statistic transforms the observed deviation from the expected successes into a scale-free measure, enabling the use of standard normal tables or software for inference.[31] P-values are computed using the standard normal distribution based on the observed z. For a two-sided test, the p-value is 2 \times P(Z > |z_{\text{observed}}|), where Z \sim N(0, 1); for a one-sided upper-tail test, it is P(Z > z_{\text{observed}}); and for a one-sided lower-tail test, P(Z < z_{\text{observed}}). These calculations provide a probabilistic assessment of the evidence against H_0, with smaller p-values indicating stronger evidence of deviation from p_0.[31] The validity of this normal approximation requires sufficiently large n such that n p_0 \geq 5 and n (1 - p_0) \geq 5, ensuring the binomial probabilities are not overly skewed and the normal curve adequately captures the mass of the distribution. These conditions help minimize approximation error, particularly when p_0 is not extreme (close to 0 or 1).
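As a numerical sketch of the z statistic and two-sided p-value above, assuming the fair-coin example (k = 14 successes in n = 20 trials):

```python
from math import sqrt, erf

n, k, p0 = 20, 14, 0.5

# Standardized test statistic Z = (k - n*p0) / sqrt(n*p0*(1 - p0)).
z = (k - n * p0) / sqrt(n * p0 * (1 - p0))

def phi(x):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Two-sided p-value: 2 * P(Z > |z|).
p_two_sided = 2 * (1 - phi(abs(z)))

print(round(z, 3), round(p_two_sided, 4))  # 1.789 0.0736
```

For this small n the uncorrected approximation (about 0.074) differs noticeably from the exact two-sided p-value (about 0.115), one motivation for the continuity correction discussed below.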
When these validity conditions are met, the normal approximation offers a computationally simpler alternative to exact binomial calculations, especially for large n, where exact computation requires summing many terms of the binomial distribution.[30]
Continuity Correction
The continuity correction addresses the discreteness of the binomial distribution when approximating it with the continuous normal distribution in the context of the binomial test. This adjustment modifies the boundaries of the discrete outcomes by ±0.5 to better match the continuous approximation, thereby reducing the error in estimated probabilities or p-values.[29] For a one-sided test assessing the upper tail probability P(X \geq k) under the null hypothesis H_0: p = p_0, the corrected test statistic is given by Z_{cc} = \frac{k - 0.5 - n p_0}{\sqrt{n p_0 (1 - p_0)}}, where the p-value is then P(Z \geq Z_{cc}) with Z standard normal. Similarly, for the lower tail P(X \leq k), Z_{cc} = \frac{k + 0.5 - n p_0}{\sqrt{n p_0 (1 - p_0)}}, and the p-value is P(Z \leq Z_{cc}). For two-sided tests, the correction applies to the absolute deviation, using |k - n p_0| - 0.5 in the numerator.[29][32] This correction enhances the accuracy of the normal approximation by compensating for the "gaps" between discrete binomial probabilities, leading to smaller errors in p-value calculations compared to the uncorrected version, especially in moderate sample sizes where the approximation is employed but exact methods are computationally intensive. Studies show that standard continuity corrections like Yates' reduce approximation errors significantly for tail probabilities, though advanced variants may offer further improvements in extreme cases.[29][33] The continuity correction is recommended whenever the normal approximation to the binomial is used, particularly when n p_0 \geq 5 and n (1 - p_0) \geq 5, as these conditions ensure the approximation is reasonable and the correction provides meaningful refinement for moderate n. It is especially valuable when the variance n p_0 (1 - p_0) falls between 5 and 25, where uncorrected approximations can deviate noticeably from exact values without being so small as to warrant exact inference exclusively.[29][33]
Applications and Examples
Common Applications
The binomial test is frequently employed to assess fairness in random processes involving binary outcomes, such as coin flips (p = 0.5 for heads) or rolling an even number on a fair six-sided die (p = 0.5), where the null hypothesis posits the hypothesized probability of success.[34] In these scenarios, the test evaluates whether observed deviations from the expected proportion are statistically significant, ensuring the integrity of gambling devices or simulation tools. In quality control within manufacturing, the binomial test monitors defect rates by testing whether the proportion of defective items in a sample matches an acceptable threshold, such as p_0 = 0.01, to determine if production processes meet standards.[35] This application helps identify systematic issues in assembly lines or material batches without assuming large sample sizes.[36] Medical and clinical trials commonly use the binomial test to compare success rates of binary treatments, such as recovery or response to a drug versus a placebo, where the null hypothesis assumes no difference from a baseline proportion.[37] For instance, it assesses whether a new intervention achieves a higher recovery rate than the expected 20-30% from standard care, aiding decisions on efficacy in phase II studies.[38] In the social sciences, the binomial test analyzes proportions from yes/no survey responses, testing if the observed agreement rate with a statement deviates from a hypothesized value, such as 50% for neutral opinions.[39] This is particularly useful in polling or behavioral studies to validate assumptions about population attitudes.[40] The binomial test is preferred over approximations like the chi-squared test for single proportions when sample sizes are small (e.g., n < 30) or exact inference is required to avoid inflated Type I errors.[41] It assumes independent Bernoulli trials with a constant success probability, making it suitable for these constrained contexts.
Worked Example
Consider a scenario where a coin is flipped 20 times to test whether it is fair, resulting in 14 heads observed. The null hypothesis is H_0: p = 0.5 (the probability of heads is 0.5), and the alternative hypothesis is H_a: p \neq 0.5 (the coin is biased). A two-sided test is conducted at significance level \alpha = 0.05.[42] To compute the exact p-value, calculate the probability of observing 14 or more heads (or equivalently, 6 or fewer heads due to symmetry) under H_0, then double it for the two-sided test. The tail probability is P(X \geq 14 \mid n=20, p=0.5) = \sum_{k=14}^{20} \binom{20}{k} (0.5)^{20} = 0.05766. Thus, the two-sided p-value is 2 \times 0.05766 = 0.1153. Since 0.1153 > 0.05, the null hypothesis is not rejected.[43][42] The following table shows the probability mass function under H_0, with probabilities for the extreme tails (k \leq 6 or k \geq 14) highlighted in bold for illustration:

| k | P(X = k) |
|---|---|
| 0 | **0.0000** |
| 1 | **0.0000** |
| 2 | **0.0002** |
| 3 | **0.0011** |
| 4 | **0.0046** |
| 5 | **0.0148** |
| 6 | **0.0370** |
| 7 | 0.0739 |
| 8 | 0.1201 |
| 9 | 0.1602 |
| 10 | 0.1762 |
| 11 | 0.1602 |
| 12 | 0.1201 |
| 13 | 0.0739 |
| 14 | **0.0370** |
| 15 | **0.0148** |
| 16 | **0.0046** |
| 17 | **0.0011** |
| 18 | **0.0002** |
| 19 | **0.0000** |
| 20 | **0.0000** |
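The tail probabilities in the table can be verified by direct summation of the probability mass function (a self-contained sketch of the calculation, not library code):

```python
from math import comb

# Fair-coin worked example: n = 20 flips, 14 heads observed.
n, p0 = 20, 0.5

# Full probability mass function under H0.
pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]

tail = sum(pmf[14:])        # P(X >= 14)
p_two_sided = 2 * tail      # distribution is symmetric at p0 = 0.5

print(round(tail, 4))         # 0.0577
print(round(p_two_sided, 4))  # 0.1153
```

The individual entries of `pmf`, rounded to four decimals, reproduce the table above (e.g., P(X = 10) = 0.1762).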
Software Implementation
Availability in Packages
The binomial test is widely available in major statistical software packages, facilitating both exact and approximate implementations for hypothesis testing on binomial proportions. In the R programming language, the binom.test function, included in the base stats package, performs the test by specifying the number of successes k, total trials n, and null hypothesis probability p (defaulting to 0.5), along with options for one-sided or two-sided alternatives. This function computes exact p-values from the binomial distribution and supports confidence interval estimation via the Clopper-Pearson method. For two-sided alternatives, the p-value is the sum of the probabilities of all outcomes whose null probability does not exceed that of the observed count; when p = 0.5 this reduces to the sum of the two tails, P(X ≤ 6) + P(X ≥ 14) for the example below.[44]
In Python, the binomtest function within the scipy.stats module of the SciPy library provides similar functionality, taking arguments for observed successes k, total trials n, and expected probability p (default 0.5), with support for alternative hypotheses. It offers exact p-value calculation based on the binomial distribution, defining the two-sided p-value as the sum of probabilities of outcomes whose likelihood under the null does not exceed that of the observed count, matching R's approach. It can return confidence intervals via the proportion_ci method, making it suitable for integration into data analysis workflows. Note that two-sided p-values can differ slightly across software when the hypothesized probability is not 0.5, owing to differing conventions for defining "more extreme" outcomes.[45]
SAS implements the binomial test through the PROC FREQ procedure, particularly when analyzing binomial proportions, where the EXACT statement enables exact computation via the F-distribution or network algorithms for p-values and confidence limits. This approach is commonly used in categorical data analysis, supporting both one- and two-sided tests.
In SPSS, the binomial test is accessible via the Legacy Dialogs under Nonparametric Tests > Binomial, allowing users to specify the test variable, expected probability, and test direction through a graphical interface. The procedure outputs exact p-values and handles small sample sizes appropriately.
Most of these implementations support both exact methods for small samples and normal approximations for larger ones, typically providing output that includes the test statistic, p-value, and confidence intervals for the proportion.
Code Examples
Practical implementations of the binomial test are available in statistical programming languages such as R and Python, allowing users to compute exact p-values and confidence intervals for binomial proportions. These examples demonstrate the test using the coin flip scenario with 14 heads observed in 20 flips under the null hypothesis of a fair coin (p = 0.5); since the null distribution is symmetric at p = 0.5, both R and SciPy report a two-sided exact p-value of approximately 0.1153. In R, the binom.test() function from the base stats package performs the exact binomial test. The following script conducts the two-sided test and interprets the output:
```r
# Exact binomial test in R
result <- binom.test(14, 20, p = 0.5, alternative = "two.sided")
print(result)
# Output:
#         Exact binomial test
#
# data:  14 and 20
# number of successes = 14, number of trials = 20, p-value = 0.1153
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
#  0.4469189 0.8855617
# sample estimates:
# probability of success
#                    0.7
```

This output indicates a p-value of approximately 0.1153, which is above the conventional 0.05 threshold, suggesting insufficient evidence to reject the null hypothesis of a fair coin at the 5% significance level. The 95% confidence interval for the true probability spans from about 0.447 to 0.886, encompassing 0.5.[44] In Python, the
binomtest() function from the scipy.stats module provides similar functionality, defaulting to an exact test. The script below performs the analysis:
```python
from scipy.stats import binomtest

# Exact binomial test in Python
result = binomtest(14, n=20, p=0.5, alternative='two-sided')
print(result)
# k=14, n=20, alternative='two-sided', statistic=0.7, pvalue ≈ 0.1153
```

The p-value of approximately 0.1153 matches R's result, since both implementations sum the probabilities of all outcomes no more likely under the null than the observed count; for hypothesized probabilities other than 0.5, however, two-sided p-values can differ slightly across software. The
statistic reports the observed proportion of successes (0.7).[45]
To compare the exact binomial test with the normal approximation, code snippets can implement both approaches. In R, the exact test uses binom.test(), while the approximation employs pnorm() with continuity correction:
```r
# Exact test
exact_p <- binom.test(14, 20, p = 0.5)$p.value   # 0.1153

# Normal approximation with continuity correction
n <- 20; k <- 14; p0 <- 0.5
mu <- n * p0
sigma <- sqrt(n * p0 * (1 - p0))
z <- (k - 0.5 - mu) / sigma      # continuity correction: -0.5 for the upper tail
approx_p <- 2 * pnorm(-abs(z))   # two-sided
print(c(exact = exact_p, approx = approx_p))
# exact: 0.1153, approx: 0.1175
```

In Python, use manual calculation for the normal approximation, as
binomtest does not support it directly:
```python
from scipy.stats import binomtest, norm

# Exact test (default)
exact_result = binomtest(14, n=20, p=0.5, alternative='two-sided')
exact_p = exact_result.pvalue            # ≈ 0.1153

# Normal approximation with continuity correction
n, k, p0 = 20, 14, 0.5
mu = n * p0
sigma = (n * p0 * (1 - p0))**0.5
z = (k - 0.5 - mu) / sigma               # continuity correction: -0.5 for the upper tail
approx_p = 2 * norm.cdf(-abs(z))         # ≈ 0.1175

print(f"Exact p-value: {exact_p:.5f}")
print(f"Approx p-value: {approx_p:.5f}")
```

The continuity-corrected approximation (≈ 0.1175) lies close to the exact p-value (≈ 0.1153); omitting the correction gives ≈ 0.0736, illustrating both the approximation's limitations for small n and the benefit of the correction.[44][45] For one-sided tests, specify the direction in the
alternative parameter: use "greater" for testing if p > 0.5 or "less" for p < 0.5. For very large n, exact tail sums become more expensive to compute, and the normal approximation is typically adequate.[44][45]
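As a brief sketch of the one-sided variant in SciPy (testing whether the coin is biased toward heads):

```python
from scipy.stats import binomtest

# One-sided test: H_a: p > 0.5, with 14 heads in 20 flips.
# The p-value is the upper-tail probability P(X >= 14) under H0.
result = binomtest(14, n=20, p=0.5, alternative='greater')
print(round(result.pvalue, 4))  # 0.0577
```

The one-sided p-value (0.0577) is half the two-sided value here because the null distribution is symmetric at p = 0.5.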