
Hypergeometric distribution


The hypergeometric distribution is a discrete probability distribution that models the probability of k successes in n draws without replacement from a finite population of size N that contains exactly K items of the success type. This distribution arises in sampling scenarios where the population is finite and draws affect subsequent probabilities, distinguishing it from the binomial distribution, which assumes independence via replacement or an infinite population.
The probability mass function is

\Pr(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}},

where k ranges from max(0, n + K - N) to min(n, K), and the binomial coefficients \binom{a}{b} count combinations of b items from a. The expected value is n K / N, reflecting the proportion of successes in the population scaled by the sample size, while the variance is n (K/N) (1 - K/N) (N - n)/(N - 1), which accounts for the finite population correction that reduces variability compared to the binomial case. For large N relative to n, the hypergeometric approximates the binomial distribution with success probability K/N. Key applications include quality control inspections, where a batch of N items with K defectives is sampled without replacement, and exact tests for independence in contingency tables, such as Fisher's exact test in statistics.

Definition

Probability Mass Function

The probability mass function (PMF) of the hypergeometric distribution specifies the probability \Pr(X = k) that a random variable X, representing the number of observed successes in n draws without replacement from a population of size N with K total successes, equals a specific value k. This PMF is expressed as p_X(k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, where \binom{\cdot}{\cdot} denotes the binomial coefficient, defined for k ranging over the integers satisfying \max(0, n + K - N) \leq k \leq \min(n, K), and p_X(k) = 0 otherwise. This formula derives from combinatorial counting principles: the denominator \binom{N}{n} counts all possible ways to select n items from the N available, while the numerator \binom{K}{k} \binom{N-K}{n-k} enumerates the favorable outcomes where exactly k successes are selected from the K successes and n-k non-successes from the N-K non-successes. The ratio yields the exact probability under the uniform assumption over all subsets of size n. The PMF is zero outside the specified support because it is impossible to observe more successes than available in the population (k > K), more than drawn (k > n), or negative successes; the lower bound ensures feasibility given the non-successes. All probabilities sum to 1 over the support, confirming it as a valid PMF for the discrete uniform sampling model.
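As a check on the definition above, the PMF can be computed directly from binomial coefficients with Python's standard library; the parameters below are illustrative.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """p_X(k) = C(K,k) C(N-K, n-k) / C(N,n), zero off the support."""
    if not max(0, n + K - N) <= k <= min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative parameters: population 15, 5 successes, 4 draws
N, K, n = 15, 5, 4
support = range(max(0, n + K - N), min(n, K) + 1)
total = sum(hypergeom_pmf(k, N, K, n) for k in support)
assert abs(total - 1.0) < 1e-12         # the PMF sums to 1 over the support
assert hypergeom_pmf(5, N, K, n) == 0.0  # k > n is impossible
```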

Parameters and Support

The hypergeometric distribution is parameterized by three non-negative integers: the total population size N, the number of success states (or "marked" items) in the population, K, and the number of draws (sample size) n. These parameters must satisfy the constraints 0 \leq K \leq N and 0 \leq n \leq N, ensuring the model reflects a finite population sampled without replacement where the number of successes cannot exceed the totals. The support of the random variable X (the number of successes in the sample) consists of all integers k in the range from \max(0, n + K - N) to \min(n, K), inclusive; probabilities are zero outside this interval due to the combinatorial impossibility of exceeding available successes or draws while accounting for the finite number of non-successes in the population. This bounded support distinguishes the hypergeometric from distributions like the binomial, as it enforces the dependence induced by without-replacement sampling.

Mathematical Properties

Moments and Expectations

The expected value of the hypergeometric random variable X, denoting the number of successes in a sample of size n drawn without replacement from a population of size N containing K successes, is \mathbb{E}[X] = n \frac{K}{N}. This follows from expressing X as the sum of n indicator variables I_j for the j-th draw being a success, where \mathbb{E}[I_j] = \frac{K}{N} for each j by symmetry, and applying linearity of expectation \mathbb{E}[X] = \sum_{j=1}^n \mathbb{E}[I_j] = n \frac{K}{N}, independent of the without-replacement dependence. The variance is \mathrm{Var}(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N-n}{N-1}. To derive this, compute \mathrm{Var}(X) = \sum_{j=1}^n \mathrm{Var}(I_j) + \sum_{j \neq \ell} \mathrm{Cov}(I_j, I_\ell), where \mathrm{Var}(I_j) = \frac{K}{N} \left(1 - \frac{K}{N}\right) and \mathrm{Cov}(I_j, I_\ell) = \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{-1}{N-1} for j \neq \ell, yielding n \frac{K}{N} \left(1 - \frac{K}{N}\right) + n(n-1) \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{-1}{N-1} = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N-n}{N-1} after simplification; the factor \frac{N-n}{N-1} reflects reduced variability from sampling without replacement relative to the binomial case. Higher moments exist in closed form but grow complex. The skewness is \gamma_1 = \frac{(N - 2K)\,(N-1)^{1/2}\,(N - 2n)}{\left[ n K (N-K)(N-n) \right]^{1/2} (N-2)}, measuring asymmetry that is positive when K and n lie on the same side of N/2 and vanishes when K = N/2 or n = N/2. The excess kurtosis is \kappa = \frac{(N-1) N^2 \left[ N(N+1) - 6K(N-K) - 6n(N-n) \right] + 6 n K (N-K)(N-n)(5N - 6)}{n K (N-K)(N-n)(N-2)(N-3)}, typically negative for moderate n/N, indicating lighter tails than the normal distribution; exact computation for specific parameters proceeds by evaluating these expressions or the factorial moments \mathbb{E}[(X)_r] = \frac{(n)_r (K)_r}{(N)_r}, which take a particularly simple closed form.
Recursive relations, such as \mathbb{E}[X^r] = \frac{n K}{N} \mathbb{E}[(Y+1)^{r-1}] where Y \sim \mathrm{Hypergeometric}(N-1, K-1, n-1), facilitate numerical evaluation of raw moments.
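The closed-form mean and variance above can be verified numerically against moments computed directly from the PMF; this sketch uses only the standard library, with arbitrary illustrative parameters.

```python
from math import comb

def hg_mean_var(N, K, n):
    """Closed-form mean and variance of Hypergeometric(N, K, n)."""
    p = K / N
    return n * p, n * p * (1 - p) * (N - n) / (N - 1)

# Compare against moments computed by direct summation over the PMF
N, K, n = 50, 20, 10
pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n)
       for k in range(max(0, n + K - N), min(n, K) + 1)}
mean_num = sum(k * p for k, p in pmf.items())
var_num = sum((k - mean_num) ** 2 * p for k, p in pmf.items())
mean_cf, var_cf = hg_mean_var(N, K, n)
assert abs(mean_num - mean_cf) < 1e-10
assert abs(var_num - var_cf) < 1e-10
```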

Combinatorial Identities and Symmetries

The summation of the probability mass function over its support equals unity, as \sum_{k} \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} = 1, a direct consequence of Vandermonde's identity \sum_{k} \binom{K}{k} \binom{N-K}{n-k} = \binom{N}{n}. This identity counts the total number of ways to choose n items from N by partitioning the choices into those including k successes from K and n-k failures from N-K, for all feasible k. The hypergeometric distribution possesses a combinatorial symmetry interchanging the roles of the number of successes K and the sample size n: \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k} \binom{N-n}{K-k}}{\binom{N}{K}}. This equality holds because both expressions compute the probability of exactly k overlaps between a fixed set of n sample positions and a randomly selected set of K success positions in the population of N, via complementary counting arguments. The right-hand form interprets the scenario as the distribution of successes falling into a prespecified sample when successes are assigned randomly to the population, dual to the standard sampling view.
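Both Vandermonde's identity and the K ↔ n symmetry are easy to confirm numerically; the parameters here are arbitrary.

```python
from math import comb

def pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 30, 12, 7

# K <-> n symmetry: swapping the roles of K and n leaves P(X = k) unchanged
for k in range(min(n, K) + 1):
    assert abs(pmf(k, N, K, n) - pmf(k, N, n, K)) < 1e-12

# Vandermonde's identity: the PMF numerators sum to C(N, n)
assert sum(comb(K, k) * comb(N - K, n - k) for k in range(n + 1)) == comb(N, n)
```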

Tail Bounds and Inequalities

A fundamental tail inequality for the hypergeometric random variable X \sim \mathrm{HG}(N, K, n) with mean \mu = nK/N is Hoeffding's bound, which states that \Pr(X \geq \mu + t) \leq \exp\left( -\frac{2t^2}{n} \right) for all t > 0. This result, derived for sums of bounded random variables including those from sampling without replacement, matches the corresponding bound for the binomial distribution and highlights that the negative associations in hypergeometric sampling preserve concentration comparable to independent trials. Serfling (1974) provided a refinement incorporating the finite population correction factor f = (n-1)/N, yielding the tighter upper tail bound \Pr\left(X \geq \left(\frac{K}{N} + t\right)n \right) \leq \exp\left( -\frac{2t^2 n}{1 - f} \right) = \exp\left( -2t^2 n \cdot \frac{N}{N - n + 1} \right) for t > 0. This adjustment accounts for the reduced variance of without-replacement sampling relative to the infinite-population case, making it superior to Hoeffding's bound when n is a substantial fraction of N. Exponential bounds with higher-order terms further sharpen these estimates. For instance, Bardenet and Maillard (2015) derived improved inequalities for the upper tail, incorporating factors like (1 - n/N) and quartic terms in the deviation, which outperform Serfling's bound in regimes where more than half the population is sampled. More recently, George (2024) unified existing inequalities and proposed refined bounds derived from Serfling's form, such as c = N \sqrt{ -\frac{N-n+1}{2nN} \ln(\delta/2) } for the deviation ensuring \Pr(|X - \mu| \geq c) \leq \delta when n \leq N/2. Simple yet effective recent derivations include a Chernoff-style bound using the Kullback–Leibler divergence: \Pr(X \geq d) \leq \exp\left[ -K D\left( \frac{d}{K} \Big\| \frac{n}{N} \right) \right], for integer d \geq \mu + 1, where D(x \| y) = x \ln(x/y) + (1-x) \ln((1-x)/(1-y)), which sensitizes the bound to the sampling fraction n/N and excels when n > K.
An alternative \beta-bound expresses the tail as \Pr(X \geq d) \leq I_{n/N}(d, K - d + 1), with I_x(a,b) the regularized incomplete beta function, offering computational advantages and tighter performance than Serfling's bound in symmetric regimes. These advancements, validated via simulations against classical benchmarks such as Hoeffding's and Serfling's bounds, underscore ongoing refinements tailored to specific parameter ranges in hypergeometric tails.
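A small comparison of the exact upper tail against the Hoeffding and Serfling bounds illustrates how much slack each bound leaves; the parameters are illustrative, and note that scipy.stats.hypergeom takes its arguments in the order (population size, success count, draws).

```python
from math import exp
from scipy.stats import hypergeom

N, K, n = 1000, 300, 200
mu = n * K / N
rv = hypergeom(N, K, n)   # scipy order: population, successes, draws

for t in (0.02, 0.05, 0.08):            # deviation of the sample proportion
    exact = rv.sf(mu + t * n - 1)       # P(X >= mu + t n)
    hoeffding = exp(-2 * t * t * n)
    serfling = exp(-2 * t * t * n * N / (N - n + 1))
    assert exact <= serfling <= hoeffding   # each bound dominates the tail
```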

Approximations and Limitations

Binomial Approximation Conditions

The hypergeometric distribution can be approximated by the binomial distribution with parameters n and p = K/N when the population size N is sufficiently large relative to the sample size n, rendering the dependence between draws negligible and approximating sampling with replacement. This holds because the hypergeometric \Pr(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} simplifies asymptotically to the binomial form \binom{n}{k} p^k (1-p)^{n-k} as N \to \infty with n and p fixed, since the ratios of falling factorials approach independent-draw probabilities. A practical rule of thumb for the approximation's adequacy is n/N < 0.05, ensuring the relative error in probabilities remains small across the support. Some sources relax this to n/N < 0.10, though accuracy diminishes for values near this threshold, particularly for tail probabilities or when p is extreme (close to 0 or 1). The means coincide exactly as \mathbb{E}[X] = n \cdot (K/N), but the hypergeometric variance n p (1-p) \frac{N-n}{N-1} approaches the binomial variance n p (1-p) only when \frac{N-n}{N-1} \approx 1, reinforcing the n \ll N requirement. Violation of these conditions leads the binomial approximation to overstate the variance and fit poorly in finite samples, as verified in numerical comparisons.
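The quality of the binomial approximation in the n/N < 0.05 regime can be checked directly; this sketch compares the two PMFs over the whole support with illustrative parameters.

```python
from scipy.stats import hypergeom, binom

# Illustrative regime with n/N = 0.01, well under the 0.05 rule of thumb
N, K, n = 10_000, 2_000, 100
p = K / N

hg = hypergeom(N, K, n)   # scipy order: population, successes, draws
bi = binom(n, p)
max_abs_err = max(abs(hg.pmf(k) - bi.pmf(k)) for k in range(n + 1))
assert max_abs_err < 1e-3   # pointwise agreement across the support
```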

Normal and Other Approximations

The hypergeometric random variable X \sim \text{Hypergeometric}(N, K, n) with mean \mu = n \frac{K}{N} and variance \sigma^2 = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N-n}{N-1} converges in distribution to a normal random variable with the same mean and variance as N \to \infty and n \to \infty, provided n^2 / N \to 0. Under these conditions, the local limit theorem yields P(X = k) \approx \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(k - \mu)^2}{2\sigma^2} \right). Stronger uniform convergence bounds, such as those from the Berry–Esseen theorem adapted to the hypergeometric case, hold for a wide range of \frac{K}{N} and \frac{n}{N}, with error rates on the order of O\left( \frac{1}{\sqrt{\min(np, n(1-p))}} \right) where p = \frac{K}{N}. A continuity correction enhances the approximation for tail probabilities: P(X \leq k) \approx \Phi\left( \frac{k + 0.5 - \mu}{\sigma} \right), where \Phi is the standard normal cumulative distribution function; this adjustment accounts for the discreteness of X by treating the probability mass at k as spread over the interval [k - 0.5, k + 0.5]. When \frac{n}{N} approaches a constant t \in (0,1), the variance requires adjustment to \sigma^2 (1 - t), and the normal density scales accordingly to reflect the finite population correction. Empirical rules of thumb for practical use include requiring \sigma^2 \geq 9 or np(1-p) \geq 10 (adjusted for the hypergeometric variance) to ensure reasonable accuracy, though these are heuristic and depend on the specific parameter regime. For rare events where \frac{K}{N} \to 0 as N \to \infty while \lambda = n \frac{K}{N} remains fixed and finite, the hypergeometric distribution approximates a Poisson distribution with parameter \lambda, as the without-replacement sampling behaves similarly to independent rare trials.
This limit arises because the probability mass function P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} simplifies to \frac{\lambda^k e^{-\lambda}}{k!} under the specified asymptotics, with dependencies between draws becoming negligible. The approximation improves when n is moderate relative to N and K is small, but degrades if depletion effects are significant (i.e., n comparable to K). Bounds obtained via the Stein–Chen method quantify the total variation distance between the distributions as O\left( \frac{n^2}{N} + \frac{\lambda}{K} \right). Other approximations, such as Edgeworth expansions for higher-order corrections to the normal or saddlepoint approximations for tail probabilities, extend these limits but require more computational effort and are typically used when exact hypergeometric probabilities are intractable for large N. These methods incorporate the skewness and kurtosis of the hypergeometric (e.g., skewness \gamma_1 = \frac{(N-2K)(N-1)^{1/2}(N-2n)}{\left[ n K (N-K)(N-n) \right]^{1/2} (N-2)}) to refine the normal approximation beyond the central limit regime.
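A sketch of the continuity-corrected normal approximation, compared against exact CDF values from scipy for one illustrative parameter set:

```python
from math import sqrt, erf
from scipy.stats import hypergeom

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Illustrative parameters
N, K, n = 500, 200, 100
mu = n * K / N
sigma = sqrt(n * (K / N) * (1 - K / N) * (N - n) / (N - 1))

rv = hypergeom(N, K, n)   # scipy order: population, successes, draws
for k in (30, 40, 45):
    exact = rv.cdf(k)
    approx = norm_cdf((k + 0.5 - mu) / sigma)   # continuity correction
    assert abs(exact - approx) < 0.01
```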

Computational and Practical Limitations

Exact evaluation of the hypergeometric probability mass function \Pr(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} requires computing binomial coefficients, whose values grow rapidly with increasing N, K, and n, often exceeding the dynamic range of double-precision floating-point numbers (approximately 10^{308}) for N > 1000. This overflow occurs because intermediate factorials or products in naive multiplicative formulas for \binom{N}{k} become unrepresentable, yielding infinite or erroneous results. To address numerical instability, modern implementations employ logarithmic transformations, computing \log \Pr(X = k) via differences of log-gamma functions: \log \binom{N}{k} = \lgamma(N+1) - \lgamma(k+1) - \lgamma(N-k+1), where \lgamma is evaluated using asymptotic expansions or table lookups for large arguments to maintain precision up to relative errors of 10^{-15} or better. Recursive ratio methods, multiplying successive terms \frac{\Pr(X = k+1)}{\Pr(X = k)} = \frac{(K-k)(n-k)}{(k+1)(N-K-n+k+1)}, further avoid large intermediates by starting from a mode or boundary value and iterating, though they still rely on log-space accumulation for conversion back to probabilities. For cumulative distribution functions or tail probabilities, such as in Fisher's exact test for 2×2 contingency tables, exact computation demands summing over up to \min(n, K) terms, each potentially requiring the above techniques; while single-term evaluation is O(1) with precomputation, full p-values exhibit worst-case time complexity O(N) due to the summation extent and binomial evaluations, becoming prohibitive for N > 10^5 without optimization.
In practice, for large-scale applications like gene set enrichment analysis, where N can reach into the millions (e.g., when testing genomic intervals across the human genome's roughly 3×10^9 bases), exact tails involve thousands of terms with minuscule probabilities (~10^{-100}), leading to underflow, rounding error propagation in summation, and excessive runtime, necessitating Monte Carlo simulation or Poisson/binomial approximations despite their asymptotic validity only when n \ll N.
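A minimal sketch of the log-space technique described above, using lgamma so the computation never forms the huge binomial coefficients explicitly:

```python
from math import lgamma, exp

def log_comb(a, b):
    # log C(a, b) via log-gamma; avoids forming the huge coefficient itself
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hg_logpmf(k, N, K, n):
    return log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)

# Agrees with direct computation where that is feasible ...
assert abs(exp(hg_logpmf(1, 10, 4, 3)) - 0.5) < 1e-9
# ... and keeps working far beyond the overflow range of direct coefficients
lp = hg_logpmf(500, 10**6, 50_000, 10_000)   # modal term, huge N
assert -6 < lp < -2
```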

Illustrative Examples

Basic Sampling Example

A prototypical scenario for the hypergeometric distribution involves drawing a fixed-size sample without replacement from a finite population divided into two mutually exclusive categories, such as "successes" and "failures." Formally, let the population size be N, with K successes and N - K failures; a sample of n items is selected, where n ≤ N, and X denotes the number of successes observed in the sample, with X ranging from max(0, n + K - N) to min(n, K). The probability that X = k is P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}, where \binom{a}{b} is the binomial coefficient representing the number of ways to choose b items from a without regard to order. To illustrate, consider an urn containing N = 10 balls, of which K = 4 are red (successes) and 6 are blue (failures); draw n = 3 balls without replacement. The possible values of X (number of red balls drawn) are k = 0, 1, 2, 3. The probabilities are computed as follows:
k | P(X = k) | Calculation
0 | 1/6 ≈ 0.1667 | \frac{\binom{4}{0} \binom{6}{3}}{\binom{10}{3}} = \frac{1 \cdot 20}{120}
1 | 1/2 = 0.5 | \frac{\binom{4}{1} \binom{6}{2}}{\binom{10}{3}} = \frac{4 \cdot 15}{120}
2 | 3/10 = 0.3 | \frac{\binom{4}{2} \binom{6}{1}}{\binom{10}{3}} = \frac{6 \cdot 6}{120}
3 | 1/30 ≈ 0.0333 | \frac{\binom{4}{3} \binom{6}{0}}{\binom{10}{3}} = \frac{4 \cdot 1}{120}
These values sum to 1, confirming the distribution's validity as a probability model. The dependence between draws (due to no replacement) distinguishes this from the binomial distribution, where probabilities remain constant across trials.
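The table can be reproduced in exact rational arithmetic, confirming both the individual entries and that they sum to 1:

```python
from fractions import Fraction
from math import comb

# Urn of the example: N=10 balls, K=4 red, draw n=3 without replacement
N, K, n = 10, 4, 3
table = {k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
         for k in range(4)}
assert table[0] == Fraction(1, 6)
assert table[1] == Fraction(1, 2)
assert table[2] == Fraction(3, 10)
assert table[3] == Fraction(1, 30)
assert sum(table.values()) == 1      # exact arithmetic: a valid PMF
```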

Real-World Scenario Interpretation

In quality control processes, the hypergeometric distribution quantifies the probability of encountering a specific number of defective items when sampling without replacement from a finite batch, accounting for the depletion of the population that alters successive draw probabilities, unlike the independent trials of binomial models. For example, consider a production line producing 1,000 widgets where preliminary inspection reveals K=50 defectives prior to full shipment; inspectors then draw n=100 widgets randomly without replacement to evaluate the lot. The random variable X representing observed defectives follows Hypergeometric(N=1000, K=50, n=100), with P(X=k) = [C(50,k) * C(950,100-k)] / C(1000,100), enabling calculation of risks such as P(X ≥ 10) to inform acceptance thresholds that balance false positives and negatives in lot disposition. This interpretation underscores the distribution's utility in finite-population scenarios where the sampling fraction n/N exceeds typical approximation thresholds (here ~10%), as the without-replacement dependencies reduce variance relative to np(1-p). In electoral auditing, the hypergeometric distribution quantifies the consistency between sampled ballots and aggregate tallies to detect irregularities in finite vote universes without independence assumptions. For instance, in an election with N=10,000 ballots where K=6,000 validly favor Candidate A per the official count, auditors might hand-recount n=500 randomly selected ballots, modeling X = observed A votes as Hypergeometric(N=10,000, K=6,000, n=500); an extreme deviation such as an observed count near 240, for which P(X ≤ 240) is vanishingly small under the null hypothesis of accurate reporting, would signal irregularities, guiding risk-limiting audits that scale sample sizes inversely with desired error bounds. Such applications highlight the dependencies in finite vote pools, where early discrepancies carry evidential weight, prioritizing empirical verification over approximations valid only for negligible sampling fractions.
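The widget-inspection risk P(X ≥ 10) mentioned above can be computed with scipy's survival function; note that scipy.stats.hypergeom takes its arguments as (population size, success count, draws).

```python
from scipy.stats import hypergeom

# Lot of N=1000 widgets with K=50 defectives; inspect n=100 draws
rv = hypergeom(1000, 50, 100)
risk = rv.sf(9)       # P(X >= 10), e.g. as a candidate rejection threshold
assert 0 < risk < 0.05
```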

Statistical Inference

Point and Interval Estimation

The method of moments provides a straightforward point estimator for the success proportion p = K/N, given by the sample proportion \hat{p} = k/n. This follows from equating the observed count k to the theoretical mean E[X] = n p, yielding an unbiased estimator since E[\hat{p}] = p. The corresponding estimator for K is \hat{K} = \hat{p} N = k N / n, rounded to the nearest integer when K must be integer-valued. The maximum likelihood estimator (MLE) for K maximizes the hypergeometric likelihood P(X = k \mid N, K, n) over the feasible values k \leq K \leq N - (n - k). Maximization involves finding the K where the likelihood ratio satisfies L(K+1)/L(K) \leq 1 \leq L(K)/L(K-1), with L(K+1)/L(K) = \frac{(K+1)(N - K - n + k)}{(K + 1 - k)(N - K)}; this yields \hat{K} = \lfloor (N+1)k/n \rfloor. For large N and n, the MLE approximates the method of moments estimator but incorporates discreteness effects, often computed numerically or via software implementing recursive evaluation. In related capture-recapture contexts modeled by the hypergeometric distribution, bias-reduced variants such as the Chapman estimator \hat{N} = \frac{(n+1)(K+1)}{k+1} - 1 (for the population size N) are preferred to the raw MLE. Interval estimation for p or K accounts for the variance \mathrm{Var}(X) = n p (1-p) \frac{N-n}{N-1}, which includes the finite population correction factor \frac{N-n}{N-1}. An approximate 1 - \alpha confidence interval for p is \hat{p} \pm z_{\alpha/2} \sqrt{ \frac{\hat{p} (1 - \hat{p}) (N - n)}{n (N - 1)} }, where z_{\alpha/2} is the 1 - \alpha/2 quantile of the standard normal distribution; this performs well when n p (1-p) \geq 5 and N is not too small relative to n. For K, the interval is N times the one for p, clipped to integers in [0, N].
Exact intervals, preferred for small samples to achieve nominal coverage despite discreteness, are constructed by inverting hypergeometric tests: the 1 - \alpha confidence set for K comprises all integers K' such that the two-sided p-value for testing H_0: K = K' given observed k exceeds \alpha, computed, for example, as \min\left\{ 1, \; 2 \min\left( \Pr(X \leq k \mid K'), \; \Pr(X \geq k \mid K') \right) \right\}. Efficient algorithms using tail probability recursions enable fast computation without full enumeration, yielding short intervals with guaranteed coverage of at least 1 - \alpha. These methods outperform approximations in finite samples and are implemented in statistical software.
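A direct, brute-force sketch of the test-inversion interval for K (scanning every candidate value rather than using the faster tail recursions), with the two-sided p-value taken as twice the smaller tail capped at 1; parameters are illustrative.

```python
from scipy.stats import hypergeom

def exact_ci_K(N, n, k, alpha=0.05):
    """All K' retained by the two-sided test of H0: K = K' at level alpha."""
    keep = []
    for Kp in range(N + 1):
        rv = hypergeom(N, Kp, n)  # scipy order: population, successes, draws
        p_two = min(1.0, 2 * min(rv.cdf(k), rv.sf(k - 1)))
        if p_two > alpha:
            keep.append(Kp)
    return min(keep), max(keep)

# Illustrative data: k = 6 successes in n = 20 draws from N = 100
lo, hi = exact_ci_K(N=100, n=20, k=6)
assert lo <= 30 <= hi       # the point estimate kN/n = 30 lies inside
```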

Hypothesis Testing with Fisher's Exact Test

Fisher's exact test utilizes the hypergeometric distribution to conduct precise hypothesis testing for independence between two dichotomous variables represented in a 2×2 contingency table, particularly suitable for small sample sizes where asymptotic approximations like the chi-squared test fail. The test conditions on the observed row and column marginal totals, treating one cell entry (such as the count of successes in the first group) as a realization from a hypergeometric distribution with population size N equal to the grand total, K as the total successes in the population, and n as the sample size of the first group. Under the null hypothesis of independence (equivalent to an odds ratio of 1), this conditional distribution holds exactly, without reliance on large-sample assumptions. The probability mass function for the hypergeometric random variable X (representing the cell count) is given by \Pr(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, where k ranges from \max(0, n + K - N) to \min(n, K). To compute the p-value, all possible tables with the fixed margins are enumerated, each assigned a hypergeometric probability, and the p-value is the sum of probabilities for tables at least as extreme as the observed one. For a two-sided test, this typically includes tables with probabilities less than or equal to that of the observed table; one-sided variants sum over the tail in the direction of the alternative hypothesis. This approach ensures the test maintains its nominal significance level exactly, even with sparse data where more than 20% of expected cell frequencies are below 5 or any are below 1, conditions under which the chi-squared approximation is unreliable. For instance, in analyzing whether a treatment affects outcomes across two groups, the test evaluates the null hypothesis of no association by quantifying the rarity of the observed table under hypergeometric sampling. Computational implementation often involves software like R's fisher.test() function, which enumerates tables directly for moderate sizes or uses network algorithms for larger ones.
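The equivalence between the one-sided Fisher test and a hypergeometric tail is easy to verify with scipy's fisher_exact on a small illustrative table:

```python
from scipy.stats import fisher_exact, hypergeom

# Small illustrative 2x2 table: rows = groups, columns = outcome
table = [[8, 2],
         [1, 9]]
odds, p_greater = fisher_exact(table, alternative='greater')

# Same p-value as the hypergeometric upper tail with N = 20 (grand total),
# K = 9 (first-column total), n = 10 (first-row total), observed k = 8
tail = hypergeom(20, 9, 10).sf(8 - 1)
assert abs(p_greater - tail) < 1e-10
assert p_greater < 0.01     # strong evidence against independence here
```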

Multivariate Hypergeometric Distribution

The multivariate hypergeometric distribution describes the joint probability distribution of counts obtained when drawing a fixed sample of size n without replacement from a finite population of size N divided into k \geq 2 mutually exclusive categories, where category i has N_i items and \sum_{i=1}^k N_i = N. Let X_i denote the count of items from category i in the sample for i = 1, \dots, k; then the random vector (X_1, \dots, X_k) follows this distribution, denoted \text{MultiHyper}(N; N_1, \dots, N_k; n), with the constraint \sum_{i=1}^k X_i = n. The joint probability mass function is given by P(X_1 = x_1, \dots, X_k = x_k) = \frac{\prod_{i=1}^k \binom{N_i}{x_i}}{\binom{N}{n}} for non-negative integers x_i satisfying \sum_{i=1}^k x_i = n and 0 \leq x_i \leq N_i for each i, and zero otherwise; here, \binom{\cdot}{\cdot} denotes the binomial coefficient. This formulation arises directly from the uniform probability over all \binom{N}{n} possible samples, with the numerator counting favorable outcomes for the specified counts. The marginal distribution of any single X_i is univariate hypergeometric with parameters N, N_i, and n, reducing the multivariate case to the standard hypergeometric when k=2. The mean of X_i is E[X_i] = n \frac{N_i}{N}, reflecting the proportional representation of category i in the population. The variance is \text{Var}(X_i) = n \frac{N_i}{N} \left(1 - \frac{N_i}{N}\right) \frac{N-n}{N-1}, which is smaller than the binomial variance n p (1-p) (with p = N_i/N) due to the finite-population correction factor (N-n)/(N-1). For i \neq j, the covariance is \text{Cov}(X_i, X_j) = -n \frac{N_i}{N} \frac{N_j}{N} \frac{N-n}{N-1}, negative as expected from the fixed total sample size inducing dependence.
Higher-order moments, including central and noncentral forms, can be derived recursively or via generating functions, with explicit formulas available for practical computation. The distribution models scenarios like randomized allocation or quality sampling across multiple defect types, where dependencies among categories must be accounted for explicitly.
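NumPy provides a sampler for this distribution (Generator.multivariate_hypergeometric); a quick Monte Carlo check of the mean formula E[X_i] = n N_i / N with illustrative category sizes:

```python
import numpy as np

# Three categories of sizes 50/30/20 in a population of N=100; draw n=30
colors = [50, 30, 20]
n = 30
rng = np.random.default_rng(0)
draws = rng.multivariate_hypergeometric(colors, n, size=200_000)

N = sum(colors)
expected = [n * Ni / N for Ni in colors]     # E[X_i] = n N_i / N
assert np.allclose(draws.mean(axis=0), expected, atol=0.05)
assert (draws.sum(axis=1) == n).all()        # counts always sum to n
```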

Negative and Noncentral Variants

The negative hypergeometric distribution models the number of failures preceding a predetermined number of successes in sampling without replacement from a finite population of size N containing K successes, where sampling continues until r successes are obtained. Let Y denote the number of failures observed before the r-th success; then Y follows a negative hypergeometric distribution with parameters N, K, and r, where 0 < r \leq K and Y ranges from 0 to N - K. This distribution arises in scenarios such as quality inspections where defects (failures) are counted until a fixed number of acceptable items (successes) are found, or in gaming contexts like drawing cards until a specific number of a suit is reached. The probability mass function is given by \Pr(Y = k) = \frac{\dbinom{K}{r-1} \dbinom{N-K}{k}}{\dbinom{N}{k+r-1}} \cdot \frac{K - r + 1}{N - k - r + 1}, for k = 0, 1, \dots, N - K, reflecting the hypergeometric probability of r-1 successes in the first k + r - 1 draws multiplied by the conditional probability of a success on the next draw. The expected value is \mathbb{E}[Y] = r \cdot \frac{N - K}{K + 1}, and the variance is \mathrm{Var}(Y) = r \cdot \frac{(N - K)(N + 1)(K + 1 - r)}{(K + 1)^2 (K + 2)}. Noncentral hypergeometric distributions extend the standard (central) hypergeometric by incorporating bias through an odds parameter \omega, which modifies selection probabilities to reflect differential attractiveness or weights of subgroups; when \omega = 1, the distribution reduces to the central case. Two principal variants exist, Fisher's noncentral hypergeometric distribution and Wallenius' noncentral hypergeometric distribution, differing in their modeling of bias.
Fisher's variant models the conditional distribution of independent but biased binomial counts given their fixed sum, yielding the probability mass function \Pr(X = k) = \frac{\dbinom{K}{k} \dbinom{N-K}{n-k} \omega^k}{\sum_{j} \dbinom{K}{j} \dbinom{N-K}{n-j} \omega^j}, for k in the feasible range, where N is the population size, K the subgroup size, n the number of draws, and \omega the odds ratio favoring the first subgroup; it applies to conditional sampling models, such as tests of genetic association under biased ascertainment. Wallenius' variant, in contrast, describes sequential sampling without replacement where draw probabilities are proportional to current subgroup weights (e.g., \omega times size for the first subgroup), leading to a more complex normalizing structure involving an integral representation: \Pr(X = k) = \dbinom{K}{k} \dbinom{N-K}{n-k} \int_0^1 \left(1 - t^{\omega/D}\right)^{k} \left(1 - t^{1/D}\right)^{n-k} \, dt, with D = \omega(K - k) + (N - K - n + k); this captures competition effects in urn models with unequal weights, as in ecological sampling or biased urn experiments. Both variants lack closed-form moments in general, requiring numerical computation, and are implemented in statistical software for exact or approximate inference.
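Fisher's noncentral PMF is simple to evaluate directly from its definition, since the normalizing sum is finite; this sketch also confirms that ω = 1 recovers the central distribution.

```python
from math import comb

def fisher_nc_pmf(k, N, K, n, omega):
    """Fisher's noncentral hypergeometric PMF; omega = 1 is the central case."""
    lo, hi = max(0, n + K - N), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    w = {j: comb(K, j) * comb(N - K, n - j) * omega**j
         for j in range(lo, hi + 1)}
    return w[k] / sum(w.values())

# omega = 1 recovers the central hypergeometric probability
central = comb(4, 1) * comb(6, 2) / comb(10, 3)
assert abs(fisher_nc_pmf(1, 10, 4, 3, 1.0) - central) < 1e-12
# omega > 1 shifts probability mass toward more successes
assert fisher_nc_pmf(3, 10, 4, 3, 5.0) > fisher_nc_pmf(3, 10, 4, 3, 1.0)
```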

Applications

Quality Control and Industrial Sampling

The hypergeometric distribution is applied in quality control to model the exact probability of observing a specific number of defective items in a sample drawn without replacement from a finite production lot, ensuring precise assessment when the sample size is non-negligible relative to the lot size. In this context, the population size N represents the total lot size, K denotes the number of defectives in the lot, n is the sample size, and k is the number of defectives observed in the sample; the probability mass function P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} computes the likelihood for k ranging from \max(0, n + K - N) to \min(n, K). This approach contrasts with binomial approximations, which assume independence via replacement or infinite populations; the hypergeometric provides superior accuracy for finite lots by accounting for dependency as items are depleted. In industrial acceptance sampling, sampling plans specify lot size N, sample size n, and acceptance number c, where the lot is accepted if the sample yields at most c defectives; the hypergeometric distribution calculates the operating characteristic (OC) curve, plotting acceptance probability against the true defective proportion K/N. For instance, standards like ANSI/ASQ Z1.4 incorporate hypergeometric computations for OC curves in attribute sampling when lot sizes are specified, enabling manufacturers to evaluate producer's and consumer's risks: accepting defective lots (Type II error) or rejecting good ones (Type I error). This method optimizes costs by balancing sample size against discrimination power; for a lot of 500 units with sample n=80 and c=3, the hypergeometric yields exact probabilities that deviate from binomial estimates by up to 5-10% when n/N > 0.1. Applications extend to defect analysis in manufacturing, where hypergeometric-based p-charts with dynamic limits monitor fraction defectives over multiple lots, adapting for finite sampling without replacement to reduce false alarms.
Empirical studies confirm its utility: in a simulated production run of N=1000 items with K=50 defectives, sampling n=100 yields an expected defective count of 5, with P(X \leq 8) \approx 0.95 under the hypergeometric model, informing lot disposition and process adjustments to minimize variability. While the binomial approximation suffices for large N, the hypergeometric's exactness prevents over- or under-estimation in high-value sectors, where misacceptance costs can exceed $10,000 per lot.
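An OC curve for the N=500, n=80, c=3 plan discussed above reduces to hypergeometric CDF evaluations; the defective counts scanned here are illustrative.

```python
from scipy.stats import hypergeom

def oc_curve(N, n, c, defect_counts):
    """Acceptance probability P(X <= c) for each hypothetical defect count K."""
    return {K: hypergeom(N, K, n).cdf(c) for K in defect_counts}

oc = oc_curve(N=500, n=80, c=3, defect_counts=[5, 10, 25, 50])
assert oc[5] > 0.95     # a 1%-defective lot is almost always accepted
assert oc[50] < 0.05    # a 10%-defective lot is almost always rejected
```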

Genetics and Bioinformatics

In bioinformatics, the hypergeometric distribution is employed in gene set enrichment analysis to determine whether a predefined biological category, such as a Gene Ontology term or pathway, is statistically overrepresented in a list of genes identified through experiments like sequencing or genome-wide association studies (GWAS). Here, the total number of genes in the reference set (e.g., the annotated genome) serves as the population size N, the number of genes annotated to the category is K, the number of genes in the experimental list (e.g., differentially expressed genes) is n, and the observed overlap is k. The one-sided p-value, calculated as the sum of hypergeometric probabilities for k or greater, tests the null hypothesis of no enrichment beyond random expectation. This method assumes independence under the null and is computationally efficient for large N, though it can be conservative for highly overlapping sets. The hypergeometric test equates to the one-tailed Fisher's exact test in the context of 2x2 tables, which compares observed overlaps against hypergeometric expectations. In statistical genetics, this application extends to assessing allelic associations in case-control studies, where rows represent case-control status and columns represent genotypes or alleles, enabling exact inference without relying on large-sample approximations like the chi-squared test, particularly useful for rare variants or small cohorts. Extensions, such as Bayesian variants, incorporate prior weights on genes to address biases from gene length or expression levels, improving accuracy in weighted enrichment analyses. In population genetics, the distribution models finite-population sampling of alleles or genotypes without replacement, as in Wright-Fisher models adapted for small, closed populations where genetic drift depletes allele frequencies non-binomially. For instance, it quantifies the probability of observing a specific number of success alleles in gamete pools drawn from diploid individuals, informing exact tests for Hardy-Weinberg deviations in structured populations.
Such uses highlight its role in causal inference for inheritance patterns under resource constraints, though approximations like the binomial suffice for large N.
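As a minimal sketch of the enrichment calculation described above (the genome size, pathway size, and overlap counts are illustrative assumptions, not taken from any study), the one-sided p-value is the upper tail of the hypergeometric PMF:

```python
# Minimal sketch of a one-sided hypergeometric enrichment test.
# All parameter values below are illustrative assumptions.
from math import comb

def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of
    seeing k or more annotated genes in a list of n genes drawn
    from a genome of N genes, K of which carry the annotation."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total

# Example: 20,000-gene genome, 200-gene pathway, 100 differentially
# expressed genes, 8 of which fall in the pathway (expected about 1).
p = enrichment_pvalue(N=20_000, K=200, n=100, k=8)
print(f"one-sided enrichment p-value: {p:.3e}")
```

Because the upper tail is summed exactly, this matches the one-tailed Fisher's exact test for the corresponding 2×2 table; library routines such as `scipy.stats.hypergeom.sf(k - 1, N, K, n)` (SciPy's argument order: population size, success count, number of draws) compute the same quantity more efficiently for large parameters.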

Games, Gambling, and Elections

The hypergeometric distribution arises in card games where hands are dealt without replacement from a finite deck, modeling the probability of obtaining a specific number of cards with desired properties. For example, in five-card poker dealt from a standard 52-card deck containing 4 aces, the number of aces drawn follows a hypergeometric distribution with population size N = 52, number of success states K = 4, and sample size n = 5. Similarly, in bridge, the number of cards of a particular suit in a 13-card hand is hypergeometric with N = 52, K = 13, and n = 13. Deck-building games like Magic: The Gathering employ hypergeometric calculators to compute probabilities of drawing key cards, such as lands or specific spells, from shuffled decks without replacement, aiding deck-composition decisions in competitive play.

In gambling contexts involving sequential draws without replacement, the hypergeometric distribution quantifies outcomes in scenarios like casino games or card-based wagers. A variant might involve spinning a spinner to determine the number of cards flipped from a deck, with payoffs based on jokers or matches drawn, so the outcome directly follows hypergeometric probabilities. The negative hypergeometric distribution, a related variant, models the number of draws needed to reach a fixed number of successes, as analyzed in gaming applications where players anticipate hitting a payout, such as drawing a certain number of winning symbols from a finite pool. These models highlight the dependencies introduced by depletion of the draw pool, contrasting with binomial approximations valid only when the population is large relative to the sample size.

Election polling and auditing leverage the hypergeometric distribution to assess sample-based inferences from finite voter populations sampled without replacement. In pre-election surveys, the number of supporters of a candidate in a sample mirrors hypergeometric sampling, enabling exact probability calculations for vote shares when the population size N is known and comparable to the sample size n.
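The card-game figures discussed in this section can be checked by evaluating the PMF directly; a short sketch using only the standard-deck counts stated in the text:

```python
# Hypergeometric PMF applied to the card-game examples in the text.
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """P(X = k): k successes in n draws without replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Exactly one ace in a five-card poker hand (N=52, K=4, n=5):
p_one_ace = hypergeom_pmf(52, 4, 5, 1)

# Exactly five spades in a 13-card bridge hand (N=52, K=13, n=13):
p_five_spades = hypergeom_pmf(52, 13, 13, 5)

print(f"P(exactly 1 ace in 5 cards)     = {p_one_ace:.4f}")
print(f"P(exactly 5 spades in 13 cards) = {p_five_spades:.4f}")
```

Exactly one ace turns up in just under 30% of five-card hands, and a five-card spade holding occurs in roughly one bridge hand in eight, illustrating how quickly these probabilities can be read off the PMF.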
Post-election audits, such as risk-limiting audits (RLAs) that verify reported tallies against paper ballots, use hypergeometric models to determine the probability that a sample confirms the reported outcome, with parameters reflecting the total ballots N, the reported votes for the apparent winner K, and the audited ballots n. For instance, in a 200,000-vote audit, hypergeometric distributions compute confidence intervals for vote discrepancies, ensuring statistical rigor in the audit. This approach accounts for finite-population effects, providing tighter bounds than binomial methods in close races.
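As an illustrative sketch of this kind of audit arithmetic (the 200,000-ballot total echoes the example above, while the 500 miscounted ballots and 5% risk limit are assumed purely for illustration), the k = 0 hypergeometric term gives the chance that a sample misses every miscounted ballot:

```python
# Sketch: probability that an audit sample drawn without replacement
# contains none of the miscounted ballots -- the k = 0 hypergeometric
# term, C(N - M, n) / C(N, n), computed as a running product to avoid
# huge binomial coefficients. M and the risk limit are assumed values.

def prob_miss_all(N: int, M: int, n: int) -> float:
    """P(no miscounted ballots among n drawn from N containing M)."""
    p = 1.0
    for i in range(n):
        p *= (N - M - i) / (N - i)
    return p

N, M, risk = 200_000, 500, 0.05
n = 1
while prob_miss_all(N, M, n) > risk:
    n += 1
print(f"ballots to audit so the miss probability <= {risk:.0%}: {n}")
```

A with-replacement (binomial) model of the same question would require slightly more draws than the hypergeometric answer, a small-scale instance of the finite-population correction tightening the bound.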

References

  1. [1]
    Hypergeometric Distribution - MATLAB & Simulink
    The hypergeometric distribution models the total number of successes in a fixed-size sample drawn without replacement from a finite population.
  2. [2]
    7.4 - Hypergeometric Distribution | STAT 414 - STAT ONLINE
    The probability mass function of the discrete random variable is called the hypergeometric distribution and is of the form:
  3. [3]
    Hypergeometric distribution - Minitab - Support
    The hypergeometric distribution is a discrete distribution that models the number of events in a fixed sample size when you know the total number of items ...
  4. [4]
    Hypergeometric Distribution: Uses, Calculator & Formula
    The hypergeometric distribution is a discrete probability distribution that calculates the likelihood an event happens k times in n trials.
  5. [5]
    [PDF] HYPCDF
    Mar 20, 1997 · DESCRIPTION. The hypergeometric distribution is the probability of selecting LL marked items when a random sample of size KK is taken ...
  6. [6]
    HypergeometricDistribution - Wolfram Language Documentation
    A hypergeometric distribution gives the distribution of the number of successes in n draws from a population of size ntot containing nsucc successes.
  7. [7]
    [PDF] lecture 11 - Washington
    GEOMETRIC DISTRIBUTION. poIsson DISTRIBUTION ... HYPERGEOMETRIC DISTRIBUTION. You have an urn with ... Deriving the PMF. Deriving the Variance. Deriving the PMF.
  8. [8]
    Lesson 12 Hypergeometric Distribution | Introduction to Probability
    There is another formula for the hypergeometric p.m.f. that looks different but is equivalent to (12.1). It is based on counting the number of ordered outcomes ...
  9. [9]
    [PDF] Hypergeometric distribution
    The hypergeometric distribution is used for sam- pling without replacement from a finite population of items. More specifically, a hypergeometric random ...
  10. [10]
    [PDF] MIT - Handout on hypergeometric distribution (D&S 5.3)
    Remember, fX (x|n,N,m) is a PMF, that is, it represents the probability of getting x, given n,N,m. n and m are data, so we are choosing the value of N that.
  11. [11]
    [PDF] The Hypergeometric Distribution
    The Hypergeometric Distribution. Math 394. We detail a few features of the Hypergeometric distribution that are discussed in the book by Ross. 1 Moments. Let P ...Missing: textbook | Show results with:textbook
  12. [12]
    Hypergeometric Distribution -- from Wolfram MathWorld
    (41). The skewness is. gamma_1, = (q-p)/(sqrt(npq))sqrt((N-1. (42). = ((m-n)(m+n-2N))/(m+. (43). and the kurtosis excess is given by a complicated expression.
  13. [13]
    [PDF] Properties of Hypergeometric Distribution
    The Hypergeometric distribution is given by. Pr = w r n - w m - r n m . Firstly, we will show that min(w,m). X r=0. Pr = 1. Using Vandermonde's identity we find.<|control11|><|separator|>
  14. [14]
    Hypergeometric distribution symmetry
    May 23, 2023 · Symmetry in the hypergeometric distribution, swapping the role of the sample and the population of successes.
  15. [15]
  16. [16]
    Approximations to the Hypergeometric distribution - ModelAssist
    Dec 28, 2023 · The Hypergeometric distribution can be approximated by a Binomial as follows: Hypergeometric(D/M, n, M) » Binomial(D/M, n)
  17. [17]
    Binomial and Hypergeometric Distribution - Freie Universität Berlin
    A hyper-geometric distribution can usually be approximated by a binomial distribution. The reason is that, if the sample size does not exceed 5% of the ...
  18. [18]
    probability - Proof that the hypergeometric distribution with large $N ...
    Mar 14, 2013 · Now taking the large N limit for fixed r/N, n and x we get the binomial pmf, since limN→∞(r−x+k)(N−x+k)=limN→∞rN=p. and limN→∞ ...Convergence of Hypergeometric Distribution to Binomialapproximation hypergeometric distribution with binomialMore results from math.stackexchange.com
  19. [19]
    The Binomial Approximation to the Hypergeometric
    As a rule of thumb, if the population size is more than 20 times the sample size (N > 20 n), then we may use binomial probabilities in place of hypergeometric ...
  20. [20]
    Binomial Approximation to Hypergeometric Probability
    Aug 1, 2018 · A common rule of thumb for usefulness of the binomial approximation is to have n/N<0.1.) Let's look at specific numbers to see how this plays ...
  21. [21]
    [PDF] The difference between the hypergeometric and the binomial ...
    whether the rule of thumb n. N. < 0.1 (see e.g. Johnson et al. (1992), p. 257) is justified or not. We have calculated the ratio. N − 1 n − 1. kQ(n;K, N) − P ...<|separator|>
  22. [22]
    [PDF] Treibergs Approximating the Hypergeometric by Binomial Distribution
    Jun 12, 2011 · In an experiment where each trial results in S or F, but the sampling is without replace- ment from a population of size N, if the sample ...<|separator|>
  23. [23]
    [PDF] The normal approximation to the hypergeometric distribution - Chance
    Feller's conditions seem too stringent for applications and are difficult to prove. It is the purpose of this note to re-formulate and prove a suitable ...
  24. [24]
    Normal approximation to the hypergeometric distribution in ...
    In this paper, we derive a non-uniform Berry–Esseen theorem on Normal approximation to the Hypergeometric distribution for a wide range of values of p and f.
  25. [25]
    [PDF] approximations to the poisson, binomial and hvpergeometric ...
    Approximations to well known discrete distribution functions are the subject of a large number of publications, but little has been done to.
  26. [26]
    [PDF] NORMAL APPROXIMATION TO THE HYPERGEOMETRIC ...
    The aim of this paper is to directly derive the normal distribution as an approximation to the hypergeometric distribution; obtain the conditions under which ...
  27. [27]
    Computing hypergeometric probability efficiently in C++
    Oct 28, 2016 · Computing hypergeometric function is a slow and difficult process, often affected by overflow errors as evaluating binomial coefficient may ...Missing: numerical | Show results with:numerical
  28. [28]
    Calculating Hypergeometric Distribution probability in PL/SQL
    Aug 26, 2022 · There is a reasonable expectation of overflow for large values and letting it happen may not be an undesirable result. If this is the case for ...
  29. [29]
    Calculating hypergeometric probability with approximations
    Jul 10, 2014 · Your problem is overflow in fastChoose. With such large numbers the value in the denominator of the calculations in myDHyper is Inf so the ...How to compute hypergeometric distribution probabilities for ...Which probability result is greater? Using hypergeometric ...More results from stats.stackexchange.comMissing: numerical | Show results with:numerical
  30. [30]
    Hypergeometric Distribution - 1.43.0
    Calculation of the product of the primes requires some care to prevent numerical overflow, we use a novel recursive method which splits the calculation into ...
  31. [31]
    New upper bounds for tight and fast approximation of Fisher's exact ...
    The new upper bounds are fast to calculate and approximate Fisher's p -value accurately. In addition, the new approximations are not sensitive to the data size, ...
  32. [32]
    What is the algorithm OR mathematics for Fisher's Exact Test?
    Jun 16, 2011 · This is an old question, but IIRC the test has huge time complexity. It turns out a probabilistic/monte carlo approach is quite accurate and ...
  33. [33]
    Numerical Approximation of Hypergeometric For Maximum ...
    Feb 15, 2022 · Find a transformation of the effective likelihood that avoids overflow. Instead of finding the maximum of the effective likelihood numerically, ...Compute big number of Binomial in Hypergeometric DistributionWhen is the hypergeometric function ever needed if it can be ...More results from math.stackexchange.comMissing: computing probabilities
  34. [34]
    [PDF] 3.2 Hypergeometric Distribution 3.5, 3.9 Mean and Variance
    Repeat n times. On each draw, the probability of green is 700/1000. The # green balls drawn has a binomial distribution, p =.
  35. [35]
    [PDF] 3.5 Hypergeometric and Negative Binomial Distributions Distributions
    The Hypergeometric Distribution​​ Example 1. A lot consists of N =10 articles of which M = 6 are good (S) and N – M = 4 are defective (F). n=5 articles are ...
  36. [36]
    Hypergeometric Distribution: Examples and Formula
    The hypergeometric distribution is very similar to the binomial distribution. In fact, the binomial distribution is a good approximation of the hypergeometric ...
  37. [37]
    Hypergeometric Distribution - Quality Gurus
    The Hypergeometric Distribution is defined by three parameters: the population size (N), the number of successes in the population (K), and the sample size (n).
  38. [38]
    Hypergeometric Distribution model - GeeksforGeeks
    Oct 25, 2024 · The hypergeometric distribution models the probability of obtaining a specific number of successes in a given number of draws from a finite population.
  39. [39]
    ML Interview Q Series: Election Flip Probability: Hypergeometric ...
    ML Interview Q Series: Election Flip Probability: Hypergeometric Analysis of Randomly Removed Illegal Votes. May 08, 2025. Browse all the Probability ...
  40. [40]
    [PDF] Percentage-based vs. SAFE Vote Tabulation Auditing
    Nov 2, 2007 · In this paper, we address just one component of electoral audits: specifying how many randomly selected precincts should undergo hand-counted ...
  41. [41]
    application of the hypergeometric model in electoral disputes ...
    Aug 6, 2025 · presidential elections had its own share of irregularities. The 2008 Presidential Election took place on the 7th of December and included both ...<|control11|><|separator|>
  42. [42]
    2. The Hypergeometric Distribution - Random Services
    In the ball and urn experiment, vary the parameters and switch between sampling without replacement and sampling with replacement. Note the difference between ...
  43. [43]
    [PDF] A Note About Maximum Likelihood Estimator in Hypergeometric ...
    In this paper, rigorous procedures in order to find the maximum likelihood estimator of N and R in a hypergeometric distribution are presented. Key words: ...
  44. [44]
    [PDF] Topic 15: Maximum Likelihood Estimation - Arizona Math
    The likelihood function for N is the hypergeometric distribution. ... Note that the maximum likelihood estimator for the total fish population is ˆN = 1904.
  45. [45]
    [PDF] Optimal and fast confidence intervals for hypergeometric successes
    Apr 28, 2022 · confidence interval. A confidence set C is symmetrical if. C(x) = N ... The tail of the hypergeometric distribution. Discrete.
  46. [46]
    [PDF] A new approach to precise interval estimation for the parameters of ...
    Mar 16, 2020 · the discreteness of the hypergeometric distribution ... the nesting property if for each x, every confidence interval contains any confidence ...
  47. [47]
    Chi-squared test and Fisher's exact test - PMC - NIH
    Mar 30, 2017 · Fisher's exact test assesses the null hypothesis of independence applying hypergeometric distribution of the numbers in the cells of the table.
  48. [48]
    [PDF] Fisher's exact test
    Let x be the number of white balls in the sample. x follows a hypergeometric distribution (w/ parameters K, N, n). Suppose X ∼ Hypergeometric (N, K, n). No.Missing: pmf | Show results with:pmf
  49. [49]
    4.5 - Fisher's Exact Test | STAT 504
    Fisher's Exact Test is an exact test used when row and column totals are fixed, or with small samples, and no large-sample approximations are used.Missing: complexity | Show results with:complexity
  50. [50]
    3. The Multivariate Hypergeometric Distribution - Random Services
    We will compute the mean, variance, covariance, and correlation of the counting variables. Results from the hypergeometric distribution and the ...
  51. [51]
    Better understanding of the multivariate hypergeometric distribution ...
    Jan 3, 2021 · We present a more transparent understanding for the covariance structure of the multivariate hypergeometric distribution.
  52. [52]
    Central and noncentral moments of the multivariate hypergeometric ...
    Apr 14, 2024 · In this short note, explicit formulas are developed for the central and noncentral moments of the multivariate hypergeometric distribution.
  53. [53]
    [PDF] B671-672 Supplemental Notes 2 Hypergeometric, Binomial ... - rafalab
    1 Binomial Approximation to the Hypergeometric. Recall that the ... binomial approximation to the hypergeometric yields. P(x1,x2,...,xk) = n! Qki ...
  54. [54]
    [PDF] A Gaming Application of the Negative Hypergeometric Distribution
    May 1, 2013 · The Negative Hypergeometric distribution represents waiting times when drawing from a finite sample without replacement.
  55. [55]
    A Gaming Application of the Negative Hypergeometric Distribution
    The Negative Hypergeometric distribution represents waiting times when drawing from a finite sample without replacement. It is analogous to the negative ...
  56. [56]
    [PDF] Negative hypergeometric distribution - Mathematics
    The cumulative distribution, survivor function, hazard function, cumulative hazard function, and inverse distribution function, moment generating function, and ...
  57. [57]
    [PDF] Noncentral hypergeometric distribution - Agner Fog
    These two distributions will be called Wallenius' and Fisher's noncentral hypergeometric distribution, respectively. An accompanying paper describes the ...
  58. [58]
    [PDF] Biased Urn Theory
    This scenario will give a distribution of the types of fish caught that is equal to Wallenius' noncentral hypergeometric distribution. You are catching fish as ...<|control11|><|separator|>
  59. [59]
    Sampling Methods for Wallenius' and Fisher's Noncentral ...
    Several methods for generating variates with univariate and multivariate Walleniu' and Fisher's noncentral hypergeometric distributions are developed.
  60. [60]
    [PDF] Hypergeometric Distribution in Quality Sampling: Analyzing Defects ...
    For example, Kume (2012) proposed combining hypergeometric probability calculations with cost–benefit analyses to determine optimal sample sizes for high- cost ...
  61. [61]
    (PDF) Research on the Comparison of Sampling with and Without ...
    Aug 9, 2025 · These two sampling methods lead to binomial and hypergeometric distributions of probabilities, respectively. The differences between them are ...
  62. [62]
    Acceptance Sampling Plans Based on the Hypergeometric Distribution
    Feb 21, 2018 · A FORTRAN program is presented which deals with acceptance sampling plans in the case of sampling from the hypergeometric distribution (finite ...
  63. [63]
    OC Curve with Hypergeometric Method - Accendo Reliability
    The hypergeometric distribution is described here. In this example we have 500 units per lot and using the ANSI/ASQ Z1.4-2008 Sampling Procedures and Tables for ...
  64. [64]
    Attribute Acceptance Sampling Plans - Duke Mathematics Department
    A spreadsheet in which the Hypergeometric distribution is used to calculate the probability of acceptance for any plan (N, n, c), whereas the Binomial ...
  65. [65]
    Hypergeometric p-chart with dynamic probability control limits for ...
    We present in this paper the Hypergeometric p-chart with probability control limits, where fraction data is assumed to follow the Hypergeometric distribution.
  66. [66]
    Hypergeometric Distribution in Quality Sampling: Analyzing Defects ...
    This study investigates the application of the hypergeometric distribution in quality sampling for finite production lots. A mathematical model is developed ...
  67. [67]
    Applications of Generalized Hypergeometric Distribution on ... - MDPI
    The hypergeometric distribution is frequently used in quality control settings, such as estimating the likelihood that a sample taken from a production ...
  68. [68]
    A Bayesian Extension of the Hypergeometric Test for Functional ...
    The hypergeometric P-value has been widely used to investigate whether genes from pre-defined functional terms, e.g., Gene Ontology (GO), are enriched in the DE ...
  69. [69]
    Hypergeometric Testing Used for Gene Set Enrichment Analysis
    In this chapter we show how to carry out Hypergeometric tests to identify potentially interesting gene sets.
  70. [70]
    Enrichment or depletion of a GO category within a class of genes
    We clarify the relationships existing between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test. We recall that ...
  71. [71]
    6.3 The Hypergeometric Distribution - Probability For Data Science
    P(Na=k)=(4k)( ... That's a very small P-value, which implies that the data support the alternative hypothesis more than they support the null.Missing: range | Show results with:range
  72. [72]
    Analyze Card Games with Hypergeometric Distribution - Wolfram
    Use HypergeometricDistribution[5, 13, 52] for five-card poker and HypergeometricDistribution[13, 13, 52] for bridge.
  73. [73]
    Your Friend the Hypergeometric Distribution - GameTek
    Mar 4, 2023 · The Hypergeometric Distribution covers situations where the thing you draw doesn't get replaced in between pulls.
  74. [74]
    HyperGeometric probability casino game | Free Math Help Forum
    Nov 8, 2019 · So for example if they land the spinner on 3 they get to flip over 3 cards 1 at a time without replacement (hypergeometric). If they get a joker ...
  75. [75]
    [PDF] Random Auditing of E-Voting Systems: How Much is Enough?
    Aug 9, 2006 · Abstract – E-voting systems with a means of independent verification, such as a voter-verified paper record or ballot, can be audited to ...
  76. [76]
    [PDF] Attached find: 1. Requested calculations of statistical confidence of ...
    our election. To do this, we used the hypergeometric distribution. For example, assume 200,000 votes were cast in an election and the VS told us that a ...