
Poisson binomial distribution

The Poisson binomial distribution is a discrete probability distribution that arises as the sum of a fixed number of independent Bernoulli random variables, each with its own, possibly different, success probability p_i (where 0 \leq p_i \leq 1), generalizing the classical binomial distribution in which all probabilities are equal. Named after the French mathematician Siméon Denis Poisson, who introduced it in 1837 as a heterogeneous extension of the binomial model for analyzing jury decisions and moral probabilities, the distribution has since been studied for its combinatorial and probabilistic properties, with modern surveys highlighting its connections to polynomial geometry and approximation theory. The probability mass function for a Poisson binomial random variable X = \sum_{i=1}^n Y_i, where each Y_i \sim \text{Bernoulli}(p_i), is given by
P(X = k) = \sum_{\substack{A \subseteq \{1, \dots, n\} \\ |A| = k}} \prod_{i \in A} p_i \prod_{i \notin A} (1 - p_i),
for k = 0, 1, \dots, n, summing over all possible subsets of k successful trials. Its mean is \mu = \sum_{i=1}^n p_i and its variance is \sigma^2 = \sum_{i=1}^n p_i(1 - p_i), both simple sums over the heterogeneous probabilities. The distribution exhibits the strong Rayleigh property, meaning its probability generating function has only real, non-positive roots, which aids in bounding tail probabilities and in stochastic comparisons.
For large n, it can be approximated by a normal distribution with the above mean and variance, or by a Poisson distribution under rare-event conditions (small p_i), with error bounds established in classical results such as Le Cam's theorem. Computation of the exact probability mass function is nontrivial due to the exponential number of terms, but can be handled efficiently via discrete Fourier transforms of the characteristic function. Applications span reliability engineering (e.g., system failure probabilities with heterogeneous components), survey statistics (e.g., nonresponse adjustments), actuarial science (e.g., insurance claim aggregates), finance (e.g., default risk portfolios), and machine learning (e.g., distribution learning from heterogeneous data).

Introduction

Overview

The Poisson binomial distribution describes the probability distribution of the number of successes in a sequence of n independent Bernoulli trials, where each trial i has its own success probability p_i \in [0,1]. This contrasts with the standard binomial distribution, which assumes identical success probabilities across all trials. Formally, if X = \sum_{i=1}^n \xi_i where each \xi_i is a Bernoulli random variable with parameter p_i, then X follows a Poisson binomial distribution, denoted X \sim \text{PB}(p_1, \dots, p_n). This distribution arises naturally in contexts involving heterogeneous probabilities, such as reliability analysis of systems and processes whose components have varying failure rates. It also appears in voting models, where individual voter preferences can be modeled as binary choices with differing probabilities of supporting a given candidate, aiding in the prediction of outcomes from aggregated data. It likewise supports sensitivity analyses in observational studies by accounting for varying treatment effects across units. An intuitive example is the outcome of flipping n biased coins, each with a potentially different bias p_i representing the probability of landing heads; the Poisson binomial distribution then gives the probability of observing exactly k heads. This setup highlights the distribution's flexibility in modeling real-world scenarios where uniformity in probabilities is unrealistic, providing a more accurate representation than the binomial distribution in such cases.

Historical background

The concept of the Poisson binomial distribution originated with Siméon Denis Poisson's work on probability in judicial contexts, where he modeled the aggregate outcome of independent trials with heterogeneous success probabilities, laying the groundwork for distributions beyond the homogeneous case. This early formulation addressed error laws in probabilistic judgments, marking the first recognition of sums of non-identical random variables. The terminology "Poisson binomial distribution" emerged to honor Poisson's pioneering contribution while distinguishing it from the standard binomial distribution involving identical trials; it became standardized in statistical literature by the mid-20th century. Key early developments included Wassily Hoeffding's 1956 inequalities, which quantified differences in spread between the Poisson binomial and binomial distributions. Lucien Le Cam advanced the field in 1960 with the first error bounds for Poisson approximations to the Poisson binomial distribution, facilitating practical use in limit theorems. By the 1970s, the distribution gained recognition in reliability engineering for analyzing systems with components having distinct failure rates. Further progress in the late 1970s included H. Chen's 1978 bounds on normal approximations using early Stein's-method techniques, enhancing convergence analysis. Post-2000 computational advances have enabled efficient exact evaluations, such as recursive algorithms by Chen and Liu (1997, extended in later works) and Fourier-based closed-form expressions by Fernández and Williams (2010), supporting applications in high-dimensional settings. These methods, alongside learning algorithms like those by Daskalakis et al. (2015), reflect the distribution's evolution into a computationally tractable tool.

Mathematical formulation

Probability mass function

The Poisson binomial distribution arises as the distribution of the sum X = \sum_{i=1}^n \xi_i, where the \xi_i are independent Bernoulli random variables with success probabilities p_i \in (0,1) for i = 1, \dots, n. The probability mass function (PMF) is given by P(X = k) = \sum_{\substack{A \subseteq \{1,\dots,n\} \\ |A|=k}} \left[ \prod_{i \in A} p_i \prod_{j \notin A} (1 - p_j) \right], for integers k = 0, 1, \dots, n, with support on the discrete set \{0, 1, \dots, n\}. This expression enumerates all subsets of size k and weights each by the product of the corresponding probabilities. In particular, P(X=0) = \prod_{i=1}^n (1 - p_i). When all p_i = p for some fixed p \in (0,1), the PMF simplifies to P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, recovering the standard binomial distribution. The PMF is unimodal, possessing either a single mode or two consecutive modes, regardless of the specific values of the p_i. Direct computation of the PMF via the summation requires O(2^n) operations in the worst case, due to the exponential number of terms.
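
For very small n, the defining sum can be evaluated directly by enumerating subsets. The following sketch assumes plain Python; the function name and example probabilities are illustrative, not taken from any particular library.

```python
from itertools import combinations

def pmf_bruteforce(k, p):
    """P(X = k) by enumerating all size-k subsets of successes; O(2^n), so small n only."""
    n = len(p)
    total = 0.0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        term = 1.0
        for i in range(n):
            term *= p[i] if i in chosen else (1.0 - p[i])
        total += term
    return total

p = [0.1, 0.4, 0.7]                      # illustrative heterogeneous probabilities
pmf = [pmf_bruteforce(k, p) for k in range(len(p) + 1)]
print(pmf, sum(pmf))                     # the probabilities sum to 1
```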

Characteristic function

The characteristic function of a random variable X following the Poisson binomial distribution with success probabilities p_1, p_2, \dots, p_n is defined as \phi_X(t) = \mathbb{E}[e^{itX}] for real t, and takes the explicit form \phi_X(t) = \prod_{i=1}^n \left(1 - p_i + p_i e^{it}\right). This expression follows directly from the independence of the underlying random variables X_i \sim \mathrm{Bernoulli}(p_i), as the characteristic function of a sum of independent random variables is the product of their individual characteristic functions, where each \phi_{X_i}(t) = 1 - p_i + p_i e^{it}. The logarithm of the characteristic function provides a convenient additive structure: \log \phi_X(t) = \sum_{i=1}^n \log\left(1 - p_i + p_i e^{it}\right). This logarithmic form facilitates derivations in approximation theory, as the cumulants of X are obtained from the derivatives of \log \phi_X(t) at t = 0. A primary utility of \phi_X(t) lies in its role in inverting to the probability mass function via the discrete Fourier transform or related methods, enabling efficient numerical computation of probabilities and of the cumulative distribution function through fast transform algorithms. Additionally, since the moments of X are given by \mathbb{E}[X^k] = \frac{1}{i^k} \phi_X^{(k)}(0), where \phi_X^{(k)} denotes the k-th derivative, the characteristic function allows exact recovery of all moments by evaluating these derivatives at t = 0.
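
As a hedged illustration of the inversion idea (not the reference implementation of any package), the characteristic function can be evaluated on an (n+1)-point grid and inverted with a discrete Fourier transform; the helper name pmf_via_cf and the example probabilities are assumptions of this sketch.

```python
import numpy as np

def pmf_via_cf(p):
    """Invert the characteristic function on an (n+1)-point DFT grid to get the exact PMF."""
    p = np.asarray(p, dtype=float)
    m = len(p) + 1
    omega = np.exp(2j * np.pi * np.arange(m) / m)                  # e^{i t_l} on the grid
    phi = np.prod(1.0 - p[:, None] + p[:, None] * omega[None, :], axis=0)
    pmf = np.real(np.fft.fft(phi)) / m                             # inversion; imaginary part is rounding noise
    return np.clip(pmf, 0.0, 1.0)

print(pmf_via_cf([0.1, 0.4, 0.7, 0.25]))
```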

Moments and central properties

Mean and variance

The Poisson binomial distribution arises as the distribution of X = \sum_{i=1}^n \xi_i, where the \xi_i are independent Bernoulli random variables with success probabilities p_i \in [0,1]. The expected value of X is given by \mathbb{E}[X] = \sum_{i=1}^n p_i, which follows directly from the linearity of expectation and the fact that \mathbb{E}[\xi_i] = p_i for each i. This represents the total expected number of successes across the n heterogeneous trials. Due to the independence of the \xi_i, the variance of X is the sum of the individual variances, with no covariance terms: \text{Var}(X) = \sum_{i=1}^n p_i (1 - p_i). Each term p_i (1 - p_i) is the variance of the corresponding Bernoulli random variable \xi_i. This additive form highlights the distribution's structure as a sum of independent but non-identical components. For a fixed mean \mu = \sum_{i=1}^n p_i, the variance \text{Var}(X) = \mu - \sum_{i=1}^n p_i^2 is maximized when all p_i are equal to \bar{p} = \mu / n, reducing the Poisson binomial to the binomial distribution with variance \mu (1 - \mu / n). In this homogeneous case, the variance achieves its upper bound; when the p_i vary, \sum p_i^2 > \mu^2 / n (by the QM-AM inequality), yielding a strictly smaller variance that reflects reduced overall uncertainty compared to the equal-probability scenario.
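
A short numerical check of the fixed-mean comparison above, using assumed example probabilities: the homogeneous vector attains the largest variance for a given mean, and a heterogeneous vector with the same mean has a strictly smaller one.

```python
import numpy as np

def pb_variance(p):
    """Variance of the Poisson binomial: sum of the Bernoulli variances p_i (1 - p_i)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

n, mu = 10, 4.0
p_equal = np.full(n, mu / n)                                    # homogeneous case, mean 4.0
p_het = np.array([0.9] * 4 + [0.1] * 3 + [0.05] * 2 + [0.0])    # heterogeneous, also mean 4.0
print(p_equal.sum(), pb_variance(p_equal))                      # 4.0, 2.4 = mu (1 - mu/n)
print(p_het.sum(), pb_variance(p_het))                          # 4.0, 0.725 < 2.4
```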

Higher-order moments

The higher-order moments of the Poisson binomial distribution can be derived from its moment-generating function, which is the product over the individual Bernoulli components: M_X(t) = \prod_{i=1}^n (1 - p_i + p_i e^t). The k-th raw moment E[X^k] is then obtained as the k-th derivative of M_X(t) evaluated at t=0. Since X is a sum of independent Bernoulli random variables X_i, the moments can also be computed using expansions involving multinomial coefficients, though this becomes computationally intensive for large k and n. Recursive methods based on the product form of the moment-generating function allow for efficient calculation of these moments by building from lower-order derivatives. Cumulants provide a particularly useful representation for the Poisson binomial distribution because they add under summation of independent random variables. The cumulant-generating function is K_X(t) = \log M_X(t) = \sum_{i=1}^n \log(1 - p_i + p_i e^t), so the m-th cumulant \kappa_m is the sum of the m-th cumulants of the individual Bernoulli distributions: \kappa_m = \sum_{i=1}^n \kappa_m^{(i)}, where \kappa_m^{(i)} is the m-th cumulant of X_i \sim \text{Bernoulli}(p_i). This additivity simplifies computations, as each \kappa_m^{(i)} can be found by differentiating \log(1 - p_i + p_i e^t) at t=0. For example, the first two cumulants are the mean \kappa_1 = \sum p_i and the variance \kappa_2 = \sum p_i(1 - p_i), serving as base cases for higher orders. The third cumulant is \kappa_3 = \sum p_i (1 - p_i) (1 - 2p_i), leading to the skewness \gamma_1 = \kappa_3 / \kappa_2^{3/2} = \left[ \sum p_i (1 - p_i) (1 - 2p_i) \right] / \left[ \sum p_i (1 - p_i) \right]^{3/2}. This skewness is zero when p_i = 1/2 for all i, corresponding to the symmetric binomial case. The fourth cumulant is \kappa_4 = \sum p_i (1 - p_i) [1 - 6 p_i (1 - p_i)], yielding the excess kurtosis \gamma_2 = \kappa_4 / \kappa_2^2 = \left[ \sum p_i (1 - p_i) [1 - 6 p_i (1 - p_i)] \right] / \left[ \sum p_i (1 - p_i) \right]^2. These measures quantify deviations from normality, with the excess kurtosis indicating tail heaviness relative to a Gaussian. For the identical p_i = p case (reducing to the binomial distribution), these simplify to \gamma_1 = (1 - 2p) / \sqrt{np(1-p)} and \gamma_2 = [1 - 6p(1-p)] / [np(1-p)], confirming the summation structure.
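
Because the cumulants add over the independent Bernoulli terms, skewness and excess kurtosis can be computed directly from the formulas above; the function name and example probabilities below are illustrative.

```python
import numpy as np

def pb_cumulants(p):
    """First four cumulants of a Poisson binomial via additivity over independent Bernoullis."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    k1 = float(np.sum(p))                             # mean
    k2 = float(np.sum(p * q))                         # variance
    k3 = float(np.sum(p * q * (1.0 - 2.0 * p)))       # third cumulant
    k4 = float(np.sum(p * q * (1.0 - 6.0 * p * q)))   # fourth cumulant
    return k1, k2, k3, k4

k1, k2, k3, k4 = pb_cumulants([0.2, 0.5, 0.8, 0.3])
print("skewness:", k3 / k2 ** 1.5, "excess kurtosis:", k4 / k2 ** 2)
```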

Information-theoretic properties

Entropy

The Shannon entropy of a Poisson binomial random variable X = \sum_{i=1}^n B_i, where the B_i are independent Bernoulli random variables with success probabilities p_i, is defined as H(X) = -\sum_{k=0}^n P(X = k) \log P(X = k), with the probabilities P(X = k) obtained via convolution of the individual Bernoulli distributions. When all p_i = p, the distribution reduces to the binomial \operatorname{Bin}(n, p), and the entropy simplifies to that of the binomial distribution, which admits series expansions and integral representations for computation. In the general case with heterogeneous p_i, the independence of the B_i implies H(X) \leq \sum_{i=1}^n h(p_i), where h(p) = -p \log p - (1-p) \log(1-p) is the binary entropy function; this follows because X is a deterministic function of the joint vector (B_1, \dots, B_n), so its entropy cannot exceed the joint entropy \sum_{i=1}^n h(p_i). The entropy H(X) is maximized when all p_i = 1/2, corresponding to the symmetric \operatorname{Bin}(n, 1/2); in this case, H(X) measures the information content of n fair coin flips reduced by the many-to-one mapping into their sum. For a fixed mean \mu = \sum_{i=1}^n p_i, the entropy H(X) is upper bounded by that of the binomial \operatorname{Bin}(n, \mu/n), with equality when all p_i = \mu/n. An upper bound on the individual entropies gives \sum_{i=1}^n h(p_i) \leq n \log 2, with equality when each p_i = 1/2. The entropy is a concave function of the vector (p_1, \dots, p_n), a result resolving the long-standing Shepp-Olkin conjecture. Exact computation of H(X) is computationally intensive, as it requires the full PMF, which can be calculated recursively in O(n^2) time or via fast Fourier transform methods in O(n \log n) time, but remains challenging for large n. For practical purposes, approximations leverage the central limit theorem, under which X is approximately normal with mean \mu = \sum p_i and variance \sigma^2 = \sum p_i (1 - p_i), yielding H(X) \approx \frac{1}{2} \log (2 \pi e \sigma^2).
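
A minimal sketch of the exact-versus-approximate entropy comparison, assuming natural logarithms and a simple O(n^2) convolution to build the PMF; the helper names and example probabilities are not from any specific package.

```python
import numpy as np

def pb_pmf(p):
    """Exact PMF by repeated convolution of the Bernoulli PMFs (the O(n^2) recursion)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

def pb_entropy(p):
    """Shannon entropy (in nats) computed from the exact PMF."""
    pmf = pb_pmf(p)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log(pmf)))

p = [0.3, 0.5, 0.7, 0.2, 0.9]
sigma2 = sum(pi * (1.0 - pi) for pi in p)
print("exact:", pb_entropy(p), "Gaussian approximation:", 0.5 * np.log(2 * np.pi * np.e * sigma2))
```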

Mutual information aspects

The mutual information between the Poisson binomial random variable X = \sum_{i=1}^n Y_i and the vector \mathbf{Y} = (Y_1, \dots, Y_n) of independent Bernoulli random variables Y_i \sim \text{Bern}(p_i) is I(X; \mathbf{Y}) = H(X) - H(X \mid \mathbf{Y}). Since X is a deterministic function of \mathbf{Y}, the conditional entropy H(X \mid \mathbf{Y}) = 0, yielding I(X; \mathbf{Y}) = H(X). For a subset S \subseteq \{1, \dots, n\}, the mutual information I(X; \mathbf{Y}_S) quantifies the dependence between X and the subvector \mathbf{Y}_S = (Y_i)_{i \in S}. By the chain rule and independence of the Bernoulli components, H(X \mid \mathbf{Y}_S) = H\left( \sum_{i \notin S} Y_i \right), the entropy of the Poisson binomial sum over the complement S^c. Thus, I(X; \mathbf{Y}_S) = H(X) - H\left( \sum_{i \notin S} Y_i \right). Exact computation requires evaluating the entropies of the relevant Poisson binomial sums using recursive or numerical methods. A key property is that the total mutual information equals the marginal entropy, I(X; \mathbf{Y}) = H(X), reflecting complete information transfer from \mathbf{Y} to X. Subset mutual informations enable analysis of partial dependencies, which is valuable for subset selection in models with heterogeneous success probabilities, as maximizing I(X; \mathbf{Y}_S) identifies the subsets of trials most predictive of the aggregate outcome.
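
The identity I(X; \mathbf{Y}_S) = H(X) - H(\sum_{i \notin S} Y_i) reduces subset mutual information to two Poisson binomial entropies; the sketch below (illustrative helper names, bits as units) reuses the convolution-based PMF described earlier.

```python
import numpy as np

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])    # add one Bernoulli trial at a time
    return pmf

def entropy_bits(pmf):
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))

p = [0.2, 0.5, 0.9, 0.4, 0.7]
S = [0, 2]                                        # indices of the observed trials
rest = [p[i] for i in range(len(p)) if i not in S]

H_X = entropy_bits(pb_pmf(p))
I_XS = H_X - entropy_bits(pb_pmf(rest))           # H(X) - H(sum over the complement of S)
print("I(X; Y_S) =", I_XS, "bits, out of H(X) =", H_X, "bits")
```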

Concentration inequalities

Chernoff bounds

Chernoff's method offers powerful exponential bounds for the tail probabilities of a Poisson binomial random variable X = \sum_{i=1}^n X_i, where the X_i are independent Bernoulli random variables with success probabilities p_i \in [0,1] and mean \mu = \mathbb{E}[X] = \sum_{i=1}^n p_i. These bounds exploit the moment-generating function (MGF) of X, given by M_X(t) = \mathbb{E}[e^{tX}] = \prod_{i=1}^n (1 - p_i + p_i e^t) for t \in \mathbb{R}. For the upper tail, Markov's inequality applied to e^{tX} yields, for any t > 0, P(X \geq k) \leq e^{-t k} M_X(t) = e^{-t k} \prod_{i=1}^n (1 - p_i + p_i e^t). The Chernoff bound is then obtained by taking the infimum over t > 0, providing a tight rate specific to the heterogeneous p_i. A similar argument applies to the lower tail by considering P(X \leq k) \leq e^{-t k} M_X(t) for t < 0. This method, introduced by Chernoff in 1952 for sums of independent observations, applies directly to the Poisson binomial case due to the independence of the trials and the multiplicative structure of the MGF. For the multiplicative upper tail deviation, setting k = (1 + \delta) \mu with \delta > 0 gives P(X \geq (1 + \delta) \mu) \leq \inf_{t > 0} e^{-t (1 + \delta) \mu} \prod_{i=1}^n (1 - p_i + p_i e^t). When all p_i = p (reducing to the binomial case), the exact optimization yields P(X \geq (1 + \delta) \mu) \leq \exp(-n D(q \| p)) with q = (1 + \delta)p, where D(q \| p) = q \log(q/p) + (1 - q) \log((1 - q)/(1 - p)) is the Kullback-Leibler divergence between Bernoulli distributions; the simpler choice t = \log(1 + \delta) gives the familiar closed form P(X \geq (1 + \delta) \mu) \leq \left[ e^{\delta} / (1 + \delta)^{1 + \delta} \right]^{\mu}, which for 0 < \delta < 1 implies P(X \geq (1 + \delta) \mu) \leq \exp(-\mu \delta^2 / 3). In the heterogeneous case, the product form generally provides a tighter bound than approximating with the average p = \mu/n, as the varying p_i allow a more refined optimization of the infimum; for instance, if some p_i are near 0 or 1, the effective concentration improves beyond the uniform-p assumption. Additive forms of the Chernoff bound, adapted via Hoeffding or Bernstein techniques, bound deviations |X - \mu| \geq \epsilon. The Hoeffding variant, applicable since each X_i \in [0,1], states P(|X - \mu| \geq \epsilon) \leq 2 \exp\left( -\frac{2 \epsilon^2}{n} \right) for \epsilon > 0, which depends only on n and not the specific p_i. A tighter Bernstein-style bound uses the variance \sigma^2 = \sum_{i=1}^n p_i (1 - p_i) \leq \mu and the bounded range, yielding for the upper tail P(X \geq \mu + \epsilon) \leq \exp\left( -\frac{\epsilon^2}{2 \sigma^2 + (2/3) \epsilon} \right), with a symmetric form for the lower tail; this is sharper when \sigma^2 \ll n/4 (e.g., when the p_i are extreme), outperforming Hoeffding by incorporating the actual spread. These additive bounds, derived from MGF expansions, are particularly useful for deviations on the scale of \sqrt{\mu}.
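
The infimum over t in the multiplicative Chernoff bound has no closed form for heterogeneous p_i, but it is a one-dimensional convex minimization that can be done numerically. The sketch below is illustrative (the function name and example values are assumptions) and is meaningful only for thresholds k above the mean, where the optimal t is positive.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_upper_tail(p, k):
    """Numerical inf over t > 0 of e^{-tk} prod_i (1 - p_i + p_i e^t); bounds P(X >= k) for k > mu."""
    p = np.asarray(p, dtype=float)

    def log_bound(t):
        # log of the bound: -t k + sum_i log(1 + p_i (e^t - 1))
        return -t * k + np.sum(np.log1p(p * (np.exp(t) - 1.0)))

    res = minimize_scalar(log_bound, bounds=(1e-8, 50.0), method="bounded")
    return float(np.exp(res.fun))

p = np.full(50, 0.1)                 # mu = 5; heterogeneous vectors work the same way
print(chernoff_upper_tail(p, 10))    # upper bound on P(X >= 2 mu)
```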

Other tail bounds

The Hoeffding bound provides a sub-Gaussian concentration inequality for the Poisson binomial random variable X = \sum_{i=1}^n Y_i, where the Y_i are independent Bernoulli random variables with parameters p_i \in [0,1] and mean \mu = \sum p_i. Specifically, P(|X - \mu| \geq t) \leq 2 \exp\left( -\frac{2t^2}{n} \right) for any t > 0. This bound holds independently of the individual p_i values, relying only on the boundedness of the Y_i in [0,1]. The Berry–Esseen theorem offers a quantitative bound on the uniform approximation of the cumulative distribution function (CDF) of the standardized Poisson binomial variable by the standard normal CDF. Let \sigma^2 = \sum p_i (1 - p_i) denote the variance. Then \max_{0 \leq k \leq n} \left| P(X \leq k) - \Phi\left( \frac{k - \mu}{\sigma} \right) \right| \leq \frac{0.7915}{\sigma}, where \Phi is the standard normal CDF. The O(1/\sqrt{n}) order follows from \sigma = \Theta(\sqrt{n}) under mild conditions on the p_i. This result depends on the variance and, in the general Berry-Esseen framework for non-identical summands, on the third central moments. The bound improves on classical constants and is derived using Fourier analytic techniques for non-identical distributions. Sanov-type large deviation principles describe the exponential decay of tail probabilities for the Poisson binomial distribution, analogous to empirical measure deviations but adapted to heterogeneous Bernoulli trials. For the event X \approx k with empirical proportion q = k/n, the rate function is given by I(k) = \sum_{i=1}^n \left[ p_i \log \frac{p_i}{q} + (1 - p_i) \log \frac{1 - p_i}{1 - q} \right], which is the sum of Kullback–Leibler divergences D_{\text{KL}}(\text{Bern}(p_i) \| \text{Bern}(q)). This yields asymptotic tail bounds of the form P(X \geq k) \approx \exp(-I(k)) for large deviations above the mean, with similar forms for the lower tail. The structure arises from tilting the product measure to match the target mean k, providing a precise exponential rate beyond uniform moment-generating-function methods. Recent works since 2015 have developed improved tail bounds tailored to sparse regimes, where many p_i are near 0 or 1, leading to distributions effectively supported on a small number of points relative to n. In such cases, standard bounds like Hoeffding's can be loose because they overestimate the variance; specialized inequalities exploit the effective sparsity to achieve tighter exponential rates, often reducing the constant factors by incorporating the actual third moments or using recursive decompositions. These refinements are particularly useful in high-dimensional reliability models with heterogeneous failure probabilities clustered near extremes.
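
The Berry-Esseen-type bound quoted above can be compared against the actual worst-case CDF error for a sampled parameter vector; the code below is an illustrative check (random probabilities, convolution-based exact CDF), not a proof of the constant.

```python
import numpy as np
from scipy.stats import norm

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

p = np.random.default_rng(1).uniform(0.05, 0.95, size=200)
mu, sigma = p.sum(), np.sqrt(np.sum(p * (1.0 - p)))

cdf = np.cumsum(pb_pmf(p))                          # exact CDF on {0, ..., n}
k = np.arange(len(p) + 1)
max_dev = np.max(np.abs(cdf - norm.cdf((k - mu) / sigma)))
print("observed max deviation:", max_dev, "quoted bound 0.7915/sigma:", 0.7915 / sigma)
```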

Approximations and relations

Binomial approximation

The Poisson binomial distribution arises as the law of the sum of independent but non-identically distributed Bernoulli random variables with success probabilities p_1, p_2, \dots, p_n. When these probabilities are similar, the distribution can be effectively approximated by a binomial distribution \operatorname{Bin}(n, \bar{p}), where \bar{p} = n^{-1} \sum_{i=1}^n p_i is the average probability. This approximation is justified by the matching means (both have expectation n \bar{p}) and nearly matching variances when the p_i exhibit low variability. Specifically, the variance of the Poisson binomial is \sum p_i (1 - p_i) = n \bar{p} (1 - \bar{p}) - n \sigma_p^2, where \sigma_p^2 = n^{-1} \sum (p_i - \bar{p})^2 is the sample variance of the p_i; thus, the variances coincide exactly when \sigma_p^2 = 0. The accuracy of the approximation is quantified by the total variation distance d_{\mathrm{TV}}, defined as d_{\mathrm{TV}}(P, Q) = \sup_A |P(A) - Q(A)|, where the supremum is over measurable sets A. A simple bound arises from a coupling construction: pair each Bernoulli(p_i) trial with a Bernoulli(\bar{p}) trial, independently across i, yielding d_{\mathrm{TV}}(\mathcal{L}(X), \operatorname{Bin}(n, \bar{p})) \leq \sum_{i=1}^n |p_i - \bar{p}|, since the total variation distance between the individual marginals is |p_i - \bar{p}| and total variation is subadditive over the independent components of the sum. This bound highlights that the error vanishes as the p_i converge to \bar{p}. For clustered probabilities, the error simplifies to O(\max_i |p_i - \bar{p}|). More refined upper and lower bounds for d_{\mathrm{TV}} are provided by Ehm using the Stein-Chen method, establishing equivalence to the Kolmogorov distance in this context. The approximation performs well under the condition that the variance of the p_i is small relative to the binomial variance, specifically \sigma_p^2 / [\bar{p}(1 - \bar{p})] \ll 1. This ensures not only close variances but also alignment of higher moments for moderate n, making the binomial a practical surrogate when the trials are nearly homogeneous, such as in reliability models with similar component rates. The Krawtchouk polynomial expansion further refines the approximation by representing the Poisson binomial as the binomial plus finite signed measures, with error terms controlled by deviations from \bar{p}.
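
For nearly homogeneous probabilities, the total variation distance to \operatorname{Bin}(n, \bar{p}) can be computed exactly and compared with the coupling bound above; the example probabilities below are illustrative.

```python
import numpy as np
from scipy.stats import binom

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

p = np.array([0.28, 0.31, 0.30, 0.33, 0.27, 0.32, 0.29, 0.30])   # clustered around 0.30
n, pbar = len(p), p.mean()

pmf_pb = pb_pmf(p)
pmf_bin = binom.pmf(np.arange(n + 1), n, pbar)
tv = 0.5 * np.sum(np.abs(pmf_pb - pmf_bin))        # total variation distance on {0, ..., n}
print("TV distance:", tv, "coupling bound:", np.sum(np.abs(p - pbar)))
```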

Poisson and normal approximations

The Poisson binomial distribution arises as the sum X = \sum_{i=1}^n X_i of independent Bernoulli random variables X_i with success probabilities p_i, and it admits a Poisson approximation under conditions suitable for modeling rare events. Specifically, when \max_i p_i \to 0 and n \to \infty such that the mean \lambda = \sum_{i=1}^n p_i remains fixed, the distribution of X is well-approximated by a Poisson distribution with parameter \lambda. This regime corresponds to scenarios where events are infrequent and the heterogeneous probabilities are small, allowing the Poisson law to capture the discreteness and rarity effectively. The accuracy of this approximation is quantified using the Stein-Chen method, which provides an explicit error bound in total variation distance: d_{\mathrm{TV}}( \mathcal{L}(X), \mathrm{Po}(\lambda) ) \leq \min\left\{1, \frac{1}{\lambda}\right\} \sum_{i=1}^n p_i^2. This bound, derived for the independent case, highlights that the error decreases as the p_i become smaller, since \sum p_i^2 \leq (\max p_i) \lambda \to 0. The method originates from Chen's 1975 work on Poisson approximation for sums of dependent indicators, which extended and sharpened Stein's approach originally developed for normal approximations to dependent variables. In contrast, for large n where the variance \sigma^2 = \sum_{i=1}^n p_i(1 - p_i) \gg 1, the Poisson binomial distribution converges to a normal distribution N(\mu, \sigma^2) with \mu = \sum_{i=1}^n p_i and the variance given above, by the Lindeberg central limit theorem. This holds because the summands are uniformly bounded and \sigma^2 \to \infty, which ensures the Lindeberg condition for sums of independent non-identical random variables. The normal approximation is particularly effective in the central region, providing a continuous surrogate for large-scale heterogeneous trials. To improve accuracy beyond the basic normal form, especially to account for asymmetry, an Edgeworth expansion incorporates higher-order cumulants such as skewness. The leading correction term involves the third cumulant \kappa_3 = \sum_{i=1}^n p_i (1 - p_i) (1 - 2 p_i), yielding an expansion of the form P\left( \frac{X - \mu}{\sigma} \leq x \right) = \Phi(x) - \frac{\kappa_3}{6 \sigma^3} (x^2 - 1) \phi(x) + O\left( \frac{1}{\sigma^2} \right), where \Phi and \phi are the standard normal CDF and PDF, respectively; this skewness adjustment is valuable when the p_i vary moderately.
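
The two regimes can be illustrated numerically: small p_i favour the Poisson approximation (with the Stein-Chen bound on its error), while moderate p_i and large variance favour the normal approximation; the parameter choices below are assumptions for this sketch.

```python
import numpy as np
from scipy.stats import poisson, norm

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

# Rare-event regime: small heterogeneous p_i, Poisson(lambda) approximation.
p = np.random.default_rng(2).uniform(0.001, 0.02, size=500)
lam = p.sum()
k = np.arange(len(p) + 1)
tv = 0.5 * np.sum(np.abs(pb_pmf(p) - poisson.pmf(k, lam)))
print("TV to Poisson:", tv, "Stein-Chen bound:", min(1.0, 1.0 / lam) * np.sum(p ** 2))

# Central regime: moderate p_i and large variance, normal approximation.
p2 = np.random.default_rng(3).uniform(0.2, 0.8, size=500)
mu, sigma = p2.sum(), np.sqrt(np.sum(p2 * (1.0 - p2)))
cdf = np.cumsum(pb_pmf(p2))
print("max CDF error vs normal:", np.max(np.abs(cdf - norm.cdf((k - mu) / sigma))))
```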

Computational approaches

Exact methods

Exact methods for computing the probability mass function (PMF) and cumulative distribution function (CDF) of the Poisson binomial distribution rely on direct evaluation of its defining structure as the sum of independent Bernoulli random variables with heterogeneous success probabilities p_1, p_2, \dots, p_n. These approaches guarantee precision without approximation error, making them suitable for moderate n (typically n \leq 100), though their computational cost grows with n. Key techniques include dynamic programming, coefficient expansions of the probability generating function, and discrete Fourier transform (DFT) inversion of the characteristic function. Dynamic programming provides an efficient recursive algorithm to compute the exact PMF. Define P_k(j) as the probability that the sum of the first k trials equals j, with initial conditions P_0(0) = 1 and P_0(j) = 0 for j \neq 0. The recursion is P_k(j) = (1 - p_k) P_{k-1}(j) + p_k P_{k-1}(j-1), for k = 1, \dots, n and j = 0, \dots, k, where terms with negative indices are zero. This builds the full PMF P_n(j) iteratively, requiring O(n^2) time and O(n) space with careful implementation. The CDF follows by cumulative summation. This recursive approach has roots in reliability analysis, with early formulations appearing in works like those of Barlow and Heidtmann (1984). A refined version was presented by Chen and Liu (1997), emphasizing its exactness for the PMF. The probability generating function (PGF) offers another exact route through coefficient extraction. The PGF of the Poisson binomial distribution is the product G(x) = \prod_{i=1}^n (1 - p_i + p_i x), where the coefficient of x^j in G(x) is precisely P_n(j), the PMF at j. To compute these coefficients exactly, repeated convolution of the individual Bernoulli PGFs can be used, which aligns closely with the dynamic programming approach and also achieves O(n^2) time complexity via sequential multiplication and truncation. This method leverages the convolutional structure inherent to the sum of independent variables and has been detailed in foundational treatments since the 1990s. For faster computation when evaluating the PMF or CDF at multiple points, a third approach exploits the characteristic function. The characteristic function is \phi(t) = \prod_{i=1}^n (1 - p_i + p_i e^{it}), and the PMF can be recovered via inversion: P_n(j) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itj} \phi(t) \, dt. Discretizing this integral over a grid of m = n+1 points and applying the DFT yields an exact expression for the PMF, as derived by Fernández and Williams (2010). Using the fast Fourier transform (FFT) for the DFT enables full PMF computation in O(n \log n) time. The CDF can then be obtained by summation. This approach was advanced for practical use by Hong (2013), who introduced the DFT-characteristic function (DFT-CF) method for efficient CDF evaluation. More recent methods, such as ShiftConvolvePoibin (Babbi et al., 2021), enable fast exact tail computations using shifted convolutions. These methods trace their computational foundations to early dynamic programming applications in reliability, with significant later refinements through Fourier techniques. Modern implementations appear in statistical software post-2010, such as the R package poibin (Hong, 2013), which incorporates recursive and DFT-based exact routines, and the Python library poibin (Straka, 2016), which provides analogous functions for PMF and CDF computation.
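
The two O(n^2) exact routes described above, the dynamic-programming recursion and coefficient extraction from the PGF, can be written in a few lines and cross-checked against each other; this is an illustrative sketch rather than the code of any cited package.

```python
import numpy as np

def pb_pmf_recursive(p):
    """Dynamic programming on P_k(j) = (1 - p_k) P_{k-1}(j) + p_k P_{k-1}(j-1); O(n^2) time, O(n) space."""
    P = np.zeros(len(p) + 1)
    P[0] = 1.0
    for k, pk in enumerate(p, start=1):
        for j in range(k, 0, -1):          # descend so P_{k-1}(j-1) has not been overwritten yet
            P[j] = (1.0 - pk) * P[j] + pk * P[j - 1]
        P[0] *= (1.0 - pk)
    return P

def pb_pmf_pgf(p):
    """Coefficients of the PGF prod_i (1 - p_i + p_i x) via repeated polynomial convolution."""
    coeffs = np.array([1.0])
    for pi in p:
        coeffs = np.convolve(coeffs, [1.0 - pi, pi])
    return coeffs

p = [0.1, 0.45, 0.72, 0.3, 0.95]
print(np.allclose(pb_pmf_recursive(p), pb_pmf_pgf(p)))   # True: the two exact methods agree
```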

Approximation algorithms

Approximation algorithms for the Poisson binomial distribution are essential for computing the probability mass function (PMF) or cumulative distribution function (CDF) when the number of trials n is large, as exact methods become computationally prohibitive. These methods trade precision for efficiency, providing controlled error bounds suitable for applications requiring rapid evaluation. The saddlepoint approximation refines the normal approximation by utilizing the cumulant generating function (CGF) of the distribution, defined as K(t) = \sum_{i=1}^n \log(1 - p_i + p_i e^t), where the p_i are the success probabilities. To approximate the PMF at a point k, one solves for the saddlepoint \hat{t} satisfying K'(\hat{t}) = k, then substitutes into the saddlepoint formula for the PMF: \hat{f}(k) \approx \frac{1}{\sqrt{2\pi K''(\hat{t})}} \exp\left(K(\hat{t}) - \hat{t} k\right), with the CDF approximated via integration or related expressions; this yields a relative error of O(1/n). Advances in the 2000s have extended saddlepoint methods to reliability engineering, enabling scalable computations for systems up to n = 10^6 components at O(n) time per query by efficient numerical solution of the saddlepoint equation. Recursive convolution approximates the distribution by iteratively computing partial convolutions of the Bernoulli PMFs, starting from the first trial and accumulating products up to a desired precision threshold, with base complexity O(n^2) that can be reduced via early stopping when contributions fall below an error tolerance. This approach leverages the convolution property of independent sums, approximating the full PMF by truncating the recursion at intermediate stages for large n. Monte Carlo simulation estimates the PMF or CDF by generating numerous independent samples of the sum S = \sum_{i=1}^n X_i, where each X_i \sim \text{Bernoulli}(p_i), and constructing histograms or empirical distribution functions from the samples; for instance, the PMF at k is approximated as the proportion of samples equaling k. Variance reduction via importance sampling reweights samples from a tilted distribution to focus on regions of interest, improving efficiency for tail probabilities. The saddlepoint and normal approximations serve as bases for initializing these simulations or bounding errors.
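
A plain Monte Carlo sketch of the tail estimate described above (sample size, seed, and threshold are illustrative assumptions); importance sampling would replace the direct draws with draws from a tilted distribution plus reweighting.

```python
import numpy as np

def pb_tail_mc(p, k, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(X >= k) by simulating the heterogeneous Bernoulli sum."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    draws = rng.random((n_samples, len(p))) < p     # one row of Bernoulli outcomes per sample
    counts = draws.sum(axis=1)
    return float(np.mean(counts >= k))

p = np.random.default_rng(4).uniform(0.0, 0.3, size=200)
threshold = int(p.sum()) + 15                       # a point in the upper tail
print(pb_tail_mc(p, threshold))
```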

Applications

Reliability engineering

In reliability engineering, the Poisson binomial distribution models the number of functioning components in a system of independent but heterogeneous components, each with its own reliability probability p_i, enabling precise assessment of overall system reliability when component reliabilities differ. This is particularly valuable for systems where assuming identical reliabilities, as in the binomial model, would be unrealistic. For a series system, which fails if at least one component fails, the system reliability is the probability that all n components succeed, given by P(X = n), where X follows the Poisson binomial distribution. Consequently, the probability of system failure is 1 - P(X = n). This allows engineers to compute exact unreliability metrics for systems like electrical circuits or structural chains, where each link has distinct failure probabilities based on material or environmental factors. In k-out-of-n systems, which operate if at least k components function, the system reliability is P(X \geq k) = \sum_{j=k}^n P(X = j), with the probabilities derived from the Poisson binomial distribution. Such systems are common in redundant designs, such as aircraft control units or power grids, and computations often rely on dynamic programming techniques for efficiency. Bounds and stochastic orderings further aid in comparing system reliabilities under varying component parameters. Maintenance models for degrading components incorporate time-varying reliabilities p_i(t), reflecting wear, environmental exposure, or operational stress over time. The Poisson binomial distribution then predicts cumulative system failures by summing independent outcomes adjusted for each component's evolving state. For example, in forecasting field failures for a fleet of high-voltage power transformers under left-truncated and right-censored data from staggered installations and warranties, the total number of failures follows a Poisson binomial distribution, supporting decisions on maintenance schedules and replacements.
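
A brief sketch of the k-out-of-n computation; the component reliabilities are illustrative values, and the PMF helper is the convolution recursion used earlier in this article.

```python
import numpy as np

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

def k_out_of_n_reliability(p, k):
    """P(X >= k): the system works if at least k of its heterogeneous components work."""
    return float(pb_pmf(p)[k:].sum())

component_reliabilities = [0.99, 0.97, 0.95, 0.98, 0.90]   # illustrative values
print("series (5-out-of-5):", k_out_of_n_reliability(component_reliabilities, 5))
print("3-out-of-5:", k_out_of_n_reliability(component_reliabilities, 3))
```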

Actuarial science

In actuarial science, the Poisson binomial distribution is used to model aggregate insurance claims from heterogeneous policyholders, each with distinct claim probabilities. This allows for more accurate pricing and reserving by accounting for variability in risk profiles, unlike models that assume a uniform claim probability. For instance, it supports the calculation of ruin probabilities in portfolios with diverse risks.

Finance

In finance, the distribution applies to assessing default risk in credit portfolios, where each obligor or loan has its own default probability p_i. The number of defaults then follows a Poisson binomial distribution, aiding value-at-risk computations and related portfolio risk assessments under heterogeneous conditions.

Machine learning

In machine learning, the Poisson binomial distribution arises in distribution learning from heterogeneous data sources, such as ensemble methods whose base classifiers have varying accuracies. It facilitates better uncertainty estimation and calibration in models dealing with non-i.i.d. outcomes.

Statistical inference

Statistical inference for the Poisson binomial distribution involves estimating the success probabilities p_i for i = 1, \dots, n independent trials, testing hypotheses about these parameters, and constructing confidence intervals, typically based on multiple independent observations of the sum S = \sum_{i=1}^n X_i. The probability mass function of S is a combinatorial sum over up to 2^n terms, complicating direct likelihood evaluation and optimization, especially when individual trial outcomes X_i are unobserved and only the aggregate sum is available. Maximum likelihood estimation (MLE) of the p_i parameters is challenging due to the intractable likelihood, but numerical methods such as the expectation-maximization (EM) algorithm or direct numerical optimization can be employed to approximate the MLE, particularly with grouped or incomplete data where trial-level details are unavailable. The EM algorithm iteratively imputes the missing trial outcomes in the E-step and updates parameter estimates in the M-step to maximize the expected complete-data log-likelihood. For instance, in applications like list experiments, where responses are aggregated to protect privacy, the EM algorithm facilitates MLE by treating unobserved individual responses as latent variables. Gradient-based or other optimization techniques can directly maximize the observed-data likelihood when feasible, though they may require careful initialization to avoid local optima. The method of moments provides a simpler approach by equating the sample mean to the theoretical mean \mu = \sum_{i=1}^n p_i, yielding an estimator \hat{\mu} = \bar{S} for the expected total number of successes; however, the individual p_i remain underidentified without additional constraints or higher moments, as infinitely many p_i vectors produce the same mean and variance. The sample variance can estimate \sigma^2 = \sum_{i=1}^n p_i (1 - p_i), but solving for unique p_i requires further assumptions, limiting this method to aggregate parameters. Hypothesis testing in the Poisson binomial model often focuses on homogeneity of the p_i, such as testing H_0: p_1 = \dots = p_n = p (reducing to a binomial model) against the alternative of heterogeneous probabilities. Likelihood ratio tests compare the maximized likelihood under the full Poisson binomial model to that under the constrained binomial model, with the test statistic approximately following a chi-squared distribution under H_0, though boundary issues may necessitate adjustments for small samples. Bootstrap methods are particularly useful for constructing confidence intervals on the cumulative distribution function (CDF) of S, by resampling the observed sums to approximate the sampling distribution of the empirical CDF and deriving intervals that account for parameter uncertainty. Bayesian approaches to inference, prominent in the recent literature, often employ Dirichlet priors on the vector of p_i to incorporate prior information and handle the high-dimensional parameter space, enabling posterior inference via Markov chain Monte Carlo (MCMC) methods. The Dirichlet prior is conjugate to the multinomial likelihood arising from the complete data, facilitating updates when treating the unobserved trials as latent; for example, in aggregate survey analysis, a Dirichlet-Poisson-binomial model corrects for bias by integrating over the latent individual responses. However, in high dimensions (n > 100), Bayesian inference faces significant challenges, including computational intractability of the posterior due to the growth in model complexity, sensitivity to prior specification, and difficulties in achieving MCMC convergence, often requiring dimension-reducing assumptions or approximate methods like variational inference.
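
The method-of-moments point made above, that only the aggregate parameters are identified from repeated observations of S, can be illustrated with simulated data; the true probabilities, sample size, and seed below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
p_true = rng.uniform(0.1, 0.9, size=20)                       # unknown in practice
S = (rng.random((5000, len(p_true))) < p_true).sum(axis=1)    # 5000 observations of the aggregate sum

mu_hat = S.mean()            # estimates mu = sum_i p_i
var_hat = S.var(ddof=1)      # estimates sigma^2 = sum_i p_i (1 - p_i)
print(mu_hat, p_true.sum())
print(var_hat, float(np.sum(p_true * (1 - p_true))))
# The individual p_i are not recoverable from S alone: many vectors share these two moments.
```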

References

  1. [1] The Poisson Binomial Distribution - Old & New. Project Euclid.
  2. [2] On computing the distribution function for the Poisson binomial ...
  3. [3] The Poisson Binomial Distribution - Old & New. Columbia University (PDF).
  4. [4] JIA 85 (1959) 0165-0210 (PDF).
  5. [5] arXiv:1908.10024v1 [math.PR], 27 Aug 2019 (PDF).
  6. [6] Using Poisson Binomial GLMs to Reveal Voter Preferences. arXiv, 4 Feb 2018 (PDF).
  7. [7] Sparse covers for sums of indicators. Probability Theory and Related Fields, 2 Nov 2014.
  9. [9] On the number of successes in independent trials. Y. H. Wang, Concordia University.
  12. [12] Bernoulli distribution | Properties, proofs, exercises. StatLect.
  13. [13] Cumulant generating function | Formula, derivatives, proofs. StatLect.
  14. [14] A proof of the Shepp-Olkin entropy concavity conjecture.
  15. [15] Entropy of the Sum of Independent Bernoulli Random Variables and ...
  16. [16] On the Entropy of Sums of Bernoulli Random Variables via the Chen ... (PDF), 21 Sep 2012.
  17. [17] Expressions for the Entropy of Binomial-Type Distributions. arXiv, 21 Jul 2018 (PDF).
  18. [18] A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based ... Herman Chernoff.
  19. [19] Chernoff bounds, and some applications (PDF), 21 Feb 2015.
  20. [20] Old and New Concentration Inequalities. UCSD Math (PDF).
  21. [21] The Poisson Binomial Distribution - Old & New. NSF-PAR (PDF).
  22. [22] Improved Inequalities for the Poisson and Binomial Distribution and ..., 23 Dec 2013.
  23. [23] Binomial approximation to the Poisson binomial distribution.
  24. [24] An approximation theorem for the Poisson binomial distribution. Lucien Le Cam. Pacific J. Math. 10(4): 1181-1197 (1960).
  25. [25] Binomial Approximation to the Poisson Binomial Distribution.
  26. [26] Poisson Approximation for Dependent Trials. Project Euclid.
  27. [27] Poisson Approximation and the Chen-Stein Method. USC Dornsife (PDF).
  28. [28] Stein-Chen Method for Poisson Approximation (PDF).
  29. [29] Multivariate Poisson–Binomial approximation using Stein's method ... (PDF).
  30. [30] tsakim/poibin: Poisson Binomial Probability Distribution for Python.
  31. [31] ML estimation in the Poisson binomial distribution with grouped data ...
  32. [32] The Likelihood Ratio Test for Poisson versus Binomial Distributions.
  33. [33] Bias correction and Bayesian analysis of aggregate counts in SAGE ..., 3 Feb 2010.
  34. [34] High-Dimensional Poisson Structural Equation Model Learning via ... (PDF).