
Poisson binomial distribution

The Poisson binomial distribution is a discrete probability distribution that arises as the sum of a fixed number of independent Bernoulli random variables, each with its own, possibly different, success probability p_i (where 0 \leq p_i \leq 1), generalizing the classical binomial distribution in which all probabilities are equal. Named after the French mathematician Siméon Denis Poisson, who introduced it in 1837 as a heterogeneous extension of the binomial model for analyzing jury decisions and moral probabilities, the distribution has since been studied for its combinatorial and probabilistic properties, with modern surveys highlighting its connections to polynomial geometry and approximation theory. The probability mass function for a Poisson binomial random variable X = \sum_{i=1}^n Y_i, where each Y_i \sim \text{Bernoulli}(p_i), is given by
P(X = k) = \sum_{\substack{A \subseteq \{1, \dots, n\} \\ |A| = k}} \prod_{i \in A} p_i \prod_{i \notin A} (1 - p_i),
for k = 0, 1, \dots, n, summing over all possible subsets of k successful trials. Its mean is \mu = \sum_{i=1}^n p_i and its variance is \sigma^2 = \sum_{i=1}^n p_i(1 - p_i), both simple sums over the heterogeneous probabilities. The distribution exhibits the strong Rayleigh property, meaning its probability generating function has only real, non-positive roots, which aids in bounding tail probabilities and in stochastic comparisons.
For large n, it can be approximated by a normal distribution with the above mean and variance, or by a Poisson distribution under rare-event conditions (small p_i), with error bounds established in classical results such as Le Cam's theorem. Computation of the exact probability mass function is nontrivial due to the exponential number of terms, but can be handled efficiently via discrete Fourier transforms of the characteristic function. Applications span reliability engineering (e.g., system failure probabilities with heterogeneous components), survey statistics (e.g., nonresponse adjustments), actuarial science (e.g., insurance claim aggregates), finance (e.g., default risk portfolios), and machine learning (e.g., distribution learning from heterogeneous data).

Introduction

Overview

The Poisson binomial distribution describes the probability distribution of the number of successes in a sequence of n independent Bernoulli trials, where each trial i has its own success probability p_i \in [0,1]. This contrasts with the standard binomial distribution, which assumes identical success probabilities across all trials. Formally, if X = \sum_{i=1}^n \xi_i where each \xi_i is a Bernoulli random variable with parameter p_i, then X follows a Poisson binomial distribution, denoted X \sim \text{PB}(p_1, \dots, p_n). This distribution arises naturally in contexts involving heterogeneous probabilities, such as reliability analysis of systems and processes whose components have varying failure rates. It also appears in voting models, where individual voter preferences can be modeled as binary choices with differing probabilities of supporting a given candidate, aiding in the prediction of outcomes from aggregated data. It likewise supports sensitivity analyses in observational studies by accounting for varying treatment effects across units. An intuitive example is the outcome of flipping n biased coins, each with a potentially different bias p_i representing the probability of landing heads; the Poisson binomial distribution then gives the probability of observing exactly k heads. This setup highlights the distribution's flexibility in modeling real-world scenarios where uniformity in probabilities is unrealistic, providing a more accurate representation than the binomial distribution in such cases.

Historical background

The concept of the Poisson binomial distribution originated with Siméon Denis Poisson's work on probability in judicial contexts, where he modeled the aggregate outcome of independent trials with heterogeneous success probabilities, laying the groundwork for distributions beyond the homogeneous case. This early formulation addressed error laws in probabilistic judgments, marking the first recognition of sums of non-identical random variables. The terminology "Poisson binomial distribution" emerged to honor Poisson's pioneering contribution while distinguishing it from the standard binomial distribution involving identical trials; it became standardized in statistical literature by the mid-20th century. Key early developments included Wassily Hoeffding's 1956 inequalities, which quantified differences in spread between the Poisson binomial and binomial distributions. Lucien Le Cam advanced the field in 1960 with the first error bounds for Poisson approximations to the Poisson binomial distribution, facilitating practical use in limit theorems. By the 1970s, the distribution gained recognition in reliability engineering for analyzing systems with components having distinct failure rates. Further progress in the late 1970s included H. Chen's 1978 bounds on normal approximations using early Stein's-method techniques, enhancing convergence analysis. Post-2000 computational advances have enabled efficient exact evaluations, such as recursive algorithms by Chen and Liu (1997, extended in later works) and Fourier-based closed-form expressions by Fernández and Williams (2010), supporting applications in high-dimensional settings. These methods, alongside learning algorithms like those by Daskalakis et al. (2015), reflect the distribution's evolution into a computationally tractable tool.

Mathematical formulation

Probability mass function

The Poisson binomial distribution arises as the distribution of the sum X = \sum_{i=1}^n \xi_i, where the \xi_i are independent Bernoulli random variables with success probabilities p_i \in (0,1) for i = 1, \dots, n. The probability mass function (PMF) is given by P(X = k) = \sum_{\substack{A \subseteq \{1,\dots,n\} \\ |A|=k}} \left[ \prod_{i \in A} p_i \prod_{j \notin A} (1 - p_j) \right], for integers k = 0, 1, \dots, n, with support on the discrete set \{0, 1, \dots, n\}. This expression enumerates all subsets of size k and weights each by the product of the corresponding probabilities. In particular, P(X=0) = \prod_{i=1}^n (1 - p_i). When all p_i = p for some fixed p \in (0,1), the PMF simplifies to P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, recovering the standard binomial distribution. The PMF is unimodal, possessing either a single mode or two consecutive modes, regardless of the specific values of the p_i. Direct computation of the PMF via the summation requires O(2^n) operations in the worst case, due to the exponential number of terms.
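
For very small n, the defining sum can be evaluated directly by enumerating subsets. The following sketch assumes plain Python; the function name and example probabilities are illustrative, not taken from any particular library.

```python
from itertools import combinations

def pmf_bruteforce(k, p):
    """P(X = k) by enumerating all size-k subsets of successes; O(2^n), so small n only."""
    n = len(p)
    total = 0.0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        term = 1.0
        for i in range(n):
            term *= p[i] if i in chosen else (1.0 - p[i])
        total += term
    return total

p = [0.1, 0.4, 0.7]                      # illustrative heterogeneous probabilities
pmf = [pmf_bruteforce(k, p) for k in range(len(p) + 1)]
print(pmf, sum(pmf))                     # the probabilities sum to 1
```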

Characteristic function

The characteristic function of a random variable X following the Poisson binomial distribution with success probabilities p_1, p_2, \dots, p_n is defined as \phi_X(t) = \mathbb{E}[e^{itX}] for real t, and takes the explicit form \phi_X(t) = \prod_{i=1}^n \left(1 - p_i + p_i e^{it}\right). This expression follows directly from the independence of the underlying random variables X_i \sim \mathrm{Bernoulli}(p_i), as the characteristic function of a sum of independent random variables is the product of their individual characteristic functions, where each \phi_{X_i}(t) = 1 - p_i + p_i e^{it}. The logarithm of the characteristic function provides a convenient additive structure: \log \phi_X(t) = \sum_{i=1}^n \log\left(1 - p_i + p_i e^{it}\right). This logarithmic form facilitates derivations in approximation theory, as the cumulants of X are obtained from the derivatives of \log \phi_X(t) at t = 0. A primary utility of \phi_X(t) lies in its role in inverting to the probability mass function via the discrete Fourier transform or related methods, enabling efficient numerical computation of probabilities and of the cumulative distribution function through fast transform algorithms. Additionally, since the moments of X are given by \mathbb{E}[X^k] = \frac{1}{i^k} \phi_X^{(k)}(0), where \phi_X^{(k)} denotes the k-th derivative, the characteristic function allows exact recovery of all moments by evaluating these derivatives at t = 0.
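
As a hedged illustration of the inversion idea (not the reference implementation of any package), the characteristic function can be evaluated on an (n+1)-point grid and inverted with a discrete Fourier transform; the helper name pmf_via_cf and the example probabilities are assumptions of this sketch.

```python
import numpy as np

def pmf_via_cf(p):
    """Invert the characteristic function on an (n+1)-point DFT grid to get the exact PMF."""
    p = np.asarray(p, dtype=float)
    m = len(p) + 1
    omega = np.exp(2j * np.pi * np.arange(m) / m)                  # e^{i t_l} on the grid
    phi = np.prod(1.0 - p[:, None] + p[:, None] * omega[None, :], axis=0)
    pmf = np.real(np.fft.fft(phi)) / m                             # inversion; imaginary part is rounding noise
    return np.clip(pmf, 0.0, 1.0)

print(pmf_via_cf([0.1, 0.4, 0.7, 0.25]))
```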

Moments and central properties

Mean and variance

The Poisson binomial distribution arises as the distribution of X = \sum_{i=1}^n \xi_i, where the \xi_i are independent Bernoulli random variables with success probabilities p_i \in [0,1]. The expected value of X is given by \mathbb{E}[X] = \sum_{i=1}^n p_i, which follows directly from the linearity of expectation and the fact that \mathbb{E}[\xi_i] = p_i for each i. This represents the total expected number of successes across the n heterogeneous trials. Due to the independence of the \xi_i, the variance of X is the sum of the individual variances, with no covariance terms: \text{Var}(X) = \sum_{i=1}^n p_i (1 - p_i). Each term p_i (1 - p_i) is the variance of the corresponding Bernoulli random variable \xi_i. This additive form highlights the distribution's structure as a sum of independent but non-identical components. For a fixed mean \mu = \sum_{i=1}^n p_i, the variance \text{Var}(X) = \mu - \sum_{i=1}^n p_i^2 is maximized when all p_i are equal to \bar{p} = \mu / n, reducing the Poisson binomial to the binomial distribution with variance \mu (1 - \mu / n). In this homogeneous case, the variance achieves its upper bound; when the p_i vary, \sum p_i^2 > \mu^2 / n (by the QM-AM inequality), yielding a strictly smaller variance that reflects reduced overall uncertainty compared to the equal-probability scenario.
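
A short numerical check of the fixed-mean comparison above, using assumed example probabilities: the homogeneous vector attains the largest variance for a given mean, and a heterogeneous vector with the same mean has a strictly smaller one.

```python
import numpy as np

def pb_variance(p):
    """Variance of the Poisson binomial: sum of the Bernoulli variances p_i (1 - p_i)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

n, mu = 10, 4.0
p_equal = np.full(n, mu / n)                                    # homogeneous case, mean 4.0
p_het = np.array([0.9] * 4 + [0.1] * 3 + [0.05] * 2 + [0.0])    # heterogeneous, also mean 4.0
print(p_equal.sum(), pb_variance(p_equal))                      # 4.0, 2.4 = mu (1 - mu/n)
print(p_het.sum(), pb_variance(p_het))                          # 4.0, 0.725 < 2.4
```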

Higher-order moments

The higher-order moments of the Poisson binomial distribution can be derived from its moment-generating function, which is the product over the individual Bernoulli components: M_X(t) = \prod_{i=1}^n (1 - p_i + p_i e^t). The k-th raw moment E[X^k] is then obtained as the k-th derivative of M_X(t) evaluated at t=0. Since X is a sum of independent Bernoulli random variables X_i, the moments can also be computed using expansions involving multinomial coefficients, though this becomes computationally intensive for large k and n. Recursive methods based on the product form of the moment-generating function allow for efficient calculation of these moments by building from lower-order derivatives. Cumulants provide a particularly useful representation for the Poisson binomial distribution because they add under summation of independent random variables. The cumulant-generating function is K_X(t) = \log M_X(t) = \sum_{i=1}^n \log(1 - p_i + p_i e^t), so the m-th cumulant \kappa_m is the sum of the m-th cumulants of the individual Bernoulli distributions: \kappa_m = \sum_{i=1}^n \kappa_m^{(i)}, where \kappa_m^{(i)} is the m-th cumulant of X_i \sim \text{Bernoulli}(p_i). This additivity simplifies computations, as each \kappa_m^{(i)} can be found by differentiating \log(1 - p_i + p_i e^t) at t=0. For example, the first two cumulants are the mean \kappa_1 = \sum p_i and the variance \kappa_2 = \sum p_i(1 - p_i), serving as base cases for higher orders. The third cumulant is \kappa_3 = \sum p_i (1 - p_i) (1 - 2p_i), leading to the skewness \gamma_1 = \kappa_3 / \kappa_2^{3/2} = \left[ \sum p_i (1 - p_i) (1 - 2p_i) \right] / \left[ \sum p_i (1 - p_i) \right]^{3/2}. This skewness is zero when p_i = 1/2 for all i, corresponding to the symmetric binomial case. The fourth cumulant is \kappa_4 = \sum p_i (1 - p_i) [1 - 6 p_i (1 - p_i)], yielding the excess kurtosis \gamma_2 = \kappa_4 / \kappa_2^2 = \left[ \sum p_i (1 - p_i) [1 - 6 p_i (1 - p_i)] \right] / \left[ \sum p_i (1 - p_i) \right]^2. These measures quantify deviations from normality, with the excess kurtosis indicating tail heaviness relative to a Gaussian. For the identical p_i = p case (reducing to the binomial distribution), these simplify to \gamma_1 = (1 - 2p) / \sqrt{np(1-p)} and \gamma_2 = [1 - 6p(1-p)] / [np(1-p)], confirming the summation structure.
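
Because the cumulants add over the independent Bernoulli terms, skewness and excess kurtosis can be computed directly from the formulas above; the function name and example probabilities below are illustrative.

```python
import numpy as np

def pb_cumulants(p):
    """First four cumulants of a Poisson binomial via additivity over independent Bernoullis."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    k1 = float(np.sum(p))                             # mean
    k2 = float(np.sum(p * q))                         # variance
    k3 = float(np.sum(p * q * (1.0 - 2.0 * p)))       # third cumulant
    k4 = float(np.sum(p * q * (1.0 - 6.0 * p * q)))   # fourth cumulant
    return k1, k2, k3, k4

k1, k2, k3, k4 = pb_cumulants([0.2, 0.5, 0.8, 0.3])
print("skewness:", k3 / k2 ** 1.5, "excess kurtosis:", k4 / k2 ** 2)
```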

Information-theoretic properties

Entropy

The Shannon entropy of a Poisson binomial random variable X = \sum_{i=1}^n B_i, where the B_i are independent Bernoulli random variables with success probabilities p_i, is defined as H(X) = -\sum_{k=0}^n P(X = k) \log P(X = k), with the probabilities P(X = k) obtained via convolution of the individual Bernoulli distributions. When all p_i = p, the distribution reduces to the binomial \operatorname{Bin}(n, p), and the entropy simplifies to that of the binomial distribution, which admits series expansions and integral representations for computation. In the general case with heterogeneous p_i, the independence of the B_i implies H(X) \leq \sum_{i=1}^n h(p_i), where h(p) = -p \log p - (1-p) \log(1-p) is the binary entropy function; this follows because X is a deterministic function of the joint vector (B_1, \dots, B_n), so its entropy cannot exceed the joint entropy \sum_{i=1}^n h(p_i). The entropy H(X) is maximized when all p_i = 1/2, corresponding to the symmetric \operatorname{Bin}(n, 1/2); in this case, H(X) measures the information content of n fair coin flips reduced by the many-to-one mapping into their sum. For a fixed mean \mu = \sum_{i=1}^n p_i, the entropy H(X) is upper bounded by that of the binomial \operatorname{Bin}(n, \mu/n), with equality when all p_i = \mu/n. An upper bound on the individual entropies gives \sum_{i=1}^n h(p_i) \leq n \log 2, with equality when each p_i = 1/2. The entropy is a concave function of the vector (p_1, \dots, p_n), a result resolving the long-standing Shepp-Olkin conjecture. Exact computation of H(X) is computationally intensive, as it requires the full PMF, which can be calculated recursively in O(n^2) time or via fast Fourier transform methods in O(n \log n) time, but remains challenging for large n. For practical purposes, approximations leverage the central limit theorem, under which X is approximately normal with mean \mu = \sum p_i and variance \sigma^2 = \sum p_i (1 - p_i), yielding H(X) \approx \frac{1}{2} \log (2 \pi e \sigma^2).
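
A minimal sketch of the exact-versus-approximate entropy comparison, assuming natural logarithms and a simple O(n^2) convolution to build the PMF; the helper names and example probabilities are not from any specific package.

```python
import numpy as np

def pb_pmf(p):
    """Exact PMF by repeated convolution of the Bernoulli PMFs (the O(n^2) recursion)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

def pb_entropy(p):
    """Shannon entropy (in nats) computed from the exact PMF."""
    pmf = pb_pmf(p)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log(pmf)))

p = [0.3, 0.5, 0.7, 0.2, 0.9]
sigma2 = sum(pi * (1.0 - pi) for pi in p)
print("exact:", pb_entropy(p), "Gaussian approximation:", 0.5 * np.log(2 * np.pi * np.e * sigma2))
```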

Mutual information aspects

The mutual information between the Poisson binomial random variable X = \sum_{i=1}^n Y_i and the vector \mathbf{Y} = (Y_1, \dots, Y_n) of independent Bernoulli random variables Y_i \sim \text{Bern}(p_i) is I(X; \mathbf{Y}) = H(X) - H(X \mid \mathbf{Y}). Since X is a deterministic function of \mathbf{Y}, the conditional entropy H(X \mid \mathbf{Y}) = 0, yielding I(X; \mathbf{Y}) = H(X). For a subset S \subseteq \{1, \dots, n\}, the mutual information I(X; \mathbf{Y}_S) quantifies the dependence between X and the subvector \mathbf{Y}_S = (Y_i)_{i \in S}. By the chain rule and independence of the Bernoulli components, H(X \mid \mathbf{Y}_S) = H\left( \sum_{i \notin S} Y_i \right), the entropy of the Poisson binomial sum over the complement S^c. Thus, I(X; \mathbf{Y}_S) = H(X) - H\left( \sum_{i \notin S} Y_i \right). Exact computation requires evaluating the entropies of the relevant Poisson binomial sums using recursive or numerical methods. A key property is that the total mutual information equals the marginal entropy, I(X; \mathbf{Y}) = H(X), reflecting complete information transfer from \mathbf{Y} to X. Subset mutual informations enable analysis of partial dependencies, which is valuable for subset selection in models with heterogeneous success probabilities, as maximizing I(X; \mathbf{Y}_S) identifies the subsets of trials most predictive of the aggregate outcome.
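
The identity I(X; \mathbf{Y}_S) = H(X) - H(\sum_{i \notin S} Y_i) reduces subset mutual information to two Poisson binomial entropies; the sketch below (illustrative helper names, bits as units) reuses the convolution-based PMF described earlier.

```python
import numpy as np

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])    # add one Bernoulli trial at a time
    return pmf

def entropy_bits(pmf):
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))

p = [0.2, 0.5, 0.9, 0.4, 0.7]
S = [0, 2]                                        # indices of the observed trials
rest = [p[i] for i in range(len(p)) if i not in S]

H_X = entropy_bits(pb_pmf(p))
I_XS = H_X - entropy_bits(pb_pmf(rest))           # H(X) - H(sum over the complement of S)
print("I(X; Y_S) =", I_XS, "bits, out of H(X) =", H_X, "bits")
```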

Concentration inequalities

Chernoff bounds

Chernoff's method offers powerful exponential bounds for the tail probabilities of a Poisson binomial random variable X = \sum_{i=1}^n X_i, where the X_i are independent Bernoulli random variables with success probabilities p_i \in [0,1] and mean \mu = \mathbb{E}[X] = \sum_{i=1}^n p_i. These bounds exploit the moment-generating function (MGF) of X, given by M_X(t) = \mathbb{E}[e^{tX}] = \prod_{i=1}^n (1 - p_i + p_i e^t) for t \in \mathbb{R}. For the upper tail, Markov's inequality applied to e^{tX} yields, for any t > 0, P(X \geq k) \leq e^{-t k} M_X(t) = e^{-t k} \prod_{i=1}^n (1 - p_i + p_i e^t). The Chernoff bound is then obtained by taking the infimum over t > 0, providing a tight rate specific to the heterogeneous p_i. A similar argument applies to the lower tail by considering P(X \leq k) \leq e^{-t k} M_X(t) for t < 0. This method, introduced by Chernoff in 1952 for sums of independent observations, applies directly to the Poisson binomial case due to the independence of the trials and the multiplicative structure of the MGF. For the multiplicative upper tail deviation, setting k = (1 + \delta) \mu with \delta > 0 gives P(X \geq (1 + \delta) \mu) \leq \inf_{t > 0} e^{-t (1 + \delta) \mu} \prod_{i=1}^n (1 - p_i + p_i e^t). When all p_i = p (reducing to the binomial case), the exact optimization yields P(X \geq (1 + \delta) \mu) \leq \exp(-n D(q \| p)) with q = (1 + \delta)p, where D(q \| p) = q \log(q/p) + (1 - q) \log((1 - q)/(1 - p)) is the Kullback-Leibler divergence between Bernoulli distributions; the simpler choice t = \log(1 + \delta) gives the familiar closed form P(X \geq (1 + \delta) \mu) \leq \left[ e^{\delta} / (1 + \delta)^{1 + \delta} \right]^{\mu}, which for 0 < \delta < 1 implies P(X \geq (1 + \delta) \mu) \leq \exp(-\mu \delta^2 / 3). In the heterogeneous case, the product form generally provides a tighter bound than approximating with the average p = \mu/n, as the varying p_i allow a more refined optimization of the infimum; for instance, if some p_i are near 0 or 1, the effective concentration improves beyond the uniform-p assumption. Additive forms of the Chernoff bound, adapted via Hoeffding or Bernstein techniques, bound deviations |X - \mu| \geq \epsilon. The Hoeffding variant, applicable since each X_i \in [0,1], states P(|X - \mu| \geq \epsilon) \leq 2 \exp\left( -\frac{2 \epsilon^2}{n} \right) for \epsilon > 0, which depends only on n and not the specific p_i. A tighter Bernstein-style bound uses the variance \sigma^2 = \sum_{i=1}^n p_i (1 - p_i) \leq \mu and the bounded range, yielding for the upper tail P(X \geq \mu + \epsilon) \leq \exp\left( -\frac{\epsilon^2}{2 \sigma^2 + (2/3) \epsilon} \right), with a symmetric form for the lower tail; this is sharper when \sigma^2 \ll n/4 (e.g., when the p_i are extreme), outperforming Hoeffding by incorporating the actual spread. These additive bounds, derived from MGF expansions, are particularly useful for deviations on the scale of \sqrt{\mu}.
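
The infimum over t in the multiplicative Chernoff bound has no closed form for heterogeneous p_i, but it is a one-dimensional convex minimization that can be done numerically. The sketch below is illustrative (the function name and example values are assumptions) and is meaningful only for thresholds k above the mean, where the optimal t is positive.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_upper_tail(p, k):
    """Numerical inf over t > 0 of e^{-tk} prod_i (1 - p_i + p_i e^t); bounds P(X >= k) for k > mu."""
    p = np.asarray(p, dtype=float)

    def log_bound(t):
        # log of the bound: -t k + sum_i log(1 + p_i (e^t - 1))
        return -t * k + np.sum(np.log1p(p * (np.exp(t) - 1.0)))

    res = minimize_scalar(log_bound, bounds=(1e-8, 50.0), method="bounded")
    return float(np.exp(res.fun))

p = np.full(50, 0.1)                 # mu = 5; heterogeneous vectors work the same way
print(chernoff_upper_tail(p, 10))    # upper bound on P(X >= 2 mu)
```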

Other tail bounds

The Hoeffding bound provides a sub-Gaussian concentration inequality for the Poisson binomial random variable X = \sum_{i=1}^n Y_i, where the Y_i are independent Bernoulli random variables with parameters p_i \in [0,1] and mean \mu = \sum p_i. Specifically, P(|X - \mu| \geq t) \leq 2 \exp\left( -\frac{2t^2}{n} \right) for any t > 0. This bound holds independently of the individual p_i values, relying only on the boundedness of the Y_i in [0,1]. The Berry–Esseen theorem offers a quantitative bound on the uniform approximation of the cumulative distribution function (CDF) of the standardized Poisson binomial variable by the standard normal CDF. Let \sigma^2 = \sum p_i (1 - p_i) denote the variance. Then \max_{0 \leq k \leq n} \left| P(X \leq k) - \Phi\left( \frac{k - \mu}{\sigma} \right) \right| \leq \frac{0.7915}{\sigma}, where \Phi is the standard normal CDF. The O(1/\sqrt{n}) order follows from \sigma = \Theta(\sqrt{n}) under mild conditions on the p_i. This result depends on the variance and, in the general Berry-Esseen framework for non-identical summands, on the third central moments. The bound improves on classical constants and is derived using Fourier analytic techniques for non-identical distributions. Sanov-type large deviation principles describe the exponential decay of tail probabilities for the Poisson binomial distribution, analogous to empirical measure deviations but adapted to heterogeneous Bernoulli trials. For the event X \approx k with empirical proportion q = k/n, the rate function is given by I(k) = \sum_{i=1}^n \left[ p_i \log \frac{p_i}{q} + (1 - p_i) \log \frac{1 - p_i}{1 - q} \right], which is the sum of Kullback–Leibler divergences D_{\text{KL}}(\text{Bern}(p_i) \| \text{Bern}(q)). This yields asymptotic tail bounds of the form P(X \geq k) \approx \exp(-I(k)) for large deviations above the mean, with similar forms for the lower tail. The structure arises from tilting the product measure to match the target mean k, providing a precise exponential rate beyond uniform moment-generating-function methods. Recent works since 2015 have developed improved tail bounds tailored to sparse regimes, where many p_i are near 0 or 1, leading to distributions effectively supported on a small number of points relative to n. In such cases, standard bounds like Hoeffding's can be loose because they overestimate the variance; specialized inequalities exploit the effective sparsity to achieve tighter exponential rates, often reducing the constant factors by incorporating the actual third moments or using recursive decompositions. These refinements are particularly useful in high-dimensional reliability models with heterogeneous failure probabilities clustered near extremes.
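
The Berry-Esseen-type bound quoted above can be compared against the actual worst-case CDF error for a sampled parameter vector; the code below is an illustrative check (random probabilities, convolution-based exact CDF), not a proof of the constant.

```python
import numpy as np
from scipy.stats import norm

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

p = np.random.default_rng(1).uniform(0.05, 0.95, size=200)
mu, sigma = p.sum(), np.sqrt(np.sum(p * (1.0 - p)))

cdf = np.cumsum(pb_pmf(p))                          # exact CDF on {0, ..., n}
k = np.arange(len(p) + 1)
max_dev = np.max(np.abs(cdf - norm.cdf((k - mu) / sigma)))
print("observed max deviation:", max_dev, "quoted bound 0.7915/sigma:", 0.7915 / sigma)
```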

Approximations and relations

Binomial approximation

The Poisson binomial distribution arises as the law of the sum of independent but non-identically distributed Bernoulli random variables with success probabilities p_1, p_2, \dots, p_n. When these probabilities are similar, the distribution can be effectively approximated by a binomial distribution \operatorname{Bin}(n, \bar{p}), where \bar{p} = n^{-1} \sum_{i=1}^n p_i is the average probability. This approximation is justified by the matching means (both have expectation n \bar{p}) and nearly matching variances when the p_i exhibit low variability. Specifically, the variance of the Poisson binomial is \sum p_i (1 - p_i) = n \bar{p} (1 - \bar{p}) - n \sigma_p^2, where \sigma_p^2 = n^{-1} \sum (p_i - \bar{p})^2 is the sample variance of the p_i; thus, the variances coincide exactly when \sigma_p^2 = 0. The accuracy of the approximation is quantified by the total variation distance d_{\mathrm{TV}}, defined as d_{\mathrm{TV}}(P, Q) = \sup_A |P(A) - Q(A)|, where the supremum is over measurable sets A. A simple bound arises from a coupling construction: pair each Bernoulli(p_i) trial with a Bernoulli(\bar{p}) trial, independently across i, yielding d_{\mathrm{TV}}(\mathcal{L}(X), \operatorname{Bin}(n, \bar{p})) \leq \sum_{i=1}^n |p_i - \bar{p}|, since the total variation distance between the individual marginals is |p_i - \bar{p}| and total variation is subadditive over the independent components of the sum. This bound highlights that the error vanishes as the p_i converge to \bar{p}. For clustered probabilities, the error simplifies to O(\max_i |p_i - \bar{p}|). More refined upper and lower bounds for d_{\mathrm{TV}} are provided by Ehm using the Stein-Chen method, establishing equivalence to the Kolmogorov distance in this context. The approximation performs well under the condition that the variance of the p_i is small relative to the binomial variance, specifically \sigma_p^2 / [\bar{p}(1 - \bar{p})] \ll 1. This ensures not only close variances but also alignment of higher moments for moderate n, making the binomial a practical surrogate when the trials are nearly homogeneous, such as in reliability models with similar component rates. The Krawtchouk polynomial expansion further refines the approximation by representing the Poisson binomial as the binomial plus finite signed measures, with error terms controlled by deviations from \bar{p}.
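
For nearly homogeneous probabilities, the total variation distance to \operatorname{Bin}(n, \bar{p}) can be computed exactly and compared with the coupling bound above; the example probabilities below are illustrative.

```python
import numpy as np
from scipy.stats import binom

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

p = np.array([0.28, 0.31, 0.30, 0.33, 0.27, 0.32, 0.29, 0.30])   # clustered around 0.30
n, pbar = len(p), p.mean()

pmf_pb = pb_pmf(p)
pmf_bin = binom.pmf(np.arange(n + 1), n, pbar)
tv = 0.5 * np.sum(np.abs(pmf_pb - pmf_bin))        # total variation distance on {0, ..., n}
print("TV distance:", tv, "coupling bound:", np.sum(np.abs(p - pbar)))
```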

Poisson and normal approximations

The Poisson binomial distribution arises as the sum X = \sum_{i=1}^n X_i of independent Bernoulli random variables X_i with success probabilities p_i, and it admits a Poisson approximation under conditions suitable for modeling rare events. Specifically, when \max_i p_i \to 0 and n \to \infty such that the mean \lambda = \sum_{i=1}^n p_i remains fixed, the distribution of X is well-approximated by a Poisson distribution with parameter \lambda. This regime corresponds to scenarios where events are infrequent and the heterogeneous probabilities are small, allowing the Poisson law to capture the discreteness and rarity effectively. The accuracy of this approximation is quantified using the Stein-Chen method, which provides an explicit error bound in total variation distance: d_{\mathrm{TV}}( \mathcal{L}(X), \mathrm{Po}(\lambda) ) \leq \min\left\{1, \frac{1}{\lambda}\right\} \sum_{i=1}^n p_i^2. This bound, derived for the independent case, highlights that the error decreases as the p_i become smaller, since \sum p_i^2 \leq (\max p_i) \lambda \to 0. The method originates from Chen's 1975 work on Poisson approximation for sums of dependent indicators, which extended and sharpened Stein's approach originally developed for normal approximations to dependent variables. In contrast, for large n where the variance \sigma^2 = \sum_{i=1}^n p_i(1 - p_i) \gg 1, the Poisson binomial distribution converges to a normal distribution N(\mu, \sigma^2) with \mu = \sum_{i=1}^n p_i and the variance given above, by the Lindeberg central limit theorem. This holds because the summands are uniformly bounded and \sigma^2 \to \infty, which ensures the Lindeberg condition for sums of independent non-identical random variables. The normal approximation is particularly effective in the central region, providing a continuous surrogate for large-scale heterogeneous trials. To improve accuracy beyond the basic normal form, especially to account for asymmetry, an Edgeworth expansion incorporates higher-order cumulants such as skewness. The leading correction term involves the third cumulant \kappa_3 = \sum_{i=1}^n p_i (1 - p_i) (1 - 2 p_i), yielding an expansion of the form P\left( \frac{X - \mu}{\sigma} \leq x \right) = \Phi(x) - \frac{\kappa_3}{6 \sigma^3} (x^2 - 1) \phi(x) + O\left( \frac{1}{\sigma^2} \right), where \Phi and \phi are the standard normal CDF and PDF, respectively; this skewness adjustment is valuable when the p_i vary moderately.
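
The two regimes can be illustrated numerically: small p_i favour the Poisson approximation (with the Stein-Chen bound on its error), while moderate p_i and large variance favour the normal approximation; the parameter choices below are assumptions for this sketch.

```python
import numpy as np
from scipy.stats import poisson, norm

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

# Rare-event regime: small heterogeneous p_i, Poisson(lambda) approximation.
p = np.random.default_rng(2).uniform(0.001, 0.02, size=500)
lam = p.sum()
k = np.arange(len(p) + 1)
tv = 0.5 * np.sum(np.abs(pb_pmf(p) - poisson.pmf(k, lam)))
print("TV to Poisson:", tv, "Stein-Chen bound:", min(1.0, 1.0 / lam) * np.sum(p ** 2))

# Central regime: moderate p_i and large variance, normal approximation.
p2 = np.random.default_rng(3).uniform(0.2, 0.8, size=500)
mu, sigma = p2.sum(), np.sqrt(np.sum(p2 * (1.0 - p2)))
cdf = np.cumsum(pb_pmf(p2))
print("max CDF error vs normal:", np.max(np.abs(cdf - norm.cdf((k - mu) / sigma))))
```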

Computational approaches

Exact methods

Exact methods for computing the probability mass function (PMF) and cumulative distribution function (CDF) of the Poisson binomial distribution rely on direct evaluation of its defining structure as the sum of independent Bernoulli random variables with heterogeneous success probabilities p_1, p_2, \dots, p_n. These approaches guarantee precision without approximation error, making them suitable for moderate n (typically n \leq 100), though their computational cost grows with n. Key techniques include dynamic programming, coefficient expansions of the probability generating function, and discrete Fourier transform (DFT) inversion of the characteristic function. Dynamic programming provides an efficient recursive algorithm to compute the exact PMF. Define P_k(j) as the probability that the sum of the first k trials equals j, with initial conditions P_0(0) = 1 and P_0(j) = 0 for j \neq 0. The recursion is P_k(j) = (1 - p_k) P_{k-1}(j) + p_k P_{k-1}(j-1), for k = 1, \dots, n and j = 0, \dots, k, where terms with negative indices are zero. This builds the full PMF P_n(j) iteratively, requiring O(n^2) time and O(n) space with careful implementation. The CDF follows by cumulative summation. This recursive approach has roots in reliability analysis, with early formulations appearing in works like those of Barlow and Heidtmann (1984). A refined version was presented by Chen and Liu (1997), emphasizing its exactness for the PMF. The probability generating function (PGF) offers another exact route through coefficient extraction. The PGF of the Poisson binomial distribution is the product G(x) = \prod_{i=1}^n (1 - p_i + p_i x), where the coefficient of x^j in G(x) is precisely P_n(j), the PMF at j. To compute these coefficients exactly, repeated convolution of the individual Bernoulli PGFs can be used, which aligns closely with the dynamic programming approach and also achieves O(n^2) time complexity via sequential multiplication and truncation. This method leverages the convolutional structure inherent to the sum of independent variables and has been detailed in foundational treatments since the 1990s. For faster computation when evaluating the PMF or CDF at multiple points, a third approach exploits the characteristic function. The characteristic function is \phi(t) = \prod_{i=1}^n (1 - p_i + p_i e^{it}), and the PMF can be recovered via inversion: P_n(j) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itj} \phi(t) \, dt. Discretizing this integral over a grid of m = n+1 points and applying the DFT yields an exact expression for the PMF, as derived by Fernández and Williams (2010). Using the fast Fourier transform (FFT) for the DFT enables full PMF computation in O(n \log n) time. The CDF can then be obtained by summation. This approach was advanced for practical use by Hong (2013), who introduced the DFT-characteristic function (DFT-CF) method for efficient CDF evaluation. More recent methods, such as ShiftConvolvePoibin (Babbi et al., 2021), enable fast exact tail computations using shifted convolutions. These methods trace their computational foundations to early dynamic programming applications in reliability, with significant later refinements through Fourier techniques. Modern implementations appear in statistical software post-2010, such as the R package poibin (Hong, 2013), which incorporates recursive and DFT-based exact routines, and the Python library poibin (Straka, 2016), which provides analogous functions for PMF and CDF computation.
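
The two O(n^2) exact routes described above, the dynamic-programming recursion and coefficient extraction from the PGF, can be written in a few lines and cross-checked against each other; this is an illustrative sketch rather than the code of any cited package.

```python
import numpy as np

def pb_pmf_recursive(p):
    """Dynamic programming on P_k(j) = (1 - p_k) P_{k-1}(j) + p_k P_{k-1}(j-1); O(n^2) time, O(n) space."""
    P = np.zeros(len(p) + 1)
    P[0] = 1.0
    for k, pk in enumerate(p, start=1):
        for j in range(k, 0, -1):          # descend so P_{k-1}(j-1) has not been overwritten yet
            P[j] = (1.0 - pk) * P[j] + pk * P[j - 1]
        P[0] *= (1.0 - pk)
    return P

def pb_pmf_pgf(p):
    """Coefficients of the PGF prod_i (1 - p_i + p_i x) via repeated polynomial convolution."""
    coeffs = np.array([1.0])
    for pi in p:
        coeffs = np.convolve(coeffs, [1.0 - pi, pi])
    return coeffs

p = [0.1, 0.45, 0.72, 0.3, 0.95]
print(np.allclose(pb_pmf_recursive(p), pb_pmf_pgf(p)))   # True: the two exact methods agree
```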

Approximation algorithms

Approximation algorithms for the Poisson binomial distribution are essential for computing the probability mass function (PMF) or cumulative distribution function (CDF) when the number of trials n is large, as exact methods become computationally prohibitive. These methods trade precision for efficiency, providing controlled error bounds suitable for applications requiring rapid evaluation. The saddlepoint approximation refines the normal approximation by utilizing the cumulant generating function (CGF) of the distribution, defined as K(t) = \sum_{i=1}^n \log(1 - p_i + p_i e^t), where the p_i are the success probabilities. To approximate the PMF at a point k, one solves for the saddlepoint \hat{t} satisfying K'(\hat{t}) = k, then substitutes into the saddlepoint formula for the PMF: \hat{f}(k) \approx \frac{1}{\sqrt{2\pi K''(\hat{t})}} \exp\left(K(\hat{t}) - \hat{t} k\right), with the CDF approximated via integration or related expressions; this yields a relative error of O(1/n). Advances in the 2000s have extended saddlepoint methods to reliability engineering, enabling scalable computations for systems up to n = 10^6 components at O(n) time per query by efficient numerical solution of the saddlepoint equation. Recursive convolution approximates the distribution by iteratively computing partial convolutions of the Bernoulli PMFs, starting from the first trial and accumulating products up to a desired precision threshold, with base complexity O(n^2) that can be reduced via early stopping when contributions fall below an error tolerance. This approach leverages the convolution property of independent sums, approximating the full PMF by truncating the recursion at intermediate stages for large n. Monte Carlo simulation estimates the PMF or CDF by generating numerous independent samples of the sum S = \sum_{i=1}^n X_i, where each X_i \sim \text{Bernoulli}(p_i), and constructing histograms or empirical distribution functions from the samples; for instance, the PMF at k is approximated as the proportion of samples equaling k. Variance reduction via importance sampling reweights samples from a tilted distribution to focus on regions of interest, improving efficiency for tail probabilities. The saddlepoint and normal approximations serve as bases for initializing these simulations or bounding errors.
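
A plain Monte Carlo sketch of the tail estimate described above (sample size, seed, and threshold are illustrative assumptions); importance sampling would replace the direct draws with draws from a tilted distribution plus reweighting.

```python
import numpy as np

def pb_tail_mc(p, k, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(X >= k) by simulating the heterogeneous Bernoulli sum."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    draws = rng.random((n_samples, len(p))) < p     # one row of Bernoulli outcomes per sample
    counts = draws.sum(axis=1)
    return float(np.mean(counts >= k))

p = np.random.default_rng(4).uniform(0.0, 0.3, size=200)
threshold = int(p.sum()) + 15                       # a point in the upper tail
print(pb_tail_mc(p, threshold))
```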

Applications

Reliability engineering

In reliability engineering, the Poisson binomial distribution models the number of functioning components in a system of independent but heterogeneous components, each with its own reliability probability p_i, enabling precise assessment of overall system reliability when component reliabilities differ. This is particularly valuable for systems where assuming identical reliabilities, as in the binomial model, would be unrealistic. For a series system, which fails if at least one component fails, the system reliability is the probability that all n components succeed, given by P(X = n), where X follows the Poisson binomial distribution. Consequently, the probability of system failure is 1 - P(X = n). This allows engineers to compute exact unreliability metrics for systems like electrical circuits or structural chains, where each link has distinct failure probabilities based on material or environmental factors. In k-out-of-n systems, which operate if at least k components function, the system reliability is P(X \geq k) = \sum_{j=k}^n P(X = j), with the probabilities derived from the Poisson binomial distribution. Such systems are common in redundant designs, such as aircraft control units or power grids, and computations often rely on dynamic programming techniques for efficiency. Bounds and stochastic orderings further aid in comparing system reliabilities under varying component parameters. Maintenance models for degrading components incorporate time-varying reliabilities p_i(t), reflecting wear, environmental exposure, or operational stress over time. The Poisson binomial distribution then predicts cumulative system failures by summing independent outcomes adjusted for each component's evolving state. For example, in forecasting field failures for a fleet of high-voltage power transformers under left-truncated and right-censored data from staggered installations and warranties, the total number of failures follows a Poisson binomial distribution, supporting decisions on maintenance schedules and replacements.
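
A brief sketch of the k-out-of-n computation; the component reliabilities are illustrative values, and the PMF helper is the convolution recursion used earlier in this article.

```python
import numpy as np

def pb_pmf(p):
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf

def k_out_of_n_reliability(p, k):
    """P(X >= k): the system works if at least k of its heterogeneous components work."""
    return float(pb_pmf(p)[k:].sum())

component_reliabilities = [0.99, 0.97, 0.95, 0.98, 0.90]   # illustrative values
print("series (5-out-of-5):", k_out_of_n_reliability(component_reliabilities, 5))
print("3-out-of-5:", k_out_of_n_reliability(component_reliabilities, 3))
```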

Actuarial science

In actuarial science, the Poisson binomial distribution is used to model aggregate insurance claims from heterogeneous policyholders, each with distinct claim probabilities. This allows for more accurate pricing and reserving by accounting for variability in risk profiles, unlike models that assume a uniform claim probability. For instance, it supports the calculation of ruin probabilities in portfolios with diverse risks.

Finance

In finance, the distribution applies to assessing default risk in credit portfolios, where each obligor or loan has its own default probability p_i. The number of defaults then follows a Poisson binomial distribution, aiding value-at-risk computations and related portfolio risk assessments under heterogeneous conditions.

Machine learning

In machine learning, the Poisson binomial distribution arises in distribution learning from heterogeneous data sources, such as ensemble methods whose base classifiers have varying accuracies. It facilitates better uncertainty estimation and calibration in models dealing with non-i.i.d. outcomes.

Statistical inference

Statistical inference for the Poisson binomial distribution involves estimating the success probabilities p_i for i = 1, \dots, n independent trials, testing hypotheses about these parameters, and constructing confidence intervals, typically based on multiple independent observations of the sum S = \sum_{i=1}^n X_i. The probability mass function of S is a combinatorial sum over up to 2^n terms, complicating direct likelihood evaluation and optimization, especially when individual trial outcomes X_i are unobserved and only the aggregate sum is available. Maximum likelihood estimation (MLE) of the p_i parameters is challenging due to the intractable likelihood, but numerical methods such as the expectation-maximization (EM) algorithm or direct numerical optimization can be employed to approximate the MLE, particularly with grouped or incomplete data where trial-level details are unavailable. The EM algorithm iteratively imputes the missing trial outcomes in the E-step and updates parameter estimates in the M-step to maximize the expected complete-data log-likelihood. For instance, in applications like list experiments, where responses are aggregated to protect privacy, the EM algorithm facilitates MLE by treating unobserved individual responses as latent variables. Gradient-based or other optimization techniques can directly maximize the observed-data likelihood when feasible, though they may require careful initialization to avoid local optima. The method of moments provides a simpler approach by equating the sample mean to the theoretical mean \mu = \sum_{i=1}^n p_i, yielding an estimator \hat{\mu} = \bar{S} for the expected total number of successes; however, the individual p_i remain underidentified without additional constraints or higher moments, as infinitely many p_i vectors produce the same mean and variance. The sample variance can estimate \sigma^2 = \sum_{i=1}^n p_i (1 - p_i), but solving for unique p_i requires further assumptions, limiting this method to aggregate parameters. Hypothesis testing in the Poisson binomial model often focuses on homogeneity of the p_i, such as testing H_0: p_1 = \dots = p_n = p (reducing to a binomial model) against the alternative of heterogeneous probabilities. Likelihood ratio tests compare the maximized likelihood under the full Poisson binomial model to that under the constrained binomial model, with the test statistic approximately following a chi-squared distribution under H_0, though boundary issues may necessitate adjustments for small samples. Bootstrap methods are particularly useful for constructing confidence intervals on the cumulative distribution function (CDF) of S, by resampling the observed sums to approximate the sampling distribution of the empirical CDF and deriving intervals that account for parameter uncertainty. Bayesian approaches to inference, prominent in the recent literature, often employ Dirichlet priors on the vector of p_i to incorporate prior information and handle the high-dimensional parameter space, enabling posterior inference via Markov chain Monte Carlo (MCMC) methods. The Dirichlet prior is conjugate to the multinomial likelihood arising from the complete data, facilitating updates when treating the unobserved trials as latent; for example, in aggregate survey analysis, a Dirichlet-Poisson-binomial model corrects for bias by integrating over the latent individual responses. However, in high dimensions (n > 100), Bayesian inference faces significant challenges, including computational intractability of the posterior due to the growth in model complexity, sensitivity to prior specification, and difficulties in achieving MCMC convergence, often requiring dimension-reducing assumptions or approximate methods like variational inference.
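
The method-of-moments point made above, that only the aggregate parameters are identified from repeated observations of S, can be illustrated with simulated data; the true probabilities, sample size, and seed below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
p_true = rng.uniform(0.1, 0.9, size=20)                       # unknown in practice
S = (rng.random((5000, len(p_true))) < p_true).sum(axis=1)    # 5000 observations of the aggregate sum

mu_hat = S.mean()            # estimates mu = sum_i p_i
var_hat = S.var(ddof=1)      # estimates sigma^2 = sum_i p_i (1 - p_i)
print(mu_hat, p_true.sum())
print(var_hat, float(np.sum(p_true * (1 - p_true))))
# The individual p_i are not recoverable from S alone: many vectors share these two moments.
```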

References

  1. [1] The Poisson Binomial Distribution - Old & New. Project Euclid.
  2. [2] On computing the distribution function for the Poisson binomial ...
  3. [3] The Poisson Binomial Distribution - Old & New. Columbia University (PDF).
  4. [4] JIA 85 (1959) 0165-0210 (PDF).
  5. [5] arXiv:1908.10024v1 [math.PR], 27 Aug 2019 (PDF).
  6. [6] Using Poisson Binomial GLMs to Reveal Voter Preferences. arXiv, 4 Feb 2018 (PDF).
  7. [7] Sparse covers for sums of indicators. Probability Theory and Related Fields, 2 Nov 2014.
  9. [9] On the number of successes in independent trials. Y. H. Wang, Concordia University.
  12. [12] Bernoulli distribution | Properties, proofs, exercises. StatLect.
  13. [13] Cumulant generating function | Formula, derivatives, proofs. StatLect.
  14. [14] A proof of the Shepp-Olkin entropy concavity conjecture.
  15. [15] Entropy of the Sum of Independent Bernoulli Random Variables and ...
  16. [16] On the Entropy of Sums of Bernoulli Random Variables via the Chen ... (PDF), 21 Sep 2012.
  17. [17] Expressions for the Entropy of Binomial-Type Distributions. arXiv, 21 Jul 2018 (PDF).
  18. [18] A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based ... Herman Chernoff.
  19. [19] Chernoff bounds, and some applications (PDF), 21 Feb 2015.
  20. [20] Old and New Concentration Inequalities. UCSD Math (PDF).
  21. [21] The Poisson Binomial Distribution - Old & New. NSF-PAR (PDF).
  22. [22] Improved Inequalities for the Poisson and Binomial Distribution and ..., 23 Dec 2013.
  23. [23] Binomial approximation to the Poisson binomial distribution.
  24. [24] An approximation theorem for the Poisson binomial distribution. Lucien Le Cam. Pacific J. Math. 10(4): 1181-1197 (1960).
  25. [25] Binomial Approximation to the Poisson Binomial Distribution.
  26. [26] Poisson Approximation for Dependent Trials. Project Euclid.
  27. [27] Poisson Approximation and the Chen-Stein Method. USC Dornsife (PDF).
  28. [28] Stein-Chen Method for Poisson Approximation (PDF).
  29. [29] Multivariate Poisson–Binomial approximation using Stein's method ... (PDF).
  30. [30] tsakim/poibin: Poisson Binomial Probability Distribution for Python.
  31. [31] ML estimation in the Poisson binomial distribution with grouped data ...
  32. [32] The Likelihood Ratio Test for Poisson versus Binomial Distributions.
  33. [33] Bias correction and Bayesian analysis of aggregate counts in SAGE ..., 3 Feb 2010.
  34. [34] High-Dimensional Poisson Structural Equation Model Learning via ... (PDF).