Poisson binomial distribution
The Poisson binomial distribution is a discrete probability distribution that arises as the sum of a fixed number of independent Bernoulli random variables, each with its own success probability p_i (where 0 \leq p_i \leq 1), generalizing the classical binomial distribution in which all success probabilities are equal.[1][2] It is named after the French mathematician Siméon Denis Poisson, who introduced it in 1837 as a heterogeneous extension of the binomial model for analyzing jury decisions and moral probabilities; the distribution has since been studied for its combinatorial and probabilistic properties, with modern surveys highlighting its connections to polynomial geometry and approximation theory.[3][4] The probability mass function of a Poisson binomial random variable X = \sum_{i=1}^n Y_i, where each Y_i \sim \text{Bernoulli}(p_i), is given by

P(X = k) = \sum_{\substack{A \subseteq \{1, \dots, n\} \\ |A| = k}} \prod_{i \in A} p_i \prod_{i \notin A} (1 - p_i),

for k = 0, 1, \dots, n, where the sum runs over all subsets A of exactly k indices, i.e., all possible sets of successful trials.[1]
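To make the defining sum concrete, the following Python sketch (an illustration written for this article, not code from the cited sources; the function name and the example probabilities are chosen here for exposition) evaluates P(X = k) by enumerating every size-k subset of indices, exactly as in the formula above. Because the number of subsets grows exponentially in n, this is practical only for small n.

```python
from itertools import combinations

def poisson_binomial_pmf_exact(p, k):
    """P(X = k) computed directly from the definition: sum over all size-k
    subsets A of {0, ..., n-1} of prod_{i in A} p[i] * prod_{i not in A} (1 - p[i]).
    Exponential in n, so only suitable for small n."""
    n = len(p)
    total = 0.0
    for subset in combinations(range(n), k):
        chosen = set(subset)
        term = 1.0
        for i in range(n):
            term *= p[i] if i in chosen else 1.0 - p[i]
        total += term
    return total

# Example with three heterogeneous trials (illustrative values):
p = [0.2, 0.5, 0.9]
pmf = [poisson_binomial_pmf_exact(p, k) for k in range(len(p) + 1)]
assert abs(sum(pmf) - 1.0) < 1e-9                                    # probabilities sum to 1
assert abs(sum(k * q for k, q in enumerate(pmf)) - sum(p)) < 1e-9    # mean equals sum(p_i)
```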
Its mean is \mu = \sum_{i=1}^n p_i and its variance is \sigma^2 = \sum_{i=1}^n p_i(1 - p_i), which reduce to the familiar binomial values np and np(1 - p) when all p_i are equal to p.[3] The distribution has the strong Rayleigh property, meaning its probability generating function has only real roots: it factors as \prod_{i=1}^n (1 - p_i + p_i z), so every root is real and non-positive, which aids in bounding tail probabilities and in stochastic comparisons.[1] For large n, the distribution can be approximated by a normal distribution with the above mean and variance, or by a Poisson distribution with rate \sum_{i=1}^n p_i under rare-event conditions (small p_i), with error bounds established in classical results such as Le Cam's theorem.[2] Computing the exact distribution function is nontrivial because the defining sum has exponentially many terms, but it can be handled efficiently via a discrete Fourier transform of the characteristic function.[2] Applications span reliability engineering (e.g., system failure probabilities with heterogeneous components), survey sampling (e.g., nonresponse adjustments), actuarial science (e.g., insurance claim aggregates), finance (e.g., default-risk portfolios), and machine learning (e.g., distribution learning from heterogeneous data).[3][2]
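As a concrete illustration of the Fourier-transform approach mentioned above, the sketch below (an assumption of this article, not an implementation from the cited references; NumPy and the function name are choices made here) evaluates the probability generating function \prod_i (1 - p_i + p_i z) at the (n+1)-th roots of unity and recovers the full probability mass function with a single discrete Fourier transform, in roughly O(n^2) time for the product evaluations.

```python
import numpy as np

def poisson_binomial_pmf_dft(p):
    """Exact PMF of X = sum of independent Bernoulli(p_i), obtained by evaluating
    the probability generating function G(z) = prod_i (1 - p_i + p_i z) at the
    (n+1)-th roots of unity and applying a discrete Fourier transform."""
    p = np.asarray(p, dtype=float)
    n = p.size
    roots = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))      # (n+1)-th roots of unity
    g = np.prod(1.0 - p[:, None] + p[:, None] * roots[None, :], axis=0)
    pmf = np.fft.fft(g).real / (n + 1)    # inverts G(roots[l]) = sum_k P(X=k) * roots[l]**k
    return np.clip(pmf, 0.0, 1.0)         # guard against tiny negative rounding errors

# Agrees with the direct subset-sum definition on small examples:
print(poisson_binomial_pmf_dft([0.2, 0.5, 0.9]))   # approximately [0.04, 0.41, 0.46, 0.09]
```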
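The normal and Poisson approximations follow directly from the mean and variance. The sketch below is a hypothetical helper written for this article (it relies on SciPy; the continuity correction and the example parameters are assumptions of this illustration, not prescriptions from the cited sources) and returns both approximations of a tail probability P(X \leq k).

```python
import numpy as np
from scipy.stats import norm, poisson

def approx_cdf(p, k):
    """Approximate P(X <= k) for a Poisson binomial variable with parameters p:
    a normal approximation with continuity correction, using mu = sum(p_i) and
    sigma^2 = sum(p_i * (1 - p_i)), and a Poisson approximation with rate mu
    (Le Cam's theorem bounds the total l1 error of the latter by 2 * sum(p_i**2))."""
    p = np.asarray(p, dtype=float)
    mu = p.sum()
    sigma = np.sqrt((p * (1.0 - p)).sum())
    normal_cdf = norm.cdf(k + 0.5, loc=mu, scale=sigma)   # continuity-corrected
    poisson_cdf = poisson.cdf(k, mu)
    return normal_cdf, poisson_cdf

# With small success probabilities the Poisson approximation is accurate,
# while the normal approximation improves as n grows.
print(approx_cdf([0.01] * 200 + [0.05] * 100, k=8))
```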