In probability theory, the compound Poisson distribution is the probability distribution of the random sum S = \sum_{i=1}^{N} Y_i, where N follows a Poisson distribution with rate parameter \lambda \geq 0, and the Y_i (for i = 1, 2, \dots) are independent and identically distributed random variables, independent of N, with S = 0 if N = 0.[1][2][3]

This distribution arises naturally in the context of a compound Poisson process, a continuous-time stochastic process \{S(t): t \geq 0\} defined as S(t) = \sum_{i=1}^{N(t)} Y_i, where N(t) is a Poisson process with rate \lambda counting the number of events up to time t, and each event contributes a random size Y_i from a common distribution (often positive for applications like claims).[1][2] The process has stationary and independent increments, meaning the distribution of S(t + h) - S(t) is identical to that of S(h) for any h > 0, and increments over non-overlapping intervals are independent.[1] The compound Poisson distribution at fixed t inherits these properties in a discrete sense and is infinitely divisible, so that for every m it can be expressed as the sum of m i.i.d.
variables.[1]

Key moments of the distribution include the expected value E[S] = \lambda E[Y] and variance \mathrm{Var}(S) = \lambda E[Y^2], where E[Y] and E[Y^2] are the first and second moments of the Y_i distribution, respectively.[1][2][3] The moment-generating function is M_S(s) = \exp(\lambda (M_Y(s) - 1)), where M_Y(s) is the moment-generating function of Y, facilitating computations for convolutions and approximations.[1] Special cases include the standard Poisson distribution when Y_i = 1 almost surely, and the negative binomial distribution when the Y_i follow a logarithmic series distribution.[1]

The compound Poisson distribution is widely applied in actuarial science and risk theory to model aggregate claims or losses, where N represents the number of claims (Poisson-distributed) and each Y_i the claim severity.[2][3] It also appears in queueing theory for batch arrivals, reliability engineering for cumulative damage, and finance for modeling jumps in asset prices.[1][2] Its flexibility in accommodating heavy-tailed severity distributions makes it suitable for rare-event scenarios with variable impacts.[2]
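The moment identities above can be checked numerically. The following minimal Python sketch (parameter choices are illustrative, not from the source) simulates S with exponential jump sizes and compares the sample mean and variance against \lambda E[Y] and \lambda E[Y^2]:

```python
import math
import random

def sample_compound_poisson(lam, draw_jump, rng):
    """One draw of S = sum_{i=1}^N Y_i with N ~ Poisson(lam)."""
    # Knuth's Poisson sampler, adequate for moderate lam
    limit, n, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return sum(draw_jump(rng) for _ in range(n))

rng = random.Random(42)
lam = 3.0
draw_exp = lambda r: r.expovariate(0.5)   # exponential jumps with mean 2

samples = [sample_compound_poisson(lam, draw_exp, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# E[S] = lam * E[Y] = 3 * 2 = 6;  Var(S) = lam * E[Y^2] = 3 * 8 = 24
assert abs(mean - 6.0) < 0.5
assert abs(var - 24.0) < 3.0
```

With exponential jumps of mean 2, E[Y^2] = 8, so the simulation should concentrate around E[S] = 6 and \mathrm{Var}(S) = 24.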
Fundamentals
Definition
The compound Poisson distribution describes the probability distribution of a random sum S = \sum_{i=1}^N X_i, where N follows a Poisson distribution with rate parameter \lambda > 0, and the X_i (for i = 1, 2, \dots) are independent and identically distributed random variables with a common distribution function F, independent of N.[4] The parameter \lambda governs the expected number of terms in the sum, while F specifies the distribution of each individual "jump" size X_i, which may take positive, negative, or mixed values depending on the application.

This construction models scenarios where the total outcome results from a random number of independent events, each contributing a variable amount, such as aggregate claims in insurance or total displacement in a random walk with Poisson arrivals.[5] The jumps X_i need not be restricted to positive values; for instance, they can represent net changes that include both gains and losses, allowing the distribution to capture signed increments in stochastic models.[6]

The term "compound Poisson distribution" was coined by William Feller in 1943 as part of his work on "contagious" distributions, which encompass compound forms arising in stochastic processes with clustering or batch arrivals.[7] The concept has roots in earlier actuarial applications, such as Filip Lundberg's collective risk theory.[8] Feller's formulation emphasized the role of such distributions in generalizing classical Poisson models to account for variable event sizes within the framework of infinitely divisible laws.
Parameterization
The compound Poisson distribution is standardly parameterized by a rate parameter \lambda > 0, which denotes the intensity of the underlying Poisson process governing the number of jumps, and the cumulative distribution function F of the independent and identically distributed jump sizes X_i, where the jumps have mean \mu = \mathbb{E}[X_i] and variance \sigma^2 = \mathrm{Var}(X_i). The case \lambda = 0 results in a degenerate distribution with S = 0 almost surely.[9][10]

In the framework of Lévy processes, an alternative parameterization employs the Lévy measure \nu(dx) = \lambda F(dx), which captures the expected rate of jumps of size in the infinitesimal interval dx and is particularly useful for specifying the infinitesimal generator of the associated stochastic process.[11]

The support of the jump distribution F varies by application: it is typically defined on the real line \mathbb{R} for general bilateral jumps allowing positive and negative increments, on the non-negative integers \mathbb{Z}^+ for discrete counting models, or on the positive reals \mathbb{R}^+ for subordinators used in time-change representations (Sato, 1999).[11]

The parameters must satisfy \lambda \geq 0 to ensure non-negativity of the Poisson intensity, and F must be a proper probability distribution integrating to 1 over its support to preserve probabilistic normalization.[11][10]
Mathematical Properties
Generating Functions
The compound Poisson random variable S is defined as S = \sum_{i=1}^N X_i, where N follows a Poisson distribution with rate parameter \lambda > 0, the X_i are independent and identically distributed random variables independent of N, and each X_i has a distribution with probability generating function G_X(s) = \mathbb{E}[s^{X_i}].[12]

The probability generating function of S is derived using the law of total expectation:

G_S(s) = \mathbb{E}[s^S] = \mathbb{E}\left[ \mathbb{E}[s^S \mid N] \right] = \mathbb{E}\left[ G_X(s)^N \right] = G_N\left( G_X(s) \right),

where G_N(u) = \exp(\lambda (u - 1)) is the probability generating function of the Poisson random variable N. Substituting yields

G_S(s) = \exp\left( \lambda \left( G_X(s) - 1 \right) \right).

This form holds for |s| \leq 1 when the X_i are nonnegative integer-valued, ensuring convergence.[12]

Similarly, the moment-generating function of S is obtained by conditioning on N:

M_S(t) = \mathbb{E}[e^{tS}] = \mathbb{E}\left[ \mathbb{E}[e^{tS} \mid N] \right] = \mathbb{E}\left[ M_X(t)^N \right] = M_N\left( \log M_X(t) \right),

with M_N(u) = \exp(\lambda (e^u - 1)) for the Poisson N. This simplifies to

M_S(t) = \exp\left( \lambda \left( M_X(t) - 1 \right) \right),

defined for t in the domain where M_X(t) exists and is positive.[12]

The characteristic function of S follows analogously:

\phi_S(t) = \mathbb{E}[e^{itS}] = \mathbb{E}\left[ \mathbb{E}[e^{itS} \mid N] \right] = \mathbb{E}\left[ \phi_X(t)^N \right] = G_N\left( \phi_X(t) \right),

where G_N(u) = \exp(\lambda (u - 1)) is the PGF of N, analytically continued to the complex value \phi_X(t) with |\phi_X(t)| \leq 1. This yields

\phi_S(t) = \exp\left( \lambda \left( \phi_X(t) - 1 \right) \right),

where \phi_X(t) = \mathbb{E}[e^{itX_i}] and t \in \mathbb{R}.
This exponential form demonstrates the infinite divisibility of the compound Poisson distribution, as \phi_S(t) = \left[ \exp\left( \frac{\lambda}{m} (\phi_X(t) - 1) \right) \right]^m for any positive integer m, corresponding to the characteristic function of the sum of m i.i.d. copies each with rate \lambda/m.[6]
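The composition G_S = G_N \circ G_X can be illustrated numerically. In the hedged Python sketch below (parameter values are hypothetical), Bernoulli(p) jumps give G_X(s) = 1 - p + ps, and the compound PGF \exp(\lambda(G_X(s) - 1)) coincides with the PGF of a \mathrm{Poisson}(\lambda p) variable:

```python
import math

def pgf_poisson(lam, s):
    # PGF of a Poisson(lam) random variable
    return math.exp(lam * (s - 1.0))

def pgf_compound(lam, pgf_jump, s):
    # G_S(s) = G_N(G_X(s)) = exp(lam * (G_X(s) - 1))
    return math.exp(lam * (pgf_jump(s) - 1.0))

lam, p = 2.0, 0.3
bern = lambda s: (1 - p) + p * s   # PGF of a Bernoulli(p) jump

# exp(lam*(G_X(s) - 1)) = exp(lam*p*(s - 1)): the PGF of Poisson(lam*p)
for s in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert abs(pgf_compound(lam, bern, s) - pgf_poisson(lam * p, s)) < 1e-12
```

The agreement reflects the identity \exp(\lambda((1 - p + ps) - 1)) = \exp(\lambda p (s - 1)), the thinning property of the Poisson distribution.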
Moments and Cumulants
The mean of a compound Poisson random variable S = \sum_{i=1}^N X_i, where N \sim \mathrm{Poisson}(\lambda) and the X_i are i.i.d. with mean \mu = \mathbb{E}[X_i] independent of N, is given by

\mathbb{E}[S] = \lambda \mu.

This follows from the law of total expectation, \mathbb{E}[S] = \mathbb{E}[\mathbb{E}[S \mid N]] = \mathbb{E}[N \mu] = \lambda \mu.

The variance is

\mathrm{Var}(S) = \lambda (\sigma^2 + \mu^2) = \lambda \mathbb{E}[X_i^2],

where \sigma^2 = \mathrm{Var}(X_i). This result is obtained using the law of total variance, \mathrm{Var}(S) = \mathbb{E}[\mathrm{Var}(S \mid N)] + \mathrm{Var}(\mathbb{E}[S \mid N]) = \mathbb{E}[N] \sigma^2 + \mathrm{Var}(N) \mu^2 = \lambda \sigma^2 + \lambda \mu^2, or equivalently as the second cumulant of S.

The cumulants \kappa_n(S) of S for n \geq 1 are

\kappa_n(S) = \lambda \mathbb{E}[X_i^n].

These arise from the cumulant-generating function \log \mathbb{E}[e^{tS}] = \lambda (\mathbb{E}[e^{t X_i}] - 1), whose Taylor expansion coefficients yield the cumulants as the raw moments of the jump size distribution scaled by the Poisson rate \lambda. All cumulants of S are thus determined solely by the moments of the X_i, reflecting the structure of the compound Poisson as a Lévy process with finite jump activity.

Higher-order moments of S can be expressed in terms of its cumulants using Faà di Bruno's formula, which relates the raw moments to complete Bell polynomials in the cumulants, or via the cumulant expansion for central moments. For instance, the third central moment equals \kappa_3(S), while the fourth central moment is \kappa_4(S) + 3 [\kappa_2(S)]^2. These relations highlight how the moment structure of S inherits and scales the moment properties of the compounding distribution.[13]

The form of the cumulants \kappa_n(S) = \lambda \mathbb{E}[X_i^n] for all n \geq 1 also establishes the infinite divisibility of the compound Poisson distribution.
Specifically, the cumulant-generating function can be scaled by any \alpha \in (0,1) to \alpha \log \mathbb{E}[e^{tS}] = \lambda' (\mathbb{E}[e^{t X_i}] - 1) with \lambda' = \alpha \lambda > 0, corresponding to the cumulants of an \alpha-th root distribution that remains a valid compound Poisson with the same jump measure but adjusted rate. This property aligns with the general characterization of infinitely divisible distributions via their Lévy-Khintchine representation.
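The cumulant formula \kappa_n(S) = \lambda \mathbb{E}[X_i^n] can be verified exactly in the degenerate case X_i \equiv c, where S = cN and \kappa_n(S) = \lambda c^n. A short Python check (parameter values chosen for illustration) computes the first three cumulants from truncated raw moments:

```python
import math

def poisson_pmf(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)

def raw_moment(lam, c, order, nmax=60):
    # E[S^order] for S = c*N, N ~ Poisson(lam): jumps degenerate at c
    return sum(poisson_pmf(lam, n) * (c * n) ** order for n in range(nmax + 1))

lam, c = 2.0, 3.0
m1 = raw_moment(lam, c, 1)
m2 = raw_moment(lam, c, 2)
m3 = raw_moment(lam, c, 3)

# cumulants from raw moments
k1 = m1
k2 = m2 - m1**2
k3 = m3 - 3 * m1 * m2 + 2 * m1**3

# kappa_n(S) = lam * E[X^n] = lam * c**n for jumps degenerate at c
assert abs(k1 - lam * c) < 1e-9       # 6
assert abs(k2 - lam * c**2) < 1e-9    # 18
assert abs(k3 - lam * c**3) < 1e-9    # 54
```

Since all Poisson cumulants equal \lambda, scaling by c multiplies \kappa_n by c^n, which matches \lambda \mathbb{E}[X_i^n] term by term.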
Special Cases
Discrete Compound Poisson Distribution
The discrete compound Poisson distribution arises in the compound Poisson framework when the jump sizes X_i are independent and identically distributed random variables supported on the non-negative integers \mathbb{Z}_{\geq 0} = \{0, 1, 2, \dots \}, with probability mass function p_k = P(X_i = k) for k \geq 0 satisfying \sum_{k=0}^\infty p_k = 1.[14] The total S = \sum_{i=1}^N X_i, where N \sim \mathrm{Poisson}(\lambda) is independent of the X_i, then follows a discrete compound Poisson distribution, which is infinitely divisible and supported on a lattice spanning the non-negative integers.[15] According to Feller's characterization, every infinitely divisible probability distribution on the non-negative integers is a discrete compound Poisson distribution.[16]

The probability mass function of S is given by

P(S = s) = \sum_{n=0}^\infty e^{-\lambda} \frac{\lambda^n}{n!} \, P\left( \sum_{i=1}^n X_i = s \right), \quad s = 0, 1, 2, \dots,

where the inner term is the probability mass function of the n-fold convolution of the distribution of X_i, with the convention that the empty sum (n = 0) equals 0 with probability 1.[14] This form directly mixes the Poisson probabilities with the convolutions of the discrete jump distribution, emphasizing the compound structure and the resulting lattice support.

In discrete settings, direct evaluation of this infinite sum can be computationally intensive due to the convolutions, but the Panjer recursion offers an efficient alternative by computing the mass function recursively from lower-order terms.[17] Introduced by Panjer for compound distributions including the Poisson case with discrete severity, the recursion exploits this structure to calculate P(S = s) iteratively for successive integer values of s, avoiding explicit summation over n and making it suitable for numerical implementation in fields requiring precise tail probabilities.[17]

A specific example is the compound Poisson distribution with Bernoulli jumps, where each
X_i \sim \mathrm{Bernoulli}(p) for 0 < p < 1, so P(X_i = 1) = p and P(X_i = 0) = 1 - p. In this case, S follows a \mathrm{Poisson}(\lambda p) distribution, as the sum corresponds to a thinned Poisson process where each event is retained independently with probability p.[18] This special case is distinct from the Poisson binomial distribution, which involves the sum of a fixed number of independent but possibly heterogeneous Bernoulli random variables.[14]

Another prominent special case is the negative binomial distribution, obtained when the jump sizes X_i follow a logarithmic series distribution with parameter \theta (0 < \theta < 1), which has probability mass function P(X_i = k) = -\frac{\theta^k}{k \ln(1 - \theta)} for k = 1, 2, \dots. In this setup, the compound sum S follows a negative binomial distribution with parameters related to \lambda and \theta; in one common parameterization, the number of successes is r = -\frac{\lambda}{\ln(1 - \theta)}.[18]
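A compact illustration of the Panjer recursion for a compound Poisson total with nonnegative-integer severities: starting from P(S = 0) = e^{-\lambda(1 - f_0)}, each P(S = s) follows from lower-order values. The Python sketch below (parameter values are illustrative) checks the result against the Bernoulli thinning identity S \sim \mathrm{Poisson}(\lambda p):

```python
import math

def panjer_compound_poisson(lam, f, smax):
    """Panjer recursion for S = sum of N i.i.d. severities, N ~ Poisson(lam).
    f[k] = P(X = k) for k = 0..len(f)-1 (nonnegative-integer severities)."""
    g = [math.exp(-lam * (1.0 - f[0]))]          # P(S = 0)
    for s in range(1, smax + 1):
        total = 0.0
        for k in range(1, min(s, len(f) - 1) + 1):
            total += k * f[k] * g[s - k]         # recursion kernel
        g.append(lam / s * total)                # P(S = s) = (lam/s) * sum
    return g

# Bernoulli(p) severities thin the Poisson process: S ~ Poisson(lam * p)
lam, p = 4.0, 0.3
g = panjer_compound_poisson(lam, [1 - p, p], 20)
for s in range(21):
    poiss = math.exp(-lam * p) * (lam * p) ** s / math.factorial(s)
    assert abs(g[s] - poiss) < 1e-12
```

Each pmf value is obtained in O(s \cdot m) operations for severity support of size m, with no explicit convolutions or summation over n.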
Continuous Compound Poisson Distributions
The continuous compound Poisson distribution arises when the jump sizes in a compound Poisson random variable follow a continuous probability density function f, typically supported on the positive reals or the full real line depending on the application. Let N \sim \text{Poisson}(\lambda) denote the number of jumps, and let X_1, X_2, \dots, X_N be i.i.d. continuous random variables with density f, independent of N. The total sum S = \sum_{i=1}^N X_i (with S = 0 if N = 0) then has a continuous compound Poisson distribution, with generalized probability density given by the infinite mixture

f_S(s) = \sum_{n=0}^\infty e^{-\lambda} \frac{\lambda^n}{n!} f^{*n}(s),

where f^{*n} denotes the n-fold convolution of f with itself, and f^{*0}(s) = \delta(s) is the Dirac delta at zero, contributing an atom of mass e^{-\lambda} at s = 0.[18][19] This expression reflects the Poisson-weighted superposition of the densities of sums of n i.i.d. jumps, capturing the inherent mixing of discrete event counts with continuous severities.

A prominent special case occurs when the jump density f is that of a gamma distribution, \text{Gamma}(\alpha, \beta), leading to the Tweedie distribution within the exponential dispersion family for the power variance parameter 1 < p < 2. Specifically, the compound Poisson sum with gamma jumps yields a Tweedie random variable Y satisfying E[Y] = \mu and \text{Var}(Y) = \phi \mu^p, where \phi > 0 is the dispersion parameter, making it suitable for modeling overdispersed positive data with a point mass at zero. This connection was established by Tweedie, who parameterized the model to unify Poisson frequency with gamma severity in a single exponential family framework.
Jørgensen further elaborated on its properties, showing how the Tweedie arises as a reproductive exponential dispersion model with power variance function.

Other notable examples include exponential jumps, where f is exponential with rate \beta (gamma with shape \alpha = 1): the conditional distribution of S given N = n is \text{Gamma}(n, \beta), the unconditional density follows the Poisson mixture above, and the model corresponds to a Tweedie distribution with power parameter p = 1.5.[20] The pure gamma distribution arises in the limit as \alpha \to 0^+ (p \to 2^-) with suitable scaling to eliminate the point mass at zero. For normal jumps, X_i \sim \mathcal{N}(\mu, \sigma^2), the sum S is an infinite mixture of normals centered at n\mu with variance n\sigma^2, weighted by the Poisson probabilities (symmetric when \mu = 0), producing a pure jump process akin to the jump component of a variance-gamma model but without the continuous Brownian motion path; such models are used in financial modeling of Lévy processes.

Computing the density f_S(s) poses numerical challenges due to the infinite sum and the complexity of evaluating the convolutions f^{*n}(s), which involve nested integrals that grow computationally intensive for large \lambda or intricate f. Direct truncation of the sum at finite n can introduce bias, particularly in the tails, while exact convolution via recursive integration is often infeasible for non-closed-form densities. Approximation methods, such as the fast Fourier transform (FFT) applied to the characteristic function \phi_S(t) = \exp(\lambda (\phi_X(t) - 1)) (where \phi_X is the jump characteristic function), enable efficient inversion to recover the density or cumulative distribution function on a grid, with error controlled by discretization and aliasing mitigation techniques.
These FFT-based approaches outperform recursive methods for continuous cases with moderate to high resolution needs, as demonstrated in comparative analyses of computational efficiency.[21][22]
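A common way to operationalize the FFT approach is to discretize the severity onto a lattice and exponentiate the transform pointwise, using \phi_S = \exp(\lambda(\phi_X - 1)). The Python sketch below (the lattice, severity values, and parameters are illustrative assumptions) recovers the aggregate pmf and checks its total mass and mean:

```python
import numpy as np

def aggregate_pmf_fft(lam, severity_pmf, n):
    """Aggregate-loss pmf on {0, ..., n-1} via the characteristic-function
    relation phi_S = exp(lam * (phi_X - 1)), evaluated with the FFT.
    severity_pmf[k] = P(X = k); n should comfortably exceed the bulk of
    S's support so wrap-around (aliasing) error stays negligible."""
    f = np.zeros(n)
    f[: len(severity_pmf)] = severity_pmf
    f_hat = np.fft.fft(f)                  # phi_X on the FFT frequency grid
    g_hat = np.exp(lam * (f_hat - 1.0))    # phi_S = exp(lam * (phi_X - 1))
    return np.fft.ifft(g_hat).real

lam = 2.0
severity = [0.0, 0.5, 0.3, 0.2]            # jumps on {1, 2, 3}, E[X] = 1.7
g = aggregate_pmf_fft(lam, severity, 256)

assert abs(g.sum() - 1.0) < 1e-10                     # proper pmf
mean = float(np.arange(256) @ g)
assert abs(mean - lam * 1.7) < 1e-8                   # E[S] = lam * E[X] = 3.4
```

The same transform-exponentiate-invert pattern applies to a discretized continuous severity, with the grid spacing controlling the discretization error.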
Stochastic Processes
Compound Poisson Process
The compound Poisson process is a continuous-time stochastic process \{S(t), t \geq 0\} defined as S(t) = \sum_{i=1}^{N(t)} X_i, where N(t) is a Poisson process with intensity \lambda > 0, and the X_i are independent and identically distributed random variables with common distribution F, independent of N(t), with S(0) = 0.[1][23] This construction extends the compound Poisson distribution to a dynamic setting, where S(1) follows the compound Poisson distribution with parameters \lambda and F.[1]

The process exhibits independent and stationary increments: for 0 \leq s < t, the increment S(t) - S(s) is independent of \{S(u) : u \leq s\} and follows a compound Poisson distribution with parameters \lambda(t - s) and F.[1][23][24]

Sample paths of the compound Poisson process are right-continuous with left limits (càdlàg) and feature jumps only at the arrival times of the underlying Poisson process, resulting in finitely many jumps over any finite time interval due to the finite activity of the Poisson process.[23][24]

The intensity of the process is governed by \lambda, the rate of the Poisson counting process. The compensator, which is the predictable increasing process making S(t) a martingale when subtracted, is \lambda \mu t, where \mu = \mathbb{E}[X_i]; thus, if the jumps are centered (\mu = 0), the process is a pure jump martingale with no drift term, whereas otherwise it includes a deterministic drift component \lambda \mu t.[23][1]
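Sample paths can be simulated directly from the definition: exponential interarrival times drive the jump epochs, and the path is piecewise constant between jumps. A minimal Python sketch (the jump law and parameters are hypothetical choices) checks \mathbb{E}[S(T)] = \lambda T \mathbb{E}[X] by Monte Carlo:

```python
import random

def sample_path(lam, draw_jump, horizon, rng):
    """Jump times and post-jump levels of a compound Poisson path on [0, horizon]."""
    t, level, times, levels = 0.0, 0.0, [], []
    while True:
        t += rng.expovariate(lam)      # i.i.d. Exp(lam) interarrival times
        if t > horizon:
            break
        level += draw_jump(rng)        # path is piecewise constant between jumps
        times.append(t)
        levels.append(level)
    return times, levels

rng = random.Random(7)
lam, horizon = 2.0, 3.0
draw = lambda r: r.uniform(0.0, 2.0)   # jumps with mean 1, E[X^2] = 4/3

# Monte Carlo check of E[S(T)] = lam * T * E[X] = 2 * 3 * 1 = 6
finals = []
for _ in range(5000):
    _, levels = sample_path(lam, draw, horizon, rng)
    finals.append(levels[-1] if levels else 0.0)
est = sum(finals) / len(finals)
assert abs(est - 6.0) < 0.4
```

The returned jump times and levels give the full càdlàg path: S(t) equals the last recorded level at or before t, and 0 before the first jump.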
Lévy Process Connections
The compound Poisson process is a special case of a Lévy process, characterized by a Lévy triplet with zero Gaussian (diffusion) component and a finite jump measure \nu(dx) = \lambda F(dx), where \lambda > 0 is the intensity of the underlying Poisson process and F is the probability distribution of the jump sizes.[25] This finite activity structure distinguishes it from more general Lévy processes, which may exhibit infinite jump activity, and ensures that the paths are piecewise constant with finitely many jumps in any finite time interval.[26]

The connection to the broader class of Lévy processes is formalized through the Lévy-Khintchine representation of the characteristic function. For a compound Poisson process, the characteristic function at time t takes the form \mathbb{E}[e^{i u X_t}] = \exp\left( t \int_{\mathbb{R}} (e^{i u x} - 1 - i u x \mathbf{1}_{|x|<1}) \nu(dx) \right) up to a drift term, where the Lévy measure \nu = \lambda F is finite, \int_{\mathbb{R}} \nu(dx) = \lambda < \infty.[27] This representation highlights how the compound Poisson process embeds within the infinitely divisible distributions underlying all Lévy processes, with the truncation term i u x \mathbf{1}_{|x|<1} adjusting for small jumps; because the measure here is finite, the exponent reduces to the simpler form \lambda t \int (e^{i u x} - 1) F(dx).[27]

When the jump sizes are positive almost surely, the compound Poisson process becomes a subordinator, a Lévy process with non-decreasing paths.[28] In this case, the Laplace transform is \mathbb{E}[e^{-\theta X_t}] = \exp\left\{ -\lambda t \int_0^\infty (1 - e^{-\theta y}) F(dy) \right\} for \theta \geq 0, reflecting its role in modeling non-negative increments such as cumulative damage or arrival times.[28]

Generalizations of the compound Poisson process extend to infinite activity Lévy processes by allowing the jump measure to become infinite while preserving infinite divisibility, often through limits of
scaled or subordinated versions. For instance, the variance gamma process arises as Brownian motion with drift subordinated by a gamma process, featuring infinitely many small jumps and a Lévy density that captures both positive and negative jumps with finite-variation paths.[29] Similarly, stable processes emerge as limits of compound Poisson approximations via the generalized central limit theorem, exhibiting heavy tails and self-similarity, with infinite variance when the stability index \alpha < 2, thus broadening applications to phenomena with infinitely many small jumps, such as financial returns.[29]
Applications
Risk and Insurance Modeling
In the classical risk model, known as the Cramér-Lundberg model, the aggregate claims process S(t) over time t is a compound Poisson process, where the number of claims N(t) follows a Poisson process with intensity \lambda, and individual claim sizes Y_i are independent and identically distributed according to a severity distribution F with finite mean \mu = \mathbb{E}[Y_i].[30] Thus, S(t) = \sum_{i=1}^{N(t)} Y_i, capturing the randomness in both claim frequency and size typical of non-life insurance portfolios.[30] Common choices for F include the Pareto distribution for modeling heavy-tailed losses in property insurance or the lognormal distribution for bodily injury claims, as these reflect empirical patterns of extreme events.

A key application is the assessment of the ruin probability \psi(u), the likelihood that the insurer's surplus ever falls below zero starting from initial capital u. Under the Cramér-Lundberg model with positive safety loading, Lundberg's inequality provides an exponential upper bound: \psi(u) \leq e^{-R u}, where R > 0 is the adjustment coefficient solving the Lundberg equation \lambda (\hat{m}_Y(R) - 1) = c R, with c the constant premium income rate and \hat{m}_Y(r) the moment generating function of Y_i.[30] For light-tailed claim sizes, the Cramér-Lundberg approximation \psi(u) \approx C e^{-R u} (with constant C < 1) holds asymptotically as u \to \infty, aiding solvency analysis in regulatory contexts like Solvency II.[30]

Premium calculation in this framework relies on the expected aggregate claims, given by \mathbb{E}[S(t)] = \lambda t \mu, to ensure long-term profitability.
The expected value premium principle sets the premium rate \pi = \lambda \mu (1 + \theta), incorporating a loading factor \theta > 0 to cover operational costs and provide a safety margin against variability; the net premium \lambda \mu corresponds to \theta = 0.

For estimating rare ruin events where direct Monte Carlo simulation is inefficient, importance sampling techniques adapt the probability measure to oversample paths leading to ruin, reducing variance while targeting the compound Poisson tail behavior.[31] These methods, often based on an exponential twist at the Lundberg exponent, achieve logarithmic efficiency for light-tailed severities and are implemented in actuarial software for stress testing portfolios.[31]
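The adjustment coefficient can be computed numerically from the Lundberg equation \lambda(\hat{m}_Y(R) - 1) = cR. The Python sketch below (exponential claims with illustrative parameters) solves it by bisection and checks the closed form R = 1/\mu - \lambda/c available in the exponential case:

```python
def mgf_exp(mu, r):
    # MGF of an exponential claim with mean mu, valid for r < 1/mu
    return 1.0 / (1.0 - mu * r)

def adjustment_coefficient(lam, mu, c, tol=1e-12):
    """Solve lam*(M_Y(R) - 1) = c*R for the positive root R in (0, 1/mu).
    h(r) is negative on (0, R) under positive loading and diverges to +inf
    as r -> 1/mu, so a sign-based bisection locates R."""
    h = lambda r: lam * (mgf_exp(mu, r) - 1.0) - c * r
    lo, hi = tol, 1.0 / mu - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam, mu, c = 1.0, 1.0, 1.5          # positive loading: c > lam * mu
R = adjustment_coefficient(lam, mu, c)
assert abs(R - (1.0 / mu - lam / c)) < 1e-8   # closed form: R = 1/3 here
```

For general light-tailed severities the same bisection applies with the appropriate \hat{m}_Y, restricted to the interval where the MGF is finite.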
Queueing and Operations Research
In queueing theory, the compound Poisson distribution models batch arrivals where customers arrive in groups according to a Poisson process, with the batch size following a general distribution, so that the total arrival process is a compound Poisson process. This framework is particularly useful for systems like transportation hubs or call centers where multiple entities arrive simultaneously. The M^X/G/1 queue represents a single-server system with Poisson batch arrivals at rate λ, batch sizes X with distribution F (mean β = E[X], second moment E[X^2]), general service times with mean 1/μ, and utilization ρ = λβ/μ < 1. The system's state at departure epochs forms an embedded Markov chain, allowing analysis of stationary queue length and waiting time distributions via generating functions or matrix-analytic methods.[32]

Bulk queue models extend this to incorporate batch arrivals and services, adapting the Pollaczek-Khinchine formula to account for batch variability. In such models, the mean waiting time incorporates the batch size distribution F, yielding an expression of the form E[W] = \frac{λ E[B^2]}{2(1 - ρ)}, where B represents the effective batch service time influenced by both arrival and service batches, highlighting how increased batch variance elevates queue congestion. This adaptation reveals that batching amplifies the impact of second moments on performance metrics, with stability requiring ρ < 1, and has been applied to optimize dispatching in bulk service systems like buses or elevators. For discrete batch sizes, the model aligns with integer-valued compound Poisson distributions, enabling exact computations via roots of characteristic equations.[33][34]

In inventory theory, demand is often modeled as a compound Poisson process with rate λ and jump sizes from distribution F (mean μ, variance σ²), capturing sporadic bulk orders in retail or manufacturing.
The (s, S) policy triggers replenishment when inventory falls to or below s, ordering up to S, minimizing fixed ordering and holding costs while accounting for lead times. Safety stock levels are determined using the demand variance λ(σ² + μ²), which quantifies uncertainty to maintain service levels without excessive inventory. This approach ensures the long-run average cost is optimized, with the policy's structure proven optimal under backorder assumptions.[35][36]

Optimization in these systems employs dynamic programming to minimize total costs, including holding, shortage, and setup, by evaluating state-dependent decisions over the compound Poisson demand horizon. The value function satisfies a Bellman equation balancing immediate costs against future expected demands, yielding (s, S)-like thresholds as solutions for continuous review. Seminal formulations demonstrate that setup costs induce hysteresis in ordering, with computational efficiency achieved via recursive algorithms for finite horizons or approximations for infinite cases. High-impact applications include multi-echelon supply chains, where coordinating replenishments reduces overall variance amplification.[37][36]
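As a small numerical illustration of the variance formula λ(σ² + μ²) in safety stock sizing, the Python sketch below (all parameter values and the normal approximation itself are assumptions for illustration) computes a reorder point for compound Poisson lead-time demand:

```python
import math

def reorder_point(lam, mu, sigma2, lead_time, z):
    """Normal-approximation reorder point for compound Poisson demand:
    lead-time demand has mean lam*L*mu and variance lam*L*(sigma2 + mu^2),
    and z is the standard-normal quantile for the target service level."""
    mean_d = lam * lead_time * mu                       # expected demand over lead time
    var_d = lam * lead_time * (sigma2 + mu * mu)        # compound Poisson variance
    return mean_d + z * math.sqrt(var_d)

# hypothetical inputs: 4 orders/day, mean size 5, variance 9, 2-day lead
# time, z = 1.645 (~95% cycle service level)
rp = reorder_point(lam=4.0, mu=5.0, sigma2=9.0, lead_time=2.0, z=1.645)
# mean demand 40, variance 8 * 34 = 272, so rp ≈ 40 + 1.645 * 16.49 ≈ 67.1
assert 67.0 < rp < 67.3
```

Note how the variance term λL(σ² + μ²) exceeds the Poisson-count variance λL·μ² alone whenever σ² > 0, so ignoring jump-size variability understates the required safety stock.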