In probability theory, the probability-generating function (PGF) is a power series associated with a discrete random variable $X$ taking non-negative integer values, defined as $G(s) = \mathbb{E}[s^X] = \sum_{k=0}^{\infty} p_k s^k$, where $p_k = P(X = k)$ denotes the probability mass function and $|s| \leq 1$ ensures convergence.[1] The function encodes the entire probability distribution of $X$ in its coefficients, which can be recovered by evaluating derivatives at $s = 0$ via $P(X = k) = \frac{G^{(k)}(0)}{k!}$.[2]

Key properties of the PGF include its utility in computing moments: the mean is given by $\mathbb{E}[X] = G'(1)$, the second factorial moment by $\mathbb{E}[X(X-1)] = G''(1)$, and higher factorial moments by further derivatives evaluated at $s = 1$, with the variance expressed as $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$.[3] For independent random variables $X$ and $Y$, the PGF of their sum $X + Y$ is the product $G_{X+Y}(s) = G_X(s) G_Y(s)$, facilitating the analysis of convolutions and compound distributions.[4] These properties make the PGF particularly suitable for non-negative integer-valued random variables, distinguishing it from the more general moment-generating function used for continuous or unbounded discrete distributions.[2]

PGFs find extensive applications in probability theory and related fields, such as deriving the distribution of sums of independent random variables, analyzing branching processes (where the extinction probability satisfies $\theta = G(\theta)$), and modeling queueing systems or renewal processes.[3] They are especially valuable for common discrete distributions like the binomial, Poisson, and negative binomial, where closed-form expressions simplify computations of probabilities and moments.[5] In stochastic processes, PGFs enable efficient handling of generating functions for waiting times or offspring distributions, providing a bridge between combinatorial methods and probabilistic analysis.[6]
## Definition
### Univariate case
The probability generating function (PGF) of a univariate discrete random variable $X$ that takes non-negative integer values $0, 1, 2, \dots$ with probability mass function $P(X = k) = p_k$ (where $\sum_{k=0}^\infty p_k = 1$ and $p_k \geq 0$) is defined as
$$G(s) = \mathbb{E}[s^X] = \sum_{k=0}^\infty p_k s^k.$$
This formal power series encodes the probabilities $p_k$ as its coefficients.[1][7]

The PGF $G(s)$ represents the expectation of $s^X$, providing a compact way to summarize the distribution of $X$. As a power series centered at $s = 0$, it expands to reveal the probability structure, with $G(1) = 1$ ensuring normalization when the series converges at the boundary. This formulation is particularly suited to non-negative integer-valued random variables, such as those modeling counts or occurrences in stochastic processes, because the non-negative integers align naturally with the exponents of the power series, facilitating analytic manipulations such as differentiation for moments.[8][1]

The domain of $G(s)$ includes the closed complex unit disk $|s| \leq 1$, where the series converges absolutely because $\sum_{k=0}^\infty p_k |s|^k \leq \sum_{k=0}^\infty p_k = 1$. The radius of convergence is at least 1, since $\limsup_{k \to \infty} |p_k|^{1/k} \leq 1$, guaranteeing convergence within and on the boundary of the unit disk.[8][7]

Within the radius of convergence, the PGF uniquely determines the probabilities $p_k$, and vice versa: the coefficients are recoverable via $p_k = \frac{G^{(k)}(0)}{k!}$, establishing a one-to-one correspondence between the distribution and its generating function.[1][8]
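As a quick illustration of this coefficient recovery (a minimal sketch, not drawn from the cited sources), the following code builds the PGF of a Binomial(3, 2/5) variable symbolically and reads off each probability as $p_k = G^{(k)}(0)/k!$; it assumes the sympy library, and the parameters are purely illustrative.

```python
import sympy as sp

s = sp.symbols('s')
n, p = 3, sp.Rational(2, 5)              # illustrative parameters
G = (1 - p + p * s) ** n                 # PGF of a Binomial(n, p) random variable

probs = []
for k in range(n + 1):
    deriv = G if k == 0 else sp.diff(G, s, k)         # k-th derivative of the PGF
    probs.append(deriv.subs(s, 0) / sp.factorial(k))  # p_k = G^(k)(0) / k!

print(probs)        # [27/125, 54/125, 36/125, 8/125]
print(sum(probs))   # 1, confirming G(1) = 1
```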
### Multivariate case
In the multivariate case, the probability-generating function extends the univariate concept to a random vector $(X_1, \dots, X_n)$ with non-negative integer-valued components and joint probability mass function $p_{k_1, \dots, k_n} = P(X_1 = k_1, \dots, X_n = k_n)$. It is defined as
$$G(s_1, \dots, s_n) = \mathbb{E}\left[s_1^{X_1} \cdots s_n^{X_n}\right] = \sum_{k_1 = 0}^\infty \cdots \sum_{k_n = 0}^\infty p_{k_1, \dots, k_n} s_1^{k_1} \cdots s_n^{k_n},$$
where the series converges absolutely for $|s_i| \leq 1$ for each $i = 1, \dots, n$.[9]

The marginal probability-generating function of any single component, say $X_j$, is obtained by setting $s_i = 1$ for all $i \neq j$, yielding $G_{X_j}(s_j) = G(1, \dots, s_j, \dots, 1)$. Similarly, the joint PGF of a subset of the variables is recovered by setting the remaining arguments to 1.[10]

By the uniqueness of multivariate power series within their domain of convergence, the joint PMF is uniquely determined by the PGF, as the coefficients $p_{k_1, \dots, k_n}$ can be extracted via
$$p_{k_1, \dots, k_n} = \frac{1}{k_1! \cdots k_n!} \left. \frac{\partial^{k_1 + \cdots + k_n} G(s_1, \dots, s_n)}{\partial s_1^{k_1} \cdots \partial s_n^{k_n}} \right|_{s_1 = \cdots = s_n = 0}.$$
This extends the univariate uniqueness result to higher dimensions.[11]

When the variables are independent, the joint PGF factors as the product of the individual univariate PGFs, $G(s_1, \dots, s_n) = \prod_{i=1}^n G_{X_i}(s_i)$. The case $n = 1$ reduces directly to the univariate PGF.[10]
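The following brief sketch (an illustration assuming sympy, not taken from the cited references) checks these statements for two independent components with arbitrary parameters, recovering a marginal by substitution and a joint probability by partial differentiation at the origin.

```python
import sympy as sp

s1, s2 = sp.symbols('s1 s2')
G1 = (sp.Rational(1, 2) + sp.Rational(1, 2) * s1) ** 2   # X1 ~ Binomial(2, 1/2)
G2 = sp.exp(2 * (s2 - 1))                                # X2 ~ Poisson(2)
G = G1 * G2                                              # joint PGF under independence

marginal_X1 = G.subs(s2, 1)                              # set s2 = 1 to marginalize out X2
p_12 = sp.diff(G, s1, 1, s2, 2).subs({s1: 0, s2: 0}) / (sp.factorial(1) * sp.factorial(2))
exact = sp.Rational(1, 2) * sp.exp(-2) * 2**2 / sp.factorial(2)   # P(X1 = 1) P(X2 = 2)

print(sp.simplify(marginal_X1 - G1))   # 0: the marginal PGF is G1
print(sp.simplify(p_12 - exact))       # 0: coefficient extraction matches the joint PMF
```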
## Basic Properties
### Analytic properties
The probability-generating function (PGF) of a non-negative integer-valued random variable $X$ admits a power series representation $G(s) = \sum_{k=0}^\infty p_k s^k$, where $p_k = P(X = k) \geq 0$ are the probability mass function coefficients satisfying $\sum_{k=0}^\infty p_k = 1$. This formal power series encodes the full distribution of $X$, and the non-negativity and normalization of the coefficients guarantee that $G(s)$ is analytic (holomorphic) within the open unit disk $|s| < 1$. The analyticity arises because the series converges uniformly on compact subsets of the disk, allowing term-by-term differentiation and integration, which underpins many manipulations in probability theory.[7][12]

The radius of convergence $R$ of the PGF power series satisfies $R \geq 1$, so the series converges throughout the open unit disk and, since $\sum_k p_k = 1$, also on its boundary. This lower bound follows from the probability constraints, as the root test yields $R = 1 / \limsup_{k \to \infty} p_k^{1/k}$ and $\limsup_{k \to \infty} p_k^{1/k} \leq 1$ because $\sum_k p_k = 1$. For distributions with finite support, $R = \infty$; for infinite-support distributions with geometrically decaying tails, such as the geometric distribution, $1 < R < \infty$; and for heavier-tailed cases, $R = 1$. Inside the disk of convergence, the PGF's analytic structure facilitates asymptotic analysis and singularity studies, often revealing tail behavior through the location of the dominant singularity.[13][12]

A fundamental decomposition expresses the PGF via the functional equation $G(s) = p_0 + (1 - p_0)\, s\, G_1(s)$ (for $p_0 < 1$), where $p_0 = P(X=0) = G(0)$ and $G_1(s)$ is the PGF of the conditional random variable $X-1$ given $X \geq 1$. Here, $G_1(s)$ is itself a valid PGF because the conditional probabilities $\{p_k / (1 - p_0) : k \geq 1\}$ form a proper distribution. This equation highlights the recursive structure inherent in non-negative integer distributions and is particularly useful in renewal and branching process contexts, where it aids in solving for extinction probabilities or iterated distributions.[14]

Restricted to the real interval $[0,1]$, the PGF $G(s)$ is non-decreasing and convex, with $G(0) = P(X=0)$. Monotonicity stems from the first derivative $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1} \geq 0$ for $s \in [0,1]$, reflecting the non-negativity of the coefficients. Convexity follows from the second derivative $G''(s) = \sum_{k=2}^\infty k(k-1) p_k s^{k-2} \geq 0$, again due to non-negative coefficients, implying that $G(s)$ lies above its tangents on $[0,1]$. These properties ensure $G(s)$ maps $[0,1]$ into $[p_0, 1]$ as a smooth convex arc, useful for graphical analysis of fixed points in iterative schemes.[15][11]

Because the power series representation is unique, the probabilities can also be recovered without derivatives, via complex analysis. Specifically, Cauchy's integral formula gives
$$p_k = \frac{1}{2\pi i} \oint_C \frac{G(s)}{s^{k+1}} \, ds,$$
where $C$ is any simple closed contour encircling the origin counterclockwise within the disk of analyticity $|s| < R$. This contour integral extracts the coefficient $p_k$ directly from $G(s)$, providing a rigorous basis for numerical and asymptotic recovery of distribution tails when closed forms are unavailable.[12][16]
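Discretizing this contour integral over the unit circle turns it into a discrete Fourier transform, so probabilities can be approximated by an FFT of PGF samples. The sketch below is an illustration (assuming numpy, with an arbitrary Poisson rate), not a construction from the cited sources.

```python
import numpy as np
from math import exp, factorial

lam, N = 3.0, 64                             # Poisson rate and number of sample points
s = np.exp(2j * np.pi * np.arange(N) / N)    # points s_j = e^{2*pi*i*j/N} on the unit circle
G = np.exp(lam * (s - 1.0))                  # PGF of Poisson(lam) evaluated on the circle

p = np.real(np.fft.fft(G)) / N               # p_k ~ (1/N) * sum_j G(s_j) s_j^{-k}

for k in range(6):
    exact = exp(-lam) * lam**k / factorial(k)
    print(k, round(p[k], 10), round(exact, 10))
```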
### Normalization and convergence
The probability generating function $G(s)$ of a non-negative integer-valued random variable $X$ with probability mass function $p_k = P(X = k)$ for $k = 0, 1, 2, \dots$ satisfies the normalization condition $G(1) = \sum_{k=0}^\infty p_k = 1$, reflecting the total probability of the distribution.[11] This equality holds directly from substituting $s = 1$ into the definition $G(s) = \sum_{k=0}^\infty p_k s^k$.[17] For proper probability distributions, the limit $\lim_{s \to 1^-} G(s) = 1$ also obtains, ensured by the dominated convergence theorem applied to the series, as the terms $p_k s^k$ converge pointwise to $p_k$ and are bounded by $p_k$.[11]

At the lower boundary, $G(0) = p_0 = P(X = 0)$, since all terms with $k \geq 1$ vanish when $s = 0$.[11] The function $G(s)$ is monotonically non-decreasing on the interval $[0, 1]$, a consequence of its derivative $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1} \geq 0$ for $s \in [0, 1]$, given the non-negativity of the coefficients $k p_k$ and powers of $s$.[18]

Regarding convergence, the power series defining $G(s)$ has radius of convergence at least 1, and it converges absolutely for $|s| \leq 1$ because $\sum_k p_k |s|^k \leq \sum_k p_k = 1$.[11] At $s = 1$, convergence always holds by the normalization $\sum_k p_k = 1$. For $|s| > 1$, the series may diverge when $X$ has infinite support; it converges on a disk of radius greater than 1 precisely when the probabilities $p_k$ decay at a geometric rate.[11]

Tail probabilities can be analyzed via the associated tail generating function $T(s) = \sum_{n=0}^\infty P(X > n) s^n = \frac{1 - G(s)}{1 - s}$ for $0 \leq s < 1$, which encapsulates the cumulative behavior beyond each $n$ and relates to large deviation bounds through optimization over $s$. This form follows from interchanging sums: $\sum_n s^n \sum_{k > n} p_k = \sum_k p_k \sum_{n=0}^{k-1} s^n = \sum_k p_k \frac{1 - s^k}{1 - s} = \frac{1 - G(s)}{1 - s}$.

For defective distributions, where the probabilities sum to less than 1 (e.g., sub-probability measures in branching processes with positive absorption probability), the generating function satisfies $G(1) < 1$, with $1 - G(1)$ representing the defect, that is, the mass assigned to an unattainable outcome such as explosion to infinity. In such cases, the monotonicity and boundary behaviors at $s = 0$ persist, but the limit as $s \to 1^-$ equals the defective total mass rather than 1.
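A quick numerical check of the tail identity $\sum_n P(X > n) s^n = \frac{1 - G(s)}{1 - s}$ is sketched below for a geometric distribution on $\{1, 2, \dots\}$; this assumes numpy, truncates the support at a large cutoff, and uses illustrative parameter values.

```python
import numpy as np

p, s, K = 0.3, 0.7, 2000                     # success probability, evaluation point, truncation
k = np.arange(1, K + 1)
pmf = p * (1 - p) ** (k - 1)                 # P(X = k) for the geometric on {1, 2, ...}

G = np.sum(pmf * s ** k)                                  # (truncated) PGF at s
tails = 1.0 - np.cumsum(np.concatenate(([0.0], pmf)))     # tails[n] = P(X > n), n = 0..K
lhs = np.sum(tails[:K] * s ** np.arange(K))               # sum_n P(X > n) s^n
rhs = (1.0 - G) / (1.0 - s)

print(lhs, rhs)   # both ~ 1.9608 = 1 / (1 - (1 - p) s)
```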
## Moments and Derivatives
### Ordinary moments
The ordinary moments of a non-negative integer-valued random variable $X$ with probability-generating function $G(s) = \mathbb{E}[s^X]$ can be derived from derivatives of $G(s)$ evaluated at $s=1$, leveraging the connection to the moment-generating function $M(t) = G(e^t)$. The first ordinary moment, or expected value, is directly $\mathbb{E}[X] = G'(1)$.[11][4]

Higher ordinary moments $\mathbb{E}[X^n]$ with $n \geq 2$ are given by the $n$th derivative of the moment-generating function evaluated at $t=0$, i.e., $\mathbb{E}[X^n] = M^{(n)}(0)$. Since $M(t)$ is the composition $G(e^t)$, the derivatives $M^{(n)}(t)$ can be computed using Faà di Bruno's formula for higher-order chain rules or via recursive differentiation of the composition.[19]

A key example is the second ordinary moment, which underlies the variance formula $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$. To derive this, start with the PGF $G(s) = \sum_{k=0}^\infty p_k s^k$, where $p_k = \mathbb{P}(X=k)$. The first derivative is $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1}$, so evaluating at $s=1$ yields $G'(1) = \sum_{k=1}^\infty k p_k = \mathbb{E}[X]$. The second derivative is $G''(s) = \sum_{k=2}^\infty k(k-1) p_k s^{k-2}$, so $G''(1) = \sum_{k=2}^\infty k(k-1) p_k = \mathbb{E}[X(X-1)]$. Then $\mathbb{E}[X^2] = \mathbb{E}[X(X-1) + X] = G''(1) + G'(1)$, and the variance follows as $\mathbb{E}[X^2] - [\mathbb{E}[X]]^2$.[11][4]

Central moments, such as the second central moment $\mathbb{E}[(X - \mathbb{E}[X])^2] = \mathrm{Var}(X)$, can be expressed in terms of ordinary moments using binomial expansions. More generally, they relate to cumulants derived from the cumulant-generating function $\log M(t)$, whose derivatives at $t=0$ give the cumulants; the first cumulant is $\mathbb{E}[X]$, the second is $\mathrm{Var}(X)$, and higher cumulants connect to central moments via standard recursive relations.[20][19]

Evaluating derivatives at $s=1$ is immediate when the radius of convergence exceeds 1, since the PGF is then analytic in a neighborhood of 1. When the radius equals 1, the derivatives are interpreted as left-hand limits $\lim_{s \to 1^-} G^{(n)}(s)$, which exist by monotone convergence and are finite exactly when the corresponding factorial moments are finite.[11][4]
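A minimal symbolic sketch of this recipe, assuming sympy and using the geometric distribution on $\{1, 2, \dots\}$ as an illustrative case, recovers the familiar mean $1/p$ and variance $(1-p)/p^2$ from $G'(1)$ and $G''(1)$.

```python
import sympy as sp

s, p = sp.symbols('s p', positive=True)
G = p * s / (1 - (1 - p) * s)       # PGF of a geometric distribution on {1, 2, ...}

mean = sp.simplify(sp.diff(G, s).subs(s, 1))                        # G'(1)
var = sp.simplify(sp.diff(G, s, 2).subs(s, 1) + mean - mean**2)     # G''(1) + G'(1) - G'(1)^2

print(mean)   # 1/p
print(var)    # (1 - p)/p**2
```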
### Factorial moments
The $n$th factorial moment of a non-negative integer-valued random variable $X$ is defined as the expected value of the falling factorial, $\mathbb{E}[X(X-1)\cdots(X-n+1)]$, which equals the $n$th derivative of the probability generating function $G(s)$ evaluated at $s=1$: $\mathbb{E}[X(X-1)\cdots(X-n+1)] = G^{(n)}(1)$.[21]

To see this, consider the form of the $n$th derivative of $G(s) = \sum_{k=0}^\infty p_k s^k$:
$$G^{(n)}(s) = \sum_{k=n}^\infty p_k \, k(k-1)\cdots(k-n+1) \, s^{k-n}.$$
Evaluating at $s=1$ yields precisely the factorial moment, as the coefficient $k(k-1)\cdots(k-n+1)$ is the falling factorial for each term.[21]

This direct computation via differentiation provides a straightforward way to obtain factorial moments from the PGF, leveraging the combinatorial structure inherent in the generating function's power series expansion.[21]

The ordinary moments can be expressed in terms of factorial moments using Stirling numbers of the second kind $S(n,k)$, which count the number of ways to partition $n$ objects into $k$ non-empty subsets: $\mathbb{E}[X^n] = \sum_{k=0}^n S(n,k) \, \mathbb{E}[X(X-1)\cdots(X-k+1)]$.[22]

For discrete distributions, factorial moments derived from the PGF are particularly advantageous, as they simplify calculations involving convolutions of independent variables and applications of the binomial theorem, reflecting the natural discrete combinatorial underpinnings.[21]
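The Stirling-number conversion can be verified directly; the short sketch below (an illustration only, with a hypothetical Binomial(5, 0.3) example and a hand-rolled recurrence for $S(n,k)$) compares $\mathbb{E}[X^3]$ computed via factorial moments against a direct PMF sum.

```python
from math import comb

n_trials, p = 5, 0.3
pmf = [comb(n_trials, k) * p**k * (1 - p)**(n_trials - k) for k in range(n_trials + 1)]

def falling(x, r):
    """Falling factorial x (x-1) ... (x-r+1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def stirling2(n, k):
    """Stirling numbers of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

n = 3
factorial_moments = [sum(pk * falling(k, r) for k, pk in enumerate(pmf)) for r in range(n + 1)]
via_stirling = sum(stirling2(n, r) * factorial_moments[r] for r in range(n + 1))
direct = sum(pk * k**n for k, pk in enumerate(pmf))

print(via_stirling, direct)   # both ~ 8.52, the value of E[X^3] for Binomial(5, 0.3)
```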
## Operations on Generating Functions
### Sums of independent variables
One of the key advantages of probability-generating functions (PGFs) lies in their ability to simplify the analysis of sums of independent random variables. Specifically, if $X$ and $Y$ are independent non-negative integer-valued random variables with PGFs $G_X(s) = \mathbb{E}[s^X]$ and $G_Y(s) = \mathbb{E}[s^Y]$, respectively, then the PGF of their sum $Z = X + Y$ is the product of the individual PGFs:[23]
$$G_Z(s) = G_X(s) G_Y(s).$$

This property follows directly from the definition of independence and the PGF. To see this, note that
$$G_Z(s) = \mathbb{E}[s^{X+Y}] = \sum_{k=0}^\infty s^k P(Z = k) = \sum_{r=0}^\infty \sum_{t=0}^\infty s^{r+t} P(X = r, Y = t).$$
By independence, $P(X = r, Y = t) = P(X = r) P(Y = t)$, so the double sum separates into
$$G_Z(s) = \left( \sum_{r=0}^\infty s^r P(X = r) \right) \left( \sum_{t=0}^\infty s^t P(Y = t) \right) = G_X(s) G_Y(s).$$
This establishes the product form.[23]

The result extends naturally to the sum of any finite number of independent random variables. For $n$ independent and identically distributed (i.i.d.) random variables $X_1, \dots, X_n$ each with PGF $G_X(s)$, the PGF of $S_n = \sum_{i=1}^n X_i$ is
$$G_{S_n}(s) = [G_X(s)]^n.$$
This multiplicative structure avoids the need to compute the probability mass function (PMF) of the sum directly via convolution, as the PMF of $S_n$ is the $n$-fold convolution of the PMF of $X$, which can be cumbersome for large $n$. Instead, the product's series expansion yields the coefficients corresponding to the convolved PMF efficiently.[](https://www.cl.cam.ac.uk/teaching/0304/Probability/prob06.pdf)
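The correspondence between multiplying PGFs and convolving PMFs can be seen numerically; the sketch below (assuming numpy, with two arbitrary finite distributions) compares the coefficients of the product polynomial with the output of a direct convolution.

```python
import numpy as np

pmf_x = np.array([0.2, 0.5, 0.3])       # P(X = 0), P(X = 1), P(X = 2)
pmf_y = np.array([0.6, 0.4])            # P(Y = 0), P(Y = 1)

pmf_sum = np.convolve(pmf_x, pmf_y)     # PMF of Z = X + Y by direct convolution

# Coefficients of the product polynomial G_X(s) G_Y(s), lowest degree first.
product = np.polynomial.Polynomial(pmf_x) * np.polynomial.Polynomial(pmf_y)

print(pmf_sum)        # [0.12 0.38 0.38 0.12]
print(product.coef)   # identical coefficients
```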
In the multivariate setting, the property holds analogously for independent random vectors. If $\mathbf{X} = (X_1, \dots, X_m)$ and $\mathbf{Y} = (Y_1, \dots, Y_k)$ are independent multivariate non-negative integer-valued random vectors with joint PGFs $G_\mathbf{X}(s_1, \dots, s_m) = \mathbb{E}[s_1^{X_1} \cdots s_m^{X_m}]$ and $G_\mathbf{Y}(t_1, \dots, t_k) = \mathbb{E}[t_1^{Y_1} \cdots t_k^{Y_k}]$, then the joint PGF of the concatenated vector $(\mathbf{X}, \mathbf{Y})$ is the product $G_\mathbf{X}(s_1, \dots, s_m) G_\mathbf{Y}(t_1, \dots, t_k)$. For the componentwise sum of independent multivariate vectors of the same dimension, the joint PGF is likewise the product of the individual joint PGFs.
The product form also facilitates the computation of moments for the sum. Since moments (and factorial moments) are obtained via derivatives of the PGF evaluated at $s = 1$, the product rule applies: for independent $X$ and $Y$, the mean of $Z = X + Y$ is $\mathbb{E}[Z] = G_Z'(1) = G_X'(1) G_Y(1) + G_X(1) G_Y'(1) = \mathbb{E}[X] + \mathbb{E}[Y]$, as $G_X(1) = G_Y(1) = 1$. Similarly, the variance follows from the second derivative: $\mathrm{Var}(Z) = G_Z''(1) + G_Z'(1) - [G_Z'(1)]^2 = \mathrm{Var}(X) + \mathrm{Var}(Y)$, confirming the additivity of variances for independent variables.[](https://www.cl.cam.ac.uk/teaching/0304/Probability/prob06.pdf)
### Compound distributions
In probability theory, a compound distribution arises when considering a random sum $ Z = \sum_{i=1}^N X_i $, where $ N $ is a non-negative integer-valued random variable representing the number of terms, and the $ X_i $ are independent and identically distributed non-negative integer-valued random variables independent of $ N $.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
The probability generating function (PGF) of $ Z $ is given by the functional composition $ G_Z(s) = G_N(G_X(s)) $, where $ G_N(s) $ and $ G_X(s) $ are the PGFs of $ N $ and $ X_1 $, respectively.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) This result follows from the law of total expectation:
$$G_Z(s) = \mathbb{E}[s^Z] = \mathbb{E}\left[ \mathbb{E}[s^Z \mid N] \right] = \mathbb{E}\left[ (G_X(s))^N \right] = G_N(G_X(s)),$$
assuming the expectations exist within the radius of convergence.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
Special cases include the compound Poisson distribution, obtained when $ N $ is Poisson-distributed, and compound geometric distributions, obtained when $ N $ follows a geometric distribution; for example, a Poisson number of logarithmically distributed summands yields the negative binomial distribution.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) These compositions simplify the analysis of aggregate processes, such as total claims in insurance where $ N $ counts events and each $ X_i $ represents claim size.[](https://galton.uchicago.edu/~yibi/teaching/stat244/L10.pdf)
The moments of $ Z $ can be derived using differentiation of the PGF and the chain rule. The mean is $ \mathbb{E}[Z] = \mathbb{E}[N] \mathbb{E}[X_1] $, obtained as $ G_Z'(1) = G_N'(G_X(1)) G_X'(1) = G_N'(1) G_X'(1) $.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) The variance follows from the law of total variance:
$$\text{Var}(Z) = \mathbb{E}[N] \text{Var}(X_1) + \text{Var}(N) (\mathbb{E}[X_1])^2,$$
which decomposes into the expected variance from the sum given $ N $ and the variance from the randomness in $ N $.[](https://galton.uchicago.edu/~yibi/teaching/stat244/L10.pdf)
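As a small symbolic sketch (assuming sympy, with illustrative parameters): composing a Poisson(2) counting distribution with Bernoulli(3/10) summands collapses to the Poisson(3/5) PGF, the familiar thinning property, and differentiating the composed PGF reproduces the mean and total-variance values.

```python
import sympy as sp

s = sp.symbols('s')
G_N = lambda u: sp.exp(2 * (u - 1))                      # PGF of N ~ Poisson(2)
G_X = sp.Rational(7, 10) + sp.Rational(3, 10) * s        # PGF of X ~ Bernoulli(3/10)

G_Z = sp.simplify(G_N(G_X))
print(G_Z)                                               # exp(3*s/5 - 3/5), a Poisson(3/5) PGF

mean = sp.diff(G_Z, s).subs(s, 1)                        # E[Z] = E[N] E[X] = 3/5
var = sp.simplify(sp.diff(G_Z, s, 2).subs(s, 1) + mean - mean**2)
print(mean, var)                                         # 3/5 and 3/5, matching total variance
```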
For the multivariate case, consider a compound vector sum $ \mathbf{Z} = \sum_{i=1}^N \mathbf{X}_i $, where each $ \mathbf{X}_i $ is a $ k $-dimensional random vector with joint PGF $ G_{\mathbf{X}}(s_1, \dots, s_k) = \mathbb{E}[s_1^{X_{i1}} \cdots s_k^{X_{ik}}] $, and $ N $ is independent of the $ \mathbf{X}_i $. The joint PGF of $ \mathbf{Z} $ is then $ G_{\mathbf{Z}}(s_1, \dots, s_k) = G_N(G_{\mathbf{X}}(s_1, \dots, s_k)) $, extending the univariate composition to joint distributions.[](https://www.sciencedirect.com/science/article/pii/S0377042719303747)
## Examples
### Finite support distributions
For distributions with finite support, the probability-generating function (PGF) takes the form of a polynomial whose degree equals the maximum possible value of the random variable, facilitating straightforward extraction of probabilities as coefficients.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf)
The Bernoulli distribution, where a random variable $X$ takes value 1 with probability $p$ and 0 with probability $1-p$, has PGF $G_X(s) = 1 - p + p s$.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) The expected value is $E[X] = p$, obtained by differentiating the PGF and evaluating at $s=1$, while the variance is $\operatorname{Var}(X) = p(1-p)$.[](https://www.math.purdue.edu/~stindel/teaching/ma532/slides/generating-functions.pdf)
The binomial distribution arises as the sum of $n$ independent Bernoulli trials with success probability $p$, yielding PGF $G_X(s) = (1 - p + p s)^n$.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) This form follows directly from the product of individual Bernoulli PGFs under independence.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf)
For a discrete uniform distribution on $\{0, 1, \dots, m\}$, where each outcome has probability $1/(m+1)$, the PGF is
$$G_X(s) = \frac{1 - s^{m+1}}{(m+1)(1 - s)}$$
for $s \neq 1$ (with $G_X(1) = 1$), derived as the partial sum of a geometric series.[](https://willperkins.org/6221/slides/generating.pdf)
The hypergeometric distribution models the number of successes in $n$ draws without replacement from a finite population of size $N$ containing $K$ successes, with PGF given by
$$G_X(s) = \frac{(N - n)!\,(N - K)!}{N!\,(N - K - n)!} \; {}_2F_1(-n, -K; N - K - n + 1; s)$$
for $n \leq N - K$.[](https://arxiv.org/abs/2407.20677)
Since the support is bounded, the PGF is a polynomial, allowing probabilities to be recovered by expanding and reading coefficients or using finite differences for inversion.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) Computations involve direct summation over the support for small sizes or exploiting closed-form expressions via the binomial theorem or hypergeometric identities.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
### Infinite support distributions
The probability-generating function (PGF) for discrete random variables with infinite support typically results in non-polynomial expressions, such as exponentials or rational functions, reflecting the unbounded nature of the support. These forms arise from infinite series expansions of the probability mass functions and often allow for analytic continuation beyond the unit disk, with the radius of convergence determined by the tail behavior of the distribution.[](https://user.eng.umd.edu/~abarg/620/GF.pdf)
A canonical example is the Poisson distribution, where $X \sim \mathrm{Poisson}(\lambda)$ for $\lambda > 0$, with probability mass function $P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \dots$. The PGF is derived directly from the series:
$$G(s) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{(\lambda s)^k}{k!} = e^{-\lambda} e^{\lambda s} = e^{\lambda(s-1)},$$
valid for all complex $s$ since the exponential is an entire function, implying an infinite radius of convergence. This exponential form encapsulates the light-tailed decay of the Poisson probabilities. The mean is $E[X] = G'(1) = \lambda$ and the variance is $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2 = \lambda$.[](https://www.uvm.edu/~mrombach/373_rombach_week5.pdf)[](http://galton.uchicago.edu/~lalley/Courses/312/PoissonProcesses.pdf)
For the geometric distribution, consider $X \sim \mathrm{Geometric}(p)$ counting the number of trials until the first success in independent [Bernoulli](/page/Bernoulli) trials with success probability $p \in (0,1)$, so $P(X = k) = (1-p)^{k-1} p$ for $k = 1, 2, \dots$. The PGF is
$$G(s) = \sum_{k=1}^{\infty} p (1-p)^{k-1} s^k = \frac{p s}{1 - (1-p) s},$$
for $|s| < \frac{1}{1-p}$, with a pole at $s = \frac{1}{1-p} > 1$, indicating convergence beyond the unit interval and reflecting the geometric tail decay. A zero-start variant, counting failures before the first success ($X = 0, 1, 2, \dots$), shifts the support and yields $G(s) = \frac{p}{1 - (1-p) s}$.[](https://www.stat.uchicago.edu/~yibi/teaching/stat317/2021/Lectures/Lecture8.pdf)
The [negative binomial distribution](/page/Negative_binomial_distribution) generalizes the geometric, modeling the number of failures until $r$ successes, where $r > 0$ is the number of successes and $p \in (0,1)$ the success probability, so $X \sim \mathrm{NB}(r, p)$ with $P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k$ for $k = 0, 1, 2, \dots$. The PGF is
$$G(s) = \left( \frac{p}{1 - (1-p) s} \right)^r,$$
for $|s| < \frac{1}{1-p}$, obtained for integer $r$ as the PGF of a sum of $r$ independent geometric random variables (each counting failures until one success). This rational form, which defines the Pascal distribution when $r$ is an integer, highlights the compound structure and heavier tails compared to the Poisson, with convergence radius exceeding 1 due to the geometric decay rate.[](http://www.stat.yale.edu/~pollard/Courses/241.fall2005/notes2005/Generating.pdf)[](https://math.arizona.edu/~tgk/464_10/chap4_9_29.pdf)
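A brief symbolic check (assuming sympy, with illustrative values of $r$ and $p$) expands this PGF as a power series and compares the coefficients with the negative binomial PMF.

```python
import sympy as sp

s = sp.symbols('s')
r, p = 3, sp.Rational(2, 5)                       # illustrative parameters

G = (p / (1 - (1 - p) * s)) ** r                  # negative binomial PGF
expansion = sp.series(G, s, 0, 6).removeO()
coeffs = [expansion.coeff(s, k) for k in range(6)]
pmf = [sp.binomial(k + r - 1, k) * p**r * (1 - p)**k for k in range(6)]

print(coeffs)
print(pmf)        # the two lists agree term by term
```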
## Applications
### Branching processes
In branching processes, particularly the Galton–Watson process, the probability generating function (PGF) provides a powerful tool for analyzing population dynamics over discrete generations. The process begins with an initial population size $ Z_0 = 1 $, where each individual independently produces a random number of offspring according to a fixed probability distribution with PGF $ f(s) = \sum_{k=0}^\infty p_k s^k $, where $ p_k $ is the probability of having $ k $ offspring. The population size at the next generation, $ Z_{n+1} $, is the sum of $ Z_n $ independent copies of the offspring random variable. Consequently, the PGF of $ Z_n $ satisfies the iterative relation $ G_{Z_n}(s) = f^{\circ n}(s) $, where $ f^{\circ n} $ denotes the $ n $-th functional iterate of $ f $. This iteration captures the recursive nature of population growth and enables exact computation of the distribution of $ Z_n $ in principle.[](https://link.springer.com/book/10.1007/978-3-642-65371-1)
A central quantity in the Galton–Watson process is the ultimate extinction probability $ \eta $, defined as the probability that the population eventually dies out, i.e., $ \eta = \lim_{n \to \infty} P(Z_n = 0) $. This probability is the smallest root in $[0,1]$ of the fixed-point equation $ s = f(s) $. Starting from $ s_0 = 0 $, the sequence $ s_{k+1} = f(s_k) $ increases monotonically to $ \eta $, providing a practical method to solve for it, as in the sketch below. The mean offspring number $ m = f'(1) $ determines the behavior: if $ m \leq 1 $, then $ \eta = 1 $ (certain extinction, excluding the trivial case $ p_1 = 1 $); if $ m > 1 $, then $ \eta < 1 $ (positive survival probability). These cases classify the process as subcritical ($ m < 1 $), critical ($ m = 1 $), or supercritical ($ m > 1 $).[](https://www.rand.org/content/dam/rand/pubs/reports/2009/R381.pdf)
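A minimal sketch of this fixed-point iteration, with a hypothetical offspring distribution $p_0 = 0.2$, $p_1 = 0.3$, $p_2 = 0.5$ (so $m = 1.3$), converges to the extinction probability 0.4:

```python
def f(s):
    """Offspring PGF with p0 = 0.2, p1 = 0.3, p2 = 0.5 (mean m = 1.3)."""
    return 0.2 + 0.3 * s + 0.5 * s**2

s = 0.0
for _ in range(200):        # iterate s_{k+1} = f(s_k) starting from 0
    s = f(s)

print(s)   # ~ 0.4, the smallest root of s = f(s) in [0, 1]; the other root is 1
```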
In the supercritical case, while the expected population size grows as $ m^n $, the variance of $ Z_n $ is $ \sigma^2 m^{n-1} (m^n - 1)/(m - 1) $, where $ \sigma^2 = f''(1) + m - m^2 $ is the offspring variance, scaling asymptotically as $ \sigma^2 m^{2n-1}/(m - 1) $. Conditioned on non-extinction, the variance exhibits similar explosive [growth](/page/Growth) of [order](/page/Order) $ m^{2n} $. This reflects the increasing [uncertainty](/page/Uncertainty) in population trajectories as generations progress, despite the drift toward [growth](/page/Growth).[](https://link.springer.com/book/10.1007/978-3-642-65371-1)
The use of PGFs to model such processes originated with Irénée-Jules Bienaymé, who in 1845 analyzed the extinction of lineages using an early form of the fixed-point equation, though his work remained obscure until rediscovered in the 1960s. Independently, [Francis Galton](/page/Francis_Galton) and Henry [Watson](/page/Watson) developed the model in the [1870s](/page/1870s) to investigate the persistence of aristocratic family names in [Britain](/page/Britain), posing the extinction problem that the PGF iteration resolves. Their 1875 paper laid the foundation, with PGFs proving essential for deriving the extinction criterion and growth behaviors.[](https://www.jstor.org/stable/2841222)
### Queueing and stochastic processes
In queueing theory, probability generating functions (PGFs) transform the infinite system of balance equations for Markov chains into solvable algebraic equations, enabling the derivation of steady-state distributions for queue lengths and waiting times. This approach is especially valuable for analyzing single-server queues where direct solutions are intractable, allowing researchers to obtain closed-form expressions or numerically invert PGFs for probabilities. By focusing on embedded processes at key epochs, such as departures, PGFs capture the [stochastic](/page/Stochastic) behavior of systems under stability conditions like traffic intensity ρ < 1.[](https://www.columbia.edu/~ww2040/6711F12/lect1101.pdf)
A seminal application is the M/G/1 queue, featuring Poisson arrivals at rate λ and general service times with Laplace-Stieltjes transform B^*(s) and mean 1/μ, yielding ρ = λ/μ. The PGF of the stationary queue length distribution, observed at departure epochs via the embedded Markov chain, is provided by the Pollaczek-Khinchine formula:
$$\pi(z) = (1 - \rho) \frac{(1 - z)\, B^*(\lambda (1 - z))}{B^*(\lambda (1 - z)) - z}, \quad |z| \leq 1.$$
This expression, originally derived in the 1930s and widely used since, fully characterizes the distribution; for instance, the mean queue length follows from $\pi'(1) = \rho + \frac{\lambda^2 E[S^2]}{2(1 - \rho)}$, where $S$ is the service time. For the M/D/1 queue with deterministic unit service time, the PGF specializes by substituting $B^*(s) = e^{-s}$, facilitating analysis of the constant-service scenarios common in deterministic scheduling.[](https://www.columbia.edu/~ww2040/6711F12/lect1101.pdf)[](https://iadan.win.tue.nl/que/h9.pdf)
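The mean-value formula above is easy to evaluate directly; the small numeric sketch below (illustrative rates only) compares exponential and deterministic service at the same utilization, recovering the standard M/M/1 value and the halved queueing term of M/D/1.

```python
lam, mu = 0.8, 1.0                 # illustrative arrival and service rates
rho = lam / mu

ES2_exp = 2.0 / mu**2              # E[S^2] for exponential service
ES2_det = 1.0 / mu**2              # E[S^2] for deterministic service

L_mm1 = rho + lam**2 * ES2_exp / (2 * (1 - rho))
L_md1 = rho + lam**2 * ES2_det / (2 * (1 - rho))

print(L_mm1)   # 4.0, matching the M/M/1 value rho / (1 - rho)
print(L_md1)   # 2.4, deterministic service halves the queueing term
```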
In discrete-time queues like the Geo/Geo/1 model, with Bernoulli arrivals of probability $\lambda$ and Bernoulli service completions of probability $\mu$ per slot (assuming $\lambda < \mu$ for stability), the stationary queue length follows a geometric distribution $\pi_i = (1 - \rho) \rho^i$ for $i \geq 0$, where $\rho = \frac{\lambda (1 - \mu)}{\mu (1 - \lambda)}$. The corresponding PGF is $P(z) = \frac{1 - \rho}{1 - \rho z}$, obtained by solving the balance equations of the embedded Markov chain at slot ends. This form arises from the birth-death structure, with PGFs converting the recursive relations into a rational function solvable for moments, such as the mean queue length $\rho / (1 - \rho)$.[](https://www.cs.tulane.edu/~zzheng3/teaching/cmps6750/spring20/queue.pdf)
PGFs also prove essential for embedded Markov chains in analyzing transient behaviors, such as busy periods, the interval from an arrival at an empty system until the server next becomes idle. In the M/G/1 queue, the PGF $f(s)$ of the number of customers served during a busy period satisfies the functional equation $f(s) = s \, B^*\big(\lambda (1 - f(s))\big)$; for the M/M/1 queue (exponential service at rate $\mu$) this becomes the quadratic $\lambda f(s)^2 - (\lambda + \mu) f(s) + \mu s = 0$, while deterministic unit service yields $f(s) = s \exp\left(\rho (f(s) - 1)\right)$ with $\rho = \lambda/\mu$, and these equations are solved iteratively or numerically to obtain the distribution. A discrete-time counterpart appears in Geo/Geo/1, where the busy period PGF solves a similar quadratic or higher-order equation derived from return-time probabilities in the chain. These equations transform the integral or recursive definitions of busy periods into algebraic forms amenable to computation.[](https://www.netlab.tkk.fi/opetus/s383143/kalvot/E_mg1jono.pdf)
Beyond basic queues, PGFs analyze first passage times in random walks modeling queue overflows or delays. For a simple symmetric random walk starting at 1, the PGF $F(z)$ of the first passage time to 0 (analogous to [ruin](/page/Ruin)) satisfies $ F(z) = \frac{1 - \sqrt{1 - z^2}}{z} $, derived from the step generating function by conditioning on the first step; extensions to biased walks yield $ F(z) = \frac{1 - \sqrt{1 - 4 p q z^2}}{2 p z} $, where $p$ is the probability of stepping away from 0 and $q = 1 - p$ that of stepping toward it. In [gambler's ruin](/page/Gambler's_ruin) with finite capital $N$, absorbing at 0 or $N$, the PGF for hitting times incorporates boundary conditions, solving via difference equations transformed by generating functions; starting from capital $k$, the ruin probability is $ \frac{ (q/p)^k - (q/p)^N }{ 1 - (q/p)^N } $ for $p \neq q$. Infinite extensions model unbounded queues, where PGFs quantify tail probabilities for large deviations.[](http://galton.uchicago.edu/~lalley/Courses/312/RW.pdf)
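The biased first-passage PGF can be checked by series expansion; the sketch below (assuming sympy, with an illustrative away-step probability $p = 3/5$) compares the coefficients of $F(z)$ against the Catalan-number form $P(T = 2n-1) = C_{n-1}\, p^{n-1} q^{n}$.

```python
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(3, 5)                  # probability of stepping away from 0 (illustrative)
q = 1 - p

F = (1 - sp.sqrt(1 - 4 * p * q * z**2)) / (2 * p * z)
expansion = sp.series(F, z, 0, 8).removeO()

from_series = [expansion.coeff(z, 2 * n - 1) for n in range(1, 5)]
from_catalan = [sp.catalan(n - 1) * p**(n - 1) * q**n for n in range(1, 5)]

print(from_series)
print(from_catalan)   # the lists agree; only odd powers of z appear
```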
These techniques extend to practical applications, such as traffic flow modeling where queues at intersections or bottlenecks are analyzed using M/G/1-like PGFs to predict congestion probabilities and delays under variable demand. In reliability engineering, PGFs evaluate stochastic processes for system availability in Markovian repair models, solving for steady-state uptime distributions via embedded chains. Numerical methods, like root-finding or series inversion of PGFs, enable simulation-free computation of performance metrics in these contexts, often referencing compound sums briefly for batch effects in arrivals.[](https://www.sciencedirect.com/science/article/pii/S0307904X10000880)
## Related Generating Functions
### Moment-generating function
The moment-generating function (MGF) of a random variable $X$ is defined as $M_X(t) = \mathbb{E}[e^{tX}]$, where it is typically finite for $t$ in some open interval containing 0.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions) This function encodes the moments of $X$ through its derivatives evaluated at $t = 0$, with the $n$th moment given by $\mathbb{E}[X^n] = M_X^{(n)}(0)$.[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) In contrast to the [probability-generating function (PGF)](/page/probability-generating_function), which is tailored to non-negative integer-valued random variables, the MGF applies more broadly to real-valued random variables, including continuous distributions, and relates to the [Laplace transform](/page/Laplace_transform) of the distribution.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
For a non-negative integer-valued random variable $X$ with PGF $G_X(s) = \mathbb{E}[s^X]$, the MGF and PGF are connected via the substitution $s = e^t$, yielding $M_X(t) = G_X(e^t)$ wherever $e^t$ lies within the PGF's region of convergence; this always holds for $t \leq 0$, while $t > 0$ requires the radius of convergence to exceed $e^t$.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) Under this substitution the unit disk $|s| \leq 1$ of the PGF corresponds to the closed left half-plane $\operatorname{Re}(t) \leq 0$, facilitating analysis of moments and tail behavior. The MGF provides ordinary moments directly: $\mathbb{E}[X^n] = M_X^{(n)}(0)$. In contrast, derivatives of the PGF at $s=1$ yield factorial moments $\mathbb{E}[X(X-1)\cdots(X-n+1)] = G_X^{(n)}(1)$, from which ordinary moments can be obtained (e.g., using Stirling numbers of the second kind). The substitution relates the two, enabling ordinary moments to be computed from the PGF via the chain rule applied to the composition.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
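A minimal symbolic sketch of the substitution (assuming sympy, for a Poisson PGF) differentiates $M(t) = G(e^t)$ at $t = 0$ to obtain the first two ordinary moments.

```python
import sympy as sp

t, s, lam = sp.symbols('t s lambda', positive=True)
G = sp.exp(lam * (s - 1))              # Poisson(lambda) PGF
M = G.subs(s, sp.exp(t))               # MGF via the substitution s = e^t

EX = sp.diff(M, t).subs(t, 0)                     # first ordinary moment
EX2 = sp.expand(sp.diff(M, t, 2).subs(t, 0))      # second ordinary moment

print(EX)    # lambda
print(EX2)   # lambda**2 + lambda
```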
The PGF offers advantages over the MGF in [discrete](/page/Discrete) settings by directly encoding the [probability mass function](/page/Probability_mass_function) as coefficients in its [power series](/page/Power_series) expansion, making it straightforward to extract probabilities and handle convolutions for sums of [independent](/page/Independent) [discrete](/page/Discrete) random variables—operations that resemble [multiplication](/page/Multiplication) of polynomials rather than the more Laplace-transform-like manipulations of the MGF.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) However, the PGF is limited to non-negative [integer](/page/Integer) support and convergence within $|s| \leq 1$, whereas the MGF accommodates continuous distributions, negative values, and broader domains, though it may fail to exist even when all moments do.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
Historically, the MGF has been viewed as more general due to its ties to classical analysis tools like the [Laplace transform](/page/Laplace_transform), finding wide use in applied probability and statistics.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions) In contrast, the PGF is preferred in [combinatorics](/page/Combinatorics) and discrete probability for its intuitive alignment with enumerative techniques, enabling [asymptotic analysis](/page/Asymptotic_analysis) of combinatorial parameters and limit laws through probabilistic interpretations of generating functions.[](https://ac.cs.princeton.edu/home/AC.pdf)
### Characteristic function
The characteristic function of a random variable $X$ is defined as $\phi(t) = \mathbb{E}[e^{itX}]$ for real $t$, where $i = \sqrt{-1}$. This function exists for every [probability distribution](/page/Probability_distribution) on [the real](/page/The_Real) line and is uniformly continuous, with $\phi(0) = 1$ and $|\phi(t)| \leq 1$. For [discrete](/page/Discrete) random variables supported on the integers, $\phi(t) = \sum_k p_k e^{itk}$, where $p_k = \mathbb{P}(X = k)$, making $\phi(t)$ a trigonometric sum that is periodic with period $2\pi$. More generally, for distributions supported on a [lattice](/page/Lattice) with span $h > 0$, the characteristic function is periodic with period $2\pi/h$.
For non-negative integer-valued random variables, the characteristic function relates directly to the probability-generating function $G(s) = \mathbb{E}[s^X]$ via the substitution $s = e^{it}$ on the unit circle in the [complex plane](/page/Complex_plane), yielding $\phi(t) = G(e^{it})$. This connection highlights how the characteristic function evaluates the PGF along the boundary of its domain of analyticity. However, unlike the PGF, which directly encodes the [probability mass function](/page/Probability_mass_function) as coefficients in its [power series](/page/Power_series) expansion, the characteristic function does not provide straightforward access to these coefficients without additional inversion steps.
The primary advantage of the characteristic function over the probability-generating function lies in its universality: it applies to all real-valued distributions, including continuous ones and those allowing negative values, whereas the PGF is restricted to non-negative [integer](/page/Integer) supports. The characteristic function facilitates recovery of the [probability density function](/page/Probability_density_function) (if it exists) or the [cumulative distribution function](/page/Cumulative_distribution_function) through the [Fourier inversion theorem](/page/Fourier_inversion_theorem), which states that under suitable integrability conditions, the density $f(x)$ satisfies $f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi(t) \, dt$. For discrete cases, analogous inversion formulas recover the [probability mass function](/page/Probability_mass_function). Both the characteristic function and the PGF uniquely determine their respective distributions; for the characteristic function, uniqueness follows from this inversion theory, while Lévy's continuity theorem adds that [pointwise convergence](/page/Pointwise_convergence) of characteristic functions to a limit continuous at $t = 0$ implies convergence in [distribution](/page/Distribution).
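For integer-valued variables the inversion reduces to an integral over one period, $p_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itk}\phi(t)\,dt$; the numerical sketch below (assuming numpy, with an arbitrary Poisson rate) evaluates this integral on a grid and compares with the exact probabilities.

```python
import numpy as np
from math import exp, factorial, pi

lam, N = 2.5, 4096                          # Poisson rate and grid size (illustrative)
t = -pi + 2 * pi * np.arange(N) / N         # uniform grid over one period [-pi, pi)
phi = np.exp(lam * (np.exp(1j * t) - 1.0))  # Poisson characteristic function
dt = 2 * pi / N

for k in range(5):
    p_k = np.real(np.sum(np.exp(-1j * t * k) * phi) * dt) / (2 * pi)
    exact = exp(-lam) * lam**k / factorial(k)
    print(k, round(p_k, 8), round(exact, 8))
```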
Despite these strengths, the characteristic function has limitations compared to the PGF in discrete settings. Extracting individual probabilities from $\phi(t)$ requires complex inversion rather than simple series coefficients, and while products of characteristic functions handle convolutions for sums of [independent](/page/Independent) variables (just as with PGFs), the complex-valued nature makes numerical computations more involved for lattice distributions. For discrete random variables with bounded support, the characteristic function extends to an [entire function](/page/Entire_function) of a complex argument, since it is then a finite trigonometric polynomial; more generally, its domain of analyticity mirrors that of the PGF under the substitution $s = e^{it}$.