In probability theory, the probability-generating function (PGF) is a power series associated with a discrete random variable $X$ taking non-negative integer values, defined as $G(s) = \mathbb{E}[s^X] = \sum_{k=0}^{\infty} p_k s^k$, where $p_k = P(X = k)$ denotes the probability mass function and $|s| \leq 1$ ensures convergence.[1] The function encodes the entire probability distribution of $X$ in its coefficients, which can be recovered by evaluating derivatives at $s = 0$ via $P(X = k) = \frac{G^{(k)}(0)}{k!}$.[2]

Key properties of the PGF include its utility in computing moments: the mean is given by $\mathbb{E}[X] = G'(1)$, the second factorial moment by $\mathbb{E}[X(X-1)] = G''(1)$, and higher factorial moments by further derivatives evaluated at $s = 1$, with the variance expressed as $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$.[3] For independent random variables $X$ and $Y$, the PGF of their sum $X + Y$ is the product $G_{X+Y}(s) = G_X(s) G_Y(s)$, facilitating the analysis of convolutions and compound distributions.[4] These properties make the PGF particularly suitable for non-negative integer-valued random variables, distinguishing it from the more general moment-generating function used for continuous or unbounded discrete distributions.[2]

PGFs find extensive applications in probability theory and related fields, such as deriving the distribution of sums of independent random variables, analyzing branching processes (where the extinction probability satisfies $\theta = G(\theta)$), and modeling queueing systems or renewal processes.[3] They are especially valuable for common discrete distributions like the binomial, Poisson, and negative binomial, where closed-form expressions simplify computations of probabilities and moments.[5] In stochastic processes, PGFs enable efficient handling of generating functions for waiting times or offspring distributions, providing a bridge between combinatorial methods and probabilistic analysis.[6]
## Definition
### Univariate case
The probability generating function (PGF) of a univariate discrete random variable $X$ that takes non-negative integer values $0, 1, 2, \dots$ with probability mass function $P(X = k) = p_k$ (where $\sum_{k=0}^\infty p_k = 1$ and $p_k \geq 0$) is defined as
$$G(s) = \mathbb{E}[s^X] = \sum_{k=0}^\infty p_k s^k.$$
This formal power series encodes the probabilities $p_k$ as its coefficients.[1][7]

The PGF $G(s)$ represents the expectation of $s^X$, providing a compact way to summarize the distribution of $X$. As a power series centered at $s = 0$, it expands to reveal the probability structure, with $G(1) = 1$ ensuring normalization when the series converges at the boundary. This formulation is particularly suited to non-negative integer-valued random variables, such as those modeling counts or occurrences in stochastic processes, because the non-negative integers align naturally with the exponents of the power series, facilitating analytic manipulations such as differentiation for moments.[8][1]

The domain of $G(s)$ includes the closed complex unit disk $|s| \leq 1$, where the series converges absolutely because $\sum_{k=0}^\infty p_k |s|^k \leq \sum_{k=0}^\infty p_k = 1$. The radius of convergence is at least 1, since $\limsup_{k \to \infty} |p_k|^{1/k} \leq 1$, guaranteeing convergence within and on the boundary of the unit disk.[8][7]

Within the radius of convergence, the PGF uniquely determines the probabilities $p_k$, and vice versa: the coefficients are recoverable via $p_k = \frac{G^{(k)}(0)}{k!}$, establishing a one-to-one correspondence between the distribution and its generating function.[1][8]
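As a quick illustration of this coefficient recovery (a minimal sketch, not drawn from the cited sources), the following code builds the PGF of a Binomial(3, 2/5) variable symbolically and reads off each probability as $p_k = G^{(k)}(0)/k!$; it assumes the sympy library, and the parameters are purely illustrative.

```python
import sympy as sp

s = sp.symbols('s')
n, p = 3, sp.Rational(2, 5)              # illustrative parameters
G = (1 - p + p * s) ** n                 # PGF of a Binomial(n, p) random variable

probs = []
for k in range(n + 1):
    deriv = G if k == 0 else sp.diff(G, s, k)         # k-th derivative of the PGF
    probs.append(deriv.subs(s, 0) / sp.factorial(k))  # p_k = G^(k)(0) / k!

print(probs)        # [27/125, 54/125, 36/125, 8/125]
print(sum(probs))   # 1, confirming G(1) = 1
```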
### Multivariate case
In the multivariate case, the probability-generating function extends the univariate concept to a random vector $(X_1, \dots, X_n)$ with non-negative integer-valued components and joint probability mass function $p_{k_1, \dots, k_n} = P(X_1 = k_1, \dots, X_n = k_n)$. It is defined as
$$G(s_1, \dots, s_n) = \mathbb{E}\left[s_1^{X_1} \cdots s_n^{X_n}\right] = \sum_{k_1 = 0}^\infty \cdots \sum_{k_n = 0}^\infty p_{k_1, \dots, k_n} s_1^{k_1} \cdots s_n^{k_n},$$
where the series converges absolutely for $|s_i| \leq 1$ for each $i = 1, \dots, n$.[9]

The marginal probability-generating function of any single component, say $X_j$, is obtained by setting $s_i = 1$ for all $i \neq j$, yielding $G_{X_j}(s_j) = G(1, \dots, s_j, \dots, 1)$. Similarly, the joint PGF of a subset of the variables is recovered by setting the remaining arguments to 1.[10]

By the uniqueness of multivariate power series within their domain of convergence, the joint PMF is uniquely determined by the PGF, as the coefficients $p_{k_1, \dots, k_n}$ can be extracted via
$$p_{k_1, \dots, k_n} = \frac{1}{k_1! \cdots k_n!} \left. \frac{\partial^{k_1 + \cdots + k_n} G(s_1, \dots, s_n)}{\partial s_1^{k_1} \cdots \partial s_n^{k_n}} \right|_{s_1 = \cdots = s_n = 0}.$$
This extends the univariate uniqueness result to higher dimensions.[11]

When the variables are independent, the joint PGF factors as the product of the individual univariate PGFs, $G(s_1, \dots, s_n) = \prod_{i=1}^n G_{X_i}(s_i)$. The case $n = 1$ reduces directly to the univariate PGF.[10]
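The following brief sketch (an illustration assuming sympy, not taken from the cited references) checks these statements for two independent components with arbitrary parameters, recovering a marginal by substitution and a joint probability by partial differentiation at the origin.

```python
import sympy as sp

s1, s2 = sp.symbols('s1 s2')
G1 = (sp.Rational(1, 2) + sp.Rational(1, 2) * s1) ** 2   # X1 ~ Binomial(2, 1/2)
G2 = sp.exp(2 * (s2 - 1))                                # X2 ~ Poisson(2)
G = G1 * G2                                              # joint PGF under independence

marginal_X1 = G.subs(s2, 1)                              # set s2 = 1 to marginalize out X2
p_12 = sp.diff(G, s1, 1, s2, 2).subs({s1: 0, s2: 0}) / (sp.factorial(1) * sp.factorial(2))
exact = sp.Rational(1, 2) * sp.exp(-2) * 2**2 / sp.factorial(2)   # P(X1 = 1) P(X2 = 2)

print(sp.simplify(marginal_X1 - G1))   # 0: the marginal PGF is G1
print(sp.simplify(p_12 - exact))       # 0: coefficient extraction matches the joint PMF
```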
## Basic Properties
### Analytic properties
The probability-generating function (PGF) of a non-negative integer-valued random variable $X$ admits a power series representation $G(s) = \sum_{k=0}^\infty p_k s^k$, where $p_k = P(X = k) \geq 0$ are the probability mass function coefficients satisfying $\sum_{k=0}^\infty p_k = 1$. This formal power series encodes the full distribution of $X$, and the non-negativity and normalization of the coefficients guarantee that $G(s)$ is analytic (holomorphic) within the open unit disk $|s| < 1$. The analyticity arises because the series converges uniformly on compact subsets of the disk, allowing term-by-term differentiation and integration, which underpins many manipulations in probability theory.[7][12]

The radius of convergence $R$ of the PGF power series satisfies $R \geq 1$, so the series converges throughout the open unit disk and, since $\sum_k p_k = 1$, also on its boundary. This lower bound follows from the probability constraints, as the root test yields $R = 1 / \limsup_{k \to \infty} p_k^{1/k}$ and $\limsup_{k \to \infty} p_k^{1/k} \leq 1$ because $\sum_k p_k = 1$. For distributions with finite support, $R = \infty$; for infinite-support distributions with geometrically decaying tails, such as the geometric distribution, $1 < R < \infty$; and for heavier-tailed cases, $R = 1$. Inside the disk of convergence, the PGF's analytic structure facilitates asymptotic analysis and singularity studies, often revealing tail behavior through the location of the dominant singularity.[13][12]

A fundamental decomposition expresses the PGF via the functional equation $G(s) = p_0 + (1 - p_0)\, s\, G_1(s)$ (for $p_0 < 1$), where $p_0 = P(X=0) = G(0)$ and $G_1(s)$ is the PGF of the conditional random variable $X-1$ given $X \geq 1$. Here, $G_1(s)$ is itself a valid PGF because the conditional probabilities $\{p_k / (1 - p_0) : k \geq 1\}$ form a proper distribution. This equation highlights the recursive structure inherent in non-negative integer distributions and is particularly useful in renewal and branching process contexts, where it aids in solving for extinction probabilities or iterated distributions.[14]

Restricted to the real interval $[0,1]$, the PGF $G(s)$ is non-decreasing and convex, with $G(0) = P(X=0)$. Monotonicity stems from the first derivative $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1} \geq 0$ for $s \in [0,1]$, reflecting the non-negativity of the coefficients. Convexity follows from the second derivative $G''(s) = \sum_{k=2}^\infty k(k-1) p_k s^{k-2} \geq 0$, again due to non-negative coefficients, implying that $G(s)$ lies above its tangents on $[0,1]$. These properties ensure $G(s)$ maps $[0,1]$ into $[p_0, 1]$ as a smooth convex arc, useful for graphical analysis of fixed points in iterative schemes.[15][11]

Because the power series representation is unique, the probabilities can also be recovered without derivatives, via complex analysis. Specifically, Cauchy's integral formula gives
$$p_k = \frac{1}{2\pi i} \oint_C \frac{G(s)}{s^{k+1}} \, ds,$$
where $C$ is any simple closed contour encircling the origin counterclockwise within the disk of analyticity $|s| < R$. This contour integral extracts the coefficient $p_k$ directly from $G(s)$, providing a rigorous basis for numerical and asymptotic recovery of distribution tails when closed forms are unavailable.[12][16]
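Discretizing this contour integral over the unit circle turns it into a discrete Fourier transform, so probabilities can be approximated by an FFT of PGF samples. The sketch below is an illustration (assuming numpy, with an arbitrary Poisson rate), not a construction from the cited sources.

```python
import numpy as np
from math import exp, factorial

lam, N = 3.0, 64                             # Poisson rate and number of sample points
s = np.exp(2j * np.pi * np.arange(N) / N)    # points s_j = e^{2*pi*i*j/N} on the unit circle
G = np.exp(lam * (s - 1.0))                  # PGF of Poisson(lam) evaluated on the circle

p = np.real(np.fft.fft(G)) / N               # p_k ~ (1/N) * sum_j G(s_j) s_j^{-k}

for k in range(6):
    exact = exp(-lam) * lam**k / factorial(k)
    print(k, round(p[k], 10), round(exact, 10))
```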
### Normalization and convergence
The probability generating function $G(s)$ of a non-negative integer-valued random variable $X$ with probability mass function $p_k = P(X = k)$ for $k = 0, 1, 2, \dots$ satisfies the normalization condition $G(1) = \sum_{k=0}^\infty p_k = 1$, reflecting the total probability of the distribution.[11] This equality holds directly from substituting $s = 1$ into the definition $G(s) = \sum_{k=0}^\infty p_k s^k$.[17] For proper probability distributions, the limit $\lim_{s \to 1^-} G(s) = 1$ also obtains, ensured by the dominated convergence theorem applied to the series, as the terms $p_k s^k$ converge pointwise to $p_k$ and are bounded by $p_k$.[11]

At the lower boundary, $G(0) = p_0 = P(X = 0)$, since all terms with $k \geq 1$ vanish when $s = 0$.[11] The function $G(s)$ is monotonically non-decreasing on the interval $[0, 1]$, a consequence of its derivative $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1} \geq 0$ for $s \in [0, 1]$, given the non-negativity of the coefficients $k p_k$ and powers of $s$.[18]

Regarding convergence, the power series defining $G(s)$ has radius of convergence at least 1, and it converges absolutely for $|s| \leq 1$ because $\sum_k p_k |s|^k \leq \sum_k p_k = 1$.[11] At $s = 1$, convergence always holds by the normalization $\sum_k p_k = 1$. For $|s| > 1$, the series may diverge when $X$ has infinite support; it converges on a disk of radius greater than 1 precisely when the probabilities $p_k$ decay at a geometric rate.[11]

Tail probabilities can be analyzed via the associated tail generating function $T(s) = \sum_{n=0}^\infty P(X > n) s^n = \frac{1 - G(s)}{1 - s}$ for $0 \leq s < 1$, which encapsulates the cumulative behavior beyond each $n$ and relates to large deviation bounds through optimization over $s$. This form follows from interchanging sums: $\sum_n s^n \sum_{k > n} p_k = \sum_k p_k \sum_{n=0}^{k-1} s^n = \sum_k p_k \frac{1 - s^k}{1 - s} = \frac{1 - G(s)}{1 - s}$.

For defective distributions, where the probabilities sum to less than 1 (e.g., sub-probability measures in branching processes with positive absorption probability), the generating function satisfies $G(1) < 1$, with $1 - G(1)$ representing the defect, that is, the mass assigned to an unattainable outcome such as explosion to infinity. In such cases, the monotonicity and boundary behaviors at $s = 0$ persist, but the limit as $s \to 1^-$ equals the defective total mass rather than 1.
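A quick numerical check of the tail identity $\sum_n P(X > n) s^n = \frac{1 - G(s)}{1 - s}$ is sketched below for a geometric distribution on $\{1, 2, \dots\}$; this assumes numpy, truncates the support at a large cutoff, and uses illustrative parameter values.

```python
import numpy as np

p, s, K = 0.3, 0.7, 2000                     # success probability, evaluation point, truncation
k = np.arange(1, K + 1)
pmf = p * (1 - p) ** (k - 1)                 # P(X = k) for the geometric on {1, 2, ...}

G = np.sum(pmf * s ** k)                                  # (truncated) PGF at s
tails = 1.0 - np.cumsum(np.concatenate(([0.0], pmf)))     # tails[n] = P(X > n), n = 0..K
lhs = np.sum(tails[:K] * s ** np.arange(K))               # sum_n P(X > n) s^n
rhs = (1.0 - G) / (1.0 - s)

print(lhs, rhs)   # both ~ 1.9608 = 1 / (1 - (1 - p) s)
```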
## Moments and Derivatives
### Ordinary moments
The ordinary moments of a non-negative integer-valued random variable $X$ with probability-generating function $G(s) = \mathbb{E}[s^X]$ can be derived from derivatives of $G(s)$ evaluated at $s=1$, leveraging the connection to the moment-generating function $M(t) = G(e^t)$. The first ordinary moment, or expected value, is directly $\mathbb{E}[X] = G'(1)$.[11][4]

Higher ordinary moments $\mathbb{E}[X^n]$ with $n \geq 2$ are given by the $n$th derivative of the moment-generating function evaluated at $t=0$, i.e., $\mathbb{E}[X^n] = M^{(n)}(0)$. Since $M(t)$ is the composition $G(e^t)$, the derivatives $M^{(n)}(t)$ can be computed using Faà di Bruno's formula for higher-order chain rules or via recursive differentiation of the composition.[19]

A key example is the second ordinary moment, which underlies the variance formula $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$. To derive this, start with the PGF $G(s) = \sum_{k=0}^\infty p_k s^k$, where $p_k = \mathbb{P}(X=k)$. The first derivative is $G'(s) = \sum_{k=1}^\infty k p_k s^{k-1}$, so evaluating at $s=1$ yields $G'(1) = \sum_{k=1}^\infty k p_k = \mathbb{E}[X]$. The second derivative is $G''(s) = \sum_{k=2}^\infty k(k-1) p_k s^{k-2}$, so $G''(1) = \sum_{k=2}^\infty k(k-1) p_k = \mathbb{E}[X(X-1)]$. Then $\mathbb{E}[X^2] = \mathbb{E}[X(X-1) + X] = G''(1) + G'(1)$, and the variance follows as $\mathbb{E}[X^2] - [\mathbb{E}[X]]^2$.[11][4]

Central moments, such as the second central moment $\mathbb{E}[(X - \mathbb{E}[X])^2] = \mathrm{Var}(X)$, can be expressed in terms of ordinary moments using binomial expansions. More generally, they relate to cumulants derived from the cumulant-generating function $\log M(t)$, whose derivatives at $t=0$ give the cumulants; the first cumulant is $\mathbb{E}[X]$, the second is $\mathrm{Var}(X)$, and higher cumulants connect to central moments via standard recursive relations.[20][19]

Evaluating derivatives at $s=1$ is immediate when the radius of convergence exceeds 1, since the PGF is then analytic in a neighborhood of 1. When the radius equals 1, the derivatives are interpreted as left-hand limits $\lim_{s \to 1^-} G^{(n)}(s)$, which exist by monotone convergence and are finite exactly when the corresponding factorial moments are finite.[11][4]
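A minimal symbolic sketch of this recipe, assuming sympy and using the geometric distribution on $\{1, 2, \dots\}$ as an illustrative case, recovers the familiar mean $1/p$ and variance $(1-p)/p^2$ from $G'(1)$ and $G''(1)$.

```python
import sympy as sp

s, p = sp.symbols('s p', positive=True)
G = p * s / (1 - (1 - p) * s)       # PGF of a geometric distribution on {1, 2, ...}

mean = sp.simplify(sp.diff(G, s).subs(s, 1))                        # G'(1)
var = sp.simplify(sp.diff(G, s, 2).subs(s, 1) + mean - mean**2)     # G''(1) + G'(1) - G'(1)^2

print(mean)   # 1/p
print(var)    # (1 - p)/p**2
```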
### Factorial moments
The $n$th factorial moment of a non-negative integer-valued random variable $X$ is defined as the expected value of the falling factorial, $\mathbb{E}[X(X-1)\cdots(X-n+1)]$, which equals the $n$th derivative of the probability generating function $G(s)$ evaluated at $s=1$: $\mathbb{E}[X(X-1)\cdots(X-n+1)] = G^{(n)}(1)$.[21]

To see this, consider the form of the $n$th derivative of $G(s) = \sum_{k=0}^\infty p_k s^k$:
$$G^{(n)}(s) = \sum_{k=n}^\infty p_k \, k(k-1)\cdots(k-n+1) \, s^{k-n}.$$
Evaluating at $s=1$ yields precisely the factorial moment, as the coefficient $k(k-1)\cdots(k-n+1)$ is the falling factorial for each term.[21]

This direct computation via differentiation provides a straightforward way to obtain factorial moments from the PGF, leveraging the combinatorial structure inherent in the generating function's power series expansion.[21]

The ordinary moments can be expressed in terms of factorial moments using Stirling numbers of the second kind $S(n,k)$, which count the number of ways to partition $n$ objects into $k$ non-empty subsets: $\mathbb{E}[X^n] = \sum_{k=0}^n S(n,k) \, \mathbb{E}[X(X-1)\cdots(X-k+1)]$.[22]

For discrete distributions, factorial moments derived from the PGF are particularly advantageous, as they simplify calculations involving convolutions of independent variables and applications of the binomial theorem, reflecting the natural discrete combinatorial underpinnings.[21]
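The Stirling-number conversion can be verified directly; the short sketch below (an illustration only, with a hypothetical Binomial(5, 0.3) example and a hand-rolled recurrence for $S(n,k)$) compares $\mathbb{E}[X^3]$ computed via factorial moments against a direct PMF sum.

```python
from math import comb

n_trials, p = 5, 0.3
pmf = [comb(n_trials, k) * p**k * (1 - p)**(n_trials - k) for k in range(n_trials + 1)]

def falling(x, r):
    """Falling factorial x (x-1) ... (x-r+1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def stirling2(n, k):
    """Stirling numbers of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

n = 3
factorial_moments = [sum(pk * falling(k, r) for k, pk in enumerate(pmf)) for r in range(n + 1)]
via_stirling = sum(stirling2(n, r) * factorial_moments[r] for r in range(n + 1))
direct = sum(pk * k**n for k, pk in enumerate(pmf))

print(via_stirling, direct)   # both ~ 8.52, the value of E[X^3] for Binomial(5, 0.3)
```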
## Operations on Generating Functions
### Sums of independent variables
One of the key advantages of probability-generating functions (PGFs) lies in their ability to simplify the analysis of sums of independent random variables. Specifically, if $X$ and $Y$ are independent non-negative integer-valued random variables with PGFs $G_X(s) = \mathbb{E}[s^X]$ and $G_Y(s) = \mathbb{E}[s^Y]$, respectively, then the PGF of their sum $Z = X + Y$ is the product of the individual PGFs:[23]
$$G_Z(s) = G_X(s) G_Y(s).$$

This property follows directly from the definition of independence and the PGF. To see this, note that
$$G_Z(s) = \mathbb{E}[s^{X+Y}] = \sum_{k=0}^\infty s^k P(Z = k) = \sum_{r=0}^\infty \sum_{t=0}^\infty s^{r+t} P(X = r, Y = t).$$
By independence, $P(X = r, Y = t) = P(X = r) P(Y = t)$, so the double sum separates into
$$G_Z(s) = \left( \sum_{r=0}^\infty s^r P(X = r) \right) \left( \sum_{t=0}^\infty s^t P(Y = t) \right) = G_X(s) G_Y(s).$$
This establishes the product form.[23]

The result extends naturally to the sum of any finite number of independent random variables. For $n$ independent and identically distributed (i.i.d.) random variables $X_1, \dots, X_n$ each with PGF $G_X(s)$, the PGF of $S_n = \sum_{i=1}^n X_i$ is
$$G_{S_n}(s) = [G_X(s)]^n.$$
This multiplicative structure avoids the need to compute the probability mass function (PMF) of the sum directly via convolution, as the PMF of $S_n$ is the $n$-fold convolution of the PMF of $X$, which can be cumbersome for large $n$. Instead, the product's series expansion yields the coefficients corresponding to the convolved PMF efficiently.[](https://www.cl.cam.ac.uk/teaching/0304/Probability/prob06.pdf)
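The correspondence between multiplying PGFs and convolving PMFs can be seen numerically; the sketch below (assuming numpy, with two arbitrary finite distributions) compares the coefficients of the product polynomial with the output of a direct convolution.

```python
import numpy as np

pmf_x = np.array([0.2, 0.5, 0.3])       # P(X = 0), P(X = 1), P(X = 2)
pmf_y = np.array([0.6, 0.4])            # P(Y = 0), P(Y = 1)

pmf_sum = np.convolve(pmf_x, pmf_y)     # PMF of Z = X + Y by direct convolution

# Coefficients of the product polynomial G_X(s) G_Y(s), lowest degree first.
product = np.polynomial.Polynomial(pmf_x) * np.polynomial.Polynomial(pmf_y)

print(pmf_sum)        # [0.12 0.38 0.38 0.12]
print(product.coef)   # identical coefficients
```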
In the multivariate setting, the property holds analogously for independent random vectors. If $\mathbf{X} = (X_1, \dots, X_m)$ and $\mathbf{Y} = (Y_1, \dots, Y_k)$ are independent multivariate non-negative integer-valued random vectors with joint PGFs $G_\mathbf{X}(s_1, \dots, s_m) = \mathbb{E}[s_1^{X_1} \cdots s_m^{X_m}]$ and $G_\mathbf{Y}(t_1, \dots, t_k) = \mathbb{E}[t_1^{Y_1} \cdots t_k^{Y_k}]$, then the joint PGF of the concatenated vector $(\mathbf{X}, \mathbf{Y})$ is the product $G_\mathbf{X}(s_1, \dots, s_m) G_\mathbf{Y}(t_1, \dots, t_k)$. For the componentwise sum of independent multivariate vectors of the same dimension, the joint PGF is likewise the product of the individual joint PGFs.
The product form also facilitates the computation of moments for the sum. Since moments (and factorial moments) are obtained via derivatives of the PGF evaluated at $s = 1$, the product rule applies: for independent $X$ and $Y$, the mean of $Z = X + Y$ is $\mathbb{E}[Z] = G_Z'(1) = G_X'(1) G_Y(1) + G_X(1) G_Y'(1) = \mathbb{E}[X] + \mathbb{E}[Y]$, as $G_X(1) = G_Y(1) = 1$. Similarly, the variance follows from the second derivative: $\mathrm{Var}(Z) = G_Z''(1) + G_Z'(1) - [G_Z'(1)]^2 = \mathrm{Var}(X) + \mathrm{Var}(Y)$, confirming the additivity of variances for independent variables.[](https://www.cl.cam.ac.uk/teaching/0304/Probability/prob06.pdf)
### Compound distributions
In probability theory, a compound distribution arises when considering a random sum $ Z = \sum_{i=1}^N X_i $, where $ N $ is a non-negative integer-valued random variable representing the number of terms, and the $ X_i $ are independent and identically distributed non-negative integer-valued random variables independent of $ N $.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
The probability generating function (PGF) of $ Z $ is given by the functional composition $ G_Z(s) = G_N(G_X(s)) $, where $ G_N(s) $ and $ G_X(s) $ are the PGFs of $ N $ and $ X_1 $, respectively.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) This result follows from the law of total expectation:
$$G_Z(s) = \mathbb{E}[s^Z] = \mathbb{E}\left[ \mathbb{E}[s^Z \mid N] \right] = \mathbb{E}\left[ (G_X(s))^N \right] = G_N(G_X(s)),$$
assuming the expectations exist within the radius of convergence.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
Special cases include the compound Poisson distribution, obtained when $ N $ is Poisson-distributed, and compound geometric distributions, obtained when $ N $ follows a geometric distribution; for example, a Poisson number of logarithmically distributed summands yields the negative binomial distribution.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) These compositions simplify the analysis of aggregate processes, such as total claims in insurance where $ N $ counts events and each $ X_i $ represents claim size.[](https://galton.uchicago.edu/~yibi/teaching/stat244/L10.pdf)
The moments of $ Z $ can be derived using differentiation of the PGF and the chain rule. The mean is $ \mathbb{E}[Z] = \mathbb{E}[N] \mathbb{E}[X_1] $, obtained as $ G_Z'(1) = G_N'(G_X(1)) G_X'(1) = G_N'(1) G_X'(1) $.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf) The variance follows from the law of total variance:
$$\text{Var}(Z) = \mathbb{E}[N] \text{Var}(X_1) + \text{Var}(N) (\mathbb{E}[X_1])^2,$$
which decomposes into the expected variance from the sum given $ N $ and the variance from the randomness in $ N $.[](https://galton.uchicago.edu/~yibi/teaching/stat244/L10.pdf)
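As a small symbolic sketch (assuming sympy, with illustrative parameters): composing a Poisson(2) counting distribution with Bernoulli(3/10) summands collapses to the Poisson(3/5) PGF, the familiar thinning property, and differentiating the composed PGF reproduces the mean and total-variance values.

```python
import sympy as sp

s = sp.symbols('s')
G_N = lambda u: sp.exp(2 * (u - 1))                      # PGF of N ~ Poisson(2)
G_X = sp.Rational(7, 10) + sp.Rational(3, 10) * s        # PGF of X ~ Bernoulli(3/10)

G_Z = sp.simplify(G_N(G_X))
print(G_Z)                                               # exp(3*s/5 - 3/5), a Poisson(3/5) PGF

mean = sp.diff(G_Z, s).subs(s, 1)                        # E[Z] = E[N] E[X] = 3/5
var = sp.simplify(sp.diff(G_Z, s, 2).subs(s, 1) + mean - mean**2)
print(mean, var)                                         # 3/5 and 3/5, matching total variance
```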
For the multivariate case, consider a compound vector sum $ \mathbf{Z} = \sum_{i=1}^N \mathbf{X}_i $, where each $ \mathbf{X}_i $ is a $ k $-dimensional random vector with joint PGF $ G_{\mathbf{X}}(s_1, \dots, s_k) = \mathbb{E}[s_1^{X_{i1}} \cdots s_k^{X_{ik}}] $, and $ N $ is independent of the $ \mathbf{X}_i $. The joint PGF of $ \mathbf{Z} $ is then $ G_{\mathbf{Z}}(s_1, \dots, s_k) = G_N(G_{\mathbf{X}}(s_1, \dots, s_k)) $, extending the univariate composition to joint distributions.[](https://www.sciencedirect.com/science/article/pii/S0377042719303747)
## Examples
### Finite support distributions
For distributions with finite support, the probability-generating function (PGF) takes the form of a polynomial whose degree equals the maximum possible value of the random variable, facilitating straightforward extraction of probabilities as coefficients.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf)
The Bernoulli distribution, where a random variable $X$ takes value 1 with probability $p$ and 0 with probability $1-p$, has PGF $G_X(s) = 1 - p + p s$.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) The expected value is $E[X] = p$, obtained by differentiating the PGF and evaluating at $s=1$, while the variance is $\operatorname{Var}(X) = p(1-p)$.[](https://www.math.purdue.edu/~stindel/teaching/ma532/slides/generating-functions.pdf)
The binomial distribution arises as the sum of $n$ independent Bernoulli trials with success probability $p$, yielding PGF $G_X(s) = (1 - p + p s)^n$.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) This form follows directly from the product of individual Bernoulli PGFs under independence.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf)
For a discrete uniform distribution on $\{0, 1, \dots, m\}$, where each outcome has probability $1/(m+1)$, the PGF is
$$G_X(s) = \frac{1 - s^{m+1}}{(m+1)(1 - s)}$$
for $s \neq 1$ (with $G_X(1) = 1$), derived as the partial sum of a geometric series.[](https://willperkins.org/6221/slides/generating.pdf)
The hypergeometric distribution models the number of successes in $n$ draws without replacement from a finite population of size $N$ containing $K$ successes, with PGF given by
$$G_X(s) = \frac{(N - n)!\,(N - K)!}{N!\,(N - K - n)!} \; {}_2F_1(-n, -K; N - K - n + 1; s)$$
for $n \leq N - K$.[](https://arxiv.org/abs/2407.20677)
Since the support is bounded, the PGF is a polynomial, allowing probabilities to be recovered by expanding and reading coefficients or using finite differences for inversion.[](https://people.engr.tamu.edu/andreas-klappenecker/csce658-s18/genfunc.pdf) Computations involve direct summation over the support for small sizes or exploiting closed-form expressions via the binomial theorem or hypergeometric identities.[](https://www.stat.auckland.ac.nz/~fewster/325/notes/ch4.pdf)
### Infinite support distributions
The probability-generating function (PGF) for discrete random variables with infinite support typically results in non-polynomial expressions, such as exponentials or rational functions, reflecting the unbounded nature of the support. These forms arise from infinite series expansions of the probability mass functions and often allow for analytic continuation beyond the unit disk, with the radius of convergence determined by the tail behavior of the distribution.[](https://user.eng.umd.edu/~abarg/620/GF.pdf)
A canonical example is the Poisson distribution, where $X \sim \mathrm{Poisson}(\lambda)$ for $\lambda > 0$, with probability mass function $P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$ for $k = 0, 1, 2, \dots$. The PGF is derived directly from the series:
$$G(s) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{(\lambda s)^k}{k!} = e^{-\lambda} e^{\lambda s} = e^{\lambda(s-1)},$$
valid for all complex $s$ since the exponential is an entire function, implying an infinite radius of convergence. This exponential form encapsulates the light-tailed decay of the Poisson probabilities. The mean is $E[X] = G'(1) = \lambda$ and the variance is $\mathrm{Var}(X) = G''(1) + G'(1) - [G'(1)]^2 = \lambda$.[](https://www.uvm.edu/~mrombach/373_rombach_week5.pdf)[](http://galton.uchicago.edu/~lalley/Courses/312/PoissonProcesses.pdf)
For the geometric distribution, consider $X \sim \mathrm{Geometric}(p)$ counting the number of trials until the first success in independent [Bernoulli](/page/Bernoulli) trials with success probability $p \in (0,1)$, so $P(X = k) = (1-p)^{k-1} p$ for $k = 1, 2, \dots$. The PGF is
$$G(s) = \sum_{k=1}^{\infty} p (1-p)^{k-1} s^k = \frac{p s}{1 - (1-p) s},$$
for $|s| < \frac{1}{1-p}$, with a pole at $s = \frac{1}{1-p} > 1$, indicating convergence beyond the unit interval and reflecting the geometric tail decay. A zero-start variant, counting failures before the first success ($X = 0, 1, 2, \dots$), shifts the support and yields $G(s) = \frac{p}{1 - (1-p) s}$.[](https://www.stat.uchicago.edu/~yibi/teaching/stat317/2021/Lectures/Lecture8.pdf)
The [negative binomial distribution](/page/Negative_binomial_distribution) generalizes the geometric, modeling the number of failures until $r$ successes, where $r > 0$ is the number of successes and $p \in (0,1)$ the success probability, so $X \sim \mathrm{NB}(r, p)$ with $P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k$ for $k = 0, 1, 2, \dots$. The PGF is
$$G(s) = \left( \frac{p}{1 - (1-p) s} \right)^r,$$
for $|s| < \frac{1}{1-p}$, obtained for integer $r$ as the PGF of a sum of $r$ independent geometric random variables (each counting failures until one success). This rational form, which defines the Pascal distribution when $r$ is an integer, highlights the compound structure and heavier tails compared to the Poisson, with convergence radius exceeding 1 due to the geometric decay rate.[](http://www.stat.yale.edu/~pollard/Courses/241.fall2005/notes2005/Generating.pdf)[](https://math.arizona.edu/~tgk/464_10/chap4_9_29.pdf)
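A brief symbolic check (assuming sympy, with illustrative values of $r$ and $p$) expands this PGF as a power series and compares the coefficients with the negative binomial PMF.

```python
import sympy as sp

s = sp.symbols('s')
r, p = 3, sp.Rational(2, 5)                       # illustrative parameters

G = (p / (1 - (1 - p) * s)) ** r                  # negative binomial PGF
expansion = sp.series(G, s, 0, 6).removeO()
coeffs = [expansion.coeff(s, k) for k in range(6)]
pmf = [sp.binomial(k + r - 1, k) * p**r * (1 - p)**k for k in range(6)]

print(coeffs)
print(pmf)        # the two lists agree term by term
```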
## Applications
### Branching processes
In branching processes, particularly the Galton–Watson process, the probability generating function (PGF) provides a powerful tool for analyzing population dynamics over discrete generations. The process begins with an initial population size $ Z_0 = 1 $, where each individual independently produces a random number of offspring according to a fixed probability distribution with PGF $ f(s) = \sum_{k=0}^\infty p_k s^k $, where $ p_k $ is the probability of having $ k $ offspring. The population size at the next generation, $ Z_{n+1} $, is the sum of $ Z_n $ independent copies of the offspring random variable. Consequently, the PGF of $ Z_n $ satisfies the iterative relation $ G_{Z_n}(s) = f^{\circ n}(s) $, where $ f^{\circ n} $ denotes the $ n $-th functional iterate of $ f $. This iteration captures the recursive nature of population growth and enables exact computation of the distribution of $ Z_n $ in principle.[](https://link.springer.com/book/10.1007/978-3-642-65371-1)
A central quantity in the Galton–Watson process is the ultimate extinction probability $ \eta $, defined as the probability that the population eventually dies out, i.e., $ \eta = \lim_{n \to \infty} P(Z_n = 0) $. This probability is the smallest root in $[0,1]$ of the fixed-point equation $ s = f(s) $. Starting from $ s_0 = 0 $, the sequence $ s_{k+1} = f(s_k) $ increases monotonically to $ \eta $, providing a practical method to solve for it, as in the sketch below. The mean offspring number $ m = f'(1) $ determines the behavior: if $ m \leq 1 $, then $ \eta = 1 $ (certain extinction, excluding the trivial case $ p_1 = 1 $); if $ m > 1 $, then $ \eta < 1 $ (positive survival probability). These cases classify the process as subcritical ($ m < 1 $), critical ($ m = 1 $), or supercritical ($ m > 1 $).[](https://www.rand.org/content/dam/rand/pubs/reports/2009/R381.pdf)
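A minimal sketch of this fixed-point iteration, with a hypothetical offspring distribution $p_0 = 0.2$, $p_1 = 0.3$, $p_2 = 0.5$ (so $m = 1.3$), converges to the extinction probability 0.4:

```python
def f(s):
    """Offspring PGF with p0 = 0.2, p1 = 0.3, p2 = 0.5 (mean m = 1.3)."""
    return 0.2 + 0.3 * s + 0.5 * s**2

s = 0.0
for _ in range(200):        # iterate s_{k+1} = f(s_k) starting from 0
    s = f(s)

print(s)   # ~ 0.4, the smallest root of s = f(s) in [0, 1]; the other root is 1
```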
In the supercritical case, while the expected population size grows as $ m^n $, the variance of $ Z_n $ is $ \sigma^2 m^{n-1} (m^n - 1)/(m - 1) $, where $ \sigma^2 = f''(1) + m - m^2 $ is the offspring variance, scaling asymptotically as $ \sigma^2 m^{2n-1}/(m - 1) $. Conditioned on non-extinction, the variance exhibits similar explosive [growth](/page/Growth) of [order](/page/Order) $ m^{2n} $. This reflects the increasing [uncertainty](/page/Uncertainty) in population trajectories as generations progress, despite the drift toward [growth](/page/Growth).[](https://link.springer.com/book/10.1007/978-3-642-65371-1)
The use of PGFs to model such processes originated with Irénée-Jules Bienaymé, who in 1845 analyzed the extinction of lineages using an early form of the fixed-point equation, though his work remained obscure until rediscovered in the 1960s. Independently, [Francis Galton](/page/Francis_Galton) and Henry [Watson](/page/Watson) developed the model in the [1870s](/page/1870s) to investigate the persistence of aristocratic family names in [Britain](/page/Britain), posing the extinction problem that the PGF iteration resolves. Their 1875 paper laid the foundation, with PGFs proving essential for deriving the extinction criterion and growth behaviors.[](https://www.jstor.org/stable/2841222)
### Queueing and stochastic processes
In queueing theory, probability generating functions (PGFs) transform the infinite system of balance equations for Markov chains into solvable algebraic equations, enabling the derivation of steady-state distributions for queue lengths and waiting times. This approach is especially valuable for analyzing single-server queues where direct solutions are intractable, allowing researchers to obtain closed-form expressions or numerically invert PGFs for probabilities. By focusing on embedded processes at key epochs, such as departures, PGFs capture the [stochastic](/page/Stochastic) behavior of systems under stability conditions like traffic intensity ρ < 1.[](https://www.columbia.edu/~ww2040/6711F12/lect1101.pdf)
A seminal application is the M/G/1 queue, featuring Poisson arrivals at rate λ and general service times with Laplace-Stieltjes transform B^*(s) and mean 1/μ, yielding ρ = λ/μ. The PGF of the stationary queue length distribution, observed at departure epochs via the embedded Markov chain, is provided by the Pollaczek-Khinchine formula:
$$\pi(z) = (1 - \rho) \frac{(1 - z)\, B^*(\lambda (1 - z))}{B^*(\lambda (1 - z)) - z}, \quad |z| \leq 1.$$
This expression, originally derived in the 1930s and widely used since, fully characterizes the distribution; for instance, the mean queue length follows from $\pi'(1) = \rho + \frac{\lambda^2 E[S^2]}{2(1 - \rho)}$, where $S$ is the service time. For the M/D/1 queue with deterministic unit service time, the PGF specializes by substituting $B^*(s) = e^{-s}$, facilitating analysis of the constant-service scenarios common in deterministic scheduling.[](https://www.columbia.edu/~ww2040/6711F12/lect1101.pdf)[](https://iadan.win.tue.nl/que/h9.pdf)
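The mean-value formula above is easy to evaluate directly; the small numeric sketch below (illustrative rates only) compares exponential and deterministic service at the same utilization, recovering the standard M/M/1 value and the halved queueing term of M/D/1.

```python
lam, mu = 0.8, 1.0                 # illustrative arrival and service rates
rho = lam / mu

ES2_exp = 2.0 / mu**2              # E[S^2] for exponential service
ES2_det = 1.0 / mu**2              # E[S^2] for deterministic service

L_mm1 = rho + lam**2 * ES2_exp / (2 * (1 - rho))
L_md1 = rho + lam**2 * ES2_det / (2 * (1 - rho))

print(L_mm1)   # 4.0, matching the M/M/1 value rho / (1 - rho)
print(L_md1)   # 2.4, deterministic service halves the queueing term
```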
In discrete-time queues like the Geo/Geo/1 model, with Bernoulli arrivals of probability $\lambda$ and Bernoulli service completions of probability $\mu$ per slot (assuming $\lambda < \mu$ for stability), the stationary queue length follows a geometric distribution $\pi_i = (1 - \rho) \rho^i$ for $i \geq 0$, where $\rho = \frac{\lambda (1 - \mu)}{\mu (1 - \lambda)}$. The corresponding PGF is $P(z) = \frac{1 - \rho}{1 - \rho z}$, obtained by solving the balance equations of the embedded Markov chain at slot ends. This form arises from the birth-death structure, with PGFs converting the recursive relations into a rational function solvable for moments, such as the mean queue length $\rho / (1 - \rho)$.[](https://www.cs.tulane.edu/~zzheng3/teaching/cmps6750/spring20/queue.pdf)
PGFs also prove essential for embedded Markov chains in analyzing transient behaviors, such as busy periods, the interval from an arrival at an empty system until the server next becomes idle. In the M/G/1 queue, the PGF $f(s)$ of the number of customers served during a busy period satisfies the functional equation $f(s) = s \, B^*\big(\lambda (1 - f(s))\big)$; for the M/M/1 queue (exponential service at rate $\mu$) this becomes the quadratic $\lambda f(s)^2 - (\lambda + \mu) f(s) + \mu s = 0$, while deterministic unit service yields $f(s) = s \exp\left(\rho (f(s) - 1)\right)$ with $\rho = \lambda/\mu$, and these equations are solved iteratively or numerically to obtain the distribution. A discrete-time counterpart appears in Geo/Geo/1, where the busy period PGF solves a similar quadratic or higher-order equation derived from return-time probabilities in the chain. These equations transform the integral or recursive definitions of busy periods into algebraic forms amenable to computation.[](https://www.netlab.tkk.fi/opetus/s383143/kalvot/E_mg1jono.pdf)
Beyond basic queues, PGFs analyze first passage times in random walks modeling queue overflows or delays. For a simple symmetric random walk starting at 1, the PGF $F(z)$ of the first passage time to 0 (analogous to [ruin](/page/Ruin)) satisfies $ F(z) = \frac{1 - \sqrt{1 - z^2}}{z} $, derived from the step generating function by conditioning on the first step; extensions to biased walks yield $ F(z) = \frac{1 - \sqrt{1 - 4 p q z^2}}{2 p z} $, where $p$ is the probability of stepping away from 0 and $q = 1 - p$ that of stepping toward it. In [gambler's ruin](/page/Gambler's_ruin) with finite capital $N$, absorbing at 0 or $N$, the PGF for hitting times incorporates boundary conditions, solving via difference equations transformed by generating functions; starting from capital $k$, the ruin probability is $ \frac{ (q/p)^k - (q/p)^N }{ 1 - (q/p)^N } $ for $p \neq q$. Infinite extensions model unbounded queues, where PGFs quantify tail probabilities for large deviations.[](http://galton.uchicago.edu/~lalley/Courses/312/RW.pdf)
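The biased first-passage PGF can be checked by series expansion; the sketch below (assuming sympy, with an illustrative away-step probability $p = 3/5$) compares the coefficients of $F(z)$ against the Catalan-number form $P(T = 2n-1) = C_{n-1}\, p^{n-1} q^{n}$.

```python
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(3, 5)                  # probability of stepping away from 0 (illustrative)
q = 1 - p

F = (1 - sp.sqrt(1 - 4 * p * q * z**2)) / (2 * p * z)
expansion = sp.series(F, z, 0, 8).removeO()

from_series = [expansion.coeff(z, 2 * n - 1) for n in range(1, 5)]
from_catalan = [sp.catalan(n - 1) * p**(n - 1) * q**n for n in range(1, 5)]

print(from_series)
print(from_catalan)   # the lists agree; only odd powers of z appear
```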
These techniques extend to practical applications, such as traffic flow modeling where queues at intersections or bottlenecks are analyzed using M/G/1-like PGFs to predict congestion probabilities and delays under variable demand. In reliability engineering, PGFs evaluate stochastic processes for system availability in Markovian repair models, solving for steady-state uptime distributions via embedded chains. Numerical methods, like root-finding or series inversion of PGFs, enable simulation-free computation of performance metrics in these contexts, often referencing compound sums briefly for batch effects in arrivals.[](https://www.sciencedirect.com/science/article/pii/S0307904X10000880)
## Related Generating Functions
### Moment-generating function
The moment-generating function (MGF) of a random variable $X$ is defined as $M_X(t) = \mathbb{E}[e^{tX}]$, where it is typically finite for $t$ in some open interval containing 0.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions) This function encodes the moments of $X$ through its derivatives evaluated at $t = 0$, with the $n$th moment given by $\mathbb{E}[X^n] = M_X^{(n)}(0)$.[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) In contrast to the [probability-generating function (PGF)](/page/probability-generating_function), which is tailored to non-negative integer-valued random variables, the MGF applies more broadly to real-valued random variables, including continuous distributions, and relates to the [Laplace transform](/page/Laplace_transform) of the distribution.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
For a non-negative integer-valued random variable $X$ with PGF $G_X(s) = \mathbb{E}[s^X]$, the MGF and PGF are connected via the substitution $s = e^t$, yielding $M_X(t) = G_X(e^t)$ wherever $e^t$ lies within the PGF's region of convergence; this always holds for $t \leq 0$, while $t > 0$ requires the radius of convergence to exceed $e^t$.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) Under this substitution the unit disk $|s| \leq 1$ of the PGF corresponds to the closed left half-plane $\operatorname{Re}(t) \leq 0$, facilitating analysis of moments and tail behavior. The MGF provides ordinary moments directly: $\mathbb{E}[X^n] = M_X^{(n)}(0)$. In contrast, derivatives of the PGF at $s=1$ yield factorial moments $\mathbb{E}[X(X-1)\cdots(X-n+1)] = G_X^{(n)}(1)$, from which ordinary moments can be obtained (e.g., using Stirling numbers of the second kind). The substitution relates the two, enabling ordinary moments to be computed from the PGF via the chain rule applied to the composition.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
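A minimal symbolic sketch of the substitution (assuming sympy, for a Poisson PGF) differentiates $M(t) = G(e^t)$ at $t = 0$ to obtain the first two ordinary moments.

```python
import sympy as sp

t, s, lam = sp.symbols('t s lambda', positive=True)
G = sp.exp(lam * (s - 1))              # Poisson(lambda) PGF
M = G.subs(s, sp.exp(t))               # MGF via the substitution s = e^t

EX = sp.diff(M, t).subs(t, 0)                     # first ordinary moment
EX2 = sp.expand(sp.diff(M, t, 2).subs(t, 0))      # second ordinary moment

print(EX)    # lambda
print(EX2)   # lambda**2 + lambda
```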
The PGF offers advantages over the MGF in [discrete](/page/Discrete) settings by directly encoding the [probability mass function](/page/Probability_mass_function) as coefficients in its [power series](/page/Power_series) expansion, making it straightforward to extract probabilities and handle convolutions for sums of [independent](/page/Independent) [discrete](/page/Discrete) random variables—operations that resemble [multiplication](/page/Multiplication) of polynomials rather than the more Laplace-transform-like manipulations of the MGF.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)[](http://www.columbia.edu/~ww2040/4701Sum07/lec0712.pdf) However, the PGF is limited to non-negative [integer](/page/Integer) support and convergence within $|s| \leq 1$, whereas the MGF accommodates continuous distributions, negative values, and broader domains, though it may fail to exist even when all moments do.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions)
Historically, the MGF has been viewed as more general due to its ties to classical analysis tools like the [Laplace transform](/page/Laplace_transform), finding wide use in applied probability and statistics.[](https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/04:_Expected_Value/4.06:_Generating_Functions) In contrast, the PGF is preferred in [combinatorics](/page/Combinatorics) and discrete probability for its intuitive alignment with enumerative techniques, enabling [asymptotic analysis](/page/Asymptotic_analysis) of combinatorial parameters and limit laws through probabilistic interpretations of generating functions.[](https://ac.cs.princeton.edu/home/AC.pdf)
### Characteristic function
The characteristic function of a random variable $X$ is defined as $\phi(t) = \mathbb{E}[e^{itX}]$ for real $t$, where $i = \sqrt{-1}$. This function exists for every [probability distribution](/page/Probability_distribution) on [the real](/page/The_Real) line and is uniformly continuous, with $\phi(0) = 1$ and $|\phi(t)| \leq 1$. For [discrete](/page/Discrete) random variables supported on the integers, $\phi(t) = \sum_k p_k e^{itk}$, where $p_k = \mathbb{P}(X = k)$, making $\phi(t)$ a trigonometric sum that is periodic with period $2\pi$. More generally, for distributions supported on a [lattice](/page/Lattice) with span $h > 0$, the characteristic function is periodic with period $2\pi/h$.
For non-negative integer-valued random variables, the characteristic function relates directly to the probability-generating function $G(s) = \mathbb{E}[s^X]$ via the substitution $s = e^{it}$ on the unit circle in the [complex plane](/page/Complex_plane), yielding $\phi(t) = G(e^{it})$. This connection highlights how the characteristic function evaluates the PGF along the boundary of its domain of analyticity. However, unlike the PGF, which directly encodes the [probability mass function](/page/Probability_mass_function) as coefficients in its [power series](/page/Power_series) expansion, the characteristic function does not provide straightforward access to these coefficients without additional inversion steps.
The primary advantage of the characteristic function over the probability-generating function lies in its universality: it applies to all real-valued distributions, including continuous ones and those allowing negative values, whereas the PGF is restricted to non-negative [integer](/page/Integer) supports. The characteristic function facilitates recovery of the [probability density function](/page/Probability_density_function) (if it exists) or the [cumulative distribution function](/page/Cumulative_distribution_function) through the [Fourier inversion theorem](/page/Fourier_inversion_theorem), which states that under suitable integrability conditions, the density $f(x)$ satisfies $f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi(t) \, dt$. For discrete cases, analogous inversion formulas recover the [probability mass function](/page/Probability_mass_function). Both the characteristic function and the PGF uniquely determine their respective distributions; for the characteristic function, uniqueness follows from this inversion theory, while Lévy's continuity theorem adds that [pointwise convergence](/page/Pointwise_convergence) of characteristic functions to a limit continuous at $t = 0$ implies convergence in [distribution](/page/Distribution).
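For integer-valued variables the inversion reduces to an integral over one period, $p_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itk}\phi(t)\,dt$; the numerical sketch below (assuming numpy, with an arbitrary Poisson rate) evaluates this integral on a grid and compares with the exact probabilities.

```python
import numpy as np
from math import exp, factorial, pi

lam, N = 2.5, 4096                          # Poisson rate and grid size (illustrative)
t = -pi + 2 * pi * np.arange(N) / N         # uniform grid over one period [-pi, pi)
phi = np.exp(lam * (np.exp(1j * t) - 1.0))  # Poisson characteristic function
dt = 2 * pi / N

for k in range(5):
    p_k = np.real(np.sum(np.exp(-1j * t * k) * phi) * dt) / (2 * pi)
    exact = exp(-lam) * lam**k / factorial(k)
    print(k, round(p_k, 8), round(exact, 8))
```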
Despite these strengths, the characteristic function has limitations compared to the PGF in discrete settings. Extracting individual probabilities from $\phi(t)$ requires complex inversion rather than simple series coefficients, and while products of characteristic functions handle convolutions for sums of [independent](/page/Independent) variables (just as with PGFs), the complex-valued nature makes numerical computations more involved for lattice distributions. For discrete random variables with bounded support, the characteristic function extends to an [entire function](/page/Entire_function) of a complex argument, since it is then a finite trigonometric polynomial; more generally, its domain of analyticity mirrors that of the PGF under the substitution $s = e^{it}$.