In probability theory and statistics, the moment-generating function (MGF) of a random variable X is defined as M_X(t) = \mathbb{E}[e^{tX}], where the expectation exists for real values of t in some neighborhood of zero.[1] This function serves as an alternative representation of the probability distribution of X, encapsulating all its moments in a compact form.[2]

The MGF is particularly valuable for computing moments, as the nth raw moment \mathbb{E}[X^n] equals the nth derivative of M_X(t) evaluated at t = 0, i.e., M_X^{(n)}(0) = \mathbb{E}[X^n].[3] For discrete random variables, the MGF takes the form M_X(t) = \sum_x e^{tx} P(X = x), while for continuous variables, it is M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx, where f_X is the probability density function. A key property is its uniqueness: if two random variables have moment-generating functions that agree on an open interval containing zero, then their distributions are identical.[4]

MGFs are especially useful in deriving distributions of sums of independent random variables, as the MGF of their sum is the product of the individual MGFs: if X and Y are independent, then M_{X+Y}(t) = M_X(t) M_Y(t).[5] This multiplicative property simplifies convolutions and facilitates finding the distribution of sums without direct integration.[6] Additionally, MGFs aid in proving limit theorems, and for common distributions such as the binomial, Poisson, and normal, explicit closed forms exist that reveal structural insights.[7]
Definition and Existence
Formal Definition
The moment-generating function (MGF) of a random variable X is defined as

M_X(t) = \mathbb{E}\left[e^{tX}\right],

where the expectation is taken with respect to the probability distribution of X, and t is a real number in the domain where this expectation exists.[6]

This definition can be expressed in alternative forms depending on the nature of the distribution of X. For a general distribution with cumulative distribution function F_X, the MGF is given by

M_X(t) = \int_{-\infty}^{\infty} e^{tx} \, dF_X(x).[2]

For a discrete random variable X taking values in a countable set with probability mass function P(X = x), it becomes

M_X(t) = \sum_{x} e^{tx} P(X = x).[8]

For a continuous random variable X with probability density function f_X, the MGF is

M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx.[2]
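As a quick illustration of these defining sums and integrals, the following is a minimal sketch assuming the SymPy library; the symbols p, lambda, and t are illustrative, and the Bernoulli and exponential distributions are used only as simple examples.

```python
import sympy as sp

t, x = sp.symbols('t x', real=True)
p = sp.symbols('p', positive=True)
lam = sp.symbols('lambda', positive=True)

# Discrete case: M_X(t) = sum_x e^{tx} P(X = x), here for X ~ Bernoulli(p)
mgf_bernoulli = (1 - p) * sp.exp(t * 0) + p * sp.exp(t * 1)
print(sp.simplify(mgf_bernoulli))        # 1 - p + p*exp(t)

# Continuous case: M_X(t) = integral of e^{tx} f_X(x) dx, here for X ~ Exponential(lambda);
# conds='none' drops the convergence condition t < lambda from the returned result.
f = lam * sp.exp(-lam * x)
mgf_exponential = sp.integrate(sp.exp(t * x) * f, (x, 0, sp.oo), conds='none')
print(sp.simplify(mgf_exponential))      # lambda/(lambda - t)
```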
Conditions for Existence
The moment-generating function (MGF) of a random variable X, denoted M_X(t) = \mathbb{E}[e^{tX}], is said to exist for a value t if the expectation is finite, i.e., \mathbb{E}[|e^{tX}|] < \infty.[9] More precisely, the MGF exists on a domain consisting of an open interval containing 0 if \mathbb{E}[e^{tX}] < \infty for all t in that interval.[10] This finiteness condition ensures the MGF is well-defined and differentiable in the interior of its domain.[11]

For non-negative random variables X \geq 0, the MGF always exists for t \leq 0 because e^{tX} \leq 1 in this case, making the expectation bounded.[12] Existence for t > 0 requires the right tail of X to decay sufficiently fast, specifically that the exponential moment \mathbb{E}[e^{tX}] remains finite, which imposes an exponential decay condition on the tail probability \mathbb{P}(X > x).[10] For general real-valued X, symmetric conditions apply to both tails: finiteness for t > 0 controls the positive (right) tail, while finiteness for t < 0 controls the negative (left) tail, ensuring neither tail is too heavy.[12]

If the MGF exists in an open interval containing 0, then all moments \mathbb{E}[|X|^k] are finite for every integer k \geq 1, and the raw moments \mathbb{E}[X^k] can be recovered via derivatives of the MGF at 0.[11] However, the converse does not hold; there exist distributions with all moments finite but no MGF in any neighborhood of 0, such as the lognormal distribution, whose right tail is too heavy for \mathbb{E}[e^{tX}] < \infty when t > 0.[10] Existence in such a neighborhood around 0 is sufficient to guarantee that the MGF is analytic there, enabling powerful properties like uniqueness of the distribution.[10]
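The lognormal example can be made concrete numerically. The sketch below, assuming NumPy and SciPy, checks that low-order moments of the standard lognormal are finite while the integrand e^{tx} f_X(x) grows without bound for t > 0, so the defining integral cannot converge; the value t = 0.1 and the evaluation points are arbitrary illustrative choices.

```python
import numpy as np
from scipy import integrate

def lognormal_pdf(x):
    # standard lognormal density (mu = 0, sigma = 1)
    return np.exp(-np.log(x) ** 2 / 2) / (x * np.sqrt(2 * np.pi))

# Low-order moments E[X^k] = exp(k^2/2) are finite.
for k in (1, 2):
    val, _ = integrate.quad(lambda x: x ** k * lognormal_pdf(x), 0, np.inf)
    print(f"E[X^{k}] ~ {val:.4f} (exact {np.exp(k ** 2 / 2):.4f})")

# For t = 0.1, the log of the integrand e^{tx} f(x) keeps increasing with x,
# so the integral defining E[e^{tX}] diverges.
t = 0.1
for x in (1e2, 1e3, 1e4):
    log_integrand = t * x - np.log(x) ** 2 / 2 - np.log(x) - 0.5 * np.log(2 * np.pi)
    print(f"x = {x:.0e}: log integrand = {log_integrand:.1f}")
```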
Moments and Derivatives
Extracting Moments
The moment-generating function (MGF) of a random variable X, denoted M_X(t) = \mathbb{E}[e^{tX}], provides a systematic way to extract the raw moments of X through differentiation. Specifically, the nth raw moment \mu_n' = \mathbb{E}[X^n] is given by the nth derivative of the MGF evaluated at t = 0: \mathbb{E}[X^n] = M_X^{(n)}(0), where M_X^{(n)}(t) denotes the nth derivative of M_X(t).[13][14]

This relationship arises from the Taylor series expansion of the MGF around t = 0. Assuming the MGF exists in a neighborhood of 0, the expansion is

M_X(t) = \sum_{n=0}^{\infty} \frac{M_X^{(n)}(0)}{n!} t^n = \sum_{n=0}^{\infty} \frac{\mathbb{E}[X^n]}{n!} t^n,

where the coefficients directly correspond to the raw moments scaled by factorials.[13][15]

For the first raw moment, differentiation yields \mathbb{E}[X] = M_X'(0). For the second raw moment, \mathbb{E}[X^2] = M_X''(0), which enables computation of the variance as \mathrm{Var}(X) = M_X''(0) - [M_X'(0)]^2.[14][10]

In practice, these moments are computed by differentiating under the expectation sign: M_X^{(n)}(t) = \mathbb{E}[X^n e^{tX}], evaluated at t = 0. This interchange of differentiation and expectation is justified by the dominated convergence theorem when the MGF exists in an open interval containing 0, ensuring the necessary integrability conditions hold.[16][10]
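A minimal symbolic sketch of this differentiation procedure, assuming SymPy and using the exponential-distribution MGF \lambda/(\lambda - t) as an illustrative input:

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)

M = lam / (lam - t)                        # MGF of Exponential(lambda), defined for t < lambda
m1 = sp.diff(M, t, 1).subs(t, 0)           # E[X]   = 1/lambda
m2 = sp.diff(M, t, 2).subs(t, 0)           # E[X^2] = 2/lambda^2
variance = sp.simplify(m2 - m1 ** 2)       # Var(X) = 1/lambda^2

print(m1, m2, variance)
```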
Higher-Order Moments and Factorial Moments
The nth raw moment of a random variable X, denoted \mathbb{E}[X^n], can be extracted from the moment-generating function M(t) as the nth derivative evaluated at t = 0:

\mathbb{E}[X^n] = \left. \frac{d^n}{dt^n} M(t) \right|_{t=0},

provided M(t) exists and is differentiable n times in a neighborhood of 0.[17] This general formula extends the extraction of lower-order moments, such as the mean and variance, to arbitrary orders, enabling the computation of higher moments like the third moment \mathbb{E}[X^3] for assessing asymmetry or the fourth moment \mathbb{E}[X^4] for tail behavior.[18]

Central moments, which measure deviations from the mean \mu = \mathbb{E}[X], are derived from the raw moments via the binomial theorem:

\mu_n = \mathbb{E}[(X - \mu)^n] = \sum_{k=0}^n \binom{n}{k} \mathbb{E}[X^k] (-\mu)^{n-k}.

This relation allows higher-order central moments \mu_n to be expressed in terms of the raw moments obtained from M(t).[14] Standardized versions of these provide measures of shape, such as skewness \gamma_1 = \mu_3 / \sigma^3 (where \sigma^2 = \mu_2) and kurtosis \beta_2 = \mu_4 / \sigma^4, both computable using derivatives of M(t).[3] Positive skewness indicates a right-tailed distribution, while kurtosis above 3 (positive excess kurtosis) signals heavier tails than the normal distribution.[14]

For discrete random variables, factorial moments \mathbb{E}[X(X-1)\cdots(X-k+1)] are particularly useful in combinatorial contexts and relate to the probability-generating function G(s) = \mathbb{E}[s^X] = M(\ln s). The kth factorial moment is the kth derivative of G(s) at s = 1.[19] These moments facilitate computations for distributions like the binomial or Poisson, where they simplify variance expressions, such as \mathrm{Var}(X) = \mathbb{E}[X(X-1)] + \mathbb{E}[X] - (\mathbb{E}[X])^2.[14]

Extracting an nth moment in this way requires M(t) to be n times differentiable at t = 0, and not every distribution has all moments finite; for instance, the Student's t-distribution with \nu degrees of freedom has finite moments only of order less than \nu, and its MGF does not exist in any neighborhood of 0.[20]
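The following sketch, assuming SymPy, applies these formulas to the Poisson distribution (chosen purely as an illustration): raw moments come from derivatives of its MGF, central moments from the binomial-theorem relation, and skewness and kurtosis from the standardized ratios.

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)

M = sp.exp(lam * (sp.exp(t) - 1))                        # Poisson(lambda) MGF
raw = [sp.diff(M, t, n).subs(t, 0) for n in range(5)]    # E[X^0], ..., E[X^4]

mu = raw[1]
central = [sp.simplify(sum(sp.binomial(n, k) * raw[k] * (-mu) ** (n - k)
                           for k in range(n + 1)))
           for n in range(5)]                            # central moments mu_0, ..., mu_4

sigma2 = central[2]
skewness = sp.simplify(central[3] / sigma2 ** sp.Rational(3, 2))
kurtosis = sp.simplify(central[4] / sigma2 ** 2)

print(skewness)   # 1/sqrt(lambda)
print(kurtosis)   # 3 + 1/lambda
```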
Examples
Discrete Random Variables
The moment-generating function (MGF) of a discrete random variable X with probability mass function P(X = k) is defined as M_X(t) = \mathbb{E}[e^{tX}] = \sum_{k} e^{t k} P(X = k), where the sum is over the support of X and the expression exists for t in some neighborhood of 0.[21] This summation form facilitates explicit computation for common discrete distributions, often yielding closed-form expressions that reveal structural properties.

For the Bernoulli distribution with success probability p \in (0,1), X takes value 1 with probability p and 0 with probability q = 1 - p. Substituting into the summation gives

M_X(t) = q + p e^t,

valid for all real t.[18]

The binomial distribution with parameters n \in \mathbb{N} and p \in (0,1) arises as the sum of n independent Bernoulli(p) random variables. Its MGF is therefore the product of the individual MGFs:

M_X(t) = (q + p e^t)^n,

also valid for all real t. Direct computation via the summation \sum_{k=0}^n e^{t k} \binom{n}{k} p^k q^{n-k} confirms this closed form using the binomial theorem.[22]

For the Poisson distribution with rate parameter \lambda > 0, P(X = k) = e^{-\lambda} \lambda^k / k! for k = 0, 1, 2, \dots. The MGF is

M_X(t) = \sum_{k=0}^\infty e^{t k} \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!} = \exp(\lambda (e^t - 1)),

valid for all real t, where the series is recognized as the exponential function.[23]

The geometric distribution with success probability p \in (0,1) models the number of trials until the first success, so P(X = k) = q^{k-1} p for k = 1, 2, \dots. The MGF computation yields

M_X(t) = \sum_{k=1}^\infty e^{t k} p q^{k-1} = p e^t \sum_{j=0}^\infty (q e^t)^j = \frac{p e^t}{1 - q e^t},

for t < -\ln q, as the sum is a geometric series.[24]

The negative binomial distribution with parameters r \in \mathbb{N} and p \in (0,1) counts the number of trials until the rth success and equals the sum of r independent geometric(p) random variables. Its MGF is thus

M_X(t) = \left( \frac{p e^t}{1 - q e^t} \right)^r,

valid for t < -\ln q. The summation form \sum_{k=r}^\infty e^{t k} \binom{k-1}{r-1} p^r q^{k-r} leads to the same expression via repeated differentiation or generating function properties.[25]
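Two of these closed forms can be checked by evaluating the defining series symbolically. The sketch below assumes SymPy; depending on the version, the geometric sum may be returned wrapped in a Piecewise carrying the convergence condition q e^t < 1.

```python
import sympy as sp

t = sp.symbols('t', real=True)
k = sp.symbols('k', integer=True, nonnegative=True)
lam = sp.symbols('lambda', positive=True)
p = sp.symbols('p', positive=True)
q = 1 - p

# Poisson(lambda): sum_{k>=0} e^{tk} e^{-lambda} lambda^k / k!
poisson_mgf = sp.summation(sp.exp(t * k) * sp.exp(-lam) * lam ** k / sp.factorial(k),
                           (k, 0, sp.oo))
print(sp.simplify(poisson_mgf))      # exp(lambda*(exp(t) - 1))

# Geometric(p) on {1, 2, ...}: sum_{k>=1} e^{tk} p q^{k-1}
geometric_mgf = sp.summation(sp.exp(t * k) * p * q ** (k - 1), (k, 1, sp.oo))
print(sp.simplify(geometric_mgf))    # p*exp(t)/(1 - q*exp(t)), subject to q*exp(t) < 1
```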
Continuous Random Variables
For continuous random variables, the moment-generating function (MGF) is defined as M_X(t) = \mathbb{E}[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx, where f_X(x) is the probability density function (PDF), provided the integral converges for t in some neighborhood of 0. This integral form allows direct computation of the MGF for distributions with known densities, often leveraging techniques such as completing the square, recognizing Laplace transforms, or series expansions.
Exponential Distribution
Consider an exponential random variable X with rate parameter \lambda > 0, so its PDF is f_X(x) = \lambda e^{-\lambda x} for x \geq 0 and 0 otherwise. The MGF is derived by direct integration:

M_X(t) = \int_{0}^{\infty} e^{tx} \lambda e^{-\lambda x} \, dx = \lambda \int_{0}^{\infty} e^{-(\lambda - t)x} \, dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda.

The integral converges for t < \lambda because the exponent -(\lambda - t)x ensures decay as x \to \infty. This form is obtained by recognizing the integrand as the PDF of another exponential distribution scaled by \lambda / (\lambda - t).[26]
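A simple Monte Carlo check of this formula, assuming NumPy; the rate, the value of t, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n = 2.0, 0.5, 1_000_000

# NumPy parametrizes the exponential by scale = 1/rate.
samples = rng.exponential(scale=1 / lam, size=n)
estimate = np.mean(np.exp(t * samples))
exact = lam / (lam - t)

print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")   # both close to 1.3333
```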
Gamma Distribution
For a gamma random variable X with shape \alpha > 0 and rate \beta > 0, the PDF is f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} for x > 0. The MGF derivation involves substituting into the integral definition:

M_X(t) = \int_{0}^{\infty} e^{tx} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha-1} e^{-(\beta - t)x} \, dx.

The integral is the gamma function form: \int_{0}^{\infty} x^{\alpha-1} e^{-(\beta - t)x} \, dx = \frac{\Gamma(\alpha)}{(\beta - t)^\alpha} for t < \beta, yielding

M_X(t) = \left( \frac{\beta}{\beta - t} \right)^\alpha = \left(1 - \frac{t}{\beta}\right)^{-\alpha}, \quad t < \beta.

(Note that the parametrization varies; here the rate \beta corresponds to scale 1/\beta.) This computation uses the relationship between the gamma integral and the MGF, akin to a Laplace transform evaluation. The exponential case is a special instance with \alpha = 1.[27]
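The gamma-function step in this derivation can be reproduced symbolically. The sketch below assumes SymPy and imposes t < \beta by writing \beta - t = s with s > 0.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
alpha, beta = sp.symbols('alpha beta', positive=True)
s = sp.symbols('s', positive=True)          # stands for beta - t, assumed positive (t < beta)

core = sp.integrate(x ** (alpha - 1) * sp.exp(-s * x), (x, 0, sp.oo))
print(sp.simplify(core))                    # gamma(alpha)/s**alpha

mgf = (beta ** alpha / sp.gamma(alpha)) * core.subs(s, beta - t)
print(sp.simplify(mgf))                     # beta**alpha*(beta - t)**(-alpha), i.e. (beta/(beta - t))**alpha
```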
Normal Distribution
Let X \sim \mathcal{N}(\mu, \sigma^2), with PDF f_X(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). The MGF is found by completing the square in the exponent of the integrand:

M_X(t) = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \, dx = \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty} \exp\left( -\frac{(x - (\mu + \sigma^2 t))^2}{2\sigma^2} + \frac{\sigma^2 t^2}{2} + \mu t \right) \, dx.

The integral simplifies to e^{\mu t + \sigma^2 t^2 / 2} times the integral of a normal PDF (which equals 1), so

M_X(t) = \exp\left( \mu t + \frac{\sigma^2 t^2}{2} \right),

valid for all real t. This derivation highlights the quadratic exponent's role in shifting the mean without altering the normalizing constant.[28]
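The same result can be confirmed by letting a computer algebra system evaluate the defining integral directly; the sketch below assumes SymPy, which effectively performs the completion of the square internally.

```python
import sympy as sp

x, t, mu = sp.symbols('x t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

pdf = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sp.sqrt(2 * sp.pi * sigma ** 2)
mgf = sp.integrate(sp.exp(t * x) * pdf, (x, -sp.oo, sp.oo))

print(sp.simplify(mgf))   # exp(mu*t + sigma**2*t**2/2)
```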
Uniform Distribution
For a uniform random variable X on [a, b] with a < b, the PDF is f_X(x) = \frac{1}{b - a} for a \leq x \leq b and 0 otherwise. Direct integration gives

M_X(t) = \int_{a}^{b} e^{tx} \frac{1}{b - a} \, dx = \frac{1}{b - a} \left[ \frac{e^{tx}}{t} \right]_{a}^{b} = \frac{e^{bt} - e^{at}}{t(b - a)}, \quad t \neq 0.

At t = 0, the value is M_X(0) = 1, obtained either as the limit of the expression above (by L'Hôpital's rule) or by direct evaluation of \mathbb{E}[e^{0 \cdot X}] = 1. The MGF exists for all real t, a consequence of the bounded support.[29]

These derivations typically rely on direct evaluation of the defining integral, with aids like completion of the square for normal distributions or gamma function recognition for gamma distributions; the MGF is essentially the bilateral Laplace transform of the PDF evaluated at -t. Notably, some continuous distributions, such as the lognormal, possess all finite moments yet have no MGF for t > 0, as the integral \int_{0}^{\infty} e^{tx} f_X(x) \, dx diverges for any t > 0 due to the heavy right tail.[30]
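A short symbolic sketch, assuming SymPy, of the uniform MGF and its removable singularity at t = 0; the series expansion also shows the moments appearing as Taylor coefficients.

```python
import sympy as sp

t = sp.symbols('t', real=True)
a, b = sp.symbols('a b', real=True)

M = (sp.exp(b * t) - sp.exp(a * t)) / (t * (b - a))   # valid for t != 0
print(sp.limit(M, t, 0))                              # 1, matching M_X(0) = E[e^0] = 1
print(sp.series(M, t, 0, 3))                          # 1 + (a + b)/2 * t + ..., first moment (a + b)/2 as coefficient
```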
Operations on Random Variables
Linear Transformations
Consider an affine transformation of a random variable X, defined as Y = aX + b, where a and b are constants with a \neq 0. The moment-generating function (MGF) of Y is given by

M_Y(t) = e^{bt} M_X(at),

where M_X(t) is the MGF of X.[2][31]

This relation follows directly from the definition of the MGF. Specifically,

M_Y(t) = \mathbb{E}[e^{tY}] = \mathbb{E}[e^{t(aX + b)}] = \mathbb{E}[e^{bt} e^{(at)X}] = e^{bt} \mathbb{E}[e^{(at)X}] = e^{bt} M_X(at),

assuming the expectations exist.[32][33]

The transformation affects the moments of Y through differentiation of M_Y(t). The mean shifts by b and scales by a, so \mathbb{E}[Y] = a \mathbb{E}[X] + b, while higher moments scale accordingly, with the variance transforming as \mathrm{Var}(Y) = a^2 \mathrm{Var}(X). The domain of existence for M_Y(t) is preserved but scaled: if M_X(t) exists for |t| < h, then M_Y(t) exists for |t| < h/|a|.[31][34]

A key application arises in standardizing random variables to obtain a zero-mean, unit-variance form. For instance, if Z = (X - \mu)/\sigma where \mu = \mathbb{E}[X] and \sigma^2 = \mathrm{Var}(X), then M_Z(t) = e^{-\mu t / \sigma} M_X(t / \sigma), facilitating comparisons across distributions.[34]

Since the MGF uniquely determines the distribution of a random variable when it exists in some neighborhood of zero, the affine transformation preserves this uniqueness property, as the mapping from X to Y is one-to-one.[9]
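As a concrete check, assuming SymPy, applying M_Y(t) = e^{bt} M_X(at) to the normal MGF recovers the MGF of \mathcal{N}(a\mu + b, a^2\sigma^2), as the transformation rule predicts.

```python
import sympy as sp

t, mu, a, b = sp.symbols('t mu a b', real=True)
sigma = sp.symbols('sigma', positive=True)

M_X = sp.exp(mu * t + sigma ** 2 * t ** 2 / 2)          # MGF of N(mu, sigma^2)
M_Y = sp.exp(b * t) * M_X.subs(t, a * t)                # affine transformation rule

target = sp.exp((a * mu + b) * t + (a * sigma) ** 2 * t ** 2 / 2)
print(sp.simplify(M_Y - target))                        # 0
```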
Sums of Independent Variables
One of the primary advantages of the moment-generating function (MGF) lies in its application to sums of independent random variables. If X_1, X_2, \dots, X_n are independent random variables, the MGF of their sum S = \sum_{i=1}^n X_i is given by the product of the individual MGFs:

M_S(t) = \prod_{i=1}^n M_{X_i}(t),

where the expression is defined for values of t in the common domain of the individual MGFs.[2] This property holds because independence allows the expectation to factorize under the exponential transform.[18]

The derivation follows directly from the definition of the MGF. By linearity of the exponent,

M_S(t) = \mathbb{E}\left[e^{tS}\right] = \mathbb{E}\left[e^{t \sum_{i=1}^n X_i}\right] = \mathbb{E}\left[\prod_{i=1}^n e^{t X_i}\right].

Independence implies that the expectation of the product equals the product of the expectations, yielding

\mathbb{E}\left[\prod_{i=1}^n e^{t X_i}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{t X_i}\right] = \prod_{i=1}^n M_{X_i}(t).

This factorization simplifies the analysis of sum distributions significantly.[35] The domain of M_S(t) is the intersection of the domains of the M_{X_i}(t), which is nonempty and contains an interval around 0 if each individual MGF exists in such an interval.[2]

This product rule serves as an analog to the convolution theorem for probability densities or mass functions. The distribution of the sum S involves the convolution of the individual distributions, but the MGF converts this operation into multiplication, facilitating easier computation of moments and tail probabilities for the sum.[36] For example, the binomial distribution with parameters n and p emerges as the sum of n i.i.d. Bernoulli random variables each with success probability p; its MGF is [pe^t + (1-p)]^n, obtained by raising the Bernoulli MGF to the power n.[21] Likewise, the sum of independent Poisson random variables with rate parameters \lambda_1, \dots, \lambda_n follows a Poisson distribution with rate \sum \lambda_i, as the product of their MGFs \prod \exp(\lambda_i (e^t - 1)) simplifies to \exp\left( \left(\sum \lambda_i\right) (e^t - 1) \right).[37]

In the central limit theorem, this product structure plays a key role: for large n, the MGF of the standardized sum \frac{S - n\mu}{\sqrt{n}\sigma} of i.i.d. random variables with mean \mu and variance \sigma^2 > 0 approaches e^{t^2/2}, the MGF of the standard normal distribution, justifying the approximate normality of such sums.[38]
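The Poisson example can be verified symbolically; the sketch below assumes SymPy and uses three summands purely for illustration.

```python
import sympy as sp

t = sp.symbols('t', real=True)
l1, l2, l3 = sp.symbols('lambda_1 lambda_2 lambda_3', positive=True)

product = (sp.exp(l1 * (sp.exp(t) - 1))
           * sp.exp(l2 * (sp.exp(t) - 1))
           * sp.exp(l3 * (sp.exp(t) - 1)))              # product of individual Poisson MGFs
combined = sp.exp((l1 + l2 + l3) * (sp.exp(t) - 1))     # MGF of Poisson(l1 + l2 + l3)

print(sp.simplify(product - combined))                  # 0
```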
Multivariate Extensions
Definition for Vectors
The moment-generating function (MGF) of a random vector \mathbf{X} = (X_1, \dots, X_n)^\top \in \mathbb{R}^n is defined as M_\mathbf{X}(\mathbf{t}) = \mathbb{E}[\exp(\mathbf{t}^\top \mathbf{X})], where \mathbf{t} = (t_1, \dots, t_n)^\top \in \mathbb{R}^n is the parameter vector.[39] Here, \mathbf{t}^\top \mathbf{X} denotes the inner (dot) product \sum_{i=1}^n t_i X_i.[39] This formulation extends the univariate MGF, which corresponds to the case n = 1 with scalar t.[10]

The MGF exists if \mathbb{E}[\exp(\mathbf{t}^\top \mathbf{X})] < \infty for all \mathbf{t} in some open neighborhood of the origin \mathbf{0} in \mathbb{R}^n.[40] This condition ensures the MGF is finite in a region around \mathbf{0}, analogous to the univariate requirement that the expectation is finite for |t| sufficiently small.[10]

The marginal MGF of a subvector, such as (X_1, \dots, X_k)^\top for k < n, is obtained by setting the remaining components of \mathbf{t} to zero; for example, the MGF of X_1 is M_\mathbf{X}(t_1, 0, \dots, 0).[40] Thus, the joint MGF determines all marginal MGFs.[40] For a continuous random vector \mathbf{X} with joint probability density function f_\mathbf{X}(\mathbf{x}), the MGF takes the integral form

M_\mathbf{X}(\mathbf{t}) = \int_{\mathbb{R}^n} \exp(\mathbf{t}^\top \mathbf{x}) f_\mathbf{X}(\mathbf{x}) \, d\mathbf{x}.[39]
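The marginalization rule can be illustrated with the bivariate normal, whose MGF \exp(\mathbf{t}^\top \boldsymbol{\mu} + \tfrac{1}{2}\mathbf{t}^\top \Sigma \mathbf{t}) is a standard closed form not derived in this section; the sketch assumes SymPy.

```python
import sympy as sp

t1, t2, mu1, mu2 = sp.symbols('t1 t2 mu1 mu2', real=True)
s11, s22, s12 = sp.symbols('s11 s22 s12', real=True)

tvec = sp.Matrix([t1, t2])
mu = sp.Matrix([mu1, mu2])
Sigma = sp.Matrix([[s11, s12], [s12, s22]])

# Bivariate normal MGF: exp(t'mu + t'Sigma t / 2)
M_joint = sp.exp((tvec.T * mu)[0] + (tvec.T * Sigma * tvec)[0] / 2)

# Setting t2 = 0 extracts the marginal MGF of X1 ~ N(mu1, s11).
print(sp.simplify(M_joint.subs(t2, 0)))   # exp(mu1*t1 + s11*t1**2/2)
```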
Joint Moments and Covariance
The joint moment-generating function M(\mathbf{t}) = \mathbb{E}[\exp(\mathbf{t}^\top \mathbf{X})] for a random vector \mathbf{X} = (X_1, \dots, X_n)^\top enables the computation of mixed moments via partial derivatives. Specifically, the cross-moment \mathbb{E}[X_{i_1} X_{i_2} \cdots X_{i_k}] is obtained as the kth mixed partial derivative of M(\mathbf{t}) with respect to the components t_{i_1}, \dots, t_{i_k}, evaluated at \mathbf{t} = \mathbf{0}:

\mathbb{E}[X_{i_1} X_{i_2} \cdots X_{i_k}] = \left. \frac{\partial^k M(\mathbf{t})}{\partial t_{i_1} \partial t_{i_2} \cdots \partial t_{i_k}} \right|_{\mathbf{t}=\mathbf{0}}.

This generalizes the univariate case by capturing dependencies through higher-order interactions among the components of \mathbf{X}.[41]

For second-order cross-moments, the covariance between X_i and X_j follows directly from the mixed second partial derivative. The covariance is given by

\text{Cov}(X_i, X_j) = \left. \frac{\partial^2 M(\mathbf{t})}{\partial t_i \partial t_j} \right|_{\mathbf{t}=\mathbf{0}} - \mathbb{E}[X_i] \mathbb{E}[X_j],

where \mathbb{E}[X_i] = \partial M / \partial t_i evaluated at \mathbf{0}, and similarly for \mathbb{E}[X_j]. The full covariance matrix \Sigma of \mathbf{X} has elements \Sigma_{ij} = \text{Cov}(X_i, X_j), providing a complete second-order characterization of the linear dependencies in the vector. This relation is fundamental in multivariate analysis, as it links the curvature of the MGF at the origin to the dispersion structure of \mathbf{X}.[41]

A key property arises for independence: two random vectors \mathbf{X} and \mathbf{Y} are independent if and only if their joint MGF factors as M_{\mathbf{X},\mathbf{Y}}(\mathbf{t}, \mathbf{s}) = M_{\mathbf{X}}(\mathbf{t}) M_{\mathbf{Y}}(\mathbf{s}) for all \mathbf{t}, \mathbf{s} in a neighborhood of the origin where the MGFs exist. This factorization criterion extends the univariate independence condition and implies that all cross-moments between \mathbf{X} and \mathbf{Y} are products of marginal moments.[41]

For sums of independent random vectors, the MGF of the sum \mathbf{Z} = \mathbf{X} + \mathbf{Y} is the product M_{\mathbf{Z}}(\mathbf{t}) = M_{\mathbf{X}}(\mathbf{t}) M_{\mathbf{Y}}(\mathbf{t}), mirroring the univariate convolution property. This multiplicative structure simplifies the derivation of moments for \mathbf{Z}, as the mixed partial derivatives of the product yield the sums of cross-moments from \mathbf{X} and \mathbf{Y} separately.[41]

Higher-order joint moments generalize to moment tensors, where the (p_1, \dots, p_n)th mixed moment \mathbb{E}[X_1^{p_1} \cdots X_n^{p_n}] is the corresponding multi-index partial derivative of M(\mathbf{t}) at \mathbf{0}. These tensors fully describe the higher-order dependencies in \mathbf{X} and are essential for applications like cumulant analysis in multivariate statistics.[41]
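Continuing with the bivariate normal MGF used above (assuming SymPy), the mixed second partial derivative at the origin minus the product of the first derivatives recovers the covariance entry, as the formula states.

```python
import sympy as sp

t1, t2, mu1, mu2 = sp.symbols('t1 t2 mu1 mu2', real=True)
s11, s22, s12 = sp.symbols('s11 s22 s12', real=True)

M = sp.exp(mu1 * t1 + mu2 * t2 + (s11 * t1 ** 2 + 2 * s12 * t1 * t2 + s22 * t2 ** 2) / 2)

at0 = {t1: 0, t2: 0}
EX1 = sp.diff(M, t1).subs(at0)            # E[X1] = mu1
EX2 = sp.diff(M, t2).subs(at0)            # E[X2] = mu2
EX1X2 = sp.diff(M, t1, t2).subs(at0)      # E[X1 X2]

print(sp.simplify(EX1X2 - EX1 * EX2))     # s12, the covariance entry
```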
Analytic Properties
Analyticity and Uniqueness
The moment-generating function M_X(t) = \mathbb{E}[e^{tX}], when defined for all t in an open interval containing the origin, is analytic throughout that interval. This follows from its representation as an expectation of the exponential function, which allows for a Taylor series expansion around t = 0 with radius of convergence at least as large as the interval of existence; the coefficients of this series are the moments of X, and the series converges to M_X(t) within the interval. Furthermore, since M_X(t) is infinitely differentiable in this neighborhood, with the nth derivative at 0 yielding the nth moment \mu_n = M_X^{(n)}(0), it admits analytic continuation to the interior of its maximal domain of definition on the real line.[10]

A key consequence of this analyticity is the uniqueness theorem for moment-generating functions: if two random variables X and Y have moment-generating functions M_X(t) and M_Y(t) that agree on some open interval (-\delta, \delta) with \delta > 0, then X and Y have identical distributions.[42] The proof proceeds by noting that equality of the MGFs implies equality of all moments \mu_n for n \geq 0, and the existence of the MGF in a neighborhood of 0 guarantees that these moments grow sufficiently slowly to uniquely determine the distribution via the moment problem.[42] Unlike the characteristic function, which exists for all distributions but requires pointwise convergence and continuity at 0 for distributional limits, the moment-generating function provides direct uniqueness whenever it exists in such an interval, without additional continuity assumptions.[10]

This uniqueness is tied to Cramér's condition, which ensures that moments alone determine the distribution. Cramér's condition holds if and only if there exist constants C > 0 and r > 0 such that |\mu_n| \leq C \, n! \, r^n for all n \geq 0; this is equivalent to the existence of the moment-generating function in a neighborhood of 0.[43] Under this condition, the power series \sum_{n=0}^\infty \mu_n t^n / n! converges to M_X(t) and uniquely identifies the probability measure.[43]

However, the analyticity and uniqueness properties are conditional on the existence of the moment-generating function, which fails for certain heavy-tailed distributions. For instance, the standard Cauchy distribution has no moment-generating function, as \mathbb{E}[e^{tX}] = \int_{-\infty}^\infty e^{tx} \frac{1}{\pi(1 + x^2)} \, dx = \infty for all t \neq 0.[10] Thus, while the moment-generating function offers powerful identification when available, its absence limits its applicability, necessitating alternatives like the characteristic function for broader classes of distributions.[42]
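The power-series statement can be illustrated with the exponential distribution, whose raw moments are n!/\lambda^n; the sketch below assumes SymPy, which may return the sum wrapped in a Piecewise carrying the convergence condition |t| < \lambda.

```python
import sympy as sp

t = sp.symbols('t', real=True)
n = sp.symbols('n', integer=True, nonnegative=True)
lam = sp.symbols('lambda', positive=True)

moment = sp.factorial(n) / lam ** n                              # mu_n for Exponential(lambda)
series_sum = sp.summation(moment * t ** n / sp.factorial(n), (n, 0, sp.oo))

print(sp.simplify(series_sum))   # lambda/(lambda - t) when |t| < lambda
```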
Inversion Formulas
Inversion formulas allow recovery of the probability distribution or density function from the moment-generating function (MGF) M(t), leveraging its connection to the bilateral Laplace transform. For a continuous random variable X with probability density function f(x), the MGF is M(t) = \int_{-\infty}^{\infty} e^{t x} f(x) \, dx, provided the integral converges for t in some interval containing 0. The inversion formula, known as the Bromwich integral, recovers the density as

f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{-t x} M(t) \, dt,

where c is chosen so that the vertical contour lies within the strip of analyticity of M(t) in the complex plane, avoiding its singularities.[44] This formula arises from the inverse bilateral Laplace transform, as the MGF corresponds to the Laplace transform of f(x) evaluated at -t.

The cumulative distribution function (CDF) F(x) = P(X \leq x) can then be obtained by integrating the inverted density:

F(x) = \int_{-\infty}^{x} f(u) \, du.

This MGF-specific approach relies on first inverting to the density before integration, contrasting with direct transform-based expressions for other generating functions.[44]

For a discrete random variable X taking integer values with probability mass function p_k = P(X = k), the MGF is M(t) = \sum_{k} p_k e^{t k}, and the substitution z = e^t gives the probability-generating function G(z) = \mathbb{E}[z^X] = M(\ln z) = \sum_k p_k z^k. The probabilities are recovered via the contour integral

p_k = \frac{1}{2\pi i} \oint \frac{G(z)}{z^{k+1}} \, dz,

where the contour is a closed path encircling the origin counterclockwise within a region where G(z) is analytic.[45] This expression extracts p_k as the coefficient of z^k in the series expansion of G(z), equivalently as the residue of G(z)/z^{k+1} at z = 0.

In practice, direct computation of these integrals is rare due to their complexity; instead, inversion often proceeds via recognition of known MGF forms (e.g., matching to standard distributions) or numerical algorithms adapted from Laplace transform methods, such as the Talbot algorithm for contour deformation or the Post–Widder formula for real-axis approximations.[46] These techniques are particularly useful when the MGF is available in closed form but the distribution is not.

A key limitation arises from the analytic domain of M(t), typically a strip around the imaginary axis; singularities outside this domain, such as branch points or poles, restrict contour choices and can lead to numerical instability or divergence in approximations.
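A small numerical sketch of the discrete inversion, assuming NumPy: for a Poisson variable with an arbitrarily chosen rate, the contour integral over the unit circle z = e^{i\theta} reduces to an average of G(z) z^{-k} over \theta and recovers the probabilities.

```python
import numpy as np
from math import factorial

lam = 2.0                                          # illustrative Poisson rate
theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
z = np.exp(1j * theta)                             # unit-circle contour
G = np.exp(lam * (z - 1))                          # PGF of Poisson(lam), G(z) = M(log z)

for k in range(4):
    # (1/(2*pi*i)) * contour integral of G(z)/z^{k+1} dz reduces to the mean of G(z) z^{-k} over theta
    p_k = np.real(np.mean(G * z ** (-k)))
    exact = np.exp(-lam) * lam ** k / factorial(k)
    print(f"p_{k}: recovered {p_k:.6f}, exact {exact:.6f}")
```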
Relations to Other Transforms
Characteristic Function
The characteristic function of a random variable X, denoted \phi_X(t), is defined as \phi_X(t) = \mathbb{E}[e^{i t X}], where i is the imaginary unit.[47] This function serves as the Fourier transform of the probability distribution of X and is directly related to the moment-generating function (MGF) M_X(t) = \mathbb{E}[e^{t X}] through the relation \phi_X(t) = M_X(i t), representing an analytic continuation of the MGF to the imaginary axis.[48] For distributions where the MGF exists in a neighborhood of zero on the real line, this substitution allows the characteristic function to extend the MGF's utility to the complex plane.

A key advantage of the characteristic function over the MGF is its universal existence for any random variable, as |e^{i t X}| = 1 ensures that |\phi_X(t)| \leq 1 for all real t, bounding the expectation and preventing the divergence issues that can plague the MGF for heavy-tailed distributions.[47] Additionally, inversion formulas based on the Fourier transform enable recovery of the distribution from the characteristic function for all probability measures, whereas MGF inversion is limited to cases where the MGF is defined and analytic.[49] In practice, when the MGF fails to exist on the real line, the characteristic function provides a reliable alternative, since it amounts to evaluating the exponential expectation at purely imaginary arguments.

Regarding uniqueness, the characteristic function uniquely determines the distribution of X, as distinct distributions yield distinct characteristic functions, a property established via the inversion theorem.[49] This mirrors the MGF's uniqueness where it exists but extends to all distributions, making the characteristic function more versatile for identification purposes. Moments can be extracted from the characteristic function similarly to the MGF: the nth derivative at zero satisfies \phi_X^{(n)}(0) = i^n \mathbb{E}[X^n], provided the moment exists, though the complex values require adjustment by powers of i compared to the real derivatives of the MGF.[50]
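For instance (a minimal sketch assuming SymPy), substituting an imaginary argument into the normal MGF reproduces the well-known normal characteristic function:

```python
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

M = sp.exp(mu * t + sigma ** 2 * t ** 2 / 2)   # normal MGF
phi = M.subs(t, sp.I * t)                      # characteristic function phi(t) = M(i t)

print(sp.simplify(phi))                        # exp(I*mu*t - sigma**2*t**2/2)
```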
Cumulant-Generating Function
The cumulant-generating function of a random variable X, denoted K_X(t), is defined as the natural logarithm of its moment-generating function:

K_X(t) = \log M_X(t),

defined for values of t at which M_X(t) is finite.[20] This function provides a generating mechanism for the cumulants, which are coefficients in its Taylor series expansion around t = 0:

K_X(t) = \sum_{n=1}^\infty \kappa_n \frac{t^n}{n!},

where the cumulants \kappa_n are given by the nth derivative evaluated at zero, \kappa_n = K_X^{(n)}(0).[20] The first cumulant \kappa_1 equals the mean \mathbb{E}[X], the second \kappa_2 equals the variance \mathrm{Var}(X), the third \kappa_3 measures asymmetry (related to skewness), and higher-order cumulants capture further aspects of the distribution's shape, such as kurtosis for \kappa_4.[20]

A primary advantage of the cumulant-generating function lies in its additivity under convolution: if X and Y are independent random variables, then K_{X+Y}(t) = K_X(t) + K_Y(t).[51] This property simplifies the analysis of sums compared to the multiplicative form M_{X+Y}(t) = M_X(t) M_Y(t) of the moment-generating function, making cumulants particularly useful for studying limits of sums, such as in large deviation theory or central limit theorem refinements.[52]

The cumulants relate to the raw moments \mu_n = \mathbb{E}[X^n] through explicit combinatorial expressions involving Bell polynomials. Specifically, the nth raw moment is \mu_n = B_n(\kappa_1, \kappa_2, \dots, \kappa_n), where B_n denotes the nth complete Bell polynomial, and conversely the cumulants can be expressed as polynomials in the moments.[53] This bidirectional relation facilitates conversions between moment-based and cumulant-based descriptions of distributions.[53]

Cumulants play a key role in asymptotic approximations, notably in the Edgeworth expansion, which refines the central limit theorem by incorporating higher-order cumulants to describe deviations from normality in the distribution of standardized sums.[52]

The domain of K_X(t) coincides with that of M_X(t), typically an interval containing zero, and since M_X(t) > 0 for real t in that domain, the logarithm is well-defined there; for complex arguments, however, the logarithm can introduce branch points where M_X(t) vanishes, requiring careful selection of the principal branch for analytic continuation.[20]
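A brief symbolic sketch, assuming SymPy, of cumulant extraction from K_X(t) = \log M_X(t): for a Poisson variable every cumulant equals \lambda, while for the normal all cumulants beyond the second vanish.

```python
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
lam, sigma = sp.symbols('lambda sigma', positive=True)

K_poisson = sp.log(sp.exp(lam * (sp.exp(t) - 1)))            # K(t) = log M(t) for Poisson(lambda)
K_normal = sp.log(sp.exp(mu * t + sigma ** 2 * t ** 2 / 2))  # K(t) = log M(t) for N(mu, sigma^2)

print([sp.simplify(sp.diff(K_poisson, t, n).subs(t, 0)) for n in range(1, 5)])
# [lambda, lambda, lambda, lambda]
print([sp.simplify(sp.diff(K_normal, t, n).subs(t, 0)) for n in range(1, 5)])
# [mu, sigma**2, 0, 0]
```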