
Characteristic function

In probability theory, the characteristic function of a random variable X is defined as the complex-valued function \phi_X(t) = \mathbb{E}[e^{itX}] for t \in \mathbb{R}, where i is the imaginary unit; it is the expected value of the complex exponential e^{itX} and coincides with the Fourier transform of the distribution of X. This function always exists for any random variable, even those without finite moments, and uniquely determines the distribution of X. Unlike moment-generating functions, which may not exist for heavy-tailed distributions, characteristic functions are bounded by 1 in modulus and uniformly continuous, providing a robust tool for analyzing convergence and limits of distributions. Characteristic functions play a central role in probabilistic limit theorems, such as the Lévy continuity theorem, which states that a sequence of random variables converges in distribution if and only if their characteristic functions converge pointwise to a function that is continuous at zero. They facilitate the derivation of moments via derivatives at t=0, where the n-th moment is related to \phi_X^{(n)}(0), and enable inversion formulas to recover the cumulative distribution function or probability density from \phi_X. Applications extend to the central limit theorem, where the characteristic function of standardized sums approaches e^{-t^2/2}, and to studying sums of independent random variables, since the characteristic function of such a sum is the product of the individual characteristic functions. Characteristic functions remain a cornerstone of modern probability for handling complex distributions and proving limit theorems.

Definition and Formulation

Formal Definition

The characteristic function of a real-valued random variable X is defined as \phi_X(t) = \mathbb{E}[e^{itX}] for t \in \mathbb{R}, where i = \sqrt{-1} is the imaginary unit and \mathbb{E} denotes the expectation operator. This definition, introduced by Paul Lévy, provides a Fourier-analytic representation of the distribution of X. Equivalently, in terms of the cumulative distribution function F_X of X, it is given by \phi_X(t) = \int_{-\infty}^{\infty} e^{itx} \, dF_X(x). For a continuous random variable with probability density function f_X(x), the characteristic function takes the form \phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx. For a discrete random variable with probability mass function p_X(x), it is the discrete sum \phi_X(t) = \sum_x e^{itx} p_X(x). The complex exponential in the definition expands as e^{itx} = \cos(tx) + i \sin(tx), employing the imaginary unit i to encode both the amplitude and phase information of the distribution, which facilitates convergence and uniqueness properties absent in real-exponential transforms such as the moment-generating function. This function fully determines the law of X, as distinct distributions yield distinct characteristic functions.
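To make the definition concrete, the following sketch (a non-authoritative illustration assuming NumPy and SciPy; the helper names cf_from_density and cf_from_pmf are ad hoc) evaluates the defining integral for a continuous density and the defining sum for a discrete mass function, checking the continuous case against the known closed form e^{-t^2/2} of the standard normal.

```python
import numpy as np
from scipy import integrate

# Continuous case: integrate e^{itx} f(x) dx numerically (real and imaginary parts).
def cf_from_density(density, t, lo=-40.0, hi=40.0):
    real, _ = integrate.quad(lambda x: np.cos(t * x) * density(x), lo, hi)
    imag, _ = integrate.quad(lambda x: np.sin(t * x) * density(x), lo, hi)
    return real + 1j * imag

std_normal_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Discrete case: sum e^{itx} p(x) over (a truncation of) the support.
def cf_from_pmf(pmf_pairs, t):
    return sum(p * np.exp(1j * t * x) for x, p in pmf_pairs)

t = 1.3
print(cf_from_density(std_normal_pdf, t))   # ~ exp(-t^2/2) + 0j
print(np.exp(-t**2 / 2))                    # closed form for comparison
```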

Relation to Fourier Transform

The characteristic function \phi_X(t) of a random variable X is the Fourier–Stieltjes transform of its cumulative distribution function F_X(x), expressed as \phi_X(t) = \int_{-\infty}^{\infty} e^{i t x} \, dF_X(x). This formulation identifies the characteristic function with the Fourier transform of the underlying probability measure, enabling the application of Fourier-analytic techniques to probability distributions. Equivalently, \phi_X(t) = \mathbb{E}[e^{i t X}], linking the transform to probabilistic expectations. In probability theory, the argument t is real-valued and the expression is unnormalized, differing from conventions in engineering and signal processing, where the Fourier transform typically employs a frequency variable \omega = 2\pi f in the exponent e^{-j \omega t} and includes normalization factors such as 1/(2\pi) or 1/\sqrt{2\pi} in the forward or inverse directions; the inverse recovery of the probability density from the characteristic function incorporates a 1/(2\pi) factor. A key advantage of the characteristic function over the moment-generating function or the Laplace transform of a density is its universal existence for any probability distribution, as |e^{i t x}| = 1 ensures the integral converges without requiring additional integrability conditions on the distribution. The term "characteristic function" and its systematic use in probability were introduced by Paul Lévy in his 1925 monograph Calcul des probabilités, where he applied such transforms to analyze random variables and their distributions, building on earlier ideas going back to Laplace.

Fundamental Properties

Continuity and Boundedness

The characteristic function \phi_X(t) = \mathbb{E}[e^{itX}] of any random variable X exists and is finite for every t, since |e^{itX}| = 1 almost surely, ensuring the expectation is well-defined via the bounded convergence theorem. This property holds universally for all probability distributions, whether continuous, discrete, or mixed, without requiring additional moment conditions. A key analytic feature is the boundedness of the characteristic function: |\phi_X(t)| \leq 1 for all real t, with equality at t=0, where \phi_X(0) = 1 = \mathbb{P}(X \in \mathbb{R}). This bound arises directly from the triangle inequality for integrals, as |\mathbb{E}[e^{itX}]| \leq \mathbb{E}[|e^{itX}|] = 1, reflecting the unit modulus of the complex exponential. The characteristic function is uniformly continuous on \mathbb{R}, meaning that for every \epsilon > 0 there exists \delta > 0 such that |t - s| < \delta implies |\phi_X(t) - \phi_X(s)| < \epsilon for all real s, t. This uniform continuity is established using the dominated convergence theorem: the difference satisfies |\phi_X(t) - \phi_X(s)| = |\mathbb{E}[e^{itX} - e^{isX}]| \leq \mathbb{E}[|e^{itX} - e^{isX}|], and |e^{itX} - e^{isX}| is dominated by the constant (hence integrable) function 2 while converging to 0 as |t - s| \to 0. Furthermore, the characteristic function satisfies the symmetry relation \phi_X(-t) = \overline{\phi_X(t)}, where \overline{\cdot} denotes the complex conjugate. Consequently, \phi_X(t) is real-valued for all t if and only if the distribution of X is symmetric about zero.
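A quick Monte Carlo check of these properties (a sketch assuming NumPy; the helper ecf and the exponential example are illustrative choices): the sample-based characteristic function stays within the unit disk and exhibits the conjugate symmetry \phi_X(-t) = \overline{\phi_X(t)}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample-average estimate of the characteristic function: (1/n) sum_j exp(i t X_j).
def ecf(samples, t):
    return np.mean(np.exp(1j * np.asarray(t)[..., None] * samples), axis=-1)

x = rng.exponential(scale=1.0, size=100_000)   # a non-symmetric distribution
ts = np.linspace(-10, 10, 201)
phi = ecf(x, ts)

print(np.max(np.abs(phi)) <= 1 + 1e-12)        # boundedness |phi(t)| <= 1
print(np.allclose(ecf(x, -ts), np.conj(phi)))  # symmetry phi(-t) = conj(phi(t))
```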

Moments from Derivatives

The derivatives of the characteristic function \phi_X(t) = \mathbb{E}[e^{itX}] evaluated at t = 0 yield the raw moments of the random variable X. Specifically, if \mathbb{E}[|X|^n] < \infty, then the characteristic function is n times differentiable at t = 0, and the nth derivative satisfies \phi_X^{(n)}(0) = i^n \mathbb{E}[X^n], where i is the imaginary unit. This relation holds more generally provided the absolute moments \mathbb{E}[|X|^k] < \infty for all k = 1, \dots, n, ensuring the interchange of differentiation and expectation via dominated convergence. For the first few moments, the conditions simplify accordingly: the first derivative exists if \mathbb{E}[|X|] < \infty, yielding \phi_X'(0) = i \mathbb{E}[X]; the second derivative exists if \mathbb{E}[X^2] < \infty, yielding \phi_X''(0) = i^2 \mathbb{E}[X^2] = -\mathbb{E}[X^2]. Higher-order differentiability follows inductively under the corresponding moment conditions, with the Taylor expansion of \phi_X(t) around t = 0 incorporating these terms up to order n. Central moments, which measure dispersion relative to the mean, can be obtained by adjusting for the location parameter. Let \mu = \mathbb{E}[X], assuming it exists; the characteristic function of the centered variable Y = X - \mu is \phi_Y(t) = e^{-it\mu} \phi_X(t). The derivatives of \phi_Y(t) at t = 0 then give the central moments directly: \phi_Y^{(n)}(0) = i^n \mathbb{E}[(X - \mu)^n]. For instance, the second central moment (variance) requires the second moment to exist and is derived from the second derivative after centering. As an example, suppose the first and second moments exist. Then \mu = \phi_X'(0) / i and \mathbb{E}[X^2] = \phi_X''(0) / i^2 = -\phi_X''(0), so the variance is \operatorname{Var}(X) = \mathbb{E}[X^2] - \mu^2 = -\phi_X''(0) - \left( \frac{\phi_X'(0)}{i} \right)^2. This formula provides a direct computational link between the curvature of the characteristic function at the origin and the spread of the distribution.
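As a worked illustration of the derivative formulas (a sketch assuming SymPy; the exponential rate \lambda = 3 is an arbitrary choice), the following symbolic computation recovers the mean, second moment, and variance of an exponential distribution from derivatives of its characteristic function at t = 0.

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.Integer(3)   # rate of an Exp(lambda) distribution, chosen for illustration

# Characteristic function of Exp(lambda): phi(t) = lambda / (lambda - i t).
phi = lam / (lam - sp.I * t)

# n-th raw moment via E[X^n] = phi^{(n)}(0) / i^n.
def moment(n):
    return sp.simplify(sp.diff(phi, t, n).subs(t, 0) / sp.I**n)

mean = moment(1)                            # 1/lambda = 1/3
second = moment(2)                          # 2/lambda^2 = 2/9
variance = sp.simplify(second - mean**2)    # 1/lambda^2 = 1/9
print(mean, second, variance)
```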

Examples Across Distributions

Continuous Distributions

The characteristic function of a continuous random variable X with probability density function f(x) is given by the integral \phi(t) = \int_{-\infty}^{\infty} e^{i t x} f(x) \, dx, which is the Fourier transform of the density and exists for all real t since |e^{i t x}| = 1. For the normal distribution with mean \mu and variance \sigma^2 > 0, the density is f(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). Substituting into the integral yields \phi(t) = e^{i t \mu - \frac{1}{2} \sigma^2 t^2}. This result follows from completing the square in the exponent of the Gaussian integral: the integrand becomes \exp\left( i t x - \frac{(x - \mu)^2}{2\sigma^2} \right) = \exp\left( -\frac{(x - \mu - i t \sigma^2)^2}{2\sigma^2} - \frac{1}{2} \sigma^2 t^2 + i t \mu \right), and the shifted Gaussian integral evaluates to \sqrt{2\pi} \sigma, which cancels against the density's normalizing constant, leaving e^{i t \mu - \frac{1}{2} \sigma^2 t^2}. The uniform distribution on the interval [a, b] with a < b has density f(x) = \frac{1}{b - a} for x \in [a, b] and zero elsewhere. Its characteristic function is \phi(t) = \frac{e^{i t b} - e^{i t a}}{i t (b - a)} for t \neq 0, and \phi(0) = 1. This expression can be rewritten in sinc form as \phi(t) = e^{i t (a + b)/2} \cdot \operatorname{sinc}\left( \frac{t (b - a)}{2} \right), where \operatorname{sinc}(u) = \sin(u)/u, highlighting the oscillatory decay typical of compactly supported distributions. For the exponential distribution with rate parameter \lambda > 0, the density is f(x) = \lambda e^{-\lambda x} for x \geq 0 and zero otherwise. The characteristic function is \phi(t) = \frac{\lambda}{\lambda - i t} for all real t, obtained by direct integration: \int_0^{\infty} e^{i t x} \lambda e^{-\lambda x} \, dx = \lambda \int_0^{\infty} e^{-(\lambda - i t) x} \, dx = \frac{\lambda}{\lambda - i t}. The Cauchy distribution with location \mu and scale \gamma > 0 has density f(x) = \frac{1}{\pi \gamma} \frac{\gamma^2}{(x - \mu)^2 + \gamma^2}. Its characteristic function is \phi(t) = e^{i t \mu - \gamma |t|}, derived via contour integration or by recognizing the density as a Lorentzian profile whose Fourier transform is a two-sided exponential. This form reflects the distribution's heavy tails, as the non-differentiability at t = 0 implies the absence of finite moments (e.g., no mean or variance).
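The closed forms above can be sanity-checked against sample-based estimates. The sketch below (assumes NumPy; the sample size and grid of t values are arbitrary) compares the quoted characteristic functions of the standard normal, standard Cauchy, and unit-rate exponential with empirical averages of e^{itX}.

```python
import numpy as np

rng = np.random.default_rng(1)
ts = np.linspace(-5, 5, 101)

def ecf(samples, ts):
    return np.array([np.mean(np.exp(1j * t * samples)) for t in ts])

# Closed forms quoted above, with standard parameters chosen for illustration.
phi_normal = lambda t: np.exp(-0.5 * t**2)          # N(0, 1)
phi_cauchy = lambda t: np.exp(-np.abs(t))           # Cauchy(0, 1)
phi_expon  = lambda t: 1.0 / (1.0 - 1j * t)         # Exp(rate = 1)

for name, sampler, phi in [
    ("normal", lambda n: rng.standard_normal(n), phi_normal),
    ("cauchy", lambda n: rng.standard_cauchy(n), phi_cauchy),
    ("expon",  lambda n: rng.exponential(1.0, n), phi_expon),
]:
    err = np.max(np.abs(ecf(sampler(200_000), ts) - phi(ts)))
    print(name, round(err, 3))   # small Monte Carlo error in each case
```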

Discrete Distributions

For discrete random variables, the characteristic function is computed as the expected value \phi(t) = \mathbb{E}[e^{itX}] = \sum_{x} e^{itx} P(X = x), where the sum is over all possible values x in the support of X. This summation form uses the probability mass function directly and always exists, inheriting the boundedness property |\phi(t)| \leq 1 for all real t. The Bernoulli distribution, modeling a single trial with success probability p (and failure probability q = 1 - p), has characteristic function \phi(t) = q + p e^{it}. This follows directly from the definition, as \phi(t) = q \cdot e^{it \cdot 0} + p \cdot e^{it \cdot 1}. The Poisson distribution with rate parameter \lambda > 0 has characteristic function \phi(t) = e^{\lambda (e^{it} - 1)}. This expression is derived from the probability generating function G(s) = e^{\lambda (s - 1)} by substituting s = e^{it}. The binomial distribution, as the sum of n independent Bernoulli trials each with success probability p, has characteristic function \phi(t) = (q + p e^{it})^n. This product form arises because the characteristic function of a sum of independent random variables is the product of their individual characteristic functions. The geometric distribution, counting the number of failures before the first success in independent Bernoulli trials with success probability p (and q = 1 - p), has characteristic function \phi(t) = \frac{p}{1 - q e^{it}}. This closed form is obtained by summing the geometric series \sum_{k=0}^{\infty} e^{itk} q^k p, which converges because |q e^{it}| = q < 1.
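The same definitions can be verified numerically by summing over the support, as in the sketch below (assumes NumPy and SciPy; the truncation point for the Poisson support and the parameter values are arbitrary choices).

```python
import numpy as np
from scipy import stats

ts = np.linspace(-3, 3, 61)

# Direct summation of the defining series over a (truncated) support.
def cf_discrete(support, pmf, t):
    return np.sum(pmf * np.exp(1j * t * support))

# Poisson(lambda = 2.5), truncated far into the tail for numerical purposes.
lam = 2.5
k = np.arange(0, 60)
pk = stats.poisson.pmf(k, lam)
err_poisson = max(
    abs(cf_discrete(k, pk, t) - np.exp(lam * (np.exp(1j * t) - 1))) for t in ts
)
print(err_poisson)   # essentially zero (truncation error only)

# Binomial(n = 10, p = 0.3) against (q + p e^{it})^n.
n, p = 10, 0.3
j = np.arange(0, n + 1)
pj = stats.binom.pmf(j, n, p)
err_binom = max(
    abs(cf_discrete(j, pj, t) - (1 - p + p * np.exp(1j * t))**n) for t in ts
)
print(err_binom)
```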

Uniqueness and Inversion

Uniqueness Theorem

The characteristic function of a probability distribution on the real line uniquely determines the distribution. That is, if two probability measures \mu and \nu on \mathbb{R} have the same characteristic function \phi(t) = \int_{\mathbb{R}} e^{itx} \, d\mu(x) = \int_{\mathbb{R}} e^{itx} \, d\nu(x) for all t \in \mathbb{R}, then \mu = \nu. This result, often referred to as the uniqueness theorem for characteristic functions, holds for all Borel probability measures on \mathbb{R}. A key generalization is Lévy's continuity theorem, which provides conditions under which pointwise convergence of characteristic functions implies weak convergence of the corresponding distributions. Specifically, let \{\phi_n(t)\}_{n=1}^\infty be a sequence of characteristic functions of probability measures \{\mu_n\}_{n=1}^\infty on \mathbb{R}. If \phi_n(t) \to \phi(t) pointwise for all t \in \mathbb{R}, and \phi is continuous at t=0, then \phi is the characteristic function of some probability measure \mu on \mathbb{R}, and \mu_n \to \mu weakly as n \to \infty. This theorem establishes that continuity at the origin is both necessary and sufficient for the limit function to qualify as a characteristic function, ensuring the uniqueness of the limiting distribution in the sense of weak convergence. The proof of Lévy's continuity theorem proceeds by leveraging inversion techniques to verify weak convergence. Under the given conditions, the continuity of \phi at 0 implies the tightness of the family \{\mu_n\}, allowing the use of inversion to show that \int f(x) \, d\mu_n(x) \to \int f(x) \, d\mu(x) for every bounded continuous function f: \mathbb{R} \to \mathbb{R}. This convergence of integrals directly establishes weak convergence of the measures, thereby confirming that the limiting characteristic function uniquely identifies the limiting distribution. Uniqueness may fail in broader contexts beyond probability measures, such as for non-\sigma-additive set functions or measures concentrated on non-measurable sets, where distinct measures can share the same formal Fourier transform. However, these pathological cases do not arise for probability measures, which are always \sigma-additive, tight, and defined on the Borel \sigma-algebra of \mathbb{R}, rendering such counterexamples irrelevant to probabilistic applications. As an extension, when the moments of the distribution exist (i.e., \mathbb{E}[|X|^k] < \infty for k = 1, 2, \dots), the characteristic function uniquely determines these moments via the relation \mathbb{E}[X^k] = i^{-k} \phi^{(k)}(0), where \phi^{(k)} denotes the k-th derivative. This follows directly from the differentiability of \phi under the moment condition and underscores the characteristic function's role in fully characterizing the distribution's algebraic properties when applicable.
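Lévy's continuity theorem can be illustrated with the classical law of rare events: the characteristic functions of Binomial(n, \lambda/n) converge pointwise to that of Poisson(\lambda), so the distributions converge as well. The sketch below (assumes NumPy; \lambda = 2 and the grid of t values are arbitrary) tracks the maximum pointwise gap as n grows.

```python
import numpy as np

# Pointwise convergence of characteristic functions:
# Binomial(n, lam/n) -> Poisson(lam) as n -> infinity.
lam = 2.0
ts = np.linspace(-4, 4, 81)

phi_poisson = np.exp(lam * (np.exp(1j * ts) - 1))

for n in (10, 100, 1000, 10000):
    p = lam / n
    phi_binom = (1 - p + p * np.exp(1j * ts))**n
    print(n, np.max(np.abs(phi_binom - phi_poisson)))   # shrinks toward 0
```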

Inversion Formulae

Inversion formulae enable the recovery of the cumulative distribution function (CDF) or probability density function (PDF) of a random variable from its characteristic function \phi(t), leveraging the established uniqueness of the characteristic function in determining the distribution. The Lévy inversion formula provides a means to compute differences in the CDF at continuity points a < b: F(b) - F(a) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-i t a} - e^{-i t b}}{i t} \phi(t) \, dt. This formula applies to any probability distribution on \mathbb{R}, requiring no assumptions beyond the continuity of F at a and b. If the distribution possesses a continuous PDF f and \phi is integrable over \mathbb{R} (i.e., \int_{-\infty}^{\infty} |\phi(t)| \, dt < \infty), the PDF can be retrieved through the Fourier inversion integral: f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i t x} \phi(t) \, dt. The integrability of \phi guarantees both the existence of a bounded continuous PDF and the convergence of the integral to f(x). For direct evaluation of the CDF F(x) at any real x, the Gil-Pelaez inversion formula offers a practical expression involving only the positive frequency domain: F(x) = \frac{1}{2} - \frac{1}{\pi} \int_{0}^{\infty} \frac{\operatorname{Im} \left( e^{-i t x} \phi(t) \right)}{t} \, dt, where \operatorname{Im}(\cdot) denotes the imaginary part. The integral is understood as an improper integral and converges at every continuity point of F, which covers the distributions typically encountered in applications.
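Both inversion routes can be carried out by numerical quadrature. The sketch below (assumes SciPy; the truncation limit T and the tiny lower bound that sidesteps t = 0 are pragmatic choices, not part of the formulas) recovers the standard normal density at 0 and its CDF at 1 from \phi(t) = e^{-t^2/2}.

```python
import numpy as np
from scipy import integrate

# Characteristic function of the standard normal, used as a known test case.
phi = lambda t: np.exp(-0.5 * t**2)

def pdf_from_cf(phi, x, T=40.0):
    # f(x) = (1 / 2pi) * integral of e^{-itx} phi(t) dt, truncated to [-T, T].
    val, _ = integrate.quad(lambda t: np.real(np.exp(-1j * t * x) * phi(t)), -T, T)
    return val / (2 * np.pi)

def cdf_from_cf(phi, x, T=40.0):
    # Gil-Pelaez: F(x) = 1/2 - (1/pi) * integral_0^inf Im(e^{-itx} phi(t)) / t dt.
    val, _ = integrate.quad(
        lambda t: np.imag(np.exp(-1j * t * x) * phi(t)) / t, 1e-10, T
    )
    return 0.5 - val / np.pi

print(pdf_from_cf(phi, 0.0))   # ~ 0.3989 = 1 / sqrt(2 pi)
print(cdf_from_cf(phi, 1.0))   # ~ 0.8413 = standard normal CDF at 1
```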

Moments, Cumulants, and Generating Functions

Connection to Moment-Generating Function

The characteristic function of a random variable X, denoted \phi_X(t) = \mathbb{E}[e^{itX}] for real t, is closely related to the moment-generating function M_X(t) = \mathbb{E}[e^{tX}]. When the moment-generating function exists in some neighborhood of the origin, the characteristic function is obtained by substituting it for the argument t, yielding \phi_X(t) = M_X(it). This relation arises from the formal substitution of the imaginary argument into the exponent, and it is justified by analytic continuation in the complex plane, where the moment-generating function, if analytic, extends to the imaginary axis. Unlike the moment-generating function, which may fail to exist for certain distributions, the characteristic function is always defined and finite for all real t, as the expectation involves the bounded quantity |e^{itX}| = 1. For instance, heavy-tailed distributions such as the Cauchy distribution do not possess a moment-generating function because the integral \int_{-\infty}^{\infty} e^{tx} f(x) \, dx diverges for any t \neq 0, where f(x) is the probability density. In contrast, the characteristic function of the standard Cauchy distribution is \phi_X(t) = e^{-|t|}, which exists everywhere. This universal existence stems from the characteristic function being the Fourier transform of the probability distribution, ensuring convergence without requiring moment conditions. The primary advantage of the characteristic function over the moment-generating function lies in its applicability to all probability distributions, particularly those lacking finite moments of all orders, such as those with infinite variance. While the moment-generating function facilitates the extraction of moments via derivatives at zero, M_X^{(n)}(0) = \mathbb{E}[X^n], the characteristic function achieves the same through \mathbb{E}[X^n] = i^{-n} \phi_X^{(n)}(0), provided the derivatives exist. This shared property for moment recovery underscores their complementary roles, with the characteristic function offering broader utility in proofs of limit theorems and convergence results where moment conditions are absent.
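A small symbolic check of the substitution \phi_X(t) = M_X(it) for the exponential distribution (a sketch assuming SymPy; the symbol names are arbitrary):

```python
import sympy as sp

t, s = sp.symbols('t s', real=True)
lam = sp.symbols('lambda', positive=True)

# MGF of Exp(lambda), valid for s < lambda, and the corresponding CF.
M = lam / (lam - s)
phi = lam / (lam - sp.I * t)

# Formal substitution s -> i t in the MGF reproduces the characteristic function.
print(sp.simplify(M.subs(s, sp.I * t) - phi))   # prints 0
```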

Cumulants via Log-Characteristic Function

The cumulant-generating function of a random variable X is defined as the natural logarithm of its characteristic function: K(t) = \log \phi_X(t), where \phi_X(t) = \mathbb{E}[e^{itX}]. This function admits a Taylor series expansion around t = 0: K(t) = \sum_{n=1}^\infty \kappa_n \frac{(it)^n}{n!}, where the coefficients \kappa_n are the cumulants of X. The nth cumulant is obtained from the nth derivative of K(t) evaluated at zero: \kappa_n = (-i)^n K^{(n)}(0). The first few cumulants correspond to familiar measures of the distribution: the first cumulant \kappa_1 is the mean \mathbb{E}[X], the second \kappa_2 is the variance \mathrm{Var}(X), and the third \kappa_3 is the third central moment \mathbb{E}[(X - \mathbb{E}[X])^3], which serves as a measure of skewness. Higher-order cumulants, such as \kappa_4, relate to kurtosis and further deviations from normality. Unlike raw moments, cumulants exhibit additivity under convolution: for independent random variables X and Y, the cumulant-generating function of their sum satisfies K_{X+Y}(t) = K_X(t) + K_Y(t), implying that each \kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y). This property simplifies the analysis of sums of independent variables, as cumulants do not involve cross terms that complicate moment addition. Cumulants derived from the log-characteristic function play a central role in the Edgeworth expansion, which refines the central limit theorem by incorporating higher-order corrections for finite-sample approximations of standardized sums. The expansion expresses the characteristic function of the sum as a perturbation of the normal characteristic function using polynomials in the cumulants of orders three and higher, yielding asymptotic series for densities or distribution functions that capture skewness, kurtosis, and other non-normal features beyond the leading Gaussian term. This setup leverages the additive structure of cumulants to systematically improve approximations for distributions of sample means or other aggregates.
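The cumulant formula is easy to carry out symbolically. The sketch below (assumes SymPy) differentiates the log-characteristic function of a Poisson(\lambda) variable, for which every cumulant equals \lambda, matching \kappa_n = (-i)^n K^{(n)}(0).

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)

# Log-characteristic function of Poisson(lambda): K(t) = lambda * (e^{it} - 1).
K = lam * (sp.exp(sp.I * t) - 1)

# kappa_n = (-i)^n * K^{(n)}(0); for the Poisson every cumulant equals lambda.
def cumulant(n):
    return sp.simplify((-sp.I)**n * sp.diff(K, t, n).subs(t, 0))

print([cumulant(n) for n in (1, 2, 3, 4)])   # [lambda, lambda, lambda, lambda]
```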

Multivariate Extensions

Definition for Vector-Valued Random Variables

The characteristic function extends naturally to vector-valued random variables, generalizing the univariate case where d=1. For a random vector X = (X_1, \dots, X_d)^T in \mathbb{R}^d with distribution function F_X, the characteristic function \phi_X: \mathbb{R}^d \to \mathbb{C} is defined as \phi_X(t) = \mathbb{E}\left[\exp(i t^T X)\right], where t = (t_1, \dots, t_d)^T \in \mathbb{R}^d and t^T X = \sum_{j=1}^d t_j X_j denotes the dot product. This expectation can be expressed in integral form as \phi_X(t) = \int_{\mathbb{R}^d} \exp(i t^T x) \, dF_X(x), where the integral is taken with respect to the probability measure induced by F_X. The resulting \phi_X(t) is a complex-valued function, continuous in t, with \phi_X(0) = 1 and satisfying the bound |\phi_X(t)| \leq 1 for all t \in \mathbb{R}^d. This bound follows from the fact that \left|\exp(i t^T X)\right| = 1 almost surely, so the modulus of the expectation is at most the expectation of the modulus by the triangle inequality for integrals. A key example arises for the multivariate normal distribution X \sim \mathcal{N}_d(\mu, \Sigma), where \mu \in \mathbb{R}^d is the mean vector and \Sigma is the d \times d positive semidefinite covariance matrix. In this case, \phi_X(t) = \exp\left(i \mu^T t - \frac{1}{2} t^T \Sigma t \right). This form is derived by completing the square in the exponent after expressing the expectation via the density function.
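A Monte Carlo cross-check of the multivariate normal formula (a sketch assuming NumPy; the particular mean vector, covariance matrix, evaluation point t, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Closed-form CF of N_d(mu, Sigma) evaluated at a vector t.
def cf_mvn(t):
    return np.exp(1j * mu @ t - 0.5 * t @ Sigma @ t)

# Monte Carlo estimate: (1/n) * sum_j exp(i t^T X_j).
X = rng.multivariate_normal(mu, Sigma, size=200_000)
def ecf(t):
    return np.mean(np.exp(1j * (X @ t)))

t = np.array([0.7, -1.2])
print(cf_mvn(t), ecf(t))   # agree up to Monte Carlo error
```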

Properties in Higher Dimensions

In the multivariate setting, the characteristic function \phi(\mathbf{t}) = \mathbb{E}[\exp(i \mathbf{t}^\top \mathbf{X})], where \mathbf{t} \in \mathbb{R}^d and \mathbf{X} is a d-dimensional random vector, serves as the Fourier transform of the probability measure induced by \mathbf{X}. Since this measure is a positive measure, Bochner's theorem implies that \phi(\mathbf{t}) is a positive definite function on \mathbb{R}^d, meaning that for any finite set of points \mathbf{t}_1, \dots, \mathbf{t}_n \in \mathbb{R}^d and complex coefficients c_1, \dots, c_n \in \mathbb{C}, the quadratic form satisfies \sum_{j=1}^n \sum_{k=1}^n c_j \overline{c_k} \phi(\mathbf{t}_j - \mathbf{t}_k) \geq 0. This property extends the univariate case and underpins the characterization of valid characteristic functions in higher dimensions. The marginal characteristic function of a subvector \mathbf{X}_J, corresponding to a subset J \subseteq \{1, \dots, d\}, is obtained by setting the components of \mathbf{t} outside J to zero in the joint characteristic function, i.e., \phi_{\mathbf{X}_J}(\mathbf{t}_J) = \phi(\mathbf{t}) with t_j = 0 for j \notin J. These relations facilitate the derivation of marginal properties without direct integration over the density. Independence of subvectors \mathbf{X} and \mathbf{Y} holds if and only if the joint characteristic function factors as \phi_{\mathbf{X}, \mathbf{Y}}(\mathbf{t}, \mathbf{s}) = \phi_{\mathbf{X}}(\mathbf{t}) \phi_{\mathbf{Y}}(\mathbf{s}) for all \mathbf{t}, \mathbf{s}. This criterion provides a direct test for independence in the multivariate framework, generalizing the univariate product property. Regarding uniqueness, the Cramér–Wold device establishes that the joint distribution of \mathbf{X} is uniquely determined by the one-dimensional characteristic functions of all linear projections \mathbf{t}^\top \mathbf{X} for \mathbf{t} \in \mathbb{R}^d; that is, two random vectors have the same distribution if and only if \phi_{\mathbf{t}^\top \mathbf{X}_1}(u) = \phi_{\mathbf{t}^\top \mathbf{X}_2}(u) for all \mathbf{t} \in \mathbb{R}^d and u \in \mathbb{R}. Boundedness by 1 and continuity at the origin extend directly from the univariate properties to the multivariate case.
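The factorization criterion for independence and the zeroing-out rule for marginals can both be checked empirically, as in the sketch below (assumes NumPy; the chosen component distributions, evaluation points, and sample size are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Independent components: the joint CF should factor into the marginals.
x = rng.standard_normal(n)
y = rng.exponential(1.0, n)

def ecf_joint(t, s):
    return np.mean(np.exp(1j * (t * x + s * y)))

def ecf_x(t):
    return np.mean(np.exp(1j * t * x))

def ecf_y(s):
    return np.mean(np.exp(1j * s * y))

t, s = 0.8, -1.1
print(abs(ecf_joint(t, s) - ecf_x(t) * ecf_y(s)))   # ~ 0 up to Monte Carlo error

# Marginal CF obtained from the joint CF by zeroing the other argument.
print(abs(ecf_joint(t, 0.0) - ecf_x(t)))            # 0 up to rounding
```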

Applications in Probability and Statistics

Proofs of Limit Theorems

Characteristic functions play a central role in establishing limit theorems for sums of random variables, as their algebraic properties and continuity theorem facilitate convergence arguments without direct manipulation of densities or distributions. One of the most prominent applications is the proof of the central limit theorem (CLT) for independent and identically distributed (i.i.d.) random variables. Consider i.i.d. random variables X_1, X_2, \dots with mean \mu = 0 and finite positive variance \sigma^2. Let S_n = X_1 + \dots + X_n be the partial sum and define the normalized sum Z_n = (S_n - n\mu)/\sqrt{n \sigma^2}. The characteristic function of Z_n is \phi_{Z_n}(t) = [\phi(t / (\sigma\sqrt{n}))]^n, where \phi(t) = \mathbb{E}[e^{it X_1}] is the characteristic function of each X_i. To prove convergence in distribution to the standard normal N(0,1), expand the logarithm of the characteristic function near zero: since \mathbb{E}[X_1] = 0 and \mathrm{Var}(X_1) = \sigma^2 < \infty, the Taylor expansion yields \log \phi(u) = -\frac{\sigma^2 u^2}{2} + o(u^2) as u \to 0. Substituting u = t / (\sigma\sqrt{n}) gives n \log \phi\left( \frac{t}{\sigma\sqrt{n}} \right) = n \left( -\frac{t^2}{2n} + o\left( \frac{1}{n} \right) \right) = -\frac{t^2}{2} + o(1), so \log \phi_{Z_n}(t) \to -\frac{t^2}{2} and thus \phi_{Z_n}(t) \to e^{-t^2/2}, the characteristic function of N(0,1). By Lévy's continuity theorem, Z_n converges in distribution to N(0,1). The CLT extends to non-identical distributions via the Lindeberg-Feller theorem, which applies to triangular arrays of independent random variables X_{n1}, \dots, X_{nn} with zero means, finite variances, and total variance s_n^2 = \sum_{k=1}^n \mathbb{E}[X_{nk}^2] > 0. The key Lindeberg condition requires that for every \epsilon > 0, \frac{1}{s_n^2} \sum_{k=1}^n \mathbb{E}\left[ X_{nk}^2 \mathbf{1}_{|X_{nk}| > \epsilon s_n} \right] \to 0 as n \to \infty, ensuring no single term dominates the sum (uniform asymptotic negligibility). The proof proceeds similarly using characteristic functions: the log-characteristic function of the normalized sum \sum_{k=1}^n X_{nk}/s_n is \sum_{k=1}^n \log \phi_{X_{nk}}(t / s_n). Under the Lindeberg condition, each \log \phi_{X_{nk}}(u) = -\frac{\mathbb{E}[X_{nk}^2] u^2}{2} + o(u^2), with the remainders uniformly controlled by uniform integrability of X_{nk}^2 / s_n^2 on compact sets, leading to convergence of the sum to -\frac{t^2}{2}. Thus, the characteristic function converges to e^{-t^2/2}, implying the CLT by continuity. Local limit theorems provide finer approximations, establishing pointwise convergence of probabilities or densities for lattice or non-lattice distributions. For i.i.d. random variables satisfying the CLT conditions, the inversion formula for characteristic functions enables such results: the probability that the standardized sum falls in a small interval [x, x + h] is asymptotically h \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, adjusted for the lattice span in the lattice case. Specifically, for lattice distributions on the integers with span 1, the local limit theorem states that \sup_x \left| \sqrt{n}\, P(S_n = x) - \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - n\mu)^2}{2 n \sigma^2} \right) \right| \to 0 as n \to \infty, with the supremum taken over lattice points x. The proof uses the inversion formula for the probability mass function, P(S_n = x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-i t x} [\phi(t)]^n \, dt in the lattice case, and analyzes the characteristic function's behavior, using the CLT convergence to approximate the integral pointwise while controlling the tails.
This yields uniform or pointwise density convergence under moment conditions. Characteristic functions also characterize stable distributions, which arise as limits of normalized sums under weaker moment conditions (e.g., infinite variance). A distribution is \alpha-stable for 0 < \alpha \leq 2 if its characteristic function has the form \phi(t) = \exp\left( i \mu t - c |t|^\alpha \left(1 - i \beta\, \operatorname{sign}(t)\, \Phi \right) \right), where \mu \in \mathbb{R} is the location parameter, c > 0 is the scale, \beta \in [-1,1] is the skewness parameter, \operatorname{sign}(t) is the sign function, and \Phi = \tan(\pi \alpha / 2) for \alpha \neq 1 (with a logarithmic adjustment for \alpha = 1). For \alpha = 2, this recovers the normal distribution (where \Phi = 0, so \beta has no effect); for \alpha = 1, \beta = 0, the Cauchy distribution; and for \alpha = 1/2, \beta = 1, the Lévy distribution. These forms ensure stability under convolution: sums of i.i.d. stable variables, suitably normalized, retain the same stable distribution. Proofs of limit theorems to stable laws use similar log-expansions of characteristic functions, replacing the quadratic term with |t|^\alpha under tail conditions such as regular variation of the distribution.
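The characteristic-function view of the CLT is easy to visualize numerically: for centered Exp(1) summands (which have variance 1), [\phi_{X-1}(t/\sqrt{n})]^n should approach e^{-t^2/2} on compact sets. A minimal sketch assuming NumPy:

```python
import numpy as np

# CLT seen through characteristic functions: for centered Exp(1) summands,
# phi_{Z_n}(t) = [phi(t / sqrt(n))]^n should approach exp(-t^2 / 2).
def phi_centered_exp(t):
    # CF of X - 1 where X ~ Exp(1): e^{-it} / (1 - it)
    return np.exp(-1j * t) / (1 - 1j * t)

ts = np.linspace(-3, 3, 61)
for n in (5, 50, 500, 5000):
    phi_zn = phi_centered_exp(ts / np.sqrt(n))**n
    print(n, np.max(np.abs(phi_zn - np.exp(-ts**2 / 2))))   # decreases with n
```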

Empirical Estimation and Goodness-of-Fit Tests

The empirical characteristic function (ECF) serves as a nonparametric estimator of the true characteristic function from a sample of independent and identically distributed random variables X_1, \dots, X_n. It is defined as \hat{\phi}_n(t) = \frac{1}{n} \sum_{j=1}^n e^{i t X_j}, for t \in \mathbb{R}, where i = \sqrt{-1}. This estimator is unbiased for the characteristic function \phi(t) = \mathbb{E}[e^{i t X}] and retains all the information in the empirical distribution, with which it is in one-to-one correspondence. Under mild regularity conditions, such as finite second moments or weak dependence in stationary processes, the ECF converges uniformly and strongly to the true characteristic function on compact intervals. Specifically, \sup_{|t| \leq T} |\hat{\phi}_n(t) - \phi(t)| \to 0 almost surely for any fixed T > 0, with the pointwise estimation error asymptotically normal at the \sqrt{n} rate. These properties hold for both i.i.d. samples and weakly dependent data, enabling reliable inference even when higher moments do not exist. ECF-based goodness-of-fit tests assess whether sample data conform to a hypothesized distribution with known or estimated characteristic function \phi_0(t). A common approach is a Cramér-von Mises-type statistic, which measures discrepancy via the integrated squared difference \int_{-\infty}^{\infty} |\hat{\phi}_n(t) - \phi_0(t)|^2 w(t) \, dt, where w(t) is a positive weight function ensuring integrability, or via the supremum \sup_t |\hat{\phi}_n(t) - \phi_0(t)|. For testing normality, the Epps-Pulley statistic integrates the squared difference against a weight derived from the Gaussian characteristic function, yielding a tractable asymptotic null distribution. Extensions to composite hypotheses involve estimating \phi_0(t) via minimum distance methods, maintaining consistency and controlled size in large samples. These tests prove particularly valuable for fitting heavy-tailed models, such as stable distributions, where traditional moment-based methods fail due to non-existent variance. The closed-form characteristic function of stable laws allows direct minimization of the weighted integrated squared error between \hat{\phi}_n(t) and the parametric form, providing efficient parameter estimates for financial returns exhibiting skewness and kurtosis. In such applications, the ECF avoids simulation biases and handles infinite variance effectively, outperforming quantile-based alternatives in tail regions. Recent computational advances leverage fast Fourier transform (FFT) algorithms to accelerate ECF evaluation and inversion, enhancing scalability for large datasets in heavy-tailed modeling. For instance, FFT-based procedures use the ECF of residuals in stochastic frontier models to estimate inefficiency distributions, achieving rapid density recovery without closed-form assumptions. Related methods, applied to tempered stable variants, support parameter fitting for high-frequency financial data, with improved computational efficiency compared to direct summation in simulations.
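As an illustration of an ECF-based goodness-of-fit statistic (a sketch assuming NumPy and SciPy, not a specific published test; the Gaussian weight, the scaling by n, and the sample sizes are illustrative choices in the spirit of the weighted-integral statistics described above):

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(4)

# Empirical characteristic function at a single point t.
def ecf(samples, t):
    return np.mean(np.exp(1j * t * samples))

# Weighted integrated squared distance between the ECF and a hypothesized CF.
def cf_distance(samples, phi0, weight_scale=1.0):
    w = lambda t: np.exp(-(t / weight_scale)**2)
    integrand = lambda t: np.abs(ecf(samples, t) - phi0(t))**2 * w(t)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return len(samples) * val   # scale by n so the statistic grows under misfit

phi_std_normal = lambda t: np.exp(-0.5 * t**2)

x_null = rng.standard_normal(500)   # data drawn from the hypothesized model
x_alt = rng.standard_cauchy(500)    # heavy-tailed alternative

print(cf_distance(x_null, phi_std_normal))   # small
print(cf_distance(x_alt, phi_std_normal))    # noticeably larger
```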

Historical Context and Recent Developments

Origins and Key Contributors

The concept of the characteristic function in probability theory traces its early roots to the late eighteenth and early nineteenth centuries through Pierre-Simon Laplace's development of generating functions, culminating in his seminal work Théorie analytique des probabilités (1812), where he introduced exponential generating functions as tools for analyzing probability distributions and their moments. These functions provided a foundational framework for encoding distributional properties, laying groundwork for later transforms in probability. Complementing this, Joseph Fourier's 1822 treatise Théorie analytique de la chaleur established the Fourier transform through his analysis of the heat equation, which mathematically underpins the characteristic function as the Fourier transform of a probability distribution. A pivotal advancement occurred in 1925 when Paul Lévy formally introduced the characteristic function into probability theory in his book Calcul des probabilités, defining it as the Fourier transform of the distribution and demonstrating its role in uniquely determining the distribution via what is now known as Lévy's uniqueness theorem. This contribution shifted focus from moment-based methods to transform techniques, enabling proofs of distributional uniqueness without relying on infinite sequences of moments. Building on this, Salomon Bochner's 1932 work Vorlesungen über Fourierintegrale characterized continuous positive definite functions as Fourier transforms of finite positive measures, providing a criterion essential for verifying when a function serves as the characteristic function of a valid distribution. Harald Cramér further systematized the theory in his 1937 monograph Random Variables and Probability Distributions, offering the first comprehensive exposition of characteristic functions for univariate random variables, including their properties under convolution and applications to limit theorems. William Feller's influential 1950 textbook An Introduction to Probability Theory and Its Applications (Volume I) popularized these ideas among a broader audience, integrating characteristic functions into pedagogical treatments of probability and stochastic processes. Boris Gnedenko and Andrey Kolmogorov provided foundational work on limit distributions in their 1954 book Limit Distributions for Sums of Independent Random Variables. Extensions to multivariate cases were developed in subsequent works during the 1960s.

Modern Advances Post-2020

Recent advancements in computational methods for characteristic functions have focused on efficient inversion techniques to handle high-dimensional data. A generalized approach to distribution reconstruction via inversion of the empirical characteristic function has been proposed, enabling robust estimation even with complex empirical data by leveraging smoothing techniques on the complex-valued empirical estimates. This method addresses challenges in high-dimensional settings by providing a flexible framework for recovering probability densities from characteristic functions without assuming specific parametric forms. In machine learning, characteristic functions have been integrated into generative adversarial networks (GANs) to improve distribution matching, particularly for conditional generation and sequential data. For instance, the conditional characteristic function GAN (CCF-GAN) employs neural networks to approximate conditional characteristic functions, reducing discrepancies in generated distributions by optimizing distances in the characteristic function domain, which enhances stability and fidelity in image synthesis tasks. Similarly, the path characteristic function GAN (PCF-GAN) extends this to sequential data by defining a path characteristic function on the space of trajectories, allowing GANs to capture temporal dependencies and generate realistic time series with improved sample quality over traditional methods. New theoretical developments post-2020 include extensions of characteristic function methods to non-stationary processes and multivariate settings. The PCF-GAN framework introduces characteristic functions of measures on path space for sequential, possibly non-stationary data, providing a principled way to represent and match distributions on path spaces that traditional stationarity assumptions cannot handle effectively. In higher dimensions, explicit derivations of the characteristic function of the multivariate folded normal distribution have resolved longstanding computational challenges, offering closed-form expressions that facilitate analysis of folded data in statistics and physics applications. Updated empirical estimation methods have filled gaps in applying characteristic functions to time series, particularly through indirect inference techniques that minimize discrepancies between empirical and simulated characteristic functions for stationary time series models. These approaches scale to large datasets by integrating weighted mean squared errors over the characteristic function domain, resolving prior uncertainties in parameter estimation for complex models via simulation-based validation. Such methods have been demonstrated to outperform direct likelihood-based estimators in high-volume scenarios, providing reliable goodness-of-fit assessments without exhaustive likelihood evaluation.
