Isserlis's theorem

Isserlis's theorem is a fundamental result in probability theory that provides an explicit formula for the expected value of the product of components from a zero-mean multivariate Gaussian random vector, expressing it as a sum over all possible perfect matchings (pairings) of the indices, where each pairing contributes the product of the corresponding covariances between the paired variables. For an odd number of variables, no such pairings exist, so the expectation is zero. Also known as Wick's probability theorem, it reduces the computation of higher-order moments to second-order covariances, making it a cornerstone for analyzing Gaussian processes. The theorem was first derived by the British statistician Leon Isserlis and published in 1918, building on his earlier work for four variables in 1916; the general case was established via an inductive proof in the 1918 paper. Isserlis's original formulation addressed the product-moment coefficients of normal frequency distributions in multiple variables, a key concern in early 20th-century statistics. Modern proofs often leverage properties of isotropic tensors or generating functions, linking the result to broader invariance principles. Isserlis's theorem has broad applications across fields, including the computation of moments in Gaussian chaos expansions for stochastic differential equations and numerical methods for ergodic dynamics. In signal processing, extensions of the theorem facilitate derivations of bispectral densities for nonlinear systems driven by mixed-Gaussian noise. It also underpins analyses in random vibrations involving non-Gaussian mixtures. In physics, the theorem reappears as Wick's theorem, enabling contractions in perturbative quantum field theory calculations. The number of pairings for 2k variables is given by the double factorial (2k-1)!! = \frac{(2k)!}{2^k k!}, highlighting the theorem's combinatorial structure.

Introduction

Overview and Importance

Isserlis's theorem provides a formula for computing the expectation of the product of any number of jointly Gaussian random variables by expressing it as a sum over all possible pairings of the variables, with each term consisting of the product of their pairwise covariances. For a zero-mean multivariate Gaussian Y = (Y_1, \dots, Y_n)^T, the theorem states that E[Y_1 \cdots Y_n] = 0 if n is odd, while for even n = 2k, E[Y_1 \cdots Y_n] = \sum_{p \in PP(n)} \prod_{(l,r) \in p} E[Y_l Y_r], where PP(n) denotes the set of all pair partitions of \{1, \dots, n\}. This formulation assumes zero means, but extends to general cases by centering the variables, relying solely on the covariance structure. The theorem's primary importance lies in its ability to reduce the calculation of higher-order moments to combinations of second-order moments, bypassing the need for multidimensional integrals over the Gaussian density. This reduction is crucial for theoretical developments in probability, such as analyzing the dependence structure in Gaussian processes, and for computational efficiency in simulations and approximations involving multivariate normals. By leveraging only the covariance matrix, it facilitates tractable expressions for moments that would otherwise be intractable, underpinning advancements in statistical theory and stochastic modeling. In physics, Isserlis's theorem is equivalent to Wick's theorem, which applies the same pairing principle to evaluate correlation functions in Gaussian quantum field theories.
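
The pairing sum can be enumerated mechanically. Below is a minimal Python sketch (not from the original article; the function name isserlis_moment and the example covariance matrix are illustrative assumptions) that applies the zero-mean formula above by recursively pairing the first index with each remaining one.

```python
import numpy as np

def isserlis_moment(indices, cov):
    """Return E[prod_k Y_{indices[k]}] for a zero-mean Gaussian with covariance cov."""
    if len(indices) == 0:
        return 1.0
    if len(indices) % 2 == 1:
        return 0.0  # odd moments vanish
    first, rest = indices[0], indices[1:]
    total = 0.0
    # Pair the first index with each remaining index and recurse on what is left.
    for k, j in enumerate(rest):
        total += cov[first, j] * isserlis_moment(rest[:k] + rest[k + 1:], cov)
    return total

# Example: E[Y_1 Y_2 Y_3 Y_4] = S_12 S_34 + S_13 S_24 + S_14 S_23.
cov = np.array([[1.0, 0.5, 0.2, 0.1],
                [0.5, 1.0, 0.3, 0.4],
                [0.2, 0.3, 1.0, 0.6],
                [0.1, 0.4, 0.6, 1.0]])
print(isserlis_moment((0, 1, 2, 3), cov))  # 0.5*0.6 + 0.2*0.4 + 0.1*0.3 = 0.41
```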

Historical Development

Isserlis first presented the result for four variables in his 1916 paper "On Certain Probable Errors and Correlation Coefficients of Multiple Frequency Distributions with Skew Regression," published in Biometrika. The general theorem followed in 1918, in his paper "On a Formula for the Product-Moment Coefficient of Any Order of a Normal Frequency Distribution in Any Number of Variables," also published in Biometrika. In this work, Isserlis provided a recursive formula for computing higher-order central moments of multivariate normal distributions, focusing on finite-order moments without assuming zero mean for the variables. This derivation addressed the need for efficient calculation of product-moment coefficients in statistical analysis of normal distributions. The theorem is named after Isserlis in the field of statistics, reflecting its origins in probabilistic moment computations. It was independently rediscovered in 1950 by the physicist Gian-Carlo Wick, who formulated an analogous result in the context of quantum field theory, where it is known as Wick's theorem. Wick's version, detailed in his paper "The Evaluation of the Collision Matrix," enabled the simplification of time-ordered products of field operators into normal-ordered forms plus contractions, facilitating perturbative calculations in quantum electrodynamics. Following its initial publications, the theorem saw early applications in statistics and physics, particularly for evaluating correlation functions in Gaussian models and free field theories. In subsequent decades, researchers extended the result to more complex distributions, such as mixed-Gaussian random variables, with a notable generalization provided by Michalowicz et al. in 2011, which accommodates mixtures where the mean is a random variable independent of the Gaussian component. These developments broadened the theorem's utility beyond pure Gaussian cases while preserving its combinatorial structure for moment calculations.

Background Concepts

Multivariate Gaussian Distributions

A multivariate Gaussian distribution, also known as the multivariate normal distribution, describes a random vector \mathbf{X} = (X_1, \dots, X_n)^T in \mathbb{R}^n that follows \mathbf{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}), where \boldsymbol{\mu} is the mean vector and \boldsymbol{\Sigma} is the covariance matrix. This distribution is characterized by the property that every linear combination \boldsymbol{\lambda}^T \mathbf{X} for \boldsymbol{\lambda} \in \mathbb{R}^n follows a univariate normal distribution \mathcal{N}(\boldsymbol{\lambda}^T \boldsymbol{\mu}, \boldsymbol{\lambda}^T \boldsymbol{\Sigma} \boldsymbol{\lambda}). The probability density function of \mathbf{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}), assuming \boldsymbol{\Sigma} is positive definite, is given by f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-n/2} (\det \boldsymbol{\Sigma})^{-1/2} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right), for \mathbf{x} \in \mathbb{R}^n. The covariance matrix \boldsymbol{\Sigma} is symmetric and positive semidefinite, with diagonal elements representing variances and off-diagonal elements capturing covariances between components, defined as \boldsymbol{\Sigma} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]. A special case is the zero-mean or centered Gaussian distribution, where \boldsymbol{\mu} = \mathbf{0}, simplifying many expressions and often used as a reference for theoretical analysis. Key properties of the multivariate Gaussian include closure under linear transformations: if \mathbf{X} \sim \mathcal{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}), then for any matrix \mathbf{A} and vector \mathbf{b}, \mathbf{Y} = \mathbf{A} \mathbf{X} + \mathbf{b} \sim \mathcal{N}_m(\mathbf{A} \boldsymbol{\mu} + \mathbf{b}, \mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^T), where m is the dimension of \mathbf{Y}. Regarding joint moments, the first- and second-order moments are given directly by the mean vector and covariance matrix, as \mathbb{E}[\mathbf{X}] = \boldsymbol{\mu} and \mathrm{Cov}(\mathbf{X}) = \boldsymbol{\Sigma}; higher-order moments, while also derivable from \boldsymbol{\mu} and \boldsymbol{\Sigma} via the moment-generating function, introduce dependencies beyond pairwise relations. For partitioned vectors \mathbf{X} = (\mathbf{X}_1^T, \mathbf{X}_2^T)^T with corresponding blocks in \boldsymbol{\mu} and \boldsymbol{\Sigma}, the components \mathbf{X}_1 and \mathbf{X}_2 are independent if and only if the off-block submatrix is zero, i.e., \boldsymbol{\Sigma}_{12} = \mathbf{0}, making \boldsymbol{\Sigma} block-diagonal. This equivalence between uncorrelatedness and independence is a hallmark of the Gaussian family, distinguishing it from more general multivariate setups. While second-order moments suffice for many analyses, the structure of higher moments in dependent Gaussians adds layers of complexity addressed in related moment computations.
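
As an illustrative sketch (the numerical values and variable names are assumptions, not from the article), the density formula and the closure property under affine maps can be checked with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])

def mvn_density(x, mu, Sigma):
    """Evaluate the explicit multivariate normal density at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

print(mvn_density(np.array([0.5, -1.5]), mu, Sigma))

# Closure under linear maps: Y = A X + b should have covariance A Sigma A^T.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([3.0, -1.0])
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b
print(np.cov(Y, rowvar=False))   # close to the predicted covariance
print(A @ Sigma @ A.T)
```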

Moments and Their Computation

In probability theory, the moments of a random vector \mathbf{X} = (X_1, \dots, X_n)^T quantify the distribution's shape and dependence structure. The raw (or non-central) moment of order (k_1, \dots, k_n) is defined as \mathbb{E}[X_1^{k_1} \cdots X_n^{k_n}], where each k_i is a non-negative integer representing the power of the i-th component. Central moments, which measure deviations from the mean \boldsymbol{\mu} = \mathbb{E}[\mathbf{X}], are given by \mathbb{E}[(X_1 - \mu_1)^{k_1} \cdots (X_n - \mu_n)^{k_n}]. In multivariate analysis, joint moments often simplify to expectations of products, such as \mathbb{E}\left[\prod_{i=1}^m X_{j_i}\right] for m \leq n, allowing characterization of correlations across variables. Computing higher-order moments poses significant challenges, particularly for non-Gaussian distributions. In general, these moments depend on the entire joint probability density function, requiring full specification of the distribution to evaluate via integration or simulation; without such knowledge, approximations like Monte Carlo methods are necessary but can be inefficient for high dimensions or orders. For multivariate Gaussian distributions, whose density is parameterized solely by the mean \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}, higher-order moments are fully determined by these first- and second-order statistics alone—a property not shared by non-Gaussian cases. However, explicit computation still demands handling combinatorial expansions involving products of covariances, which scale factorially with the moment order and necessitate systematic enumeration of pairings or permutations. Cumulants offer an alternative characterization, derived from the logarithm of the moment-generating function \mathbb{E}[e^{\mathbf{t}^T \mathbf{X}}]; the k-th cumulant is the k-th derivative of this logarithm evaluated at \mathbf{t} = \mathbf{0}. Unlike moments, cumulants exhibit additivity under independent summation, simplifying analysis of sums of random variables. For Gaussian distributions, the first cumulant equals the mean, the second the variance-covariance, and all higher-order cumulants vanish identically, underscoring the absence of skewness, excess kurtosis, or further deviations inherent in the Gaussian form. This contrasts with non-Gaussian distributions, where non-zero higher cumulants encode tail behavior and asymmetry. Direct integration of the multivariate Gaussian density to obtain \mathbb{E}\left[\prod_{i=1}^m X_i\right] for m > 4 is particularly intractable, as it involves high-dimensional integrals over quadratics without closed-form reduction beyond low orders, often requiring numerical or recursive methods prone to accumulated error.
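
A small hedged example (assumed, not from the article): for Gaussian samples the estimated fourth cumulant (excess kurtosis) is near zero, whereas a non-Gaussian distribution such as the exponential gives a markedly non-zero value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

def excess_kurtosis(x):
    """Fourth cumulant divided by the squared variance, estimated from samples."""
    x = x - x.mean()
    m2, m4 = np.mean(x**2), np.mean(x**4)
    return m4 / m2**2 - 3.0

print(excess_kurtosis(rng.normal(size=n)))        # ~ 0 for a Gaussian
print(excess_kurtosis(rng.exponential(size=n)))   # ~ 6 for an exponential
```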

Formulation of the Theorem

Statement for Odd Moments

Isserlis's theorem addresses the computation of higher-order moments for jointly Gaussian random variables, with the odd moments case providing a particularly simple result. Specifically, consider 2m+1 jointly Gaussian random variables X_1, \dots, X_{2m+1} that are distributed according to a multivariate normal distribution with zero mean, for some nonnegative integer m. The theorem states that the expectation of the product of these variables is zero: E\left[ \prod_{i=1}^{2m+1} X_i \right] = 0. This vanishing result holds irrespective of the covariances between the variables, as long as the joint distribution remains multivariate Gaussian with zero mean. The underlying reason for this outcome lies in the symmetry properties of the multivariate Gaussian density function, which is centered at the origin when the mean is zero. The product \prod_{i=1}^{2m+1} X_i constitutes an odd function under the transformation X_i \to -X_i for all i, and integration of such an odd function over a symmetric domain around the origin yields zero. This symmetry ensures the cancellation of positive and negative contributions in the expectation integral, leading to the null result for any odd order. While the theorem is formulated for zero-mean Gaussians, computations for non-centered cases involve shifting the variables to their centered counterparts and expanding the product, where the odd-moment vanishing applies to the centered components, though the overall expectation generally does not simplify to zero.
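
A quick Monte Carlo check (with an assumed, illustrative covariance matrix) shows the vanishing of a third-order moment for a correlated zero-mean Gaussian vector:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.7, 0.3],
                  [0.7, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=2_000_000)
print(np.mean(X[:, 0] * X[:, 1] * X[:, 2]))  # fluctuates around 0
```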

Statement for Even Moments

Isserlis's theorem provides an explicit expression for the even-order moments of jointly Gaussian random variables. Specifically, for a set of 2m jointly normal random variables X_1, \dots, X_{2m} with zero means, the expectation of their product is given by the sum over all possible ways to pair the indices into m disjoint pairs, with each pair contributing the corresponding covariance. The precise formula is \mathbb{E}\left[ \prod_{i=1}^{2m} X_i \right] = \sum_{\pi} \prod_{\{j,k\} \in \pi} \mathbb{E}[X_j X_k], where the sum is over all perfect matchings \pi of the set \{1, \dots, 2m\}, and each matching \pi consists of m unordered pairs \{j,k\}. This summation involves exactly (2m-1)!! = \frac{(2m)!}{2^m m!} terms, where !! denotes the double factorial (the product of all odd positive integers up to 2m-1). Each term in the sum corresponds to one such matching, and for centered variables the second moments coincide with the covariances, \mathbb{E}[X_j X_k] = \mathrm{Cov}(X_j, X_k). For Gaussian variables with non-zero means, the pairing sum over second moments alone no longer yields the moment; instead, the expectation expands as a sum over all partitions of \{1, \dots, 2m\} into pairs and singletons, with each pair contributing a covariance \mathrm{Cov}(X_j, X_k) and each singleton contributing a mean \mathbb{E}[X_l]. This reduction highlights the theorem's reliance on pairwise dependencies for higher even moments.
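
The pairs-and-singletons expansion for non-zero means can be written as a short recursion; the sketch below (an assumed example — the helper gaussian_moment and the numerical values are illustrative) compares the exact expansion with a Monte Carlo average.

```python
import numpy as np

def gaussian_moment(indices, mu, cov):
    """E[prod_k X_{indices[k]}] for X ~ N(mu, cov), via pairs and singletons."""
    if len(indices) == 0:
        return 1.0
    first, rest = indices[0], indices[1:]
    # Either the first index stands alone (contributes its mean) ...
    total = mu[first] * gaussian_moment(rest, mu, cov)
    # ... or it is paired with a remaining index (contributes a covariance).
    for k, j in enumerate(rest):
        total += cov[first, j] * gaussian_moment(rest[:k] + rest[k + 1:], mu, cov)
    return total

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0, 2.0, 0.3])
cov = np.array([[1.0, 0.2, 0.1, 0.0],
                [0.2, 1.0, 0.3, 0.1],
                [0.1, 0.3, 1.0, 0.2],
                [0.0, 0.1, 0.2, 1.0]])
X = rng.multivariate_normal(mu, cov, size=2_000_000)
print(gaussian_moment((0, 1, 2, 3), mu, cov))          # exact expansion
print(np.mean(X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3]))  # Monte Carlo estimate
```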

Examples and Illustrations

Simple Bivariate Example

To illustrate Isserlis's theorem, consider the simple case of two jointly Gaussian random variables X and Y with zero means, unit variances, and covariance \rho, so that (X, Y)^\top \sim \mathcal{N}(\mathbf{0}, \Sigma) where \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}. The second-order moment E[XY] equals the covariance \rho, which arises from the single possible pairing of the variables under the theorem. For an odd-order moment, such as the third-order E[X^2 Y], Isserlis's theorem yields zero because the total number of factors is odd for centered Gaussians. A higher even-order example is the fourth-order moment E[X^2 Y^2], which the theorem computes by summing over all pairings: one pairing matches the two X's together and the two Y's together, giving E[X^2] E[Y^2] = 1 \cdot 1 = 1; the other two pairings each match an X with a Y (accounting for the two ways to do so), giving 2 (E[XY])^2 = 2 \rho^2. Thus, E[X^2 Y^2] = 1 + 2 \rho^2. This bivariate case exemplifies the general even-moment formula, which sums the products of covariances over all perfect matchings of the variables.
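
The bivariate identities above can be checked numerically; the following sketch (with an assumed correlation value) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.6
Sigma = np.array([[1.0, rho], [rho, 1.0]])
X, Y = rng.multivariate_normal([0.0, 0.0], Sigma, size=2_000_000).T
print(np.mean(X**2 * Y))      # ~ 0, since the total order is odd
print(np.mean(X**2 * Y**2))   # ~ 1 + 2*rho**2 = 1.72
print(1 + 2 * rho**2)
```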

Higher-Order Application

To illustrate the application of Isserlis's theorem to higher-order moments, consider four centered jointly Gaussian random variables X_1, X_2, X_3, X_4 with mean zero and covariance matrix \Sigma, where the (i,j)-th entry denotes \Sigma_{ij} = \mathbb{E}[X_i X_j]. The fourth-order moment is given by \mathbb{E}[X_1 X_2 X_3 X_4] = \mathbb{E}[X_1 X_2] \mathbb{E}[X_3 X_4] + \mathbb{E}[X_1 X_3] \mathbb{E}[X_2 X_4] + \mathbb{E}[X_1 X_4] \mathbb{E}[X_2 X_3], corresponding to the three possible pairings of the variables. For a numerical example, suppose the variables are standard normals, so \Sigma = I_4 and all off-diagonal covariances are zero; then each product term vanishes, yielding \mathbb{E}[X_1 X_2 X_3 X_4] = 0. Alternatively, if X_1 = X_2 = X_3 = X_4 = X \sim \mathcal{N}(0,1), then \mathbb{E}[X^4] = 3, which matches the theorem's expansion since each of the three pairings gives \mathbb{E}[X^2] \mathbb{E}[X^2] = 1 \cdot 1 = 1. This combinatorial structure reveals that the number of terms in the expansion for a 2k-th order moment is the double factorial (2k-1)!! = \frac{(2k)!}{2^k k!}, which grows factorially with k; thus, the theorem provides an efficient means to compute such moments via covariances rather than multidimensional integration.
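
As a small illustration of this growth (an assumed example, not from the article), the matching count (2k-1)!! can be tabulated directly:

```python
from math import factorial

def num_pairings(k):
    """Number of perfect matchings of 2k elements, i.e. (2k-1)!!."""
    return factorial(2 * k) // (2**k * factorial(k))

for k in range(1, 7):
    print(k, num_pairings(k))   # 1, 3, 15, 105, 945, 10395
```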

Proofs

Derivation Using Generating Functions

The derivation of Isserlis's theorem using generating functions proceeds via the moment-generating function (MGF) of a multivariate Gaussian random vector. For a random vector \mathbf{X} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma) in \mathbb{R}^d, the MGF is M(\mathbf{t}) = \mathbb{E}\left[ \exp(\mathbf{t}^\top \mathbf{X}) \right] = \exp\left( \boldsymbol{\mu}^\top \mathbf{t} + \frac{1}{2} \mathbf{t}^\top \Sigma \mathbf{t} \right), where \mathbf{t} \in \mathbb{R}^d. The raw moments of \mathbf{X} are obtained by evaluating mixed partial derivatives of the MGF at the origin. Specifically, for indices i_1, \dots, i_n \in \{1, \dots, d\}, \mathbb{E}[X_{i_1} \cdots X_{i_n}] = \left. \frac{\partial^n M(\mathbf{t})}{\partial t_{i_1} \cdots \partial t_{i_n}} \right|_{\mathbf{t} = \mathbf{0}}. To derive the structure of these moments, consider first the centered case \boldsymbol{\mu} = \mathbf{0}, so M(\mathbf{t}) = \exp\left( \frac{1}{2} \sum_{p,q=1}^d \Sigma_{pq} t_p t_q \right). This MGF is an even function, satisfying M(-\mathbf{t}) = M(\mathbf{t}), which implies that all odd-order partial derivatives vanish at \mathbf{t} = \mathbf{0}. Consequently, odd-order moments are zero: \mathbb{E}[X_{i_1} \cdots X_{i_{2m+1}}] = 0 for any m \geq 0. For even-order moments with n = 2m, the evaluation requires expanding the exponential in its Taylor series around \mathbf{t} = \mathbf{0}. The argument of the exponential is a quadratic form \frac{1}{2} \mathbf{t}^\top \Sigma \mathbf{t} = \frac{1}{2} \sum_{p,q} \Sigma_{pq} t_p t_q, and raising this to powers in the series \exp(u) = \sum_{k=0}^\infty \frac{u^k}{k!} (with u = \frac{1}{2} \mathbf{t}^\top \Sigma \mathbf{t}) generates multilinear terms in the t's. Applying the 2m-th partial derivative extracts only those terms in the expansion that are homogeneous of degree 2m in the t's, scaled by factorial coefficients. These contributing terms correspond precisely to complete contractions or pairings of the indices, as each differentiation "contracts" a pair of t's via the quadratic factors \Sigma_{pq} t_p t_q. The result is a sum over all perfect matchings \pi of the 2m indices, where each matching \pi = \{ (j_k, l_k) \}_{k=1}^m (with \{j_1, l_1, \dots, j_m, l_m\} = \{1, \dots, 2m\}) yields the product \prod_{k=1}^m \Sigma_{i_{j_k} i_{l_k}}. Thus, \mathbb{E}[X_{i_1} \cdots X_{i_{2m}}] = \sum_{\pi \in \Pi_{2m}} \prod_{(j,l) \in \pi} \Sigma_{i_j i_l}, where \Pi_{2m} is the set of all (2m-1)!! = \frac{(2m)!}{2^m m!} perfect matchings of 2m elements, and the pairings follow Wick's rule for Gaussian contractions. This structure emerges combinatorially from the generating nature of the exponentiated quadratic form, where unpaired indices would leave odd-degree terms that do not survive evaluation at the origin. For the non-centered case \boldsymbol{\mu} \neq \mathbf{0}, the linear term \boldsymbol{\mu}^\top \mathbf{t} in the MGF introduces additional contributions involving components of \boldsymbol{\mu}, but the core contraction mechanism over the covariance matrix remains, with odd moments generally nonzero due to the mean terms.
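
This differentiation can be carried out symbolically; the sketch below (an assumed example using SymPy, with illustrative symbol names) recovers the three-pairing expansion of the fourth mixed moment from the centered MGF.

```python
import sympy as sp

t1, t2, t3, t4 = sp.symbols('t1 t2 t3 t4')
s = sp.symbols('s11 s12 s13 s14 s22 s23 s24 s33 s34 s44')
# Symmetric covariance matrix built from the symbols above.
S = sp.Matrix([[s[0], s[1], s[2], s[3]],
               [s[1], s[4], s[5], s[6]],
               [s[2], s[5], s[7], s[8]],
               [s[3], s[6], s[8], s[9]]])
t = sp.Matrix([t1, t2, t3, t4])
M = sp.exp(sp.Rational(1, 2) * (t.T * S * t)[0, 0])   # centered Gaussian MGF

moment = sp.diff(M, t1, t2, t3, t4).subs({t1: 0, t2: 0, t3: 0, t4: 0})
pairing_sum = s[1] * s[8] + s[2] * s[6] + s[3] * s[5]  # S12*S34 + S13*S24 + S14*S23
print(sp.expand(moment - pairing_sum))  # 0
```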

Approach via Gaussian Integration by Parts

The approach via Gaussian integration by parts offers a recursive method to establish Isserlis's theorem, relying on a core identity that relates expectations involving Gaussian variables to expectations of derivatives under the Gaussian density. For a centered multivariate Gaussian random vector X \sim \mathcal{N}(0, \Sigma) in \mathbb{R}^n, the raw moment of order k is expressed as E\left[\prod_{i=1}^k X_{j_i}\right] = \int_{\mathbb{R}^n} \left( \prod_{i=1}^k x_{j_i} \right) \phi(x) \, dx, where \phi(x) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left( -\frac{1}{2} x^T \Sigma^{-1} x \right) denotes the Gaussian density. The pivotal identity, known as the multivariate Stein identity or Gaussian integration-by-parts formula, asserts that for a differentiable function f: \mathbb{R}^n \to \mathbb{R} satisfying appropriate growth conditions (e.g., E[|\partial f / \partial x_m (X)|] < \infty for all m), E\left[ X_l f(X) \right] = \sum_{m=1}^n \Sigma_{l m} \, E\left[ \frac{\partial f}{\partial x_m}(X) \right]. This follows from direct integration by parts on the Gaussian density, exploiting the relation \frac{\partial}{\partial x_m} \log \phi(x) = - \sum_p (\Sigma^{-1})_{m p} x_p, which ensures boundary terms vanish at infinity. To derive the theorem, apply this identity iteratively to the moment integral by isolating one factor, say X_l, and setting f(X) = \prod_{i \neq l} X_{j_i}. Each application reduces the degree of the product by two, replacing X_l with a linear combination of partial derivatives that effectively "pair" it with another variable via the covariance entries. For odd k, repeated recursion yields an expectation of zero, as the process terminates in the first moment E[X_j] = 0 or an odd-powered integral over a symmetric density that integrates to zero. For even k = 2m, the recursion unfolds into a sum over all perfect matchings (pair partitions) of the 2m indices, where each matching \pi contributes \prod_{(r,s) \in \pi} \Sigma_{j_r j_s}, with each matching generated exactly once by the successive pairings. This combinatorial structure emerges naturally from the inductive pairings, with the base case E[X_j X_{j'}] = \Sigma_{j j'} confirming the form. This recursive framework extends Isserlis's theorem to non-centered Gaussians \mathcal{N}(\mu, \Sigma) by centering: express X = \mu + Z with Z \sim \mathcal{N}(0, \Sigma), expand the product \prod_i (\mu_{j_i} + Z_{j_i}) into sums of moments of Z (governed by the theorem) multiplied by products of components of \mu, thereby reducing to the centered case.
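
The identity can be checked numerically for a simple choice of f; in the sketch below (an assumed example with f(x) = x_2^3 and illustrative covariance values), both sides agree with 3 \Sigma_{12} \Sigma_{22}.

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=2_000_000)
x1, x2 = X[:, 0], X[:, 1]

lhs = np.mean(x1 * x2**3)                 # E[X_1 f(X)] with f(x) = x_2^3
rhs = Sigma[0, 1] * np.mean(3 * x2**2)    # df/dx_1 = 0, df/dx_2 = 3 x_2^2
print(lhs, rhs)                           # both ~ 3 * Sigma_12 * Sigma_22 = 4.8
```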

Applications

In Statistical Analysis

Isserlis's theorem facilitates the computation of higher-order moments of multivariate Gaussian random variables, which are essential for Edgeworth expansions that approximate the distribution of statistics from non-Gaussian data by perturbing around a Gaussian base using cumulants derived from these moments. In particular, the theorem provides explicit pairings of covariances to express even moments, enabling the construction of series expansions that improve upon normal approximations for inference in parametric models. In multivariate statistical analysis, Isserlis's theorem simplifies the evaluation of kurtosis and higher cumulants for Gaussian vectors, which is crucial for assessing distributional assumptions in techniques like principal component analysis (PCA) and factor analysis where normality is often presumed. For instance, the fourth-order moment structure, given by the sum over all pairings of second moments, yields the multivariate kurtosis coefficient \beta_{2,d} = d(d + 2) for a d-dimensional Gaussian, aiding in goodness-of-fit tests and model diagnostics without exhaustive numerical integration. In Bayesian computation, the theorem supports moment-matching approximations for posteriors in latent Gaussian models, where higher moments of latent Gaussian variables are matched to variational distributions for efficient inference in complex hierarchies. This approach leverages the theorem's decomposition of joint moments into products of pairwise covariances, facilitating scalable evaluation of expectations in non-conjugate settings such as Gaussian process models with non-Gaussian likelihoods. A specific application arises in time series analysis, particularly for ARCH and GARCH models with Gaussian innovations, where Isserlis's theorem computes fourth-order moments of the innovation process to derive unconditional moments and ensure stationarity conditions. For example, under Gaussian assumptions, the theorem expresses the conditional moment E[\epsilon_t^4 \mid \mathcal{F}_{t-1}] = 3 \sigma_t^4 and related cross-moments, which propagate through the model's recursion to assess volatility persistence and risk measures.
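
As an illustrative numerical check (an assumed example; the covariance construction is arbitrary), Mardia's multivariate kurtosis of a Gaussian sample is close to d(d + 2):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3
mu = np.array([1.0, 0.0, -2.0])
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)          # an arbitrary positive definite covariance

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
diff = X - mu
q = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
print(np.mean(q**2), d * (d + 2))        # both ~ 15
```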

In Physics and Quantum Field Theory

In quantum field theory (QFT), Isserlis's theorem manifests as Wick's theorem, which provides a combinatorial rule for computing vacuum expectation values of products of free field operators. Specifically, for a product of field operators \phi_i, the expectation value \langle \prod_i \phi_i \rangle in the free theory equals the sum over all possible full contractions, where each contraction pairs operators via their two-point propagator \langle \phi_j \phi_k \rangle, with the value being the product of these propagators. This equivalence holds for normal-ordered operators, where Wick's theorem expresses time-ordered products as sums of normal-ordered terms plus contractions, enabling the evaluation of correlation functions in perturbative expansions. A key application arises in the construction of Feynman diagrams, where Wick's theorem dictates the pairing of field operators at vertices, facilitating the computation of higher-order corrections such as self-energy insertions or vertex renormalizations in scattering amplitudes. For instance, in \phi^4 theory, the four-point function decomposes into disconnected and connected diagrams via these contractions, directly linking statistical pairings to diagrammatic rules. In the path integral formulation, Wick's theorem computes vacuum expectations \langle 0 | T \exp(i \int \mathcal{L}_{\text{int}}) | 0 \rangle by expanding the interaction Lagrangian and contracting fields against the free propagator, which underpins the generating functional for Green's functions. In the quantum mechanics of free fields, Wick's theorem simplifies the evaluation of time-evolution operators in the interaction picture, where the Dyson series involves time-ordered exponentials that reduce to sums of propagator products for Gaussian initial states. This is particularly useful for harmonic oscillators or free scalar fields, allowing explicit computation of transition amplitudes without full diagonalization. The physical formulation of Wick's theorem is underpinned by Isserlis's theorem, as free quantum fields obey Gaussian statistics in their vacuum fluctuations, making the higher-moment reductions identical in structure for these cases.

Generalizations and Extensions

To Non-Gaussian Random Variables

One approach to extending Isserlis's theorem to non-Gaussian random variables involves considering cases where the variables are perturbations of a Gaussian vector by polynomial functions. Specifically, let \mathbf{G} = (G_1, \dots, G_n) be a multivariate Gaussian random vector, and let \mathbf{P}(\mathbf{G}) = (P_1(\mathbf{G}), \dots, P_n(\mathbf{G})) where each P_i is a polynomial in the components of \mathbf{G}. Define X_i = G_i + P_i(\mathbf{G}) for each i. The expectation \mathbb{E}\left[ \prod_{i=1}^m X_{j_i} \right] for some indices j_1, \dots, j_m can then be computed by expanding the product using the multilinearity of expectation: \mathbb{E}\left[ \prod_{i=1}^m X_{j_i} \right] = \sum_{\mathbf{k} \in \mathcal{K}} \mathbb{E}\left[ \prod_{\ell=1}^{k_0} G_{a_\ell} \cdot \prod_{r=1}^{k_1} P_{b_r}(\mathbf{G}) \right], where the sum is over multi-indices \mathbf{k} = (k_0, k_1, \dots) corresponding to the number of Gaussian and polynomial terms selected in the expansion, and \mathcal{K} enumerates the possible combinations. Each term in this expansion reduces to an expectation of a product of Gaussians (since each P_b(\mathbf{G}) expands into monomials in \mathbf{G}), which is evaluated using the original Isserlis's theorem via pairings of the Gaussian components. However, the pairings are truncated or incomplete relative to the original variables, as the polynomial contributions introduce additional factors that do not pair directly with the X_i's but instead generate higher-order Gaussian moments. A prominent exact generalization applies to mixed-Gaussian random variables, where the Gaussian vector has random location and/or scale parameters, resulting in non-Gaussian marginal distributions. In 2011, Vignat provided a comprehensive extension of Isserlis's theorem to arbitrary location mixtures of Gaussian vectors. For a random vector \mathbf{X} = \boldsymbol{\mu} + \boldsymbol{\zeta}, where \boldsymbol{\mu} has an arbitrary distribution independent of the zero-mean Gaussian \boldsymbol{\zeta}, and A is a finite set of indices with |A| = 2N + \epsilon (\epsilon \in \{0,1\}), the moment is \mathbb{E}\left[ \prod_{i \in A} X_i \right] = \sum_{k=0}^{N} \sum_{\substack{S \subset A \\ |S| = 2k + \epsilon}} \mathbb{E}\left[ \prod_{i \in S} \mu_i \right] \sum_{\pi \in \Pi(A \setminus S)} \prod_{\{p,q\} \in \pi} \mathbb{E}[ \zeta_p \zeta_q ], where \Pi(B) denotes the set of perfect matchings (pairings) on set B (empty if |B| is odd), and the inner sum applies Isserlis's theorem to the Gaussian part \boldsymbol{\zeta}. This formula reduces to the classical Isserlis's theorem when \boldsymbol{\mu} = 0. Vignat further extended this to scale-location mixtures, such as X_i = \mu_i + \sigma_i^2 \Delta \beta_i + \sigma_i \Delta^{1/2} Z_i for centered Gaussian Z_i and random \sigma_i > 0, yielding similar summed terms over moments of the mixing variables and Gaussian pairings. An earlier special case, covering four mixed-Gaussian variables, was given by Michalowicz et al. in 2009, with a general mixed-Gaussian version following in 2011. These extensions preserve the pairing structure for the Gaussian components but incorporate additional terms from the non-Gaussian mixing distribution, leading to more complex computations. The full set of complete pairings from the original theorem applies only when the perturbation or mixing vanishes (e.g., \mathbf{P} = 0 or a deterministic zero location), as non-zero perturbations introduce cross terms that require evaluating moments of the mixing distribution; in approximate settings, such as small perturbations, higher-order corrections may be needed to account for deviations from Gaussianity beyond the exact expansion.
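
The location-mixture formula can be verified for a simple common-sign Bernoulli mixture; the sketch below (an assumed example — the amplitude vector a and covariance C are illustrative) compares the subset expansion with a Monte Carlo average.

```python
from itertools import combinations
import numpy as np

def isserlis(indices, cov):
    """Zero-mean Gaussian moment via recursive pairing (mirrors the earlier sketch)."""
    if len(indices) % 2:
        return 0.0
    if not indices:
        return 1.0
    first, rest = indices[0], indices[1:]
    return sum(cov[first, j] * isserlis(rest[:k] + rest[k + 1:], cov)
               for k, j in enumerate(rest))

a = np.array([0.5, -1.0, 2.0, 0.3])                 # mu_i = s * a_i with s = +/-1
C = np.array([[1.0, 0.2, 0.1, 0.0],
              [0.2, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.2],
              [0.0, 0.1, 0.2, 1.0]])
A = (0, 1, 2, 3)

# E[prod_{i in S} mu_i] = prod a_i when |S| is even (E[s^even] = 1), zero otherwise.
exact = sum(np.prod(a[list(S)]) * isserlis(tuple(i for i in A if i not in S), C)
            for r in range(0, 5, 2) for S in combinations(A, r))

rng = np.random.default_rng(7)
n = 2_000_000
s = rng.choice([-1.0, 1.0], size=(n, 1))            # common random sign
X = s * a + rng.multivariate_normal(np.zeros(4), C, size=n)
print(exact, np.mean(np.prod(X, axis=1)))           # should agree closely
```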

For Distributions on the Unit Sphere

Isserlis's theorem, which expresses moments of multivariate Gaussian random vectors as sums over pairings of covariances, admits extensions to distributions constrained to the unit sphere in \mathbb{R}^d. For a random vector X uniformly distributed on the unit sphere S^{d-1}, the moments can be computed using expansions in hyperspherical harmonics, which form an orthogonal basis for functions on the sphere. However, when X arises as a projection of a Gaussian vector, specifically X = G / \|G\| where G is an isotropic multivariate Gaussian with identity covariance, an Isserlis-like formula applies directly to the coordinate moments. The key result for such projections mirrors the pairing structure of Isserlis's theorem but incorporates dimension-dependent prefactors arising from the spherical normalization. For an even-order moment E\left[\prod_{k=1}^{2m} X_{i_k}\right], where the X_{i_k} are coordinates of X, the expectation reduces to a sum over all perfect matchings (pairings) of the indices, with each matching contributing the product of Kronecker deltas for matched indices, scaled by a prefactor \frac{\Gamma(d/2)}{2^m \Gamma(m + d/2)}. This prefactor, involving Gamma functions, adjusts the Gaussian covariances for the normalization imposed by the unit-norm constraint. Odd-order moments vanish identically, consistent with the symmetry of the uniform distribution on the sphere, analogous to the even-odd behavior in the Gaussian case. In high dimensions (d \to \infty), the coordinates of X become asymptotically independent Gaussians with variance 1/d, and the formula approaches the standard Isserlis theorem for such variables, yielding an approximation to the unconstrained Gaussian moments. This high-dimensional limit facilitates analysis in regimes where spherical constraints are negligible. The exact formula for even moments, with its dimension corrections, has been applied in random matrix theory, particularly for computing expectations involving Wishart matrices normalized to the sphere, such as in integrals over spherical ensembles that recover results like Folland's theorem on polynomial expectations over S^{d-1}.
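
A Monte Carlo sketch (an assumed example, with d = 5 chosen arbitrarily) checks the prefactor for m = 2 by sampling uniform points on the sphere as normalized Gaussians:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(8)
d, n = 5, 2_000_000
G = rng.normal(size=(n, d))
X = G / np.linalg.norm(G, axis=1, keepdims=True)      # uniform on S^{d-1}

prefactor = gamma(d / 2) / (2**2 * gamma(2 + d / 2))   # m = 2, equals 1/(d*(d+2))
print(np.mean(X[:, 0]**2 * X[:, 1]**2), 1 * prefactor) # one nonzero pairing
print(np.mean(X[:, 0]**4),              3 * prefactor) # three pairings
print(np.mean(X[:, 0] * X[:, 1]**2))                   # odd order: ~ 0
```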
