
Multivariate t-distribution

The multivariate t-distribution, also known as the multivariate Student's t-distribution, is a continuous probability distribution that extends the univariate Student's t-distribution to random vectors in multiple dimensions, providing a robust alternative to the multivariate normal distribution with heavier tails suitable for modeling data with outliers or uncertainty in variance estimates. It is defined by a probability density function involving a location vector \mu (the mean), a positive definite scale matrix \Sigma (proportional to the covariance), and a scalar degrees of freedom parameter \nu > 0 that governs the shape and tail behavior, with the density given by f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)}{(\nu \pi)^{k/2} \Gamma\left(\frac{\nu}{2}\right) |\Sigma|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right]^{-\frac{\nu + k}{2}}, where k is the dimension, \Gamma is the gamma function, and \mathbf{x} is a k-dimensional vector. The mean exists and equals \mu for \nu > 1, and the covariance matrix is \frac{\nu}{\nu - 2} \Sigma for \nu > 2, while marginal and conditional distributions also follow multivariate or univariate t-forms, preserving the family under linear transformations. Introduced by E. A. Cornish in 1954 as the distribution arising from ratios of multivariate normal sample deviates to a chi-squared scalar, the multivariate t-distribution builds on R. A. Fisher's 1925 work on the univariate t and has since been formalized through mixture representations, such as a multivariate normal vector scaled by the inverse square root of an independent chi-squared random variable divided by its degrees of freedom. 
Key properties include elliptical symmetry, independence of components under diagonal scale matrices, and the fact that it arises naturally in Bayesian inference as a posterior for normal means with unknown variance, as well as in robust regression where it accommodates heteroscedasticity and outliers better than Gaussian assumptions. Applications span finance for modeling asset returns with fat tails, environmental science for spatial data analysis, and machine learning for robust clustering and dimensionality reduction, with computational methods evolving to handle high dimensions via Monte Carlo simulations and approximations.

Fundamentals

Definition

The multivariate t-distribution is a continuous probability distribution for a p-dimensional random vector, characterized by a location vector \mu \in \mathbb{R}^p, a positive-definite scale matrix \Sigma \in \mathbb{R}^{p \times p}, and degrees of freedom \nu > 0. First derived by Cornish (1954), it is a type of elliptically contoured distribution, a class to which it belongs as shown by Kelker (1970), generalizing the univariate Student's t-distribution to higher dimensions while preserving elliptical symmetry. In its standard form, the multivariate t-distribution assumes \mu = \mathbf{0} (the zero vector) and \Sigma = I_p (the p \times p identity matrix). The support of the distribution spans the entire space \mathbb{R}^p. The probability density function is f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{p/2} |\Sigma|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right]^{-(\nu + p)/2}, for \mathbf{x} \in \mathbb{R}^p. As \nu \to \infty, the multivariate t-distribution converges to the multivariate normal distribution with mean \mu and covariance \Sigma.

Parameters and Support

The multivariate t-distribution is parameterized by a p-dimensional location vector \mu \in \mathbb{R}^p, a p \times p positive definite scale matrix \Sigma, and a scalar degrees of freedom parameter \nu > 0. The parameter \mu serves as the location vector, representing the center of the distribution, and it equals the mean \mathbb{E}[\mathbf{X}] = \mu provided that \nu > 1. The scale matrix \Sigma governs the dispersion and shape of the distribution, with the covariance matrix given by \text{Cov}(\mathbf{X}) = \frac{\nu}{\nu-2} \Sigma when \nu > 2. The degrees of freedom \nu controls the heaviness of the tails, with smaller values yielding heavier tails and greater kurtosis compared to the normal distribution. For the distribution to be well-defined, \nu > 0 is required, ensuring that the normalizing constant in the density is finite. The mean exists and is finite only for \nu > 1, while the variance exists and is finite only for \nu > 2; for 0 < \nu \leq 1, the mean is undefined, and for 1 < \nu \leq 2, the variance is infinite. Additionally, \Sigma must be positive definite to guarantee that the distribution is non-degenerate. The support of the multivariate t-distribution is the entire p-dimensional Euclidean space \mathbb{R}^p, meaning it assigns positive probability density to all vectors \mathbf{x} \in \mathbb{R}^p. As \nu increases, the probability mass concentrates more sharply around \mu, reflecting reduced tail heaviness. Special cases include the univariate t-distribution, which arises when p = 1, reducing to the classical Student's t-distribution with location \mu \in \mathbb{R}, scale \sigma^2 > 0 (where \Sigma = \sigma^2), and \nu > 0. In the limit as \nu \to \infty, the multivariate t-distribution converges to the multivariate normal distribution \mathcal{N}_p(\mu, \Sigma).

Derivation

Scale Mixture of Normals

The multivariate t-distribution can be represented as a scale mixture of multivariate normal distributions. Specifically, a p-dimensional random vector \mathbf{X} follows a multivariate t-distribution with location vector \boldsymbol{\mu}, p \times p positive definite scale matrix \boldsymbol{\Sigma}, and positive degrees of freedom parameter \nu, denoted \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), if there exists a latent positive scalar random variable \tau such that \mathbf{X} \mid \tau \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) and \tau \sim \text{Gamma}(\nu/2, \nu/2) in the shape-rate parameterization. This mixture representation arises naturally in contexts where variability in the scale of the normal distribution is introduced through the mixing variable \tau, which has mean 1 and variance 2/\nu. The marginal density of \mathbf{X} is obtained by integrating out \tau from the joint density: f(\mathbf{x}) = \int_0^\infty f_N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) \, f_\Gamma(\tau \mid \nu/2, \nu/2) \, d\tau, where f_N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) = (2\pi)^{-p/2} |\boldsymbol{\Sigma}/\tau|^{-1/2} \exp\left(-\frac{\tau}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) is the p-dimensional normal density and f_\Gamma(\tau \mid \nu/2, \nu/2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \tau^{\nu/2 - 1} \exp(-\nu\tau/2) is the gamma density. Substituting and simplifying, using the gamma integral \int_0^\infty \tau^{(\nu + p)/2 - 1} \exp(-\tau b/2) \, d\tau = 2^{(\nu + p)/2} \Gamma((\nu + p)/2) / b^{(\nu + p)/2} with b = \nu + (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), yields the standard multivariate t density. The resulting normalizing constant in the marginal density is \Gamma((\nu + p)/2) / [\Gamma(\nu/2) \, (\nu \pi)^{p/2} \, |\boldsymbol{\Sigma}|^{1/2}], multiplying the kernel [1 + (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})/\nu]^{-(\nu + p)/2}.
The scale mixing with the gamma-distributed \tau, which exhibits positive skewness and variance decreasing to zero as \nu \to \infty, induces heavier tails in the marginal distribution than in the normal case (recovered in the limit \nu \to \infty). For finite \nu > 4, this results in excess kurtosis relative to the normal distribution (which has kurtosis 3), with univariate marginals having kurtosis 3(\nu - 2)/(\nu - 4) > 3, while the elliptical symmetry of the multivariate structure is preserved.
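The mixture representation above can be checked by simulation; the following is a minimal sketch using NumPy, with illustrative parameter values not taken from the text. For \nu > 2 the sample covariance should approach \frac{\nu}{\nu-2}\Sigma.

```python
import numpy as np

# Sample X ~ t_p(mu, Sigma, nu) via the scale mixture:
# X | tau ~ N(mu, Sigma / tau), tau ~ Gamma(nu/2, rate nu/2).
rng = np.random.default_rng(42)
nu = 10.0
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = 200_000

tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)        # rate nu/2 -> scale 2/nu
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=n)    # N(0, Sigma) draws
X = mu + Z / np.sqrt(tau)[:, None]                         # X | tau ~ N(mu, Sigma/tau)

print(np.cov(X, rowvar=False))       # should approach (nu / (nu - 2)) * Sigma
print(nu / (nu - 2) * Sigma)
```

Note that the mixing variable enters as a division by \sqrt{\tau}, matching \Sigma/\tau in the conditional covariance.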

Normal-Gamma Conjugate Prior Interpretation

In Bayesian statistics, the multivariate t-distribution emerges naturally as the marginal distribution of observations in models where the mean vector follows a normal distribution conditional on an unknown precision parameter, which itself has a gamma prior. Specifically, consider a Bayesian regression setup where the mean \mu is assigned a prior \mu \sim \mathcal{N}(\mu_0, (\kappa_0 \tau)^{-1} \Sigma) and the precision \tau follows a gamma distribution \tau \sim \Gamma(\alpha, \beta), with \Sigma known or fixed. Integrating out \mu and \tau yields a marginal prior for the data that is multivariate t-distributed, providing a conjugate framework for inference in linear models with unknown variance. This parameterization links directly to posterior predictive distributions in multivariate linear regression. The degrees of freedom parameter \nu of the resulting t-distribution is given by \nu = 2\alpha, reflecting the shape of the gamma prior on precision and enabling closed-form expressions for predictions under conjugate updates. After observing data, the posterior predictive distribution for a new observation y^* takes the form y^* \sim t(\mu_n, \Sigma_n (1 + 1/\kappa_n)/\nu_n, \nu_n), where \mu_n, \Sigma_n, \kappa_n, and \nu_n are updated posterior quantities incorporating the prior and likelihood information. This structure facilitates robust inference by accounting for uncertainty in both mean and variance. The use of the normal-gamma conjugate prior for deriving the multivariate t-distribution was popularized in Bayesian analysis by Press (1982), who emphasized its role in robust multivariate analysis under elliptical models with unknown parameters. This approach offers key advantages, including closed-form marginal distributions that exhibit heavy tails, enhancing robustness to outliers compared to normal-based models while maintaining conjugacy for efficient computation.
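The marginalization step can be verified numerically in the univariate special case: mixing a normal over a gamma-distributed precision gives a Student t predictive. The hyperparameter values (mu_n, kappa_n, alpha_n, beta_n) below are hypothetical, and the t predictive with 2\alpha_n degrees of freedom and squared scale \beta_n(\kappa_n + 1)/(\alpha_n \kappa_n) is the standard univariate normal-gamma result, used here as a sketch rather than the multivariate formula from the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, gamma, t as student_t

# Hypothetical posterior hyperparameters of a univariate normal-gamma model.
mu_n, kappa_n, alpha_n, beta_n = 0.8, 5.0, 4.0, 3.0
y_star = 1.5   # evaluation point for the predictive density

def integrand(tau):
    # y* | tau ~ N(mu_n, (1 + 1/kappa_n) / tau),  tau ~ Gamma(alpha_n, rate beta_n)
    sd = np.sqrt((1 + 1 / kappa_n) / tau)
    return norm.pdf(y_star, loc=mu_n, scale=sd) * gamma.pdf(tau, alpha_n, scale=1 / beta_n)

numeric, _ = quad(integrand, 0, np.inf)            # marginalize tau numerically

# Closed-form Student t predictive: df = 2*alpha_n, scale^2 = beta_n(kappa_n+1)/(alpha_n kappa_n)
scale = np.sqrt(beta_n * (kappa_n + 1) / (alpha_n * kappa_n))
exact = student_t.pdf((y_star - mu_n) / scale, df=2 * alpha_n) / scale
print(numeric, exact)   # the two densities should agree
```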

Probability Functions

Probability Density Function

The probability density function of the p-dimensional multivariate t-distribution with location parameter \boldsymbol{\mu} \in \mathbb{R}^p, positive definite scale matrix \boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}, and degrees of freedom \nu > 0 is given by f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu + p)/2}, \quad \mathbf{x} \in \mathbb{R}^p. This expression integrates to 1 over \mathbb{R}^p, ensuring proper normalization; a brief sketch of the derivation uses the scale mixture representation, in which the density is the integral \int_0^\infty f_N(\mathbf{x} \mid \boldsymbol{\mu}, W \boldsymbol{\Sigma}) g(W) \, dW with f_N the multivariate normal density and g the Inverse-Gamma(\nu/2, \nu/2) density of W; evaluating this integral yields the gamma function ratio in the normalizing constant. The density is symmetric around the location \boldsymbol{\mu}, with elliptical level sets determined by the quadratic form (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), reflecting the orientation and spread encoded in \boldsymbol{\Sigma}. For large \|\mathbf{x}\|, the density exhibits tail decay asymptotically proportional to \|\mathbf{x}\|^{-(\nu + p)}, governed by the power exponent in the kernel. In contrast to the multivariate normal density, which shares the same quadratic form in its kernel but features an exponential factor \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right], the multivariate t-density's polynomial structure produces heavier, power-law tails, particularly pronounced for small \nu.
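The density formula above can be checked directly against SciPy's implementation (scipy.stats.multivariate_t, available in SciPy 1.6 and later); the parameter values here are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

def mvt_logpdf(x, mu, Sigma, nu):
    """Log-density computed directly from the formula above."""
    p = mu.shape[0]
    dev = x - mu
    maha = dev @ np.linalg.solve(Sigma, dev)       # (x - mu)' Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln((nu + p) / 2) - gammaln(nu / 2)
            - (p / 2) * np.log(nu * np.pi) - 0.5 * logdet
            - ((nu + p) / 2) * np.log1p(maha / nu))

mu = np.array([1.0, -2.0])                         # illustrative parameters
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
nu = 5.0
x = np.array([0.5, -1.0])

direct = np.exp(mvt_logpdf(x, mu, Sigma, nu))
scipy_pdf = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(x)
print(direct, scipy_pdf)                           # should agree
```

Working on the log scale with gammaln and log1p avoids overflow in the gamma functions for large \nu or p.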

Cumulative Distribution Function

The cumulative distribution function (CDF) of the multivariate t-distribution with p dimensions, location parameter \boldsymbol{\mu}, positive-definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom parameter \nu > 0 is given by the p-fold integral F(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(\mathbf{t} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) \, dt_1 \cdots dt_p, where f(\cdot \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) denotes the probability density function and \mathbf{x} = (x_1, \dots, x_p)^\top. For p > 1, this integral lacks an elementary closed-form expression, requiring numerical methods for evaluation. In the special univariate case (p = 1), the CDF admits a closed-form representation in terms of the regularized incomplete beta function I_z(a, b): F(x \mid \mu, \sigma^2, \nu) = \frac{1}{2} + \frac{1}{2} \operatorname{sign}(t) \left[1 - I_{\frac{\nu}{\nu + t^2}}\left(\frac{\nu}{2}, \frac{1}{2}\right)\right], where t = (x - \mu)/\sigma and \sigma^2 = \Sigma. For the bivariate case (p = 2), no exact closed form exists, but approximations that leverage related bivariate normal integrals provide accurate numerical computation. The multivariate t-CDF relates to the multivariate normal CDF through the scale mixture representation, in which the t-distributed vector can be expressed as a normal vector scaled by the reciprocal square root of a gamma-distributed mixing variable; however, integrating over this mixing variable renders the expression intractable in closed form. Common numerical approaches include Monte Carlo integration for general dimensions, importance sampling that exploits the elliptical structure to reduce variance, and quasi-Monte Carlo quadrature methods applied after transformation to radial coordinates, which enhance efficiency for elliptical distributions. These techniques, particularly those using Genz's reordering of integration limits, achieve high accuracy even for moderate p up to 20.
As \nu \to \infty, the multivariate t-distribution converges in distribution to the multivariate normal distribution with mean \boldsymbol{\mu} and covariance \boldsymbol{\Sigma}, implying that the t-CDF asymptotically approximates the corresponding normal CDF.
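The univariate closed form quoted above can be implemented with SciPy's regularized incomplete beta function and checked against the library t CDF; parameter values are illustrative.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import t as student_t

def t_cdf_closed_form(x, mu, sigma, nu):
    """Univariate t CDF via the regularized incomplete beta function I_z(a, b)."""
    z = (x - mu) / sigma
    u = nu / (nu + z**2)
    # F = 1/2 + (1/2) sign(z) [1 - I_u(nu/2, 1/2)]; at z = 0 this gives 1/2.
    return 0.5 + 0.5 * np.sign(z) * (1.0 - betainc(nu / 2, 0.5, u))

mu, sigma, nu = 0.5, 2.0, 4.0          # illustrative parameters
for x in (-3.0, 0.5, 2.7):
    closed = t_cdf_closed_form(x, mu, sigma, nu)
    ref = student_t.cdf((x - mu) / sigma, df=nu)
    print(x, closed, ref)              # the last two columns should match
```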

Marginal and Conditional Distributions

Marginal Distributions

The marginal distribution of the j-th component X_j of a p-dimensional random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) follows a univariate Student's t-distribution with location parameter \mu_j, scale parameter \Sigma_{jj}, and \nu degrees of freedom. This univariate t-distribution has mean \mu_j (for \nu > 1) and variance \frac{\nu}{\nu-2} \Sigma_{jj} (for \nu > 2). More generally, the marginal distribution corresponding to any k-dimensional subvector \mathbf{X}_m (where k < p) is a k-dimensional multivariate t-distribution t_k(\boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \nu), with \boldsymbol{\mu}_m the corresponding subvector of \boldsymbol{\mu} and \boldsymbol{\Sigma}_m the k \times k principal submatrix of \boldsymbol{\Sigma}. The covariance matrix of this marginal is \frac{\nu}{\nu-2} \boldsymbol{\Sigma}_m for \nu > 2, which aligns with the corresponding submatrix of the full covariance matrix \frac{\nu}{\nu-2} \boldsymbol{\Sigma}. This preservation of the t-form under marginalization can be shown via direct integration of the joint probability density function over the complementary components or by conditioning in the scale mixture representation of the multivariate t. Specifically, integrating the PDF yields a density that matches the k-dimensional t-form due to the elliptical symmetry and the shared inverse-gamma mixing variable across dimensions. The marginal components are generally dependent unless the off-diagonal elements of \boldsymbol{\Sigma}_m are zero, reflecting the dependence structure inherited from the full joint distribution.
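The marginalization property can be verified numerically in the bivariate case: integrating the joint density over one coordinate should recover the univariate t marginal with location \mu_1 and scale \sqrt{\Sigma_{11}}. Parameter values below are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_t, t as student_t

mu = np.array([1.0, -1.0])             # illustrative parameters
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
nu = 4.0
joint = multivariate_t(loc=mu, shape=Sigma, df=nu)

x1 = 0.3
# Integrate the joint density over x2 to get the marginal density of X1 at x1.
numeric, _ = quad(lambda x2: joint.pdf([x1, x2]), -np.inf, np.inf)

s = np.sqrt(Sigma[0, 0])               # marginal scale is sqrt(Sigma_11)
exact = student_t.pdf((x1 - mu[0]) / s, df=nu) / s
print(numeric, exact)                  # should agree
```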

Conditional Distributions

Consider a random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) partitioned as \mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, where \mathbf{X}_1 is p_1-dimensional and \mathbf{X}_2 is p_2-dimensional with p = p_1 + p_2. The conditional distribution \mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 follows a multivariate t-distribution with updated parameters: degrees of freedom \nu + p_2, location vector \boldsymbol{\mu}_{1 \mid 2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2), and scale matrix \frac{\nu + q}{\nu + p_2} \boldsymbol{\Sigma}_{1 \mid 2}, where \boldsymbol{\Sigma}_{1 \mid 2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21} is the Schur complement and q = (\mathbf{x}_2 - \boldsymbol{\mu}_2)^\top \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2) is the squared Mahalanobis distance of \mathbf{x}_2 from \boldsymbol{\mu}_2. This form shows that the conditional is a non-standard (shifted and scaled) multivariate t-distribution, retaining the t-family structure but with parameters adjusted by the conditioning value; the term involving q inflates the scale when \mathbf{x}_2 deviates from its mean, reflecting greater uncertainty for outlying observations. The derivation of this conditional distribution can proceed in two principal ways. First, it follows from the ratio of the joint PDF to the marginal PDF of \mathbf{X}_2, which itself is t_{p_2}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}, \nu).
The quadratic form in the joint PDF, (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), partitions into conditional and marginal components using the block matrix inverse, where the precision matrix \boldsymbol{\Sigma}^{-1} has off-diagonal blocks that yield the regression coefficients \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1}; this partitioning relies on the matrix inversion lemma to simplify the quadratic form to q + (\mathbf{x}_1 - \boldsymbol{\mu}_{1 \mid 2})^\top \boldsymbol{\Sigma}_{1 \mid 2}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_{1 \mid 2}), resulting in the t-form after normalization. Alternatively, using the scale mixture representation \mathbf{X} \mid \tau \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) with \tau \sim \chi^2_\nu / \nu, conditioning on \mathbf{X}_2 = \mathbf{x}_2 updates the mixing variable's posterior so that (\nu + q)\tau follows a chi-squared distribution with \nu + p_2 degrees of freedom, yielding the same conditional t-distribution upon marginalization over \tau. Key properties of this conditional distribution include an increase in degrees of freedom from \nu to \nu + p_2, which lightens the tails relative to the unconditional distribution and reduces the influence of outliers in the conditioning variables. This feature makes the multivariate t suitable for sequential modeling tasks, such as robust state estimation in Kalman filtering under heavy-tailed noise, where conditionals enable Bayesian updates while preserving conjugacy. In the limiting case as \nu \to \infty, the multivariate t converges to the multivariate normal, and the conditional approaches the standard Gaussian conditional N_{p_1}(\boldsymbol{\mu}_{1 \mid 2}, \boldsymbol{\Sigma}_{1 \mid 2}), independent of q.
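The conditional parameters stated above can be checked in the bivariate case (p_1 = p_2 = 1): the ratio of the joint density to the marginal density of X_2 should equal the conditional t density with degrees of freedom \nu + 1, location \mu_{1|2}, and scale \frac{\nu + q}{\nu + 1}\Sigma_{1|2}. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_t, t as student_t

mu = np.array([0.5, -1.0])             # illustrative parameters
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
nu = 3.0
x1, x2 = 1.2, 0.4

joint = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf([x1, x2])
s2 = np.sqrt(Sigma[1, 1])
marg2 = student_t.pdf((x2 - mu[1]) / s2, df=nu) / s2
ratio = joint / marg2                  # conditional density of X1 given X2 = x2

q = (x2 - mu[1]) ** 2 / Sigma[1, 1]                        # Mahalanobis distance of x2
mu_c = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])    # conditional location
schur = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]       # Schur complement
scale_c = np.sqrt((nu + q) / (nu + 1) * schur)             # conditional scale
cond = student_t.pdf((x1 - mu_c) / scale_c, df=nu + 1) / scale_c
print(ratio, cond)                                          # should agree
```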

Elliptical Representation

Angular and Radial Components

The multivariate t-distribution is a member of the broader class of elliptical distributions, which are characterized by their affine invariance and can be decomposed into independent location, radial, and angular components. A p-dimensional random vector \mathbf{X} from the multivariate t-distribution with location vector \boldsymbol{\mu}, positive definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom parameter \nu > 0 admits the stochastic representation \mathbf{X} = \boldsymbol{\mu} + \sqrt{R} \, \mathbf{A} \, \mathbf{U}, where \mathbf{A} is a p \times p matrix such that \mathbf{A} \mathbf{A}^\top = \boldsymbol{\Sigma} (for example, the Cholesky factor or a matrix from the spectral decomposition of \boldsymbol{\Sigma}), \mathbf{U} is independent of R, and the components are defined as described below. The angular component \mathbf{U} follows a uniform distribution on the unit sphere S^{p-1} = \{ \mathbf{u} \in \mathbb{R}^p : \| \mathbf{u} \| = 1 \} in \mathbb{R}^p, capturing the directional aspect of the distribution; this uniformity, combined with independence from the radial component, ensures the elliptical symmetry. The radial component R (representing the squared Mahalanobis distance from the location) is a positive random variable independent of \mathbf{U}, distributed as R = \nu \cdot \frac{\| \mathbf{Z} \|^2}{\chi^2_\nu}, where \mathbf{Z} \sim \mathcal{N}_p(\mathbf{0}, \mathbf{I}_p) is a standard multivariate normal vector (so \| \mathbf{Z} \|^2 \sim \chi^2_p) and \chi^2_\nu denotes an independent chi-squared random variable with \nu degrees of freedom; equivalently, R / p \sim F(p, \nu), the F-distribution with shape parameters p and \nu. This radial law for R arises from the scale mixture of multivariate normals underlying the t-distribution, in which the mixing variate (the reciprocal of a gamma-distributed scalar, equivalent to a scaled inverse chi-squared) induces the specific heavy-tailed behavior; the relation R \sim p \, F(p, \nu) follows from the properties of ratios of independent chi-squared variates.
All elliptical distributions, including the multivariate t, share this canonical decomposition into a uniform angular part and a radial part whose distribution uniquely determines the specific member of the elliptical family.

Radial Distribution Properties

In the elliptical representation of the standardized multivariate t-distribution (location \mathbf{0}, scale \mathbf{I}_p) with dimension p and degrees of freedom \nu, let R denote the squared radial component (squared norm \| \mathbf{X} \|^2), which follows R \sim p F_{p,\nu}. The radial distance is then \rho = \sqrt{R} \sim \sqrt{p F_{p,\nu}}. The probability density function of the radial distance \rho is given by f_\rho(\rho) \propto \rho^{p-1} \left(1 + \frac{\rho^2}{\nu}\right)^{-(\nu + p)/2}, \quad \rho > 0. This form arises from the spherical symmetry and the scale mixture structure underlying the multivariate t-distribution. The cumulative distribution function of the radial distance is P(\rho \leq \rho_0) = I_u\left(\frac{p}{2}, \frac{\nu}{2}\right), where u = \frac{\rho_0^2}{\rho_0^2 + \nu} and I_u(a, b) denotes the regularized incomplete beta function. This follows from the relationship R / p \sim F(p, \nu) and the known CDF of the F-distribution. The moments of the radial distance \rho^k exist for k < \nu and are given by E[\rho^k] = \nu^{k/2} \frac{ \Gamma\left( \frac{p + k}{2} \right) \Gamma\left( \frac{\nu - k}{2} \right) }{ \Gamma\left( \frac{p}{2} \right) \Gamma\left( \frac{\nu}{2} \right) }. These derive from the moments of the F-distribution applied to R \sim p F(p, \nu). The variance of \rho is finite for \nu > 2. The tails of the radial distribution exhibit polynomial decay, with P(\rho > \rho_0) \sim c \rho_0^{-\nu} as \rho_0 \to \infty for some constant c > 0 depending on p and \nu; this behavior underscores the heavy-tailed nature of the multivariate t-distribution, with tail heaviness decreasing as \nu increases. In the limit as \nu \to \infty, the distribution of \rho^2 (or R) converges to a \chi^2_p distribution, aligning the multivariate t with the multivariate normal case.
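The radial CDF identity can be confirmed numerically: the incomplete beta expression I_u(p/2, \nu/2) should agree with the F-distribution CDF evaluated at \rho_0^2 / p, since R = \rho^2 \sim p \, F(p, \nu). Parameter values are illustrative.

```python
from scipy.special import betainc
from scipy.stats import f as f_dist

p, nu = 3, 5.0                             # illustrative dimension and df
for rho0 in (0.5, 1.0, 2.0, 4.0):
    u = rho0**2 / (rho0**2 + nu)
    via_beta = betainc(p / 2, nu / 2, u)   # I_u(p/2, nu/2)
    via_f = f_dist.cdf(rho0**2 / p, p, nu) # P(F(p, nu) <= rho0^2 / p)
    print(rho0, via_beta, via_f)           # the last two columns should match
```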

Transformations

Affine Transformations

The multivariate t-distribution exhibits closure under full-rank affine transformations, a property that preserves its form and degrees of freedom. Specifically, if \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), where p is the dimension, \boldsymbol{\mu} is the location vector, \boldsymbol{\Sigma} is the positive definite scale matrix, and \nu > 0 is the degrees of freedom, then for any invertible p \times p matrix \mathbf{B} and p-dimensional vector \mathbf{c}, the transformed vector \mathbf{Y} = \mathbf{B} \mathbf{X} + \mathbf{c} follows t_p(\mathbf{B} \boldsymbol{\mu} + \mathbf{c}, \mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T, \nu). This transformation maintains the elliptical symmetry inherent to the distribution, as the Mahalanobis distance (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) maps to (\mathbf{y} - \mathbf{B} \boldsymbol{\mu} - \mathbf{c})^T (\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T)^{-1} (\mathbf{y} - \mathbf{B} \boldsymbol{\mu} - \mathbf{c}), which equals the original quadratic form due to the relation (\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T)^{-1} = \mathbf{B}^{-T} \boldsymbol{\Sigma}^{-1} \mathbf{B}^{-1}. The proof follows directly from substitution into the probability density function (PDF) of the multivariate t-distribution. The PDF of \mathbf{X} is given by f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{(\nu \pi)^{p/2} \Gamma\left(\frac{\nu}{2}\right) |\boldsymbol{\Sigma}|^{1/2}} \left[ 1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]^{-(\nu + p)/2}. Under the transformation \mathbf{y} = \mathbf{B} \mathbf{x} + \mathbf{c}, the change of variables contributes the Jacobian factor |\det \mathbf{B}|^{-1}, and the quadratic form invariance ensures the term in brackets remains structurally identical for \mathbf{y}.
The scale matrix determinant transforms as |\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T|^{1/2} = |\boldsymbol{\Sigma}|^{1/2} |\det \mathbf{B}|, so the Jacobian cancels with the determinant factor, yielding the PDF for \mathbf{Y} with updated parameters and unchanged \nu. The full-rank assumption on \mathbf{B} (i.e., invertibility) guarantees that the transformation preserves the p-dimensional support and avoids degeneracy. This property enables practical applications such as standardization, where \mathbf{B} = \boldsymbol{\Sigma}^{-1/2} and \mathbf{c} = -\boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} (using a matrix square root) reduce the distribution to the standard form t_p(\mathbf{0}, \mathbf{I}_p, \nu), facilitating computations like moment calculations or simulations. Additionally, it supports sample generation: one can draw from the standard multivariate t-distribution and apply the inverse affine map to obtain samples from a general t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), which is useful in Bayesian inference and robust modeling where heavy tails are modeled via scale mixtures of normals.
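The closure property and the change-of-variables bookkeeping can be verified numerically: for Y = BX + c with invertible B, the identity f_Y(\mathbf{y}) \, |\det \mathbf{B}| = f_X(\mathbf{x}) should hold when f_Y is the t density with the transformed parameters. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_t

mu = np.array([1.0, 0.0])              # illustrative parameters
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
nu = 6.0
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # invertible transformation matrix
c = np.array([-1.0, 3.0])

x = np.array([0.7, -0.2])
y = B @ x + c

f_x = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(x)
f_y = multivariate_t(loc=B @ mu + c, shape=B @ Sigma @ B.T, df=nu).pdf(y)
print(f_x, f_y * abs(np.linalg.det(B)))   # should agree
```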

Linear Combinations and Degeneracy

Linear combinations of random vectors following a multivariate t-distribution preserve the t-family under appropriate conditions. Specifically, for a p-dimensional random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) with location vector \boldsymbol{\mu}, positive semi-definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom \nu > 0, the univariate linear combination \mathbf{a}^\top \mathbf{X} for a fixed p-dimensional vector \mathbf{a} follows a univariate t-distribution: \mathbf{a}^\top \mathbf{X} \sim t_1(\mathbf{a}^\top \boldsymbol{\mu}, \mathbf{a}^\top \boldsymbol{\Sigma} \mathbf{a}, \nu). More generally, consider a linear transformation \mathbf{Y} = B \mathbf{X} + \mathbf{c}, where B is a q \times p matrix with q \leq p and \mathbf{c} is a q-dimensional constant vector. If \operatorname{rank}(B) = q, then \mathbf{Y} follows a non-degenerate q-dimensional multivariate t-distribution: \mathbf{Y} \sim t_q(B \boldsymbol{\mu} + \mathbf{c}, B \boldsymbol{\Sigma} B^\top, \nu). However, if \operatorname{rank}(B) < q, the resulting distribution is degenerate, supported on a lower-dimensional subspace determined by the column space of B, with the scale matrix B \boldsymbol{\Sigma} B^\top having rank less than q. Degeneracy also arises when the scale matrix \boldsymbol{\Sigma} itself is singular, with \operatorname{rank}(\boldsymbol{\Sigma}) = r < p. In this case, the distribution of \mathbf{X} is concentrated on an r-dimensional affine subspace, and the probability density function (PDF) is defined with respect to the Lebesgue measure on that subspace.
The PDF takes the form f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + r}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{r/2} |\boldsymbol{\Sigma}_{*}|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^+ (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu + r)/2}, where |\boldsymbol{\Sigma}_{*}| denotes the pseudo-determinant of \boldsymbol{\Sigma} (the product of its non-zero eigenvalues), \boldsymbol{\Sigma}^+ denotes the Moore-Penrose pseudoinverse of \boldsymbol{\Sigma}, and the expression holds for \mathbf{x} in the affine subspace \{\mathbf{z} : \mathbf{N}^\top (\mathbf{z} - \boldsymbol{\mu}) = \mathbf{0}\}, where the columns of \mathbf{N} span the null space of \boldsymbol{\Sigma}. This formulation extends the standard non-singular PDF by replacing the inverse with the pseudoinverse in the quadratic form and using the pseudo-determinant for normalization. Such degenerate multivariate t-distributions are useful for modeling data subject to linear equality constraints, as they naturally incorporate singularity in the covariance structure while maintaining the heavy-tailed properties of the t-family. For instance, in Bayesian analysis of linear models with constraints derived from equality restrictions, the posterior distribution of certain parameter subspaces follows a degenerate multivariate t-distribution. Marginal distributions obtained by projecting a full-rank multivariate t onto a singular subspace thus yield these degenerate forms.

Copula Representation

The multivariate t-copula arises from the multivariate t-distribution through Sklar's theorem, which decomposes any joint cumulative distribution function (CDF) into its marginal CDFs and a copula capturing the dependence structure. For the standardized multivariate t-distribution with p dimensions, zero mean, correlation matrix R, and degrees of freedom parameter \nu > 0, the marginal distributions are univariate t-distributions with \nu degrees of freedom. The corresponding t-copula C is thus given by C(\mathbf{u}; R, \nu) = T_p\left(T_1^{-1}(u_1; \nu), \dots, T_1^{-1}(u_p; \nu); R, \nu\right), where T_p(\cdot; R, \nu) denotes the CDF of the p-dimensional standardized multivariate t-distribution, T_1^{-1}(\cdot; \nu) is the inverse CDF (quantile function) of the univariate standard t-distribution with \nu degrees of freedom, and \mathbf{u} = (u_1, \dots, u_p) with each u_i \in (0, 1). This representation allows the t-copula to model dependence while permitting arbitrary univariate marginal distributions for the joint variables, facilitating flexible multivariate modeling beyond elliptical contours. The density of the t-copula, derived as the ratio of the multivariate t-density to the product of its univariate marginal densities evaluated at the transformed points t_i = T_1^{-1}(u_i; \nu), is \begin{aligned} c(\mathbf{u}; R, \nu) &= |R|^{-1/2} \frac{ \Gamma(\nu/2)^p \, \Gamma((\nu + p)/2) }{ \Gamma((\nu + 1)/2)^p \, \Gamma(\nu/2) } \\ &\quad \times \prod_{i=1}^p \left[1 + t_i^2 / \nu \right]^{(\nu + 1)/2} \left[1 + (\mathbf{t}^\top R^{-1} \mathbf{t}) / \nu \right]^{-(\nu + p)/2}, \end{aligned} where \Gamma denotes the gamma function and \mathbf{t} = (t_1, \dots, t_p)^\top. This density highlights the t-copula's ability to generate heavier tails in the dependence structure compared to the Gaussian copula, as the joint factor \left[1 + (\mathbf{t}^\top R^{-1} \mathbf{t})/\nu\right]^{-(\nu + p)/2} decays polynomially, amplifying clustering in the extremes when \nu is small. A key property of the t-copula is its symmetric tail dependence, which quantifies the likelihood of joint extreme events.
In the bivariate case with correlation parameter \rho \in (-1, 1), the upper and lower tail dependence coefficients are equal due to the radial symmetry of the elliptical construction: \lambda = 2 \, T_{\nu + 1} \left( -\sqrt{ \frac{ (\nu + 1)(1 - \rho) }{ 1 + \rho } } \right), where T_{\nu+1} is the CDF of the univariate t-distribution with \nu + 1 degrees of freedom. This coefficient \lambda > 0 for finite \nu, increasing as \nu decreases or \rho increases toward 1, contrasting with the Gaussian copula's zero tail dependence and enabling better capture of co-movements in tails. For multivariate extensions, pairwise tail dependences follow similar forms, though higher-dimensional joint tails require additional measures like tail dependence functions. The t-copula is widely applied in financial modeling to account for dependence in extreme events, such as joint defaults or market crashes, where linear correlation underestimates tail risks. For instance, it has been used to simulate portfolio credit risk and value-at-risk calculations, leveraging its tail dependence to improve estimates of systemic risk over Gaussian alternatives.
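The tail dependence formula is straightforward to evaluate with SciPy; the sketch below also illustrates the qualitative behavior described above, with \lambda shrinking toward the Gaussian copula's zero tail dependence as \nu grows. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import t as student_t

def t_copula_tail_dependence(rho, nu):
    """Bivariate t-copula tail dependence coefficient lambda."""
    arg = -np.sqrt((nu + 1) * (1 - rho) / (1 + rho))
    return 2.0 * student_t.cdf(arg, df=nu + 1)

# lambda increases as nu decreases (heavier joint tails) and vanishes
# in the Gaussian limit nu -> infinity.
for nu in (2.0, 5.0, 20.0, 1e6):
    print(nu, t_copula_tail_dependence(0.5, nu))
```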

Connections to Other Distributions

The multivariate t-distribution exhibits several important limiting cases that connect it to other well-known distributions. As the degrees of freedom parameter \nu \to \infty, the multivariate t-distribution with location parameter \mu and scale matrix \Sigma converges in distribution to the multivariate normal distribution N_p(\mu, \Sigma). This limit reflects the thinning of the tails as \nu increases, approaching the lighter-tailed Gaussian form. Conversely, when \nu = 1, the distribution specializes to the multivariate Cauchy distribution, characterized by heavy tails and the absence of any finite moments, making it useful for modeling phenomena with extreme outliers. The multivariate t-distribution serves as a special case within broader families of distributions. It is a particular instance of the multivariate Pearson type VII distribution, which generalizes the form by allowing the exponent in the density function to vary freely; specifically, the Pearson type VII density is proportional to \left[1 + (x - \mu)^\top \Sigma^{-1} (x - \mu)/m \right]^{-N} for parameters m > 0 and N > p/2, reducing to the t-distribution when m = \nu and N = (\nu + p)/2. Additionally, the multivariate t-distribution relates to the F-distribution through transformations involving quadratic forms. In particular, if X \sim t_p(\mu, \Sigma, \nu), then the squared Mahalanobis distance (X - \mu)^\top \Sigma^{-1} (X - \mu) follows the distribution of p \cdot F_{p, \nu}, where F_{p, \nu} denotes an F-distributed random variable with p and \nu degrees of freedom. From a mixture perspective, the multivariate t-distribution arises as a normal variance mixture in which the mixing distribution is inverse gamma; replacing the inverse gamma with the more general generalized inverse Gaussian (GIG) mixing distribution yields the broader class of multivariate generalized hyperbolic distributions.
These encompass the t-distribution as a special case (when the GIG parameters are \lambda = -\nu/2, \chi = \nu, and \psi = 0, i.e., the inverse gamma boundary case) and allow additional flexibility in tail behavior and skewness, producing heavy-tailed models beyond the symmetric elliptical t-form. The existence of moments distinguishes the multivariate t-distribution from the related normal and Cauchy distributions: the multivariate normal has all moments finite, the multivariate Cauchy has none, and for the t-distribution the k-th order moments exist if and only if k < \nu. The table below summarizes these properties for key low-order moments:
Moment order       | Multivariate normal | Multivariate Cauchy | Multivariate t (\nu df)
k = 1 (mean)       | Exists              | Does not exist      | Exists if \nu > 1
k = 2 (covariance) | Exists              | Does not exist      | Exists if \nu > 2
k \geq 3           | All exist           | None exist          | Exist if k < \nu
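The second row of the table, together with the covariance formula \frac{\nu}{\nu - 2}\Sigma stated earlier, is easy to confirm by simulation. A minimal sketch, assuming the same normal / chi-squared mixture construction and illustrative parameter values:

```python
# Sketch: for nu > 2 the covariance of t_p(mu, Sigma, nu) is (nu/(nu-2)) * Sigma.
# Checked empirically with nu = 5 (finite fourth moments, so the sample
# covariance converges well).
import numpy as np

rng = np.random.default_rng(1)

def mvt_sample(mu, Sigma, nu, n, rng):
    """Draw n multivariate t vectors via the normal / chi-squared mixture."""
    p = len(mu)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W = rng.chisquare(nu, size=n)
    return mu + Z / np.sqrt(W / nu)[:, None]

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

X = mvt_sample(mu, Sigma, nu=5, n=500_000, rng=rng)
assert np.allclose(np.cov(X.T), (5 / 3) * Sigma, rtol=0.1)
```

For ν = 1 (the Cauchy row) the same experiment fails instructively: sample means and covariances never stabilize as n grows, reflecting the nonexistence of the moments.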
