
Multivariate t-distribution

The multivariate t-distribution, also known as the multivariate Student's t-distribution, is a continuous probability distribution that extends the univariate Student's t-distribution to random vectors in multiple dimensions, providing a robust alternative to the multivariate normal distribution with heavier tails suitable for modeling data with outliers or uncertainty in variance estimates. It is defined by a probability density function involving a location vector \mu (the mean), a positive definite scale matrix \Sigma (proportional to the covariance), and a scalar degrees of freedom parameter \nu > 0 that governs the shape and tail behavior, with the density given by f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)}{(\nu \pi)^{k/2} \Gamma\left(\frac{\nu}{2}\right) |\Sigma|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right]^{-\frac{\nu + k}{2}}, where k is the dimension, \Gamma is the gamma function, and \mathbf{x} is a k-dimensional vector. The mean exists and equals \mu for \nu > 1, and the covariance matrix is \frac{\nu}{\nu - 2} \Sigma for \nu > 2, while marginal and conditional distributions also follow multivariate or univariate t-forms, preserving the family under linear transformations. Introduced by E. A. Cornish in 1954 as the distribution arising from ratios of multivariate normal sample deviates to a chi-squared scalar, the multivariate t-distribution builds on R. A. Fisher's 1925 work on the univariate t and has since been formalized through mixture representations, such as a multivariate normal vector scaled by the inverse square root of an independent chi-squared random variable divided by its degrees of freedom. 
Key properties include elliptical symmetry, independence of components under diagonal scale matrices, and the fact that it arises naturally in Bayesian inference as a posterior for normal means with unknown variance, as well as in robust regression where it accommodates heteroscedasticity and outliers better than Gaussian assumptions. Applications span finance for modeling asset returns with fat tails, environmental science for spatial data analysis, and machine learning for robust clustering and dimensionality reduction, with computational methods evolving to handle high dimensions via Monte Carlo simulations and approximations.

Fundamentals

Definition

The multivariate t-distribution is a continuous probability distribution for a p-dimensional random vector, characterized by a location vector \mu \in \mathbb{R}^p, a positive-definite scale matrix \Sigma \in \mathbb{R}^{p \times p}, and degrees of freedom \nu > 0. First derived by Cornish (1954), it is a type of elliptically contoured distribution, a class to which it belongs as shown by Kelker (1970), generalizing the univariate Student's t-distribution to higher dimensions while preserving elliptical symmetry. In its standard form, the multivariate t-distribution assumes \mu = \mathbf{0} (the zero vector) and \Sigma = I_p (the p \times p identity matrix). The support of the distribution spans the entire space \mathbb{R}^p. The probability density function is f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{p/2} |\Sigma|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right]^{-(\nu + p)/2}, for \mathbf{x} \in \mathbb{R}^p. As \nu \to \infty, the multivariate t-distribution converges to the multivariate normal distribution with mean \mu and covariance \Sigma.

Parameters and Support

The multivariate t-distribution is parameterized by a p-dimensional location vector \mu \in \mathbb{R}^p, a p \times p positive definite scale matrix \Sigma, and a scalar degrees of freedom parameter \nu > 0. The parameter \mu serves as the location vector, representing the center of the distribution, and it equals the mean \mathbb{E}[\mathbf{X}] = \mu provided that \nu > 1. The scale matrix \Sigma governs the dispersion and shape of the distribution, with the covariance matrix given by \text{Cov}(\mathbf{X}) = \frac{\nu}{\nu-2} \Sigma when \nu > 2. The degrees of freedom \nu controls the heaviness of the tails, with smaller values yielding heavier tails and greater kurtosis compared to the normal distribution. For the distribution to be well-defined, \nu > 0 is required, ensuring that the normalizing constant in the density is finite. The mean exists and is finite only for \nu > 1, while the variance exists and is finite only for \nu > 2; for 0 < \nu \leq 1, the mean is undefined, and for 1 < \nu \leq 2, the variance is infinite. Additionally, \Sigma must be positive definite to guarantee that the distribution is non-degenerate. The support of the multivariate t-distribution is the entire p-dimensional Euclidean space \mathbb{R}^p, meaning it assigns positive probability density to all vectors \mathbf{x} \in \mathbb{R}^p. As \nu increases, the probability mass concentrates more sharply around \mu, reflecting reduced tail heaviness. Special cases include the univariate t-distribution, which arises when p = 1, reducing to the classical Student's t-distribution with location \mu \in \mathbb{R}, scale \sigma^2 > 0 (where \Sigma = \sigma^2), and \nu > 0. In the limit as \nu \to \infty, the multivariate t-distribution converges to the multivariate normal distribution \mathcal{N}_p(\mu, \Sigma).

Derivation

Scale Mixture of Normals

The multivariate t-distribution can be represented as a scale mixture of multivariate normal distributions. Specifically, a p-dimensional random vector \mathbf{X} follows a multivariate t-distribution with location vector \boldsymbol{\mu}, p \times p positive definite scale matrix \boldsymbol{\Sigma}, and positive degrees of freedom parameter \nu, denoted \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), if there exists a latent positive scalar random variable \tau such that \mathbf{X} \mid \tau \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) and \tau \sim \text{Gamma}(\nu/2, \nu/2) in the shape-rate parameterization. This mixture representation arises naturally in contexts where variability in the scale of the normal distribution is introduced through the mixing variable \tau, which has mean 1 and variance 2/\nu. The marginal density of \mathbf{X} is obtained by integrating out \tau from the joint density: f(\mathbf{x}) = \int_0^\infty f_N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) \, f_\Gamma(\tau \mid \nu/2, \nu/2) \, d\tau, where f_N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) = (2\pi)^{-p/2} |\boldsymbol{\Sigma}/\tau|^{-1/2} \exp\left(-\frac{\tau}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) is the p-dimensional normal density and f_\Gamma(\tau \mid \nu/2, \nu/2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)} \tau^{\nu/2 - 1} \exp(-\nu\tau/2) is the gamma density. Substituting and simplifying, using the gamma integral \int_0^\infty \tau^{(\nu + p)/2 - 1} \exp(-\tau b/2) \, d\tau = 2^{(\nu + p)/2} \Gamma((\nu + p)/2) / b^{(\nu + p)/2} with b = \nu + (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), yields the standard multivariate t density. The resulting normalizing constant in the marginal density is \Gamma((\nu + p)/2) / [\Gamma(\nu/2) \, (\nu \pi)^{p/2} \, |\boldsymbol{\Sigma}|^{1/2}], multiplying the kernel [1 + (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})/\nu]^{-(\nu + p)/2}.
The scale mixing with the gamma-distributed \tau, which exhibits positive skewness and variance decreasing to zero as \nu \to \infty, induces heavier tails in the marginal distribution than in the normal case (recovered in the limit \nu \to \infty). For finite \nu > 4, this results in excess kurtosis relative to the normal distribution (which has kurtosis 3), with univariate marginals having kurtosis 3(\nu - 2)/(\nu - 4) > 3, while the elliptical symmetry of the multivariate structure is preserved.
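The mixture representation above can be checked by simulation; the following is a minimal sketch using NumPy, with illustrative parameter values not taken from the text. For \nu > 2 the sample covariance should approach \frac{\nu}{\nu-2}\Sigma.

```python
import numpy as np

# Sample X ~ t_p(mu, Sigma, nu) via the scale mixture:
# X | tau ~ N(mu, Sigma / tau), tau ~ Gamma(nu/2, rate nu/2).
rng = np.random.default_rng(42)
nu = 10.0
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = 200_000

tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)        # rate nu/2 -> scale 2/nu
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=n)    # N(0, Sigma) draws
X = mu + Z / np.sqrt(tau)[:, None]                         # X | tau ~ N(mu, Sigma/tau)

print(np.cov(X, rowvar=False))       # should approach (nu / (nu - 2)) * Sigma
print(nu / (nu - 2) * Sigma)
```

Note that the mixing variable enters as a division by \sqrt{\tau}, matching \Sigma/\tau in the conditional covariance.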

Normal-Gamma Conjugate Prior Interpretation

In Bayesian statistics, the multivariate t-distribution emerges naturally as the marginal distribution of observations in models where the mean vector follows a normal distribution conditional on an unknown precision parameter, which itself has a gamma prior. Specifically, consider a Bayesian regression setup where the mean \mu is assigned a prior \mu \sim \mathcal{N}(\mu_0, (\kappa_0 \tau)^{-1} \Sigma) and the precision \tau follows a gamma distribution \tau \sim \Gamma(\alpha, \beta), with \Sigma known or fixed. Integrating out \mu and \tau yields a marginal prior for the data that is multivariate t-distributed, providing a conjugate framework for inference in linear models with unknown variance. This parameterization links directly to posterior predictive distributions in multivariate linear regression. The degrees of freedom parameter \nu of the resulting t-distribution is given by \nu = 2\alpha, reflecting the shape of the gamma prior on precision and enabling closed-form expressions for predictions under conjugate updates. After observing data, the posterior predictive distribution for a new observation y^* takes the form y^* \sim t(\mu_n, \Sigma_n (1 + 1/\kappa_n)/\nu_n, \nu_n), where \mu_n, \Sigma_n, \kappa_n, and \nu_n are updated posterior quantities incorporating the prior and likelihood information. This structure facilitates robust inference by accounting for uncertainty in both mean and variance. The use of the normal-gamma conjugate prior for deriving the multivariate t-distribution was popularized in Bayesian analysis by Press (1982), who emphasized its role in robust multivariate analysis under elliptical models with unknown parameters. This approach offers key advantages, including closed-form marginal distributions that exhibit heavy tails, enhancing robustness to outliers compared to normal-based models while maintaining conjugacy for efficient computation.
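The marginalization step can be verified numerically in the univariate special case: mixing a normal over a gamma-distributed precision gives a Student t predictive. The hyperparameter values (mu_n, kappa_n, alpha_n, beta_n) below are hypothetical, and the t predictive with 2\alpha_n degrees of freedom and squared scale \beta_n(\kappa_n + 1)/(\alpha_n \kappa_n) is the standard univariate normal-gamma result, used here as a sketch rather than the multivariate formula from the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, gamma, t as student_t

# Hypothetical posterior hyperparameters of a univariate normal-gamma model.
mu_n, kappa_n, alpha_n, beta_n = 0.8, 5.0, 4.0, 3.0
y_star = 1.5   # evaluation point for the predictive density

def integrand(tau):
    # y* | tau ~ N(mu_n, (1 + 1/kappa_n) / tau),  tau ~ Gamma(alpha_n, rate beta_n)
    sd = np.sqrt((1 + 1 / kappa_n) / tau)
    return norm.pdf(y_star, loc=mu_n, scale=sd) * gamma.pdf(tau, alpha_n, scale=1 / beta_n)

numeric, _ = quad(integrand, 0, np.inf)            # marginalize tau numerically

# Closed-form Student t predictive: df = 2*alpha_n, scale^2 = beta_n(kappa_n+1)/(alpha_n kappa_n)
scale = np.sqrt(beta_n * (kappa_n + 1) / (alpha_n * kappa_n))
exact = student_t.pdf((y_star - mu_n) / scale, df=2 * alpha_n) / scale
print(numeric, exact)   # the two densities should agree
```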

Probability Functions

Probability Density Function

The probability density function of the p-dimensional multivariate t-distribution with location parameter \boldsymbol{\mu} \in \mathbb{R}^p, positive definite scale matrix \boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}, and degrees of freedom \nu > 0 is given by f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu + p)/2}, \quad \mathbf{x} \in \mathbb{R}^p. This expression integrates to 1 over \mathbb{R}^p, ensuring proper normalization; a brief sketch of the derivation uses the scale mixture representation, in which the density is the integral \int_0^\infty f_N(\mathbf{x} \mid \boldsymbol{\mu}, W \boldsymbol{\Sigma}) g(W) \, dW with f_N the multivariate normal density and g the Inverse-Gamma(\nu/2, \nu/2) density of W; evaluating this integral yields the gamma function ratio in the normalizing constant. The density is symmetric around the location \boldsymbol{\mu}, with elliptical level sets determined by the quadratic form (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), reflecting the orientation and spread encoded in \boldsymbol{\Sigma}. For large \|\mathbf{x}\|, the density exhibits tail decay asymptotically proportional to \|\mathbf{x}\|^{-(\nu + p)}, governed by the power exponent in the kernel. In contrast to the multivariate normal density, which shares the same quadratic form in its kernel but features an exponential factor \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right], the multivariate t-density's polynomial structure produces heavier, power-law tails, particularly pronounced for small \nu.
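The density formula above can be checked directly against SciPy's implementation (scipy.stats.multivariate_t, available in SciPy 1.6 and later); the parameter values here are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

def mvt_logpdf(x, mu, Sigma, nu):
    """Log-density computed directly from the formula above."""
    p = mu.shape[0]
    dev = x - mu
    maha = dev @ np.linalg.solve(Sigma, dev)       # (x - mu)' Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln((nu + p) / 2) - gammaln(nu / 2)
            - (p / 2) * np.log(nu * np.pi) - 0.5 * logdet
            - ((nu + p) / 2) * np.log1p(maha / nu))

mu = np.array([1.0, -2.0])                         # illustrative parameters
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
nu = 5.0
x = np.array([0.5, -1.0])

direct = np.exp(mvt_logpdf(x, mu, Sigma, nu))
scipy_pdf = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(x)
print(direct, scipy_pdf)                           # should agree
```

Working on the log scale with gammaln and log1p avoids overflow in the gamma functions for large \nu or p.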

Cumulative Distribution Function

The cumulative distribution function (CDF) of the multivariate t-distribution with p dimensions, location parameter \boldsymbol{\mu}, positive-definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom parameter \nu > 0 is given by the p-fold integral F(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(\mathbf{t} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) \, dt_1 \cdots dt_p, where f(\cdot \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) denotes the probability density function and \mathbf{x} = (x_1, \dots, x_p)^\top. For p > 1, this integral lacks an elementary closed-form expression, requiring numerical methods for evaluation. In the special univariate case (p = 1), the CDF admits a closed-form representation in terms of the regularized incomplete beta function I_z(a, b): F(x \mid \mu, \sigma^2, \nu) = \frac{1}{2} + \frac{1}{2} \operatorname{sign}(t) \left[1 - I_{\frac{\nu}{\nu + t^2}}\left(\frac{\nu}{2}, \frac{1}{2}\right)\right], where t = (x - \mu)/\sigma and \sigma^2 = \Sigma. For the bivariate case (p = 2), no exact closed form exists, but approximations that leverage related bivariate normal integrals provide accurate numerical computation. The multivariate t-CDF relates to the multivariate normal CDF through the scale mixture representation, in which the t-distributed vector can be expressed as a normal vector scaled by the reciprocal square root of a gamma-distributed mixing variable; however, integrating over this mixing variable renders the expression intractable in closed form. Common numerical approaches include Monte Carlo integration for general dimensions, importance sampling that exploits the elliptical structure to reduce variance, and quasi-Monte Carlo quadrature methods applied after transformation to radial coordinates, which enhance efficiency for elliptical distributions. These techniques, particularly those using Genz's reordering of integration limits, achieve high accuracy even for moderate p up to 20.
As \nu \to \infty, the multivariate t-distribution converges in distribution to the multivariate normal distribution with mean \boldsymbol{\mu} and covariance \boldsymbol{\Sigma}, implying that the t-CDF asymptotically approximates the corresponding normal CDF.
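The univariate closed form quoted above can be implemented with SciPy's regularized incomplete beta function and checked against the library t CDF; parameter values are illustrative.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import t as student_t

def t_cdf_closed_form(x, mu, sigma, nu):
    """Univariate t CDF via the regularized incomplete beta function I_z(a, b)."""
    z = (x - mu) / sigma
    u = nu / (nu + z**2)
    # F = 1/2 + (1/2) sign(z) [1 - I_u(nu/2, 1/2)]; at z = 0 this gives 1/2.
    return 0.5 + 0.5 * np.sign(z) * (1.0 - betainc(nu / 2, 0.5, u))

mu, sigma, nu = 0.5, 2.0, 4.0          # illustrative parameters
for x in (-3.0, 0.5, 2.7):
    closed = t_cdf_closed_form(x, mu, sigma, nu)
    ref = student_t.cdf((x - mu) / sigma, df=nu)
    print(x, closed, ref)              # the last two columns should match
```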

Marginal and Conditional Distributions

Marginal Distributions

The marginal distribution of the j-th component X_j of a p-dimensional random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) follows a univariate Student's t-distribution with location parameter \mu_j, scale parameter \Sigma_{jj}, and \nu degrees of freedom. This univariate t-distribution has mean \mu_j (for \nu > 1) and variance \frac{\nu}{\nu-2} \Sigma_{jj} (for \nu > 2). More generally, the marginal distribution corresponding to any k-dimensional subvector \mathbf{X}_m (where k < p) is a k-dimensional multivariate t-distribution t_k(\boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \nu), with \boldsymbol{\mu}_m the corresponding subvector of \boldsymbol{\mu} and \boldsymbol{\Sigma}_m the k \times k principal submatrix of \boldsymbol{\Sigma}. The covariance matrix of this marginal is \frac{\nu}{\nu-2} \boldsymbol{\Sigma}_m for \nu > 2, which aligns with the corresponding submatrix of the full covariance matrix \frac{\nu}{\nu-2} \boldsymbol{\Sigma}. This preservation of the t-form under marginalization can be shown via direct integration of the joint probability density function over the complementary components or by conditioning in the scale mixture representation of the multivariate t. Specifically, integrating the PDF yields a density that matches the k-dimensional t-form due to the elliptical symmetry and the shared inverse-gamma mixing variable across dimensions. The marginal components are generally dependent unless the off-diagonal elements of \boldsymbol{\Sigma}_m are zero, reflecting the dependence structure inherited from the full joint distribution.
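The marginalization property can be verified numerically in the bivariate case: integrating the joint density over one coordinate should recover the univariate t marginal with location \mu_1 and scale \sqrt{\Sigma_{11}}. Parameter values below are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_t, t as student_t

mu = np.array([1.0, -1.0])             # illustrative parameters
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
nu = 4.0
joint = multivariate_t(loc=mu, shape=Sigma, df=nu)

x1 = 0.3
# Integrate the joint density over x2 to get the marginal density of X1 at x1.
numeric, _ = quad(lambda x2: joint.pdf([x1, x2]), -np.inf, np.inf)

s = np.sqrt(Sigma[0, 0])               # marginal scale is sqrt(Sigma_11)
exact = student_t.pdf((x1 - mu[0]) / s, df=nu) / s
print(numeric, exact)                  # should agree
```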

Conditional Distributions

Consider a random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) partitioned as \mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, where \mathbf{X}_1 is p_1-dimensional and \mathbf{X}_2 is p_2-dimensional with p = p_1 + p_2. The conditional distribution \mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 follows a multivariate t-distribution with updated parameters: degrees of freedom \nu + p_2, location vector \boldsymbol{\mu}_{1 \mid 2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2), and scale matrix \frac{\nu + q}{\nu + p_2} \boldsymbol{\Sigma}_{1 \mid 2}, where \boldsymbol{\Sigma}_{1 \mid 2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21} is the Schur complement and q = (\mathbf{x}_2 - \boldsymbol{\mu}_2)^\top \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2) is the squared Mahalanobis distance of \mathbf{x}_2 from \boldsymbol{\mu}_2. This form shows that the conditional is a non-standard (shifted and scaled) multivariate t-distribution, retaining the t-family structure but with parameters adjusted by the conditioning value; the term involving q inflates the scale when \mathbf{x}_2 deviates from its mean, reflecting greater uncertainty for outlying observations. The derivation of this conditional distribution can proceed in two principal ways. First, it follows from the ratio of the joint PDF to the marginal PDF of \mathbf{X}_2, which itself is t_{p_2}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}, \nu).
The quadratic form in the joint PDF, (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), partitions into conditional and marginal components using the block matrix inverse, where the precision matrix \boldsymbol{\Sigma}^{-1} has off-diagonal blocks that yield the regression coefficients \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1}; this partitioning relies on the matrix inversion lemma to simplify the quadratic form to q + (\mathbf{x}_1 - \boldsymbol{\mu}_{1 \mid 2})^\top \boldsymbol{\Sigma}_{1 \mid 2}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_{1 \mid 2}), resulting in the t-form after normalization. Alternatively, using the scale mixture representation \mathbf{X} \mid \tau \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/\tau) with \tau \sim \chi^2_\nu / \nu, conditioning on \mathbf{X}_2 = \mathbf{x}_2 updates the mixing variable's posterior so that (\nu + q)\tau follows a chi-squared distribution with \nu + p_2 degrees of freedom, yielding the same conditional t-distribution upon marginalization over \tau. Key properties of this conditional distribution include an increase in degrees of freedom from \nu to \nu + p_2, which lightens the tails relative to the unconditional distribution and reduces the influence of outliers in the conditioning variables. This feature makes the multivariate t suitable for sequential modeling tasks, such as robust state estimation in Kalman filtering under heavy-tailed noise, where conditionals enable Bayesian updates while preserving conjugacy. In the limiting case as \nu \to \infty, the multivariate t converges to the multivariate normal, and the conditional approaches the standard Gaussian conditional N_{p_1}(\boldsymbol{\mu}_{1 \mid 2}, \boldsymbol{\Sigma}_{1 \mid 2}), independent of q.
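The conditional parameters stated above can be checked in the bivariate case (p_1 = p_2 = 1): the ratio of the joint density to the marginal density of X_2 should equal the conditional t density with degrees of freedom \nu + 1, location \mu_{1|2}, and scale \frac{\nu + q}{\nu + 1}\Sigma_{1|2}. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_t, t as student_t

mu = np.array([0.5, -1.0])             # illustrative parameters
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
nu = 3.0
x1, x2 = 1.2, 0.4

joint = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf([x1, x2])
s2 = np.sqrt(Sigma[1, 1])
marg2 = student_t.pdf((x2 - mu[1]) / s2, df=nu) / s2
ratio = joint / marg2                  # conditional density of X1 given X2 = x2

q = (x2 - mu[1]) ** 2 / Sigma[1, 1]                        # Mahalanobis distance of x2
mu_c = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])    # conditional location
schur = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]       # Schur complement
scale_c = np.sqrt((nu + q) / (nu + 1) * schur)             # conditional scale
cond = student_t.pdf((x1 - mu_c) / scale_c, df=nu + 1) / scale_c
print(ratio, cond)                                          # should agree
```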

Elliptical Representation

Angular and Radial Components

The multivariate t-distribution is a member of the broader class of elliptical distributions, which are characterized by their affine invariance and can be decomposed into independent location, radial, and angular components. A p-dimensional random vector \mathbf{X} from the multivariate t-distribution with location vector \boldsymbol{\mu}, positive definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom parameter \nu > 0 admits the stochastic representation \mathbf{X} = \boldsymbol{\mu} + \sqrt{R} \, \mathbf{A} \, \mathbf{U}, where \mathbf{A} is a p \times p matrix such that \mathbf{A} \mathbf{A}^\top = \boldsymbol{\Sigma} (for example, the Cholesky factor or a matrix from the spectral decomposition of \boldsymbol{\Sigma}), \mathbf{U} is independent of R, and the components are defined as described below. The angular component \mathbf{U} follows a uniform distribution on the unit sphere S^{p-1} = \{ \mathbf{u} \in \mathbb{R}^p : \| \mathbf{u} \| = 1 \} in \mathbb{R}^p, capturing the directional aspect of the distribution; this uniformity, combined with independence from the radial component, ensures the elliptical symmetry. The radial component R (representing the squared Mahalanobis distance from the location) is a positive random variable independent of \mathbf{U}, distributed as R = \nu \cdot \frac{\| \mathbf{Z} \|^2}{\chi^2_\nu}, where \mathbf{Z} \sim \mathcal{N}_p(\mathbf{0}, \mathbf{I}_p) is a standard multivariate normal vector (so \| \mathbf{Z} \|^2 \sim \chi^2_p) and \chi^2_\nu denotes an independent chi-squared random variable with \nu degrees of freedom; equivalently, R / p \sim F(p, \nu), the F-distribution with shape parameters p and \nu. This radial law for R arises from the scale mixture of multivariate normals underlying the t-distribution, in which the mixing variate (the reciprocal of a gamma-distributed scalar, equivalent to a scaled inverse chi-squared) induces the specific heavy-tailed behavior; the relation R \sim p \, F(p, \nu) follows from the properties of ratios of independent chi-squared variates.
All elliptical distributions, including the multivariate t, share this canonical decomposition into a uniform angular part and a radial part whose distribution uniquely determines the specific member of the elliptical family.

Radial Distribution Properties

In the elliptical representation of the standardized multivariate t-distribution (location \mathbf{0}, scale \mathbf{I}_p) with dimension p and degrees of freedom \nu, let R denote the squared radial component (squared norm \| \mathbf{X} \|^2), which follows R \sim p F_{p,\nu}. The radial distance is then \rho = \sqrt{R} \sim \sqrt{p F_{p,\nu}}. The probability density function of the radial distance \rho is given by f_\rho(\rho) \propto \rho^{p-1} \left(1 + \frac{\rho^2}{\nu}\right)^{-(\nu + p)/2}, \quad \rho > 0. This form arises from the spherical symmetry and the scale mixture structure underlying the multivariate t-distribution. The cumulative distribution function of the radial distance is P(\rho \leq \rho_0) = I_u\left(\frac{p}{2}, \frac{\nu}{2}\right), where u = \frac{\rho_0^2}{\rho_0^2 + \nu} and I_u(a, b) denotes the regularized incomplete beta function. This follows from the relationship R / p \sim F(p, \nu) and the known CDF of the F-distribution. The moments of the radial distance \rho^k exist for k < \nu and are given by E[\rho^k] = \nu^{k/2} \frac{ \Gamma\left( \frac{p + k}{2} \right) \Gamma\left( \frac{\nu - k}{2} \right) }{ \Gamma\left( \frac{p}{2} \right) \Gamma\left( \frac{\nu}{2} \right) }. These derive from the moments of the F-distribution applied to R \sim p F(p, \nu). The variance of \rho is finite for \nu > 2. The tails of the radial distribution exhibit polynomial decay, with P(\rho > \rho_0) \sim c \rho_0^{-\nu} as \rho_0 \to \infty for some constant c > 0 depending on p and \nu; this behavior underscores the heavy-tailed nature of the multivariate t-distribution, with tail heaviness decreasing as \nu increases. In the limit as \nu \to \infty, the distribution of \rho^2 (or R) converges to a \chi^2_p distribution, aligning the multivariate t with the multivariate normal case.
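The radial CDF identity can be confirmed numerically: the incomplete beta expression I_u(p/2, \nu/2) should agree with the F-distribution CDF evaluated at \rho_0^2 / p, since R = \rho^2 \sim p \, F(p, \nu). Parameter values are illustrative.

```python
from scipy.special import betainc
from scipy.stats import f as f_dist

p, nu = 3, 5.0                             # illustrative dimension and df
for rho0 in (0.5, 1.0, 2.0, 4.0):
    u = rho0**2 / (rho0**2 + nu)
    via_beta = betainc(p / 2, nu / 2, u)   # I_u(p/2, nu/2)
    via_f = f_dist.cdf(rho0**2 / p, p, nu) # P(F(p, nu) <= rho0^2 / p)
    print(rho0, via_beta, via_f)           # the last two columns should match
```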

Transformations

Affine Transformations

The multivariate t-distribution exhibits closure under full-rank affine transformations, a property that preserves its form and degrees of freedom. Specifically, if \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), where p is the dimension, \boldsymbol{\mu} is the location vector, \boldsymbol{\Sigma} is the positive definite scale matrix, and \nu > 0 is the degrees of freedom, then for any invertible p \times p matrix \mathbf{B} and p-dimensional vector \mathbf{c}, the transformed vector \mathbf{Y} = \mathbf{B} \mathbf{X} + \mathbf{c} follows t_p(\mathbf{B} \boldsymbol{\mu} + \mathbf{c}, \mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T, \nu). This transformation maintains the elliptical symmetry inherent to the distribution, as the Mahalanobis distance (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) maps to (\mathbf{y} - \mathbf{B} \boldsymbol{\mu} - \mathbf{c})^T (\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T)^{-1} (\mathbf{y} - \mathbf{B} \boldsymbol{\mu} - \mathbf{c}), which equals the original quadratic form due to the relation (\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T)^{-1} = \mathbf{B}^{-T} \boldsymbol{\Sigma}^{-1} \mathbf{B}^{-1}. The proof follows directly from substitution into the probability density function (PDF) of the multivariate t-distribution. The PDF of \mathbf{X} is given by f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{(\nu \pi)^{p/2} \Gamma\left(\frac{\nu}{2}\right) |\boldsymbol{\Sigma}|^{1/2}} \left[ 1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]^{-(\nu + p)/2}. Under the transformation \mathbf{y} = \mathbf{B} \mathbf{x} + \mathbf{c}, the change of variables contributes the Jacobian factor |\det \mathbf{B}|^{-1}, and the quadratic form invariance ensures the term in brackets remains structurally identical for \mathbf{y}.
The scale matrix determinant transforms as |\mathbf{B} \boldsymbol{\Sigma} \mathbf{B}^T|^{1/2} = |\boldsymbol{\Sigma}|^{1/2} |\det \mathbf{B}|, so the Jacobian cancels with the determinant factor, yielding the PDF for \mathbf{Y} with updated parameters and unchanged \nu. The full-rank assumption on \mathbf{B} (i.e., invertibility) guarantees that the transformation preserves the p-dimensional support and avoids degeneracy. This property enables practical applications such as standardization, where \mathbf{B} = \boldsymbol{\Sigma}^{-1/2} and \mathbf{c} = -\boldsymbol{\Sigma}^{-1/2} \boldsymbol{\mu} (using a matrix square root) reduce the distribution to the standard form t_p(\mathbf{0}, \mathbf{I}_p, \nu), facilitating computations like moment calculations or simulations. Additionally, it supports sample generation: one can draw from the standard multivariate t-distribution and apply the inverse affine map to obtain samples from a general t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu), which is useful in Bayesian inference and robust modeling where heavy tails are modeled via scale mixtures of normals.
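The closure property and the change-of-variables bookkeeping can be verified numerically: for Y = BX + c with invertible B, the identity f_Y(\mathbf{y}) \, |\det \mathbf{B}| = f_X(\mathbf{x}) should hold when f_Y is the t density with the transformed parameters. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_t

mu = np.array([1.0, 0.0])              # illustrative parameters
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
nu = 6.0
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # invertible transformation matrix
c = np.array([-1.0, 3.0])

x = np.array([0.7, -0.2])
y = B @ x + c

f_x = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(x)
f_y = multivariate_t(loc=B @ mu + c, shape=B @ Sigma @ B.T, df=nu).pdf(y)
print(f_x, f_y * abs(np.linalg.det(B)))   # should agree
```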

Linear Combinations and Degeneracy

Linear combinations of random vectors following a multivariate t-distribution preserve the t-family under appropriate conditions. Specifically, for a p-dimensional random vector \mathbf{X} \sim t_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) with location vector \boldsymbol{\mu}, positive semi-definite scale matrix \boldsymbol{\Sigma}, and degrees of freedom \nu > 0, the univariate linear combination \mathbf{a}^\top \mathbf{X} for a fixed p-dimensional vector \mathbf{a} follows a univariate t-distribution: \mathbf{a}^\top \mathbf{X} \sim t_1(\mathbf{a}^\top \boldsymbol{\mu}, \mathbf{a}^\top \boldsymbol{\Sigma} \mathbf{a}, \nu). More generally, consider a linear transformation \mathbf{Y} = B \mathbf{X} + \mathbf{c}, where B is a q \times p matrix with q \leq p and \mathbf{c} is a q-dimensional constant vector. If \operatorname{rank}(B) = q, then \mathbf{Y} follows a non-degenerate q-dimensional multivariate t-distribution: \mathbf{Y} \sim t_q(B \boldsymbol{\mu} + \mathbf{c}, B \boldsymbol{\Sigma} B^\top, \nu). However, if \operatorname{rank}(B) < q, the resulting distribution is degenerate, supported on a lower-dimensional subspace determined by the column space of B, with the scale matrix B \boldsymbol{\Sigma} B^\top having rank less than q. Degeneracy also arises when the scale matrix \boldsymbol{\Sigma} itself is singular, with \operatorname{rank}(\boldsymbol{\Sigma}) = r < p. In this case, the distribution of \mathbf{X} is concentrated on an r-dimensional affine subspace, and the probability density function (PDF) is defined with respect to the Lebesgue measure on that subspace.
The PDF takes the form f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + r}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{r/2} |\boldsymbol{\Sigma}_{*}|^{1/2}} \left[1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^+ (\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu + r)/2}, where |\boldsymbol{\Sigma}_{*}| denotes the pseudo-determinant of \boldsymbol{\Sigma} (the product of its non-zero eigenvalues), \boldsymbol{\Sigma}^+ denotes the Moore-Penrose pseudoinverse of \boldsymbol{\Sigma}, and the expression holds for \mathbf{x} in the affine subspace \{\mathbf{z} : \mathbf{N}^\top (\mathbf{z} - \boldsymbol{\mu}) = \mathbf{0}\}, where the columns of \mathbf{N} span the null space of \boldsymbol{\Sigma}. This formulation extends the standard non-singular PDF by replacing the inverse with the pseudoinverse in the quadratic form and using the pseudo-determinant for normalization. Such degenerate multivariate t-distributions are useful for modeling data subject to linear equality constraints, as they naturally incorporate singularity in the covariance structure while maintaining the heavy-tailed properties of the t-family. For instance, in Bayesian analysis of linear models with constraints derived from equality restrictions, the posterior distribution of certain parameter subspaces follows a degenerate multivariate t-distribution. Marginal distributions obtained by projecting a full-rank multivariate t onto a singular subspace thus yield these degenerate forms.

Copula Representation

The multivariate t-copula arises from the multivariate t-distribution through Sklar's theorem, which decomposes any joint cumulative distribution function (CDF) into its marginal CDFs and a copula capturing the dependence structure. For the standardized multivariate t-distribution with p dimensions, zero mean, correlation matrix R, and degrees of freedom parameter \nu > 0, the marginal distributions are univariate t-distributions with \nu degrees of freedom. The corresponding t-copula C is thus given by C(\mathbf{u}; R, \nu) = T_p\left(T_1^{-1}(u_1; \nu), \dots, T_1^{-1}(u_p; \nu); R, \nu\right), where T_p(\cdot; R, \nu) denotes the CDF of the p-dimensional standardized multivariate t-distribution, T_1^{-1}(\cdot; \nu) is the inverse CDF (quantile function) of the univariate standard t-distribution with \nu degrees of freedom, and \mathbf{u} = (u_1, \dots, u_p) with each u_i \in (0, 1). This representation allows the t-copula to model dependence while permitting arbitrary univariate marginal distributions for the joint variables, facilitating flexible multivariate modeling beyond elliptical contours. The density of the t-copula, derived as the ratio of the multivariate t-density to the product of its univariate marginal densities evaluated at the transformed points t_i = T_1^{-1}(u_i; \nu), is \begin{aligned} c(\mathbf{u}; R, \nu) &= |R|^{-1/2} \frac{ \Gamma(\nu/2)^p \, \Gamma((\nu + p)/2) }{ \Gamma((\nu + 1)/2)^p \, \Gamma(\nu/2) } \\ &\quad \times \prod_{i=1}^p \left[1 + t_i^2 / \nu \right]^{(\nu + 1)/2} \left[1 + (\mathbf{t}^\top R^{-1} \mathbf{t}) / \nu \right]^{-(\nu + p)/2}, \end{aligned} where \Gamma denotes the gamma function and \mathbf{t} = (t_1, \dots, t_p)^\top. This density highlights the t-copula's ability to generate heavier tails in the dependence structure compared to the Gaussian copula, as the joint factor \left[1 + (\mathbf{t}^\top R^{-1} \mathbf{t})/\nu\right]^{-(\nu + p)/2} decays polynomially, amplifying clustering in the extremes when \nu is small. A key property of the t-copula is its symmetric tail dependence, which quantifies the likelihood of joint extreme events.
In the bivariate case with correlation parameter \rho \in (-1, 1), the upper and lower tail dependence coefficients are equal due to the radial symmetry of the elliptical construction: \lambda = 2 \, T_{\nu + 1} \left( -\sqrt{ \frac{ (\nu + 1)(1 - \rho) }{ 1 + \rho } } \right), where T_{\nu+1} is the CDF of the univariate t-distribution with \nu + 1 degrees of freedom. This coefficient \lambda > 0 for finite \nu, increasing as \nu decreases or \rho increases toward 1, contrasting with the Gaussian copula's zero tail dependence and enabling better capture of co-movements in tails. For multivariate extensions, pairwise tail dependences follow similar forms, though higher-dimensional joint tails require additional measures like tail dependence functions. The t-copula is widely applied in financial modeling to account for dependence in extreme events, such as joint defaults or market crashes, where linear correlation underestimates tail risks. For instance, it has been used to simulate portfolio credit risk and value-at-risk calculations, leveraging its tail dependence to improve estimates of systemic risk over Gaussian alternatives.
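The tail dependence formula is straightforward to evaluate with SciPy; the sketch below also illustrates the qualitative behavior described above, with \lambda shrinking toward the Gaussian copula's zero tail dependence as \nu grows. Parameter values are illustrative.

```python
import numpy as np
from scipy.stats import t as student_t

def t_copula_tail_dependence(rho, nu):
    """Bivariate t-copula tail dependence coefficient lambda."""
    arg = -np.sqrt((nu + 1) * (1 - rho) / (1 + rho))
    return 2.0 * student_t.cdf(arg, df=nu + 1)

# lambda increases as nu decreases (heavier joint tails) and vanishes
# in the Gaussian limit nu -> infinity.
for nu in (2.0, 5.0, 20.0, 1e6):
    print(nu, t_copula_tail_dependence(0.5, nu))
```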

Connections to Other Distributions

The multivariate t-distribution exhibits several important limiting cases that connect it to other well-known distributions. As the degrees of freedom parameter \nu \to \infty, the multivariate t-distribution with location parameter \mu and scale matrix \Sigma converges in distribution to the multivariate normal distribution N_p(\mu, \Sigma). This limit reflects the thinning of the tails as \nu increases, approaching the lighter-tailed Gaussian form. Conversely, when \nu = 1, the distribution specializes to the multivariate Cauchy distribution, characterized by heavy tails and the absence of any finite moments, making it useful for modeling phenomena with extreme outliers. The multivariate t-distribution serves as a special case within broader families of distributions. It is a particular instance of the multivariate Pearson type VII distribution, which generalizes the form by allowing the exponent in the density function to vary freely; specifically, the Pearson type VII density is proportional to \left[1 + (x - \mu)^\top \Sigma^{-1} (x - \mu)/m \right]^{-N} for parameters m > 0 and N > p/2, reducing to the t-distribution when m = \nu and N = (\nu + p)/2. Additionally, the multivariate t-distribution relates to the F-distribution through transformations involving quadratic forms. In particular, if X \sim t_p(\mu, \Sigma, \nu), then the squared Mahalanobis distance (X - \mu)^\top \Sigma^{-1} (X - \mu) follows the distribution of p \cdot F_{p, \nu}, where F_{p, \nu} denotes an F-distributed random variable with p and \nu degrees of freedom. From a mixture perspective, the multivariate t-distribution arises as a normal variance mixture in which the mixing distribution is inverse gamma; replacing the inverse gamma with the more general generalized inverse Gaussian (GIG) mixing distribution yields the broader class of multivariate generalized hyperbolic distributions.
These encompass the t-distribution as a special case (when the GIG parameters are \lambda = -\nu/2, \chi = \nu, and \psi = 0, i.e., the inverse gamma boundary case) and allow additional flexibility in tail behavior and skewness, producing heavy-tailed models beyond the symmetric elliptical t-form. The existence of moments distinguishes the multivariate t-distribution from the related normal and Cauchy distributions: the multivariate normal has all moments finite, the multivariate Cauchy has none, and for the t-distribution the k-th order moments exist if and only if k < \nu. The table below summarizes these properties for key low-order moments:
Moment order       | Multivariate normal | Multivariate Cauchy | Multivariate t (\nu df)
k = 1 (mean)       | Exists              | Does not exist      | Exists if \nu > 1
k = 2 (covariance) | Exists              | Does not exist      | Exists if \nu > 2
k \geq 3           | All exist           | None exist          | Exist if k < \nu
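The second row of the table, together with the covariance formula \frac{\nu}{\nu - 2}\Sigma stated earlier, is easy to confirm by simulation. A minimal sketch, assuming the same normal / chi-squared mixture construction and illustrative parameter values:

```python
# Sketch: for nu > 2 the covariance of t_p(mu, Sigma, nu) is (nu/(nu-2)) * Sigma.
# Checked empirically with nu = 5 (finite fourth moments, so the sample
# covariance converges well).
import numpy as np

rng = np.random.default_rng(1)

def mvt_sample(mu, Sigma, nu, n, rng):
    """Draw n multivariate t vectors via the normal / chi-squared mixture."""
    p = len(mu)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W = rng.chisquare(nu, size=n)
    return mu + Z / np.sqrt(W / nu)[:, None]

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

X = mvt_sample(mu, Sigma, nu=5, n=500_000, rng=rng)
assert np.allclose(np.cov(X.T), (5 / 3) * Sigma, rtol=0.1)
```

For ν = 1 (the Cauchy row) the same experiment fails instructively: sample means and covariances never stabilize as n grows, reflecting the nonexistence of the moments.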
