Generalized chi-squared distribution
The generalized chi-squared distribution is the probability distribution of a quadratic form in one or more normally distributed random variables, typically expressed as Q = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c, where \mathbf{x} is a multivariate normal random vector with mean \boldsymbol{\mu} and covariance matrix \Sigma, and A, \mathbf{b}, and c are a fixed matrix, vector, and scalar, respectively.[1] This form generalizes the classical chi-squared distribution, which arises as the special case A = I (the identity matrix), \mathbf{b} = \mathbf{0}, c = 0, and \mathbf{x} \sim N(\mathbf{0}, I), by allowing arbitrary linear transformations, non-zero means, and affine adjustments that lead to weighted sums of non-central chi-squared variables.[2] In its canonical representation, the generalized chi-squared random variable can be decomposed as \tilde{\chi}^2 = \sum_{i=1}^k w_i \chi^2_{d_i}(\lambda_i) + s Z + m, where the w_i are weights (eigenvalues of A \Sigma), the \chi^2_{d_i}(\lambda_i) are non-central chi-squared variables with degrees of freedom d_i and non-centrality parameters \lambda_i, Z is a standard normal variable, and s and m are scalar coefficients accounting for the linear and constant terms.[2]

The probability density function (PDF) and cumulative distribution function (CDF) lack closed-form expressions in general, except in specific cases such as equal weights or even degrees of freedom, so numerical methods such as characteristic-function inversion or saddlepoint approximations are needed for computation.[1] The support of the distribution may be bounded or unbounded depending on the signs of the weights w_i and the value of s; for instance, the support is bounded below when all w_i > 0 and s = 0, resembling a scaled non-central chi-squared.[2] The distribution plays a central role in multivariate statistical analysis, particularly in hypothesis testing for covariance structures, such as the likelihood ratio test in multivariate normal models, where test statistics follow quadratic forms under the null hypothesis.[1] Because mixed-sign eigenvalues can produce bimodal or heavy-tailed densities, computation is challenging, which has spurred developments in approximation techniques, from Imhof's numerical integration method of 1961 to recent exact methods based on the inverse fast Fourier transform (IFFT) and ray-tracing algorithms.[1][2]

Beyond statistics, the generalized chi-squared arises in diverse applications, including signal detection in engineering (e.g., matched filter outputs under Gaussian noise), neuroscience (e.g., computing discriminability indices like d' in psychophysics), and machine learning (e.g., evaluating quadratic loss functions or anomaly detection scores).[2] Open-source tools, including MATLAB and Python implementations, now facilitate evaluation of its PDF, CDF, and inverse CDF, enabling broader use in simulations and real-time processing.[2]

Definition
Quadratic Form Representation
The generalized chi-squared distribution arises as the distribution of a quadratic form in a multivariate normal random vector. Specifically, let \mathbf{X} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}), where d is the dimension, \boldsymbol{\mu} is the mean vector, and \boldsymbol{\Sigma} is the positive definite covariance matrix. The random variable Y is defined as Y = \mathbf{X}^\top \mathbf{Q}_2 \mathbf{X} + \mathbf{q}_1^\top \mathbf{X} + q_0, where \mathbf{Q}_2 is a symmetric matrix capturing the quadratic structure, \mathbf{q}_1 is a vector of linear coefficients, and q_0 is a constant scalar offset.[2] This formulation generalizes the classical chi-squared distribution by allowing arbitrary quadratic, linear, and constant terms in the normal variables.[1] The matrix \mathbf{Q}_2 may be positive semidefinite (giving support bounded below in many applications) or indefinite (allowing mixed-sign eigenvalues and more complex distributions). The linear term \mathbf{q}_1^\top \mathbf{X} introduces asymmetry and shifts the location, while the constant q_0 provides an overall translation. After a transformation that standardizes the covariance (whitening), the relevant eigenvalues are those of \boldsymbol{\Sigma}^{1/2} \mathbf{Q}_2 \boldsymbol{\Sigma}^{1/2}, or equivalently, the generalized eigenvalues of \mathbf{Q}_2 with respect to \boldsymbol{\Sigma}^{-1}. When \mathbf{Q}_2 = \mathbf{I}_d, \mathbf{q}_1 = \mathbf{0}, q_0 = 0, \boldsymbol{\mu} = \mathbf{0}, and \boldsymbol{\Sigma} = \mathbf{I}_d, the distribution of Y recovers the standard central chi-squared distribution with d degrees of freedom.[1] More generally, a non-zero \boldsymbol{\mu} introduces noncentrality, as in the noncentral chi-squared case.[2]

A key connection to familiar distributions emerges via spectral decomposition of \mathbf{Q}_2. Since \mathbf{Q}_2 is symmetric, it admits a decomposition \mathbf{Q}_2 = \mathbf{U} \mathbf{D} \mathbf{U}^\top, where \mathbf{U} is orthogonal and \mathbf{D} is diagonal with real eigenvalues \lambda_i for i = 1, \dots, d. Substituting yields \mathbf{X}^\top \mathbf{Q}_2 \mathbf{X} = \sum_{i=1}^d \lambda_i (\mathbf{u}_i^\top \mathbf{X})^2, where the \mathbf{u}_i are the columns of \mathbf{U}. For \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{I}_d) (after whitening if \boldsymbol{\Sigma} \neq \mathbf{I}_d), the \mathbf{u}_i^\top \mathbf{X} \sim \mathcal{N}(\mathbf{u}_i^\top \boldsymbol{\mu}, 1) are independent, so the quadratic term becomes a weighted sum (with possibly negative weights) of independent noncentral chi-squared random variables with one degree of freedom each: \sum_{i=1}^d \lambda_i \chi^2_1(\delta_i^2), where \delta_i = \mathbf{u}_i^\top \boldsymbol{\mu}. The full Y incorporates the linear and constant terms, which can be partially absorbed into adjusted noncentralities or treated separately. This decomposition links the quadratic form directly to noncentral chi-squared components, facilitating analysis and computation.[1][3]

In the univariate case (d=1), let X \sim \mathcal{N}(\mu, \sigma^2) and Y = a X^2 + b X + c with a \neq 0. Completing the square gives Y = a (X + b/(2a))^2 + (c - b^2/(4a)). Here, X + b/(2a) \sim \mathcal{N}(\mu + b/(2a), \sigma^2), so Y is a noncentral chi-squared variable with one degree of freedom and noncentrality parameter [(\mu + b/(2a))/\sigma]^2, scaled by a\sigma^2 and shifted by the constant c - b^2/(4a); for a < 0, the scaling is negative. Equivalently, before completing the square, Y appears as a quadratic term in X plus a normal linear term b X.[2]
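The univariate completion-of-squares identity can be checked directly by simulation. The following sketch (Python with NumPy/SciPy; the coefficients a, b, c, \mu, \sigma are arbitrary illustrative assumptions, not values from the text) compares direct evaluation of the quadratic form with the equivalent scaled noncentral chi-squared representation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical univariate quadratic form Y = a*X^2 + b*X + c with X ~ N(mu, sigma^2)
a, b, c = 2.0, -1.0, 0.5
mu, sigma = 1.5, 0.8
n = 1_000_000

# Direct simulation of the quadratic form
x = rng.normal(mu, sigma, size=n)
y_direct = a * x**2 + b * x + c

# Completing the square: Y = a*(X + b/(2a))^2 + (c - b^2/(4a)), and
# ((X + b/(2a))/sigma)^2 is noncentral chi-squared with 1 d.o.f. and
# noncentrality ((mu + b/(2a))/sigma)^2, so that term carries weight a*sigma^2.
delta = (mu + b / (2 * a)) / sigma
weight = a * sigma**2
shift = c - b**2 / (4 * a)
y_decomp = weight * stats.ncx2.rvs(df=1, nc=delta**2, size=n, random_state=rng) + shift

# The two constructions agree in distribution (compare moments)
print(y_direct.mean(), y_decomp.mean())
print(y_direct.var(), y_decomp.var())
```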
Linear Combination Representation
The generalized chi-squared distribution can be equivalently represented as a linear combination of independent noncentral chi-squared random variables, a standard normal random variable, and a constant term. This form is given by Y = \sum_{i=1}^{p} w_i \chi^2_{k_i, \lambda_i} + s Z + m, where the \chi^2_{k_i, \lambda_i} are independent noncentral chi-squared random variables with k_i degrees of freedom and noncentrality parameters \lambda_i \geq 0, Z \sim N(0,1) is independent of the chi-squared terms, the w_i are the weights (real numbers), s is the scale for the normal term, and m is the location parameter. This representation arises from the diagonalization of the underlying quadratic form in the original definition. Specifically, for a quadratic form q(\mathbf{x}) = \mathbf{x}^\top \mathbf{Q}_2 \mathbf{x} + \mathbf{q}_1^\top \mathbf{x} + q_0 where \mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), an eigendecomposition of \mathbf{Q}_2 (after suitable transformation to standardize the covariance) yields independent components that correspond to the weighted noncentral chi-squared terms, with any residual linear component captured by the s Z + m adjustment; the independence holds under this orthogonal transformation.

Early formulations of distributions involving sums of chi-squared variables date back to Imhof (1961), who developed methods for computing the distribution of quadratic forms in normal variables, including cases reducible to linear combinations of chi-squares via characteristic function inversion. Subsequent work by Ruben (1962) extended these to differences of chi-squares, laying groundwork for handling indefinite quadratic forms in the generalized setting.

A concrete example occurs in the bivariate normal case, where the quadratic form z_1^2 - z_2^2 - 2\sqrt{2} z_1 + 4 z_2 - 2 (with \mathbf{z} \sim N(\mathbf{0}, \mathbf{I}_2)) decomposes into two weighted noncentral chi-squares with weights w = [1, -1], degrees of freedom k = [1, 1], noncentralities \lambda = [2, 4], and no additional normal or constant terms (s = m = 0). This illustrates the utility of the representation for decomposing complex forms into tractable independent components.
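The bivariate example above can be verified numerically. The sketch below (Python with NumPy/SciPy, under the parameter values stated in the text) samples the quadratic form directly and compares its empirical quantiles with those of the two-term weighted noncentral chi-squared combination.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 1_000_000

# Quadratic form from the example: q(z) = z1^2 - z2^2 - 2*sqrt(2)*z1 + 4*z2 - 2
z = rng.standard_normal((n, 2))
y_direct = z[:, 0]**2 - z[:, 1]**2 - 2*np.sqrt(2)*z[:, 0] + 4*z[:, 1] - 2

# Linear-combination form: w = [1, -1], k = [1, 1], lambda = [2, 4], s = m = 0
y_decomp = (stats.ncx2.rvs(df=1, nc=2, size=n, random_state=rng)
            - stats.ncx2.rvs(df=1, nc=4, size=n, random_state=rng))

# Both samples should have (nearly) identical quantiles
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(y_direct, qs))
print(np.quantile(y_decomp, qs))
```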
Parameters and Conversions
Vector and Scalar Parameters
The generalized chi-squared distribution admits a linear combination representation parameterized by the vector of weights \mathbf{w} = (w_1, \dots, w_p)^\top \in \mathbb{R}^p, the vector of degrees of freedom \mathbf{k} = (k_1, \dots, k_p)^\top with each k_i a positive integer, the vector of noncentrality parameters \boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_p)^\top with each \lambda_i \geq 0, the nonnegative scalar normal scale s \geq 0, and the real-valued location shift m \in \mathbb{R}. In this form, the random variable is expressed as \tilde{\chi}_{\mathbf{w}, \mathbf{k}, \boldsymbol{\lambda}, s, m} = \sum_{i=1}^p w_i \chi'^2_{k_i, \lambda_i} + s Z + m, where the \chi'^2_{k_i, \lambda_i} are independent noncentral chi-squared random variables and Z \sim N(0,1) is independent of them.

The weights w_i scale the contributions of each noncentral chi-squared term and are often taken positive, yielding a right-skewed distribution with support on [m, \infty) when s = 0, though negative or mixed signs are permitted and extend the support to the whole real line while altering the tail behavior. The degrees of freedom k_i govern the shape and variance of the i-th chi-squared component, with larger k_i producing distributions more concentrated around their means. The noncentrality parameters \lambda_i quantify each component's departure from centrality, increasing both its mean (by w_i \lambda_i) and its variance (by 4 w_i^2 \lambda_i); \lambda_i = 0 for all i yields the central case. The scale s weights an additional Gaussian term, ensuring unbounded support below m whenever s > 0, while m provides a simple horizontal translation of the entire distribution. These parameters collectively allow flexible modeling of quadratic forms arising in multivariate normal settings. Non-degeneracy requires positive variance, which holds when \sum_{i=1}^p w_i^2 k_i > 0 or s > 0; mixed signs in the w_i lead to infinite tails on both sides and can affect unimodality, whereas all-positive w_i with s = 0 typically produce a unimodal density similar to standard chi-squared distributions.

Maximum likelihood estimation of these parameters from data, such as observed values of quadratic forms in underlying normal variables, involves optimizing the likelihood function derived from the distribution's density, but presents significant challenges owing to the absence of closed-form expressions for the probability density function (PDF) and cumulative distribution function (CDF), requiring iterative numerical evaluations for each candidate parameter set.

As an illustrative example, consider the distribution of a quadratic form corresponding to a simple heteroscedastic regression residual under differing error variances, which can be parameterized with \mathbf{w} = [1, -1]^\top, \mathbf{k} = [1, 1]^\top, \boldsymbol{\lambda} = [2, 4]^\top, s = 0, and m = 0; here, the mixed weights reflect opposing contributions from components with distinct noncentralities, resulting in an indefinite form with infinite tails on both sides. This setup arises in models where residuals exhibit varying scales, such as in weighted least squares adjustments for unequal variances.
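A small helper for bundling and validating these parameters makes the constraints explicit. The sketch below defines a hypothetical container class (its name and interface are illustrative, not from any cited toolbox) that enforces k_i > 0, \lambda_i \geq 0, s \geq 0, and the non-degeneracy condition \sum_i w_i^2 k_i > 0 or s > 0.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GX2Params:
    """Hypothetical container for the (w, k, lambda, s, m) parameterization."""
    w: np.ndarray          # real weights of the noncentral chi-squared terms
    k: np.ndarray          # degrees of freedom, positive
    lam: np.ndarray        # noncentralities, lambda_i >= 0
    s: float = 0.0         # scale of the extra Gaussian term, s >= 0
    m: float = 0.0         # location shift

    def __post_init__(self):
        self.w, self.k, self.lam = (np.asarray(a, dtype=float)
                                    for a in (self.w, self.k, self.lam))
        if not (self.w.shape == self.k.shape == self.lam.shape):
            raise ValueError("w, k and lambda must have the same length")
        if np.any(self.k <= 0) or np.any(self.lam < 0) or self.s < 0:
            raise ValueError("require k_i > 0, lambda_i >= 0, s >= 0")
        if np.sum(self.w**2 * self.k) == 0 and self.s == 0:
            raise ValueError("degenerate distribution: point mass at m")

# Mixed-weight example from the text: infinite tails on both sides
params = GX2Params(w=[1.0, -1.0], k=[1, 1], lam=[2.0, 4.0])
print(params)
```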
Matrix-Vector Equivalence
The matrix-vector equivalence provides a framework for transforming the parameters of the generalized chi-squared distribution between its quadratic form representation, Y = \mathbf{X}^\top \mathbf{Q}_2 \mathbf{X} + \mathbf{q}_1^\top \mathbf{X} + q_0 where \mathbf{X} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}), and its linear combination representation, Y \stackrel{d}{=} \sum_{i=1}^d w_i \tilde{\chi}^2_{1, \lambda_i} + c, where \tilde{\chi}^2_{1, \lambda_i} denotes a non-central chi-squared random variable with 1 degree of freedom and non-centrality parameter \lambda_i, and c is a constant shift. This equivalence is achieved through spectral decomposition, enabling the distribution to be expressed as a sum of independent weighted non-central chi-squares, which is particularly useful for computational methods like saddlepoint approximations or simulation.

To obtain the linear combination parameters from the quadratic form, begin by standardizing the random vector to \mathbf{Z} = \boldsymbol{\Sigma}^{-1/2} (\mathbf{X} - \boldsymbol{\mu}) \sim \mathcal{N}_d(\mathbf{0}, \mathbf{I}_d). Substituting yields Y = \mathbf{Z}^\top (\boldsymbol{\Sigma}^{1/2} \mathbf{Q}_2 \boldsymbol{\Sigma}^{1/2}) \mathbf{Z} + \boldsymbol{\beta}^\top \mathbf{Z} + k, where \boldsymbol{\beta} = 2 \boldsymbol{\Sigma}^{1/2} \mathbf{Q}_2 \boldsymbol{\mu} + \boldsymbol{\Sigma}^{1/2} \mathbf{q}_1 is the vector coefficient of the linear term in \mathbf{Z}, and k = \boldsymbol{\mu}^\top \mathbf{Q}_2 \boldsymbol{\mu} + \mathbf{q}_1^\top \boldsymbol{\mu} + q_0 is the initial constant. Let \mathbf{A} = \boldsymbol{\Sigma}^{1/2} \mathbf{Q}_2 \boldsymbol{\Sigma}^{1/2} admit the eigendecomposition \mathbf{A} = \mathbf{P} \mathbf{D} \mathbf{P}^\top, with \mathbf{D} = \operatorname{diag}(w_1, \dots, w_d) containing the eigenvalues w_i (the weights) and \mathbf{P} the orthogonal matrix of eigenvectors. Transforming to the eigenbasis via \mathbf{W} = \mathbf{P}^\top \mathbf{Z} \sim \mathcal{N}_d(\mathbf{0}, \mathbf{I}_d) gives Y = \sum_{i=1}^d w_i W_i^2 + \sum_{i=1}^d \gamma_i W_i + k, with \boldsymbol{\gamma} = \mathbf{P}^\top \boldsymbol{\beta}. Completing the square for each term produces w_i W_i^2 + \gamma_i W_i = w_i (W_i + \delta_i)^2 - w_i \delta_i^2, \quad \delta_i = \frac{\gamma_i}{2 w_i}, assuming w_i \neq 0; terms with w_i = 0 contribute only a Gaussian linear part, which is collected into an s Z term (or handled separately in degenerate cases). Thus, Y \stackrel{d}{=} \sum_{i=1}^d w_i \tilde{\chi}^2_{1, \lambda_i} + c, where the non-centrality parameters are \lambda_i = \delta_i^2 = \gamma_i^2 / (4 w_i^2), and the overall constant is c = k - \sum_{i=1}^d w_i \lambda_i. If the original linear term is absent (\mathbf{q}_1 = \mathbf{0}), then \boldsymbol{\beta} = 2 \boldsymbol{\Sigma}^{1/2} \mathbf{Q}_2 \boldsymbol{\mu}, so the non-centralities are induced entirely by the mean vector \boldsymbol{\mu}.

The reverse conversion, from linear combination parameters to quadratic form parameters, involves reconstructing \mathbf{Q}_2, \mathbf{q}_1, and q_0 for a chosen \boldsymbol{\Sigma} and \boldsymbol{\mu}. One approach sets \boldsymbol{\Sigma} = \mathbf{I}_d and \mathbf{q}_1 = \mathbf{0} for simplicity, constructs \mathbf{Q}_2 = \mathbf{P} \mathbf{D} \mathbf{P}^\top using an arbitrary orthogonal \mathbf{P}, chooses \boldsymbol{\mu} so that its components along the eigenvectors are \pm\sqrt{\lambda_i}, and sets q_0 = c. This parameterization is not unique, as rotations in the eigenbasis preserve the distribution.

Numerical stability in these conversions can be compromised when \boldsymbol{\Sigma} or \mathbf{Q}_2 is ill-conditioned, leading to inaccurate computation of \boldsymbol{\Sigma}^{1/2} or sensitive eigendecompositions, especially if eigenvalues cluster near zero or change sign. In such cases, generalized eigenvalue solvers or perturbative methods are recommended over direct eigendecomposition; when \boldsymbol{\Sigma} is positive definite, a Cholesky factor of \boldsymbol{\Sigma} can replace the symmetric square root. Software implementations often incorporate condition number checks and scaling to mitigate precision loss.

As an illustrative example, consider a bivariate case with d=2, \boldsymbol{\Sigma} = \mathbf{I}_2, \boldsymbol{\mu} = (1, 0)^\top, \mathbf{q}_1 = \mathbf{0}, q_0 = 0, and \mathbf{Q}_2 = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}. The matrix \mathbf{A} = \mathbf{Q}_2 has eigenvalues w_1 \approx 1.382, w_2 \approx 3.618 with eigenvectors forming \mathbf{P}. Then \boldsymbol{\beta} = 2 \mathbf{Q}_2 \boldsymbol{\mu} = (4, 2)^\top and \boldsymbol{\gamma} = \mathbf{P}^\top \boldsymbol{\beta}, yielding \lambda_1 \approx 0.724, \lambda_2 \approx 0.276, and c \approx 0 after the completion-of-squares adjustment, confirming the equivalence to 1.382\, \tilde{\chi}^2_{1, 0.724} + 3.618\, \tilde{\chi}^2_{1, 0.276}. This demonstrates how the transformation captures the non-centrality induced by \boldsymbol{\mu}.
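The forward conversion can be written compactly in code. The following Python/NumPy sketch implements the steps above (one 1-degree-of-freedom term per nonzero eigenvalue, with any zero-eigenvalue directions collected into a Gaussian term) and reproduces the worked bivariate example; the function and variable names are illustrative.

```python
import numpy as np
from scipy import linalg

def quadform_to_gx2(Q2, q1, q0, mu, Sigma):
    """Convert x'Q2x + q1'x + q0 with x ~ N(mu, Sigma) to (w, k, lam, s, m).

    A sketch of the steps above: one 1-d.o.f. term per nonzero eigenvalue,
    zero-eigenvalue directions collected into a pure Gaussian term s*Z.
    """
    S_half = linalg.sqrtm(Sigma).real                  # symmetric square root of Sigma
    A = S_half @ Q2 @ S_half
    beta = 2 * S_half @ Q2 @ mu + S_half @ q1
    const = mu @ Q2 @ mu + q1 @ mu + q0

    evals, P = np.linalg.eigh(A)                       # A = P diag(evals) P'
    gamma = P.T @ beta

    nz = np.abs(evals) > 1e-12
    w = evals[nz]
    k = np.ones_like(w)                                # 1 degree of freedom each
    lam = (gamma[nz] / (2 * w))**2
    s = np.sqrt(np.sum(gamma[~nz]**2))                 # leftover Gaussian scale
    m = const - np.sum(w * lam)
    return w, k, lam, s, m

# Worked bivariate example from the text
Q2 = np.array([[2.0, 1.0], [1.0, 3.0]])
w, k, lam, s, m = quadform_to_gx2(Q2, np.zeros(2), 0.0, np.array([1.0, 0.0]), np.eye(2))
print(w)      # approx [1.382, 3.618]
print(lam)    # approx [0.724, 0.276]
print(s, m)   # approx 0.0, 0.0
```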
Properties
Support and Asymptotic Behavior
The support of the generalized chi-squared distribution, written as \tilde{\chi}^2 = \sum_i w_i \chi'^2_{k_i, \lambda_i} + s z + m where the \chi'^2_{k_i, \lambda_i} are independent noncentral chi-squared random variables with degrees of freedom k_i and noncentrality parameters \lambda_i, z \sim \mathcal{N}(0,1), the w_i are weights, s is the coefficient of the Gaussian term, and m is an offset constant, depends on the signs of the weights w_i and the value of s. If all w_i \geq 0 and s = 0, the support is [m, \infty), as each weighted noncentral chi-squared term is nonnegative. Conversely, if all w_i \leq 0 and s = 0, the support is (-\infty, m] by symmetry, reflecting the negative scaling of nonnegative components. When the w_i have mixed signs or s \neq 0, the support extends to the entire real line \mathbb{R}, due to the unbounded influence of the indefinite quadratic form or the Gaussian term. A degenerate case occurs when s = 0 and all w_i = 0, reducing the distribution to a point mass at m.

Tail behaviors vary with these parameters. If all w_i \leq 0 and s > 0, the right tail decays asymptotically like a Gaussian, \sim \exp(-t^2/(2 s^2)) for large positive t, dominated by the normal term. If s = 0 and all w_i > 0, the right tail exhibits chi-squared-like exponential decay, approximately f(t) \approx (t/w_*)^{(k_*-2)/2} \exp(-t/(2w_*)) for the dominant term with largest positive weight w_* and corresponding degrees of freedom k_* (when \lambda_* = 0), or adjusted by a factor \exp(\sqrt{\lambda_* t / w_*}) for positive noncentrality \lambda_* > 0. In general, with all w_i > 0 and s > 0, the right tail remains chi-squared-like, dominated by the quadratic terms. The left tail behaves analogously with the signs flipped: Gaussian-like decay for large negative t when s > 0 and all w_i \geq 0, and chi-squared-like decay on the left when s = 0 and all w_i < 0. These asymptotics arise from large-deviation principles applied to the quadratic form in normals.[2]

Asymptotic approximations provide refined estimates for both central and tail regions. In the central region, Edgeworth expansions improve upon the normal approximation by incorporating higher cumulants of the distribution, yielding series corrections of order O(1/\sqrt{n}) for large sample sizes underlying the quadratic form. For tail probabilities, saddlepoint approximations, such as the Lugannani-Rice formula, offer high accuracy by evaluating the cumulant generating function at a saddlepoint, with relative errors often below 10^{-3} even for moderate deviations; the formula approximates the upper tail as P(\tilde{\chi}^2 > t) \approx 1 - \Phi(\hat{w}) + \phi(\hat{w})(1/\hat{u} - 1/\hat{w}), where \hat{w} and \hat{u} are standardized saddlepoint quantities. As a simple illustration of the role of the offset, m shifts the support uniformly: with centered normal variables (\lambda_i = 0) and positive weights, m = 0 yields support starting at the origin, whereas a nonzero mean vector contributes a constant through the conversion that displaces the lower bound accordingly without altering the tail decay rates.
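The support rules above translate directly into a small utility. The Python sketch below (a hypothetical helper; the function name is chosen for illustration) returns the support interval implied by the signs of the weights and the Gaussian scale.

```python
import numpy as np

def gx2_support(w, s, m):
    """Support interval implied by the sign rules above (illustrative helper)."""
    w = np.asarray(w, dtype=float)
    if s != 0 or (np.any(w > 0) and np.any(w < 0)):
        return (-np.inf, np.inf)      # Gaussian term present or mixed-sign weights
    if np.all(w == 0):
        return (m, m)                 # degenerate point mass at m
    if np.all(w >= 0):
        return (m, np.inf)            # all weights nonnegative, s = 0
    return (-np.inf, m)               # all weights nonpositive, s = 0

print(gx2_support([1.0, 0.5], s=0, m=2.0))    # (2.0, inf)
print(gx2_support([1.0, -1.0], s=0, m=0.0))   # (-inf, inf)
print(gx2_support([-2.0, -1.0], s=0, m=1.0))  # (-inf, 1.0)
```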
Moments
The mean of a random variable Y following the generalized chi-squared distribution, expressed in its linear combination representation as Y = \sum_i w_i \chi^2_{k_i}(\lambda_i) + U where the \chi^2_{k_i}(\lambda_i) are independent noncentral chi-squared random variables with degrees of freedom k_i and noncentrality parameters \lambda_i, and U \sim N(\mu, \sigma^2) is an independent normal random variable, is given by E[Y] = \sum_i w_i (k_i + \lambda_i) + \mu. This follows from the linearity of expectation and the known first moments of the component distributions, where E[\chi^2_{k}(\lambda)] = k + \lambda and E[U] = \mu.[4][5] Similarly, the variance is \text{Var}[Y] = 2 \sum_i w_i^2 (k_i + 2 \lambda_i) + \sigma^2, arising from the independence of the components and the second moments \text{Var}[\chi^2_{k}(\lambda)] = 2(k + 2\lambda) and \text{Var}[U] = \sigma^2.[4][5]

In special cases, these moments recover those of the standard (central) chi-squared distribution. Specifically, if all \lambda_i = 0, all w_i = 1, the normal term is absent (\mu = \sigma^2 = 0), and there are k terms each with k_i = 1, then Y \sim \chi^2_k with mean k and variance 2k.[4]

Higher-order moments of Y can be obtained via recursive relations derived from the moments of the noncentral chi-squared components, using independence to add cumulants across terms (the normal term contributes only to the first two cumulants). The cumulants \kappa_n of a single noncentral chi-squared \chi^2_k(\lambda) up to fourth order are \kappa_1 = k + \lambda, \kappa_2 = 2(k + 2\lambda), \kappa_3 = 8(k + 3\lambda), and \kappa_4 = 48(k + 4\lambda). Skewness and excess kurtosis then follow from cumulants via the standard relations \gamma_1 = \kappa_3 / \kappa_2^{3/2} and \gamma_2 = \kappa_4 / \kappa_2^2. For the full Y, the n-th cumulant is the sum \sum_i w_i^n \kappa_n^{(i)} over the chi-squared components (each weight entering at the n-th power), plus the normal term's contributions to the first two cumulants.[4][6]

As an example, consider the quadratic form Y = X^T A X where X \sim N_p(\mu, \Sigma) with p-dimensional mean \mu and covariance \Sigma, and A is symmetric. The mean is E[Y] = \operatorname{tr}(A \Sigma) + \mu^T A \mu and the variance is \operatorname{Var}[Y] = 2 \operatorname{tr}((A \Sigma)^2) + 4 \mu^T A \Sigma A \mu, which aligns with the general linear combination form after eigendecomposition of A relative to \Sigma. This arises in contexts like testing means in multivariate normal models.[6]
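These first two moments are easy to compute from either parameterization. The sketch below (Python/NumPy; here the normal term is written as s Z + m, so \mu = m and \sigma^2 = s^2) evaluates the linear-combination formulas and, for the quadratic-form example, the trace expressions, reusing the worked matrix from the previous section as an assumed input.

```python
import numpy as np

def gx2_mean_var(w, k, lam, s=0.0, m=0.0):
    """Mean and variance from the linear-combination formulas above
    (normal term written as s*Z + m, i.e. mu = m and sigma^2 = s^2)."""
    w, k, lam = (np.asarray(a, dtype=float) for a in (w, k, lam))
    mean = np.sum(w * (k + lam)) + m
    var = 2 * np.sum(w**2 * (k + 2 * lam)) + s**2
    return mean, var

# Central chi-squared check: five unit-weight central terms with 1 d.o.f. each
print(gx2_mean_var(np.ones(5), np.ones(5), np.zeros(5)))   # (5.0, 10.0)

# Quadratic form X'AX with X ~ N(mu, Sigma): trace formulas from the text
A = np.array([[2.0, 1.0], [1.0, 3.0]])
Sigma = np.eye(2)
mu = np.array([1.0, 0.0])
mean_qf = np.trace(A @ Sigma) + mu @ A @ mu
var_qf = 2 * np.trace((A @ Sigma) @ (A @ Sigma)) + 4 * mu @ A @ Sigma @ A @ mu
print(mean_qf, var_qf)   # 7.0, 50.0; matches the converted (w, k, lam) parameters
```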
Moment-Generating Function
The moment-generating function (MGF) of a random variable Y following the generalized chi-squared distribution, expressed in the linear combination representation as Y = \sum_i w_i \chi^2_{k_i}(\lambda_i) + U + c where the \chi^2_{k_i}(\lambda_i) are independent noncentral chi-squared random variables, U \sim N(\mu, \sigma^2) is independent of them, and c is a constant, is given by M_Y(t) = \exp\left( c t + \mu t + \frac{\sigma^2 t^2}{2} + \sum_i \frac{w_i \lambda_i t}{1 - 2 w_i t} \right) \bigg/ \prod_i (1 - 2 w_i t)^{k_i / 2}, valid for real t less than the minimum of 1/(2 w_i) over all i with w_i > 0.[7][8] If some w_i < 0, the domain of convergence is an open interval around zero determined by the poles of the expression, and the MGF admits analytic continuation to the complex plane excluding branch cuts from the poles.[7]

This closed-form expression arises from the independence of the component distributions: the MGF is the product of the individual MGFs, where the noncentral chi-squared component w_i \chi^2_{k_i}(\lambda_i) has MGF \exp\left( \frac{w_i \lambda_i t}{1 - 2 w_i t} \right) (1 - 2 w_i t)^{-k_i / 2} for t < 1/(2 w_i) if w_i > 0, the normal term contributes \exp(\mu t + \sigma^2 t^2 / 2), and the constant c contributes \exp(c t).[8][9] Equivalently, in the quadratic form representation Y = X^\top A X + b^\top X + c with X \sim N_p(\mu, \Sigma), the MGF follows from eigenvalue decomposition of A \Sigma, yielding the summed and product terms over the eigenvalues w_i with multiplicities k_i and adjusted noncentralities \lambda_i.[9]

The logarithm of the MGF serves as the cumulant-generating function, from which the cumulants of Y can be obtained by successive differentiation at t = 0; this facilitates derivations of higher-order moments and approximations for the distribution's tails or central behavior.[7] For instance, in the special case of a central chi-squared distribution with \sum_i k_i = \nu degrees of freedom (where all \lambda_i = 0, \mu = 0, \sigma^2 = 0, c = 0, and all w_i = 1), the MGF simplifies to (1 - 2 t)^{-\nu / 2} for t < 1/2.[8]
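The cumulant-generating function \log M_Y(t) can be evaluated directly from the formula above. A minimal Python sketch follows (using the s Z + m form of the normal term, with example parameters chosen arbitrarily); it checks the first two cumulants against the moment formulas by finite differencing.

```python
import numpy as np

def gx2_cgf(t, w, k, lam, s=0.0, m=0.0):
    """Cumulant-generating function log M_Y(t) for Y = sum w_i*ncx2(k_i, lam_i) + s*Z + m.

    Valid for real t with 1 - 2*w_i*t > 0 for every i (sketch of the formula above)."""
    w, k, lam = (np.asarray(a, dtype=float) for a in (w, k, lam))
    r = 1 - 2 * w * t
    if np.any(r <= 0):
        raise ValueError("t is outside the convergence region")
    return (m * t + 0.5 * (s * t)**2
            + np.sum(w * lam * t / r)
            - 0.5 * np.sum(k * np.log(r)))

# Check the first two cumulants against the moment formulas by finite differences
w, k, lam = [1.0, 0.5], [1.0, 2.0], [2.0, 0.0]
h = 1e-5
mean_fd = (gx2_cgf(h, w, k, lam) - gx2_cgf(-h, w, k, lam)) / (2 * h)
var_fd = (gx2_cgf(h, w, k, lam) - 2 * gx2_cgf(0.0, w, k, lam) + gx2_cgf(-h, w, k, lam)) / h**2
print(mean_fd, 1.0*(1 + 2) + 0.5*(2 + 0))            # both ~ 4.0
print(var_fd, 2*(1.0**2*(1 + 4) + 0.5**2*(2 + 0)))   # both ~ 11.0
```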
Numerical Computation
Density and Cumulative Distribution
The probability density function (PDF) of the generalized chi-squared distribution has no closed-form expression in general. Instead, the PDF can be obtained via Fourier inversion of the characteristic function: f(y) = \frac{1}{2\pi} \int_{-\infty}^\infty \phi(t) e^{-i t y} \, dt, where \phi(t) is the characteristic function of the distribution.[10] The cumulative distribution function (CDF) is given by F(y) = P(Y \leq y) = \int_{-\infty}^y f(u) \, du, which likewise lacks a closed form and requires numerical approximation. Common methods include Gauss-Laguerre quadrature for efficient integration over the positive support when applicable, particularly for distributions with non-negative weights, and saddlepoint approximations that leverage the cumulant-generating function for high accuracy across the domain.[11]

Key algorithms for numerical evaluation focus on the CDF, with extensions to the PDF via differentiation or direct inversion. Imhof's method (1961), a numerical-integration approach based on Gil-Pelaez inversion of the characteristic function, computes the CDF as F(y) = \frac{1}{2} - \frac{1}{\pi} \int_0^\infty \frac{\operatorname{Im} [\phi(t) e^{-i t y}]}{t} \, dt, offering robust performance for quadratic forms in normal variables. Davies' algorithm (1980), a Fourier-based method tailored to linear combinations of chi-squared random variables, extends this by incorporating offsets and normal terms, achieving efficient computation through numerical quadrature of the inverted characteristic function.[1][12] These algorithms typically provide absolute error bounds below 10^{-8} in double-precision arithmetic, with computation times on the order of milliseconds per evaluation point for moderate dimensions. Implementations are available in statistical software, such as the R package CompQuadForm, which supports Imhof's and Davies' methods alongside others for quadratic forms.[13]

For illustration, consider a simple two-component case where Y = \chi_1^2 + 0.5 \chi_2^2, with \chi_1^2 and \chi_2^2 independent central chi-squared variables with 1 degree of freedom each. The PDF is unimodal and positively skewed, peaking near 0.5 and decaying asymptotically, computable via Davies' algorithm to high precision.[12][13]
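A bare-bones version of this characteristic-function inversion can be written with SciPy's adaptive quadrature. The sketch below is not the Imhof or Davies implementation from the cited packages, just a direct transcription of the Gil-Pelaez formula (the lower integration limit is nudged away from zero to avoid a 0/0 evaluation, and its accuracy and speed are far below production codes).

```python
import numpy as np
from scipy import integrate, stats

def gx2_cf(t, w, k, lam, s=0.0, m=0.0):
    """Characteristic function of Y = sum_j w_j*ncx2(k_j, lam_j) + s*Z + m."""
    w, k, lam = (np.asarray(a, dtype=float) for a in (w, k, lam))
    r = 1 - 2j * t * w
    log_cf = (1j * m * t - 0.5 * (s * t)**2
              + np.sum(1j * t * w * lam / r)
              - 0.5 * np.sum(k * np.log(r)))
    return np.exp(log_cf)

def gx2_cdf_gil_pelaez(y, w, k, lam, s=0.0, m=0.0):
    """CDF by Gil-Pelaez inversion: F(y) = 1/2 - (1/pi) int_0^inf Im[cf(t) e^{-ity}]/t dt.

    The neglected sliver of the integral over (0, 1e-12) is of order 1e-12."""
    def integrand(t):
        return np.imag(gx2_cf(t, w, k, lam, s, m) * np.exp(-1j * t * y)) / t
    val, _ = integrate.quad(integrand, 1e-12, np.inf, limit=200)
    return 0.5 - val / np.pi

# Sanity check: a single unit-weight central term with 4 d.o.f. is an ordinary chi-squared
print(gx2_cdf_gil_pelaez(3.0, w=[1.0], k=[4.0], lam=[0.0]))
print(stats.chi2.cdf(3.0, df=4))   # ~0.4422
```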
Tail Approximations
Tail approximations for the generalized chi-squared distribution are essential for evaluating rare-event probabilities, particularly in the upper or lower tails where direct numerical integration becomes inefficient. For large values y in the right tail, P(Y > y) \approx a \bar{F}_{\chi'^2_{k^*, \lambda^*}}(y / w^*), where a is a scaling factor, w^* is the largest eigenvalue (weight), k^* the corresponding degrees of freedom, \lambda^* the non-centrality parameter, and \bar{F}_{\chi'^2_{k^*, \lambda^*}} is the survival function of a non-central chi-squared distribution, often expressed via the Marcum Q-function Q_{k^*/2}(\sqrt{\lambda^*}, \sqrt{y / w^*}). This Laplace-type asymptotic expansion simplifies computation by reducing the tail to a scaled standard form, with the left tail obtainable by flipping the signs of the weights.

Advanced numerical methods address tail computations more broadly. Ruben's algorithm, developed in the 1960s for quadratic forms with same-sign weights, expresses the tail probability as an infinite series: P(Y > y) = \sum_{i=0}^\infty a_i \bar{F}_{\chi^2_{d+2i}}(y / \beta), where d is the total degrees of freedom, \beta a scaling parameter, and the coefficients a_i are derived from the characteristic function; it achieves machine precision (down to 10^{-308}) in finite tails but is limited to definite (same-sign) cases.[14] Ray-tracing integrates over rays from the multivariate normal center to the boundary defined by Y = y, excelling in infinite tails with GPU acceleration (0.2 s for 10^6 rays to 10^{-308}) but less so in finite tails due to sampling sparsity in high dimensions. The inverse fast Fourier transform (IFFT) inverts the characteristic function on a grid, yielding tail estimates in 60-70 ms with accuracy to 10^{-3}, suitable for broad coverage but degrading in far tails from grid resolution limits.

Asymptotic expansions via saddlepoint methods provide high-order accuracy for tails. The Lugannani-Rice formula approximates the tail probability using the cumulant-generating function and its derivatives: P(Y > y) \approx 1 - \Phi(\hat{w}) + \phi(\hat{w}) \left( \frac{1}{\hat{u}} - \frac{1}{\hat{w}} \right), where \hat{w} and \hat{u} solve saddlepoint equations from the cumulant generating function, offering relative errors of O(1/n) for sums of independent variables and extending to quadratic forms.[15][16]

Comparisons highlight trade-offs in speed and accuracy, especially for large degrees of freedom k. Imhof's method, which integrates the imaginary part of the characteristic function via P(Y > y) = \frac{1}{2} + \frac{1}{\pi} \int_0^\infty \frac{\operatorname{Im}[\phi(t) e^{-i t y}]}{t} dt, matches IFFT in central speed (20 ms vs. 60 ms) but requires variable-precision arithmetic (0.5 s) to reach far-tail values near 10^{-10}, while IFFT remains stable for large k (errors below 10^{-3}) but sacrifices deep-tail precision; ray-tracing outperforms both in infinite tails for indefinite forms yet slows for k > 100 due to ray sparsity.

In signal detection, tail approximations set thresholds for quadratic detectors in Gaussian noise; for instance, with weights \{1, 0.5\} and non-centrality \lambda = 2, the right-tail probability P(Y > 3) \approx 0.05 obtained from the scaled non-central chi-squared guides false-alarm rates in radar systems.[17]
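A compact saddlepoint routine illustrates the Lugannani-Rice step. The Python/SciPy sketch below (illustrative names, crude bracketing of the saddlepoint, no Gaussian or offset terms, and no handling of the removable singularity at y = E[Y]) approximates the survival function and is checked against the exact chi-squared case.

```python
import numpy as np
from scipy import optimize, stats

def _K(t, w, k, lam):
    r = 1 - 2 * w * t
    return np.sum(w * lam * t / r - 0.5 * k * np.log(r))

def _K1(t, w, k, lam):
    r = 1 - 2 * w * t
    return np.sum(w * lam / r**2 + k * w / r)

def _K2(t, w, k, lam):
    r = 1 - 2 * w * t
    return np.sum(4 * w**2 * lam / r**3 + 2 * k * w**2 / r**2)

def gx2_sf_lugannani_rice(y, w, k, lam):
    """Lugannani-Rice approximation of P(Y > y) for Y = sum w_i*ncx2(k_i, lam_i).

    A sketch: crude bracketing of the saddlepoint, valid away from y = E[Y]."""
    w, k, lam = (np.asarray(a, dtype=float) for a in (w, k, lam))
    eps = 1e-8
    t_hi = np.min(1 / (2 * w[w > 0])) - eps if np.any(w > 0) else 10.0
    t_lo = np.max(1 / (2 * w[w < 0])) + eps if np.any(w < 0) else -10.0
    t_hat = optimize.brentq(lambda t: _K1(t, w, k, lam) - y, t_lo, t_hi)
    w_hat = np.sign(t_hat) * np.sqrt(2 * (t_hat * y - _K(t_hat, w, k, lam)))
    u_hat = t_hat * np.sqrt(_K2(t_hat, w, k, lam))
    return stats.norm.sf(w_hat) + stats.norm.pdf(w_hat) * (1/u_hat - 1/w_hat)

# Check against the exact chi-squared survival function
print(gx2_sf_lugannani_rice(12.0, w=[1.0], k=[4.0], lam=[0.0]))  # ~0.0174
print(stats.chi2.sf(12.0, df=4))                                 # 0.01735...
```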
Random Variate Generation
Generating random variates from the generalized chi-squared distribution, which arises as a quadratic form in multivariate normal random variables, can be accomplished through several established algorithms. One direct approach involves simulating the underlying multivariate normal vector and evaluating the quadratic expression explicitly. Specifically, generate \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) using a Cholesky decomposition of \boldsymbol{\Sigma}, then compute Y = \mathbf{X}^\top \mathbf{Q}_2 \mathbf{X} + \mathbf{q}_1^\top \mathbf{X} + q_0. This method is straightforward and leverages standard multivariate normal generators, with computational cost dominated by the O(p^2) operations for the form evaluation per sample, where p is the dimension.[18]

A more structured alternative relies on decomposing the distribution into a linear combination of independent noncentral chi-squared random variables, facilitated by spectral decomposition of the quadratic form matrix. The generalized chi-squared Y admits the representation Y = \sum_{i=1}^m w_i W_i + s Z + m_0, where the W_i \sim \chi^2_{k_i, \delta_i} are independent noncentral chi-squared variates with degrees of freedom k_i and noncentrality parameters \delta_i, the w_i are scalar weights derived from the eigenvalues, Z \sim \mathcal{N}(0, 1) captures any residual linear (Gaussian) component, and m_0 is a constant offset. To simulate each W_i, employ the Poisson mixture representation: generate N_i \sim \mathrm{Poisson}(\delta_i / 2), then conditionally on N_i draw from a central chi-squared \chi^2_{k_i + 2N_i}, equivalently a gamma distribution with shape (k_i + 2N_i)/2 and scale 2. The resulting Y is obtained by summing the scaled W_i, the normal term, and the offset, enabling efficient generation once the decomposition parameters are precomputed. This decomposition aligns with the parameter conversions between the matrix-vector and vector-scalar forms described above, with repeated eigenvalues grouping into larger degrees of freedom (as for idempotent matrices).[18]

For cases where direct decomposition is computationally intensive or the support is constrained (e.g., positive definiteness requirements), advanced techniques such as accept-reject sampling can be applied using a bounding envelope derived from the mixture components or moment-matched proposals. In accept-reject methods, variates are proposed from a simpler envelope distribution (e.g., a scaled chi-squared) that majorizes the target density and accepted with probability proportional to the ratio of densities; this yields exact sampling while bounding inefficiency via the envelope constant. Conditional simulation methods further adapt to restricted supports by generating from truncated normals or conditionals within the quadratic framework, which is useful for indefinite forms. These approaches maintain exactness but may incur variable acceptance rates depending on parameter skewness. The eigendecomposition-based approach incurs an initial O(p^3) cost for computing eigenvalues and noncentralities in p dimensions, followed by O(m) operations per variate, where m \leq p is the effective rank, making it suitable for repeated sampling after setup.
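A minimal sampler along these lines, assuming the (w, k, \lambda, s, m) parameters are already available, can be written in a few lines of Python/NumPy (this is a sketch, not the cited gx2rnd implementation).

```python
import numpy as np

def gx2_rvs(w, k, lam, s=0.0, m=0.0, size=1, rng=None):
    """Sample sum_i w_i*ncx2(k_i, lam_i) + s*Z + m via the Poisson-mixture
    construction described above (a sketch, not the cited gx2rnd routine)."""
    rng = np.random.default_rng() if rng is None else rng
    w, k, lam = (np.asarray(a, dtype=float) for a in (w, k, lam))
    y = np.full(size, m, dtype=float)
    for wi, ki, li in zip(w, k, lam):
        n_mix = rng.poisson(li / 2, size=size)                  # Poisson mixing variable
        chi = rng.gamma(shape=(ki + 2 * n_mix) / 2, scale=2.0)  # central chi2_{k+2N}
        y += wi * chi
    if s != 0:
        y += s * rng.standard_normal(size)
    return y

# Mixed-sign example: w = [1, -1], k = [1, 1], lambda = [2, 4]
x = gx2_rvs([1, -1], [1, 1], [2, 4], size=1_000_000, rng=np.random.default_rng(2))
print(x.mean(), x.var())   # theory: mean = (1+2) - (1+4) = -2, var = 2*[(1+4) + (1+8)] = 28
```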
Implementations of these algorithms are available in statistical software; for example, the MATLAB toolbox gx2 provides the function gx2rnd for generating variates via decomposition and direct methods, supporting user-specified parameters for efficient Monte Carlo simulations.[18][19]
As an illustrative application, random variates from the generalized chi-squared distribution facilitate Monte Carlo estimation of variances in quadratic-based statistics, such as assessing the variability of likelihood ratio tests in multivariate models where the test statistic follows this distribution under the alternative hypothesis.