
Inverse-chi-squared distribution

The inverse-chi-squared distribution, also known as the inverted chi-squared distribution, is a continuous probability distribution defined on the positive real line that describes the distribution of the reciprocal of a chi-squared random variable, scaled by its degrees of freedom. It is a special case of the inverse-gamma distribution, specifically with shape parameter \alpha = \nu/2 and rate parameter \beta = \nu \sigma^2 / 2, where \nu > 0 represents the degrees of freedom and \sigma^2 > 0 is a scale parameter. The probability density function of the scaled inverse-chi-squared distribution is given by f(x \mid \nu, \sigma^2) = \frac{1}{\Gamma(\nu/2)} \left( \frac{\nu \sigma^2}{2} \right)^{\nu/2} x^{-(\nu/2 + 1)} \exp\left( -\frac{\nu \sigma^2}{2x} \right), \quad x > 0, where \Gamma denotes the gamma function. This form arises naturally from the transformation X = 1/Y, where Y follows a chi-squared distribution with \nu degrees of freedom, adjusted by the scale. For \nu > 2, the mean of the distribution is \mathbb{E}[X] = \frac{\nu \sigma^2}{\nu - 2}, and the mode is \frac{\nu \sigma^2}{\nu + 2}. The variance exists for \nu > 4 and is \mathrm{Var}(X) = \frac{2 \nu^2 \sigma^4}{(\nu - 2)^2 (\nu - 4)}. These moments highlight the distribution's heavy-tailed nature for small \nu, making it suitable for modeling uncertainty in scale parameters.

In Bayesian statistics, the inverse-chi-squared distribution serves as a conjugate prior for the variance parameter \sigma^2 of a normal distribution with known mean, ensuring that the posterior distribution remains in the same family after updating with data from i.i.d. normal observations. This conjugacy facilitates closed-form inference, particularly in hierarchical models such as the normal-inverse-chi-squared distribution, which jointly specifies priors for both mean and variance.

Definition and Parameterization

Standard Inverse-chi-squared

The standard inverse-chi-squared distribution arises as the distribution of the reciprocal of a chi-squared random variable. Specifically, if X follows a chi-squared distribution with \nu > 0 degrees of freedom, then Y = 1/X follows the standard inverse-chi-squared distribution with parameter \nu. This distribution is parameterized by a single scalar \nu, the degrees of freedom, which equals twice the shape parameter \alpha = \nu/2 of its equivalent inverse-gamma form with rate \beta = 1/2. The support is restricted to the positive real line, y > 0. The distribution was introduced in the context of sampling theory in the mid-20th century, particularly for modeling variance components in linear models with unequal variances. The derivation begins with the probability density function of the chi-squared distribution for X and applies the transformation y = 1/x. This requires multiplying by the absolute value of the Jacobian determinant, |dx/dy| = 1/y^2, to yield the density of Y.

Scaled Inverse-chi-squared

The scaled inverse-chi-squared distribution is the distribution of \tau / X, where X follows a chi-squared distribution with \nu degrees of freedom and \tau > 0 serves as a scale factor. It is defined for positive random variables and provides more flexibility for modeling scaled variances than the unscaled case. The distribution is parameterized by two positive values, the degrees of freedom \nu > 0 and the scale \tau > 0, with support restricted to y > 0. In practice, \tau frequently incorporates a prior estimate of the variance, scaled appropriately to reflect uncertainty in the model. This parameterization is equivalent to an inverse-gamma distribution with shape \nu/2 and rate \tau/2. When \tau = 1, the scaled form coincides with the standard inverse-chi-squared distribution. A common specification in Bayesian modeling sets \tau = \nu \sigma_0^2, where \sigma_0^2 denotes a prior scale for the variance.

Mathematical Properties

Probability Density Function

The probability density function (PDF) of the standard inverse-chi-squared distribution, parameterized by the degrees of freedom \nu > 0, is f(y \mid \nu) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} y^{-(\nu/2 + 1)} \exp\left(-\frac{1}{2y}\right) for y > 0, and f(y \mid \nu) = 0 otherwise. This form arises as a special case of the inverse-gamma distribution with \alpha = \nu/2 and \beta = 1/2. The scaled inverse-chi-squared distribution extends this by incorporating a positive scale parameter \tau > 0, yielding the PDF f(y \mid \nu, \tau) = \frac{(\tau/2)^{\nu/2}}{\Gamma(\nu/2)} y^{-(\nu/2 + 1)} \exp\left(-\frac{\tau}{2y}\right) for y > 0, and zero otherwise. Here, the standard form corresponds to \tau = 1, while larger \tau shifts the distribution toward larger values of y, increasing the mean and mode proportionally. Both variants are supported only on the positive real line, reflecting their role in modeling positive quantities such as variances in Bayesian inference.

The PDF is unimodal for all \nu > 0, with the mode occurring at y = 1/(\nu + 2) for the standard form. As y \to 0^+, f(y) \to 0, dominated by the exponential term \exp(-1/(2y)) despite the singularity from the power law. Similarly, as y \to \infty, f(y) \to 0 via power-law decay y^{-(\nu/2 + 1)}, modulated by the exponential factor, which approaches 1. The overall shape is heavily right-skewed for small \nu (e.g., \nu < 5), featuring a sharp peak near zero and a long tail extending to large y; as \nu increases, the skewness diminishes and the distribution becomes more nearly symmetric around its mode. This evolution mirrors that of the parent inverse-gamma family, making the distribution suitable for priors on variance parameters where heavy tails capture uncertainty in low-information scenarios. For \nu \leq 2, the tail is heavy enough that the mean is infinite, as explored further in the moments section.
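To make the parameterization concrete, the following minimal Python sketch evaluates the scaled inverse-chi-squared PDF directly from the formula above and checks it against SciPy's invgamma implementation, under the assumption (consistent with this section) that SciPy's scale argument plays the role of the rate \beta = \tau/2; the helper name inv_chi2_pdf is illustrative, not a library function.

```python
import numpy as np
from scipy.stats import invgamma
from scipy.special import gammaln

def inv_chi2_pdf(y, nu, tau=1.0):
    """PDF of the (scaled) inverse-chi-squared distribution with
    degrees of freedom nu and scale tau (tau = 1 gives the standard form)."""
    y = np.asarray(y, dtype=float)
    log_pdf = (nu / 2) * np.log(tau / 2) - gammaln(nu / 2) \
              - (nu / 2 + 1) * np.log(y) - tau / (2 * y)
    return np.exp(log_pdf)

# Cross-check against the inverse-gamma form: shape = nu/2, scale argument = tau/2
# (SciPy's "scale" corresponds to the rate beta in the PDF written above).
y = np.linspace(0.01, 2.0, 200)
nu, tau = 5.0, 1.0
assert np.allclose(inv_chi2_pdf(y, nu, tau),
                   invgamma.pdf(y, a=nu / 2, scale=tau / 2))

# Mode of the standard form is 1/(nu + 2).
print("mode:", 1 / (nu + 2), "pdf at mode:", inv_chi2_pdf(1 / (nu + 2), nu))
```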

Moments and Central Tendency

The inverse-chi-squared distribution, in its standard form with degrees of freedom parameter \nu > 0, has mean \mathbb{E}[Y] = \frac{1}{\nu - 2} provided \nu > 2; otherwise, the mean is infinite. For the scaled form, parameterized by an additional scale parameter \tau > 0, the mean is \mathbb{E}[Y] = \frac{\tau}{\nu - 2} under the same condition \nu > 2. The variance for the standard form is \mathrm{Var}(Y) = \frac{2}{(\nu - 2)^2 (\nu - 4)} when \nu > 4; it does not exist otherwise. In the scaled case, the variance is \mathrm{Var}(Y) = \frac{2 \tau^2}{(\nu - 2)^2 (\nu - 4)} for \nu > 4. These expressions arise from the distribution's representation as a special case of the inverse-gamma distribution, where the moments follow from properties of the gamma function.

The mode, representing the most probable value, is obtained by maximizing the density and equals \frac{1}{\nu + 2} for the standard form and \frac{\tau}{\nu + 2} for the scaled form. This derivation involves setting the derivative of the log-density to zero, yielding the location of the peak of the positively skewed density. The median has no closed-form expression and must be approximated numerically, for example by noting that the median of Y = 1/X is the reciprocal of the median of the underlying chi-squared variate X, or by solving the cumulative distribution function equation directly. The median decreases with \nu, reflecting the distribution's tendency to concentrate toward smaller values as the degrees of freedom grow, though it always lies between the mode and the mean because of the positive skew.

Higher-order moments of order k exist only when \nu > 2k. For the standard form, the k-th raw moment is \mathbb{E}[Y^k] = 2^{-k} \frac{\Gamma\left(\frac{\nu}{2} - k\right)}{\Gamma\left(\frac{\nu}{2}\right)}, while for the scaled form it is \mathbb{E}[Y^k] = \left(\frac{\tau}{2}\right)^k \frac{\Gamma\left(\frac{\nu}{2} - k\right)}{\Gamma\left(\frac{\nu}{2}\right)}. These follow from the negative moments of the underlying chi-squared variate, which is gamma-distributed. The distribution exhibits positive skewness, with the mean exceeding both the median and the mode; this rightward pull on the mean stems from the heavy right tail, where large values of Y occur with non-negligible probability despite the concentration around the mode for large \nu.
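The moment formulas can be verified numerically. The sketch below, assuming the \tau-scaled parameterization of this section (mean \tau/(\nu - 2)), computes the closed-form mean, variance, and mode and cross-checks the first two against the equivalent inverse-gamma distribution in SciPy; the function name inv_chi2_moments is illustrative.

```python
import numpy as np
from scipy.stats import invgamma

def inv_chi2_moments(nu, tau=1.0):
    """Closed-form mean, variance, and mode of the scaled
    inverse-chi-squared distribution Inv-chi2(nu, tau)."""
    mean = tau / (nu - 2) if nu > 2 else np.inf
    var = 2 * tau**2 / ((nu - 2)**2 * (nu - 4)) if nu > 4 else np.inf
    mode = tau / (nu + 2)
    return mean, var, mode

nu, tau = 6.0, 2.0
mean, var, mode = inv_chi2_moments(nu, tau)

# Check against the equivalent inverse-gamma (shape nu/2, rate tau/2).
ig_mean, ig_var = invgamma.stats(a=nu / 2, scale=tau / 2, moments="mv")
assert np.isclose(mean, ig_mean) and np.isclose(var, ig_var)
print(f"mean={mean:.4f}, var={var:.4f}, mode={mode:.4f}")
```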

Sampling and Generation

The primary method for generating random variates from the inverse-chi-squared distribution is a direct transformation from the chi-squared distribution, which is exact and computationally efficient. For the standard inverse-chi-squared distribution with \nu degrees of freedom, the algorithm consists of two steps: (1) draw a random variate Z from the chi-squared distribution with \nu degrees of freedom, \chi^2(\nu); (2) compute Y = 1/Z. This transformation yields a sample from the target distribution because the reciprocal of a chi-squared variate follows the inverse-chi-squared distribution by definition. For the scaled inverse-chi-squared distribution with degrees of freedom \nu and scale \tau, the procedure is analogous: after generating Z \sim \chi^2(\nu), set Y = \tau / Z. This adjustment incorporates the scaling factor directly into the transformation, maintaining exactness without requiring rejection steps, since the acceptance probability is 1. The method is particularly efficient for moderate values of \nu, where chi-squared sampling is straightforward via summation of squared standard normals or gamma variates.

The inverse-chi-squared distribution is equivalent to a special case of the inverse-gamma distribution, which enables alternative sampling approaches. Specifically, the standard form corresponds to an inverse-gamma distribution with \alpha = \nu/2 and \beta = 1/2, while the scaled form uses \beta = \tau / 2. Random variates can thus be generated using established inverse-gamma samplers, which internally apply similar transformations from the gamma distribution. In statistical software, these methods are readily implemented. For example, in R, samples from the inverse-chi-squared distribution can be obtained via 1 / \mathrm{rchisq}(n, \nu), where n is the desired sample size, leveraging the built-in chi-squared generator. In Python's SciPy library, the invgamma.rvs(a=\nu/2, scale=1/2, size=n) function provides direct sampling for the standard case, with the scale adjusted to \tau / 2 for the scaled variant. These implementations are efficient enough for typical applications in Bayesian inference and simulation studies.
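The transformation sampler and the inverse-gamma route described above can be compared directly. The following Python sketch, assuming the \tau / \chi^2(\nu) parameterization of this section, draws variates both ways and checks agreement with a two-sample Kolmogorov-Smirnov test; the seed and sample sizes are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, tau, n = 8.0, 3.0, 100_000

# Exact transformation sampler: draw chi-squared variates, then invert and scale.
z = rng.chisquare(nu, size=n)
y = tau / z          # scaled inverse-chi-squared; use 1/z for the standard form

# Equivalent inverse-gamma sampler (shape nu/2, scale tau/2 in SciPy's notation).
y_ig = stats.invgamma.rvs(a=nu / 2, scale=tau / 2, size=n, random_state=rng)

# Both samples should agree in distribution (two-sample KS test as a sanity check).
print(stats.ks_2samp(y, y_ig))
print("sample mean:", y.mean(), "theoretical mean:", tau / (nu - 2))
```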

Relationships to Other Distributions

Connection to Chi-squared Distribution

The inverse-chi-squared distribution with \nu > 0 degrees of freedom arises directly as the reciprocal of a chi-squared random variable. Specifically, if X \sim \chi^2(\nu), then Y = 1/X follows an inverse-chi-squared distribution with \nu degrees of freedom. To derive this transformation, consider the probability density function (PDF) of the chi-squared distribution: f_X(x) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{\nu/2 - 1} e^{-x/2}, \quad x > 0. Under the change of variables y = 1/x, we have x = 1/y and the Jacobian is |dx/dy| = 1/y^2. Substituting yields the PDF of Y: f_Y(y) = f_X(1/y) \cdot \frac{1}{y^2} = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} \left(\frac{1}{y}\right)^{\nu/2 - 1} e^{-1/(2y)} \cdot \frac{1}{y^2} = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} y^{-\nu/2 - 1} e^{-1/(2y)}, \quad y > 0, which is the PDF of the (unscaled) inverse-chi-squared distribution.

A related quantile relationship follows from this transformation: the p-quantile of the inverse-chi-squared distribution with \nu degrees of freedom is the reciprocal of the (1-p)-quantile of the chi-squared distribution with \nu degrees of freedom. This property is used, for instance, to construct credible intervals for variance parameters by inverting chi-squared quantiles. The inverse-chi-squared distribution emerged in the mid-20th-century statistical literature, particularly for inverting chi-squared-based tests and deriving confidence intervals for normal population variances. Like the chi-squared distribution, the inverse-chi-squared is supported on (0, \infty) and takes only positive values, reflecting the non-negativity of the squared normals underlying the chi-squared. However, the inversion alters tail behavior: the chi-squared has an exponentially decaying (light) right tail, while the inverse-chi-squared has a power-law (heavy) right tail, which tightens the moment existence conditions; for example, the mean exists only for \nu > 2, in contrast to the chi-squared mean, which exists for all \nu > 0.
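A quick numerical check of the quantile relationship, assuming SciPy's invgamma with shape \nu/2 and scale 1/2 as the standard inverse-chi-squared, is shown below.

```python
from scipy.stats import chi2, invgamma

nu, p = 7.0, 0.95

# p-quantile of the standard inverse-chi-squared via its inverse-gamma form.
q_inv_chi2 = invgamma.ppf(p, a=nu / 2, scale=0.5)

# Reciprocal of the (1 - p)-quantile of the chi-squared with the same nu.
q_via_chi2 = 1.0 / chi2.ppf(1 - p, df=nu)

print(q_inv_chi2, q_via_chi2)   # the two values agree
```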

Relation to Inverse-gamma Distribution

The inverse-chi-squared distribution is a special case of the inverse-gamma distribution, which provides a broader framework for modeling positive random variables with heavy tails. This relationship allows the inverse-chi-squared to be expressed in the more general parameterization of the inverse-gamma family, facilitating computations and extensions in statistical modeling. The standard inverse-chi-squared distribution Inv-\chi^2(\nu) with \nu > 0 degrees of freedom corresponds to an inverse-gamma distribution in shape-rate parameterization, InvGamma(\alpha = \nu/2, \beta = 1/2). The probability density function (PDF) of the standard inverse-chi-squared is f(x \mid \nu) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{-\nu/2 - 1} \exp\left( -\frac{1}{2x} \right), \quad x > 0, which directly matches the inverse-gamma PDF f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha - 1} \exp\left( -\frac{\beta}{x} \right), \quad x > 0, upon substituting \alpha = \nu/2 and \beta = 1/2. The equivalence can be verified by aligning the normalizing constants via properties of the gamma function, \Gamma(z+1) = z \Gamma(z), and confirming that the exponential terms and power laws coincide. The moments also align; for instance, the mean of Inv-\chi^2(\nu) is 1/(\nu - 2) for \nu > 2, matching the inverse-gamma mean \beta / (\alpha - 1) = (1/2) / (\nu/2 - 1) = 1/(\nu - 2).

The scaled inverse-chi-squared distribution Inv-\chi^2(\nu, \tau^2), which incorporates a scale parameter \tau^2 > 0, is similarly equivalent to InvGamma(\alpha = \nu/2, \beta = \nu \tau^2 / 2). Here, the PDF becomes f(x \mid \nu, \tau^2) = \frac{(\nu \tau^2 / 2)^{\nu/2}}{\Gamma(\nu/2)} x^{-\nu/2 - 1} \exp\left( -\frac{\nu \tau^2}{2x} \right), matching the inverse-gamma form with the updated \beta = \nu \tau^2 / 2. This mapping preserves the moment structure, such as the mean \nu \tau^2 / (\nu - 2) for \nu > 2, derived from the inverse-gamma formula. The equivalence follows from direct substitution into the PDF and gamma-function identities for the normalization. The inverse-gamma parameterization offers greater flexibility for generalizations, as it allows arbitrary shape and rate values without the constraints implicit in the chi-squared degrees of freedom, enabling broader applications in hierarchical models and robustness to prior specifications. In contrast, the inverse-chi-squared represents a subfamily in which the rate \beta is tied to the shape \alpha, with \beta = \alpha/\nu = 1/2 in the standard case (or proportionally scaled in the scaled form), reflecting its origin as the reciprocal of a chi-squared variate. Notation for the inverse-gamma varies across sources, with some adopting a shape-scale form in which the scale parameter is the reciprocal of the rate (leading to \exp(-1/(\theta x)) with \theta > 0), though the shape-rate convention aligns directly with the mappings above and is common in Bayesian contexts.

The inverse-chi-squared distribution also connects directly to the normal distribution through its role in modeling the variance parameter. Specifically, if the variance \sigma^2 follows a scaled inverse-chi-squared distribution with degrees of freedom \nu and scale \tau^2, then the precision \sigma^{-2} = 1 / \sigma^2 follows a scaled chi-squared (equivalently, gamma) distribution, providing a conjugate quantity that scales the dispersion in normal likelihoods. This parameterization aligns the distribution's tail behavior with the sampling distribution of residual sums of squares, ensuring compatibility in likelihood-based updates. A prominent link to the Student-t distribution emerges in Bayesian inference for normal data with unknown mean and variance.
When an inverse-chi-squared prior is placed on the variance (or, equivalently, on the precision), the marginal posterior for the mean, obtained by integrating out the variance, follows a Student-t distribution whose degrees of freedom are the prior degrees of freedom plus the sample size and whose location and scale parameters incorporate the sample mean and prior information. This result shows how uncertainty in the normal variance propagates into heavier tails in the mean's posterior, analogous to the Student-t's finite-sample adjustment over the normal. Sampling properties further tie the inverse-chi-squared to normals via sums of squares. For n independent observations X_i \sim \mathcal{N}(\mu, \sigma^2) with known \mu, the sum \sum_{i=1}^n (X_i - \mu)^2 / \sigma^2 follows a chi-squared distribution with n degrees of freedom, so the quantity n / \sum_{i=1}^n [(X_i - \mu)/\sigma]^2 follows a scaled inverse-chi-squared distribution with n degrees of freedom and scale 1; a numerical check appears in the sketch below. More generally, ratios involving squares of standard normals generate variance components whose reciprocals take inverse-chi-squared forms, particularly in hierarchical models where variance estimates arise from such ratios.
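The sampling-property claim can be checked by simulation. The sketch below, using the convention of this subsection in which scale-Inv-\chi^2(\nu, \tau^2) corresponds to InvGamma(\nu/2, \nu \tau^2 / 2), simulates the statistic n / \sum [(X_i - \mu)/\sigma]^2 and compares it with the implied Inv-\chi^2(n, 1) distribution; the seed and replication count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# For known mu, n / sum(((x_i - mu)/sigma)^2) is scaled Inv-chi2(n, 1),
# i.e. InvGamma(shape = n/2, rate = n/2) in the convention of this subsection.
n, mu, sigma = 10, 0.0, 2.0
reps = 200_000
samples = rng.normal(mu, sigma, size=(reps, n))
stat = n / ((samples - mu) ** 2 / sigma**2).sum(axis=1)

print("MC mean:", stat.mean(), "theory:", n / (n - 2))        # n/(n-2) = 1.25
print(stats.kstest(stat, stats.invgamma(a=n / 2, scale=n / 2).cdf))
```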

Applications

Bayesian Statistics as Conjugate Prior

In Bayesian statistics, the inverse-chi-squared distribution is employed as a conjugate prior for the variance parameter \sigma^2 in models where the data follow a normal distribution with unknown mean and variance. Specifically, under a normal-inverse-chi-squared prior p(\mu, \sigma^2) = \mathcal{N}(\mu \mid \mu_0, \sigma^2 / \kappa_0) \times \text{Inv-}\chi^2(\sigma^2 \mid \nu, \tau), and assuming independent observations x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2), the marginal posterior for \sigma^2 is also inverse-chi-squared, \sigma^2 \mid \mathbf{x} \sim \text{Inv-}\chi^2(\nu + n, \tau'), where the updated scale parameter is \tau' = (\nu \tau + \text{SS}) / (\nu + n) and SS denotes the sum of squared residuals adjusted for the uncertainty in \mu. This conjugacy arises because the normal likelihood, when marginalized over \mu, interacts multiplicatively with the inverse-chi-squared prior so as to preserve the distributional family.

The posterior update rules are straightforward: the degrees of freedom increase additively by the sample size, \nu_{\text{post}} = \nu + n, reflecting the accumulation of information, while the scale update combines the prior scale with the data's sum of squares, \tau_{\text{post}} = \left( \nu \tau + \sum_{i=1}^n (x_i - \bar{x})^2 + \text{adjustment for the prior mean} \right) / (\nu + n), where the adjustment term accounts for the discrepancy between the prior mean and the sample mean. These updates enable exact posterior computation without numerical integration in simple settings. The conjugate form offers key advantages, including a closed-form posterior that facilitates analytical computation of credible intervals and posterior moments, and it accommodates weakly informative priors when \nu is small, with the posterior concentrating around the data-driven estimate of the variance as the sample size grows. Use of the inverse-chi-squared gained prominence in Bayesian texts from the 1970s onward, especially for hierarchical models where variance components require scalable updating rules. This parameterization, often written in its scaled form with \tau = \nu s_0^2, where s_0^2 represents a prior scale estimate for \sigma^2, aligns well with variance interpretations in normal models.

As an illustrative example, consider a simple model with known \mu = 0 and observations x_1, \dots, x_n \sim \mathcal{N}(0, \sigma^2). With \sigma^2 \sim \text{Inv-}\chi^2(\nu, \tau), the posterior simplifies to \sigma^2 \mid \mathbf{x} \sim \text{Inv-}\chi^2\left(\nu + n, \frac{\nu \tau + \sum_{i=1}^n x_i^2}{\nu + n}\right), directly pooling the prior pseudo-sum-of-squares \nu \tau with the observed \sum x_i^2. This setup is common in introductory Bayesian analyses of variance and allows straightforward posterior sampling or moment calculation.
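The known-mean update can be written in a few lines. The following sketch, assuming the parameterization of this section in which the second argument of Inv-\chi^2(\nu, \tau) is a per-observation scale (so the prior pseudo-sum-of-squares is \nu \tau), performs the posterior update and reports the posterior mean and a 95% credible interval via the inverse-gamma equivalence; the prior settings and data are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Known-mean example: x_i ~ N(0, sigma^2), prior sigma^2 ~ Inv-chi2(nu0, tau0),
# where tau0 is the prior scale (prior pseudo-sum-of-squares is nu0 * tau0).
nu0, tau0 = 4.0, 1.0
x = rng.normal(0.0, 1.5, size=50)

nu_post = nu0 + x.size
tau_post = (nu0 * tau0 + np.sum(x**2)) / nu_post   # pooled scale

# Posterior is scaled Inv-chi2(nu_post, tau_post) = InvGamma(nu_post/2, nu_post*tau_post/2).
post = stats.invgamma(a=nu_post / 2, scale=nu_post * tau_post / 2)
print("posterior mean of sigma^2:", post.mean())
print("95% credible interval:", post.ppf([0.025, 0.975]))
```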

Variance Estimation in Regression Models

In Bayesian linear regression, the scaled inverse-chi-squared distribution serves as a conjugate prior for the error variance \sigma^2, parameterized as \sigma^2 \sim \text{scaled-Inv-}\chi^2(\nu, \nu s^2), where \nu denotes the prior degrees of freedom and s^2 is a prior scale estimate reflecting expected variability. This choice arises from the normal likelihood of the regression errors and ensures that the posterior retains the same distributional form for tractable inference. Given data \mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon} with \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I_n), the posterior for \sigma^2 incorporates the residual sum of squares (RSS) from the least-squares fit. The updated degrees of freedom become \nu_{\text{post}} = \nu + n - p, where n is the sample size and p is the number of predictors (including the intercept), accounting for the degrees of freedom lost in estimating \boldsymbol{\beta}. The posterior scale parameter is \tau_{\text{post}} = \nu s^2 + \text{RSS}, yielding \sigma^2 \mid \mathbf{y}, X \sim \text{scaled-Inv-}\chi^2(\nu_{\text{post}}, \tau_{\text{post}}). This update weights the prior scale against the data-driven RSS, with larger n or smaller RSS pulling the posterior toward lower variance values.

Credible intervals for \sigma^2 are obtained from the quantiles of the posterior scaled inverse-chi-squared distribution, giving asymmetric bounds that convey uncertainty beyond point estimates such as the posterior mode \tau_{\text{post}} / (\nu_{\text{post}} + 2). These intervals are particularly useful for assessing the uncertainty of predictions in small-sample settings, as they use the full posterior rather than asymptotic approximations. The inverse-chi-squared posterior also supports model comparison via Bayes factors for variance components, such as comparing models with different numbers of predictors or hierarchical structures, by evaluating the marginal likelihood after integrating out \sigma^2. This approach quantifies evidence for simpler variance assumptions against more complex ones, favoring models in which the posterior adequately explains the data without excessive parameterization. As an example, in simple linear regression y_i = \beta_0 + \beta_1 x_i + \epsilon_i with a weakly informative prior \sigma^2 \sim \text{scaled-Inv-}\chi^2(1, 1) and n = 30 observations yielding RSS = 50, the posterior is \text{scaled-Inv-}\chi^2(29, 51), from which a 95% credible interval for \sigma^2 is given by the 0.025 and 0.975 quantiles, typically spanning values consistent with the data's scatter.
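The worked regression example translates directly into quantile computations. The sketch below, assuming the total-scale convention of this section (second argument \nu s^2 + \text{RSS}) and its inverse-gamma equivalence, reproduces the posterior mode and a 95% credible interval for \sigma^2.

```python
from scipy import stats

# Worked example from the text: prior scaled-Inv-chi2(1, 1), n = 30, p = 2,
# RSS = 50, giving posterior scaled-Inv-chi2(29, 51) in the convention where
# the second argument is the total scale nu*s^2 + RSS.
nu_post, tau_post = 29, 51

# Equivalent inverse-gamma: shape nu_post/2, rate tau_post/2.
post = stats.invgamma(a=nu_post / 2, scale=tau_post / 2)

print("posterior mode :", tau_post / (nu_post + 2))    # ~1.65
print("posterior mean :", post.mean())                 # tau_post/(nu_post - 2), ~1.89
print("95% credible interval:", post.ppf([0.025, 0.975]))
```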

Markov Chain Monte Carlo Methods

The inverse-chi-squared distribution plays a central role in Markov chain Monte Carlo (MCMC) methods, particularly Gibbs sampling, for posterior inference involving variance parameters in normal and hierarchical models. As a conjugate prior for the variance \sigma^2 of a normal distribution, it yields full conditional posteriors that remain in the same distributional family, facilitating direct and efficient sampling without resorting to more computationally intensive techniques such as Metropolis-Hastings for this component.

In the canonical setting of a normal model with unknown mean \mu and variance \sigma^2, Gibbs sampling alternates between drawing from the full conditional posterior of \mu given \sigma^2 and the data, which is normal, and the full conditional of \sigma^2 given \mu and the data, which follows a scaled inverse-chi-squared distribution. Specifically, if the prior is the normal-inverse-chi-squared p(\mu, \sigma^2) = \mathcal{N}(\mu \mid \mu_0, \sigma^2 / \kappa_0) \times \text{Inv-}\chi^2(\sigma^2 \mid \nu_0, \sigma_0^2), the conditional posterior for \sigma^2 is \text{Inv-}\chi^2(\sigma^2 \mid \nu_n, \sigma_n^2), where the updated degrees of freedom \nu_n = \nu_0 + n and scale \sigma_n^2 incorporate the data sum of squares and prior information. This iterative process generates samples from the joint posterior and converges to the target distribution under standard MCMC conditions. In more complex hierarchical models with multiple levels of variance components, the full conditional for a variance \sigma^2 given the latent effects and other parameters remains inverse-chi-squared, with updated degrees of freedom \nu and scale \tau that aggregate contributions from the likelihood (e.g., sums of squares) and the hyperpriors on related parameters such as means or other variances. For instance, in a two-level hierarchical model y_{ij} \sim \mathcal{N}(\theta_j, \sigma^2) and \theta_j \sim \mathcal{N}(\mu, \tau^2), the conditional p(\sigma^2 \mid y, \theta, \mu, \tau^2) is inverse-chi-squared with degrees of freedom increased by the total number of observations and scale updated by the pooled residuals. This structure preserves conjugacy across levels and allows straightforward Gibbs updates; a minimal sampler is sketched below.

The direct samplability of these full conditionals enhances MCMC efficiency by avoiding rejection-based methods for variance components, reducing autocorrelation in the chains and accelerating convergence, especially in high-dimensional settings. This is particularly advantageous in models where other parameters require Metropolis-Hastings steps, since the inverse-chi-squared block can be sampled exactly. Such MCMC strategies are commonly applied in Bayesian analysis of variance (ANOVA) models, where group variances follow inverse-chi-squared priors, and in random-effects models for clustered data, enabling inference on between- and within-group variability through posterior samples of variance ratios or credible intervals. Implementation is streamlined in probabilistic programming languages such as Stan and JAGS, which support the (scaled) inverse-chi-squared prior either directly or through its inverse-gamma equivalent and automatically generate efficient samplers, including conjugate Gibbs-style updates where the model structure permits.
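As a concrete illustration, the following minimal Gibbs sketch uses a semi-conjugate variant in which \mu has an independent normal prior (rather than the \sigma^2-dependent prior above), so that both full conditionals are standard: a normal draw for \mu and a scaled inverse-chi-squared draw for \sigma^2 obtained through the chi-squared transformation. All prior settings, data, and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal Gibbs sketch for y_i ~ N(mu, sigma2) with semi-conjugate priors:
# mu ~ N(mu0, s0sq) independently of sigma2 ~ scaled-Inv-chi2(nu0, sig0sq).
y = rng.normal(2.0, 1.5, size=40)
n, ybar = y.size, y.mean()
mu0, s0sq = 0.0, 10.0        # prior for mu
nu0, sig0sq = 2.0, 1.0       # prior for sigma2

def sample_inv_chi2(nu, scale, rng):
    """Draw from scaled Inv-chi2(nu, scale) via the chi-squared transformation."""
    return nu * scale / rng.chisquare(nu)

n_iter, mu, sigma2 = 5000, ybar, y.var()
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    # mu | sigma2, y : normal with precision-weighted mean
    prec = n / sigma2 + 1 / s0sq
    mean = (n * ybar / sigma2 + mu0 / s0sq) / prec
    mu = rng.normal(mean, np.sqrt(1 / prec))
    # sigma2 | mu, y : scaled inverse-chi-squared full conditional
    nu_n = nu0 + n
    scale_n = (nu0 * sig0sq + np.sum((y - mu) ** 2)) / nu_n
    sigma2 = sample_inv_chi2(nu_n, scale_n, rng)
    draws[t] = mu, sigma2

print("posterior means (mu, sigma2):", draws[1000:].mean(axis=0))   # discard burn-in
```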

Parameter Estimation

Method of Moments

The method of moments estimates the parameters of the scaled inverse-chi-squared distribution, denoted \text{Inv-}\chi^2(\nu, \tau), by equating the first two sample moments to their theoretical counterparts. The theoretical mean is \mu = \frac{\nu \tau}{\nu - 2}, \quad \nu > 2, and the theoretical variance is \sigma^2 = \frac{2 \nu^2 \tau^2}{(\nu - 2)^2 (\nu - 4)}, \quad \nu > 4. Let m denote the sample mean and v the sample variance from a random sample of size n. Setting m = \mu yields \tau = m(\nu - 2)/\nu. Substituting into the variance equation gives v = 2 m^2 / (\nu - 4), which rearranges to the explicit solution \hat{\nu} = 4 + \frac{2 m^2}{v}. The estimator for the scale is then \hat{\tau} = m \left( \frac{\hat{\nu} - 2}{\hat{\nu}} \right). This solution requires at least two observations so that the sample variance v is defined and positive; the estimate \hat{\nu} then always exceeds 4, consistent with the requirement for a finite theoretical variance. The resulting estimators are biased for small n, as the method of moments generally produces biased estimators unless adjusted. The method of moments provides a straightforward computational approach, requiring only the solution of a simple pair of equations, and is useful for quick approximations in preliminary analysis. However, it is typically less efficient than maximum likelihood estimation, yielding estimators with higher variance, particularly for heavy-tailed distributions such as the inverse-chi-squared. For example, given sample data with mean m and variance v, the degrees-of-freedom estimator can be written exactly as \hat{\nu} = 2 \left( \frac{m^2}{v} + 2 \right), which is approximately 2 m^2 / v when \frac{m^2}{v} \gg 1. A short numerical sketch follows.
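A minimal numerical sketch of these estimators, assuming the \nu \tau / \chi^2_\nu representation of the scaled distribution used in this section, is:

```python
import numpy as np

rng = np.random.default_rng(4)

def mom_inv_chi2(y):
    """Method-of-moments estimates (nu_hat, tau_hat) for scaled Inv-chi2(nu, tau),
    matching the sample mean m and variance v to the theoretical moments."""
    m, v = y.mean(), y.var(ddof=1)
    nu_hat = 4 + 2 * m**2 / v
    tau_hat = m * (nu_hat - 2) / nu_hat
    return nu_hat, tau_hat

# Simulate from Inv-chi2(nu = 10, tau = 2): draw nu*tau / chi2_nu.
nu_true, tau_true = 10.0, 2.0
y = nu_true * tau_true / rng.chisquare(nu_true, size=5000)
print(mom_inv_chi2(y))   # roughly (10, 2) up to sampling noise
```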

Maximum Likelihood Estimation

The likelihood function for an independent and identically distributed sample y_1, \dots, y_n from the scaled inverse-chi-squared distribution with degrees of freedom \nu > 0 and scale parameter \tau > 0 is the product of the individual probability density functions: L(\nu, \tau) = \prod_{i=1}^n f(y_i \mid \nu, \tau), where f(y \mid \nu, \tau) = \frac{ (\nu \tau / 2)^{\nu/2} }{ \Gamma(\nu/2) } y^{-(\nu/2 + 1)} \exp\left( -\frac{\nu \tau}{2 y} \right) for y > 0. The corresponding log-likelihood is \ell(\nu, \tau) = n \left[ \frac{\nu}{2} \log\left( \frac{\nu \tau}{2} \right) - \log \Gamma\left( \frac{\nu}{2} \right) \right] - \left( \frac{\nu}{2} + 1 \right) \sum_{i=1}^n \log y_i - \frac{\nu \tau}{2} \sum_{i=1}^n \frac{1}{y_i}. This expression involves sums of \log y_i and 1/y_i, along with logarithmic and log-gamma terms, and admits no closed-form solution for the maximum likelihood estimator (MLE) of \nu.

To obtain the MLEs \hat{\nu} and \hat{\tau}, numerical optimization is required. Conditional on \nu, the MLE for \tau has the closed form \hat{\tau} = n / \left( \sum_{i=1}^n 1/y_i \right), the harmonic mean of the observations, which yields a profile log-likelihood in \nu alone. The profile likelihood for \nu can then be maximized using methods such as Newton-Raphson iteration, where the updates rely on the digamma function \psi(\nu/2) for the score and the trigamma function for the Hessian. The EM algorithm provides an alternative iterative approach, particularly useful when the estimation is embedded within broader models involving latent variables. Starting values for the optimization are typically taken from the method of moments estimators to aid convergence. Under standard regularity conditions, the MLEs \hat{\nu} and \hat{\tau} are consistent and asymptotically efficient as the sample size n \to \infty, converging in probability to the true parameters. Approximate standard errors are computed from the inverse of the negative Hessian matrix of the log-likelihood evaluated at the MLE, yielding asymptotic normality: \sqrt{n} (\hat{\theta} - \theta_0) \to \mathcal{N}(0, I(\theta_0)^{-1}), where \theta = (\nu, \tau) and I(\theta_0) is the Fisher information matrix. Bias in the estimators diminishes with larger n.

A key challenge in this optimization arises for small values of \nu, where the likelihood surface can be non-monotonic, potentially leading to multiple local maxima and requiring robust starting values or multiple-start strategies to identify the global MLE. Such issues are more pronounced in small samples (n < 20) and underscore the importance of sensitivity checks. These procedures are available in statistical software via general-purpose optimizers; for instance, the optim() function in R or scipy.optimize.minimize in Python can maximize the log-likelihood with user-supplied objective functions and derivatives. A profile-likelihood sketch is given below.
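The profile-likelihood approach can be sketched as follows, using SciPy's one-dimensional minimize_scalar on the negative profile log-likelihood in place of a full two-parameter optimizer; the simulated data, bounds, and seed are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(5)

# Simulated data from scaled Inv-chi2(nu, tau) with density proportional to
# (nu*tau/2)^(nu/2) y^-(nu/2+1) exp(-nu*tau/(2y)).
nu_true, tau_true = 6.0, 1.5
y = nu_true * tau_true / rng.chisquare(nu_true, size=500)

n = y.size
sum_log = np.sum(np.log(y))
sum_inv = np.sum(1.0 / y)
tau_hat = n / sum_inv   # conditional MLE of tau, free of nu

def neg_profile_loglik(nu):
    """Negative log-likelihood profiled over tau, with tau fixed at tau_hat."""
    return -(n * (nu / 2 * np.log(nu * tau_hat / 2) - gammaln(nu / 2))
             - (nu / 2 + 1) * sum_log
             - nu * tau_hat / 2 * sum_inv)

res = minimize_scalar(neg_profile_loglik, bounds=(0.1, 100), method="bounded")
print(f"nu_hat={res.x:.3f}, tau_hat={tau_hat:.3f}")   # near (6, 1.5)
```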
