Inverse-gamma distribution
The inverse gamma distribution is a two-parameter family of continuous probability distributions supported on the positive real numbers, arising as the distribution of the reciprocal of a gamma-distributed random variable.[1][2] If Y \sim \Gamma(\alpha, \beta) with shape parameter \alpha > 0 and scale parameter \beta > 0, then X = 1/Y follows an inverse gamma distribution, denoted X \sim \text{Inv}\Gamma(\alpha, \beta).[3][1] The probability density function is given by f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} \exp\left(-\frac{\beta}{x}\right), \quad x > 0,
where \Gamma(\alpha) is the gamma function.[1][2] This distribution is right-skewed and can take on various shapes depending on the parameter values, with heavier tails for smaller \alpha.[3] Key moments of the inverse gamma distribution include the mean \mathbb{E}[X] = \frac{\beta}{\alpha - 1} for \alpha > 1, and the variance \text{Var}(X) = \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)} for \alpha > 2; these do not exist for smaller \alpha, reflecting the distribution's heavy-tailed nature.[1][2] Higher moments follow similarly, with the k-th moment finite only for \alpha > k. The cumulative distribution function lacks a closed form but can be expressed in terms of the incomplete gamma function.[1] Parameterizations vary across contexts, sometimes using a rate parameter instead of scale, but the shape-scale form is standard in many statistical applications.[3] In Bayesian statistics, the inverse gamma distribution is particularly notable as the conjugate prior for the variance of a normal distribution with known mean (equivalently, a gamma prior on the precision, the inverse variance), ensuring that the posterior distribution remains inverse gamma after updating with data.[2][4] For instance, if the prior is \sigma^2 \sim \text{Inv}\Gamma(\alpha, \beta), the posterior incorporates the sample size and sum of squared deviations, yielding updated parameters \alpha' = \alpha + n/2 and \beta' = \beta + \sum (x_i - \mu)^2 / 2.[2] Beyond Bayesian inference, it models lifetimes in reliability engineering and appears in queueing theory and financial modeling of volatility processes.[3]
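The conjugate update described above can be checked numerically. The sketch below (parameter values are illustrative) multiplies the inverse-gamma prior by the normal likelihood on a grid of \sigma^2 values and verifies that the result is proportional to the inverse-gamma density with the updated parameters:

```python
import numpy as np
from scipy import stats

# Prior sigma^2 ~ InvGamma(alpha, beta); normal likelihood with known mean mu.
alpha, beta, mu = 3.0, 2.0, 0.0
rng = np.random.default_rng(0)
x = rng.normal(mu, 1.5, size=20)

# Conjugate update: alpha' = alpha + n/2, beta' = beta + sum((x_i - mu)^2)/2
n = len(x)
ss = np.sum((x - mu) ** 2)
alpha_post = alpha + n / 2
beta_post = beta + ss / 2

# Numerical check of conjugacy: prior * likelihood evaluated on a grid of
# sigma^2 values should be proportional to InvGamma(alpha', beta').
grid = np.linspace(0.5, 10.0, 200)
unnorm = stats.invgamma.pdf(grid, alpha, scale=beta)
for xi in x:
    unnorm = unnorm * stats.norm.pdf(xi, mu, np.sqrt(grid))
ratio = unnorm / stats.invgamma.pdf(grid, alpha_post, scale=beta_post)
# ratio is (numerically) constant across the grid, confirming the update
```

The constant ratio is the marginal likelihood of the data, which cancels when the posterior is normalized.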
Characterization
Probability density function
The inverse-gamma distribution is a two-parameter family of continuous probability distributions supported on the positive real numbers, with shape parameter \alpha > 0 and scale parameter \beta > 0.[5] Its probability density function is f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha - 1} \exp\left( -\frac{\beta}{x} \right) for x > 0, and f(x \mid \alpha, \beta) = 0 otherwise.[5] The shape parameter \alpha determines the form of the distribution and the heaviness of its right tail, where smaller \alpha yields heavier tails, while the scale parameter \beta governs the dispersion and central tendency.[6][7] This distribution arises intuitively from the transformation of a gamma random variable: if Y follows a gamma distribution with shape \alpha and rate \beta (equivalently, scale 1/\beta), then X = 1/Y follows an inverse-gamma distribution with the same parameters, obtained by applying the change-of-variable formula to the gamma density.[6] The density features a heavy right tail, reflecting the behavior near zero of the underlying gamma variable, and reaches its mode at x = \beta / (\alpha + 1).[7]
Cumulative distribution function
The cumulative distribution function of the inverse-gamma distribution with shape parameter \alpha > 0 and scale parameter \beta > 0 is F(x; \alpha, \beta) = \frac{\Gamma\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)}, \quad x > 0, where \Gamma(s) denotes the gamma function and \Gamma(s, z) = \int_z^\infty t^{s-1} e^{-t} \, dt is the upper incomplete gamma function.[8] This formulation relies on the upper incomplete gamma function, which quantifies the tail probability of the gamma distribution and enables the CDF to capture the cumulative probability over (0, x]. Numerical evaluation of F(x) typically involves series expansions, continued fractions, or other approximations for the incomplete gamma function, as implemented in mathematical software libraries; for instance, these computations achieve relative accuracy exceeding 14 decimal digits in double-precision arithmetic for a wide range of parameters.[8][9] As x \to 0^+, \beta/x \to \infty and \Gamma(\alpha, \beta/x) \to 0, so F(x) \to 0; conversely, as x \to \infty, \beta/x \to 0 and \Gamma(\alpha, \beta/x) \to \Gamma(\alpha), yielding F(x) \to 1.[8] The survival function, or the probability that a random variable exceeds x, is 1 - F(x) = \gamma(\alpha, \beta/x)/\Gamma(\alpha), where \gamma(s, z) = \int_0^z t^{s-1} e^{-t} \, dt is the lower incomplete gamma function.[8]
Properties
Moments
The moments of the inverse-gamma distribution are derived from its probability density function via integration, yielding expressions in terms of the gamma function. The nth raw moment is given by \mu_n' = \mathbb{E}[X^n] = \beta^n \frac{\Gamma(\alpha - n)}{\Gamma(\alpha)} for \alpha > n, where \Gamma denotes the gamma function. This formula arises from substituting into the moment-generating integral and recognizing the resulting form as a scaled gamma integral after the change of variables t = \beta / x. For n \geq \alpha, the moment does not exist, reflecting the heavy-tailed nature of the distribution when the shape parameter is small. As \alpha approaches n from above, the moments grow large, indicating increasing variability near the boundary.[10] The first raw moment is the mean, \mathbb{E}[X] = \frac{\beta}{\alpha - 1} provided \alpha > 1; otherwise, the mean is infinite, and the distribution places substantial probability mass near zero, leading to undefined or divergent expected value. The second raw moment is \mathbb{E}[X^2] = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} for \alpha > 2. The variance follows as \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)}, also requiring \alpha > 2 for finiteness. When 1 < \alpha \leq 2, the variance is infinite despite a finite mean, characteristic of distributions with power-law tails. These expressions are standard for the parameterization where the PDF is \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} e^{-\beta/x} for x > 0.[10] Higher-order standardized moments quantify the asymmetry and tail heaviness. The skewness is \gamma_1 = \frac{\mathbb{E}[(X - \mathbb{E}[X])^3]}{(\mathrm{Var}(X))^{3/2}} = \frac{4 \sqrt{\alpha - 2}}{\alpha - 3} for \alpha > 3; the third central moment diverges for \alpha \leq 3, emphasizing the right-skewed, heavy-tailed behavior. As \alpha increases beyond 3, skewness decreases toward zero, approaching symmetry for large shape parameters. 
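The mean, variance, and skewness formulas above can be cross-checked against a library implementation. A minimal sketch using SciPy (parameter values are illustrative; \alpha = 5 keeps all three quantities finite):

```python
import numpy as np
from scipy import stats

alpha, beta = 5.0, 2.0

# Closed-form moments for InvGamma(alpha, beta), valid for alpha > 3
mean = beta / (alpha - 1)                              # 0.5
var = beta**2 / ((alpha - 1)**2 * (alpha - 2))         # 1/12
skew = 4 * np.sqrt(alpha - 2) / (alpha - 3)            # 2*sqrt(3)

# SciPy computes the same quantities from its invgamma implementation
m, v, s = stats.invgamma.stats(alpha, scale=beta, moments='mvs')
```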
The kurtosis (fourth standardized central moment) is \gamma_2 = \frac{\mathbb{E}[(X - \mathbb{E}[X])^4]}{(\mathrm{Var}(X))^2} = 3 \frac{(\alpha - 2)(\alpha + 5)}{(\alpha - 3)(\alpha - 4)} for \alpha > 4, with the excess kurtosis being \gamma_2 - 3 = 6(5\alpha - 11)/((\alpha - 3)(\alpha - 4)). For 3 < \alpha \leq 4, the fourth moment is infinite, resulting in leptokurtic tails that become mesokurtic (approaching 3) as \alpha grows large. Near the boundaries, such as \alpha \to 4^+, kurtosis tends to infinity, underscoring the distribution's sensitivity to low shape values in applications like variance modeling.[11]
Mode, median, and entropy
The mode of the inverse-gamma distribution with shape parameter \alpha > 0 and scale parameter \beta > 0 is \frac{\beta}{\alpha + 1}, the value that maximizes the probability density function. This mode is unique, as the distribution is unimodal over the positive reals for \alpha > 0. The median of the inverse-gamma distribution lacks a closed-form expression and is typically computed numerically or approximated using the inverse of the regularized incomplete gamma function, since the cumulative distribution function involves the incomplete gamma function. For special cases, such as \alpha = 1, tighter bounds on the median can be derived relative to the scale parameter. The differential entropy of the inverse-gamma distribution is h(X) = \alpha + \log(\beta) + \log(\Gamma(\alpha)) - (\alpha + 1) \psi(\alpha), where \psi denotes the digamma function. This expression quantifies the average uncertainty in the distribution and arises from integrating the negative log-density weighted by the probability density function. For \alpha > 1, the inverse-gamma distribution exhibits right-skewness, resulting in the ordering mode < median < mean. The inverse-gamma distribution maximizes differential entropy among distributions on the positive reals with fixed harmonic mean (fixed \mathbb{E}[1/X]) and fixed geometric mean (fixed \mathbb{E}[\log X]).
Characteristic function
The characteristic function of the inverse-gamma distribution is its Fourier transform, enabling analytical investigations into properties such as convolutions and asymptotic behavior. For a random variable X following an inverse-gamma distribution with shape parameter \alpha > 0 and scale parameter \beta > 0, the characteristic function is given by \phi(t) = \mathbb{E}[e^{itX}] = \frac{2 (-i \beta t)^{\alpha/2}}{\Gamma(\alpha)} K_{\alpha}\left(2 \sqrt{-i \beta t}\right) for real t, where K_{\alpha}(\cdot) denotes the modified Bessel function of the second kind. This expression is derived by direct integration using the definition of the characteristic function and the probability density function of the inverse-gamma distribution. Substituting the density into the expectation yields the integral \phi(t) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} x^{-\alpha-1} \exp\left(-\frac{\beta}{x} + itx\right) \, dx. This form matches a known integral representation of the modified Bessel function of the second kind, K_{\alpha}(z) = \frac{1}{2} \left(\frac{z}{2}\right)^{\alpha} \int_{0}^{\infty} u^{-\alpha-1} \exp\left(-u - \frac{z^{2}}{4u}\right) \, du, after appropriate substitution and simplification with z = 2\sqrt{-i\beta t}. The characteristic function facilitates high-level analysis of convolutions for sums or linear combinations of independent inverse-gamma random variables, as the characteristic function of the sum is the product of individual characteristic functions, allowing numerical inversion to obtain the resulting density when closed forms are unavailable.[12] It also supports investigations into limit theorems for heavy-tailed distributions like the inverse-gamma, where the behavior of \phi(t) for small t informs convergence properties. 
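The Bessel-function form of \phi(t) can be verified against direct numerical integration of \mathbb{E}[e^{itX}]. The sketch below assumes the principal branches of the complex power and square root, and uses SciPy's modified Bessel function of the second kind with a complex argument (parameter values are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import kv, gamma as gamma_fn
from scipy.integrate import quad

alpha, beta, t = 3.0, 2.0, 0.7

# Closed form: phi(t) = 2 (-i beta t)^(alpha/2) K_alpha(2 sqrt(-i beta t)) / Gamma(alpha)
z = -1j * beta * t
phi_bessel = 2 * z ** (alpha / 2) * kv(alpha, 2 * np.sqrt(z)) / gamma_fn(alpha)

# Direct numerical integration of E[e^{itX}] against the inverse-gamma density,
# splitting into real (cosine) and imaginary (sine) parts
pdf = lambda x: stats.invgamma.pdf(x, alpha, scale=beta)
re, _ = quad(lambda x: pdf(x) * np.cos(t * x), 0, np.inf, limit=200)
im, _ = quad(lambda x: pdf(x) * np.sin(t * x), 0, np.inf, limit=200)
phi_quad = re + 1j * im
```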
The moments of the distribution relate to the characteristic function through its Taylor series expansion around t = 0, where the coefficients correspond to the moments via successive derivatives: the nth moment, when it exists (\alpha > n), is \mathbb{E}[X^n] = (-i)^n \frac{d^n}{dt^n} \phi(t) \big|_{t=0}.
Sampling methods
The primary method for generating random variates from the inverse-gamma distribution with shape parameter \alpha > 0 and scale parameter \beta > 0 is the transformation (reciprocal) method. This approach exploits the reciprocal relationship with the gamma distribution: first, sample Y from a gamma distribution with shape \alpha and scale 1/\beta, then set X = 1/Y.[13] This transformation ensures that X follows the desired inverse-gamma distribution, as the probability density function of the inverse-gamma is derived from that of the gamma via the reciprocal mapping.[14] In practice, this method is straightforward to implement in statistical software. For instance, in SAS/IML, random variates can be generated using the RAND function for the gamma distribution followed by taking the reciprocal, as shown in the macro %RandIGamma(alpha, beta) = (1 / rand('Gamma', alpha, 1/beta)).[13] Similarly, Python's SciPy library implements invgamma.rvs(a=alpha, scale=beta) by internally sampling from the corresponding gamma and computing the inverse.[15] In R, the rinvgamma function from the invgamma package applies the transformation theorem using base R's gamma functions.[16]
Alternative sampling techniques, such as acceptance-rejection methods, may be used for efficiency in specific parameter regimes, particularly when direct inversion is computationally burdensome. Tailored algorithms, including modifications to rejection sampling, have been developed for cases with small \alpha or extreme \beta, where standard gamma generators rely on acceptance-rejection internally but require adjustments to handle the heavy tails of the inverse-gamma.[17]
Numerical considerations arise mainly for small \alpha or large \beta, where the gamma variate Y can be extremely small, leading to large X = 1/Y values that risk floating-point overflow or underflow in computations. For example, the R invgamma package issues warnings for \alpha \leq 0.01 due to unreliability from precision limits in the reciprocal step.[16] In such scenarios, log-space computations or rescaled parameters can mitigate issues, though the probability of extreme values aligns with the distribution's inherent heavy-tailed nature.[13]
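The log-space workaround mentioned above can be sketched as follows. This is an illustrative example, not a library API: it relies on the shape-boost identity \text{Gamma}(\alpha) \stackrel{d}{=} \text{Gamma}(\alpha + 1) \cdot U^{1/\alpha} (with U uniform on (0,1)) to compute \log X = -\log Y without the underflow that makes the naive reciprocal overflow for tiny \alpha:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
alpha, beta = 0.005, 1.0
n = 100_000

# Naive reciprocal sampling: for tiny alpha the gamma variate frequently
# underflows to exactly 0, so 1/Y overflows to inf.
y = rng.gamma(alpha, 1.0 / beta, size=n)
with np.errstate(divide='ignore'):
    x_naive = 1.0 / y
n_inf = int(np.isinf(x_naive).sum())  # positive for alpha this small

# Log-space alternative via the shape-boost identity:
# Y = G * U^(1/alpha) / beta with G ~ Gamma(alpha + 1), so
# log X = -log Y = -(log G + log(U)/alpha - log(beta)), with no underflow.
g = rng.gamma(alpha + 1.0, size=n)
u = rng.uniform(size=n)
log_x = -(np.log(g) + np.log(u) / alpha - np.log(beta))

# Sanity check: E[log X] = log(beta) - digamma(alpha) for InvGamma(alpha, beta)
expected_log_mean = np.log(beta) - digamma(alpha)
```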
To validate generated samples, compare empirical moments (e.g., mean and variance) to theoretical values \beta/(\alpha-1) and \beta^2 / [(\alpha-1)^2 (\alpha-2)], respectively, for \alpha > 2. Additionally, quantile-quantile (Q-Q) plots provide a visual check: plot sample quantiles against theoretical inverse-gamma quantiles; alignment along the 45-degree line indicates accurate sampling. For a brief example with \alpha=3, \beta=0.5, a Q-Q plot of 10,000 samples should show close adherence to the reference line, confirming fidelity to the distribution.
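The validation steps above can be sketched in Python, using the \alpha = 3, \beta = 0.5 example and comparing empirical quantiles to theoretical ones numerically rather than visually (sample size and tolerances are illustrative):

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 0.5
rng = np.random.default_rng(42)

# Reciprocal-of-gamma sampler: Y ~ Gamma(shape=alpha, scale=1/beta) => 1/Y ~ InvGamma(alpha, beta)
samples = 1.0 / rng.gamma(alpha, 1.0 / beta, size=100_000)

# Moment check: theoretical mean beta/(alpha-1); variance exists since alpha > 2,
# but with alpha = 3 the fourth moment is infinite, so the sample variance
# converges slowly and is only a rough check.
emp_mean = samples.mean()
th_mean = beta / (alpha - 1)    # 0.25
emp_var = samples.var()
th_var = beta**2 / ((alpha - 1)**2 * (alpha - 2))  # 0.0625

# Quantile check (a numerical stand-in for the Q-Q plot): empirical quantiles
# should closely track the theoretical inverse-gamma quantiles.
qs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
emp_q = np.quantile(samples, qs)
th_q = stats.invgamma.ppf(qs, alpha, scale=beta)
```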
Related distributions
Derivation from the gamma distribution
The inverse-gamma distribution arises as the distribution of the reciprocal of a gamma-distributed random variable. Specifically, if Y \sim \text{Gamma}(\alpha, \beta') where the gamma distribution is parameterized with shape parameter \alpha > 0 and rate parameter \beta' > 0, then X = 1/Y follows an inverse-gamma distribution with shape \alpha and scale \beta = \beta'.[18][19] To derive the probability density function (PDF) of X, apply the transformation of variables formula, accounting for the Jacobian determinant. The PDF of Y is f_Y(y) = \frac{(\beta')^\alpha}{\Gamma(\alpha)} y^{\alpha-1} e^{-\beta' y}, \quad y > 0. Under the transformation x = 1/y, or y = 1/x, the differential is dy/dx = -1/x^2, so the absolute value of the Jacobian is |dy/dx| = 1/x^2. Substituting yields f_X(x) = f_Y(1/x) \cdot \left| \frac{dy}{dx} \right| = \frac{(\beta')^\alpha}{\Gamma(\alpha)} (1/x)^{\alpha-1} e^{-\beta' (1/x)} \cdot \frac{1}{x^2}, \quad x > 0. Simplifying the expression, f_X(x) = \frac{(\beta')^\alpha}{\Gamma(\alpha)} x^{-\alpha+1} e^{-\beta'/x} \cdot x^{-2} = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} e^{-\beta/x}, where \beta = \beta' is the scale parameter for the inverse-gamma distribution. This confirms the standard form of the inverse-gamma PDF.[18][19] The moments of the inverse-gamma distribution preserve certain relations from the original gamma under inversion. For instance, the expected value E[X] = E[1/Y] for Y \sim \text{Gamma}(\alpha, \beta') with \alpha > 1 is given by E[1/Y] = \frac{\beta'}{\alpha - 1}, which aligns with the mean of the inverse-gamma E[X] = \beta / (\alpha - 1) under the scale parameterization \beta = \beta'. 
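The change-of-variables derivation above can be checked numerically: evaluating the gamma density at 1/x and dividing by x^2 should reproduce the inverse-gamma density exactly. A minimal sketch with SciPy (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 3.0           # gamma shape and rate (= inverse-gamma shape and scale)
x = np.linspace(0.1, 5.0, 50)

# Change of variables: f_X(x) = f_Y(1/x) * |dy/dx| = f_Y(1/x) / x^2,
# where Y ~ Gamma(shape=alpha, rate=beta), i.e. scale = 1/beta in SciPy.
transformed = stats.gamma.pdf(1.0 / x, alpha, scale=1.0 / beta) / x**2
direct = stats.invgamma.pdf(x, alpha, scale=beta)

# The negative first moment E[1/Y] = beta/(alpha - 1) matches the inverse-gamma mean
mean_check = stats.invgamma.mean(alpha, scale=beta)   # beta / (alpha - 1) = 3.0
```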
This relation holds because the r-th negative moment of the gamma distribution is E[Y^{-r}] = (\beta')^r \Gamma(\alpha - r) / \Gamma(\alpha) for \alpha > r, and for r = 1, it simplifies using the gamma function property \Gamma(\alpha) = (\alpha - 1) \Gamma(\alpha - 1).[6][19] The inverse-gamma distribution was recognized in the statistical literature during the 20th century, particularly as Bayesian methods gained prominence for modeling scale parameters.[20]
Connections to other distributions
The inverse-gamma distribution is closely related to the scaled inverse chi-squared distribution, which is a specific reparameterization commonly used in Bayesian statistics for modeling variances. Specifically, an inverse-gamma random variable with shape parameter \alpha = \nu/2 and scale parameter \beta = \nu \sigma^2 / 2 follows the same distribution as \nu \sigma^2 times the inverse of a chi-squared random variable with \nu degrees of freedom.[21] The Lévy distribution arises as a special case of the inverse-gamma distribution when the shape parameter \alpha = 1/2 and the scale parameter \beta = c/2, where c > 0 is the scale parameter of the Lévy distribution centered at zero. This connection highlights the heavy-tailed nature of the inverse-gamma family, as the Lévy distribution is a stable distribution with infinite variance.[22] The generalized inverse Gaussian (GIG) distribution serves as a broader extension of the inverse-gamma distribution within a three-parameter family. In the GIG parameterization with parameters p \in \mathbb{R}, a \geq 0, and b > 0, setting a = 0 and p < 0 yields the inverse-gamma distribution with shape parameter \alpha = -p and scale parameter \beta = b/2, where the probability density function simplifies to the standard inverse-gamma form f(x) \propto x^{- \alpha - 1} \exp(-\beta / x) for x > 0.[23] In hierarchical Bayesian models, the inverse-gamma distribution frequently emerges as a marginal distribution for variance parameters. For instance, when precision parameters follow gamma distributions in a conjugate setup, integrating out intermediate hyperparameters results in an inverse-gamma marginal for the variance components, facilitating tractable posterior inference.[24] Common reparameterizations of the inverse-gamma distribution often involve switching between scale and rate parameters or expressing in terms of mean and shape. 
The following table summarizes key equivalences, assuming the standard shape-scale form with density f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} \exp(-\beta / x) for x > 0:

| Parameterization | Shape (\alpha) | Scale/Rate (\beta or \lambda) | Relation | Source |
|---|---|---|---|---|
| Shape-Scale | \alpha > 0 | Scale \beta > 0 | Standard form | [21] |
| Mean-Shape | \alpha > 0 | Mean \mu = \beta / (\alpha - 1) (\alpha > 1) | \beta = \mu (\alpha - 1); requires \alpha > 1 for finite mean | [24] |
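Two of the special-case relations above can be verified numerically. The sketch below checks that the Lévy density matches InvGamma(1/2, c/2), and that the scaled inverse chi-squared quantile \nu\sigma^2 / \chi^2_\nu matches the corresponding inverse-gamma quantile (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 8.0, 40)

# Levy special case: InvGamma(alpha=1/2, beta=c/2) has the Levy(0, c) density
c = 2.0
levy_pdf = stats.levy.pdf(x, loc=0, scale=c)
ig_pdf = stats.invgamma.pdf(x, 0.5, scale=c / 2)

# Scaled inverse chi-squared: InvGamma(nu/2, nu*sigma2/2) is the law of nu*sigma2/chi2_nu.
# Quantile check: P(nu*sigma2/chi2 <= t) = P(chi2 >= nu*sigma2/t), so the q-quantile
# is nu*sigma2 / chi2.ppf(1 - q, nu).
nu, sigma2, q = 6.0, 1.5, 0.7
ig_q = stats.invgamma.ppf(q, nu / 2, scale=nu * sigma2 / 2)
chi2_q = nu * sigma2 / stats.chi2.ppf(1 - q, nu)
```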