Normal-inverse-gamma distribution

The Normal-inverse-gamma distribution is a four-parameter continuous distribution that jointly specifies a normal prior for the mean conditional on the variance and an inverse-gamma prior for the variance itself, serving as the conjugate prior for the mean and variance of a univariate normal likelihood when both parameters are unknown. This conjugacy ensures that the posterior distribution after observing data remains in the same family, enabling closed-form Bayesian updates without numerical approximation. The distribution is commonly parameterized by hyperparameters μ₀ (prior mean of the mean), κ₀ > 0 (precision scaling factor, akin to a virtual sample size), α₀ > 0 (shape parameter of the inverse-gamma), and β₀ > 0 (scale parameter of the inverse-gamma). The joint density is given by
p(\mu, \sigma^2 \mid \mu_0, \kappa_0, \alpha_0, \beta_0) = \mathcal{N}(\mu \mid \mu_0, \sigma^2 / \kappa_0) \cdot \text{IG}(\sigma^2 \mid \alpha_0, \beta_0),
where \mathcal{N} denotes the normal density and IG the inverse-gamma density. Alternative notations, such as using the precision \lambda = 1/\sigma^2 with a gamma prior on \lambda, are equivalent through reparameterization.
Key properties include the marginal distribution of the mean \mu, which follows a non-standardized Student's t-distribution with location \mu_0, scale \sqrt{\beta_0 / (\alpha_0 \kappa_0)}, and degrees of freedom 2\alpha_0; this arises from integrating out the variance. The posterior updates are straightforward: for n i.i.d. observations from \mathcal{N}(\mu, \sigma^2), the posterior hyperparameters become \mu_n = (\kappa_0 \mu_0 + n \bar{x}) / \kappa_n, \kappa_n = \kappa_0 + n, \alpha_n = \alpha_0 + n/2, and \beta_n = \beta_0 + (1/2) \sum_{i=1}^n (x_i - \bar{x})^2 + (\kappa_0 n / (2 \kappa_n)) (\bar{x} - \mu_0)^2, where \bar{x} is the sample mean. Consequently, the posterior predictive distribution for a new observation is a Student's t-distribution with location \mu_n, scale \sqrt{\frac{\beta_n}{\alpha_n} \left(1 + \frac{1}{\kappa_n}\right)}, and degrees of freedom 2\alpha_n. In Bayesian statistics, the normal-inverse-gamma prior is foundational for modeling uncertainty in normal data with unknown parameters, particularly in hierarchical models and empirical Bayes estimation, where it facilitates tractable and robust handling of vague priors through careful hyperparameter choice. It extends naturally to multivariate settings via the normal-inverse-Wishart distribution, but remains distinct in its univariate form for scalar normal models.
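
As a minimal numerical sketch of these closed-form updates (the data and hyperparameter values below are illustrative, not taken from any source), the posterior hyperparameters and the resulting posterior predictive Student-t can be computed with scipy.stats:
python
import numpy as np
from scipy import stats

# Illustrative prior hyperparameters
mu0, kappa0, alpha0, beta0 = 0.0, 1.0, 2.0, 3.0

x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])   # illustrative observations
n, xbar = x.size, x.mean()

# Closed-form posterior hyperparameters
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
alpha_n = alpha0 + n / 2
beta_n = beta0 + 0.5 * np.sum((x - xbar) ** 2) \
         + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)

# Posterior predictive: Student-t with 2*alpha_n degrees of freedom,
# location mu_n, and the scale computed below
pred_scale = np.sqrt(beta_n * (1 + 1 / kappa_n) / alpha_n)
predictive = stats.t(df=2 * alpha_n, loc=mu_n, scale=pred_scale)
print(predictive.mean(), predictive.interval(0.95))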

Fundamentals

Definition

The normal-inverse-gamma (NIG) distribution is a four-parameter family of joint continuous probability distributions defined over the mean \mu and variance \sigma^2 (or equivalently, the precision \tau = 1/\sigma^2) of a univariate normal distribution, making it a key tool in Bayesian inference for modeling uncertainty in these parameters. It arises as the product of a conditional normal distribution for the mean given the variance and a marginal inverse-gamma distribution for the variance itself, ensuring analytical tractability in posterior computations. Formally, a random vector (\mu, \sigma^2) follows an NIG distribution with parameters \mu_0 (location), \lambda > 0 (precision scaling for the mean), \alpha > 0 (shape), and \beta > 0 (scale), denoted (\mu, \sigma^2) \sim \text{NIG}(\mu_0, \lambda, \alpha, \beta), if the conditional distribution is \mu \mid \sigma^2 \sim \mathcal{N}\left(\mu_0, \frac{\sigma^2}{\lambda}\right) and the marginal distribution is \sigma^2 \sim \text{IG}(\alpha, \beta), where \text{IG}(\alpha, \beta) denotes the inverse-gamma distribution with shape \alpha and scale \beta. This parameterization reflects the scaling of the variance in the conditional normal by the precision factor \lambda, which controls the strength of belief around \mu_0. The NIG distribution originated in the context of conjugate prior families for Bayesian decision theory, with foundational work by Raiffa and Schlaifer (1961), who developed analytical forms for updating beliefs under normal likelihoods with unknown mean and variance. It was later formalized and popularized in computational Bayesian statistics. Its building blocks are the normal distribution \mathcal{N}(\mu, \sigma^2) and the inverse-gamma distribution, which is supported on the positive reals and serves as the conjugate prior for the variance of a normal with known mean. The NIG plays a central role as the conjugate prior for normal models, enabling straightforward posterior updates when combined with data from a normal likelihood.

Parameter Interpretation

The normal-inverse-gamma distribution is parameterized by four values: \mu_0, \lambda, \alpha, and \beta, which encode prior beliefs about the mean \mu and variance \sigma^2 of an underlying normal distribution in Bayesian models. The parameter \mu_0 represents the prior mean of \mu, serving as a location estimate or best guess for the central tendency based on expert knowledge or historical data. It acts as the expected value around which the conditional normal distribution for \mu is centered, given \sigma^2. The parameter \lambda > 0 scales the variance of the conditional normal distribution for \mu (specifically, the conditional variance is \sigma^2 / \lambda), thereby quantifying prior confidence in \mu_0. A larger \lambda implies tighter concentration around \mu_0, equivalent to greater prior precision or an effective prior sample size, reflecting stronger belief in the prior mean; conversely, a small \lambda (e.g., approaching 0) yields diffuse priors with less influence from \mu_0. This interpretation allows \lambda to mimic the information content of hypothetical prior observations. For the inverse-gamma component on \sigma^2, \alpha > 0 is the shape parameter, which controls the effective prior sample size (or degrees of freedom) for the variance estimate, with larger \alpha indicating a more informative prior concentrated around a specific scale. The scale parameter \beta > 0 sets the scale of \sigma^2, influencing its expected magnitude; for \alpha > 1, the prior mean of \sigma^2 is \beta / (\alpha - 1), providing a direct way to specify anticipated variance levels based on domain expertise. Together, \alpha and \beta shape the marginal prior for \sigma^2, balancing tail heaviness and concentration. In practice, parameter selection draws from substantive knowledge, such as setting \mu_0 to a historical average and \lambda to reflect equivalent prior data points (e.g., \lambda = 1 for modest confidence). For \alpha and \beta, values can be chosen to match desired prior moments, such as targeting a specific mean and variance for \sigma^2 (see the sketch below). Noninformative priors, such as \alpha = 0 and \beta = 0, aim for minimal influence but are improper (integrating to infinity), potentially leading to improper posteriors if data are insufficient; thus, weakly informative alternatives (e.g., small positive \alpha, \beta) are recommended to ensure stability while avoiding undue regularization. Sensitivity analyses with respect to prior choices are essential for robust inference. In multivariate settings, the normal-inverse-gamma extends to the normal-inverse-Wishart distribution, where the inverse-gamma on \sigma^2 generalizes to an inverse-Wishart on the covariance matrix \Sigma, accommodating correlations among dimensions that the univariate form cannot capture.
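
A minimal sketch of this moment-matching idea (the helper name and target values are illustrative assumptions): the inverse-gamma relations \mathbb{E}[\sigma^2] = \beta/(\alpha - 1) for \alpha > 1 and \text{Var}(\sigma^2) = \beta^2 / ((\alpha - 1)^2 (\alpha - 2)) for \alpha > 2 can be inverted to obtain \alpha and \beta from a desired prior mean m and variance v of \sigma^2.
python
# Choose inverse-gamma hyperparameters (alpha, beta) so that the prior on
# sigma^2 has a desired mean m and variance v (requires alpha > 2).
def ig_hyperparams_from_moments(m, v):
    alpha = 2.0 + m**2 / v          # from Var(sigma^2) = m^2 / (alpha - 2)
    beta = m * (alpha - 1.0)        # from E[sigma^2] = beta / (alpha - 1)
    return alpha, beta

# Example: prior belief that sigma^2 is around 4 with prior variance 8
alpha, beta = ig_hyperparams_from_moments(4.0, 8.0)
print(alpha, beta)  # alpha = 4.0, beta = 12.0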

Characterization

Probability Density Function

The normal-inverse-gamma distribution models the joint distribution of a normal mean \mu and variance \sigma^2 > 0, where \mu \mid \sigma^2 \sim \mathcal{N}(\mu_0, \sigma^2 / \lambda) conditionally and \sigma^2 \sim \text{InvGamma}(\alpha, \beta) marginally, with parameters \mu_0 \in \mathbb{R} (location), \lambda > 0 (precision scale), \alpha > 0 (shape), and \beta > 0 (scale). The joint probability density function is f(\mu, \sigma^2 \mid \mu_0, \lambda, \alpha, \beta) = \frac{\lambda^{1/2}}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{\lambda (\mu - \mu_0)^2}{2 \sigma^2} \right\} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^2)^{-\alpha - 1} \exp\left\{ -\frac{\beta}{\sigma^2} \right\}, defined over the support \mu \in \mathbb{R}, \sigma^2 > 0. This form follows directly from multiplying the conditional density (with variance \sigma^2 / \lambda) and the marginal inverse-gamma density (with \alpha and \beta). To confirm normalization, integrate the joint density over the support: first, \int_{-\infty}^{\infty} f(\mu, \sigma^2 \mid \mu_0, \lambda, \alpha, \beta) \, d\mu = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^2)^{-\alpha - 1} \exp\left\{ -\frac{\beta}{\sigma^2} \right\}, since the inner normal integral equals 1; then, \int_0^{\infty} \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^2)^{-\alpha - 1} \exp\left\{ -\frac{\beta}{\sigma^2} \right\} \, d\sigma^2 = 1, as this is the normalizing integral of the inverse-gamma density. An alternative parameterization expresses the joint in terms of the mean \mu and precision \tau = 1 / \sigma^2 > 0, where \mu \mid \tau \sim \mathcal{N}(\mu_0, 1 / (\lambda \tau)) and \tau \sim \text{Gamma}(\alpha, \beta) (rate parameterization), yielding f(\mu, \tau \mid \mu_0, \lambda, \alpha, \beta) = \frac{(\lambda \tau)^{1/2}}{\sqrt{2\pi}} \exp\left\{ -\frac{\lambda \tau (\mu - \mu_0)^2}{2} \right\} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \tau^{\alpha - 1} \exp\left\{ -\beta \tau \right\}. This follows from the change of variables \tau = 1 / \sigma^2, with Jacobian |\frac{d\sigma^2}{d\tau}| = 1 / \tau^2, which transforms the inverse-gamma into a gamma. A related reparameterization for the variance uses the scaled inverse-chi-squared distribution, Scale-inv-\chi^2(\nu, s^2), which is inverse-gamma with \alpha = \nu / 2 and \beta = \nu s^2 / 2; this form is common in modern software implementations such as Stan for its interpretability in terms of \nu and s^2.
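
A minimal sketch for evaluating this joint density numerically, composing scipy.stats.norm and scipy.stats.invgamma exactly as in the factorization above (the function name and parameter values are illustrative):
python
import numpy as np
from scipy.stats import norm, invgamma

def nig_logpdf(mu, sigma2, mu0, lam, alpha, beta):
    """Joint log-density of (mu, sigma2) under NIG(mu0, lam, alpha, beta)."""
    # Conditional normal: mu | sigma2 ~ N(mu0, sigma2 / lam)
    log_normal = norm.logpdf(mu, loc=mu0, scale=np.sqrt(sigma2 / lam))
    # Marginal inverse-gamma: sigma2 ~ IG(alpha, scale=beta)
    log_ig = invgamma.logpdf(sigma2, a=alpha, scale=beta)
    return log_normal + log_ig

print(np.exp(nig_logpdf(0.5, 1.2, mu0=0.0, lam=1.0, alpha=2.0, beta=3.0)))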

Cumulative Distribution Function

The cumulative distribution function (CDF) of the normal-inverse-gamma distribution, which jointly models a mean parameter \mu and variance parameter \sigma^2, is defined as F(x, s) = \int_{-\infty}^{x} \int_{0}^{s} f(u, v) \, dv \, du, where f(u, v) denotes the joint probability density function of the distribution, supported on \mu \in (-\infty, \infty) and \sigma^2 \in (0, \infty). This represents the probability P(\mu \leq x, \sigma^2 \leq s). The joint PDF f(u, v) takes the form of a normal density for \mu conditional on \sigma^2 = v, multiplied by an inverse-gamma density for v, with hyperparameters typically denoted as location m, precision scalar k > 0, shape \nu/2 > 0, and scale s_0^2/2 > 0: f(u, v) = \frac{(k/ (2\pi v))^{1/2} \exp\left\{ -\frac{k}{2v} (u - m)^2 \right\} \cdot (s_0^2/2)^{\nu/2} v^{-(\nu/2 + 1)} \exp\left( - \frac{s_0^2}{2v} \right)}{\Gamma(\nu/2)}. Unlike the joint PDF, the normal-inverse-gamma CDF lacks a closed-form expression in terms of elementary functions or standard special functions, due to the complexity of integrating the product of the conditional normal and marginal inverse-gamma components over the rectangular region. Evaluation therefore requires numerical approximation. Common computational approaches include quadrature methods for deterministic integration or Monte Carlo simulation, often via Markov chain Monte Carlo (MCMC) sampling from the joint distribution to estimate the enclosed probability mass empirically. Such methods are implemented in statistical libraries, where the CDF is computed by numerical integration over the parameters; software packages provide functions to evaluate the CDF at specified points (x, s), including with hyperparameters obtained from conjugate posterior updates. In special cases, the joint CDF simplifies by conditioning or marginalizing. Fixing \sigma^2 = s, the conditional distribution of \mu \mid \sigma^2 = s is normal with mean m and variance s/k, so the conditional CDF reduces to the standard normal CDF \Phi\left( \sqrt{k/s}\, (x - m) \right). Fixing \mu = x, the conditional distribution of \sigma^2 given \mu = x is inverse-gamma with shape (\nu + 1)/2 and scale (s_0^2 + k(x - m)^2)/2, and its CDF at s is F(s) = \frac{\Gamma\left( \frac{\nu + 1}{2}, \frac{s_0^2 + k(x - m)^2}{2s} \right)}{\Gamma\left( \frac{\nu + 1}{2} \right)}, where \Gamma(\cdot, \cdot) is the upper incomplete gamma function. This joint CDF is used for computing multidimensional tail probabilities, such as 1 - F(x, s), which inform tests on both mean and variance parameters simultaneously, as seen in confirmatory Bayesian designs incorporating historical data.
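
A hedged sketch of the one-dimensional quadrature approach: the joint CDF can be written as F(x, s) = \int_0^s \Phi\big(\sqrt{\lambda/v}\,(x - \mu_0)\big)\, \text{IG}(v \mid \alpha, \beta)\, dv, which the following Python fragment evaluates with scipy.integrate.quad (function and variable names are illustrative):
python
import numpy as np
from scipy import integrate
from scipy.stats import norm, invgamma

def nig_cdf(x, s, mu0, lam, alpha, beta):
    """P(mu <= x, sigma^2 <= s) under NIG(mu0, lam, alpha, beta)."""
    def integrand(v):
        # Conditional CDF of mu given sigma^2 = v, times the IG density of v
        return norm.cdf(x, loc=mu0, scale=np.sqrt(v / lam)) * \
               invgamma.pdf(v, a=alpha, scale=beta)
    value, _ = integrate.quad(integrand, 0.0, s)
    return value

print(nig_cdf(x=0.5, s=2.0, mu0=0.0, lam=1.0, alpha=2.0, beta=3.0))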

Properties

Marginal and Conditional Distributions

The normal-inverse-gamma distribution, denoted as NIG(μ₀, λ, α, β), specifies a joint distribution over the mean μ and variance σ², where the conditional distribution of μ given σ² is normal, μ | σ² ∼ N(μ₀, σ² / λ), and the marginal distribution of σ² is inverse-gamma, σ² ∼ IG(α, β). The marginal distribution of μ is obtained by integrating the joint density over σ². The joint density is p(μ, σ²) ∝ (σ²)^{-1/2} \exp\left( -\frac{λ (μ - μ₀)^2}{2 σ²} \right) (σ²)^{-α-1} \exp\left( -\frac{β}{σ²} \right), where the constant of proportionality includes the normalization for the normal and inverse-gamma components. Integrating with respect to σ² yields a density proportional to \left[ β + \frac{λ (μ - μ₀)^2}{2} \right]^{-(α + 1/2)}, which is recognized as the kernel of a non-standardized Student-t distribution with location μ₀, degrees of freedom 2α, and scale \sqrt{\frac{β}{α λ}}. Thus, μ ∼ t_{2α}\left(μ₀, \sqrt{\frac{β}{α λ}}\right). This result follows from collecting terms in the exponent and recognizing the integrand as the kernel of an inverse-gamma density with updated shape α + 1/2 and scale β + \frac{λ (μ - μ₀)^2}{2}, whose normalizing constant supplies the power-law form above (a numerical check appears below). The marginal distribution of σ² remains unchanged from its component in the hierarchical specification, σ² ∼ IG(α, β), since integrating the conditional density of μ | σ² over μ contributes only a factor of one, independent of σ². The conditional distribution μ | σ² is directly given by the hierarchical structure as μ | σ² ∼ N(μ₀, σ² / λ), a normal distribution centered at the prior location μ₀ with precision λ / σ². The conditional distribution σ² | μ is derived from the joint density via p(σ² | μ) ∝ p(μ | σ²) p(σ²), which simplifies to an inverse-gamma form: σ² | μ ∼ IG\left(α + \frac{1}{2}, β + \frac{λ (μ - μ₀)^2}{2}\right). This arises from combining the exponents in the joint density, yielding (σ²)^{-(α + 3/2)} \exp\left( -\frac{β + \frac{λ (μ - μ₀)^2}{2}}{σ²} \right) up to normalization, and recognizing the standard inverse-gamma kernel. Equivalently, this can be expressed as a scaled inverse-chi-squared distribution with degrees of freedom 2α + 1 and scale parameter \frac{2\left(β + \frac{λ (μ - μ₀)^2}{2}\right)}{2α + 1}. The half-integer shift in the shape parameter means the conditional has a non-integer shape whenever α is integer-valued.
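
A brief numerical check of the marginal-t result, comparing the density obtained by integrating the joint over σ² against scipy.stats.t with the parameters derived above (the integration point and parameter values are illustrative):
python
import numpy as np
from scipy import integrate
from scipy.stats import norm, invgamma, t

mu0, lam, alpha, beta = 0.0, 2.0, 3.0, 4.0
mu_val = 1.3  # point at which to compare the marginal density of mu

# Marginal density of mu by integrating the joint density over sigma^2
integrand = lambda v: norm.pdf(mu_val, mu0, np.sqrt(v / lam)) * \
                      invgamma.pdf(v, a=alpha, scale=beta)
numeric, _ = integrate.quad(integrand, 0.0, np.inf)

# Closed-form marginal: Student-t with 2*alpha df, location mu0, scale sqrt(beta/(alpha*lam))
analytic = t.pdf(mu_val, df=2 * alpha, loc=mu0, scale=np.sqrt(beta / (alpha * lam)))
print(numeric, analytic)  # the two values should agree closely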

Moments and Arithmetic Operations

The normal-inverse-gamma distribution, parameterized as (\mu, \sigma^2) \sim \text{NIG}(\mu_0, \lambda, \alpha, \beta), exhibits the following key moments. The expected value of the mean parameter \mu is \mathbb{E}[\mu] = \mu_0, provided \alpha > 1/2 (so that the marginal Student-t has more than one degree of freedom). The variance of \mu is \text{Var}(\mu) = \frac{\beta}{\lambda (\alpha - 1)} for \alpha > 1. The expected value of the variance parameter \sigma^2 is \mathbb{E}[\sigma^2] = \frac{\beta}{\alpha - 1} for \alpha > 1. Higher moments of \sigma^2 follow from its marginal inverse-gamma distribution and can be expressed using the gamma function: for integer k > 0 with \alpha > k, \mathbb{E}[(\sigma^2)^k] = \frac{\beta^k \Gamma(\alpha - k)}{\Gamma(\alpha)}. The distribution is a member of the exponential family, with natural parameters derived from \alpha, \beta, \mu_0, and \lambda, and sufficient statistics including \mu / \sigma^2, \mu^2 / \sigma^2, \log \sigma^2, and 1 / \sigma^2. This structure facilitates analytical computations for certain properties. The information entropy of the normal-inverse-gamma distribution, which quantifies its uncertainty, is given by H = \frac{1}{2} + \alpha + \frac{1}{2} \log\frac{2\pi}{\lambda} + \frac{3}{2} \log \beta + \log \Gamma(\alpha) - \left(\alpha + \frac{3}{2}\right) \psi(\alpha), where \psi denotes the digamma function; the expression combines logarithmic and digamma terms in the parameters. A closed-form expression for the Kullback-Leibler divergence between two normal-inverse-gamma distributions exists, computed as the expectation under the first distribution of the log-ratio of their densities; it incorporates gamma, digamma, and logarithmic terms in the parameters, analogous to the form for the related normal-gamma distribution. Regarding arithmetic operations, the distribution of the sum of independent normal-inverse-gamma variates—corresponding to the joint prior on means and variances for summed normal random variables—is not a normal-inverse-gamma in closed form, requiring convolutions or numerical methods for exact evaluation; approximations such as moment matching are commonly employed in Bayesian analyses involving multiple components. For scaling, consider a transformation to (k\mu, k^2 \sigma^2) for k > 0, which arises in rescaling normal variates under the prior. This yields another normal-inverse-gamma distribution with adjusted parameters: location k \mu_0, precision scale \lambda, shape \alpha, and scale k^2 \beta. The scaling property of the inverse-gamma marginal for \sigma^2 under multiplication by k^2 directly supports this adjustment; a numerical verification of the basic moment formulas is sketched below.
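
A small Monte Carlo check of these moment formulas, using the conditional sampling scheme described later in the Sampling Methods section (parameter values and sample size are illustrative):
python
import numpy as np
from scipy.stats import norm, invgamma

rng = np.random.default_rng(0)
mu0, lam, alpha, beta = 1.0, 2.0, 3.0, 4.0
n = 200_000

# Draw (mu, sigma^2) jointly: sigma^2 ~ IG(alpha, beta), mu | sigma^2 ~ N(mu0, sigma^2/lam)
sigma2 = invgamma.rvs(a=alpha, scale=beta, size=n, random_state=rng)
mu = norm.rvs(loc=mu0, scale=np.sqrt(sigma2 / lam), random_state=rng)

print(mu.mean(), mu0)                           # E[mu] = mu0
print(mu.var(), beta / (lam * (alpha - 1)))     # Var(mu) = beta / (lam (alpha - 1))
print(sigma2.mean(), beta / (alpha - 1))        # E[sigma^2] = beta / (alpha - 1)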

Inference and Estimation

Conjugacy and Posterior Updates

The normal-inverse-gamma distribution is a conjugate prior for the parameters of a normal likelihood when both the mean \mu and variance \sigma^2 are unknown. For independent and identically distributed observations x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2), if the prior is (\mu, \sigma^2) \sim \mathrm{NIG}(\mu_0, \kappa, \alpha, \beta), then the posterior distribution is also normal-inverse-gamma: (\mu, \sigma^2 \mid \mathbf{x}) \sim \mathrm{NIG}(\mu_n, \kappa_n, \alpha_n, \beta_n). This conjugacy ensures that posterior inference remains analytically tractable within the same parametric family. The posterior parameters are updated as follows: \begin{align*} \kappa_n &= \kappa + n, \\ \mu_n &= \frac{\kappa \mu_0 + n \bar{x}}{\kappa_n}, \\ \alpha_n &= \alpha + \frac{n}{2}, \\ \beta_n &= \beta + \frac{n s^2}{2} + \frac{\kappa n (\bar{x} - \mu_0)^2}{2 \kappa_n}, \end{align*} where \bar{x} = n^{-1} \sum_{i=1}^n x_i is the sample mean and s^2 = n^{-1} \sum_{i=1}^n (x_i - \bar{x})^2 is the (biased) sample variance. These updates reflect the incorporation of data evidence: \kappa_n accumulates effective sample size, \mu_n weights the prior mean against the sample mean, \alpha_n increases the shape with the observation count, and \beta_n augments the scale with the data dispersion and the prior-data discrepancy. This conjugacy arises from Bayes' theorem, where the posterior kernel is the product of the likelihood and prior densities. The normal likelihood expands to p(\mathbf{x} \mid \mu, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^n (x_i - \bar{x})^2 + n (\bar{x} - \mu)^2 \right] \right), and multiplying by the normal-inverse-gamma prior—which has a kernel involving \exp\left( -\frac{\kappa (\mu - \mu_0)^2}{2 \sigma^2} - \frac{\beta}{\sigma^2} \right) (\sigma^2)^{-(\alpha + 1)}—yields terms that complete the square in \mu and match the inverse-gamma form for \sigma^2, confirming the posterior family. The update rules support sequential Bayesian inference, where parameters evolve incrementally with each new observation: starting from the prior (\mu_0, \kappa, \alpha, \beta), adding the first datum updates to (\mu_1, \kappa + 1, \alpha + 1/2, \beta_1), and subsequent points apply the same formulas with n incremented by 1, enabling efficient online learning without refitting the entire dataset (see the sketch below). The choice of prior parameters \kappa and \alpha can enhance robustness to outliers in the data. A larger \kappa assigns greater weight to the prior mean \mu_0 in \mu_n, mitigating the impact of aberrant observations on the posterior mean. Meanwhile, \alpha determines the degrees of freedom 2\alpha_n of the posterior predictive distribution, a Student's t-distribution; a smaller initial \alpha yields heavier tails post-update, providing greater downweighting of outliers compared to the lighter tails of a normal predictive.
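
A minimal sketch of these updates in Python (function names and data are illustrative): the batch form applies the formulas above directly, and applying it one observation at a time reproduces the same posterior, illustrating the sequential interpretation.
python
import numpy as np

def nig_posterior(x, mu0, kappa, alpha, beta):
    """Batch conjugate update of NIG(mu0, kappa, alpha, beta) given data x."""
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    kappa_n = kappa + n
    mu_n = (kappa * mu0 + n * xbar) / kappa_n
    alpha_n = alpha + n / 2
    beta_n = beta + 0.5 * np.sum((x - xbar) ** 2) \
             + kappa * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

x = np.array([2.1, 1.8, 2.6, 2.2])
batch = nig_posterior(x, 0.0, 1.0, 2.0, 3.0)

# Sequential updating: feed observations one at a time
params = (0.0, 1.0, 2.0, 3.0)
for xi in x:
    params = nig_posterior([xi], *params)

print(batch)
print(params)  # matches the batch result up to floating-point error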

Maximum Likelihood Estimation

The maximum likelihood estimation (MLE) for the parameters of the normal-inverse-gamma (NIG) distribution, denoted as \mu_0, \kappa, \alpha, and \beta, proceeds from a sample of n independent and identically distributed observations (\mu_i, \sigma_i^2)_{i=1}^n drawn from the joint NIG distribution. The likelihood function is given by the product of the individual joint densities: L(\mu_0, \kappa, \alpha, \beta \mid \{(\mu_i, \sigma_i^2)\}) = \prod_{i=1}^n \mathcal{N}(\mu_i \mid \mu_0, \sigma_i^2 / \kappa) \cdot \text{IG}(\sigma_i^2 \mid \alpha, \beta), where \mathcal{N}(\cdot \mid m, v) is the normal density with mean m and variance v, and \text{IG}(\cdot \mid a, b) is the inverse-gamma density with shape a and scale b. To obtain the MLE (\hat{\mu}_0, \hat{\kappa}, \hat{\alpha}, \hat{\beta}), one maximizes the log-likelihood \ell(\mu_0, \kappa, \alpha, \beta) = \sum_{i=1}^n \log \mathcal{N}(\mu_i \mid \mu_0, \sigma_i^2 / \kappa) + \sum_{i=1}^n \log \text{IG}(\sigma_i^2 \mid \alpha, \beta). This involves solving the score equations obtained by setting the partial derivatives \partial \ell / \partial \theta = 0 for each parameter \theta \in \{\mu_0, \kappa, \alpha, \beta\}. However, due to the coupling between parameters—particularly through the conditional variance term \sigma_i^2 / \kappa and the inverse-gamma components—no closed-form solutions exist for the MLE. Numerical optimization techniques, such as gradient-based methods, are required to solve these equations. Under standard regularity conditions—such as the parameter space being an open subset of \mathbb{R}^4 with positive \kappa, \alpha, \beta, the existence of finite moments, and identifiability—the MLE is consistent (i.e., \hat{\theta} \to_p \theta_0 as n \to \infty) and asymptotically normal (\sqrt{n} (\hat{\theta} - \theta_0) \to_d \mathcal{N}(0, \mathcal{I}(\theta_0)^{-1}), where \mathcal{I}(\theta_0) is the Fisher information matrix). These properties follow from general MLE theory for regular parametric models. In relation to estimation in the normal model itself, the MLE of \mu and \sigma^2 from observed data y_i \sim \mathcal{N}(\mu, \sigma^2) simplifies to the sample mean \bar{y} and the biased sample variance s^2 = \sum (y_i - \bar{y})^2 / n, respectively; the NIG distribution extends this framework as a joint prior on (\mu, \sigma^2) in Bayesian settings, where frequentist MLE of its hyperparameters (sketched below) contrasts with the simpler closed-form conjugate posterior updates.
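
A hedged sketch of such a numerical MLE using scipy.optimize.minimize on the negative log-likelihood, with positivity of \kappa, \alpha, \beta enforced by optimizing on the log scale (the simulated data and starting values are illustrative assumptions):
python
import numpy as np
from scipy.stats import norm, invgamma
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulate (mu_i, sigma2_i) pairs from a known NIG for illustration
true = dict(mu0=1.0, kappa=2.0, alpha=3.0, beta=4.0)
sigma2 = invgamma.rvs(a=true["alpha"], scale=true["beta"], size=2000, random_state=rng)
mu = norm.rvs(loc=true["mu0"], scale=np.sqrt(sigma2 / true["kappa"]), random_state=rng)

def neg_loglik(params):
    mu0, log_kappa, log_alpha, log_beta = params
    kappa, alpha, beta = np.exp(log_kappa), np.exp(log_alpha), np.exp(log_beta)
    ll = norm.logpdf(mu, loc=mu0, scale=np.sqrt(sigma2 / kappa)).sum() \
         + invgamma.logpdf(sigma2, a=alpha, scale=beta).sum()
    return -ll

start = np.array([0.0, 0.0, 0.0, 0.0])  # mu0 = 0, kappa = alpha = beta = 1
res = minimize(neg_loglik, start, method="Nelder-Mead")
mu0_hat = res.x[0]
kappa_hat, alpha_hat, beta_hat = np.exp(res.x[1:])
print(mu0_hat, kappa_hat, alpha_hat, beta_hat)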

Computation and Relations

Sampling Methods

Generating random variates from the normal-inverse-gamma (NIG) distribution is crucial for simulations in Bayesian analysis, particularly when the NIG serves as a conjugate prior for the mean and variance of a normal likelihood. The most efficient and straightforward method is conditional sampling, which leverages the joint structure of the distribution. First, sample the variance \sigma^2 from an inverse-gamma distribution \text{IG}(\alpha, \beta). Then, conditional on \sigma^2, sample the mean \mu from a normal distribution \mathcal{N}(\mu_0, \sigma^2 / \lambda). This approach is computationally efficient because both the inverse-gamma and normal distributions have well-established random number generators in standard statistical software. An alternative is the marginal-conditional approach, which begins by sampling \mu from its marginal distribution, a Student's t-distribution with 2\alpha degrees of freedom, location \mu_0, and scale \sqrt{\beta / (\alpha \lambda)}. Then, conditional on \mu, sample \sigma^2 from the adjusted inverse-gamma \text{IG}(\alpha + 1/2, \beta + \lambda (\mu - \mu_0)^2 / 2). This method is useful when direct sampling from the t-distribution is preferred or when verifying properties of the marginal. For more complex scenarios, such as hierarchical models where the NIG parameters are themselves random, Markov chain Monte Carlo (MCMC) methods may be employed. In particular, the conditional distributions enable Gibbs sampling, where iterates alternate between sampling \sigma^2 \mid \mu from the inverse-gamma and \mu \mid \sigma^2 from the normal, facilitating exploration of the joint posterior in Bayesian hierarchical models. Software implementations simplify these methods. In Python, recent versions of SciPy provide scipy.stats.normal_inverse_gamma, whose rvs method generates joint samples (\mu, \sigma^2); the conditional approach can also be implemented directly with norm and invgamma, as below.
python
from scipy.stats import norm, invgamma
import numpy as np

def sample_nig(mu0, lambda_, alpha, beta, size=1):
    # sigma^2 ~ IG(alpha, beta), then mu | sigma^2 ~ N(mu0, sigma^2 / lambda)
    sigma2 = invgamma.rvs(a=alpha, scale=beta, size=size)
    mu = norm.rvs(loc=mu0, scale=np.sqrt(sigma2 / lambda_), size=size)
    return mu, sigma2

# Example: 1000 samples
mu_samples, sigma2_samples = sample_nig(mu0=0, lambda_=1, alpha=2, beta=3, size=1000)
In R, no built-in function exists for the joint NIG, but the conditional method can be implemented using base functions or the extraDistr package for inverse-gamma sampling.
r
library(extraDistr)

sample_nig <- function(mu0, lambda, alpha, beta, n = 1) {
  # sigma^2 ~ IG(alpha, beta), then mu | sigma^2 ~ N(mu0, sigma^2 / lambda)
  sigma2 <- rinvgamma(n, alpha, beta)
  mu <- rnorm(n, mu0, sqrt(sigma2 / lambda))
  return(list(mu = mu, sigma2 = sigma2))
}

samples <- sample_nig(mu0 = 0, lambda = 1, alpha = 2, beta = 3, n = 1000)
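
To sanity-check draws from either implementation, the sampled \mu can be compared against its theoretical Student-t marginal and \sigma^2 against the inverse-gamma, in the spirit of the QQ comparison described next. A minimal Python sketch reusing the sample_nig helper defined above (quantile grid and parameter values are illustrative):
python
import numpy as np
from scipy.stats import t, invgamma

mu0, lambda_, alpha, beta = 0.0, 1.0, 2.0, 3.0
mu_s, sig2_s = sample_nig(mu0, lambda_, alpha, beta, size=5000)  # Python helper above

# Compare empirical quantiles with the theoretical marginals
probs = np.linspace(0.05, 0.95, 19)
emp_mu = np.quantile(mu_s, probs)
theo_mu = t.ppf(probs, df=2 * alpha, loc=mu0, scale=np.sqrt(beta / (alpha * lambda_)))
emp_s2 = np.quantile(sig2_s, probs)
theo_s2 = invgamma.ppf(probs, a=alpha, scale=beta)

# Points should lie near the diagonal of a QQ plot; print a quick summary instead
print(np.max(np.abs(emp_mu - theo_mu)), np.max(np.abs(emp_s2 - theo_s2)))
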
Quantile-quantile (QQ) plots of this kind compare the empirical quantiles of the sampled \mu against the theoretical quantiles of the marginal t-distribution, and similarly for \sigma^2 against the inverse-gamma, ensuring the sampler accurately reproduces the target distribution.

Related Distributions

The normal-inverse-Wishart distribution serves as the multivariate extension of the normal-inverse-gamma, providing a conjugate prior for the mean vector and covariance matrix of a multivariate normal distribution in Bayesian models for multidimensional data. This extension replaces the univariate inverse-gamma prior on the variance with an inverse-Wishart prior on the covariance matrix, enabling inference in settings where observations follow a multivariate normal likelihood.

The marginal distribution of the variance parameter in the normal-inverse-gamma distribution is an inverse-gamma distribution, which also arises when integrating out the mean parameter and is commonly used as a conjugate prior for the variance of a normal distribution with known mean. This marginal property facilitates separate inference on the variance component in hierarchical models, such as those in time series analysis with unknown scale.

In hierarchical Bayesian models, integrating out a normal-inverse-gamma prior on the mean and variance of a Gaussian model yields Student-t marginal predictive distributions, and the analogous construction over Gaussian processes yields a Student-t process, offering robustness to outliers compared to the Gaussian process. This connection is particularly useful in spatial statistics and regression, where the heavier tails of the Student-t process model heteroscedasticity or non-Gaussian noise in latent function estimation.

The normal-gamma distribution provides an alternative parameterization to the normal-inverse-gamma, employing a gamma prior on the precision (inverse variance) rather than the variance itself, while maintaining conjugacy for the mean and variance of a normal likelihood. This formulation is equivalent under reparameterization and is often preferred in computational implementations for its direct handling of precision parameters in algorithms like variational Bayes.

Historically, the normal-inverse-gamma distribution builds on earlier conjugate and reference priors for normal parameters, including Jeffreys' non-informative prior, which for the mean and variance of a univariate normal yields an improper density proportional to a power of 1/\sigma and can be obtained as a limiting case of the normal-inverse-gamma family. This link underscores its role in objective Bayesian inference, tracing back to foundational work on invariance-based priors in the mid-20th century.

Post-2010 developments have highlighted the normal-inverse-gamma's utility in variational inference approximations for scalable Bayesian methods, such as in high-dimensional regression and state-space models, where mean-field approximations leverage its conjugacy to enable efficient posterior computations on large datasets. These applications facilitate inference in big data contexts, including variable selection and dynamic modeling, by approximating intractable posteriors with tractable normal-inverse-gamma forms.
