Inverse Gaussian distribution
The Inverse Gaussian distribution, also known as the Wald distribution, is a two-parameter continuous probability distribution supported on the positive real line (0, \infty), with probability density functionf(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right),
where \mu > 0 is the mean parameter and \lambda > 0 is a shape parameter that controls the distribution's precision or skewness.[1] Its mean is \mu, variance is \mu^3 / \lambda, skewness is 3 \sqrt{\mu / \lambda}, and excess kurtosis is 15 \mu / \lambda.[2] Originally derived in 1915 by Erwin Schrödinger as the distribution of the first passage time for a Brownian motion with positive drift to reach a fixed positive level, the distribution was independently obtained around the same time by Marian Smoluchowski in the context of particle diffusion.[3] Maurice Tweedie formalized its statistical properties and proposed the name "inverse Gaussian" in the 1940s and 1950s, highlighting its inverse relationship to the Gaussian distribution via cumulant generating functions and its role as a member of the exponential dispersion family, specifically the Tweedie distributions with power parameter p = 3. This naming reflects the reciprocal connection between the time to cover a unit distance in drifted Brownian motion and the distance covered in unit time.[4] Key properties include its positive skewness, unimodal shape, and closure under certain transformations, such as the reciprocal of an inverse Gaussian random variable following a scaled inverse Gaussian distribution. It belongs to the exponential family, facilitating its use in generalized linear models, and serves as the mixing distribution in normal variance-mean mixtures, which underlies extensions like the normal inverse Gaussian distribution for heavier-tailed data.[5] Applications span reliability engineering, where it models lifetimes of mechanical systems subject to wear; sequential analysis, where it appears as the Wald distribution in sequential probability ratio tests for hypothesis testing; insurance, for claim size modeling with positive skew; and physics, including pollen particle motion in fluids and wind speed distributions in energy forecasting.[3][2] Its tractable moments and sampling methods, such as rejection sampling or the algorithm of Michael, Schucany, and Haas, make it computationally appealing for simulation and inference.[6][7]
Fundamentals
Definition and Parameters
The inverse Gaussian distribution is a two-parameter family of continuous probability distributions defined on the positive real numbers, commonly applied to model positively skewed data, such as first passage times in stochastic processes like Brownian motion with positive drift. Its name reflects an inverse relationship to the Gaussian distribution: the Gaussian describes the distance covered by a drifted Brownian motion in a fixed time, while the inverse Gaussian describes the time required to cover a fixed distance.[8] The distribution is parameterized by two positive real numbers: \mu > 0, which serves as a location-like parameter representing the mean, and \lambda > 0, a shape parameter that influences the dispersion and tail behavior. The support of the distribution is x > 0. The standard probability density function is given by f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), for x > 0, as introduced by Tweedie.[8] The parameter \mu directly equals the mean of the distribution, E[X] = \mu, while the variance is \mathrm{Var}(X) = \mu^3 / \lambda, indicating that larger values of \lambda reduce the relative spread. In the limit of large \lambda, the distribution approximates a normal distribution centered at \mu with mode near \mu, and \lambda governs the heaviness of the right tail, with smaller \lambda leading to greater skewness.
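The defining formulas can be checked numerically. The following sketch (Python with NumPy and SciPy; the parameter values \mu = 2, \lambda = 5 and the helper name ig_pdf are illustrative choices, not part of any standard API) integrates the density to confirm total mass 1, mean \mu, and variance \mu^3 / \lambda.
import numpy as np
from scipy.integrate import quad

def ig_pdf(x, mu, lam):
    # Inverse Gaussian density f(x; mu, lambda) for x > 0
    return np.sqrt(lam / (2 * np.pi * x**3)) * np.exp(-lam * (x - mu)**2 / (2 * mu**2 * x))

mu, lam = 2.0, 5.0
total, _ = quad(ig_pdf, 0, np.inf, args=(mu, lam))                    # should be ~1
mean, _ = quad(lambda x: x * ig_pdf(x, mu, lam), 0, np.inf)           # should be ~mu = 2
var, _ = quad(lambda x: (x - mu)**2 * ig_pdf(x, mu, lam), 0, np.inf)  # should be ~mu^3/lam = 1.6
print(total, mean, var)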
Probability Density Function
The probability density function (PDF) of the inverse Gaussian distribution arises as the distribution of the first hitting time \tau_a = \inf\{t > 0 : X(t) \geq a\} for a Brownian motion with positive drift X(t) = \nu t + \sigma W(t), where \nu > 0 is the drift, \sigma > 0 is the diffusion coefficient, W(t) is standard Brownian motion, and a > 0 is the barrier level. Using the reflection principle for Brownian paths, the PDF of \tau_a is derived as f(t) = \frac{a}{\sigma \sqrt{2\pi t^3}} \exp\left( -\frac{(a - \nu t)^2}{2\sigma^2 t} \right), \quad t > 0. This form reflects the Gaussian nature of Brownian increments combined with the t^{-3/2} scaling from the hitting time geometry.[9] Reparameterizing via \mu = a / \nu (mean) and \lambda = a^2 / \sigma^2 (shape) yields the standard PDF f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), \quad x > 0, with \mu > 0 and \lambda > 0. This reparameterization highlights the inverse relationship to Gaussian processes, as originally studied in this context.[8] The PDF is unimodal and positively skewed, with the mode located at m = \mu \left[ \sqrt{1 + \frac{9\mu^2}{4\lambda^2}} - \frac{3\mu}{2\lambda} \right], which lies strictly between 0 and \mu. For fixed \mu, increasing \lambda shifts the mode rightward toward \mu and narrows the distribution, reflecting the reduced variance \mu^3 / \lambda; conversely, small \lambda produces a sharp peak near 0 with a heavy right tail. Graphically, the PDF resembles a right-tailed bell curve, starting at 0, rising sharply to the mode, and decaying gradually, with the tail becoming more pronounced as \lambda decreases relative to \mu.[10][11] Asymptotically, as x \to 0^+, the PDF behaves like x^{-3/2} modulated by \exp(-\lambda / (2x)), leading to rapid decay to 0 despite the singularity in the prefactor. For large x, it exhibits exponential decay \sim \exp(-\lambda x / (2\mu^2)), characteristic of light-tailed behavior on the right. The density belongs to the exponential family with sufficient statistics x and 1/x (see the exponential family form below); as a consequence, the log-likelihood is strictly concave in the natural parameters and maximum likelihood estimation yields closed-form estimators.
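As a sanity check on the mode formula, one can compare it with a direct numerical maximization of the log-density; the snippet below is a minimal sketch with arbitrary parameter values (\mu = 1, \lambda = 0.8).
import numpy as np
from scipy.optimize import minimize_scalar

def ig_logpdf(x, mu, lam):
    # log of the inverse Gaussian density
    return 0.5 * np.log(lam / (2 * np.pi * x**3)) - lam * (x - mu)**2 / (2 * mu**2 * x)

mu, lam = 1.0, 0.8
mode_formula = mu * (np.sqrt(1 + 9 * mu**2 / (4 * lam**2)) - 3 * mu / (2 * lam))
res = minimize_scalar(lambda x: -ig_logpdf(x, mu, lam), bounds=(1e-9, 10 * mu), method="bounded")
print(mode_formula, res.x)   # the two values agree to numerical precision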
Cumulative Distribution Function
The cumulative distribution function (CDF) of the inverse Gaussian distribution IG(μ, λ) with mean parameter μ > 0 and shape parameter λ > 0 is F(x; \mu, \lambda) = \Phi\left( \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} - 1 \right) \right) + e^{2\lambda / \mu} \Phi\left( - \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} + 1 \right) \right), for x > 0, where \Phi denotes the CDF of the standard normal distribution, and F(x; \mu, \lambda) = 0 for x \leq 0. This closed-form expression, which expresses the CDF in terms of the normal CDF, was first derived by Shuster (1968) using a transformation of the integral form of the CDF. The CDF can be obtained by direct integration of the corresponding probability density function (PDF), f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), for x > 0. The integration involves a substitution, such as letting z = \sqrt{\frac{\lambda (x - \mu)^2}{2 \mu^2 x}}, which simplifies the exponent and leads to terms that integrate to the normal CDF after completing the square and handling the boundary contributions. Alternatively, the CDF arises naturally from the connection to stochastic processes: the inverse Gaussian distribution models the first passage time to a fixed level in a Brownian motion with positive drift, where the CDF can be derived using the reflection principle for diffusion processes. The survival function, or complementary CDF, is S(x; \mu, \lambda) = 1 - F(x; \mu, \lambda), which simplifies to S(x; \mu, \lambda) = \Phi\left( - \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} - 1 \right) \right) - e^{2\lambda / \mu} \Phi\left( - \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} + 1 \right) \right). This form highlights the tail behavior: the shape parameter λ controls the precision, and low λ values result in a heavy right tail due to the increased variance \mu^3 / \lambda, making large x more probable relative to the mean μ.[12] The quantile function, defined as the inverse F^{-1}(p; \mu, \lambda) for 0 < p < 1, lacks a closed-form solution and requires numerical methods for evaluation, such as bisection or Newton-Raphson iteration applied to the CDF equation.[13]
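The closed-form CDF and the numerical quantile inversion can be sketched directly; the following example (Python/SciPy, with hypothetical helper names ig_cdf and ig_ppf and arbitrary parameter values) uses the naive formula, without the log-scale safeguards discussed later for large \lambda/\mu.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def ig_cdf(x, mu, lam):
    # Shuster's closed form in terms of the standard normal CDF
    return (norm.cdf(np.sqrt(lam / x) * (x / mu - 1))
            + np.exp(2 * lam / mu) * norm.cdf(-np.sqrt(lam / x) * (x / mu + 1)))

def ig_ppf(p, mu, lam, hi=1e6):
    # Quantile by root-finding, since no closed-form inverse exists
    return brentq(lambda x: ig_cdf(x, mu, lam) - p, 1e-12, hi)

mu, lam = 1.5, 3.0
x90 = ig_ppf(0.9, mu, lam)
print(x90, ig_cdf(x90, mu, lam))   # round trip recovers p = 0.9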
Moments and Characteristics
Mean, Variance, and Skewness
The mean of an inverse Gaussian random variable X \sim \text{IG}(\mu, \lambda) is E[X] = \mu, where \mu > 0 is the mean parameter.[14] The variance is \text{Var}(X) = \mu^3 / \lambda, with \lambda > 0 serving as the shape parameter that controls dispersion.[14] These moments can be derived by direct integration of the probability density function or via the moment-generating function M(t) = \exp\left(\frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right)\right) for t < \lambda/(2\mu^2).[14] The skewness is \gamma_1 = 3 \sqrt{\mu / \lambda}, which is always positive and increases as the ratio \mu / \lambda grows, reflecting the distribution's inherent right-tailed asymmetry.[14] The excess kurtosis is \gamma_2 = 15 \mu / \lambda, indicating leptokurtosis that becomes more pronounced with larger \mu / \lambda, meaning heavier tails than a normal distribution.[14] These moments capture the positive skew and peakedness typical of first passage times in Brownian motion with positive drift, where longer times are more probable due to the process's diffusive nature, leading to applications in reliability and survival analysis.[8] The mean and variance also underpin method-of-moments estimation for the parameters.[14]
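These closed forms can be compared against library output; the sketch below relies on SciPy's invgauss, whose shape/scale convention maps to this article's parameters via shape = \mu/\lambda and scale = \lambda (the example values are arbitrary).
import numpy as np
from scipy.stats import invgauss

mu, lam = 2.0, 8.0
m, v, s, k = invgauss.stats(mu / lam, scale=lam, moments="mvsk")   # mean, variance, skewness, excess kurtosis
print(m, v, s, k)
print(mu, mu**3 / lam, 3 * np.sqrt(mu / lam), 15 * mu / lam)       # closed forms from the text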
Characteristic Function
The characteristic function of a random variable X following an inverse Gaussian distribution with parameters \mu > 0 and \lambda > 0, denoted X \sim IG(\mu, \lambda), is given by \psi(t) = \mathbb{E}[e^{itX}] = \exp\left( \frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}} \right) \right), where i is the imaginary unit and t \in \mathbb{R}.[14][3] This expression can be derived as the Fourier transform of the probability density function of the inverse Gaussian distribution, which involves integrating \int_0^\infty e^{itx} f(x; \mu, \lambda) \, dx where f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2\mu^2 x} \right) for x > 0.[15] Alternatively, it arises from the representation of the inverse Gaussian as the first passage time distribution for a Brownian motion with positive drift \nu = 1/\mu and variance parameter \sigma^2 = 1/\lambda hitting a unit barrier (a = 1), leveraging the known characteristic function for such hitting times.[3][15] The logarithm of the characteristic function, \log \psi(t), serves as the cumulant-generating function, from which all moments of the distribution can be obtained by successive differentiation with respect to t and evaluation at t = 0. For instance, the first cumulant yields the mean \mu, and the second cumulant yields the variance \mu^3 / \lambda.[14][3] The form of \psi(t) implies closure under convolution for sums of independent and identically distributed inverse Gaussian random variables, and more generally whenever the ratio \mu^2 / \lambda is common across summands (as discussed in the summation property below), since the characteristic function of the sum is the product of the individual characteristic functions and remains of inverse Gaussian form.[15] Unlike the characteristic function of the normal distribution, which is \exp(i\mu t - \frac{1}{2}\sigma^2 t^2) and symmetric, the inverse Gaussian's \psi(t) exhibits asymmetry reflective of its positive support and skewness, with the square root term introducing complex branching; it shares infinite divisibility with the gamma distribution, facilitating Lévy process representations.[3][15]
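The cumulant interpretation can be illustrated numerically: finite differences of the cumulant generating function K(t) = \frac{\lambda}{\mu}\left(1 - \sqrt{1 - 2\mu^2 t/\lambda}\right) at t = 0 recover the mean and variance. The snippet below is a small sketch with arbitrary parameter values and step size.
import numpy as np

def cgf(t, mu, lam):
    # Cumulant generating function, valid for t < lam / (2 mu^2)
    return (lam / mu) * (1 - np.sqrt(1 - 2 * mu**2 * t / lam))

mu, lam, h = 1.5, 4.0, 1e-5
k1 = (cgf(h, mu, lam) - cgf(-h, mu, lam)) / (2 * h)                      # first cumulant
k2 = (cgf(h, mu, lam) - 2 * cgf(0.0, mu, lam) + cgf(-h, mu, lam)) / h**2 # second cumulant
print(k1, mu)            # ~1.5, the mean
print(k2, mu**3 / lam)   # ~0.84, the variance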
Distributional Properties
Summation of Independent Variables
The sum of n independent and identically distributed random variables X_1, \dots, X_n, each following an inverse Gaussian distribution \mathrm{IG}(\mu, \lambda), itself follows an inverse Gaussian distribution with updated parameters \mathrm{IG}(n\mu, n^2\lambda).[16] This reproductive property arises from the multiplicative nature of characteristic functions for independent random variables. Specifically, the characteristic function of an \mathrm{IG}(\mu, \lambda) random variable is given by \phi(t) = \exp\left\{ \frac{\lambda}{\mu} \left( 1 - \sqrt{1 - \frac{2 i t \mu^2}{\lambda}} \right) \right\}, and for the sum, the product of n such functions simplifies to the characteristic function of \mathrm{IG}(n\mu, n^2\lambda), confirming the closed-form result.[16] For the more general case of independent but non-identically distributed inverse Gaussian random variables X_j \sim \mathrm{IG}(\mu_j, \lambda_j) for j = 1, \dots, n, the sum \sum_{j=1}^n X_j follows an inverse Gaussian distribution if and only if the ratio \mu_j^2 / \lambda_j is identical across all j, equal to some constant c > 0.[16] Under this condition, the sum is distributed as \mathrm{IG}\left( \sum_{j=1}^n \mu_j, \left( \sum_{j=1}^n \mu_j \right)^2 / c \right). The derivation follows from the product of the individual characteristic functions, which takes the form of an inverse Gaussian characteristic function precisely when the terms \mu_j^2 / \lambda_j align to yield a common structure inside the square root. Without this constancy condition, the probability density function of the sum is obtained via convolution of the individual densities, which generally lacks a simple closed form.[16] The inverse Gaussian distribution is infinitely divisible, meaning that for any n, it can be represented as the sum of n independent and identically distributed positive random variables. This property underpins its utility in modeling processes requiring decomposition into finer components, such as in renewal theory, where sums of inter-arrival times correspond to cumulative waiting times in diffusion-based renewal processes.
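The reproductive property is easy to verify by simulation; the sketch below (Python/SciPy, arbitrary values n = 5, \mu = 1, \lambda = 2, again using SciPy's shape = \mu/\lambda, scale = \lambda convention) compares sums of independent draws with the claimed \mathrm{IG}(n\mu, n^2\lambda) law via a Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import invgauss, kstest

rng = np.random.default_rng(0)
mu, lam, n = 1.0, 2.0, 5
# Sums of n iid IG(mu, lam) draws
sums = invgauss.rvs(mu / lam, scale=lam, size=(100_000, n), random_state=rng).sum(axis=1)
# Compare against IG(n*mu, n^2*lam)
target = invgauss(n * mu / (n**2 * lam), scale=n**2 * lam)
stat, pvalue = kstest(sums, target.cdf)
print(stat, pvalue)   # a large p-value is consistent with the stated closure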
Scaling and Location Transformations
The Inverse Gaussian distribution exhibits specific behavior under affine transformations, which is useful for understanding its flexibility in modeling scaled or shifted phenomena. Consider a random variable X \sim \text{IG}(\mu, \lambda), where \mu > 0 is the mean parameter and \lambda > 0 is the shape parameter, with probability density function f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), \quad x > 0. For scaling, let Y = cX where c > 0. The mean of Y is E[Y] = c\mu, and the variance is \text{Var}(Y) = c^2 \cdot \mu^3 / \lambda = (c\mu)^3 / (c\lambda), which matches the form for an Inverse Gaussian with updated parameters. To derive this via PDF substitution, the density of Y is g(y) = \frac{1}{c} f\left(\frac{y}{c}; \mu, \lambda\right) = \sqrt{\frac{\lambda}{2\pi (y/c)^3}} \exp\left( -\frac{\lambda ((y/c) - \mu)^2}{2 \mu^2 (y/c)} \right) \cdot \frac{1}{c}. Simplifying the exponent: \lambda ((y/c) - \mu)^2 / (2 \mu^2 (y/c)) = (c\lambda) (y - c\mu)^2 / (2 (c\mu)^2 y ), and the prefactor becomes \sqrt{(c\lambda)/(2\pi y^3)}, yielding g(y) = \sqrt{(c\lambda)/(2\pi y^3)} \exp\left( -(c\lambda) (y - c\mu)^2 / (2 (c\mu)^2 y ) \right). Thus, Y \sim \text{IG}(c\mu, c\lambda).[17] For location transformations, consider Z = X + a where a > 0. The Inverse Gaussian family is not closed under such shifts: the density of Z is f_Z(z) = f(z - a; \mu, \lambda) for z > a, but this does not simplify to another Inverse Gaussian density because of the altered support and the 1/x terms in the original PDF. Approximations, such as saddlepoint methods or numerical integration, are typically employed for practical computations in shifted scenarios.[17] A single-parameter form arises when fixing the variance \sigma^2, leading to \lambda = \mu^3 / \sigma^2. Here, the distribution is parameterized solely by \mu, with the shape adjusted to maintain constant variance, which is useful in applications requiring variance stabilization, such as certain stochastic processes. Alternatively, setting \mu = 1 yields the standard Wald distribution, a one-parameter case varying only \lambda. This form simplifies analysis in contexts like first-passage times.[8] The reciprocal transformation W = 1/X follows a reciprocal Inverse Gaussian distribution, a special case of the generalized Inverse Gaussian distribution with index parameter p = 1/2 (the Inverse Gaussian itself corresponds to p = -1/2). In the limit \mu \to \infty, the Inverse Gaussian reduces to the Lévy distribution, relevant for Brownian motion hitting times without drift (see the zero drift case below). The derivation follows from substituting w = 1/x into the PDF, yielding h(w) = \frac{1}{w^2} f\left(\frac{1}{w}; \mu, \lambda\right) = \sqrt{\frac{\lambda}{2\pi}} w^{-1/2} \exp\left( -\frac{\lambda (1 - \mu w)^2}{2 \mu^2 w} \right), \quad w > 0, which matches the reciprocal form after reparameterization.[18]
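The scaling closure Y = cX \sim \text{IG}(c\mu, c\lambda) can be confirmed by evaluating both sides of the change-of-variables identity; the sketch below uses SciPy's invgauss with the shape = \mu/\lambda, scale = \lambda mapping and arbitrary values of \mu, \lambda, c.
import numpy as np
from scipy.stats import invgauss

mu, lam, c = 1.2, 3.0, 2.5
y = np.linspace(0.05, 10, 7)
# Density of Y = cX via change of variables: (1/c) f(y/c; mu, lam)
f_scaled = invgauss.pdf(y / c, mu / lam, scale=lam) / c
# Density of IG(c*mu, c*lam) evaluated directly
f_direct = invgauss.pdf(y, (c * mu) / (c * lam), scale=c * lam)
print(np.allclose(f_scaled, f_direct))   # True: the family is closed under scaling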
Exponential Family Form
The inverse Gaussian distribution can be expressed as a member of the two-parameter exponential family, providing a canonical parameterization that highlights its sufficient statistics and natural parameters for statistical modeling. The probability density function is rewritten in the form f(x \mid \eta) = h(x) \exp\left( \eta^\top T(x) - A(\eta) \right),
where h(x) = (2\pi x^3)^{-1/2} is the base measure for x > 0, T(x) = (x, 1/x) is the vector of sufficient statistics, \eta = (\eta_1, \eta_2) is the vector of natural parameters with \eta_1 = -\lambda/(2\mu^2) and \eta_2 = -\lambda/2, and A(\eta) = -2\sqrt{\eta_1 \eta_2} - \frac{1}{2} \log(-2\eta_2) is the log-partition function.[19] The natural parameters satisfy \eta_1 = \eta_2 / \mu^2; as (\mu, \lambda) range over (0, \infty)^2, the pair (\eta_1, \eta_2) ranges freely over the negative quadrant, so the two-parameter inverse Gaussian is a full-rank exponential family, while curved exponential subfamilies arise when \lambda is constrained as a function of \mu (for example, by fixing the coefficient of variation) and then require the usual adjustments for curvature in inference.[20] This exponential family representation is particularly useful in generalized linear models (GLMs), where the inverse Gaussian serves as the response distribution with mean \mu > 0 and dispersion parameter \phi = 1/\lambda. The canonical link function is g(\mu) = 1/\mu^2, which directly relates the linear predictor \eta = X^\top \beta to the natural parameter associated with the mean structure, ensuring orthogonality between model components and simplifying iterative estimation algorithms like iteratively reweighted least squares.[21] In this framework, the variance function is V(\mu) = \mu^3, reflecting the distribution's heteroscedasticity. For model diagnostics in inverse Gaussian GLMs, the deviance measures goodness-of-fit as D = \sum_i (y_i - \hat{\mu}_i)^2 / (y_i \hat{\mu}_i^2), which under the null follows an approximate chi-squared distribution scaled by the dispersion, with degrees of freedom reduced by the number of estimated parameters.[22] Deviance residuals are defined as r_{D,i} = \operatorname{sign}(y_i - \hat{\mu}_i) \sqrt{ (y_i - \hat{\mu}_i)^2 / (y_i \hat{\mu}_i^2) }, providing a signed measure of discrepancy useful for residual plots and outlier detection, while Pearson residuals r_{P,i} = (y_i - \hat{\mu}_i) / \sqrt{\hat{\phi}\, \hat{\mu}_i^3} assess fit against the variance structure.[22] The exponential family form aids maximum likelihood estimation by concentrating the likelihood through the sufficient statistics T(x) = (x, 1/x), reducing dimensionality for large samples and enabling closed-form updates in GLM fitting.[19] For Bayesian methods, it supports conjugate priors on the natural parameters, facilitating posterior sampling via techniques like variational inference or MCMC, and leverages the family's regularity for asymptotic approximations in credible intervals and hypothesis testing.
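The natural parameterization above can be verified by reassembling the density from h(x), T(x), and A(\eta) and comparing it with a standard implementation; a minimal sketch (arbitrary \mu, \lambda; SciPy convention shape = \mu/\lambda, scale = \lambda) follows.
import numpy as np
from scipy.stats import invgauss

mu, lam = 1.4, 2.2
x = np.linspace(0.1, 6, 5)
eta1, eta2 = -lam / (2 * mu**2), -lam / 2                 # natural parameters
A = -2 * np.sqrt(eta1 * eta2) - 0.5 * np.log(-2 * eta2)   # log-partition function
h = (2 * np.pi * x**3) ** -0.5                            # base measure
pdf_expfam = h * np.exp(eta1 * x + eta2 / x - A)
pdf_direct = invgauss.pdf(x, mu / lam, scale=lam)
print(np.allclose(pdf_expfam, pdf_direct))                # True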
Stochastic Process Connections
Relation to Brownian Motion
The inverse Gaussian distribution emerges naturally in the study of stochastic processes as the probability distribution governing the first passage time of a Brownian motion with positive drift to a fixed positive barrier. Consider a one-dimensional Brownian motion process defined by X_t = \nu t + \sigma W_t, where W_t is a standard Wiener process (with W_0 = 0), \nu > 0 denotes the constant positive drift, \sigma > 0 is the diffusion coefficient, and the process starts at X_0 = 0. The first passage time \tau_a = \inf\{t \geq 0 : X_t = a\} to a barrier level a > 0 follows an inverse Gaussian distribution with mean parameter \mu = a / \nu and shape parameter \lambda = a^2 / \sigma^2.[9] This connection provides a probabilistic foundation for the inverse Gaussian, linking it directly to the dynamics of particles or assets subject to random fluctuations with a systematic trend. The distribution can be derived using the reflection principle adapted for drifted paths or by solving the Kolmogorov forward (Fokker-Planck) equation for the transition density with an absorbing boundary at a. In the reflection approach, the probability of paths reaching a by time t accounts for the drift through an exponential tilting of the no-drift case, yielding the inverse Gaussian density after differentiation of the cumulative distribution. Alternatively, the Kolmogorov equation \partial_t p(x,t) = -\nu \partial_x p(x,t) + \frac{\sigma^2}{2} \partial_{xx} p(x,t), with absorbing conditions p(a,t) = 0 and initial condition p(x,0) = \delta(x), integrates to the same result via separation of variables or Laplace transforms.[9][23] The historical origins trace back to Erwin Schrödinger's 1915 derivation of this first passage time density in the context of gravitational fall experiments modeled by drifted diffusion, predating the formal naming of the distribution. Schrödinger's work established the explicit form of the density, highlighting its role in quantifying the time for a particle to traverse a distance under combined deterministic and random forces. This seminal contribution was independently corroborated by Marian Smoluchowski in the same year, solidifying the inverse Gaussian's place in early stochastic modeling. In broader diffusion settings, the inverse Gaussian generalizes to first passage times across absorbing barriers in processes with linear drift, such as in barrier option pricing or neuronal firing models, where the barrier represents an absorption threshold. For intuitive understanding, simulations of drifted Brownian paths reveal that while some trajectories cross the barrier quickly due to the positive drift, others wander before hitting, producing the characteristic right-skewed shape of the inverse Gaussian density that matches empirical hitting time histograms.[24]
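The first passage connection lends itself to a simple simulation check: discretize drifted Brownian paths, record the first crossing of the barrier, and compare sample moments of the hitting times with \mu = a/\nu and \mu^3/\lambda. The sketch below uses arbitrary values (\nu = 0.5, \sigma = 1, a = 2) and a crude Euler grid, so some discretization bias is expected.
import numpy as np

rng = np.random.default_rng(1)
nu, sigma, a = 0.5, 1.0, 2.0              # drift, diffusion coefficient, barrier
dt, n_paths, n_steps = 0.01, 2_000, 4_000

# Simulate X_t = nu*t + sigma*W_t on a grid and find the first index where each path reaches a
increments = nu * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
paths = np.cumsum(increments, axis=1)
crossed = paths.max(axis=1) >= a
first_idx = (paths >= a).argmax(axis=1)
times = (first_idx[crossed] + 1) * dt

mu_ig, lam_ig = a / nu, a**2 / sigma**2   # predicted IG(mu, lambda) parameters
print(times.mean(), mu_ig)                # sample mean ~ mu = 4
print(times.var(), mu_ig**3 / lam_ig)     # sample variance ~ mu^3 / lambda = 16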
Zero Drift Case
For a Brownian motion with zero drift and unit variance, the first hitting time to a positive level a > 0 follows the Lévy distribution, which arises as the limiting case of the inverse Gaussian distribution when the drift parameter \nu \to 0^+. This corresponds to letting the mean parameter \mu = a / \nu \to \infty in the inverse Gaussian \operatorname{IG}(\mu, \lambda) while fixing the shape parameter \lambda = a^2.[25] The probability density function of the inverse Gaussian distribution is given by f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), \quad x > 0. To derive the limiting density, expand the exponent: \frac{(x - \mu)^2}{\mu^2 x} = \frac{\mu^2 - 2\mu x + x^2}{\mu^2 x} = \frac{1}{x} - \frac{2}{\mu} + \frac{x}{\mu^2}. As \mu \to \infty, this simplifies to 1/x, so the exponent becomes -\lambda / (2x). The prefactor remains unchanged, yielding the limiting density f(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda}{2x} \right), \quad x > 0, which is the probability density function of the Lévy distribution with location parameter 0 and scale parameter \lambda. A similar limiting procedure applies to the cumulative distribution function of the inverse Gaussian, confirming convergence to the Lévy distribution.[26] In this limiting case, the moments of the distribution exhibit special properties: the mean and variance are both infinite, reflecting the heavy tail behavior characteristic of the Lévy distribution. This distribution is a stable distribution with stability index \alpha = 1/2 and skewness parameter \beta = 1, belonging to the domain of attraction of stable laws with the same index. Additionally, as a stable distribution, it is infinitely divisible, allowing representation as the law of a Lévy process at time 1.[27]
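The convergence can be illustrated numerically by comparing the inverse Gaussian density for growing \mu with the Lévy density at the same scale; the sketch below uses SciPy's invgauss and levy (scale = \lambda in both), with arbitrary values.
import numpy as np
from scipy.stats import invgauss, levy

lam = 2.0
x = np.linspace(0.1, 20, 6)
for mu in (5.0, 50.0, 500.0):
    ig = invgauss.pdf(x, mu / lam, scale=lam)   # IG(mu, lam)
    lv = levy.pdf(x, scale=lam)                 # limiting Lévy density with scale lam
    print(mu, np.max(np.abs(ig - lv)))          # discrepancy shrinks as mu grows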
Parameter Estimation
Maximum Likelihood Estimation
The maximum likelihood estimator (MLE) for the mean parameter \mu of the inverse Gaussian distribution, based on an independent sample x_1, \dots, x_n, is the sample mean \hat{\mu} = \bar{x}.[28] The MLE for the shape parameter \lambda is then \hat{\lambda} = n \left/ \sum_{i=1}^n \left( \frac{1}{x_i} - \frac{1}{\bar{x}} \right) \right..[29] These estimators are obtained by maximizing the log-likelihood function, which up to an additive constant is \ell(\mu, \lambda; x_1, \dots, x_n) = \frac{n}{2} \log \lambda - \frac{1}{2} \sum_{i=1}^n \log(x_i^3) - \frac{\lambda}{2\mu^2} \sum_{i=1}^n \frac{(x_i - \mu)^2}{x_i}; the resulting score equations decouple to provide the closed-form expressions above.[29] Although the estimators are explicit, numerical optimization methods such as Fisher scoring may be employed in practice for robustness, particularly in extended models or when implementing in software.[30] The inverse Gaussian belongs to the exponential family, which implies that the sufficient statistics \sum x_i and \sum 1/x_i underpin the MLEs.[28] Asymptotically, as n \to \infty, the MLEs are normally distributed: \sqrt{n} (\hat{\theta} - \theta) \to N(0, I_1(\theta)^{-1}), where \theta = (\mu, \lambda) and the per-observation Fisher information matrix is diagonal, I_1(\theta) = \begin{pmatrix} \lambda \mu^{-3} & 0 \\ 0 & (2\lambda^2)^{-1} \end{pmatrix}. This yields asymptotic variances \mu^3 / (n\lambda) for \hat{\mu} and 2\lambda^2 / n for \hat{\lambda}.[30] The observed information matrix provides consistent estimates of these variances for finite n.[30] For small samples, \hat{\mu} is unbiased, but \hat{\lambda} exhibits upward bias approximately equal to 3\lambda / n. An unbiased estimator is obtained by multiplying \hat{\lambda} by (n-3)/n for n > 3.[31]
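A short simulation illustrates the closed-form estimators, the bias correction, and the asymptotic standard errors implied by the Fisher information; the parameter values, sample size, and random seed below are arbitrary.
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(2)
mu_true, lam_true, n = 2.0, 6.0, 500
x = invgauss.rvs(mu_true / lam_true, scale=lam_true, size=n, random_state=rng)

mu_hat = x.mean()
lam_hat = n / np.sum(1.0 / x - 1.0 / mu_hat)
lam_unbiased = lam_hat * (n - 3) / n               # small-sample bias correction
se_mu = np.sqrt(mu_hat**3 / (lam_hat * n))         # from Fisher information lambda/mu^3
se_lam = np.sqrt(2 * lam_hat**2 / n)
print(mu_hat, se_mu)
print(lam_hat, lam_unbiased, se_lam)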
Method of Moments
The method of moments offers a simple, closed-form approach to estimating the parameters of the inverse Gaussian distribution by matching the first two population moments to their sample counterparts. For a random sample y_1, y_2, \dots, y_n from an inverse Gaussian distribution with parameters \mu > 0 and \lambda > 0, the population mean is \mathbb{E}(Y) = \mu and the population variance is \mathrm{Var}(Y) = \mu^3 / \lambda. Equating the sample mean \bar{y} = n^{-1} \sum_{i=1}^n y_i to \mu yields the estimator \hat{\mu} = \bar{y}. Similarly, equating the sample variance s_y^2 = n^{-1} \sum_{i=1}^n (y_i - \bar{y})^2 to \mu^3 / \lambda and substituting \hat{\mu} gives \hat{\lambda} = \bar{y}^3 / s_y^2.[2] These plug-in estimators are consistent as the sample size increases, since the sample mean and variance converge to their population values. However, they are generally less efficient than maximum likelihood estimators, particularly for moderate to large samples, as the method of moments does not account for the full likelihood structure and can exhibit higher mean squared error in simulations. For small samples (e.g., n = 10), the method of moments may occasionally outperform maximum likelihood in terms of bias and efficiency for certain parameterizations, but this advantage diminishes with larger n.[32] Higher-order sample moments, such as skewness \gamma_1 = 3 \sqrt{\mu / \lambda} and kurtosis \kappa = 3 + 15 \mu / \lambda, can be computed from the data to assess robustness of the fit or validate assumptions, providing checks beyond the first two moments. The method of moments is particularly advantageous for quick initial approximations due to its non-iterative nature or in scenarios where maximum likelihood estimation fails, such as when observations are identical or the likelihood surface is poorly behaved.[2]
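The two matching equations translate directly into code; the sketch below simulates an arbitrary IG(1.5, 4) sample and computes the moment estimators, plus the sample skewness as an extra diagnostic.
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(3)
mu_true, lam_true = 1.5, 4.0
y = invgauss.rvs(mu_true / lam_true, scale=lam_true, size=1_000, random_state=rng)

mu_mom = y.mean()                         # match E(Y) = mu
lam_mom = mu_mom**3 / y.var()             # match Var(Y) = mu^3 / lambda
skew_sample = np.mean((y - mu_mom)**3) / y.std()**3
print(mu_mom, lam_mom)
print(skew_sample, 3 * np.sqrt(mu_mom / lam_mom))   # compare with 3*sqrt(mu/lambda)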
Generation and Computation
Sampling Algorithms
Generating random variates from the inverse Gaussian distribution is crucial for Monte Carlo simulations, bootstrap procedures, and empirical studies in fields like reliability engineering and finance. Several algorithms have been developed to produce these variates efficiently, leveraging the distribution's connection to Brownian motion and its cumulative distribution function (CDF), which can be expressed in terms of the standard normal CDF. One standard approach is the inverse CDF method, also known as the inversion method. This technique generates a uniform random variable U \sim \mathcal{U}(0,1) and solves for X in the equation F(X) = U, where F is the CDF of the inverse Gaussian distribution IG(\mu, \lambda). Since the CDF lacks a closed-form inverse, numerical root-finding methods, such as Newton-Raphson or bisection, are applied to the equation involving the normal CDF term. This method ensures exact sampling but can be computationally intensive for high-precision requirements due to the iterative solving process.[33] A more efficient exact algorithm was introduced by Michael, Schucany, and Haas in 1976, utilizing a transformation with multiple roots. The method starts by generating a chi-squared random variate Y \sim \chi^2(1) and a uniform U \sim \mathcal{U}(0,1). It then solves the quadratic equation \frac{\lambda (x - \mu)^2}{\mu^2 x} = Y, which yields two positive roots x_1 < \mu < x_2 where x_2 = \mu^2 / x_1. The smaller root x_1 is selected with probability \mu / (\mu + x_1); otherwise, the larger root x_2 is selected. This approach requires one chi-squared and one uniform random number per variate and achieves high efficiency across parameter ranges.[34] In Bayesian contexts, Gibbs sampling provides a way to draw from the posterior distribution of inverse Gaussian parameters or generate variates conditional on data. For example, given data from IG(\mu, \lambda), the conditional posterior for \lambda given \mu and the data is gamma-distributed under a gamma prior; the conditional for \mu given \lambda and the data is typically non-standard (e.g., involving a truncated generalized Student's t after reparameterization θ = 1/μ) and may require Metropolis-Hastings steps. This method is particularly useful for hierarchical models and inference in survival analysis.[35] Efficiency comparisons among these methods highlight trade-offs depending on the parameters. The Michael et al. algorithm is exact with no rejections, making it superior to pure inversion for most practical cases; in contrast, the inverse CDF method's computational cost grows with the number of iterations needed for root-finding, often 5-10 per sample. Gibbs sampling, while versatile for Bayesian applications, incurs autocorrelation in chains, requiring thinning for independent variates, with effective sample sizes reduced by 20-50% for high-dimensional posteriors.[34] For implementation, a basic pseudocode outline of the Michael et al. 
algorithm is as follows:
Generate Y ~ χ²(1)                                    // e.g., draw Z ~ N(0,1) and set Y = Z²
Let x1 = μ + μ²Y/(2λ) - (μ/(2λ)) * sqrt(4μλY + μ²Y²)  // smaller root of λ(x - μ)²/(μ²x) = Y
Let x2 = μ²/x1                                        // larger root
Generate U ~ Uniform(0,1)
If U ≤ μ/(μ + x1): X = x1
Else: X = x2
Return X
This pseudocode captures the core steps using a direct computation of the roots and probabilistic selection; actual implementations may vary slightly in the root calculation for numerical stability.[34]
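A vectorized Python version of the same transformation is sketched below (the function name rand_invgauss and the parameter values are illustrative); it checks the sample mean and variance against \mu and \mu^3/\lambda.
import numpy as np

def rand_invgauss(mu, lam, size, rng=None):
    # Michael-Schucany-Haas transformation with multiple roots
    rng = np.random.default_rng() if rng is None else rng
    y = rng.standard_normal(size) ** 2                      # chi-squared(1) variates
    x1 = mu + mu**2 * y / (2 * lam) - (mu / (2 * lam)) * np.sqrt(4 * mu * lam * y + (mu * y)**2)
    use_small = rng.uniform(size=size) <= mu / (mu + x1)    # accept smaller root with prob mu/(mu + x1)
    return np.where(use_small, x1, mu**2 / x1)

sample = rand_invgauss(2.0, 5.0, size=200_000, rng=np.random.default_rng(4))
print(sample.mean(), sample.var())   # should be close to mu = 2 and mu^3/lam = 1.6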
Numerical Evaluation and Software
The probability density function (PDF) of the inverse Gaussian distribution can be evaluated directly using the formula f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left( -\frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), \quad x > 0, where \mu > 0 is the mean parameter and \lambda > 0 is the shape parameter.[11] For numerical stability, particularly when x is large, the PDF is computed in log-scale to avoid underflow or overflow in the exponential term.[11][4] The cumulative distribution function (CDF) is given by F(x; \mu, \lambda) = \Phi\left( \frac{\lambda (x - \mu)}{\mu \sqrt{\lambda x}} \right) + \exp\left( \frac{2\lambda}{\mu} \right) \Phi\left( -\frac{\lambda (x + \mu)}{\mu \sqrt{\lambda x}} \right), where \Phi denotes the standard normal CDF.[11][4] Direct evaluation requires careful handling of the \exp(2\lambda / \mu) term, which can cause overflow for large \lambda / \mu; log-scale computations and conditional evaluation of terms mitigate this issue, ensuring accuracy down to machine precision.[11] Alternative approaches for the CDF, especially in the tails, include saddlepoint approximations, which replace the normal base with an inverse Gaussian base for improved accuracy over standard normal approximations.[36] For the two-parameter inverse Gaussian, maximum likelihood estimates (MLEs) have closed forms: \hat{\mu} = \bar{x} (sample mean) and \hat{\lambda} = n / \sum_{i=1}^n (1/x_i - 1/\hat{\mu}), but numerical optimization is often used in software for robustness or when fitting generalized forms with location/scale parameters. Newton-Raphson iterations provide efficient convergence for MLE in extended models, such as three-parameter variants, starting from moment-based initial values.[37] The expectation-maximization (EM) algorithm is implemented for inverse Gaussian mixtures or random effects models, iteratively updating parameters via complete-data likelihoods.[38] Software implementations support accurate PDF and CDF evaluation alongside MLE fitting. In R, the statmod package provides dinvgauss and pinvgauss functions, achieving 16-digit precision via log-scale computation and handling of extreme parameters; for example, fitting can be performed with fitdistr from MASS using maximum likelihood.[11] Python's scipy.stats.invgauss (introduced in SciPy 0.10.0 in 2012 and updated through recent versions as of 2025) computes the PDF and CDF using the Boost Math library and supports fitting via its fit method with numerical maximum likelihood, as in params = invgauss.fit(data).[39] MATLAB's Statistics and Machine Learning Toolbox offers pdf, cdf, and mle for the InverseGaussianDistribution object, enabling parameter estimation with pd = fitdist(data, 'InverseGaussian').[40]
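As a usage illustration of the SciPy interface, the snippet below constructs IG(\mu = 2, \lambda = 5) through the shape = \mu/\lambda, scale = \lambda mapping and recovers the parameters from simulated data with the location fixed at zero (the values and seed are arbitrary).
import numpy as np
from scipy.stats import invgauss

mu, lam = 2.0, 5.0
dist = invgauss(mu / lam, scale=lam)      # SciPy shape = mu/lam, scale = lam gives IG(mu, lam)
print(dist.mean(), dist.var())            # 2.0 and mu^3/lam = 1.6

data = dist.rvs(size=1_000, random_state=np.random.default_rng(5))
shape_hat, loc_hat, scale_hat = invgauss.fit(data, floc=0)   # fix location at 0 for the two-parameter form
mu_hat, lam_hat = shape_hat * scale_hat, scale_hat
print(mu_hat, lam_hat)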
Numerical challenges arise when \lambda is small, leading to heavy right tails and requiring high-precision arithmetic to resolve near-zero probabilities without truncation errors; implementations like statmod address this through mode-initialized iterations for related computations.[11]