The half-normal distribution is a continuous probability distribution defined on the non-negative real numbers, representing the distribution of the absolute value of a random variable following a normal distribution with mean zero and standard deviation \sigma > 0.[1][2] It serves as a special case of both the folded normal distribution (with location parameter \mu = 0) and the truncated normal distribution, and is equivalent to the chi distribution with one degree of freedom when \sigma = 1.[1][3]

The probability density function (PDF) of the half-normal distribution is given by
f(x \mid \sigma) = \frac{\sqrt{2}}{\sigma \sqrt{\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right)
for x \geq 0, and zero otherwise, where the mode is at x = 0 and the density is strictly decreasing.[4][1] The cumulative distribution function (CDF) is
F(x \mid \sigma) = \operatorname{erf}\left( \frac{x}{\sigma \sqrt{2}} \right) = 2\Phi\left( \frac{x}{\sigma} \right) - 1,
for x \geq 0, where \operatorname{erf} is the error function and \Phi is the standard normal CDF.[2][4] Key moments include the mean \mu = \sigma \sqrt{2/\pi} \approx 0.7979\sigma, the variance \sigma^2 (1 - 2/\pi) \approx 0.3634\sigma^2, the skewness \sqrt{2}(4 - \pi)/(\pi - 2)^{3/2} \approx 0.9953, and the excess kurtosis 8(\pi - 3)/(\pi - 2)^2 \approx 0.8692.[3][4] Higher-order moments are \mathbb{E}[X^n] = 2^{n/2} \sigma^n \Gamma((n+1)/2) / \sqrt{\pi} for n \geq 1.[1]

This distribution finds applications in statistical modeling of non-negative phenomena, such as measurement errors, fatigue lifetimes in manufacturing, and process deviations in quality control charts.[2] It is also used as a prior for variance parameters in Bayesian inference and in reliability engineering for estimating failure rates.[5][6] Parameter estimation typically employs maximum likelihood, which yields \hat{\sigma} = \sqrt{\tfrac{1}{n} \sum_{i=1}^n x_i^2}; matching the mean instead gives \hat{\sigma} = \sqrt{\pi/2} \, \bar{x}, since \mathbb{E}[X] = \sigma \sqrt{2/\pi}.[2]
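The closed-form moments above are easy to check numerically. The following is a minimal Python sketch (assuming NumPy and SciPy are available; `scipy.stats.halfnorm` parameterizes the distribution by `scale` = \sigma, and the chosen \sigma is illustrative):

```python
import numpy as np
from scipy import stats

sigma = 2.0  # illustrative scale parameter
# scipy.stats.halfnorm uses loc=0, scale=sigma for the half-normal.
mean, var, skew, ex_kurt = stats.halfnorm.stats(scale=sigma, moments="mvsk")

# Compare with the closed-form expressions quoted above.
assert np.isclose(mean, sigma * np.sqrt(2 / np.pi))                      # ~0.7979*sigma
assert np.isclose(var, sigma**2 * (1 - 2 / np.pi))                       # ~0.3634*sigma^2
assert np.isclose(skew, np.sqrt(2) * (4 - np.pi) / (np.pi - 2) ** 1.5)   # ~0.9953
assert np.isclose(ex_kurt, 8 * (np.pi - 3) / (np.pi - 2) ** 2)           # ~0.8692
```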
Definition
Probability density function
The half-normal distribution is defined as the distribution of the absolute value of a random variable following a normal distribution with mean zero and variance \sigma^2: if X \sim \mathcal{N}(0, \sigma^2), then Y = |X| follows a half-normal distribution with scale parameter \sigma > 0.[7][1]

The probability density function of Y is

f(y; \sigma) = \frac{\sqrt{2 / \pi}}{\sigma} \exp\left( -\frac{y^2}{2 \sigma^2} \right), \quad y \geq 0,

and f(y; \sigma) = 0 for y < 0.[3][7] The support of the distribution is the non-negative real line [0, \infty).[1]

This PDF arises by folding the symmetric normal density onto the positive axis, which doubles the probability mass on y \geq 0 by reflecting the negative tail. The normalizing constant \sqrt{2 / \pi} / \sigma ensures the density integrates to 1 over the half-line: the standard normal density integrates to 1/2 over (0, \infty), and the folding contributes a factor of 2.[7][1]

Qualitatively, the PDF has a bell-like shape truncated at zero, with its mode at y = 0, where the density attains its maximum value \sqrt{2 / \pi} / \sigma, and it decreases monotonically toward zero as y increases.[3]
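As a quick illustration of the density and its normalization, here is a minimal Python sketch (the function name and the chosen \sigma are ours, introduced for illustration):

```python
import numpy as np
from scipy import integrate

def halfnormal_pdf(y, sigma):
    """Half-normal density sqrt(2/pi)/sigma * exp(-y^2/(2 sigma^2)) for y >= 0, else 0."""
    y = np.asarray(y, dtype=float)
    density = np.sqrt(2 / np.pi) / sigma * np.exp(-y**2 / (2 * sigma**2))
    return np.where(y >= 0, density, 0.0)

sigma = 1.5
print(halfnormal_pdf(0.0, sigma))  # mode at 0: sqrt(2/pi)/sigma ~ 0.5319
total, _ = integrate.quad(halfnormal_pdf, 0, np.inf, args=(sigma,))
print(total)  # integrates to ~1.0 over the half-line
```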
Cumulative distribution function
The cumulative distribution function of the half-normal distribution with scale parameter \sigma > 0 is

F(y; \sigma) =
\begin{cases}
0 & \text{if } y < 0, \\
\operatorname{erf}\left( \frac{y}{\sigma \sqrt{2}} \right) & \text{if } y \geq 0,
\end{cases}

where \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} \, dt denotes the error function.[2]

This expression arises from integrating the probability density function over [0, y], which gives the probability that the absolute value of a zero-mean normal random variable with standard deviation \sigma falls between 0 and y. Equivalently, it equals the probability that the underlying normal variable falls in (-y, y), which the folding at zero maps onto [0, y]: F(y; \sigma) = 2\Phi(y / \sigma) - 1 for y \geq 0, where \Phi is the cumulative distribution function of the standard normal distribution.[2]

The quantile function (inverse cumulative distribution function) is y_p = \sigma \sqrt{2} \, \operatorname{erf}^{-1}(p) for 0 < p < 1, which facilitates the generation of random variates.[2] At the boundaries, F(0; \sigma) = 0 and \lim_{y \to \infty} F(y; \sigma) = 1.[2]
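Because the quantile function has the closed form above, random variates can be drawn by inverse-transform sampling. A minimal Python sketch (the helper names are ours):

```python
import numpy as np
from scipy.special import erf, erfinv

def halfnormal_cdf(y, sigma):
    """F(y; sigma) = erf(y / (sigma*sqrt(2))) for y >= 0."""
    return erf(np.maximum(y, 0.0) / (sigma * np.sqrt(2)))

def halfnormal_ppf(p, sigma):
    """Quantile function y_p = sigma * sqrt(2) * erfinv(p) for 0 < p < 1."""
    return sigma * np.sqrt(2) * erfinv(p)

rng = np.random.default_rng(0)
sigma = 2.0
# Inverse-transform sampling: push uniforms through the quantile function.
samples = halfnormal_ppf(rng.uniform(size=100_000), sigma)
# Sanity check: the CDF evaluated at the sample median should be close to 0.5.
print(halfnormal_cdf(np.median(samples), sigma))
```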
Properties
Moments
The moments of the half-normal distribution Y, defined as the distribution of |X| where X \sim \mathcal{N}(0, \sigma^2), can be derived by integrating powers of Y against its probability density function or by leveraging the even moments of the underlying normal distribution.[8]

The mean is given by
\mathbb{E}[Y] = \sigma \sqrt{\frac{2}{\pi}}.
This follows from the integral \mathbb{E}[Y] = \int_0^\infty y \cdot \frac{\sqrt{2}}{\sigma \sqrt{\pi}} \exp\left( -\frac{y^2}{2\sigma^2} \right) \, dy, which, via the substitution u = y^2 / (2\sigma^2), simplifies to \sigma \sqrt{2/\pi}.[8][3]

The variance is
\mathrm{Var}(Y) = \sigma^2 \left( 1 - \frac{2}{\pi} \right).
It is computed as \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2, where \mathbb{E}[Y^2] = \sigma^2 since Y^2 = X^2 is \sigma^2 times a chi-squared random variable with one degree of freedom.[8][3]

The higher-order standardized moments exhibit positive skewness and moderate excess kurtosis, reflecting the distribution's asymmetry and heavier right tail compared to the normal. The skewness coefficient is
\gamma_1 = \sqrt{2} \frac{4 - \pi}{(\pi - 2)^{3/2}} \approx 0.995,
a parameter-free measure derived from the third central moment normalized by the cube of the standard deviation.[8][3]

The excess kurtosis is
\gamma_2 = \frac{8(\pi - 3)}{(\pi - 2)^2} \approx 0.869,
indicating leptokurtic tails relative to the normal distribution; the raw kurtosis is thus approximately 3.869. This value is obtained from the fourth central moment divided by the fourth power of the standard deviation.[8][3]

The general raw moment for k > -1 is
\mathbb{E}[Y^k] = \sigma^k \cdot 2^{k/2} \cdot \frac{\Gamma\left( \frac{k+1}{2} \right)}{\sqrt{\pi}}.
This closed-form expression arises from evaluating the integral \int_0^\infty y^k f_Y(y) \, dy using the substitution t = y^2 / (2\sigma^2), which transforms it into a Gamma function integral: \mathbb{E}[Y^k] = \frac{2^{k/2} \sigma^k}{\sqrt{\pi}} \int_0^\infty t^{(k-1)/2} e^{-t} \, dt = \frac{\sigma^k 2^{k/2} \Gamma((k+1)/2)}{\sqrt{\pi}}. For positive integer k, it can equivalently be expressed in terms of double factorials.[8][3]
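The general moment formula can be verified against simulation. A minimal Python sketch (the sample size and \sigma are arbitrary illustrative choices):

```python
import numpy as np
from scipy.special import gamma

def halfnormal_raw_moment(k, sigma):
    """E[Y^k] = sigma^k * 2^(k/2) * Gamma((k+1)/2) / sqrt(pi), valid for k > -1."""
    return sigma**k * 2 ** (k / 2) * gamma((k + 1) / 2) / np.sqrt(np.pi)

rng = np.random.default_rng(1)
sigma = 1.3
y = np.abs(rng.normal(0.0, sigma, size=1_000_000))  # half-normal via |N(0, sigma^2)|

for k in (1, 2, 3, 4):
    # Closed form vs. Monte Carlo estimate of the k-th raw moment.
    print(k, halfnormal_raw_moment(k, sigma), np.mean(y**k))
```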
Characteristic function and generating functions
The characteristic function of a half-normal random variable Y \sim \mathrm{HN}(0, \sigma^2), defined as \phi_Y(t) = \mathbb{E}[e^{itY}], is

\phi_Y(t) = e^{-\sigma^2 t^2 / 2} \left[ 1 + \operatorname{erf}\left( \frac{i \sigma t}{\sqrt{2}} \right) \right],

where \operatorname{erf} denotes the error function extended to complex arguments via analytic continuation. This form arises from the integral representation \phi_Y(t) = \frac{\sqrt{2/\pi}}{\sigma} \int_0^\infty e^{itx} e^{-x^2/(2\sigma^2)} \, dx, leveraging the symmetry of the underlying normal distribution and properties of the Gaussian integral. An equivalent expression uses the cumulative distribution function \Phi of the standard normal: \phi_Y(t) = 2 e^{-\sigma^2 t^2 / 2} \Phi(i \sigma t).[9] The complex error function ensures the function is well defined, though numerical evaluation requires care because of the imaginary argument.

The moment-generating function M_Y(t) = \mathbb{E}[e^{tY}] follows by analytic continuation of the characteristic function, yielding

M_Y(t) = e^{\sigma^2 t^2 / 2} \left[ 1 + \operatorname{erf}\left( \frac{\sigma t}{\sqrt{2}} \right) \right]

for all real t, reflecting the positive support of the distribution. This expression is obtained by substituting t \to -it in \phi_Y(t), exploiting the entire nature of the error function. The domain of convergence is the entire real line, unlike for distributions with unbounded negative support, but the half-line restriction complicates asymptotic expansions for large |t|, where the error function approaches its asymptotic limits slowly.

The cumulant-generating function is K_Y(t) = \log M_Y(t), from which cumulants follow as successive derivatives at t = 0; the first cumulant equals the mean \sigma \sqrt{2/\pi}, and the second equals the variance \sigma^2 (1 - 2/\pi), consistent with the moments of the distribution. These generating functions facilitate theoretical analyses, such as convolutions of half-normal variables, which appear in limit theorems for non-negative processes like absolute deviations or folded errors in statistical models. For instance, the characteristic function supports singularity analysis in bivariate generating functions, yielding half-normal limits for parameters in combinatorial structures under zero drift.[10] The restricted support complicates direct application of some Fourier inversion techniques compared to full-line distributions, necessitating analytic continuation for inversion formulas.
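For real arguments the moment-generating function is straightforward to evaluate and to check against simulation; a minimal Python sketch (the function name and parameter values are ours):

```python
import numpy as np
from scipy.special import erf

def halfnormal_mgf(t, sigma):
    """M(t) = exp(sigma^2 t^2 / 2) * (1 + erf(sigma t / sqrt(2))), finite for all real t."""
    return np.exp(sigma**2 * t**2 / 2) * (1 + erf(sigma * t / np.sqrt(2)))

rng = np.random.default_rng(2)
sigma = 0.8
y = np.abs(rng.normal(0.0, sigma, size=1_000_000))

for t in (-1.0, 0.5, 1.0):
    # Closed form vs. the empirical mean of exp(t*Y).
    print(t, halfnormal_mgf(t, sigma), np.mean(np.exp(t * y)))
```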
Parameter estimation
Method of moments
The method of moments for estimating the scale parameter \sigma of the half-normal distribution involves computing the sample moments and setting them equal to the corresponding population moments. The population mean is \mathbb{E}[Y] = \sigma \sqrt{2/\pi} and the population second raw moment is \mathbb{E}[Y^2] = \sigma^2.[11] To implement this, calculate the sample mean m_1 = \frac{1}{n} \sum_{i=1}^n y_i and the sample second raw moment m_2 = \frac{1}{n} \sum_{i=1}^n y_i^2, where n is the sample size and the y_i are the observations.[11]

Matching the first moment, the estimator is obtained by solving m_1 = \hat{\sigma} \sqrt{2/\pi}, yielding \hat{\sigma} = m_1 \sqrt{\pi/2}. This estimator is unbiased, since the sample mean is unbiased for \mathbb{E}[Y] and the transformation is linear, and it is consistent as n \to \infty by the law of large numbers.[11][12]

An alternative estimator matches the second raw moment, solving m_2 = \hat{\sigma}^2 to give \hat{\sigma} = \sqrt{m_2}; this coincides with the maximum likelihood estimator. It is unbiased for \sigma^2 but biased downward for \sigma because of the concavity of the square root function, and it is likewise consistent as n \to \infty.[11][12]

The first-moment estimator is simple to compute but less efficient than the maximum likelihood estimator in small samples, as it does not use the full shape information of the distribution.[11]
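Both moment estimators are one-liners in practice. A minimal Python sketch (the true \sigma and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_true = 2.5
y = np.abs(rng.normal(0.0, sigma_true, size=10_000))  # half-normal sample

# First-moment matching: m1 = sigma * sqrt(2/pi)  =>  sigma_hat = m1 * sqrt(pi/2).
sigma_hat_m1 = np.mean(y) * np.sqrt(np.pi / 2)

# Second-moment matching: m2 = sigma^2  =>  sigma_hat = sqrt(m2) (same as the MLE).
sigma_hat_m2 = np.sqrt(np.mean(y**2))

print(sigma_hat_m1, sigma_hat_m2)  # both should be close to 2.5
```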
Maximum likelihood estimation
The maximum likelihood estimator (MLE) for the scale parameter \sigma of the half-normal distribution is derived from the log-likelihood function for an independent and identically distributed sample y_1, \dots, y_n > 0. With probability density function f(y \mid \sigma) = \sqrt{\frac{2}{\pi}} \frac{1}{\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right) for y > 0, the log-likelihood is

\ell(\sigma) = n \log\left(\sqrt{\frac{2}{\pi}} \frac{1}{\sigma}\right) - \frac{1}{2\sigma^2} \sum_{i=1}^n y_i^2 = \frac{n}{2} \log\left(\frac{2}{\pi}\right) - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n y_i^2.

Maximizing \ell(\sigma) over \sigma > 0 yields the MLE \hat{\sigma}_{\text{MLE}} = \sqrt{\frac{1}{n} \sum_{i=1}^n y_i^2}, the root mean square of the sample.[13]

This estimator arises directly from the invariance property of maximum likelihood estimation applied to the folded normal distribution with known mean zero, of which the half-normal is a special case; the squared observations y_i^2 match those of the underlying normal distribution, preserving the MLE for the scale.[13]

The MLE \hat{\sigma}_{\text{MLE}} is consistent, with n \hat{\sigma}_{\text{MLE}}^2 / \sigma^2 \sim \chi^2_n, so \hat{\sigma}_{\text{MLE}} converges to the true \sigma as n \to \infty. Under standard regularity conditions, it is asymptotically efficient and normally distributed: \sqrt{n} (\hat{\sigma}_{\text{MLE}} - \sigma) \xrightarrow{d} \mathcal{N}\left(0, \frac{\sigma^2}{2}\right), where the asymptotic variance \sigma^2 / 2 is the inverse of the per-observation Fisher information I(\sigma) = 2 / \sigma^2.[13]

For datasets potentially including zeros or subject to censoring (e.g., regression residuals truncated at zero), the likelihood must be modified to account for the truncation or censoring mechanism, though explicit forms depend on the specific setup.
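The MLE and its asymptotic standard error can be computed directly; a minimal Python sketch (the 95% Wald interval uses the normal approximation described above):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma_true = 1.7
y = np.abs(rng.normal(0.0, sigma_true, size=5_000))
n = y.size

# MLE: root mean square of the observations.
sigma_mle = np.sqrt(np.mean(y**2))

# Asymptotic variance sigma^2 / (2n) from the per-observation Fisher information 2/sigma^2.
se = sigma_mle / np.sqrt(2 * n)
print(f"sigma_hat = {sigma_mle:.4f}, approx. 95% CI: "
      f"({sigma_mle - 1.96 * se:.4f}, {sigma_mle + 1.96 * se:.4f})")
```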
Applications
In statistical diagnostics
The half-normal plot serves as a key diagnostic tool in statistics for assessing the normality of residuals in regression models and for identifying significant effects in experimental designs. It involves plotting the ordered absolute values of residuals against the theoretical quantiles of the half-normal distribution; a straight-line pattern indicates that the absolute residuals follow a half-normal distribution, supporting the assumption of underlying normality in the errors.

In regression diagnostics, the half-normal plot is particularly useful for detecting deviations from normality and pinpointing outliers or influential observations. For instance, points that deviate substantially from the straight line suggest non-normal behavior or potential anomalies in the data. This approach extends to generalized linear models, where simulated envelopes around the plot provide a reference band derived from multiple simulations under the assumed model, aiding the visual identification of systematic departures.[14]

A prominent application is Daniel's plot, employed in the analysis of factorial experiments within design of experiments (DOE) to evaluate factor effects. Here, the absolute values of estimated effects are ranked and plotted against half-normal quantiles; effects aligning with the line are deemed negligible noise, while those straying from it indicate active factors. In unreplicated two-level factorial designs, this method allows practitioners to distinguish real effects from random variation without formal hypothesis tests.[15]

Simulation-based testing enhances the half-normal plot by generating half-normal variables from the fitted model to construct confidence envelopes, enabling comparison of observed residuals or effects against their expected distribution. If observed points fall outside the envelope, this signals potential outliers or model misspecification, providing a robust, non-parametric way to validate assumptions. The technique is especially valuable in ANOVA contexts, where effects from factorial designs are ranked and plotted against half-normal quantiles to isolate significant factors amid experimental noise.[14]

The half-normal plot was introduced by Cuthbert Daniel in 1959 specifically for interpreting two-level factorial experiments, and it transformed the visual screening of effects in industrial and scientific studies.[15]
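A basic half-normal plot takes only a few lines; the sketch below (Matplotlib and SciPy are assumed, the plotting positions (i - 0.5)/n are one common convention, and simulated residuals stand in for real model output):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
residuals = rng.normal(0.0, 1.0, size=60)   # stand-in for regression residuals
abs_res = np.sort(np.abs(residuals))        # ordered absolute residuals
n = abs_res.size

# Theoretical half-normal quantiles at the plotting positions (i - 0.5)/n.
probs = (np.arange(1, n + 1) - 0.5) / n
quantiles = stats.halfnorm.ppf(probs)

plt.scatter(quantiles, abs_res)
plt.xlabel("Half-normal quantiles")
plt.ylabel("Ordered |residuals|")
plt.title("Half-normal plot")
plt.show()  # a roughly straight line is consistent with normal errors
```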
In physical modeling
The half-normal distribution is employed in physical modeling to represent the magnitudes of symmetric errors in measurements, where the underlying errors follow a normal distribution but only non-negative deviations are physically meaningful, such as in metrology for constrained positive quantities. For instance, when evaluating uncertainties in experimental data, the absolute value of normally distributed errors yields a half-normal distribution, allowing deviation scales to be quantified without negative values that lack physical interpretation. This approach is particularly useful in Bayesian Type A evaluations under the Guide to the Expression of Uncertainty in Measurement (GUM), where the half-normal arises from truncating a normal prior at zero to reflect physical boundaries.[16]

In reliability engineering, the half-normal distribution models failure times or material strengths when the underlying process is governed by a normal distribution but only positive excursions are relevant, as in scenarios involving fracture toughness or tensile strength. The scale parameter \sigma in this context is interpreted as the variability of the underlying normal process, representing the standard deviation of the symmetric deviations before folding onto the positive domain.[3]

In signal processing, the half-normal distribution describes the amplitude of Gaussian noise, particularly after envelope detection or modulus operations, such as in the Fourier transform of white Gaussian noise, where the magnitude follows a half-normal form. This is evident in ultrasonic guided wave detection of weak Doppler signals, where the half-normal behavior of the noise amplitude aids signal-to-noise ratio assessments and the setting of detection thresholds. Similarly, in acoustics, intensity fluctuations due to spatial variations in room measurements, such as reverberation times, can be modeled with half-normal distributions to account for uncertainties arising from normally distributed phase or amplitude perturbations. In optics, such as interferometric tomography, half-normal models arise from the absolute values of normal phase errors, influencing stability analyses of wavefront reconstructions.[17][18]
Related distributions
Relation to the normal distribution
The half-normal distribution is defined as the distribution of the absolute value of a random variable following a normal distribution with mean zero. Specifically, if X \sim \mathcal{N}(0, \sigma^2), then Y = |X| has a half-normal distribution with scale parameter \sigma > 0. This construction preserves the scale \sigma of the underlying normal distribution while restricting the support of Y to the non-negative real line [0, \infty).[19]

The cumulative distribution function of Y follows from the symmetry of the normal distribution:

F_Y(y) = P(Y \leq y) = P(-y \leq X \leq y) = 2 \Phi\left( \frac{y}{\sigma} \right) - 1, \quad y \geq 0,

where \Phi denotes the cumulative distribution function of the standard normal distribution \mathcal{N}(0,1). For y < 0, F_Y(y) = 0. This relation exhibits the half-normal as a folded version of the centered normal, capturing only the magnitudes of the deviations. Samples from the half-normal distribution can be generated straightforwardly by drawing from \mathcal{N}(0, \sigma^2) and taking the absolute value.[20]

In contrast to the symmetric normal distribution centered at zero, the half-normal is asymmetric, with its probability density function achieving its maximum (mode) at zero and decreasing monotonically thereafter. This peak at the origin reflects the folding process, which concentrates mass near zero compared to the bell-shaped normal. If the underlying normal has a non-zero mean \mu \neq 0, the absolute value yields the more general folded normal distribution; the half-normal is the special case with \mu = 0.[3]
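The folding construction doubles as the simplest sampler, and agreement with the CDF 2\Phi(y/\sigma) - 1 can be checked with a goodness-of-fit test. A minimal Python sketch (the chosen \sigma and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
sigma = 1.4

# Sample by folding: |N(0, sigma^2)| is half-normal with scale sigma.
y = np.abs(rng.normal(0.0, sigma, size=200_000))

# Kolmogorov-Smirnov test against the half-normal CDF.
result = stats.kstest(y, stats.halfnorm(scale=sigma).cdf)
print(result.statistic, result.pvalue)  # small statistic, p-value not small
```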
Connections to chi and Rayleigh distributions
The half-normal distribution is equivalent to the chi distribution with one degree of freedom. Specifically, if Z \sim \mathcal{N}(0, 1), then |Z| follows the chi distribution with k = 1, which is identical to the standard half-normal distribution with scale parameter \sigma = 1. For the general case where the underlying normal has variance \sigma^2, the half-normal random variable is \sigma times a standard chi(1) variate.[21][22]

This equivalence extends to moments: the mean of the half-normal distribution with parameter \sigma is \sigma \sqrt{2/\pi}, matching the mean of the chi(1) distribution scaled by \sigma, while the variance is \sigma^2 (1 - 2/\pi). The half-normal thus serves as the k = 1 special case within the broader chi distribution family, which generalizes the Euclidean norm of a k-dimensional standard multivariate normal vector.[21]

The Rayleigh distribution connects to the half-normal through this shared framework: the Rayleigh distribution with parameter \sigma is \sigma times the chi distribution with k = 2, representing the norm of a two-dimensional standard multivariate normal vector scaled by \sigma. In contrast to the half-normal's mean of \sigma \sqrt{2/\pi}, the Rayleigh has mean \sigma \sqrt{\pi/2} and variance \sigma^2 (4 - \pi)/2, highlighting their distinct roles despite common origins. For a zero-mean multivariate normal vector, the absolute value of any single marginal component follows a half-normal distribution, whereas the full vector norm follows a chi distribution with degrees of freedom equal to the dimension; in dimension 2, this yields the Rayleigh.[23][24]
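The identifications chi(1) = standard half-normal and chi(2) = standard Rayleigh can be confirmed numerically by comparing quantile functions; a minimal Python sketch:

```python
import numpy as np
from scipy import stats

p = np.linspace(0.01, 0.99, 9)

# chi with 1 degree of freedom coincides with the standard half-normal ...
print(np.allclose(stats.chi.ppf(p, df=1), stats.halfnorm.ppf(p)))   # True

# ... and chi with 2 degrees of freedom coincides with the standard Rayleigh.
print(np.allclose(stats.chi.ppf(p, df=2), stats.rayleigh.ppf(p)))   # True
```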