Folded normal distribution
The folded normal distribution is a continuous probability distribution on the non-negative real line [0, ∞), obtained as the distribution of the absolute value of a random variable following a normal distribution with mean μ and variance σ².[1] Its probability density function is f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \left[ \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) + \exp\left( -\frac{(x + \mu)^2}{2\sigma^2} \right) \right] for x \geq 0,[1] and its cumulative distribution function is F(x; \mu, \sigma) = \Phi\left( \frac{x - \mu}{\sigma} \right) + \Phi\left( \frac{x + \mu}{\sigma} \right) - 1, where Φ is the standard normal CDF.[2]

Key properties include its unimodal shape (monotonically decreasing when μ is small relative to σ), closure under scaling, and its relation to the non-central chi-squared distribution with one degree of freedom when appropriately scaled.[1] The mean is \mu_f = \mu \left[ 2\Phi\left( \frac{\mu}{\sigma} \right) - 1 \right] + \sigma \sqrt{\frac{2}{\pi}} \exp\left( -\frac{\mu^2}{2\sigma^2} \right), which in the special case of the half-normal distribution (μ = 0) simplifies to \sigma \sqrt{2/\pi}, while the variance is \sigma_f^2 = \mu^2 + \sigma^2 - \mu_f^2.[1] Higher moments such as skewness and kurtosis depend on the ratio θ = μ/σ: as θ increases the distribution approaches a normal distribution, with the skewness decreasing toward zero and the excess kurtosis reaching its minimum near θ ≈ 1.8.[3]

First discussed in the statistical literature of the 1960s for analyzing folded data, the distribution has applications in modeling magnitudes of normally distributed errors, such as deviations in automobile strut alignments, body mass index data, and economic process limits where negative values are impossible or unobservable.[3][1] Parameter estimation typically relies on the method of moments, using either the first two or the second and fourth sample moments, with efficiency varying with θ: for θ > 0.62 the first two moments provide better estimates, while for smaller θ the second and fourth are preferable.[3]
Overview

Definition and Motivation
The folded normal distribution arises as the distribution of the absolute value of a normal random variable, providing a model for non-negative quantities that originate from symmetric processes around zero. It generalizes the half-normal distribution, which corresponds to the case of a zero-mean normal variable, by allowing the underlying normal to have a non-zero mean. The distribution was introduced by Leone, Nelson, and Nottingham in 1961 as a way to handle scenarios in which negative values from a normal process are "folded" over to the positive side, such as when algebraic signs are omitted in measurements.[4]

Formally, if X \sim \mathcal{N}(\mu, \sigma^2), then the random variable Y = |X| follows a folded normal distribution with parameters \mu and \sigma > 0. The support of Y is the non-negative real line [0, \infty), and the distribution degenerates to a point mass at zero in the limiting case where \mu = 0 and \sigma \to 0. This construction reflects the folding mechanism, whereby the probability mass from the negative tail of the normal is mirrored onto the positive axis.[4]

The motivation for the folded normal distribution stems from its utility in modeling phenomena that are inherently non-negative but derived from normally distributed errors or deviations, such as absolute measurement errors in engineering or the magnitudes of displacements in stochastic processes. For instance, it applies when only the absolute deviations in automobile strut alignments are recorded: the underlying errors may be symmetric, but the observed quantities cannot be negative. This makes it particularly valuable in fields requiring positive-valued models with Gaussian-like tails, extending beyond the restrictive zero-mean assumption of the half-normal.[4][1]
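To make the folding construction concrete, here is a minimal simulation sketch (the parameter values are arbitrary illustrations) that takes absolute values of normal draws and compares the sample moments with SciPy's foldnorm, which parameterizes the distribution by the shape c = μ/σ together with scale σ:

```python
import numpy as np
from scipy.stats import foldnorm

mu, sigma = 2.0, 1.5                      # arbitrary illustrative parameters
rng = np.random.default_rng(0)

# Folding by construction: absolute values of normal draws.
y = np.abs(rng.normal(mu, sigma, size=100_000))

# SciPy's parameterization: shape c = mu/sigma, scale = sigma.
dist = foldnorm(mu / sigma, scale=sigma)

print(y.mean(), dist.mean())              # sample vs. theoretical mean
print(y.var(), dist.var())                # sample vs. theoretical variance
```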
Parameters and Support

The folded normal distribution is parameterized by two quantities inherited from the underlying normal distribution: \mu \in \mathbb{R}, the location parameter, which determines the point of folding by shifting the mean of the normal variable before its absolute value is taken, and \sigma > 0, the scale parameter, which controls the dispersion of the distribution.[5] When \mu = 0, the folded normal reduces to the half-normal distribution, the fold of a normal centered at zero.[5] The parameter \sigma must be strictly positive to ensure a well-defined probability distribution, while \mu is unrestricted and can represent symmetric (\mu = 0) or asymmetric (\mu \neq 0) folding; note that the distributions for \mu and -\mu are identical because of the absolute value operation.[5]

The support of the folded normal distribution is the non-negative real line, y \geq 0, where the entire probability mass is concentrated.[5] The distribution is continuous, so P(Y = 0) = 0 for any \sigma > 0, though the density value at the boundary increases toward \sqrt{2/\pi}/\sigma as \mu approaches 0.[5] At the lower boundary, as y \to 0^+, the density approaches \frac{\sqrt{2/\pi}}{\sigma} \exp\left( -\frac{\mu^2}{2\sigma^2} \right), reflecting the contribution from the folded negative tail of the underlying normal.[5] At the upper boundary, as y \to \infty, the density decays at a Gaussian rate, mirroring the tail behavior of the normal distribution from which it is derived.[5]
Distribution Functions

Probability Density Function
The folded normal distribution arises as the distribution of the absolute value Y = |X|, where X follows a normal distribution with mean \mu and standard deviation \sigma > 0.[3] To derive its probability density function, consider the transformation Y = |X|. For y \geq 0, the density f_Y(y) collects contributions from both X = y (with density f_X(y)) and X = -y (with density f_X(-y)), since both map to the same y. Thus f_Y(y) = f_X(y) + f_X(-y), where f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) is the normal density. Substituting yields the explicit form f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \left[ \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right) + \exp\left( -\frac{(y + \mu)^2}{2\sigma^2} \right) \right] for y \geq 0, and f_Y(y) = 0 for y < 0.[3][6]

This formula can equivalently be expressed using the standard normal density \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right): f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma} \left[ \phi\left( \frac{y - \mu}{\sigma} \right) + \phi\left( \frac{y + \mu}{\sigma} \right) \right] for y \geq 0. The derivation follows directly from the change-of-variable technique for the absolute value transformation, preserving the total probability mass without additional normalization constants.[6]

A special case occurs when \mu = 0, reducing the folded normal to the half-normal distribution: f_Y(y \mid 0, \sigma) = \frac{2}{\sigma \sqrt{2\pi}} \exp\left( -\frac{y^2}{2\sigma^2} \right) for y \geq 0. This reflects the symmetry of the underlying normal distribution about zero, doubling the density on the positive axis.[3]

The density integrates to 1 over [0, \infty), confirming that it is properly normalized. To see this, write \int_0^\infty f_Y(y) \, dy = \int_0^\infty f_X(y) \, dy + \int_0^\infty f_X(-y) \, dy. The first integral is P(X \geq 0), and the second, by the substitution u = -y, becomes \int_{-\infty}^0 f_X(u) \, du = P(X < 0). Their sum is P(X \geq 0) + P(X < 0) = 1.[6]
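The two-term form of the density is straightforward to verify numerically. A brief sketch (illustrative parameter values) checks it against SciPy's foldnorm.pdf:

```python
import numpy as np
from scipy.stats import foldnorm, norm

mu, sigma = 1.3, 0.7               # illustrative parameters
y = np.linspace(0.0, 5.0, 11)

# Density as the sum of the two reflected normal components.
pdf_manual = (norm.pdf((y - mu) / sigma) + norm.pdf((y + mu) / sigma)) / sigma

# Same density via SciPy's parameterization (shape c = mu/sigma).
pdf_scipy = foldnorm.pdf(y, mu / sigma, scale=sigma)

assert np.allclose(pdf_manual, pdf_scipy)
```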
Cumulative Distribution Function

The cumulative distribution function (CDF) of the folded normal distribution is F(y \mid \mu, \sigma) = \Phi\left( \frac{y - \mu}{\sigma} \right) + \Phi\left( \frac{y + \mu}{\sigma} \right) - 1 for y \geq 0, where \Phi denotes the CDF of the standard normal distribution.[7] To derive this expression, consider Y = |X| with X \sim N(\mu, \sigma^2). Then F(y) = P(Y \leq y) = P(-y \leq X \leq y) = \Phi\left( \frac{y - \mu}{\sigma} \right) - \Phi\left( \frac{-y - \mu}{\sigma} \right) for y \geq 0.[8] Applying the symmetry property \Phi(-z) = 1 - \Phi(z) to the second term yields the simplified form above.[7] The same expression can also be obtained from the equivalence with a suitably scaled non-central chi distribution with one degree of freedom.[8]

A special case occurs when \mu = 0, reducing the folded normal to the half-normal distribution with CDF F(y \mid 0, \sigma) = 2\Phi\left( \frac{y}{\sigma} \right) - 1 for y \geq 0.[8]

The CDF is continuous and strictly increasing on [0, \infty), starting at F(0) = 0 and approaching 1 as y \to \infty.[7] This monotonicity guarantees the existence and uniqueness of the quantile function F^{-1}(p) for p \in (0, 1), which facilitates numerical computation of quantiles, although inversion generally requires iterative methods because the normal CDF is involved.[7]
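Because F is strictly increasing on [0, ∞), quantiles can be obtained with any bracketing root-finder. A minimal sketch (illustrative parameters; SciPy's foldnorm.ppf serves as a cross-check):

```python
import numpy as np
from scipy.stats import norm, foldnorm
from scipy.optimize import brentq

mu, sigma = 1.3, 0.7  # illustrative parameters

def cdf(y):
    # F(y) = Phi((y - mu)/sigma) + Phi((y + mu)/sigma) - 1
    return norm.cdf((y - mu) / sigma) + norm.cdf((y + mu) / sigma) - 1.0

# Quantile by numerical inversion: F is strictly increasing on [0, inf),
# so a bracketing root-finder on F(y) - p suffices.
p = 0.5
median = brentq(lambda y: cdf(y) - p, 0.0, mu + 10 * sigma)

# Cross-check against SciPy's built-in inverse CDF.
assert np.isclose(median, foldnorm.ppf(p, mu / sigma, scale=sigma))
```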
Statistical Properties

Moments
The moments of the folded normal distribution can be derived by computing the expected value E[Y^k] = \int_0^\infty y^k f(y; \mu, \sigma) \, dy, where f(y; \mu, \sigma) is the probability density function. This integral leverages properties of the underlying normal distribution, such as direct substitution using the standard normal density and cumulative distribution function \Phi, and typically yields expressions involving the error function or Gaussian integrals.[3]

The mean is E[Y] = \mu \left( 2\Phi\left(\frac{\mu}{\sigma}\right) - 1 \right) + \sigma \sqrt{\frac{2}{\pi}} \exp\left( -\frac{\mu^2}{2\sigma^2} \right). The second raw moment is E[Y^2] = \mu^2 + \sigma^2, reflecting that Y^2 = X^2 for the underlying normal random variable X \sim N(\mu, \sigma^2). The variance follows as \operatorname{Var}(Y) = E[Y^2] - (E[Y])^2 = \mu^2 + \sigma^2 - \mu_f^2, where \mu_f = E[Y].[3][1]

In the special case \mu = 0, the folded normal reduces to the half-normal distribution, with mean E[Y] = \sigma \sqrt{2/\pi} and variance \operatorname{Var}(Y) = \sigma^2 (1 - 2/\pi). Higher moments follow the same integration approach, yielding closed-form expressions for low orders in terms of normal moments and the CDF; the third and fourth raw moments, for instance, involve additional terms with \Phi and the standard normal density. Skewness and kurtosis are then obtained from the central moments: the skewness is the third central moment divided by the cube of the standard deviation, and the kurtosis is the fourth central moment divided by the square of the variance (excess kurtosis subtracts 3). For the half-normal case (\mu = 0), the skewness is \sqrt{2} (4 - \pi) / (\pi - 2)^{3/2} \approx 0.995 and the excess kurtosis is 8(\pi - 3)/(\pi - 2)^2 \approx 0.869. In general, the skewness \gamma_1(\theta) and excess kurtosis \kappa(\theta) (with \theta = \mu / \sigma) decrease from their half-normal values at \theta = 0 and tend to 0 as \theta \to \infty, with \kappa(\theta) attaining its minimum around \theta \approx 1.8.[3][1]

As |\mu| / \sigma \to \infty, the probability mass below zero becomes negligible, so the moments of the folded normal approach those of a normal distribution truncated at zero, which asymptotically match the untruncated normal moments.[3]
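The closed-form mean and variance are easy to check against SciPy's built-in moments; a small sketch with illustrative parameters:

```python
import numpy as np
from scipy.stats import norm, foldnorm

mu, sigma = 1.3, 0.7               # illustrative parameters
theta = mu / sigma

# Closed-form mean and variance from the formulas above.
mean_f = mu * (2 * norm.cdf(theta) - 1) \
    + sigma * np.sqrt(2 / np.pi) * np.exp(-theta**2 / 2)
var_f = mu**2 + sigma**2 - mean_f**2

# SciPy's moments for the same parameterization (shape c = mu/sigma).
m, v = foldnorm.stats(theta, scale=sigma, moments="mv")
assert np.isclose(mean_f, m) and np.isclose(var_f, v)
```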
Mode and Median

The mode of the folded normal distribution is the value that maximizes its probability density function. Since the distribution depends on \mu only through |\mu|, one may take \mu \geq 0 without loss of generality. The mode is then at 0 if \mu < \sigma, where the density is monotonically decreasing; otherwise it occurs at the positive value \hat{y} > 0 that solves \hat{y} = -\frac{\sigma^2}{2\mu} \log\left( \frac{\mu - \hat{y}}{\mu + \hat{y}} \right). This transcendental equation has no closed-form solution and must be solved with numerical root-finding methods. When \mu/\sigma is large (roughly \mu > 3\sigma), the mode is approximately \mu, reflecting the convergence of the folded normal to the underlying normal distribution.[1]

The median is the value m > 0 at which the cumulative distribution function equals 0.5. In terms of the error function, F(x) = \frac{1}{2} \left[ \operatorname{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) + \operatorname{erf}\left( \frac{x + \mu}{\sigma \sqrt{2}} \right) \right]. In general no closed-form expression exists for m, so it is computed numerically by inverting F(x) = 0.5, often with bisection or Newton-Raphson methods that exploit the inverse normal CDF for efficiency. For the special case \mu = 0 (the half-normal distribution), the median simplifies to m = \sigma \sqrt{2} \, \operatorname{erf}^{-1}(1/2) \approx 0.6745\,\sigma.[1][9]

For \mu \geq 0 the folded normal is right-skewed, giving the ordering mode ≤ median ≤ mean, with equality in the limit \mu/\sigma \to \infty, where the distribution becomes symmetric and approximates the normal with mean \mu. When \mu = 0, the mode is exactly 0 while the median and mean are strictly positive, highlighting the skewness.[1]
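For \mu > \sigma, the mode equation can be solved with a bracketing root-finder once the root is localized. The sketch below (illustrative parameters) uses the fact that the rearranged stationarity function vanishes at 0, rises to a maximum at \sqrt{\mu^2 - \sigma^2}, and then falls toward -\infty as y \to \mu, so the nontrivial root lies in between:

```python
import numpy as np
from scipy.optimize import brentq

mu, sigma = 1.5, 1.0  # illustrative case with mu > sigma, so the mode is positive

# Stationarity condition rewritten as a root problem on (0, mu):
#   g(y) = y + (sigma^2 / (2*mu)) * log((mu - y) / (mu + y)) = 0.
def g(y):
    return y + sigma**2 / (2 * mu) * np.log((mu - y) / (mu + y))

# g vanishes at 0, peaks at y* = sqrt(mu^2 - sigma^2), then drops to -inf
# as y -> mu, so the nontrivial root (the mode) lies in (y*, mu).
y_star = np.sqrt(mu**2 - sigma**2)
mode = brentq(g, y_star, mu * (1 - 1e-12))
print(mode)  # roughly 1.46, slightly below mu
```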
Characteristic Function

The characteristic function of a random variable Y following the folded normal distribution FN(\mu, \sigma^2), defined as Y = |X| with X \sim N(\mu, \sigma^2), is \psi_Y(t) = e^{i\mu t - \frac{1}{2}\sigma^2 t^2} \Phi\left(\frac{\mu}{\sigma} + i \sigma t \right) + e^{-i\mu t - \frac{1}{2}\sigma^2 t^2} \Phi\left(-\frac{\mu}{\sigma} + i \sigma t \right), where \Phi(\cdot) denotes the cumulative distribution function of the standard normal distribution, extended to complex arguments via the error function: \Phi(z) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right). This expression arises from the definition \psi_Y(t) = \int_0^\infty e^{ity} f_Y(y) \, dy, with the folded density f_Y(y) = f_X(y) + f_X(-y) for y > 0 combining the positive and negative parts of the underlying normal density. Substituting gives \psi_Y(t) = \int_0^\infty e^{ity} f_X(y) \, dy + \int_0^\infty e^{ity} f_X(-y) \, dy, and each integral corresponds to the characteristic function of an unnormalized truncated normal distribution, expressible in terms of the complex-valued normal CDF after completing the square in the exponent.

In the special case \mu = 0, the distribution reduces to the half-normal and the characteristic function simplifies to \psi_Y(t) = 2 e^{-\frac{1}{2}\sigma^2 t^2} \Phi\left(i \sigma t \right). This form highlights the role of the imaginary error function in capturing the transform on non-negative support.

The moment-generating function M_Y(t) = \mathbb{E}[e^{tY}] is obtained by analytic continuation as M_Y(t) = \psi_Y(-it), yielding M_Y(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2} \left[1 - \Phi\left(-\frac{\mu}{\sigma} - \sigma t \right) \right] + e^{-\mu t + \frac{1}{2}\sigma^2 t^2} \left[1 - \Phi\left(\frac{\mu}{\sigma} - \sigma t \right) \right], which exists for all real t because the Gaussian tails decay faster than any exponential. The cumulant-generating function is \log M_Y(t), from which cumulants follow as successive derivatives at t = 0.[5]

The characteristic function also yields higher-order moments via Taylor expansion: the k-th raw moment is \mathbb{E}[Y^k] = i^{-k} \frac{d^k}{dt^k} \psi_Y(t) \bigg|_{t=0}, providing a general way to verify explicit moment formulas and to analyze asymptotic behavior through series expansions or saddlepoint approximations.[5]
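Unlike the characteristic function, the MGF involves only real arguments of \Phi, so it is easy to validate against direct numerical integration of E[e^{tY}]; a brief sketch with illustrative values:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma, t = 1.3, 0.7, 0.4  # illustrative values

# Closed-form MGF from the expression above (real arguments only).
mgf_closed = (
    np.exp(mu * t + 0.5 * sigma**2 * t**2) * (1 - norm.cdf(-mu / sigma - sigma * t))
    + np.exp(-mu * t + 0.5 * sigma**2 * t**2) * (1 - norm.cdf(mu / sigma - sigma * t))
)

# Direct numerical evaluation of E[exp(t*Y)] over the folded density.
def pdf(y):
    return (norm.pdf((y - mu) / sigma) + norm.pdf((y + mu) / sigma)) / sigma

mgf_numeric, _ = quad(lambda y: np.exp(t * y) * pdf(y), 0, np.inf)
assert np.isclose(mgf_closed, mgf_numeric)
```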
Parameter Estimation

Method of Moments
The method of moments (MOM) estimation for the folded normal distribution equates the sample mean \bar{y} and the sample second raw moment m_2 = \frac{1}{n} \sum_{i=1}^n y_i^2 to the corresponding population moments E[Y] and E[Y^2] = \mu^2 + \sigma^2.[10] This yields the system \bar{y} = \hat{\mu} \left(2\Phi\left(\frac{\hat{\mu}}{\hat{\sigma}}\right) - 1\right) + \hat{\sigma} \sqrt{\frac{2}{\pi}} \exp\left(-\frac{\hat{\mu}^2}{2\hat{\sigma}^2}\right), \quad m_2 = \hat{\mu}^2 + \hat{\sigma}^2, where \Phi is the cumulative distribution function of the standard normal distribution.[10] To obtain the estimators, first compute the ratio r = \bar{y}^2 / m_2 and solve numerically for \hat{\theta} = \hat{\mu}/\hat{\sigma} from r = \frac{\left[ \theta \left(2\Phi(\theta) - 1\right) + \sqrt{\frac{2}{\pi}} \exp\left(-\frac{\theta^2}{2}\right) \right]^2}{\theta^2 + 1}. Iteration or table lookup may be required to find \hat{\theta}, after which \hat{\sigma}^2 = m_2 / (\hat{\theta}^2 + 1) and \hat{\mu} = \hat{\theta} \hat{\sigma}.[10]

An alternative approach, Method II, uses the second and fourth sample raw moments m_2 and m_4 = \frac{1}{n} \sum_{i=1}^n y_i^4. Since E[Y^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4, compute B = m_2^2 / m_4 and solve (1 + \theta^2)^2 = B \left( \theta^4 + 6 \theta^2 + 3 \right) numerically for \hat{\theta} > 0. Then \hat{\sigma}^2 = m_2 / (1 + \hat{\theta}^2) and \hat{\mu} = \hat{\theta} \hat{\sigma}. Method II is preferable for small \theta, whereas for \theta > 0.62 the estimators based on the first two moments are more efficient.[10]

The MOM estimators are consistent as n \to \infty, by the law of large numbers applied to the sample moments together with the continuous mapping theorem.[11] They are biased in finite samples because of the nonlinear dependence in the moment equations.[12] Asymptotic variances, including that of \hat{\theta}, can be derived via the delta method on the sample moments, with approximate expressions such as \operatorname{var}(\hat{\theta}) \approx \operatorname{var}(r) / (dr/d\theta)^2.[10] In the special case where \mu = 0 is assumed (the half-normal distribution), the estimator simplifies to \hat{\sigma} = \bar{y} / \sqrt{2/\pi}.[10]
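A sketch of Method I on simulated data (illustrative true parameters; the one-dimensional root is found with brentq, which requires the sample ratio r to exceed its \theta = 0 limit of 2/\pi):

```python
import numpy as np
from scipy.stats import norm, foldnorm
from scipy.optimize import brentq

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.5                      # illustrative true values
y = foldnorm.rvs(mu / sigma, scale=sigma, size=50_000, random_state=rng)

ybar, m2 = y.mean(), np.mean(y**2)
r = ybar**2 / m2

# Population counterpart of r as a function of theta = mu/sigma;
# it equals 2/pi at theta = 0 and approaches 1 as theta grows.
def r_of_theta(theta):
    num = theta * (2 * norm.cdf(theta) - 1) \
        + np.sqrt(2 / np.pi) * np.exp(-theta**2 / 2)
    return num**2 / (theta**2 + 1)

theta_hat = brentq(lambda th: r_of_theta(th) - r, 0.0, 50.0)
sigma_hat = np.sqrt(m2 / (theta_hat**2 + 1))
mu_hat = theta_hat * sigma_hat
print(mu_hat, sigma_hat)                  # close to the true (1.0, 1.5)
```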
Maximum Likelihood Estimation

The likelihood function for an independent sample y_1, \dots, y_n (y_i \geq 0) from the folded normal distribution is L(\mu, \sigma \mid \mathbf{y}) = \prod_{i=1}^n \frac{ \phi\left( \frac{y_i - \mu}{\sigma} \right) + \phi\left( \frac{y_i + \mu}{\sigma} \right) }{ \sigma }, where \phi denotes the standard normal probability density function. The corresponding log-likelihood can be written \ell(\mu, \sigma) = -\frac{n}{2} \log (2\pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 + \sum_{i=1}^n \log \left( 1 + \exp\left( -\frac{2 \mu y_i}{\sigma^2} \right) \right). Maximizing \ell(\mu, \sigma) over \mu and \sigma > 0 yields the maximum likelihood estimators (MLEs) \hat{\mu} and \hat{\sigma}, but the score equations have no closed-form solution and must be solved numerically.[13]

One effective numerical approach is the expectation-maximization (EM) algorithm, which treats the signs of the underlying normal variates as missing data. In the E-step, compute the conditional expectations E[s_i \mid y_i, \mu^{(t)}, \sigma^{2(t)}] = \frac{ \exp\left( \frac{2 \mu^{(t)} y_i }{ \sigma^{2(t)} } \right) - 1 }{ \exp\left( \frac{2 \mu^{(t)} y_i }{ \sigma^{2(t)} } \right) + 1 } = \tanh\left( \frac{\mu^{(t)} y_i}{\sigma^{2(t)}} \right), where s_i = \pm 1 indicates the sign of the latent normal variable. In the M-step, update \mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^n y_i \, E[s_i \mid y_i, \mu^{(t)}, \sigma^{2(t)}] and \sigma^{2(t+1)} = \frac{1}{n} \sum_{i=1}^n y_i^2 - \left( \mu^{(t+1)} \right)^2. These steps are iterated from suitable initial values (e.g., method-of-moments estimates) until convergence, such as when the change in \ell falls below a small threshold like 10^{-6}.[14]

Alternatively, direct optimization of the log-likelihood with unconstrained numerical methods, such as Newton-Raphson or the Nelder-Mead simplex algorithm, avoids the EM framework; the constraint \sigma > 0 can be enforced via reparameterization (e.g., optimizing over \log \sigma). Good starting values, such as the sample mean and standard deviation, help ensure convergence to the global maximum.[15]

When \mu = 0, the folded normal reduces to the half-normal distribution, and the MLE simplifies to \hat{\sigma}^2 = n^{-1} \sum_{i=1}^n y_i^2. The MLEs \hat{\mu} and \hat{\sigma} are asymptotically efficient and normally distributed as n \to \infty, with asymptotic covariance given by the inverse Fisher information matrix.[13] Computationally, the log-likelihood is non-convex, and for small n multiple local maxima can occur, so robust initialization and validation (e.g., multiple starts or profile likelihoods) are needed to identify the global solution.[15]
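The EM iteration translates directly into code. The sketch below recovers illustrative true parameters from simulated data; for brevity, convergence is monitored through the parameter change rather than the change in \ell:

```python
import numpy as np
from scipy.stats import foldnorm

def fit_folded_normal_em(y, n_iter=500, tol=1e-6):
    """EM for the folded normal, treating the latent signs as missing data."""
    y = np.asarray(y, dtype=float)
    mu, sig2 = y.mean(), y.var()          # simple moment-based starting values
    for _ in range(n_iter):
        # E-step: posterior expectation of the sign, tanh(mu * y_i / sig2).
        e_s = np.tanh(mu * y / sig2)
        # M-step: complete-data MLEs with s_i replaced by its expectation.
        mu_new = np.mean(y * e_s)
        sig2_new = np.mean(y**2) - mu_new**2
        converged = abs(mu_new - mu) < tol and abs(sig2_new - sig2) < tol
        mu, sig2 = mu_new, sig2_new
        if converged:
            break
    return mu, np.sqrt(sig2)

rng = np.random.default_rng(2)
y = foldnorm.rvs(1.0 / 1.5, scale=1.5, size=20_000, random_state=rng)
print(fit_folded_normal_em(y))  # roughly (1.0, 1.5)
```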
Applications and Related Concepts

Practical Applications
In statistics, the folded normal distribution is commonly used to model the absolute values of errors or deviations, such as regression residuals for which only magnitudes are observed, without signs.[16] It also finds application in metrology for characterizing measurement uncertainties, exemplified by its use in analyzing the magnitude of deviations in automobile strut alignments.[1]

In reliability engineering, the folded normal distribution serves as a model for failure times or wear processes that exhibit symmetric behavior around a mean but are constrained to non-negative values, such as in fitting bus-motor failure data to assess component lifetimes.[17] It has been incorporated into reliability analysis techniques, including learning functions for estimating failure probabilities in structural systems via Kriging methods.[18]

In finance, the folded normal distribution has been applied in value-at-risk calculations and in asymmetric multivariate stochastic volatility models.[19][20] Relatedly, its special case, the half-normal distribution, supports probability-severity risk matrices by quantifying the magnitude of financial losses from symmetric underlying risks.[21]

In biology and ecology, the folded normal distribution describes absolute differences in evolutionary traits, such as changes in body size between ancestral and descendant species, capturing the folded nature of directional selection outcomes.[22] It has also been used in simulations of migratory behavior to model mating preferences based on distance-related traits in bird populations transitioning from migratory to resident states.[23]

Software implementations facilitate practical use of the distribution. In Python, the SciPy library provides the scipy.stats.foldnorm class for generating random variates, computing density functions, and performing simulations, with parameters for shape, location, and scale.[24] In R, the VGAM package includes functions such as dfoldnorm and rfoldnorm for density evaluation, random generation, and fitting to data.[25]
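For example, a minimal SciPy fitting sketch (illustrative simulated data; fixing floc=0 restricts the fit to the standard two-parameter folded normal):

```python
import numpy as np
from scipy.stats import foldnorm

rng = np.random.default_rng(3)
sample = np.abs(rng.normal(2.0, 1.0, size=5_000))   # folded data

# Fit by maximum likelihood; fixing loc = 0 keeps the standard
# two-parameter folded normal (shape c = mu/sigma, scale = sigma).
c_hat, loc_hat, scale_hat = foldnorm.fit(sample, floc=0)
mu_hat, sigma_hat = c_hat * scale_hat, scale_hat
print(mu_hat, sigma_hat)  # roughly (2.0, 1.0)
```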
The folded normal distribution is particularly effective for moderate ratios of the underlying normal mean μ to standard deviation σ, where it approximates symmetric positive deviations without excessive skewness.[1] For datasets exhibiting heavy tails or strong positive skewness, alternatives like the lognormal distribution provide better fits, as they accommodate multiplicative processes more naturally while remaining supported on positive reals.[26]