Log-normal distribution
In probability theory, the log-normal distribution is a continuous probability distribution defined for positive real numbers, where the natural logarithm of the random variable follows a normal distribution.[1] It is parameterized by two values: μ (the mean of the underlying normal distribution) and σ (its standard deviation, with σ > 0), such that if Y ~ N(μ, σ²), then X = e^Y has a log-normal distribution.[2] The probability density function is given by

$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right)$$
for x > 0, and zero otherwise.[1]

Key statistical properties distinguish the log-normal distribution from the normal distribution: it is inherently right-skewed and cannot take negative values, making it suitable for modeling multiplicative processes or phenomena bounded below by zero.[3] The expected value (mean) is E[X] = e^{μ + σ²/2}, and the variance is Var(X) = (e^{σ²} - 1) e^{2μ + σ²}; both grow exponentially in σ², so even a modest increase in σ inflates the mean and, more strongly, the variance.[1] Unlike the normal distribution, it has no closed-form moment-generating function (the defining expectation E[e^{tX}] diverges for every t > 0), but its cumulative distribution function can be written in terms of the standard normal CDF: F(x) = Φ((ln x - μ)/σ), where Φ is the cumulative distribution function of the standard normal distribution.[2] These properties arise from the exponential transformation, which stretches the positive tail and compresses values near zero.

The log-normal distribution was first formally described in 1879 by Francis Galton and Donald McAlister in the context of the geometric mean of statistical data, building on observations of skewed data patterns made earlier in the 19th century.[4] It has since become a foundational model in various fields due to its ability to capture real-world data exhibiting multiplicative effects, such as growth rates or error accumulation.[5]

In finance, it underpins the modeling of stock and other asset prices under assumptions like geometric Brownian motion, in which log returns are normally distributed and prices are therefore log-normally distributed.[3] In reliability engineering, it describes failure times for systems subject to fatigue, corrosion, or degradation, such as cycles-to-failure in materials or repair durations in maintenance.[6] Biological applications include modeling organism sizes, population growth, or species abundance, while in environmental science it fits quantities such as particle sizes or pollutant concentrations (e.g., radon levels in homes).[3] These uses rely on its flexibility for positive, skewed data; the fit is often checked by taking logarithms of the data and testing the result for normality.
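
The density, cumulative distribution function, mean, and variance stated earlier can be checked numerically. The following is a minimal Python sketch using only the standard library. The function names, the seed, and the parameter values μ = 0 and σ = 0.5 are illustrative assumptions, not taken from the cited sources. Samples are drawn directly from the definition X = e^Y with Y ~ N(μ, σ²).

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Density f(x; mu, sigma); zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma), with Phi expressed through math.erf."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_mean(mu, sigma):
    """E[X] = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


def lognormal_variance(mu, sigma):
    """Var(X) = (exp(sigma^2) - 1) * exp(2*mu + sigma^2)."""
    return (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)


def sample_lognormal(mu, sigma, n, seed=0):
    """Draw n samples as X = exp(Y), where Y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]


if __name__ == "__main__":
    mu, sigma = 0.0, 0.5  # illustrative values (assumption)
    xs = sample_lognormal(mu, sigma, 100_000)
    sample_mean = sum(xs) / len(xs)
    sample_var = sum((x - sample_mean) ** 2 for x in xs) / (len(xs) - 1)

    print("closed-form mean:    ", lognormal_mean(mu, sigma))
    print("sample mean:         ", sample_mean)
    print("closed-form variance:", lognormal_variance(mu, sigma))
    print("sample variance:     ", sample_var)
    # Since Phi(0) = 0.5, the CDF evaluated at x = e^mu should be one half.
    print("F(e^mu):             ", lognormal_cdf(math.exp(mu), mu, sigma))
```

With a large sample, the empirical mean and variance should land close to the closed-form values, and the gap widens as σ grows because the right tail becomes heavier. Taking the logarithm of each sample recovers the underlying normal variates, which is the same transformation used to check a log-normal fit on real data.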