Fact-checked by Grok 2 weeks ago

Geometric standard deviation

The geometric standard deviation (GSD) is a measure of for , serving as the multiplicative analog to the arithmetic standard deviation when the is the appropriate , especially for lognormally distributed data. It quantifies variability by exponentiating the standard deviation of the natural logarithms of the data, yielding a unitless factor greater than or equal to 1 that describes how much the data spreads multiplicatively around the geometric mean—for instance, a GSD of 1.2 indicates that approximately 68% of the values lie between the geometric mean divided by 1.2 and multiplied by 1.2 in lognormal distributions. Introduced by biostatistician T.B.L. Kirkwood in to address limitations of arithmetic measures for skewed, ratio-based data, the GSD transforms the dataset via logarithms to normalize it before applying standard deviation calculations, then back-transforms the result to preserve the original scale's multiplicative nature. The formal definition for a sample x_1, x_2, \dots, x_n > 0 involves computing the sample standard deviation s of y_i = \ln(x_i), followed by \text{GSD} = e^s, where the natural logarithm ensures the measure is invariant to proportional scaling of the data. This approach contrasts with the arithmetic standard deviation, which is additive and suited to normal distributions, as the GSD cannot be added or subtracted but instead multiplies or divides the to form confidence-like intervals. Key properties of the GSD include its dimensionless quality and minimum value of 1 (achieved when all data are identical), making it ideal for expressing relative variability in percentages via the geometric coefficient of variation, defined as $100(\text{GSD} - 1)\%. In practice, software implementations like , , and compute it directly from log-transformed data, often adjusting for in sample estimates. The measure assumes lognormality for optimal interpretability, where it captures about two-thirds of the data within the factor bounds, but it can be applied more broadly to positive skewed datasets with caution. Applications of the GSD span fields requiring analysis of multiplicative processes, such as environmental science for pollutant concentrations (e.g., reporting geometric means for radionuclides like ^{210}\text{Pb} at 0.52 mBq m^{-3}), aerosol engineering for particle size distributions in pharmaceuticals (where GSD = d_{84}/d_{50} or similar percentiles define spread), and finance for modeling investment returns or compounded growth rates. In biomedical research, it evaluates assay variability and bioequivalence, such as inter-laboratory differences in drug potency, while in demography, it assesses population growth fluctuations over time. These uses highlight its utility in summarizing data where ratios or percentages dominate, ensuring interpretations remain proportional rather than absolute.

Core Concepts

Definition

The geometric standard deviation (GSD) is a measure of applicable to sets of , especially those exhibiting multiplicative variability or following a . It quantifies the spread of data on a multiplicative by taking the of the standard deviation of the natural logarithms of the data values, thereby transforming the additive spread in the logarithmic domain back to the original . For a sample of n positive values x_1, x_2, \dots, x_n > 0, the GSD is calculated as \sigma_g = \exp\left( \sqrt{\frac{1}{n-1} \sum_{i=1}^n (\ln x_i - \mu_g)^2} \right), where \mu_g = \frac{1}{n} \sum_{i=1}^n \ln x_i is the of the logarithms (also known as the log-mean). In the population context, for a log-normal with parameters \mu (mean of the logarithms) and \sigma (standard deviation of the logarithms), the GSD simplifies to \sigma_g = \exp(\sigma). Intuitively, the GSD represents a multiplicative factor indicating how much the spread around the ; for instance, roughly 68% of the observations lie within a factor of \sigma_g above or below the when the logarithms are normally distributed. This contrasts with additive interpretations of spread in other measures. The term "geometric standard deviation" was introduced by T. B. L. Kirkwood in within the framework of log-normal .

Properties

The geometric standard deviation (GSD) is scale-invariant, meaning that if all data points are multiplied by a positive constant k > 0, the GSD remains unchanged, thereby preserving the relative ratios among the data values. This property stems from its definition as the exponential of the standard deviation of the natural logarithms of the data, which converts multiplicative scaling into an additive shift in the log-space without affecting the dispersion measure. The sample GSD is a biased of the population , generally biased downward in finite samples because the underlying sample standard deviation of the log-transformed underestimates the population value. An unbiased for the variance in the log-space uses the divisor n-1, but correcting the GSD itself for bias requires an adjustment factor that accounts for the nonlinearity of the ; approximate methods suffice for larger n. Confidence intervals for the GSD can be constructed using Fieller's theorem for parameters involving ratios on the log-scale or nonparametric bootstrap resampling of the log-transformed data. For related parameters like the \mu_g, an approximate 95% is given by \exp\left( \mu_g \pm \frac{1.96 \sigma}{\sqrt{n}} \right), where \sigma is the standard deviation of the logs and n is the sample size; similar log-scale transformations apply to derive intervals for the GSD by exponentiating bounds on the log-dispersion. On the log-scale, the GSD exhibits symmetry analogous to the arithmetic standard deviation for data, as it directly quantifies the spread of the logarithms, rendering it appropriate for positively skewed datasets where logs approximate . The GSD has well-defined limits: it equals 1 when all data values are identical, reflecting zero on the log-scale, and approaches infinity as the variance of the log-transformed data grows without bound.

Mathematical Foundations

Derivation

The geometric standard deviation addresses the limitations of the standard deviation when dealing with positive that exhibit multiplicative variability, such as rates or concentrations, where relative changes are more relevant than absolute ones. For such skewed distributions, a logarithmic normalizes the , converting products into sums and enabling the standard deviation to capture relative effectively on the transformed scale. This approach is particularly justified for approximately following a , where the logs are normally distributed, allowing standard statistical tools to measure spread in a way that translates to multiplicative factors on the original scale. To derive the formula, begin with a sample of n positive observations x_1, x_2, \dots, x_n > 0. Apply the natural logarithm to each: y_i = \ln x_i for i = 1, \dots, n. Compute the sample mean of the transformed values: \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i. The sample standard deviation of the y_i is then s_y = \sqrt{ \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 }. The geometric standard deviation \sigma_g is obtained by exponentiating this quantity: \sigma_g = e^{s_y}. This step reverses the transformation, yielding a unitless factor that quantifies the typical multiplicative deviation from the e^{\bar{y}}, analogous to how the standard deviation measures additive spread. This derivation assumes all data points are strictly positive, as the logarithm is for non-positive values; in cases involving zeros, they are typically excluded from the calculation or handled by adding a small positive constant before transformation to approximate the limit behavior. An alternative views the geometric standard deviation through the population variance of the logs. For a X > 0 where \ln X has variance \mathrm{Var}(\ln X) = \sigma^2, the geometric standard deviation satisfies \ln \sigma_g = \sqrt{\mathrm{Var}(\ln X)}, so \sigma_g = e^{\sqrt{\mathrm{Var}(\ln X)}}. This formulation connects to the of the , where the second derivative at zero yields the variance of \ln X, confirming the exponential relationship for multiplicative scale. Under the assumption that the log-transformed data y_i follow a , the sample variance s_y^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 is an unbiased of the population log-variance \sigma^2, as (n-1)s_y^2 / \sigma^2 follows a with n-1 , ensuring E[s_y^2] = \sigma^2. Consequently, s_y provides an unbiased basis for estimating \sigma_g = e^{s_y}, though the exponential introduces slight in the final estimate.

Relationship to Log-Normal Distribution

The geometric standard deviation (GSD) serves as a key parameter in the log-normal distribution, which models positive random variables subject to multiplicative effects. For a random variable X \sim \mathrm{LN}(\mu, \sigma^2), where \mu and \sigma > 0 are the location and shape parameters of the underlying normal distribution of \ln X, the geometric mean is given by \exp(\mu) and the GSD by \sigma_g = \exp(\sigma). This parameterization highlights how the GSD quantifies dispersion on the logarithmic scale, making it particularly suitable for data exhibiting multiplicative errors, such as growth processes or financial returns, where variability is proportional rather than additive. The GSD connects directly to the (CV) of the , providing a between measures of relative spread. Specifically, the CV is \sqrt{\exp(\sigma^2) - 1}, and the GSD relates to the CV by \sigma_g = \exp\left( \sqrt{ \ln (1 + \mathrm{CV}^2 ) } \right). This relation underscores the GSD's role in capturing the factor by which observations deviate multiplicatively from the ; for instance, approximately 68% of the data lie within the interval [\mu_g / \sigma_g, \mu_g \sigma_g], analogous to the empirical rule for distributions but on a scale. Higher moments of the log-normal distribution can also be expressed in terms of the parameters of the underlying normal distribution, revealing the influence of \sigma on shape characteristics. The skewness is (e^{\sigma^2} + 2) \sqrt{e^{\sigma^2} - 1}, while the excess kurtosis is e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 3. These expressions show how larger \sigma values amplify positive skewness and leptokurtosis, common in log-normal data. Estimation of the GSD from a sample assuming log-normality follows the maximum likelihood approach for the underlying parameters. The is \hat{\sigma}_g = \exp\left( \sqrt{\frac{1}{n} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2} \right), where \hat{\mu} = \frac{1}{n} \sum_{i=1}^n \ln x_i is the sample of the log-transformed . This method leverages the log-transformation to yield unbiased estimates under the log-normal assumption.

Geometric Standard Score

The geometric standard score, often denoted as the geometric z-score z_g, provides a normalized measure of how far a positive value x deviates from the \mu_g in terms of the geometric standard deviation \sigma_g. It is defined by the formula z_g = \frac{\ln(x / \mu_g)}{\ln \sigma_g}, where the logarithm is typically the natural logarithm, ensuring the score captures multiplicative relationships in the data. This formulation standardizes deviations on a , making it particularly suitable for datasets where ratios rather than differences are meaningful, such as growth rates or concentrations. The derivation of the geometric standard score stems directly from applying the standard z-score to the logarithms of the data. For a dataset following a , the logarithms \ln x are normally distributed with mean \ln \mu_g and deviation \sigma_{\ln x}, yielding the logarithmic z-score z = \frac{\ln x - \ln \mu_g}{\sigma_{\ln x}}. Since the geometric deviation is related by \sigma_g = \exp(\sigma_{\ln x}), it follows that \ln \sigma_g = \sigma_{\ln x}, so z_g = z. This equivalence preserves the probabilistic properties of the distribution while adapting to the original multiplicative scale. In interpretation, a geometric standard score of z_g = 0 indicates that x equals \mu_g, while values with |z_g| < 1 lie within one geometric standard deviation multiplicatively—meaning x is between \mu_g / \sigma_g and \mu_g \cdot \sigma_g. This makes it valuable for detecting outliers in positively skewed, positive-valued data, as it emphasizes relative rather than absolute deviations, analogous to the arithmetic z-score but for ratio-based scales. For log-normally distributed data, z_g follows a standard normal distribution, enabling the use of normal quantiles for confidence intervals or thresholds; for instance, approximately 95% of values satisfy |z_g| < 1.96. Confidence bands can thus be constructed as x \in \mu_g \exp(z_g \ln \sigma_g), providing multiplicative intervals. As an example, consider x = 10, \mu_g = 5, and \sigma_g = 1.5. Substituting into the formula gives z_g = \frac{\ln(10 / 5)}{\ln 1.5} = \frac{\ln 2}{\ln 1.5} \approx \frac{0.693}{0.405} \approx 1.71, indicating that 10 is about 1.71 geometric standard deviations above the geometric mean, or roughly three times larger than \mu_g adjusted for the spread.

Comparison to Arithmetic Measures

The arithmetic standard deviation (ASD) quantifies the additive spread of data points around the arithmetic mean, measuring absolute deviations in the original scale, whereas the geometric standard deviation (GSD) quantifies the multiplicative spread around the geometric mean, capturing relative or proportional variations, which is particularly suitable for datasets where ratios exceed 1. In formula terms, the ASD is given by \sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}, where \bar{x} is the , allowing computation on any real-valued data but sensitive to outliers and zeros; the GSD, defined as e^{s} with s as the standard deviation of the logarithms of the data, circumvents issues with zeros or negatives by requiring strictly positive values but interprets dispersion as a unitless factor. The GSD is preferable for log-skewed or lognormally distributed data, such as income distributions or biological measurements like cell sizes, where multiplicative effects dominate, while the ASD is more appropriate for symmetric, Gaussian-distributed data exhibiting additive variability. The GSD relates to the coefficient of variation (CV, defined as ASD divided by the arithmetic mean) through an approximation where GSD ≈ 1 + CV for small variability, reflecting their shared focus on relative dispersion, though the GSD emphasizes multiplicative scaling. Key limitations of the GSD include its undefined status for non-positive data and less intuitive interpretation for absolute errors compared to the ASD, which remains applicable across broader data types despite potential distortion by skewness. For the dataset {1, 2, 4}, the ASD is approximately 1.53 (measuring additive spread around the arithmetic mean of 2.33), while the GSD is exactly 2 (indicating that data points are spread by a multiplicative factor of 2 around the geometric mean of 2).

Applications

In Probability and Statistics

In probability and statistics, the geometric standard deviation (GSD) plays a key role in hypothesis testing for log-normal data, particularly through adaptations of t-tests that account for the multiplicative nature of the distribution. The log-normal t-test compares geometric means between two groups by first log-transforming the data and then applying a standard t-test on the logs, assuming equal GSDs across groups; this assumption is tested using an F-test on the log-transformed variances, where unequal GSDs (indicated by a small P-value) suggest differences in spread as significant as those in location. When GSDs differ, the Welch log-normal t-test is preferred, as it adjusts for unequal variances on the log scale, providing better control of type I error rates and higher power, especially with imbalanced sample sizes or moderate to large GSDs (e.g., GeoSD > 2). These methods outperform direct application of normal t-tests on untransformed data, which can reduce power by up to 50% for GeoSDs around 4 and effect sizes of threefold differences. For constructing confidence intervals in Bayesian frameworks, GSD informs the parameterization of log-normal priors, where the prior on the log-scale standard deviation (σ) corresponds to a GSD of e^σ, enabling credible intervals that bound the while respecting the distribution's asymmetry. This approach is particularly useful for inference on ratios or multiplicative effects, as the resulting intervals are asymmetric and naturally constrained to positive values, aligning with the log-normal's properties. The GSD exhibits greater robustness to outliers than the arithmetic standard deviation (ASD) in heavy-tailed log-normal distributions, as the log transformation compresses extreme values, reducing their leverage on the measure of . For instance, in datasets with GeoSDs of 3 or higher, the GSD remains more stable relative to the ASD because outliers contribute less to the variance on the log scale. In simulation methods, log-normal samples are generated with a specified GSD for estimation of variances in probabilistic models, by setting the log-scale standard deviation as ln(GSD) in generators. This facilitates variance estimation for statistics like portfolio risks or process yields, where repeated sampling (e.g., 10,000 iterations) with fixed GSD approximates the distribution's tail behavior for reliable . Such simulations are essential for validating procedures under log-normality, ensuring that estimated variances converge to true values even for GeoSDs up to 5. Implementations of GSD are available in statistical software for both computation and simulation. In R, the EnvStats package provides the geoSD() function to calculate the sample GSD as the exponential of the standard deviation of log-transformed data, supporting robust estimation for positive-valued vectors. For simulation, R's rlnorm() generates log-normal samples by specifying sdlog = ln(GSD). In Python, SciPy's stats.gstd() computes the GSD directly, while stats.lognorm.rvs(s=ln(GSD), scale=geometric_mean) produces samples for applications, integrating seamlessly with for large-scale variance computations.

In Finance and Other Fields

In finance, the geometric standard deviation (GSD) serves as a measure of for asset returns, particularly when analyzing multiplicative changes in stock prices through daily multipliers (1 + r_i, where r_i is the return). For instance, given daily multipliers of {1.01, 0.99, 1.02}, the GSD approximates 1.015, indicating a 1.5% multiplicative that captures the compounded variability in returns. This approach aligns with the Black-Scholes model, where is parameterized as the standard deviation of log-returns (ln(1 + r_i)), equivalent to the natural logarithm of the GSD for price multipliers, enabling accurate option pricing under log-normal assumptions. In and , GSD quantifies variability in growth rates, such as bacterial replication s, where cell counts often follow log-normal distributions due to multiplicative processes. For example, fluorescence-based estimates of bacterial densities use s and GSDs to account for the skewed, positive of microbial growth data, providing robust summaries within a multiplicative of approximately 3 (GSD ≈ 3.06). Similarly, in environmental , GSD describes the spread of concentrations, like indoor PM2.5 levels with a geometric mean of 41.1 μg/m³ and GSD of 1.3 in urban settings, or CO at 4.9 ppm with GSD 4.3 in rural areas, highlighting log-normal variability from sources like biomass burning. In engineering reliability analysis, GSD indicates the spread in failure times modeled as log-normal distributions, where the parameter reflects multiplicative uncertainty in component lifespans. For physical systems like electronic devices, the GSD derived from the standard deviation of log-transformed failure times helps assess dispersion in time-to-failure, supporting predictions of reliability under exponential-like degradation but with positive skew. Beyond these domains, GSD applies to particle size distributions in physics, where it parameterizes the width of log-normal spectra, such as geometric standard deviations around 2.0 for atmospheric particles influencing formation. In economics, it measures in wages or , treating distributions as log-normal to capture relative disparities; for instance, multiplicative models use GSD alongside geometric means to evaluate skewed data more naturally than counterparts. The GSD's advantage over arithmetic standard deviation lies in its inherent handling of percentage or multiplicative changes, preserving scale-invariance for positive, skewed data like returns or concentrations, thus providing a more intuitive measure of relative variability.

References

  1. [1]
    gstd — SciPy v1.16.2 Manual
    ### Definition and Formula for Geometric Standard Deviation
  2. [2]
    The geometric mean and ... - GraphPad Prism 10 Statistics Guide
    The geometric SD is a value you always multiply or divide by. This is very different than a ordinary SD which has the same units as the data, so can be added to ...
  3. [3]
    Compute the geometric mean, geometric standard deviation, and ...
    Oct 2, 2019 · The quantity GSD = exp(σ) is defined to be the geometric standard deviation. The sample estimate is exp(s), where s is the standard deviation of ...
  4. [4]
    Geometric Standard Deviation - an overview | ScienceDirect Topics
    Geometric standard deviation (σ_g) is defined as a measure of the spread of a lognormal distribution, calculated using the diameters at which specific ...
  5. [5]
    [PDF] Better than Average: Calculating Geometric Means Using SAS
    Examples are varied and include examining compounded interest rates or returns on investments, assessing population changes in longitudinal data, or ...<|control11|><|separator|>
  6. [6]
    Page non trouvée | Physiologie et Thérapeutique
    - **Status**: Insufficient relevant content.
  7. [7]
    [PDF] Geometric Means and Measures of Dispersion
    The geometric mean of X is e", and let us define the geometric standard deviation (GSD) to be e°. (Note that it would make no difference if logarithms and ...
  8. [8]
    [PDF] Industrial Hygiene Exposure Assessment- Data Analysis and ...
    ... sample geometric standard deviation (gsd) sample median (x) sample 95th ... bias the sample GM upward and the sample GSD downward. There are various ...
  9. [9]
    Estimate of confidence intervals for geometric mean diameter and ...
    Aug 9, 2025 · For both geometric mean diameter and geometric standard deviation, the confidence intervals were calculated so that both values of population ...
  10. [10]
    What Does It “Mean”? A Review of Interpreting and Calculating ... - NIH
    Estimation of the geometric standard deviation (for Ln-transformed values). The following equation describes a method for obtaining an estimate [3] of the ...
  11. [11]
    1.3 - Unbiased Estimation | STAT 415 - STAT ONLINE
    In summary, we have shown that, if X i is a normally distributed random variable with mean μ and variance σ 2 , then S 2 is an unbiased estimator of σ 2 . It ...
  12. [12]
    Log-normal Distribution | Real Statistics Using Excel
    Log-normal Distribution. Basic Concepts. Definition 1: A random variable x ... geometric standard deviation” without clear specification! Reply. Charles.<|control11|><|separator|>
  13. [13]
    gzscore — SciPy v1.16.2 Manual
    Compute the geometric z score of each strictly positive value in the sample, relative to the geometric mean and standard deviation.
  14. [14]
    [PDF] Interpreting z-scores derived from log-transformed data
    Dec 18, 2004 · Regardless of this, z-scores based on log- transformed data can still be treated as symmetric: a z- score of –3.5 has the same importance as ...
  15. [15]
    1.3.6.6.9. Lognormal Distribution - Information Technology Laboratory
    The case where θ = 0 and m = 1 is called the standard lognormal distribution. The case where θ equals zero is called the 2-parameter lognormal distribution. The ...
  16. [16]
    [PDF] Mean, What do you Mean?
    The term geometric standard deviation was introduced in a letter to the Editor of Biometrics by ... Plackett, R. L. (1958), “Studies in the History of Probability ...Missing: origin | Show results with:origin
  17. [17]
    Lognormal t test (and Welch ... - GraphPad Prism 10 Statistics Guide
    The lognormal unpaired t test assumes that the two populations have the same geometric standard deviation (GeoSD). You can choose the Welch lognormal t test ...
  18. [18]
    Analyzing lognormal data: A nonmathematical practical guide
    A lognormal distribution with a small GeoSD looks very similar to a normal distribution with a CV = GeoSD -1 (Haeckel and Wosniok, 2010).
  19. [19]
    [PDF] Bayesian Inference for Mean of the Lognormal Distribution
    In this paper Bayes estimator and the associated credible intervals are derived for the mean of lognormal distribution.Missing: GSD | Show results with:GSD
  20. [20]
    [PDF] Bayesian Confidence Intervals for the Ratio of Means of Lognormal ...
    In this paper we consider a variety of issues from a Bayesian perspective, namely, the problem of constructing Bayesian confidence intervals for the ratio of ...Missing: GSD | Show results with:GSD
  21. [21]
    Chapter 18 The lognormal distribution | JABSTB
    Likewise, the parameter eσ e σ is the geometric standard deviation, and as shown above it is not equivalent to the arithmetic standard deviation ofX X . What ...
  22. [22]
    [PDF] Monte Carlo comparison of estimation methods for the two ...
    Abstract. In the paper we compare performance of estimation methods for the two-parameter lognormal distribution via the Monte Carlo simulation.
  23. [23]
    Geometric Standard Deviation. - R
    The sample geometric standard deviation is the antilog of the sample standard deviation of the log-transformed observations.Missing: applications | Show results with:applications
  24. [24]
    The Changing Nature of Stock and Bond Volatility
    Jan 2, 2019 · The inflation-adjusted geometric standard deviation of bonds was 30 percent higher than the nominal standard geometric deviation for the 1871– ...
  25. [25]
    [PDF] SOME DRAWBACKS OF BLACK-SCHOLES - NYU Stern
    A key input to the Black-Scholes formula is σ, the standard deviation of the stock's continuously compounded rate of return. (The continuously compounded ...
  26. [26]
    Robust estimation of bacterial cell count from optical density - Nature
    Sep 17, 2020 · Fluorescence values are analyzed geometric mean and geometric standard deviation, rather than the more typical arithmetic statistics, due to the ...
  27. [27]
    Indoor air pollution concentrations and cardiometabolic health ...
    The geometric mean (GM) daily average indoor PM2.5 concentration was 41.1 μg/m3 (geometric standard deviation [GSD] 1.3) in Lima, 35.8 μg/m3 (GSD 1.4) in Tumbes ...
  28. [28]
    8.1.6.4. Lognormal - Information Technology Laboratory
    Note: If time to failure, t f , has a lognormal distribution, then the (natural) logarithm of time to failure has a normal distribution with mean μ = ln T 50 ...Missing: geometric engineering<|separator|>
  29. [29]
    [PDF] A Study of the Application of the Lognormal Distribution to ... - DTIC
    Nov 1, 2024 · The usual mathematical formulation of availability assumes an exponential distribution for failure and repair times.
  30. [30]
    [PDF] Some Useful Formulae for Particle Size Distributions and Optical ...
    Jul 26, 2025 · median volume and σv the geometric standard deviation. ... which is true for a log-normal distribution. For a multi-mode log-normal distribution ...Missing: parameterization | Show results with:parameterization
  31. [31]
    Multiplicative parameters and estimators: applications in economics ...
    Oct 19, 2015 · We discuss multiplicative models, in which the geometric mean and the geometric standard deviation are more natural than their arithmetic ...
  32. [32]
    Geometric Mean - an overview | ScienceDirect Topics
    The geometric mean has an advantage over the arithmetic mean in that it is less affected by extreme values in a skewed distribution; in the above example, the ...