Estimation statistics
Estimation statistics is a data analysis framework that emphasizes estimating the magnitude and precision of effects using effect sizes and confidence intervals (CIs), rather than making dichotomous decisions through null hypothesis significance testing (NHST).[1] Developed as part of "the new statistics," it addresses limitations of traditional inferential statistics by focusing on the quantification of uncertainty and on practical significance.[2] Originating in the late 20th century and gaining prominence in the 21st through proponents such as Geoff Cumming, it promotes tools such as precision planning for study design and visualizations such as the Gardner–Altman plot to aid interpretation.

At its core, estimation statistics relies on point and interval estimation to approximate population parameters from sample data, but it prioritizes effect size measures (e.g., Cohen's d) together with CIs to convey the plausibility of different effect magnitudes.[2] For example, a 95% CI for an effect size provides a range of values compatible with the data, highlighting uncertainty without a binary rejection or acceptance of a null hypothesis. This approach supports evidence-based decision making in fields such as psychology, medicine, and the social sciences, where understanding the precision of an effect informs real-world applications.[3]

While rooted in classical estimation methods, estimation statistics criticizes overreliance on p-values and advocates reporting full estimation results to avoid misinterpretation.[1] It accommodates Bayesian perspectives in some contexts but primarily uses frequentist CIs, with the aim of improving the reliability of conclusions in high-stakes research. As of 2025, it continues to influence open science practices through software such as esci (Estimation Statistics with Confidence Intervals).[4]
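As a brief illustration of reporting an effect together with its precision (a minimal sketch using hypothetical data and Python with NumPy and SciPy; none of the numbers come from a cited study), the following code computes the unstandardized mean difference between two groups and a 95% confidence interval around it, the quantity a Gardner–Altman plot displays alongside the raw observations.

# Minimal sketch: point estimate and 95% CI for a mean difference
# (hypothetical data; illustrates the estimation approach in general,
# not the output of any particular cited tool).
import numpy as np
from scipy import stats

control = np.array([4.1, 5.0, 4.7, 5.3, 4.4, 5.1, 4.8, 4.6])
treatment = np.array([5.2, 5.9, 5.4, 6.1, 5.0, 5.7, 5.5, 6.0])

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()              # point estimate of the effect

# Pooled standard deviation and standard error of the difference
sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
              (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
se_diff = sp * np.sqrt(1 / n1 + 1 / n2)

t_crit = stats.t.ppf(0.975, n1 + n2 - 2)              # two-sided 95% critical value
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(f"Mean difference: {diff:.2f}, 95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")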
Introduction

Core Concepts
Estimation statistics represents a paradigm shift in statistical inference, emphasizing the direct estimation of population parameters, effect sizes, and associated uncertainties over the traditional reliance on null hypothesis significance testing (NHST). This approach seeks to provide more informative and nuanced insights into data by focusing on the magnitude and precision of effects rather than on binary decisions about whether to reject a null hypothesis. By prioritizing estimation, researchers can better quantify the strength of relationships or differences in a study, facilitating practical interpretation and decision-making in fields such as psychology, medicine, and the social sciences.

Central to estimation statistics are principles that promote the assessment of estimate precision and the avoidance of dichotomous outcomes. Precision is evaluated through measures that indicate how closely a sample-based estimate approximates the true population value, often visualized via intervals that capture a range of plausible values. Compatibility intervals, proposed as an alternative framing for confidence intervals, represent the set of parameter values deemed compatible with the observed data at a specified level (e.g., 95%), shifting emphasis from long-run frequency properties to direct evidential support. This principle discourages interpretations like "significant" or "not significant," which can oversimplify evidence and lead to misleading conclusions, and instead encourages a continuous evaluation of uncertainty and effect magnitude.

Key terminology in estimation statistics includes point estimates and interval estimates. A point estimate is a single value derived from sample data that serves as the best guess for an unknown population parameter; for instance, the sample mean \bar{x} estimates the population mean \mu. Interval estimates extend this by providing a range around the point estimate to quantify uncertainty, such as a confidence interval, which helps assess the reliability of the evidence supporting the estimate. These tools enable researchers to report not just a central tendency but also the degree of precision, promoting a more comprehensive understanding of the data's implications.

Consider an example where a clinical trial estimates the effect size of a new treatment compared to a control, yielding a point estimate of 0.45 with a 95% confidence interval of [-0.2, 1.1]. This interval overlaps with zero, indicating that the data are compatible with no effect, but rather than declaring the result "non-significant," estimation statistics highlights the potential for a positive effect while underscoring the imprecision implied by the wide range.

A foundational component in constructing interval estimates is the standard error of the mean (SE), which measures the variability of the sample mean as an estimate of the population mean. The formula is given by \text{SE} = \frac{\sigma}{\sqrt{n}}, where \sigma is the population standard deviation and n is the sample size.[5] In practice, \sigma is often replaced by the sample standard deviation s when the population value is unknown. The SE decreases with larger n, reflecting improved precision, and is used to build intervals such as \bar{x} \pm z \cdot \text{SE} for a 95% confidence interval, where z \approx 1.96.[6] This quantifies how sample data inform the likely range for the population parameter, a calculation central to the estimation paradigm.
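To make these formulas concrete, the short sketch below (hypothetical data; plain Python with NumPy, offered as an illustration rather than a prescribed implementation) estimates the SE from the sample standard deviation s and forms the approximate 95% interval \bar{x} \pm 1.96 \cdot \text{SE}.

# Sketch: standard error of the mean and a normal-approximation 95% CI
# (hypothetical sample; s replaces the unknown population sigma, as noted above).
import numpy as np

sample = np.array([12.1, 9.8, 11.4, 10.9, 12.7, 10.2, 11.8, 9.5, 11.1, 10.6])

x_bar = sample.mean()                        # point estimate of the population mean
s = sample.std(ddof=1)                       # sample standard deviation
se = s / np.sqrt(len(sample))                # SE = s / sqrt(n)
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)  # 95% CI using z ~ 1.96

print(f"Mean {x_bar:.2f}, SE {se:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")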
Relation to Inferential Statistics

Inferential statistics encompasses methods for drawing conclusions about a population from a sample, with estimation forming one of its primary pillars alongside hypothesis testing.[7] Point estimation yields a single value approximating an unknown population parameter, such as the sample mean as an estimate of the population mean, while interval estimation provides a range likely to contain the parameter, incorporating uncertainty through measures such as standard errors.[7] This dual approach enables researchers to quantify both central tendencies and the precision of inferences, distinguishing estimation from mere data summarization by extending results beyond the observed sample to the broader population.[8]

Estimation statistics is firmly rooted in the frequentist framework, where confidence intervals are defined by their long-run frequency properties: across repeated samples from the same population, a specified proportion (such as 95%) of these intervals will contain the true parameter value.[9] This interpretation emphasizes the reliability of the procedure rather than any probability statement about a particular interval, in line with Jerzy Neyman's foundational work on statistical estimation.[9] In contrast, Bayesian credible intervals state the probability that the parameter falls within the interval given the data and prior beliefs, though their computational details are beyond the scope of this section.[10]

A key application of estimation lies in evidence synthesis, particularly meta-analysis, where effect sizes from individual studies are pooled to derive a more precise overall estimate of an intervention's impact.[11] For instance, in analyses of continuous outcomes, the standardized mean difference, Cohen's d, serves as a common effect size metric, computed as d = \frac{M_1 - M_2}{SD_{\text{pooled}}}, where M_1 and M_2 are the means of the two groups and SD_{\text{pooled}} is the pooled standard deviation; these d values are then weighted and combined across studies to assess the cumulative evidence.
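The sketch below illustrates these two steps under simplifying assumptions (hypothetical summary data, a fixed-effect model with inverse-variance weights, and per-study variances taken as given; it is not the procedure of any specific meta-analysis package): Cohen's d is computed from group summaries via the pooled standard deviation, and several d values are then combined into a weighted overall estimate with a 95% confidence interval.

# Sketch: Cohen's d for one study, then fixed-effect inverse-variance pooling
# across studies (hypothetical summary data; a simplified illustration).
import numpy as np

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference d = (M1 - M2) / SD_pooled."""
    sd_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

# One study's group summaries (means, SDs, sample sizes)
d_study1 = cohens_d(m1=24.5, s1=5.1, n1=40, m2=21.8, s2=4.8, n2=42)

# Hypothetical per-study effect sizes and their variances (as reported)
d = np.array([d_study1, 0.31, 0.62, 0.18])
var_d = np.array([0.05, 0.04, 0.07, 0.03])

w = 1.0 / var_d                         # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)    # weighted average effect size
se_pooled = np.sqrt(1.0 / np.sum(w))    # SE of the pooled estimate
ci = (d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled)

print(f"Pooled d = {d_pooled:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")

Under a random-effects model the weights would also incorporate an estimate of between-study variance, but the underlying estimation logic, reporting a pooled effect together with its interval, is unchanged.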
This pooling enhances statistical power and generalizability, allowing estimation to integrate diverse findings into a cohesive summary.[11]

Estimation differs fundamentally from descriptive statistics by inferring population characteristics rather than merely describing sample features, relying on variability metrics such as confidence intervals to gauge the reliability of extrapolations.[8] Although it shares the frequentist foundations of Neyman-Pearson theory, which was developed for optimal hypothesis testing, estimation shifts emphasis from rejecting null hypotheses via test statistics to directly reporting intervals that capture parameter uncertainty.[12] This focus promotes a more nuanced view of evidence, prioritizing effect magnitudes and precision over binary decisions.[12]