
Statistic

A statistic is a numerical value calculated from a sample of data that describes or summarizes a property of that sample, such as its central tendency, variability, or distribution. In contrast, a parameter is a corresponding numerical value that describes a characteristic of the entire population from which the sample is drawn, though parameters are typically unknown and must be estimated using statistics. This distinction is fundamental in inferential statistics, where sample statistics serve as estimators for population parameters to make generalizations about larger groups based on limited data. Common examples of statistics include the sample mean (the average value in the sample), the sample proportion (the fraction of the sample exhibiting a particular trait), the sample median (the middle value when the observations are ordered), and the sample standard deviation (a measure of dispersion around the mean). These are computed directly from observable sample data and are essential tools in descriptive statistics for summarizing datasets.

In inferential statistics, such measures enable hypothesis testing, confidence interval construction, and prediction, allowing researchers to draw reliable conclusions about populations despite sampling variability. The use of statistics is pivotal in data analysis across disciplines, as they transform raw observations into interpretable insights, quantify uncertainty, and support evidence-based decision-making in fields like medicine, economics, engineering, and the social sciences. For instance, in clinical trials, sample statistics such as means and proportions help evaluate treatment efficacy by estimating population-level effects. Advances in computational tools have further amplified their role, facilitating the analysis of massive datasets while maintaining rigorous statistical principles to ensure validity and reproducibility.

Fundamentals

Definition

A statistic is any function of the observations in a random sample drawn from a population, typically denoted as T(\mathbf{X}), where \mathbf{X} = (X_1, X_2, \dots, X_n) represents the sample of n independent and identically distributed random variables from the underlying distribution. This function transforms the raw sample into a summary value that captures essential features of the sample without invoking knowledge of the population's unknown characteristics. Statistics serve as the foundational tools in statistical analysis for describing and summarizing sample data, enabling inferences about the broader population while remaining agnostic to specific distributional assumptions beyond the sample's randomness. For instance, they provide quantifiable measures of central tendency, variability, or other properties directly from the observed values, facilitating data reduction and pattern recognition in empirical studies. The term "statistic," in its modern sense as a sample-derived quantity distinct from a population parameter, was introduced by R. A. Fisher in his 1922 paper "On the Mathematical Foundations of Theoretical Statistics," where he described such functions as "statistical derivates... which are designed to estimate the values of the parameters of the hypothetical population." This distinction emphasized the role of statistics in estimation, separating observable sample summaries from fixed but unknown population traits. The concept of a statistic presupposes familiarity with random variables—stochastic entities that model uncertain outcomes—and probability distributions, which specify the likelihood of those outcomes and form the basis for sampling from populations. These building blocks ensure that statistics inherit probabilistic properties from the sample, allowing for rigorous analysis of their behavior under repeated sampling.
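
To make the notation concrete, the following minimal sketch (assuming Python with NumPy, and an arbitrary simulated population chosen only for illustration) expresses a statistic as a function T of the sample alone, computed without reference to the unknown population parameters:

```python
import numpy as np

def T(sample: np.ndarray) -> float:
    """A statistic: any function of the observed sample alone (here, the sample mean)."""
    return float(np.mean(sample))

# Draw a random sample X = (X_1, ..., X_n) from a hypothetical population.
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=50)

print(T(sample))  # one summary value, computed without knowing mu or sigma
```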

Relation to Parameters

In statistics, a key distinction exists between a parameter and a statistic: a parameter is a fixed but typically unknown numerical characteristic that describes an entire population, such as the population mean, whereas a statistic is a calculable value derived from a sample of data that summarizes the sample's properties. This separation underscores the inferential role of statistics, as they provide observable approximations to otherwise inaccessible population parameters based on limited data. Statistics play a central role in point estimation, where they function as plug-in estimators to approximate unknown parameters; for instance, the sample mean serves as an estimator for the population mean \mu. In general, an estimator \hat{\theta} for a parameter \theta is expressed as a function of the sample data X, denoted mathematically as \hat{\theta} = T(X), where T is the estimating function that transforms the observed sample into a point estimate. This formulation allows statisticians to use sample-based computations as direct substitutes for population values in practical analysis. Within the framework of frequentist inference, statistics enable the approximation of population parameters through the concept of repeated sampling: under this approach, parameters are treated as fixed unknowns, and the behavior of a statistic across hypothetical repeated samples from the population provides a basis for assessing how well the statistic approximates the true value. This repeated-sampling perspective grounds the reliability of estimators by evaluating long-run performance, thereby linking observable sample statistics to inferences about the broader population.
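
The repeated-sampling idea can be illustrated with a short simulation; the sketch below (assuming NumPy, with arbitrary parameter values chosen only for illustration) draws many hypothetical samples and tracks how the sample mean behaves as a plug-in estimator of \mu:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 40      # population parameters: fixed, but unknown in practice

# Repeated-sampling view: compute the statistic (sample mean) on many
# hypothetical samples to study how well it approximates the parameter mu.
estimates = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(estimates.mean())          # long-run average is close to mu = 5.0
print(estimates.std())           # sampling variability is close to sigma / sqrt(n)
```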

Examples

Univariate Statistics

Univariate statistics encompass descriptive measures applied to a single variable in a sample, providing summaries of central tendency and dispersion to facilitate interpretation and comparison. These statistics are computed directly from the observed data points, offering practical insights into the sample without assuming an underlying distribution beyond basic ordering or arithmetic. Common univariate statistics include measures of location such as the sample mean, median, and mode, alongside dispersion metrics like the sample variance, range, and interquartile range.

The sample mean, denoted \bar{X}, serves as a primary measure of central tendency, representing the arithmetic average of the sample values. It is calculated using the formula \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, where n is the sample size and X_i are the individual observations. This statistic is particularly useful for symmetric distributions and when estimating the population mean \mu, as it weights each data point equally. For example, in a sample of test scores {70, 80, 90}, the sample mean is \bar{X} = (70 + 80 + 90)/3 = 80, indicating the average performance. The sample mean is sensitive to outliers, which can skew it away from the typical value.

Non-parametric measures of location, such as the median and mode, are robust to outliers and do not rely on arithmetic means, making them suitable for skewed or ordinal data. The median is the middle value in an ordered sample, providing a measure of center that divides the data into equal halves. For an odd sample size n = 2m + 1, the median is the (m+1)th ordered value; for an even sample size n = 2m, it is the average of the mth and (m+1)th ordered values. Consider the ordered sample {1, 3, 5, 7, 9} (odd size): the median is 5. For {1, 3, 5, 7} (even size), the median is (3 + 5)/2 = 4, the average of the two central values. The mode is the value that occurs most frequently in the sample, useful for identifying peaks in categorical or discrete data; multimodal samples have multiple modes if several values share the highest frequency. In the sample {2, 2, 3, 4, 4}, there are two modes: 2 and 4. Unlike the mean, these measures prioritize ordering over arithmetic, enhancing their applicability in non-normal distributions.

Measures of dispersion quantify the spread of the sample values around the center, essential for understanding data variability. The sample variance, denoted s^2, assesses the average squared deviation from the sample mean and is given by the unbiased estimator s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2. This formula uses n-1 in the denominator, known as Bessel's correction, to account for the degree of freedom lost when estimating the mean from the sample itself, ensuring the statistic is unbiased for the population variance \sigma^2. Introduced by Friedrich Bessel in the context of error estimation for astronomical data in the early 19th century, the adjustment corrects the underestimation bias inherent in dividing by n, as the sample mean introduces dependency among the deviations. For the earlier test score sample {70, 80, 90}, s^2 = \frac{1}{2} [(70-80)^2 + (80-80)^2 + (90-80)^2] = 100, indicating moderate spread.

Simpler dispersion measures include the range and interquartile range (IQR), which avoid squaring and are computationally straightforward. The range is the difference between the largest and smallest values in the sample, R = \max(X_i) - \min(X_i), offering a quick but outlier-sensitive measure of total spread; for {70, 80, 90}, R = 90 - 70 = 20. The IQR, a more robust alternative, is the difference between the third quartile Q_3 (75th percentile) and first quartile Q_1 (25th percentile), IQR = Q_3 - Q_1, focusing on the middle 50% of the ordered data to mitigate extreme values' influence. In the ordered sample {1, 3, 5, 7, 9}, Q_1 = 3, Q_3 = 7, so IQR = 4. These measures complement variance by providing interpretable scales in the original units, aiding preliminary data exploration.
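
The worked examples above can be reproduced with standard numerical tools. This sketch assumes NumPy and the Python standard library; note that quartile conventions differ across packages, and the IQR shown matches NumPy's default linear interpolation, which agrees with the quartiles quoted in the text:

```python
import numpy as np
from statistics import multimode

scores = np.array([70, 80, 90])
ordered = np.array([1, 3, 5, 7, 9])
counts = [2, 2, 3, 4, 4]

# Measures of location
print(np.mean(scores))            # 80.0  -> sample mean
print(np.median(ordered))         # 5.0   -> middle value of an odd-sized ordered sample
print(np.median([1, 3, 5, 7]))    # 4.0   -> average of the two central values
print(multimode(counts))          # [2, 4] -> both values tie for highest frequency

# Measures of dispersion
print(np.var(scores, ddof=1))     # 100.0 -> unbiased sample variance (n-1 denominator)
print(np.ptp(scores))             # 20    -> range = max - min
q1, q3 = np.percentile(ordered, [25, 75])
print(q3 - q1)                    # 4.0   -> interquartile range
```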

Multivariate Statistics

Multivariate statistics extend univariate summaries to datasets with multiple variables, capturing joint relationships and dependencies among them. A key example is the sample mean vector, which generalizes the univariate sample mean to a vector of means for each variable in a multivariate sample. For a sample of n observations from a p-variate distribution, the sample mean vector \bar{\mathbf{X}} is defined as \bar{\mathbf{X}} = \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i, where each \mathbf{X}_i is a p \times 1 observation vector. Under the assumption of a multivariate normal distribution, this vector serves as the maximum likelihood estimator for the population mean vector \boldsymbol{\mu}. The sample covariance matrix provides a comprehensive summary of the pairwise covariances and variances across variables, forming the basis for many multivariate analyses. For the same p-variate sample, the unbiased sample covariance matrix \mathbf{S} is given by \mathbf{S} = \frac{1}{n-1} \sum_{i=1}^n (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})^\top, with elements s_{jk} = \frac{1}{n-1} \sum_{i=1}^n (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k), where X_{ij} denotes the j-th component of the i-th observation vector. This matrix is symmetric and positive semi-definite, estimating the population covariance matrix \boldsymbol{\Sigma}.

From the covariance matrix, the Pearson correlation coefficient measures the strength and direction of linear association between two variables, normalizing covariance by their standard deviations. For variables X and Y, Pearson's r is computed as r = \frac{\text{cov}(X, Y)}{s_X s_Y}, where \text{cov}(X, Y) is the sample covariance and s_X, s_Y are the sample standard deviations; r ranges from -1 to 1, with values near \pm 1 indicating strong linear relationships. This coefficient, introduced by Karl Pearson in 1895, underpins tests for linear dependence in bivariate cases and extends to correlation matrices in higher dimensions. Principal component scores offer a way to summarize multivariate data by projecting observations onto directions of maximum variance, facilitating dimensionality reduction. These scores are linear combinations of the original variables, \mathbf{Z}_i = \mathbf{V}^\top (\mathbf{X}_i - \bar{\mathbf{X}}), where \mathbf{V} contains the eigenvectors of \mathbf{S} corresponding to the largest eigenvalues, ordered to maximize explained variance sequentially. The concept was first introduced by Karl Pearson in 1901, with Harold Hotelling's 1933 work formalizing principal components analysis for analyzing complexes of statistical variables. Such multivariate statistics, particularly principal component scores, find wide application in machine learning, enabling visualization and modeling of high-dimensional data by capturing dominant patterns while minimizing information loss; for instance, in genomics, they condense thousands of gene expression measurements into a few components for pattern detection.
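
A brief numerical sketch of these multivariate summaries, assuming NumPy and using simulated data with arbitrary mean and covariance values, computes the sample mean vector, covariance matrix, a Pearson correlation, and principal component scores via an eigendecomposition of \mathbf{S}:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.multivariate_normal(mean=[0, 1, 2],
                            cov=[[2.0, 0.8, 0.3],
                                 [0.8, 1.0, 0.5],
                                 [0.3, 0.5, 1.5]],
                            size=n)

# Sample mean vector and unbiased sample covariance matrix (n-1 denominator)
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / (n - 1)       # same as np.cov(X, rowvar=False)

# Pearson correlation between the first two variables
r = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

# Principal component scores: project centered data onto eigenvectors of S
eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # reorder by decreasing explained variance
V = eigvecs[:, order]
Z = (X - xbar) @ V                            # rows are principal component scores

print(xbar, r, eigvals[order])
```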

Properties

Sampling Distribution

In statistics, the sampling distribution of a statistic T(\mathbf{X}), where \mathbf{X} = (X_1, \dots, X_n) is a random sample from a population with probability density or mass function f(x; \theta) parameterized by \theta, describes the probability distribution of all possible values of T obtained from repeated random sampling of fixed size n. This distribution is often unknown in closed form and must be approximated, either analytically or computationally, to enable inference about \theta. For example, the sampling distribution of the sample mean \bar{X} arises from all possible samples of size n from the population, providing the basis for understanding variability in estimates.

A cornerstone of sampling distributions is the central limit theorem (CLT), which states that for independent and identically distributed random variables X_i with finite mean \mu and variance \sigma^2 > 0, the standardized sample mean \sqrt{n} (\bar{X} - \mu)/\sigma converges in distribution to a standard normal random variable Z \sim N(0,1) as n \to \infty. The CLT was first discovered by Pierre-Simon Laplace in 1810 through normal approximations to the distribution of sums of random variables, later generalized by Aleksandr Lyapunov in 1901 to allow non-identical distributions under moment conditions. A proof sketch using moment generating functions (MGFs) proceeds as follows: the MGF of the standardized sum S_n = \sqrt{n} (\bar{X} - \mu)/\sigma is M_{S_n}(t) = [M_X(t/\sqrt{n\sigma^2}) \exp(-t \mu /\sqrt{n\sigma^2})]^n, where M_X is the MGF of each X_i. For small u = t/\sqrt{n\sigma^2}, expanding \log M_X(u) \approx \mu u + (\sigma^2 u^2)/2 yields M_{S_n}(t) \to \exp(t^2/2), the MGF of N(0,1), by the continuity theorem for MGFs.

Exact sampling distributions are available for certain statistics under normality assumptions. For a random sample from N(\mu, \sigma^2) with unknown \sigma^2, the statistic \sqrt{n} (\bar{X} - \mu)/S follows a Student's t-distribution with n-1 degrees of freedom, where S^2 is the sample variance. Independently, (n-1) S^2 / \sigma^2 follows a chi-squared distribution with n-1 degrees of freedom. More generally, the distribution of a statistic T can be obtained from the joint density f_{\mathbf{X}} of the sample by integrating over the region of the sample space where the statistic does not exceed a given value, F_T(t) = \int_{\{ \mathbf{x} : T(\mathbf{x}) \le t \}} f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}, with the density f_T obtained by differentiating this cumulative distribution function when T is continuous. For the normal approximation in the CLT, f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right), \quad z \in \mathbb{R}. When exact or asymptotic forms are intractable, modern approximations like the bootstrap method, introduced by Bradley Efron in 1979, estimate the sampling distribution by resampling with replacement from the observed data to generate an empirical distribution of the statistic.
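
Both the CLT approximation and the bootstrap can be illustrated by simulation. The sketch below, assuming NumPy and an exponential population chosen arbitrarily for illustration, examines the standardized sample mean across repeated samples and then bootstraps the sampling distribution of the sample median from a single observed sample:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 30, 5_000

# Sampling distribution of the standardized sample mean for an exponential
# population (mean 1, variance 1): the CLT predicts approximate normality.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0
print(z.mean(), z.std())          # close to 0 and 1, as the CLT suggests

# Bootstrap approximation of the sampling distribution of the sample median,
# resampling with replacement from a single observed data set.
data = rng.exponential(scale=1.0, size=n)
boot_medians = np.array([
    np.median(rng.choice(data, size=n, replace=True)) for _ in range(2_000)
])
print(np.percentile(boot_medians, [2.5, 97.5]))   # rough percentile interval
```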

Bias and Consistency

In statistics, the bias of an estimator T(\mathbf{X}) for a parameter \theta is defined as \text{Bias}(T) = \mathbb{E}[T(\mathbf{X})] - \theta, measuring the expected deviation from the true value. An estimator is unbiased if its bias is zero for all \theta, ensuring that, on average, it equals the parameter it estimates. Bias arises in finite samples due to the estimator's functional form and can lead to systematic over- or underestimation, prompting trade-offs with variance where reducing one may increase the other. The mean squared error (MSE) provides a comprehensive measure of an estimator's accuracy, decomposing as \text{MSE}(T) = \text{Var}(T) + [\text{Bias}(T)]^2, combining variability and systematic error. This decomposition highlights that even unbiased estimators can have high MSE if their variance is large, while biased estimators might achieve lower MSE by balancing the terms, as seen in shrinkage methods.

Consistency addresses long-run performance, requiring that the estimator converge to the true parameter value as the sample size n \to \infty. An estimator T_n is consistent in probability if \text{plim}_{n \to \infty} T_n = \theta, meaning that for any \epsilon > 0 the probability that |T_n - \theta| exceeds \epsilon tends to zero as n grows. Stronger mean-square consistency holds if \mathbb{E}[(T_n - \theta)^2] \to 0, implying that both the bias and the variance vanish in the limit. For example, the sample variance s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 is an unbiased estimator of the population variance \sigma^2 under independent and identically distributed sampling, with zero bias for every sample size. In contrast, maximum likelihood estimators (MLEs) are often biased in finite samples (the normal-variance MLE, which divides by n rather than n-1, is a standard example) but asymptotically unbiased and consistent as n \to \infty, with bias vanishing faster than O(1/\sqrt{n}). The Cramér-Rao bound establishes a lower bound on the variance of unbiased estimators, stating that under regularity conditions, \text{Var}(T) \geq \frac{1}{n I(\theta)}, where I(\theta) is the Fisher information; this bound, building on Ronald Fisher's work in the 1920s, quantifies the minimum achievable precision. In non-i.i.d. settings, such as independent but non-identically distributed observations, robust estimators like density power divergence-based methods maintain consistency by downweighting outliers, ensuring stability under contamination while preserving efficiency in clean data.
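
The contrast between the unbiased n-1 estimator and the biased divide-by-n form, together with mean-square consistency, can be checked by simulation; this sketch assumes NumPy and uses an arbitrary normal population:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0                      # true population variance
n, reps = 10, 20_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
print(samples.var(axis=1, ddof=1).mean())   # approx 4.0: n-1 denominator is unbiased
print(samples.var(axis=1, ddof=0).mean())   # approx 3.6: divide-by-n form is biased low

# Mean-square consistency: the MSE of the divide-by-n estimator shrinks toward 0
# as the sample size grows, even though it is biased for every finite n.
for n_big in (10, 100, 1000):
    big = rng.normal(scale=np.sqrt(sigma2), size=(2_000, n_big))
    print(n_big, np.mean((big.var(axis=1, ddof=0) - sigma2) ** 2))
```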

Efficiency

In statistics, the efficiency of an estimator measures its precision relative to other estimators or theoretical benchmarks, particularly focusing on minimizing variance for a given level of bias. For two estimators T_1 and T_2 with the same bias, the relative efficiency of T_1 with respect to T_2 is defined as the ratio of their variances, \frac{\text{Var}(T_2)}{\text{Var}(T_1)}; a value greater than 1 indicates that T_1 is more efficient, as it achieves lower variability in estimating the parameter. This comparison assumes unbiasedness or equal bias to isolate variance as the key measure of performance.

A fundamental theoretical limit on the variance of unbiased estimators is provided by the Cramér-Rao lower bound (CRLB), which states that for an unbiased estimator T of a parameter \theta based on a sample of size n, \text{Var}(T) \geq \frac{1}{n I(\theta)}, where I(\theta) is the Fisher information measuring the amount of information about \theta in the data. This bound was independently derived by Harald Cramér in 1946 and C. R. Rao in 1945. The derivation relies on the score function, defined as the derivative of the log-likelihood with respect to \theta, s(X; \theta) = \frac{\partial}{\partial \theta} \log f(X; \theta), where f(X; \theta) is the likelihood function for the sample X. Under regularity conditions allowing differentiation under the integral (such as the existence of the expected value of the second derivative of the log-likelihood), the expected value of the score is zero: E[s(X; \theta)] = 0, because \int \frac{\partial}{\partial \theta} f(x; \theta) \, dx = \frac{\partial}{\partial \theta} \int f(x; \theta) \, dx = \frac{\partial}{\partial \theta} 1 = 0. Additionally, the variance of the score equals the Fisher information: \text{Var}(s(X; \theta)) = I(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right].

To derive the bound, consider the covariance between the unbiased estimator T and the score: \text{Cov}(T, s(X; \theta)) = E[(T - \theta) s(X; \theta)], since E[T] = \theta and E[s(X; \theta)] = 0. By the Cauchy-Schwarz inequality applied to this covariance, |\text{Cov}(T, s)|^2 \leq \text{Var}(T) \cdot \text{Var}(s) = \text{Var}(T) \cdot I(\theta). Expanding the covariance gives \text{Cov}(T, s) = E\left[(T - \theta) \frac{\partial}{\partial \theta} \log f\right] = E\left[(T - \theta) \frac{1}{f} \frac{\partial f}{\partial \theta}\right] = E\left[T \frac{1}{f} \frac{\partial f}{\partial \theta}\right] - \theta E\left[\frac{1}{f} \frac{\partial f}{\partial \theta}\right]. The second term is zero, and the first term equals 1 because E\left[T \frac{\partial \log f}{\partial \theta}\right] = \frac{\partial}{\partial \theta} E[T] = \frac{\partial \theta}{\partial \theta} = 1 under interchangeability of expectation and derivative. Thus, \text{Cov}(T, s) = 1, so 1 \leq \text{Var}(T) \cdot I(\theta), yielding \text{Var}(T) \geq \frac{1}{I(\theta)} for a single observation and extending to \frac{1}{n I(\theta)} for n i.i.d. observations. Equality holds when T is a linear function of the score, which occurs for estimators like the maximum likelihood estimator (MLE) under certain conditions.

The MLE attains asymptotic efficiency, meaning its variance approaches the CRLB as the sample size n increases, under regularity conditions such as the identifiability of \theta, twice differentiability of the log-likelihood, and finite moments of the score and its derivatives. Specifically, \sqrt{n} (\hat{\theta}_{MLE} - \theta) \xrightarrow{d} \mathcal{N}(0, I(\theta)^{-1}), so the asymptotic variance equals the inverse Fisher information, matching the CRLB scaled by n. This efficiency makes the MLE a benchmark for large-sample inference. The Fisher information here quantifies the curvature of the log-likelihood, tying into broader concepts of information content in statistical models.

In the specific context of linear models, the Gauss-Markov theorem establishes that the ordinary least squares (OLS) estimator is the best linear unbiased estimator (BLUE), with minimal variance among all linear unbiased estimators of the regression coefficients. Assuming a linear model Y = X\beta + \epsilon where \epsilon has zero mean and constant variance with uncorrelated errors, the theorem proves that \text{Var}(\hat{\beta}_{OLS}) \leq \text{Var}(\tilde{\beta}) for any other linear unbiased estimator \tilde{\beta} = AY, using the fact that the difference in estimators has zero mean and the variance comparison follows from positive semi-definiteness of the difference of their covariance matrices. Modern statistical theory extends these ideas through concepts like super-efficiency, introduced by Lucien Le Cam in the 1950s, which describes estimators that can achieve variance below the CRLB at certain parameter points, though such superefficiency occurs only on sets of Lebesgue measure zero and cannot hold broadly without violating regularity. Le Cam's 1953 result showed that sequences of superefficient estimators are limited to negligible sets in the parameter space, highlighting the robustness of the CRLB as a general benchmark. This phenomenon underscores that while efficiency bounds guide estimator selection, pathological cases require careful consideration in asymptotic theory.
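
As a numerical illustration of these efficiency comparisons (assuming NumPy, with an arbitrary standard normal population), the sketch below estimates the sampling variances of the sample mean and sample median: the mean's variance is close to the Cramér-Rao bound \sigma^2/n, while the median's relative efficiency is roughly 2/\pi \approx 0.64:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, sigma2 = 25, 50_000, 1.0

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

crlb = sigma2 / n                      # CRLB for the normal mean with known variance
print(var_mean, crlb)                  # the sample mean essentially attains the bound
print(var_mean / var_median)           # relative efficiency of the median: about 2/pi
```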

Advanced Concepts

Sufficiency

In statistics, a sufficient statistic provides a summary of the data that retains all the information about an unknown parameter θ that is contained in the full sample. The concept was first introduced by Ronald Fisher in his seminal 1922 paper on the foundations of theoretical statistics. Formally, a statistic T(\mathbf{X}) is sufficient for θ if the conditional distribution of the sample \mathbf{X} given T(\mathbf{X}) = t does not depend on θ. This property allows for data reduction without loss of inferential information about θ. An equivalent and operational characterization is given by the Fisher-Neyman factorization theorem, which states that T(\mathbf{X}) is sufficient for θ if and only if the joint probability density (or mass) function of the sample can be factored as f(\mathbf{x} \mid \theta) = g(T(\mathbf{x}), \theta) \cdot h(\mathbf{x}), where g depends on θ only through T(\mathbf{x}) and h does not depend on θ. Fisher originally established this for discrete distributions, while Jerzy Neyman extended it to general cases in his 1935 work. This theorem provides a practical method to identify sufficient statistics by examining the likelihood function.

Examples of sufficient statistics vary by distribution family. For distributions belonging to the exponential family, such as the Bernoulli or the Poisson, the sample sum (or a vector of sufficient components) is typically sufficient, as the density naturally factors with the sum serving as the key term linking to θ. In contrast, for distributions outside the exponential family, like the uniform distribution on [0, θ], the sample maximum is the minimal sufficient statistic, illustrating data reduction to a single value without information loss. The utility of sufficient statistics is highlighted by the Rao-Blackwell theorem, which demonstrates their role in improving estimators. If T is sufficient for θ and \hat{\theta}' is any unbiased estimator of θ, then the conditional expectation \hat{\theta} = E[\hat{\theta}' \mid T] is also unbiased and has variance less than or equal to that of \hat{\theta}', with equality only if \hat{\theta}' is a function of T. This theorem, independently developed by C. R. Rao in 1945 and David Blackwell in 1947, underscores how conditioning on a sufficient statistic reduces estimation variance while preserving unbiasedness. A minimal sufficient statistic is the coarsest such summary, defined as a sufficient statistic that induces equivalence classes on the sample space where two samples belong to the same class if their likelihood ratio L(\theta \mid \mathbf{x}_1)/L(\theta \mid \mathbf{x}_2) is constant in θ. This ensures it is a function of every other sufficient statistic, providing the maximal data compression while fully capturing parameter information; Fisher alluded to this notion in his early work, with formal developments following in subsequent decades.
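
Sufficiency of the sample sum for a Bernoulli parameter can be checked empirically: conditioning on T = \sum X_i, the distribution of the sample no longer depends on p. The sketch below assumes NumPy, with arbitrary values of p and a hypothetical helper function used only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

def conditional_first_coordinate(p: float, reps: int = 200_000) -> float:
    """Estimate P(X_1 = 1 | sum X_i = 2) for an i.i.d. Bernoulli(p) sample of size n."""
    X = rng.binomial(1, p, size=(reps, n))
    keep = X.sum(axis=1) == 2          # condition on the sufficient statistic T = sum
    return float(X[keep, 0].mean())

# Because T is sufficient, the conditional law of the sample given T is free of p,
# so the estimates below are all close to 2/5 regardless of the value of p.
for p in (0.2, 0.5, 0.8):
    print(p, conditional_first_coordinate(p))
```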

Completeness and Ancillarity

In statistics, a statistic T is said to be complete if, for every measurable function g, the condition \mathbb{E}_\theta[g(T)] = 0 for all \theta \in \Theta implies that g(T) = 0 almost everywhere with respect to the distribution of T. Completeness ensures that the only function of T with zero expectation across all parameter values is the zero function, which is crucial for uniqueness in unbiased estimation. Specifically, the Lehmann-Scheffé theorem states that if T is a complete sufficient statistic for \theta and \delta(X) is an unbiased estimator of some function \tau(\theta), then the conditional expectation \mathbb{E}[\delta(X) \mid T] is the unique uniformly minimum-variance unbiased estimator (UMVUE) of \tau(\theta). This theorem, developed in the mid-20th century, provides a foundational guarantee for optimal unbiased estimation when complete sufficient statistics exist.

An ancillary statistic is one whose distribution does not depend on the unknown parameter \theta, meaning it provides no direct information about \theta but can influence inference when combined with other statistics. A classic example arises in the bivariate normal distribution with unknown means and variances (assuming the correlation parameter is known), where the sample correlation coefficient r is ancillary, as its distribution is free of the marginal parameters. Another example occurs with i.i.d. samples from a Uniform(0, \theta) distribution, where the ratio X_{(1)}/X_{(n)} (with X_{(i)} denoting order statistics) is ancillary, since its distribution depends only on the sample size and not on \theta. Such ancillaries are useful for conditioning to eliminate nuisance parameters in inference. Basu's theorem connects completeness and ancillarity by establishing independence: if T(X) is a complete sufficient statistic and V(X) is ancillary for the parameter \theta, then T(X) and V(X) are independent for every \theta \in \Theta. This result, which facilitates simplified inference by separating parameter-dependent and parameter-free components, holds under standard regularity conditions and underscores the non-overlapping roles of sufficient and ancillary statistics. In Bayesian contexts, analogs to completeness appear in the uniqueness of conjugate priors, which yield posteriors that fully capture the likelihood's information without extraneous variation, though classical completeness focuses on frequentist unbiasedness.
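
Basu's theorem can be illustrated for a normal model with known variance, where \bar{X} is complete sufficient for \mu and the sample variance is ancillary; the sketch below (assuming NumPy, with arbitrary parameter values) checks the implied independence through an empirical correlation, which is a consequence of independence rather than a proof of it:

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 20, 100_000

# N(mu, sigma^2) with sigma^2 known: the sample mean is complete sufficient for mu,
# while the sample variance is ancillary (its distribution does not involve mu).
X = rng.normal(loc=3.0, scale=2.0, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

# Basu's theorem predicts independence of the two statistics; the empirical
# correlation across repeated samples is accordingly near zero.
print(np.corrcoef(xbar, s2)[0, 1])
```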

Information Content

The information content of a statistic quantifies the amount of information it carries about an unknown parameter \theta in a statistical model, providing a theoretical measure of its utility for inference. This concept, rooted in the theory of point estimation, assesses how sensitive the distribution of the statistic is to changes in \theta, thereby bounding the precision of estimators derived from it. Central to this is the Fisher information, which serves as a key metric for evaluating the informational value of data summaries like statistics. The Fisher information contained in a statistic T with density f(T \mid \theta) is defined as I_T(\theta) = \mathbb{E}\left[ \left( \frac{\partial \log f(T \mid \theta)}{\partial \theta} \right)^2 \right], where the expectation is taken with respect to the distribution of T given \theta. This scalar (for one-dimensional \theta) or matrix (for vector \theta) measures the expected curvature of the log-likelihood of T, reflecting the variability in the score function \partial \log f / \partial \theta. For the full sample X = (X_1, \dots, X_n) with likelihood L(\theta \mid X), the total Fisher information is I(\theta) = -\mathbb{E}\left[ \partial^2 \log L(\theta \mid X) / \partial \theta^2 \right], assuming regularity conditions such as differentiability and finite moments hold.

A sufficient statistic T preserves all available Fisher information about \theta, meaning I_T(\theta) = I(\theta), as it fully captures the dependence structure between the sample and the parameter without loss. This underscores the value of sufficient reduction: the statistic T attains the maximum possible information from the sample, enabling inference based on T alone to match that from the full data. In contrast, non-informative or ancillary statistics, whose distributions do not depend on \theta, yield I_T(\theta) = 0, providing no contribution to parameter estimation. For transformed statistics, the Fisher information follows a chain rule analogous to differentiation, ensuring additivity under conditional independence. Specifically, for jointly distributed statistics T and U, the total information satisfies I_{(T,U)}(\theta) = I_T(\theta) + I_{U \mid T}(\theta), where I_{U \mid T}(\theta) is the conditional information in U given T. This decomposition highlights how information accumulates or partitions across data components, with one-to-one, parameter-free transformations of a statistic preserving its total information content. The framework of information content was pioneered by R. A. Fisher in the 1920s, who introduced these ideas in his foundational work on likelihood and estimation to quantify data utility beyond mere sufficiency. In modern contexts, particularly post-2000s machine learning, Fisher information connects to information-theoretic measures, such as tracking information flow through network layers to optimize parameter estimation tasks and mitigate issues like overfitting. For instance, it can be used to bound the mutual information between inputs and learned representations, informing scalable approximations in deep architectures.
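
A small numerical check (assuming NumPy) of the identity that the variance of the score equals the total Fisher information, using i.i.d. Bernoulli(p) observations for which the sample sum is sufficient:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.3, 50, 200_000

# Score for an i.i.d. Bernoulli(p) sample: d/dp log L = T/p - (n - T)/(1 - p),
# where T = sum of the observations is the sufficient statistic.
T = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
score = T / p - (n - T) / (1 - p)

print(score.mean())              # approx 0: the expected score vanishes
print(score.var())               # approx n / (p (1 - p)): variance of the score
print(n / (p * (1 - p)))         # analytic total Fisher information n * I(p)

# Since the score depends on the data only through T, the sufficient statistic T
# carries the same Fisher information as the full sample.
```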
