
Estimation theory

Estimation theory is a branch of statistics concerned with the estimation of unknown parameters or states from observed data using probabilistic models, aiming to construct optimal estimators that minimize error metrics such as mean squared error. It provides a rigorous framework for estimation in the presence of noise or uncertainty, applicable across fields like signal processing, communications, and control systems. The foundations of estimation theory trace back to the early 19th century, with pioneering work by mathematicians such as Carl Friedrich Gauss and Adrien-Marie Legendre on least-squares methods for astronomical data analysis, and Pierre-Simon Laplace on inverse probability. In the 20th century, Ronald A. Fisher formalized key concepts like maximum likelihood in the 1920s, while Harald Cramér and C. R. Rao developed the Cramér–Rao lower bound in the 1940s, which quantifies the minimum achievable variance for unbiased estimators. These advancements shifted estimation from ad hoc techniques to a systematic statistical discipline, incorporating concepts like sufficient statistics—functions of data that capture all relevant information about the parameter—and Bayesian approaches that integrate prior knowledge. Central to estimation theory are methods for deriving estimators, including maximum likelihood estimation (MLE), which selects the parameter value maximizing the likelihood of the observed data, and the method of moments, which equates sample moments to theoretical ones. Performance is evaluated using criteria like bias, variance, and the Fisher information matrix, which measures the amount of information data provides about the parameter. Bayesian estimation further extends this by computing posterior distributions, enabling minimum mean squared error estimators as conditional expectations. In modern applications, estimation theory underpins technologies such as GPS positioning, adaptive filtering in wireless communications, and machine learning algorithms for parameter tuning, with ongoing developments incorporating computational advances for high-dimensional data. Its principles ensure reliable inference in noisy environments, from biomedical signal processing to econometric modeling.

Fundamentals

Definition and Motivation

Estimation theory constitutes a foundational framework in statistics for inferring unknown parameters \theta of a probabilistic model from observed data X, enabling approximations of these parameters under conditions of uncertainty. This approach addresses the core challenge of extracting meaningful information from finite samples drawn from an underlying distribution, as articulated in early theoretical developments where estimation problems involve deriving statistics that represent the relevant features of the data. The theory emphasizes the use of estimators to approximate \theta, distinguishing it from other inferential tasks by focusing on quantitative approximations rather than qualitative decisions. At its core, estimation theory operates within a basic probabilistic setup where the observed data X is generated according to a conditional distribution p(X|\theta), with \theta denoting the parameter of interest that characterizes the model's behavior. This formulation captures the likelihood of the data given the parameters, providing a basis for inference when direct observation of \theta is impossible. For instance, in scenarios involving noisy measurements, such as signals corrupted by environmental interference, the theory facilitates the reconstruction of true signal parameters like amplitude or phase. The motivation for estimation theory stems from pervasive real-world demands to quantify unknowns amid incomplete information, spanning fields like engineering, scientific experimentation, and predictive modeling. In predictive contexts, it allows estimation of latent probabilities—such as the likelihood of specific linguistic structures—from limited exposures, informing models of human learning and behavior. Unlike hypothesis testing, which evaluates discrete alternatives to accept or reject a null model, estimation delivers point estimates (single values) or interval approximations (ranges) for continuous parameters, prioritizing the precision and utility of these approximations in decision-making. This distinction underscores estimation's role in enabling proactive inference rather than reactive validation.

Historical Overview

The roots of estimation theory trace back to early developments in probability theory, particularly Jacob Bernoulli's 1713 work on the law of large numbers, which provided foundational ideas for estimating probabilities from repeated trials in the Ars Conjectandi. This laid the groundwork for parametric inference by addressing how empirical frequencies converge to true probabilities as sample size increases. A significant advancement came in 1809 with Carl Friedrich Gauss's introduction of the method of least squares, originally developed for estimating orbital parameters in astronomy by minimizing the sum of squared residuals. Gauss's approach, grounded in the assumption of normally distributed errors, marked the beginning of systematic parameter estimation in the presence of observational error and influenced subsequent statistical methodologies. In the early 20th century, Ronald A. Fisher formalized key principles of statistical estimation with his 1922 paper introducing the maximum likelihood method, which selects parameter values that maximize the probability of observing the given data. This was complemented by the Neyman–Pearson framework in the 1930s, which developed the theory of hypothesis testing and optimal decision rules based on likelihood ratios, solidifying frequentist approaches to inference. The Bayesian perspective saw a revival through Harold Jeffreys's 1939 Theory of Probability, which advocated objective priors and integrated prior knowledge with data for parameter estimation, countering criticisms of subjectivity in earlier Bayesian work. Post-1950s computational advances, such as Markov chain Monte Carlo (MCMC) methods originating from the Metropolis algorithm in 1953, enabled practical Bayesian estimation for complex models by simulating posterior distributions. Milestones in bounding estimation performance included C. R. Rao's 1945 derivation of a lower bound on estimator variance using Fisher information and Harald Cramér's 1946 extension, establishing the Cramér–Rao bound as a fundamental limit for unbiased estimators.
By the late 20th century, estimation theory evolved toward robust methods resistant to outliers, pioneered by Peter Huber's 1964 work on M-estimators, and nonparametric techniques that relaxed parametric assumptions, as advanced in kernel density estimation by Rosenblatt in 1956 and further developed in the 1980s–1990s.

Core Concepts

Estimators

In estimation theory, an estimator is a rule or function that assigns an approximate value to an unknown parameter based on observed data. Formally, given a random sample X = (X_1, \dots, X_n) drawn from a probability distribution parameterized by \theta, an estimator \hat{\theta} is defined as \hat{\theta} = g(X), where g is a measurable function mapping the sample space to the parameter space. This function transforms the raw data into a point approximation of \theta, serving as the core tool for inferring population characteristics from limited observations. Estimators are broadly classified into point estimators and interval estimators. A point estimator yields a single value as the approximation for \theta, such as the sample mean estimating the population mean in a normal distribution. In contrast, an interval estimator provides a range of plausible values, typically in the form of a confidence interval that contains \theta with a specified probability. This distinction allows point estimators to offer simplicity and directness, while interval estimators incorporate a quantification of uncertainty. Within point estimators, a key distinction exists between unbiased and biased types. An unbiased estimator satisfies E[\hat{\theta}] = \theta for all \theta, meaning its expected value equals the true parameter, as exemplified by the sample variance (with n-1 in the denominator) for a normal population variance. A biased estimator, however, has E[\hat{\theta}] \neq \theta, potentially introducing systematic error, though bias can sometimes be traded for reduced variance. Plug-in estimators represent a straightforward class of point estimators, obtained by substituting sample moments or statistics directly into the parameter's functional form; for instance, using the sample median to estimate the population median. A related concept is that of sufficiency, which identifies statistics that efficiently summarize the data for estimation purposes.
A sufficient statistic T(X) captures all relevant information about \theta contained in the full sample X, such that the conditional distribution of X given T(X) = t is independent of \theta. This property, introduced by Ronald A. Fisher, ensures that any estimator based on T(X) loses no information compared to using the entire dataset, facilitating data reduction without sacrificing inferential power.
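The unbiasedness of the sample variance with n-1 in the denominator can be checked numerically; the following is a minimal Monte Carlo sketch using NumPy (the sample size, trial count, and variable names are illustrative choices, not from the original source):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # true population variance

# Average the unbiased (n-1 denominator) and biased (n denominator)
# sample variances over many repeated samples of size n = 5.
n, trials = 5, 200_000
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
unbiased = samples.var(axis=1, ddof=1)  # divides by n-1
biased = samples.var(axis=1, ddof=0)    # divides by n

mean_unbiased = unbiased.mean()  # close to sigma^2 = 4.0
mean_biased = biased.mean()      # close to (n-1)/n * sigma^2 = 3.2
```

Averaged over repeated samples, the n-1 version centers on the true variance, while the n version systematically underestimates it by the factor (n-1)/n.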

Statistical Models and Parameters

In estimation theory, statistical models formalize the probabilistic structure underlying observed data, enabling the inference of unknown parameters from measurements. A parametric model specifies a family of probability distributions indexed by a finite-dimensional parameter \theta \in \Theta, where \Theta is typically an open subset of \mathbb{R}^k for some k \geq 1. This parameterization assumes that the data-generating process belongs to a restricted class of distributions, allowing for tractable inference while capturing essential features of the variability in the data. A classic example is the normal family, where observations X_1, \dots, X_n are modeled as X_i \sim \mathcal{N}(\mu, \sigma^2) independently, with \theta = (\mu, \sigma^2) \in \mathbb{R} \times (0, \infty). Here, \mu represents the mean and \sigma^2 the variance of the distribution. Parametric models often include nuisance parameters, which are components of \theta not of direct interest but necessary for accurately specifying the full model; for instance, in estimating a mean \mu under heteroscedasticity, the variance parameters may serve as nuisances. Identifiability conditions ensure that distinct values \theta \neq \theta' produce distinct probability distributions, preventing ambiguity in inference; a standard requirement is that the mapping from \Theta to the space of distributions is injective. Central to parametric inference is the likelihood function, defined as L(\theta \mid X) = p(X \mid \theta), which quantifies the probability of the observed data X under \theta and serves as the basis for deriving estimators and performing hypothesis tests. In frequentist approaches, parameters are treated as fixed but unknown constants, with inference based solely on the sampling distribution of the data. In contrast, Bayesian frameworks view parameters as random variables governed by a prior distribution, updating beliefs via the posterior proportional to the likelihood times the prior. Common model classes in estimation include independent and identically distributed (i.i.d.) samples from a parametric family, such as binomial trials for proportion estimation where \theta = p \in (0,1).
Regression models extend this by incorporating covariates, for example, the linear model Y_i = \beta_0 + \beta_1 x_i + \epsilon_i with \epsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d., parameterizing the relationship via \theta = (\beta_0, \beta_1, \sigma^2). These structures underpin diverse applications, from signal processing to econometrics, by balancing model simplicity with descriptive power.
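The likelihood function L(\theta \mid X) for the normal family above can be evaluated directly; a minimal sketch in NumPy (the function name and the candidate parameter values are illustrative):

```python
import numpy as np

def normal_log_likelihood(data, mu, sigma2):
    """log L(theta | X) for i.i.d. X_i ~ N(mu, sigma2)."""
    n = data.size
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - np.sum((data - mu) ** 2) / (2 * sigma2))

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # theta = (mu=2, sigma^2=1)

ll_true = normal_log_likelihood(x, mu=2.0, sigma2=1.0)
ll_wrong = normal_log_likelihood(x, mu=0.0, sigma2=1.0)
# the log-likelihood is higher at the data-generating parameter
```

Comparing candidate parameter values through the log-likelihood in this way is the basic operation underlying maximum likelihood estimation.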

Properties of Estimators

Bias and Consistency

In estimation theory, bias quantifies the systematic deviation of an estimator from the true parameter value. For an estimator \hat{\theta} of a parameter \theta, the bias is formally defined as B(\hat{\theta}) = E[\hat{\theta}] - \theta, where E[\cdot] denotes the expectation under the true distribution. An estimator is unbiased if B(\hat{\theta}) = 0, meaning its expected value equals the true parameter for all possible values of \theta. This ensures that, on average over repeated samples, the estimator centers around the true value without systematic over- or underestimation. However, unbiasedness is a finite-sample property and does not guarantee low variability; biased estimators may sometimes exhibit lower overall error in practice. The mean squared error (MSE) provides a composite measure of accuracy that incorporates both bias and variance: \text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [B(\hat{\theta})]^2. This decomposition highlights the bias-variance tradeoff, where reducing bias might increase variance, and vice versa; detailed analysis of MSE and efficiency follows in subsequent discussions. Unbiased estimators simplify certain optimality criteria but are not always achievable or desirable, as small biases can yield estimators with superior MSE in finite samples. Consistency addresses the long-run behavior of estimators as the sample size n increases to infinity. An estimator sequence \{\hat{\theta}_n\} is (weakly) consistent if \hat{\theta}_n \xrightarrow{p} \theta in probability, meaning for any \epsilon > 0, P(|\hat{\theta}_n - \theta| > \epsilon) \to 0 as n \to \infty. Strong consistency requires almost sure convergence, i.e., P(\lim_{n \to \infty} \hat{\theta}_n = \theta) = 1. Consistency ensures that larger samples lead to estimators arbitrarily close to the true parameter with high probability, making it a fundamental asymptotic desideratum even for biased estimators. A classic example is the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i for independent and identically distributed (i.i.d.) random variables X_i with finite mean \mu. This estimator is unbiased, as E[\bar{X}_n] = \mu, and consistent, as it converges in probability (and almost surely, by the strong law of large numbers) to \mu.
To demonstrate weak consistency via Chebyshev's inequality for an unbiased estimator with vanishing variance, note that if \text{Var}(\hat{\theta}_n) \to 0, then P(|\hat{\theta}_n - \theta| > \epsilon) \leq \frac{\text{Var}(\hat{\theta}_n)}{\epsilon^2} \to 0. For the sample mean, \text{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \to 0 under finite variance \sigma^2 < \infty, confirming consistency.
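A short simulation illustrates this convergence for the sample mean; the trial count, sample sizes, and tolerance \epsilon below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps, trials = 1.0, 1.0, 0.2, 20_000

def exceed_prob(n):
    """Monte Carlo estimate of P(|Xbar_n - mu| > eps) for N(mu, sigma^2) data."""
    means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    return np.mean(np.abs(means - mu) > eps)

p10, p100, p400 = exceed_prob(10), exceed_prob(100), exceed_prob(400)
# the exceedance probability shrinks toward 0 as n grows (weak consistency)
```

As Chebyshev's inequality predicts, the probability of the sample mean deviating from \mu by more than \epsilon decreases as the variance \sigma^2/n vanishes.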

Efficiency and Mean Squared Error

In estimation theory, the variance of an estimator \hat{\theta} quantifies its precision in finite samples by measuring the expected squared deviation from its own expected value, defined as \operatorname{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]. This metric captures the spread or variability of the estimator around its mean, independent of any systematic offset from the true parameter \theta, and is particularly useful for comparing the reliability of unbiased estimators. For instance, lower variance indicates that repeated samples would yield estimates closer to the estimator's mean. Relative efficiency provides a comparative measure of precision between two unbiased estimators \hat{\theta}_1 and \hat{\theta}_2 of the same parameter, given by the ratio \operatorname{Eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{\operatorname{Var}(\hat{\theta}_2)}{\operatorname{Var}(\hat{\theta}_1)}. If this ratio exceeds 1, \hat{\theta}_1 is deemed more efficient, as it achieves the same unbiasedness with less variability. A classic example of the bias-variance tradeoff arises in estimating the variance \sigma^2 of a normal distribution: the unbiased estimator s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 has variance \frac{2\sigma^4}{n-1}, while the biased method-of-moments estimator \tilde{s}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 has smaller variance \frac{2(n-1)\sigma^4}{n^2} but E[\tilde{s}^2] = \frac{n-1}{n} \sigma^2, highlighting how bias can reduce variability at the cost of systematic error. The mean squared error (MSE) offers a comprehensive finite-sample measure of an estimator's overall accuracy, incorporating both variability and potential bias, and is defined as \operatorname{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]. This decomposes additively as \operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + [B(\hat{\theta})]^2, where B(\hat{\theta}) = E[\hat{\theta}] - \theta is the bias; for unbiased estimators, the MSE reduces to the variance alone.
MSE thus serves as a risk function in decision-theoretic frameworks, balancing precision against systematic error. As sample sizes grow large, the central limit theorem often implies asymptotic normality for regular estimators, where the scaled error \sqrt{n}(\hat{\theta} - \theta) converges in distribution to a normal random variable with mean zero and variance equal to the asymptotic variance. This normality facilitates approximate inference, such as confidence intervals, even when exact distributions are intractable. For maximum likelihood estimators under suitable regularity conditions, the asymptotic distribution is specifically \sqrt{n}(\hat{\theta} - \theta) \to N(0, I(\theta)^{-1}), where I(\theta) denotes the Fisher information, providing a benchmark for large-sample efficiency.
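The bias-variance tradeoff between the two variance estimators discussed above can be verified by simulation; a minimal sketch assuming \sigma^2 = 1 and n = 5 (these values, the trial count, and the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, trials = 1.0, 5, 400_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

s2 = x.var(axis=1, ddof=1)        # unbiased estimator (n-1 denominator)
s2_tilde = x.var(axis=1, ddof=0)  # biased, lower-variance estimator

mse_unbiased = np.mean((s2 - sigma2) ** 2)      # theory: 2*sigma^4/(n-1) = 0.5
mse_biased = np.mean((s2_tilde - sigma2) ** 2)  # theory: (2n-1)*sigma^4/n^2 = 0.36
# the biased estimator attains lower MSE despite its systematic offset
```

For normal data the biased estimator's squared bias \sigma^4/n^2 is more than offset by its variance reduction, so its total MSE is smaller at every finite n.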

Estimation Techniques

Frequentist Methods

Frequentist methods in estimation theory focus on constructing estimators based on the long-run frequency behavior of statistical procedures, treating parameters as fixed unknowns without incorporating prior probabilities. These approaches emphasize data-driven techniques that maximize empirical fit or match distributional properties, relying on properties like consistency and efficiency derived from repeated sampling under the assumed model. Key methods include the method of moments, least squares, and maximum likelihood estimation, each providing point estimates that can be extended to interval estimates via confidence procedures. The method of moments, introduced by Karl Pearson in 1894, constructs estimators by equating population moments to their sample counterparts. For a parameter \theta defined through functions g_k(X) where X follows a distribution parameterized by \theta, the estimators solve \mathbb{E}[g_k(X)] = \mu_k(\theta) using sample averages \hat{\mu}_k = \frac{1}{n} \sum_{i=1}^n g_k(X_i). For instance, in estimating the mean and variance of a distribution, the first two sample moments (arithmetic mean and sample variance) directly yield the parameter values, providing a straightforward, computationally simple approach applicable to many parametric families. Least squares estimation, first published by Adrien-Marie Legendre in 1805 and independently developed earlier by Carl Friedrich Gauss, minimizes the sum of squared residuals to estimate parameters in regression models. The estimator \hat{\theta} solves \hat{\theta} = \arg\min_\theta \sum_{i=1}^n (y_i - f(x_i; \theta))^2, where y_i are observed responses and f(x_i; \theta) is the predicted function. This method is particularly effective for linear models, yielding unbiased and minimum-variance estimators under Gaussian error assumptions, and forms the basis for ordinary least squares in linear regression. Maximum likelihood estimation (MLE), formalized by Ronald A. 
Fisher in 1922, selects the parameter value that maximizes the likelihood function given the observed data. The likelihood is L(\theta | X) = \prod_{i=1}^n p(x_i | \theta), and the MLE is \hat{\theta}_{ML} = \arg\max_\theta L(\theta | X), or equivalently, \arg\max_\theta \sum_{i=1}^n \log p(x_i | \theta). Under standard regularity conditions—such as the existence of derivatives of the log-likelihood up to third order and identifiability of \theta—the MLE is asymptotically efficient, achieving the minimal asymptotic variance among consistent estimators as the sample size n \to \infty. Frequentist interval estimation constructs confidence intervals around point estimates using pivotal quantities, which are functions of the data and the parameter whose distributions do not depend on \theta. For the mean \mu of a normal distribution with known variance \sigma^2, the pivotal quantity \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} follows a standard normal distribution, yielding the interval \bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} with confidence level 1 - \alpha. When the variance is unknown, the t-distribution provides an exact interval based on the studentized mean \frac{\bar{X} - \mu}{S / \sqrt{n}}, ensuring the procedure covers the true parameter with the stated probability in repeated sampling.
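The repeated-sampling coverage of the pivotal interval can be demonstrated empirically; a minimal sketch assuming known \sigma and a 95% level (the true mean, sample size, and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 3.0, 2.0, 25, 100_000
z = 1.96  # z_{alpha/2} for alpha = 0.05

x = rng.normal(mu, sigma, size=(trials, n))
xbar = x.mean(axis=1)             # MLE of mu for each simulated sample
half = z * sigma / np.sqrt(n)     # confidence-interval half-width
covered = np.mean((xbar - half <= mu) & (mu <= xbar + half))
# fraction of intervals containing the true mu, close to 0.95
```

The coverage fraction approximates the nominal level 1 - \alpha, illustrating the frequentist interpretation of a confidence procedure over repeated samples.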

Bayesian Methods

Bayesian estimation incorporates prior knowledge about model parameters into the inference process, treating parameters as random variables whose distributions are updated based on observed data. At the core of this approach is Bayes' theorem, which states that the posterior distribution of the parameter θ given data X is proportional to the product of the likelihood and the prior: p(θ|X) ∝ p(X|θ) p(θ), where p(θ) represents the prior beliefs about θ before observing the data. This framework allows for probabilistic statements about parameters, contrasting with frequentist methods by explicitly quantifying uncertainty through the full posterior distribution rather than point estimates alone. A common point estimator derived from the posterior is the maximum a posteriori (MAP) estimate, defined as the value of θ that maximizes the posterior density: \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} [\log L(\theta|X) + \log p(\theta)], where L(\theta|X) is the likelihood function. This estimator balances the data-driven likelihood with the prior, reducing to the maximum likelihood estimate when the prior is uniform. Other point estimators include the posterior mean, \mathbb{E}[\theta|X] = \int \theta p(\theta|X) d\theta, or the posterior median, which can offer robustness in certain scenarios. For interval estimation, credible intervals are constructed from posterior quantiles, providing probability statements such as "there is a 95% probability that θ lies within this interval" given the model and data. To facilitate analytical computation of the posterior, conjugate priors are often employed, where the prior distribution belongs to the same family as the likelihood, ensuring the posterior remains in that family. A classic example is the beta prior for the parameter of a binomial likelihood: if p(θ) is Beta(α, β), then after observing k successes in n trials, the posterior is Beta(α + k, β + n - k), allowing closed-form updates.
This conjugacy simplifies inference but requires careful prior selection to avoid undue influence on results. Exact posterior computation becomes intractable for complex models with high-dimensional parameters or non-conjugate priors, necessitating approximate methods such as Markov chain Monte Carlo (MCMC) sampling and variational inference. MCMC methods, including the Metropolis–Hastings algorithm, generate samples from the posterior distribution to approximate expectations and integrals. Variational inference addresses intractability by optimizing a simpler distribution to approximate the true posterior, minimizing the Kullback–Leibler divergence between them, as introduced in early graphical model applications. This approach enables scalable Bayesian estimation in modern applications like large-scale signal processing.
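The beta-binomial conjugate update described above admits a closed form; a minimal sketch (the function name, prior parameters, and observed counts are illustrative):

```python
def beta_binomial_update(alpha, beta, k, n):
    """Posterior Beta parameters after observing k successes in n trials."""
    return alpha + k, beta + (n - k)

a0, b0 = 2.0, 2.0  # Beta(2, 2) prior on the success probability theta
k, n = 7, 10       # observed data: 7 successes in 10 trials
a1, b1 = beta_binomial_update(a0, b0, k, n)  # posterior is Beta(9, 5)

posterior_mean = a1 / (a1 + b1)  # (alpha + k) / (alpha + beta + n) = 9/14
```

The posterior mean 9/14 ≈ 0.643 sits between the prior mean 0.5 and the sample proportion 0.7, showing how the prior tempers the raw data as the update proceeds in closed form.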

Performance Limits

Cramér–Rao Lower Bound

The Cramér–Rao lower bound (CRLB) establishes a fundamental limit on the precision with which an unbiased estimator can estimate a parameter from observed data, serving as a benchmark for estimator performance in frequentist statistics. This bound quantifies the minimum possible variance of any unbiased estimator of a parameter \theta, based on the amount of information the data provide about \theta. It arises from the inherent trade-off between bias and variance in estimation, highlighting that no unbiased estimator can have variance below this theoretical minimum under standard regularity conditions, such as the differentiability of the log-likelihood and the ability to interchange differentiation and integration. Central to the CRLB is the concept of Fisher information, which measures the amount of information that an observable random variable X carries about an unknown parameter \theta in a parametric family of probability distributions p(X \mid \theta). The Fisher information I(\theta) for a single observation is defined as the expected value of the squared score function, where the score is the derivative of the log-likelihood with respect to \theta: I(\theta) = \mathbb{E} \left[ \left( \frac{\partial}{\partial \theta} \log p(X \mid \theta) \right)^2 \right]. Under regularity conditions allowing differentiation under the integral sign, this is equivalently expressed as the negative expected value of the second derivative of the log-likelihood: I(\theta) = -\mathbb{E} \left[ \frac{\partial^2}{\partial \theta^2} \log p(X \mid \theta) \right]. This equivalence follows from the fact that the expected value of the score is zero, \mathbb{E} \left[ \frac{\partial}{\partial \theta} \log p(X \mid \theta) \right] = 0, and applying differentiation to this identity. For n independent and identically distributed (i.i.d.) observations X_1, \dots, X_n, the total Fisher information scales additively to n I(\theta), reflecting the accumulation of information with more data. 
The CRLB states that for any unbiased estimator \hat{\theta} of \theta based on n i.i.d. samples, the variance satisfies \text{Var}(\hat{\theta}) \geq \frac{1}{n I(\theta)}, provided the estimator is unbiased, \mathbb{E}[\hat{\theta}] = \theta, and regularity conditions hold to ensure the bound's validity, such as the finiteness of the Fisher information and the existence of the moments involved. This bound implies that the precision of estimation improves at least as fast as 1/n, with the constant determined by the intrinsic information content I(\theta) of the distribution. The derivation of the CRLB relies on the Cauchy-Schwarz inequality applied to the score function. Consider the score for the full sample, Z = \sum_{i=1}^n \frac{\partial}{\partial \theta} \log p(X_i \mid \theta), which has mean zero and variance n I(\theta). For an unbiased estimator \hat{\theta}, the covariance between Z and \hat{\theta} is \text{Cov}(Z, \hat{\theta}) = \mathbb{E}[Z (\hat{\theta} - \theta)] = \frac{\partial}{\partial \theta} \mathbb{E}[\hat{\theta}] = 1, since the derivative can be passed inside the expectation under regularity. Applying the Cauchy-Schwarz inequality gives [\text{Cov}(Z, \hat{\theta})]^2 \leq \text{Var}(Z) \cdot \text{Var}(\hat{\theta}), which substitutes to 1 \leq n I(\theta) \cdot \text{Var}(\hat{\theta}), yielding the bound \text{Var}(\hat{\theta}) \geq 1 / (n I(\theta)). Equality in the CRLB holds if and only if \hat{\theta} is a linear function of the score Z, specifically \hat{\theta} = \theta + c Z for some constant c = 1 / (n I(\theta)), meaning the estimator must be affinely related to a sufficient statistic for \theta. Under additional regularity conditions, the maximum likelihood estimator (MLE) achieves this bound asymptotically as n \to \infty. For multidimensional parameters \theta \in \mathbb{R}^p, the CRLB generalizes to a matrix inequality on the covariance matrix of the unbiased estimator \hat{\theta}.
The Fisher information matrix I(\theta) has elements [I(\theta)]_{ij} = \mathbb{E} \left[ \frac{\partial}{\partial \theta_i} \log p(X \mid \theta) \cdot \frac{\partial}{\partial \theta_j} \log p(X \mid \theta) \right] = -\mathbb{E} \left[ \frac{\partial^2}{\partial \theta_i \partial \theta_j} \log p(X \mid \theta) \right], measuring the joint information across parameters. For n i.i.d. samples, the CRLB asserts that the covariance matrix satisfies the positive semi-definite matrix inequality \text{Cov}(\hat{\theta}) \geq [n I(\theta)]^{-1}, where the inverse exists if I(\theta) is positive definite, ensuring identifiability. The derivation extends the scalar case using the multivariate Cauchy-Schwarz inequality (or equivalently, properties of positive semi-definite matrices) on the score vector and the estimator deviations, leading to the matrix bound. Equality holds when \hat{\theta} - \theta is linearly related to the score vector in a manner that saturates the inequality, often achieved asymptotically by the MLE.
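For the Gaussian mean, the sample mean attains the CRLB exactly, which a simulation can confirm; a minimal sketch assuming \sigma^2 = 2 and n = 20 (these values and the trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2, n, trials = 0.0, 2.0, 20, 300_000

# For X_i ~ N(theta, sigma2), the per-sample Fisher information is
# I(theta) = 1/sigma2, so the CRLB for any unbiased estimator is sigma2/n.
crlb = sigma2 / n  # = 1/(n * I(theta)) = 0.1

x = rng.normal(theta, np.sqrt(sigma2), size=(trials, n))
var_sample_mean = x.mean(axis=1).var()  # empirical Var(Xbar), close to the bound
```

The empirical variance of the sample mean matches \sigma^2/n, illustrating that this estimator is efficient: it saturates the Cramér–Rao bound at every finite n, not just asymptotically.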

Information Inequality and Beyond

For biased estimators, the information inequality generalizes the Cramér–Rao lower bound (CRLB) by incorporating a bias function to account for systematic errors in estimation. For an estimator \hat{\theta} of a parameter \theta with bias b(\theta) = E[\hat{\theta} - \theta], the bound becomes \text{Var}(\hat{\theta}) \geq \frac{(1 + b'(\theta))^2}{I(\theta)}, with I(\theta) denoting the Fisher information; the related Chapman–Robbins bound, built from finite differences of the likelihood rather than derivatives, holds under weaker regularity conditions and is particularly useful when unbiased estimators do not exist or are inefficient. The bound is attained in certain cases, such as linear models with known variance, and penalizes deviation from unbiasedness through the derivative of the bias. In Bayesian estimation, the van Trees inequality serves as an analog to the CRLB, providing a lower bound on the mean squared error (MSE) of estimators by integrating the Fisher information over a prior distribution on the parameter. Formulated for the Bayes risk as E[\text{MSE}(\hat{\theta})] \geq \left( E[I(\theta)] + \int \frac{[\pi'(\theta)]^2}{\pi(\theta)} d\theta \right)^{-1}, where \pi(\theta) is the prior density, it combines classical Fisher information I(\theta) with a prior information term, enabling bounds on posterior variance even without unbiasedness assumptions. This inequality is widely applied in signal processing and communications to assess the fundamental limits of Bayesian estimators under uncertainty in the parameter. Standard information bounds like the CRLB fail in non-regular cases, where the likelihood's support depends on the parameter or differentiability assumptions are violated, leading to phenomena such as superefficiency, where estimators achieve variances below the asymptotic CRLB at specific points.
For instance, in estimating the endpoint of a uniform distribution on [0, \theta], the maximum likelihood estimator \hat{\theta} = \max(X_i) has finite-sample variance \theta^2 \frac{n}{(n+1)^2 (n+2)}, which is of order O(1/n^2) and does not align with CRLB predictions due to the non-differentiable density at the boundary, rendering classical bounds inapplicable. Superefficiency, exemplified by Hodges's estimator and analyzed by Le Cam, occurs when a sequence of estimators has MSE converging faster than the CRLB rate at isolated parameter values, but such points form a set of Lebesgue measure zero, limiting global efficiency gains. In multiparameter estimation, the Fisher information matrix \mathbf{I}(\boldsymbol{\theta}) governs the CRLB, with the covariance matrix of any unbiased estimator \hat{\boldsymbol{\theta}} satisfying \text{Cov}(\hat{\boldsymbol{\theta}}) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1}. For a scalar parameter \theta_i within \boldsymbol{\theta} = (\theta_1, \dots, \theta_p), the bound reduces to the (i,i)-th element of the inverse matrix, [\mathbf{I}(\boldsymbol{\theta})^{-1}]_{ii}, capturing the influence of nuisance parameters on estimation precision; this scalar reduction highlights trade-offs, as nonzero off-diagonal elements inflate the achievable variance in joint estimation. Such formulations are essential in high-dimensional settings, where the matrix's positive definiteness ensures the bound's validity under regularity. Modern extensions of information inequalities include minimax bounds, which address worst-case performance over uncertainty sets, and robust estimation limits that mitigate model misspecification or outliers. Wald's minimax framework bounds every estimator's worst-case risk from below by the minimax risk, \sup_{\theta} E[(\hat{\theta} - \theta)^2] \geq \inf_{\hat{\theta}} \sup_{\theta} E[(\hat{\theta} - \theta)^2], providing guarantees independent of specific \theta and applicable when priors are unavailable.
In robust settings, Huber's contamination models yield asymptotic lower bounds on risk, such as \text{MSE} \geq \frac{1}{I(\theta)} (1 - \epsilon)^2 for \epsilon-fraction outliers, emphasizing resilience in non-ideal data environments. These approaches extend classical inequalities to practical scenarios, prioritizing global optimality over local efficiency.

Illustrative Examples

Estimation in Gaussian Noise

A fundamental problem in estimation theory is the estimation of a constant parameter \theta from observations corrupted by additive white Gaussian noise. The model consists of n independent and identically distributed (i.i.d.) observations X_i = \theta + W_i for i = 1, \dots, n, where each noise term W_i \sim \mathcal{N}(0, \sigma^2) and \sigma^2 is known. In the frequentist approach, the maximum likelihood estimator (MLE) for \theta is the sample mean \hat{\theta}_{ML} = \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i. This estimator is unbiased, meaning E[\hat{\theta}_{ML}] = \theta, and it achieves the minimum variance among all unbiased estimators, making it the minimum variance unbiased (MVU) estimator. The Cramér–Rao lower bound (CRLB) provides the theoretical limit on the variance of any unbiased estimator, given by \mathrm{Var}(\hat{\theta}) \geq \frac{\sigma^2}{n}, which is precisely attained by the sample mean, confirming its efficiency. For inference, a (1 - \alpha) confidence interval for \theta is constructed as \bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, where z_{\alpha/2} is the (1 - \alpha/2)-quantile of the standard normal distribution; this interval contains the true \theta with probability 1 - \alpha. In the Bayesian framework, additional structure is imposed by assuming a normal prior distribution \theta \sim \mathcal{N}(\mu_0, \sigma_0^2). The posterior distribution of \theta given the observations is also normal, \mathcal{N}(\mu_n, \sigma_n^2), where the posterior variance is \sigma_n^2 = \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right)^{-1} and the posterior mean is \mu_n = w \bar{X} + (1 - w) \mu_0, \quad w = \frac{n / \sigma^2}{n / \sigma^2 + 1 / \sigma_0^2}. This posterior mean serves as a shrinkage estimator, representing a weighted average that pulls the sample mean toward the prior mean \mu_0, with the shrinkage weight w reflecting the relative precision of the data versus the prior.
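The posterior update above reduces to a precision-weighted average; a minimal sketch (the function name and numeric inputs are illustrative):

```python
def gaussian_posterior(xbar, n, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) for theta ~ N(mu0, sigma0_2) prior
    and n observations with sample mean xbar and known noise variance sigma2."""
    precision = n / sigma2 + 1.0 / sigma0_2  # data precision + prior precision
    w = (n / sigma2) / precision             # shrinkage weight on the data
    return w * xbar + (1 - w) * mu0, 1.0 / precision

mu_n, var_n = gaussian_posterior(xbar=1.5, n=10, sigma2=1.0,
                                 mu0=0.0, sigma0_2=1.0)
# w = 10/11, so mu_n = 10/11 * 1.5 = 15/11, pulled toward the prior mean 0,
# and var_n = 1/11, smaller than both the prior and the data-only variance
```

With only a few observations the prior pulls the estimate noticeably toward \mu_0; as n grows, w \to 1 and the posterior mean approaches the sample mean.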

Uniform Distribution Parameter Estimation

Consider independent and identically distributed (i.i.d.) random variables X_1, \dots, X_n drawn from a uniform distribution on the interval [0, \theta], where \theta > 0 is an unknown parameter to be estimated. This model arises in scenarios where observations are bounded above by an unknown threshold, such as certain measurement errors or randomized experiments with a fixed upper limit. The maximum likelihood estimator (MLE) for \theta is the sample maximum, \hat{\theta}_\text{MLE} = \max(X_1, \dots, X_n). This estimator is biased downward, with E[\hat{\theta}_\text{MLE}] = \frac{n \theta}{n+1}. To obtain an unbiased estimator, scale the MLE as \tilde{\theta} = \frac{n+1}{n} \hat{\theta}_\text{MLE}, which removes the bias while preserving consistency as n increases. Bias-correction techniques such as this scaling are essential for applications requiring unbiased estimates. The Cramér–Rao lower bound (CRLB), which provides a theoretical minimum variance for unbiased estimators under regularity conditions, does not apply to this model. The reason is that the support [0, \theta] of the distribution depends on the parameter \theta, violating the requirement that the support be independent of \theta. Consequently, the Fisher information is not defined in the standard way, and the CRLB cannot be used to assess efficiency. The actual variance of the MLE can be computed directly using the distribution of the maximum order statistic: \mathrm{Var}(\hat{\theta}_\text{MLE}) = \frac{\theta^2 n}{(n+1)^2 (n+2)}. This expression reveals that the MLE achieves a variance lower than that of the method of moments estimator \hat{\theta}_\text{MoM} = 2 \bar{X} (where \bar{X} is the sample mean), which has \mathrm{Var}(\hat{\theta}_\text{MoM}) = \frac{\theta^2}{3n}. For large n, both estimators converge to \theta, but the MLE is more efficient, highlighting the challenges in bounded-support parameter estimation where standard bounds fail.
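A short Monte Carlo sketch can verify the bias and variance formulas above; the values of \theta, n, and the number of trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 5.0, 20, 20000   # illustrative values

x = rng.uniform(0.0, theta, size=(trials, n))
mle = x.max(axis=1)                  # sample maximum, biased downward
unbiased = (n + 1) / n * mle         # bias-corrected estimator
mom = 2.0 * x.mean(axis=1)           # method-of-moments estimator

# Theoretical values from the order-statistic and moment calculations.
mean_mle_theory = n * theta / (n + 1)
var_mle_theory = theta**2 * n / ((n + 1) ** 2 * (n + 2))
var_mom_theory = theta**2 / (3 * n)

print(f"E[MLE]   ~= {mle.mean():.3f}  (theory {mean_mle_theory:.3f})")
print(f"Var(MLE) ~= {mle.var():.4f} (theory {var_mle_theory:.4f})")
print(f"Var(MoM) ~= {mom.var():.4f} (theory {var_mom_theory:.4f})")
```

The simulation shows the MLE's variance decaying like 1/n² versus the method of moments' 1/n rate, which is why the MLE dominates for large n despite its bias.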

Applications and Extensions

Signal Processing and Communications

In signal processing, estimation theory is essential for recovering desired signals from noisy observations, such as separating a transmitted waveform from additive white Gaussian noise (AWGN) in communication channels. This involves estimating parameters like signal amplitude, phase, or frequency to enable accurate demodulation and decoding. In communications, estimation techniques mitigate the effects of noise, interference, and channel distortions, ensuring reliable data transmission over wireless links. For instance, in AWGN scenarios, maximum likelihood estimators (MLEs) often achieve near-optimal performance by maximizing the likelihood function derived from the received signal model.

Channel estimation in wireless systems is critical for compensating fading effects, where the channel coefficients represent the multiplicative distortions due to multipath propagation. Least squares (LS) estimation is a widely used frequentist method for estimating these fading coefficients, particularly in orthogonal frequency-division multiplexing (OFDM) systems. The LS estimator minimizes the squared error between the received pilot symbols and the modeled channel response, yielding \hat{h}_{LS} = (P^H P)^{-1} P^H y, where P is the known pilot matrix and y is the received vector. This approach is computationally efficient and unbiased under ideal conditions, though it is sensitive to noise without prior channel statistics. Seminal work demonstrated its application in frequency-selective fading channels, showing that LS provides a baseline for equalization in broadband wireless systems.

Synchronization in communication receivers requires precise estimation of timing and carrier phase to align the received signal with the transmitter's clock and oscillator. In AWGN channels, maximum likelihood estimation (MLE) is employed for joint timing and carrier phase recovery, formulating the problem as maximizing the log-likelihood function based on the phase-shift keying (PSK) or quadrature amplitude modulation (QAM) signal model. For timing, the MLE derives from the derivative of the likelihood, leading to non-data-aided detectors that avoid decisions on transmitted symbols.
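The LS channel estimate \hat{h}_{LS} = (P^H P)^{-1} P^H y described above can be sketched as follows. The pilot sequence, channel length, and noise level are all hypothetical choices, and the normal equations are solved via a numerically stable least-squares routine rather than an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: K pilot symbols probe a channel with L complex taps.
K, L = 32, 4
h_true = (rng.normal(size=L) + 1j * rng.normal(size=L)) / np.sqrt(2 * L)

# P is the known pilot (convolution) matrix; the received vector is y = P h + noise.
pilots = rng.choice([-1.0, 1.0], size=K)      # BPSK pilot sequence
P = np.array([[pilots[k - l] if k - l >= 0 else 0.0 for l in range(L)]
              for k in range(K)]).astype(complex)
noise = 0.05 * (rng.normal(size=K) + 1j * rng.normal(size=K))
y = P @ h_true + noise

# LS estimate: h_hat = (P^H P)^{-1} P^H y, computed stably via lstsq.
h_ls, *_ = np.linalg.lstsq(P, y, rcond=None)

print("true taps:", np.round(h_true, 3))
print("LS taps:  ", np.round(h_ls, 3))
```

With well-conditioned pilots, the estimation error shrinks roughly as the noise variance divided by the pilot length, matching the unbiasedness claim above.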
Carrier phase estimation similarly uses the argmax of the log-likelihood function, often implemented via phase-locked loops. A foundational timing-error detector, derived from MLE principles, operates on sampled receivers for BPSK/QPSK modulations, achieving low timing jitter in burst-mode transmissions. These estimators are particularly effective in AWGN, where the Cramér-Rao lower bound (CRLB) is approached at high signal-to-noise ratios (SNRs).

For dynamic systems in communications, such as tracking time-varying channels in mobile links, the Kalman filter serves as a recursive Bayesian estimator. It sequentially updates the posterior distribution of the state, representing parameters like fading coefficients or Doppler shifts, using a linear Gaussian model: the prediction step propagates the prior via the state-transition model, while the update step incorporates new measurements via Bayes' rule. This yields the minimum mean squared error estimate under Gaussian assumptions, making it ideal for time-varying environments like vehicular wireless links. The filter's recursive nature reduces computational cost compared to batch methods, enabling real-time implementation in receivers. Its formulation as an optimal Bayesian solution for linear dynamics was established in Kalman's foundational work on filtering problems.

In array systems, estimation theory bounds the accuracy of direction-of-arrival (DOA) estimation, crucial for beamforming and localization in wireless networks. The CRLB provides a fundamental limit on the variance of unbiased DOA estimators, derived from the Fisher information matrix for the array manifold model. For MIMO radar or communication arrays, the bound accounts for the virtual aperture formed by transmit-receive pairs, showing that DOA variance decreases with the number of antennas and the SNR. Analysis reveals that colocated MIMO configurations achieve a CRLB scaling as 1/(M N SNR), where M and N are the numbers of transmit and receive elements, respectively, outperforming traditional SIMO setups. This limit guides array design, ensuring estimation errors do not degrade overall system performance.
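The Kalman filter's predict/update cycle described above can be illustrated on a scalar Gauss-Markov fading model; the autoregressive coefficient, noise variances, and track length are hypothetical values chosen only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical scalar Gauss-Markov fading model:
#   h[k] = a * h[k-1] + w[k],  w ~ N(0, q)   (state / process model)
#   y[k] = h[k] + v[k],        v ~ N(0, r)   (measurement model)
a, q, r, T = 0.99, 0.01, 0.25, 200

h = np.zeros(T)
h[0] = rng.normal()
for k in range(1, T):
    h[k] = a * h[k - 1] + rng.normal(0.0, np.sqrt(q))
y = h + rng.normal(0.0, np.sqrt(r), size=T)

# Kalman recursion: predict with the state model, update with the new measurement.
x_est, p = 0.0, 1.0
est = np.zeros(T)
for k in range(T):
    # predict step: propagate the prior mean and variance
    x_pred = a * x_est
    p_pred = a * a * p + q
    # update step: fuse prediction and measurement via the Kalman gain
    K = p_pred / (p_pred + r)
    x_est = x_pred + K * (y[k] - x_pred)
    p = (1.0 - K) * p_pred
    est[k] = x_est

mse_raw = np.mean((y - h) ** 2)   # error if raw measurements were trusted directly
mse_kf = np.mean((est - h) ** 2)  # error after Kalman filtering
print(f"raw MSE = {mse_raw:.3f}, Kalman MSE = {mse_kf:.3f}")
```

The filtered MSE falls well below the raw measurement noise variance, illustrating the minimum mean squared error property under these Gaussian assumptions.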
Estimation accuracy directly impacts communication performance, particularly the bit error probability (BEP) in coherent receivers. Imperfect channel estimates introduce residual distortion, elevating the effective noise level and increasing the BEP for schemes like M-QAM. For OFDM systems, studies show that channel estimation errors degrade the BEP, especially over nonlinear and frequency-selective channels, underscoring the need for robust estimators to maintain low error rates in practical wireless links. As of 2025, estimation theory plays a key role in emerging wireless networks, where AI-enhanced channel estimation supports integrated sensing and communication (ISAC). AI-driven techniques, such as deep learning for pilot-based estimation, improve accuracy in massive MIMO and terahertz bands, enabling ultra-reliable low-latency applications.

Machine Learning and Control Systems

In machine learning (ML) and control systems, estimation theory underpins adaptive algorithms that iteratively refine parameters or states in dynamic environments, enabling prediction, filtering, and control. These applications often involve sequential data processing where estimators must balance bias, variance, and computational efficiency, particularly under noise or non-stationarity. Parameter estimation in linear models and state reconstruction in feedback loops exemplify how estimation techniques extend beyond static inference to sequential decision-making, with robustness becoming critical in high-dimensional or mismatched scenarios.

Parameter estimation in linear regression serves as a foundational technique in ML for fitting models to data, where the ordinary least squares (OLS) estimator minimizes the sum of squared residuals but suffers from high variance in the presence of multicollinearity or high dimensionality. To address this, ridge regression introduces bias through L2 regularization, shrinking coefficients toward zero by adding a penalty term λ||β||² to the loss function, which reduces variance at the cost of slight bias, especially effective when predictors outnumber observations. This biased estimator, proposed by Hoerl and Kennard, stabilizes predictions in ill-conditioned problems common to ML tasks like genomic analysis or recommender systems.

In adaptive filtering, recursive least squares (RLS) algorithms enable online parameter updates by incrementally minimizing an exponentially weighted least-squares cost, avoiding full matrix recomputation through efficient recursions for the inverse correlation matrix via the matrix inversion lemma. This makes RLS suitable for time-varying systems, such as echo cancellation or channel equalization, where it converges faster than gradient-descent methods like LMS, albeit with higher complexity, O(p²) per update for p parameters. Detailed in Haykin's adaptive filter theory, RLS exemplifies deterministic least-squares estimation in sequential settings, with forgetting factors λ < 1 allowing adaptation to non-stationary signals.
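The RLS recursion described above can be sketched on a toy system-identification task; the true weight vector, noise level, and forgetting factor are hypothetical choices, and the gain is computed via the matrix inversion lemma so no explicit matrix inverse is ever formed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical identification task: d[k] = u[k]^T w_true + noise, tracked online.
p_dim, T, lam = 4, 500, 0.99          # filter order, samples, forgetting factor
w_true = np.array([0.5, -1.0, 2.0, 0.3])

w = np.zeros(p_dim)
P = 1e3 * np.eye(p_dim)               # inverse correlation matrix, large initialization

for k in range(T):
    u = rng.normal(size=p_dim)                      # input regressor
    d = u @ w_true + 0.1 * rng.normal()             # desired (noisy) response
    # Gain vector via the matrix inversion lemma: O(p^2) per update.
    Pu = P @ u
    g = Pu / (lam + u @ Pu)
    e = d - u @ w                                   # a priori error
    w = w + g * e                                   # weight update
    P = (P - np.outer(g, Pu)) / lam                 # inverse correlation update

print("true w:", w_true)
print("RLS  w:", np.round(w, 3))
```

Setting lam below 1 exponentially discounts old data, which is what lets the same recursion track slowly drifting weights instead of a fixed w_true.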
State estimation in control systems relies on observers to reconstruct unmeasurable states from outputs, with the Luenberger observer providing a deterministic full-order observer for linear time-invariant systems via a gain matrix L that ensures error-dynamics stability through pole placement. In contrast, the Kalman filter incorporates stochastic noise models, optimally estimating states by minimizing the error covariance under Gaussian assumptions, outperforming the Luenberger observer in noisy environments by fusing predictions and measurements via the Kalman gain K. Luenberger's original formulation targets noise-free reconstruction, while Kalman's approach handles process and measurement noise, making it preferable for systems like aircraft navigation where uncertainty dominates.

In reinforcement learning, value function estimation approximates expected returns using temporal difference (TD) methods, which update estimates via bootstrapping: V(s) ← V(s) + α [r + γ V(s') - V(s)], where α is the learning rate, r the reward, γ the discount factor, and s' the next state, reducing variance compared to Monte Carlo methods, with eligibility traces enabling multi-step lookahead. Sutton's seminal work established TD(λ) as a model-free method bridging dynamic programming and Monte Carlo evaluation, enabling scalable value approximation in Markov decision processes for applications like game playing or robotics.

Robustness to model mismatch in high-dimensional settings is essential, as estimators like OLS or the standard Kalman filter can degrade under specification errors, such as unmodeled dynamics or outliers. Techniques like robust M-estimators or H∞ filtering minimize worst-case error bounds, with regularization demonstrating resilience in high-p regimes by controlling variance explosion, while adaptive observers adjust gains online to handle parametric uncertainties, ensuring stability margins in feedback loops such as power systems. In ML, double-descent phenomena, where test error decreases again beyond the interpolation threshold in overparameterized models, highlight recovery from mismatch via implicit regularization in high-dimensional linear models.
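The TD(0) update V(s) ← V(s) + α [r + γ V(s') - V(s)] described above can be sketched on a classic five-state random walk, where the true values under a random policy are known to be k/6 for state k; the step size and episode count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# Five-state random walk: states 1..5 with terminals at 0 and 6.
# Reward 1 on reaching the right terminal, 0 otherwise; true V(k) = k/6.
n_states, alpha, gamma, episodes = 7, 0.02, 1.0, 10000
V = np.zeros(n_states)            # terminal values stay fixed at 0

for _ in range(episodes):
    s = 3                                      # start in the middle
    while s not in (0, 6):
        s_next = s + rng.choice([-1, 1])       # random policy: step left or right
        r = 1.0 if s_next == 6 else 0.0
        # TD(0) bootstrapped update toward r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print("TD(0) estimates:", np.round(V[1:6], 3))
print("true values:    ", np.round(np.arange(1, 6) / 6, 3))
```

Because each update bootstraps from the current estimate V(s') rather than waiting for the episode's full return, the estimates have lower variance than Monte Carlo averages at the price of transient bias.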
Recent advances as of 2025 integrate estimation theory with deep learning, such as Bayesian neural networks for uncertainty quantification in large language models and robust state estimation in autonomous systems using neural Kalman filters.
