
Bayes estimator

In statistical decision theory, the Bayes estimator is a point estimate of an unknown parameter that minimizes the posterior expected loss, where the posterior distribution is obtained by updating a prior distribution with observed data via Bayes' theorem. This approach treats the parameter as a random variable, contrasting with frequentist methods that view parameters as fixed unknowns. The specific form of the Bayes estimator depends on the chosen loss function, which quantifies the penalty for estimation error. For the squared error loss L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2, the Bayes estimator is the mean of the posterior distribution. Under absolute error loss L(\theta, \hat{\theta}) = |\theta - \hat{\theta}|, it is any median of the posterior distribution. For 0-1 loss L(\theta, \hat{\theta}) = I(\theta \neq \hat{\theta}) with discrete parameters, the estimator is the posterior mode.

Formally, the Bayes estimator \delta_\pi(x) minimizes the Bayes risk R(\pi, \delta) = \int R(\theta, \delta) \, \pi(d\theta), where R(\theta, \delta) is the frequentist risk and \pi is the prior distribution; this is equivalent to minimizing, for each observed x, the posterior risk r(\delta \mid x) = \int L(\theta, \delta(x)) \, \pi(\theta \mid x) \, d\theta. Bayes estimators may not always exist or be unique, but under conditions such as strict convexity of the loss function and finiteness of the Bayes risk, they are unique and often admissible.

In practice, computation involves deriving the posterior \pi(\theta \mid x) \propto f(x \mid \theta) \, \pi(\theta) and then evaluating the minimizer of the expected loss. For conjugate priors, such as a beta prior for a binomial parameter, the posterior is also beta, and the Bayes estimator under squared loss is the posterior mean (a + s)/(a + b + n), where s is the number of successes in n trials and (a, b) are the prior parameters. This framework, rooted in Bayesian decision theory, allows incorporation of prior knowledge and provides a coherent way to handle uncertainty in estimation.

Fundamentals

Definition

In Bayesian decision theory, a Bayes estimator \delta^\pi for an unknown parameter \theta with respect to a prior distribution \pi is defined as any decision rule that minimizes the Bayes risk r(\pi, \delta) = E_\pi [R(\theta, \delta)], where R(\theta, \delta) = E_{X \mid \theta} [L(\theta, \delta(X))] is the frequentist risk function, L(\theta, a) is the loss incurred by taking action a when the true parameter is \theta, and the expectation is taken over the joint distribution of \theta and the observed data X induced by the prior \pi and the sampling model. This minimization ensures that the estimator achieves the optimal average performance under the specified prior and loss, distinguishing it from frequentist estimators that focus on long-run average properties without incorporating prior beliefs.

The Bayes risk can equivalently be expressed in terms of the posterior expected loss: r(\pi, \delta) = \int \left[ \int L(\theta, \delta(x)) \, \pi(\theta \mid x) \, d\theta \right] \pi_X(x) \, dx, where \pi(\theta \mid x) is the posterior distribution of \theta given data x and \pi_X is the marginal distribution of the data. For each fixed x, the inner integral is the posterior risk, and the Bayes estimator \delta^\pi(x) is chosen to minimize the posterior expected loss \int L(\theta, a) \, \pi(\theta \mid x) \, d\theta pointwise in the action a. This posterior minimization property implies that Bayes estimators are inherently adaptive to the observed data through the posterior.

A key prerequisite for constructing a Bayes estimator is Bayesian updating, which yields the posterior \pi(\theta \mid x) \propto p(x \mid \theta) \, \pi(\theta), where p(x \mid \theta) is the likelihood; this updating rule combines prior beliefs with the evidence carried by the data. Common loss functions include the squared error loss L(\theta, a) = (\theta - a)^2, under which the Bayes estimator simplifies to the posterior mean \delta^\pi(x) = E[\theta \mid x], a natural measure of central tendency of the posterior distribution. Other losses, such as absolute error, lead to the posterior median as the estimator.
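The pointwise minimization of the posterior expected loss can be carried out numerically when no closed form is available. The following Python sketch illustrates the definition: it builds the posterior on a parameter grid and searches for the action with the smallest posterior risk. The Beta(2, 2) prior, the binomial data (6 successes in 20 trials), and the grid resolution are illustrative assumptions, not part of the article's text.

```python
import numpy as np
from scipy.stats import beta, binom

# Minimal sketch, assuming a Beta(2, 2) prior and binomial data:
# (1) form the posterior pi(theta | x) ∝ p(x | theta) * pi(theta) on a grid,
# (2) minimize the posterior expected loss over candidate actions.

theta = np.linspace(1e-4, 1 - 1e-4, 2001)         # parameter grid
prior = beta.pdf(theta, 2, 2)                     # assumed prior density
n, s = 20, 6                                      # assumed data: 6 successes in 20 trials
likelihood = binom.pmf(s, n, theta)               # p(x | theta)

unnorm = likelihood * prior
posterior = unnorm / np.trapz(unnorm, theta)      # pi(theta | x), normalized numerically

def bayes_estimate(loss):
    """Return the grid action minimizing the posterior expected loss."""
    risks = [np.trapz(loss(theta, a) * posterior, theta) for a in theta]
    return theta[int(np.argmin(risks))]

print(bayes_estimate(lambda t, a: (t - a) ** 2))   # ≈ posterior mean
print(bayes_estimate(lambda t, a: np.abs(t - a)))  # ≈ posterior median
```

Under the squared error loss the numerical minimizer agrees with the posterior mean of the Beta(8, 16) posterior, and under absolute error loss with its median, matching the closed-form results discussed below.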

Bayesian Framework

The Bayesian framework provides the probabilistic foundation for Bayes estimators, originating from Thomas Bayes's 1763 essay, which introduced the idea of revising probabilities based on new evidence through what is now known as Bayes' theorem. This concept was formalized and extended by Pierre-Simon Laplace in the late 18th and early 19th centuries, particularly in his 1774 memoir on the probability of causes, where he derived the general form of the theorem and applied it to problems of inverse probability. At its core, the framework combines prior knowledge about an unknown parameter \theta with observed data x to form updated beliefs, enabling principled inference and decision-making under uncertainty.

Central to this framework is the prior distribution \pi(\theta), which encodes the researcher's beliefs or information about \theta before observing the data. Priors can be subjective, reflecting domain-specific knowledge or historical data, or objective, aiming to incorporate minimal assumptions about \theta. A prominent example of an objective prior is the Jeffreys prior, proposed by Harold Jeffreys in 1939, which is derived from the Fisher information matrix to ensure invariance under parameter transformations and thus provides a non-informative starting point for inference. The likelihood p(x \mid \theta), in contrast, quantifies the probability of observing the data x as a function of \theta, modeling the data-generating process under the assumed parametric family.

Bayes' theorem then combines the prior and likelihood to yield the posterior distribution \pi(\theta \mid x), which represents the updated beliefs about \theta after accounting for the data: \pi(\theta \mid x) = \frac{p(x \mid \theta) \, \pi(\theta)}{m(x)}, where the normalizing constant m(x) = \int p(x \mid \theta) \, \pi(\theta) \, d\theta is the marginal likelihood of the data. This posterior serves as the basis for all subsequent Bayesian analysis. From a decision-theoretic perspective, estimation within the Bayesian framework is a special case of statistical decision theory, where the goal is to select a point estimate \hat{\theta}(x) that minimizes the expected posterior risk under a chosen loss function L(\theta, \hat{\theta}). This approach, formalized in works such as DeGroot's 1970 treatise Optimal Statistical Decisions, treats estimators as actions in a decision problem, balancing the trade-off between bias and variance through the posterior to achieve optimal performance relative to the specified loss.

Construction Methods

Posterior-Based Estimators

In Bayesian decision theory, the Bayes estimator \delta(x) for a parameter \theta based on observed data x is obtained by minimizing the posterior expected loss with respect to the action a: \delta(x) = \arg\min_a \int L(\theta, a) \, \pi(\theta \mid x) \, d\theta, where L(\theta, a) is the loss function and \pi(\theta \mid x) is the posterior distribution of \theta given x. This general rule ensures that the estimator achieves the minimal Bayes risk under the specified loss, integrating uncertainty from the posterior over all possible parameter values.

For the squared error loss L(\theta, a) = (\theta - a)^2, the Bayes estimator is the posterior mean \delta(x) = E[\theta \mid x]. This follows because expanding the expected squared deviation gives E[(\theta - a)^2 \mid x] = \operatorname{Var}(\theta \mid x) + (E[\theta \mid x] - a)^2, which is minimized when a = E[\theta \mid x]. The posterior mean thus balances bias and variance in the Bayesian sense, shrinking the estimate toward the prior mean in light of the data.

Under absolute error loss L(\theta, a) = |\theta - a|, the Bayes estimator is the posterior median \delta(x) = \operatorname{median}(\pi(\theta \mid x)). To see this, note that the posterior expected loss is E[|\theta - a| \mid x] = \int_{-\infty}^a (a - \theta) \pi(\theta \mid x) \, d\theta + \int_a^\infty (\theta - a) \pi(\theta \mid x) \, d\theta; differentiating with respect to a gives \frac{d}{da} E[|\theta - a| \mid x] = 2 \int_{-\infty}^a \pi(\theta \mid x) \, d\theta - 1, which equals zero when the cumulative posterior probability up to a is 1/2, i.e., at the median. This choice is robust to heavy tails in the posterior, prioritizing central tendency over squared deviations.

For the 0-1 loss, L(\theta, a) = 0 if \theta = a and 1 otherwise (natural for discrete parameters and classification settings), the Bayes estimator is the posterior mode \delta(x) = \arg\max_\theta \pi(\theta \mid x), known as the maximum a posteriori (MAP) estimator; it minimizes the posterior probability of error by selecting the most probable value under the posterior distribution.

More generally, for L_p losses L(\theta, a) = |\theta - a|^p with p \geq 1, the Bayes estimator \delta(x) minimizes E[|\theta - a|^p \mid x], the p-th-power posterior deviation. Specific cases include the posterior mean for p = 2 and the posterior median for p = 1; the posterior mode arises instead as the Bayes estimator under 0-1 loss (or as the limit, for continuous parameters, of the loss I(|\theta - a| > \varepsilon) as \varepsilon \to 0). This framework allows tailoring the estimator's robustness or sensitivity to large errors via the choice of loss. In cases where the posterior is analytically tractable, such as with conjugate priors, these estimators can be computed explicitly.
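The three classical loss-specific estimators can be read directly off a posterior, whether it is available in closed form or only through Monte Carlo draws. The sketch below uses a hypothetical Beta(3, 9) posterior as a stand-in for \pi(\theta \mid x); the posterior family and sample size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta

# Minimal sketch, assuming a Beta(3, 9) posterior stands in for pi(theta | x):
# posterior mean, median, and mode correspond to squared error, absolute error,
# and 0-1 loss respectively.

a, b = 3, 9
rng = np.random.default_rng(0)
draws = beta.rvs(a, b, size=200_000, random_state=rng)

post_mean = draws.mean()               # squared error loss  -> posterior mean
post_median = np.median(draws)         # absolute error loss -> posterior median
post_mode = (a - 1) / (a + b - 2)      # 0-1 loss            -> posterior mode (MAP), closed form for a Beta

print(post_mean, beta.mean(a, b))      # Monte Carlo estimate vs. exact a / (a + b)
print(post_median, beta.median(a, b))  # Monte Carlo estimate vs. exact median
print(post_mode)
```

The same recipe applies to posteriors represented only by MCMC samples, where the mean and median are sample summaries and the mode is typically approximated by a density estimate.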

Conjugate Prior Approach

The conjugate prior approach in Bayesian estimation involves selecting a prior distribution from a parametric family that ensures the posterior distribution belongs to the same family after incorporation of the likelihood, thereby reducing computation to an update of hyperparameters. Specifically, for a parameter \theta with prior \pi(\theta) and likelihood L(x \mid \theta), the posterior \pi(\theta \mid x) \propto \pi(\theta) L(x \mid \theta) retains the prior's functional form, with parameters adjusted to reflect the data. This conjugacy is particularly natural within exponential families, whose structure permits analytical updates without leaving the distributional class.

A primary advantage of conjugate priors is the availability of closed-form expressions for the posterior, which enables direct derivation of Bayes estimators and avoids the need for numerical or simulation-based approximation. This computational tractability was especially valuable in early Bayesian applications, allowing efficient posterior inference and predictive distributions. For instance, under squared error loss the Bayes estimator is the posterior mean, which can be read off directly from the updated hyperparameters.

In the beta-binomial model, a Beta(\alpha, \beta) prior for the success probability \theta in binomial data with s successes in n trials yields a Beta(\alpha + s, \beta + n - s) posterior, and the Bayes estimator under squared loss is the posterior mean \frac{\alpha + s}{\alpha + \beta + n}. Similarly, for normally distributed data with known variance \sigma^2 and a normal prior N(\mu_0, \sigma_0^2) on the mean, the posterior is also normal, and the Bayes estimator for the mean is a precision-weighted average: \mu_{\text{post}} = \frac{\sigma_0^2}{\sigma^2 / n + \sigma_0^2} \bar{x} + \frac{\sigma^2 / n}{\sigma^2 / n + \sigma_0^2} \mu_0, where \bar{x} is the sample mean. While conjugacy offers significant convenience, it is not always realistic, as suitable conjugate families may not reflect substantive prior knowledge, constraining model flexibility. In non-conjugate scenarios, alternative methods such as Markov chain Monte Carlo (MCMC) are required for posterior approximation.
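The two conjugate updates described above amount to simple arithmetic on hyperparameters. The following sketch encodes them directly; the hyperparameter and data values in the example calls are illustrative assumptions.

```python
# Minimal sketch of conjugate updating, with hyperparameters chosen purely
# for illustration.

def beta_binomial_update(a0, b0, s, n):
    """Beta(a0, b0) prior + s successes in n Bernoulli trials
    -> Beta(a0 + s, b0 + n - s) posterior; returns (posterior params, posterior mean)."""
    a_post, b_post = a0 + s, b0 + n - s
    return (a_post, b_post), a_post / (a_post + b_post)

def normal_normal_update(mu0, sigma0_sq, xbar, sigma_sq, n):
    """N(mu0, sigma0_sq) prior on the mean with known sampling variance sigma_sq
    -> normal posterior; returns the precision-weighted posterior mean."""
    w = sigma0_sq / (sigma_sq / n + sigma0_sq)   # weight on the sample mean
    return w * xbar + (1 - w) * mu0

print(beta_binomial_update(2, 2, s=7, n=10))                       # ((9, 5), 0.6428...)
print(normal_normal_update(0.0, 1.0, xbar=1.5, sigma_sq=4.0, n=25))
```

Under squared error loss, the returned posterior means are the Bayes estimates; the normal case makes explicit how the weight on \bar{x} grows with the sample size and with the prior variance.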

Variants

Generalized Bayes Estimators

Generalized Bayes estimators extend the Bayesian framework by employing improper priors, which are prior distributions π(θ) that do not integrate to a finite value over the parameter space, i.e., ∫ π(θ) dθ = ∞. Despite the impropriety of the prior, the resulting posterior distribution remains proper whenever the likelihood supplies enough information for the product f(x|θ) π(θ) to be integrable, ensuring that the posterior integrates to 1. Formally, a generalized Bayes estimator δ^π(x) is defined as the rule that minimizes the posterior expected loss with respect to this improper prior, and it can often be obtained as the limit of Bayes estimators based on a sequence of proper priors {π_n} that approximate π as n → ∞.

The primary motivation for generalized Bayes estimators arises from the need for objective or noninformative priors in scenarios lacking strong prior knowledge, such as reference priors that treat parameter values equitably without favoring specific regions. Improper priors, like the uniform flat prior on an unbounded parameter space, facilitate this objectivity, particularly for location parameters where invariance under translation is desired. This approach allows Bayesian methods to align with frequentist ideals, such as yielding maximum likelihood estimators in limiting cases, while retaining the benefits of posterior risk minimization.

The derivation follows the standard Bayes estimator procedure but accommodates the improper prior: the posterior density is proportional to the likelihood times the prior, π(θ|x) ∝ f(x|θ) π(θ), and the estimator minimizes the expected loss under this posterior, for example the posterior mean under squared error loss. Although the Bayes risk ∫ R(θ, δ^π) π(dθ) may be infinite because of the improper prior, the posterior risk remains well defined and finite when the posterior is proper, and minimax properties are preserved in many cases.

A classic example occurs in estimating the mean μ of a normal distribution with known variance σ² = 1, based on i.i.d. observations X_1, ..., X_n ~ N(μ, 1), using the improper flat prior π(μ) ∝ 1. The posterior distribution is then N(\bar{x}, 1/n), and the generalized Bayes estimator under squared error loss is the sample mean \bar{x}, which coincides with the maximum likelihood estimator. Key caveats include the necessity of verifying that the posterior is proper for the given data and model; otherwise, the resulting inference is invalid. Additionally, not all improper priors yield admissible estimators, as some lead to dominated risks in high dimensions, although common choices such as the flat prior for a one-dimensional location parameter do yield admissible estimators.
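The flat-prior normal-mean example can be viewed as the limit of proper conjugate priors whose variance grows without bound. The sketch below makes that limit visible numerically; the simulated data, prior center, and sequence of prior variances are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: the flat-prior (improper) generalized Bayes estimator of a
# normal mean emerges as the limit of proper conjugate N(0, tau^2) priors
# as tau^2 -> infinity. Data and prior scales are illustrative assumptions.

rng = np.random.default_rng(1)
mu_true, n = 2.0, 30
x = rng.normal(mu_true, 1.0, size=n)     # X_i ~ N(mu, 1), known variance 1
xbar = x.mean()

for tau2 in [0.1, 1.0, 10.0, 1e3, 1e6]:
    # posterior mean under a proper N(0, tau2) prior
    post_mean = (tau2 / (tau2 + 1.0 / n)) * xbar
    print(f"tau^2 = {tau2:>9}: posterior mean = {post_mean:.6f}")

print(f"flat improper prior (generalized Bayes): {xbar:.6f}")
```

As tau^2 grows, the proper-prior posterior means approach the sample mean, the generalized Bayes estimate under the flat prior.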

Empirical Bayes Estimators

Empirical Bayes estimators treat the hyperparameters \lambda of a parametric prior distribution \pi(\theta \mid \lambda) as unknown quantities to be inferred from the observed data, rather than fixed a priori. This data-driven estimation of the prior distinguishes empirical Bayes from fully Bayesian methods and enables the construction of estimators that approximate optimal Bayes rules without requiring subjective prior specification. In the parametric (so-called Type II maximum likelihood) approach, \lambda is estimated by maximizing the marginal likelihood of the data; alternatively, \lambda can be estimated by simpler devices such as moment matching between the empirical and theoretical marginal distributions.

The core procedure for obtaining an empirical Bayes estimator involves first specifying a parametric prior family \pi(\theta \mid \lambda). The marginal likelihood of the data is formed as m(\mathbf{x} \mid \lambda) = \int L(\mathbf{x} \mid \theta) \, \pi(\theta \mid \lambda) \, d\theta, where L(\mathbf{x} \mid \theta) denotes the likelihood. An estimate \hat{\lambda} is then obtained by maximizing m(\mathbf{x} \mid \lambda) with respect to \lambda, often via numerical optimization or in closed form when conjugate priors are used. With \hat{\lambda} in hand, the posterior \pi(\theta \mid \mathbf{x}, \hat{\lambda}) is computed, and the estimator is typically the posterior mean \mathbb{E}[\theta \mid \mathbf{x}, \hat{\lambda}] or mode, effectively plugging the empirical prior into the standard Bayes formula. This yields a point estimate that mimics the performance of a fully Bayesian rule based on the true \lambda, and it is particularly effective when the data provide reliable information about the hyperparameters.

A landmark illustration of empirical Bayes estimation is the James-Stein estimator in the problem of estimating p \geq 3 independent normal means \theta = (\theta_1, \dots, \theta_p)^\top from observations \mathbf{X} = (X_1, \dots, X_p)^\top with X_i \sim N(\theta_i, 1) independently. Positing a hierarchical model with \theta_i \stackrel{\text{iid}}{\sim} N(0, \tau^2), the Bayes posterior mean shrinks each X_i toward the grand mean (here 0) by the factor \tau^2 / (1 + \tau^2), determined by the signal-to-noise ratio. Estimating \tau^2 empirically from the marginal distribution X_i \sim N(0, 1 + \tau^2) leads to the James-Stein rule \hat{\theta}_i^{\text{JS}} = \left(1 - \frac{p-2}{\|\mathbf{X}\|^2}\right) X_i, where the data-dependent shrinkage factor (p-2)/\|\mathbf{X}\|^2 adapts to the observed variability, pulling estimates toward 0 when \|\mathbf{X}\|^2 is small relative to p. This estimator dominates the maximum likelihood estimator \hat{\theta}_i = X_i in total mean squared error for every \theta; at \theta = 0 its risk equals 2, compared with the constant risk p of the MLE, so the relative improvement grows with the dimension. This is the content of Stein's paradox: the MLE is inadmissible in dimension p \geq 3. The empirical Bayes framing was formalized by Efron and Morris, who showed that the shrinkage arises naturally from estimating the marginal distribution of the observations.

Empirical Bayes methods bridge Bayesian and frequentist paradigms, delivering estimators with strong frequentist risk properties while incorporating prior structure learned from the data, which is especially valuable in high-dimensional problems where naive estimators such as the MLE perform poorly. They provide automatic shrinkage and stabilization, reducing variance without excessive bias, and have proven influential in applications requiring simultaneous estimation across many similar units.
Despite these strengths, empirical Bayes estimators are criticized for undermining Bayesian coherence, as the data-dependent prior \pi(\theta \mid \hat{\lambda}) conflates prior and likelihood information, potentially violating the likelihood principle and producing internally inconsistent inferences. In finite samples, the estimation of \lambda introduces additional variability that naive plug-in inference ignores, which can yield overconfident intervals or inflated type I errors, particularly if the parametric prior family is misspecified, though asymptotic consistency often mitigates these issues in large datasets.
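The empirical Bayes derivation of James-Stein-type shrinkage can be reproduced in a few lines. The sketch below estimates \tau^2 from the marginal second moment and plugs it into the Bayes posterior mean, alongside the James-Stein rule itself; the dimension, signal scale, and random seed are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of empirical Bayes shrinkage for normal means:
# X_i ~ N(theta_i, 1) with theta_i ~ N(0, tau^2); estimate tau^2 from the
# marginal X_i ~ N(0, 1 + tau^2) and plug it into the Bayes posterior mean.

rng = np.random.default_rng(2)
p, tau_true = 50, 1.5
theta = rng.normal(0.0, tau_true, size=p)
x = rng.normal(theta, 1.0)

# moment-matching estimate of tau^2 from the marginal second moment
tau2_hat = max(np.mean(x**2) - 1.0, 0.0)
eb_shrinkage = tau2_hat / (1.0 + tau2_hat)      # plug-in Bayes factor tau^2 / (1 + tau^2)
theta_eb = eb_shrinkage * x

# James-Stein uses the unbiased estimate (p - 2) / ||x||^2 of 1 / (1 + tau^2)
theta_js = (1.0 - (p - 2) / np.sum(x**2)) * x

print("MLE squared error:", np.sum((x - theta) ** 2))
print("EB  squared error:", np.sum((theta_eb - theta) ** 2))
print("JS  squared error:", np.sum((theta_js - theta) ** 2))
```

Both plug-in rules shrink the raw observations toward zero by a data-driven factor and, in this hierarchical setting, typically incur substantially less squared error than the unshrunk MLE.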

Properties

Admissibility

In decision theory, an estimator \delta is admissible if there does not exist another estimator \delta' such that R(\theta, \delta') \leq R(\theta, \delta) for all parameter values \theta, with strict inequality for at least one \theta. Admissibility thus guarantees that the estimator is not uniformly dominated by any competitor under the given loss function. Proper Bayes estimators, derived from priors with finite total mass, are generally admissible when the Bayes risk is finite, particularly under bounded loss functions where the risk remains controlled. Uniqueness of the Bayes estimator further guarantees admissibility, as any dominating estimator would contradict the risk-minimizing property of the Bayes rule.

In contrast, generalized Bayes estimators, often based on improper priors, may be inadmissible; a prominent example is the Stein phenomenon in estimating the mean of a multivariate normal distribution, where the maximum likelihood estimator (equivalent to the generalized Bayes estimator under a flat improper prior) fails to be admissible in dimensions three or higher. Stein's theorem establishes that this usual unbiased estimator of the mean vector of a multivariate normal distribution with dimension p \geq 3 is inadmissible under squared error loss. The James-Stein estimator, an empirical Bayes shrinkage rule, demonstrates this by dominating the maximum likelihood estimator, achieving lower risk across the parameter space. Admissibility conditions for Bayes estimators often exploit the structure of exponential families, where continuity or constancy of the risk function facilitates verification. Improper priors can introduce inadmissibility unless the generalized Bayes estimator can be exhibited as a limit of proper Bayes rules with suitably controlled Bayes risks; otherwise, problems like those in the multivariate normal case arise.
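Stein's phenomenon is easy to confirm by simulation. The sketch below estimates the risks of the MLE and the James-Stein rule at a few mean vectors; the dimension, the mean vectors tried, and the replication count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch illustrating Stein's phenomenon by Monte Carlo: for p >= 3 the
# James-Stein estimator has smaller total mean squared error than the MLE
# (the observation vector itself) at every mean vector tried here.

rng = np.random.default_rng(3)
p, reps = 10, 20_000

for scale in [0.0, 1.0, 3.0]:                       # distance of theta from the origin
    theta = np.full(p, scale)
    x = rng.normal(theta, 1.0, size=(reps, p))      # X ~ N_p(theta, I)
    shrink = 1.0 - (p - 2) / np.sum(x**2, axis=1)   # James-Stein shrinkage factor
    js = shrink[:, None] * x
    mse_mle = np.mean(np.sum((x - theta) ** 2, axis=1))   # risk of MLE (≈ p)
    mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
    print(f"theta components = {scale}: MLE risk ≈ {mse_mle:.2f}, JS risk ≈ {mse_js:.2f}")
```

The estimated MLE risk stays near p for every mean vector, while the James-Stein risk is smaller throughout, approaching p only as the mean moves far from the shrinkage target.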

Asymptotic Efficiency

Under regularity conditions in parametric models, Bayes estimators are asymptotically equivalent to the maximum likelihood estimator (MLE). Specifically, as the sample size n approaches infinity, the Bayes estimator and the MLE differ by terms that are negligible at the \sqrt{n} scale, and both attain the Cramér-Rao bound, achieving the minimal asymptotic variance given by the inverse Fisher information matrix, regardless of the prior distribution. This equivalence is formalized by the Bernstein-von Mises theorem, which establishes that the posterior distribution, after suitable centering and scaling, converges in total variation to a normal distribution centered at the MLE with covariance equal to the inverse Fisher information.

A key aspect of this asymptotic behavior is posterior consistency, whereby the posterior concentrates on the true parameter value \theta_0 as n \to \infty. This holds if the prior assigns positive mass to every neighborhood of \theta_0, and the rate of concentration, typically \sqrt{n} in well-behaved models, matches that of frequentist estimators such as the MLE. Seminal results confirm this concentration under mild conditions on the model and prior, ensuring that Bayesian credible intervals asymptotically match frequentist confidence intervals.

For illustration, consider estimating the success probability p in a binomial model with s successes in n trials under a uniform prior. The Bayes estimator under squared error loss is \frac{s+1}{n+2}, which is asymptotically equivalent to the MLE \frac{s}{n}. Both estimators satisfy \sqrt{n} \left( \hat{p} - p \right) \xrightarrow{d} \mathcal{N}\left(0, p(1-p)\right), demonstrating shared asymptotic normality and efficiency.

Bayes estimators also show robustness to prior misspecification in asymptotic regimes. Mildly incorrect priors preserve consistency and asymptotic normality of the posterior, and under model misspecification the resulting estimators converge to the pseudo-true parameter. However, strong misspecification can introduce persistent bias, causing departures from the MLE and potentially sacrificing efficiency.

In modern extensions beyond standard parametric settings, Bayesian methods can retain or improve these guarantees. Nonparametric Bernstein-von Mises theorems hold for Gaussian white noise models under priors such as Gaussian processes or series expansions, justifying Bayes procedures as efficient frequentist inference. In high-dimensional regimes where the dimension grows with n, certain priors allow Bayesian estimators to outperform the MLE in mean squared error across much of the parameter space.
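A quick Monte Carlo check makes the binomial illustration concrete: the gap between the Bayes estimator (s+1)/(n+2) and the MLE s/n shrinks as n grows, and the normalized error of the MLE has variance close to p(1-p). The true p, sample sizes, and replication count below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: asymptotic equivalence of the Laplace/Bayes estimator
# (s + 1) / (n + 2) and the MLE s / n, and the sqrt(n)-scale normal limit.

rng = np.random.default_rng(4)
p_true, reps = 0.3, 50_000

for n in [10, 100, 10_000]:
    s = rng.binomial(n, p_true, size=reps)
    mle = s / n
    bayes = (s + 1) / (n + 2)
    print(f"n = {n:>6}: mean |Bayes - MLE| = {np.mean(np.abs(bayes - mle)):.5f}, "
          f"var of sqrt(n)*(MLE - p) = {np.var(np.sqrt(n) * (mle - p_true)):.4f} "
          f"(theory {p_true * (1 - p_true):.4f})")
```

The printed differences decay roughly like 1/n, while the empirical variance of \sqrt{n}(\hat{p} - p) stabilizes near the theoretical value p(1-p) = 0.21.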

Examples

Binomial Distribution Estimation

A classic example of Bayes estimation arises in inferring the success probability p of a binomial distribution from n independent and identically distributed Bernoulli trials in which s successes are observed. The likelihood is given by the binomial probability mass function P(S = s \mid p) = \binom{n}{s} p^s (1-p)^{n-s} for 0 < p < 1. The Beta distribution with parameters \alpha > 0 and \beta > 0 serves as a conjugate prior for p, with density \pi(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} p^{\alpha - 1} (1-p)^{\beta - 1}, and the resulting posterior is again Beta, specifically \text{Beta}(\alpha + s, \beta + n - s).

Under squared error loss L(p, \hat{p}) = (p - \hat{p})^2, the Bayes estimator is the posterior mean \hat{p} = \frac{\alpha + s}{\alpha + \beta + n}. [https://stat210a.berkeley.edu/fall-2025/reader/bayes-estimation.html] Under absolute error loss L(p, \hat{p}) = |p - \hat{p}|, the Bayes estimator is the posterior median; for the uniform prior \alpha = \beta = 1 this median lacks a closed-form expression but is well approximated by \frac{s + 2/3}{n + 4/3} for moderate n. [https://arxiv.org/abs/1111.0433] Under 0-1 loss (L(p, \hat{p}) = 1 if \hat{p} \neq p and 0 otherwise), the Bayes estimator is the posterior mode (maximum a posteriori estimate), \hat{p} = \frac{\alpha + s - 1}{\alpha + \beta + n - 2}, provided \alpha + s > 1 and \beta + n - s > 1.

With the uniform prior \alpha = \beta = 1, the squared error Bayes estimator simplifies to \frac{s+1}{n+2}, known as Laplace's rule of succession. This contrasts with the maximum likelihood estimator \hat{p}_{\text{MLE}} = \frac{s}{n}. For small n, Laplace's estimator shrinks extreme estimates toward 0.5, trading a small bias for reduced variance; for example, with n = 1 and s = 0, the MLE yields 0 while Laplace's estimator yields 1/3. In an empirical Bayes approach with multiple independent datasets, the hyperparameters \alpha and \beta can be estimated by maximizing the marginal likelihood under the Beta-Binomial model, treating the datasets as exchangeable; this yields a data-driven prior that shrinks the individual estimates toward a common value.
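The closed-form expressions above are straightforward to evaluate. The sketch below works through the uniform-prior case for a hypothetical dataset of 3 successes in 10 trials (illustrative values), comparing the exact posterior median with the stated approximation and the MAP estimate with the MLE.

```python
from scipy.stats import beta

# Minimal sketch of the Beta-Binomial worked example under a uniform Beta(1, 1)
# prior; the data values (s = 3 successes in n = 10 trials) are illustrative.

alpha0, beta0 = 1.0, 1.0
n, s = 10, 3
a_post, b_post = alpha0 + s, beta0 + n - s        # Beta(4, 8) posterior

post_mean = a_post / (a_post + b_post)            # squared error loss: Laplace's (s + 1) / (n + 2)
post_median = beta.ppf(0.5, a_post, b_post)       # absolute error loss: exact posterior median
post_mode = (a_post - 1) / (a_post + b_post - 2)  # 0-1 loss (MAP); equals the MLE with a uniform prior
mle = s / n

print(post_mean, (s + 1) / (n + 2))               # identical values
print(post_median, (s + 2/3) / (n + 4/3))         # exact quantile vs. the approximation
print(post_mode, mle)                             # both equal 0.3 here
```

With the uniform prior the MAP estimate coincides with the MLE, while the posterior mean and median are pulled slightly toward 0.5, reflecting the shrinkage discussed above.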

Practical Applications

Bayes estimators find widespread application in signal processing, particularly through the Kalman filter, which acts as a recursive Bayes estimator for state estimation in linear dynamic systems under Gaussian noise assumptions. This enables real-time updating of system states, such as tracking positions or filtering noisy measurements in navigation systems. In machine learning, the naive Bayes classifier applies Bayes' rule to compute posterior class probabilities, assuming conditional independence of the features to simplify computation for tasks such as spam detection and text classification. Bayes estimators also underpin shrinkage techniques such as ridge regression and James-Stein estimation, where parameters are treated as draws from a common prior distribution, reducing variance in high-dimensional settings such as genomic data analysis.

In finance, Bayes estimators facilitate volatility modeling in GARCH-type frameworks by placing gamma priors on variance parameters, allowing posterior inference on time-varying risk in asset returns and improving forecast accuracy over purely frequentist fits. Medical research employs Bayes estimators in clinical trials to estimate treatment success rates, using priors derived from historical data that are updated with new trial outcomes, which improves efficiency in adaptive designs. For complex models lacking conjugate priors, modern computational tools such as MCMC sampling via Stan and PyMC enable approximate Bayes estimation by drawing posterior samples, in contrast to the closed-form solutions available in conjugate cases, and they support scalable inference in hierarchical models. A notable use case in technology is Bayesian A/B testing for web optimization, where posteriors over conversion rates are updated incrementally with user interaction data to support decisions on interface changes, as implemented in commercial experimentation platforms for faster, more reliable experimentation.
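The recursive-Bayes interpretation of the Kalman filter is easiest to see in one dimension, where each update is just a conjugate normal-normal step. The sketch below implements a scalar filter for a random-walk state; the process and measurement noise variances, the diffuse initial prior, and the simulated data are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a one-dimensional Kalman filter viewed as a recursive Bayes
# estimator: the Gaussian posterior from the previous step becomes the prior,
# and the posterior mean is the Bayes estimate under squared error loss.

rng = np.random.default_rng(5)
T, q, r = 50, 0.05, 1.0          # steps, process noise variance, measurement noise variance

# simulate a random-walk state and noisy observations
state = np.cumsum(rng.normal(0.0, np.sqrt(q), size=T))
obs = state + rng.normal(0.0, np.sqrt(r), size=T)

mean, var = 0.0, 10.0            # diffuse Gaussian prior on the initial state
estimates = []
for y in obs:
    var_pred = var + q                       # predict: prior variance for the current step
    k = var_pred / (var_pred + r)            # Kalman gain = posterior weight on the new datum
    mean = mean + k * (y - mean)             # posterior mean = Bayes estimate
    var = (1 - k) * var_pred                 # posterior variance
    estimates.append(mean)

print("final state:", state[-1], "final Bayes estimate:", estimates[-1])
```

Each loop iteration is exactly the precision-weighted normal update from the conjugate-prior section, applied sequentially as new observations arrive.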

References

  1. [1]
    [PDF] Bayes estimators - Stat@Duke
    Oct 23, 2013 · Definition (Bayes estimator). A Bayes estimator is a minimizer of the Bayes risk. The minimizer is specific to the prior π being used. We ...
  2. [2]
    23.2 - Bayesian Estimation | STAT 415 - STAT ONLINE
    A Bayesian might estimate a population parameter. The difference has to do with whether a statistician thinks of a parameter as some unknown constant or as a ...
  3. [3]
    [PDF] Chapter 12 Bayesian Inference - Statistics & Data Science
    If L(θ, θ̂) = (θ − θ̂)² then the Bayes estimator is the posterior mean. If L(θ, θ̂) = |θ − θ̂| then the Bayes estimator is the posterior median. If θ is ...
  4. [4]
    LII. An essay towards solving a problem in the doctrine of chances ...
    An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, FRS communicated by Mr. Price, in a letter to John Canton, AMFR S.
  5. [5]
    [PDF] Memoir on the probability of the causes of events - University of York
    Originally published as "Mémoire sur la probabilité des causes par les évènemens," par M. de la Place, Professeur à l'École royale militaire, in Mémoires de ...
  6. [6]
    Statistical Decision Theory and Bayesian Analysis - SpringerLink
    In this new edition the author has added substantial material on Bayesian analysis, including lengthy new sections on such important topics as empirical and ...
  7. [7]
    Theory Of Probability : Jeffreys Harold - Internet Archive
    Jan 17, 2017 · Theory of Probability; publication date: 1948 (Internet Archive).
  8. [8]
    Optimal Statistical Decisions | Wiley Online Books
    Optimal Statistical Decisions. Morris H. DeGroot. First published 16 April 2004. Print ISBN 9780471680291.
  9. [9]
    [PDF] Chapter 9 The exponential family: Conjugate priors - People @EECS
    For exponential families the likelihood is a simple standardized function of the parameter and we can define conjugate priors by mimicking the form of the ...
  10. [10]
    [PDF] 1 Priors
    In the BC1 and pre-MCMC era the conjugate priors have been extensively used (and misused) precisely because of the computational convenience. Nowadays, the ...
  11. [11]
    [PDF] Stat 710: Mathematical Statistics Lecture 1
    If Π(Θ) ≠ 1, Π is called an improper prior. δ(x) is called a generalized Bayes action. With no past information, one has to choose a prior subjectively. In ...
  12. [12]
    [PDF] Bayes and approximately Bayes procedures - Stat@Duke
    Jan 27, 2025 · Note that every Bayes estimator is generalized Bayes, but not vice versa. ... Jeffreys' prior; improper priors; invariant priors. 6.1 ...
  13. [13]
    [PDF] Math 5062: Bayesian Models
    • Π(Θ) = 1: proper prior. • Π(Θ) ≠ 1: improper prior ⇒ δ∗ is referred to as a generalized Bayes estimator. Noninformative prior is usually an improper prior.
  14. [14]
    [PDF] A Very Brief Summary of Bayesian Inference, and Examples
    This is valid for proper priors, and for improper priors if r(π) < ∞. If r(π) = ∞, we define a generalized Bayes estimator as the minimizer, for ... The ...
  15. [15]
    Two Modeling Strategies for Empirical Bayes Estimation
    Abstract. Empirical Bayes methods use the data from parallel experiments, for instance, observations X_k ∼ N(θ_k, 1) for k = 1, 2, ..., N, to estimate.
  16. [16]
    An Introduction to Empirical Bayes Data Analysis - jstor
    The empirical Bayes model is much richer than either the classical or the ordinary Bayes model and often provides superior estimates of parameters. An ...
  17. [17]
    Estimation with Quadratic Loss - Project Euclid
    Estimation with Quadratic Loss. Chapter. Author(s) W. James, Charles Stein. Editor(s) Jerzy Neyman. Berkeley Symp. on Math. Statist. and Prob., 1961: 361-379.
  18. [18]
    Stein's Estimation Rule and Its Competitors - jstor
    In the later sections we discuss rules for more complicated estimation problems, and conclude with results from empirical linear. Bayes rules in non-normal ...
  19. [19]
    Bayes, Oracle Bayes and Empirical Bayes - Project Euclid
    Abstract. This article concerns the Bayes and frequentist aspects of empirical Bayes inference. Some of the ideas explored go back to Robbins in the 1950s, ...
  20. [20]
    Learning from a lot: Empirical Bayes for high‐dimensional model ...
    4. CRITICISMS AND THEORY ON EB. Empirical Bayes comes with assumptions and hence with criticism. Of course, such criticism should be balanced against potential ...
  21. [21]
    Inadmissibility of the Usual Estimator for the Mean of a Multivariate ...
    VOL. 3.1 | 1956 Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution. Chapter Author(s) Charles Stein.
  22. [22]
    Admissibility of Linear Estimators in the One Parameter Exponential ...
    In this paper it is shown that Karlin's argument yields sufficient conditions for the admissibility of estimators of the form aX + b for estimating γ(θ) ...
  24. [24]
    On the Asymptotic Distribution of Differentiable Statistical Functions
    On the Asymptotic Distribution of Differentiable Statistical Functions. R. v. Mises. Ann. Math. Statist., September 1947.
  25. [25]
    On Bayes procedures
    Summary. A result of Doob regarding consistency of Bayes estimators is extended to a large class of Bayes decision procedures in which the loss functions ...
  26. [26]
    Asymptotically Unbiased, Efficient, and Consistent Properties of the ...
    This study will examine the characteristics of the Bayes estimator such as unbiased, minimum variance (efficiency), and consistency of the Binomial distribution ...
  27. [27]
    [PDF] Bernstein - von Mises theorem and misspecified models - arXiv
    Apr 28, 2022 · When applying Bayesian approach under model misspecification, the key question is whether Bayesian inference remains asymptotically efficient, ...
  28. [28]
    [PDF] Nonparametric Bernstein-von Mises theorems in Gaussian white noise
    It is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete nonparametric problems.
  29. [29]
    Chapter 3 The Beta-Binomial Bayesian Model - Bayes Rules!
    Via Bayes' Rule, the conjugate Beta prior combined with the Binomial data model produce a Beta posterior model for π. The updated Beta posterior ...
  30. [30]
    [PDF] bayes-binomial.pdf
    In a binomial model, y is the number of successes in n trials, and θ is the probability of success. The posterior distribution is p(θ|y) ∝ \binom{n}{y} θ^y (1 − θ)^{n−y}.
  31. [31]
    Bayes Estimation - Stat 210a
    The key to finding a Bayes estimator is to calculate the conditional distribution of θ given X , which we call the posterior. The prior will commonly be ...
  32. [32]
    [PDF] Chapter 7: Estimation - Stat@Duke
    Oct 9, 2012 · For absolute error loss: the posterior median; min_a E(L(θ, a) | x) = min_a E(|θ − a| | x). The median of θ | x minimizes this, i.e. the posterior ...
  33. [33]
    Full article: Laplace's Law of Succession Estimator and M-Statistics
    Laplace, using Bayes' (1763) formula for conditional probability, derived an estimator of the binomial probability as the posterior mean under the ...
  34. [34]
    Empirical bayes estimation of a set of binomial probabilities
    A class of prior distributions is defined to reflect exchangeability of a set of binomial probabilities. The class is indexed by the hyperparameter K, ...
  35. [35]
    [PDF] A Step by Step Mathematical Derivation and Tutorial on Kalman Filters
    Oct 8, 2019 · The Kalman filter has a Bayesian interpretation as well [7], [8] and can be derived within a Bayesian framework as a MAP estimator. The ...
  36. [36]
    [PDF] NAIVE BAYES AND LOGISTIC REGRESSION Machine Learning
    2.2 Derivation of Naive Bayes Algorithm. The Naive Bayes algorithm is a classification algorithm based on Bayes rule and a set of conditional independence ...
  37. [37]
    [PDF] James–Stein Estimation and Ridge Regression
    The James–Stein rule describes a shrinkage estimator, each MLE value x_i being shrunk by factor B̂ toward the grand mean M̂ = x̄ (7.13). (B̂ = 0.34 in (7.20) ...
  38. [38]
    Use of Bayesian Estimates to determine the Volatility Parameter ...
    To construct a Bayes estimator for the volatility of a particular stock, we can assume a prior distribution for the volatility coinciding with the gamma ...
  39. [39]
    Bayesian clinical trial design using historical data that inform ... - NIH
    We consider the problem of Bayesian sample size determination for a clinical trial in the presence of historical data that inform the treatment effect.