Fact-checked by Grok 2 weeks ago

Marginal likelihood

In Bayesian statistics, the marginal likelihood, also known as the Bayesian evidence or model evidence, is the probability of observing the data given a model, obtained by integrating the joint likelihood of the data and model parameters with respect to the prior distribution over the parameters.^[1] Mathematically, for data D and model M_k with parameters h_k, it is expressed as p(D \mid M_k) = \int p(D \mid h_k, M_k) \, p(h_k \mid M_k) \, dh_k.^[2] This integral marginalizes out the parameters, yielding a quantity that depends only on the data and the model structure, including the prior.^[1] The marginal likelihood plays a central role in Bayesian model comparison and selection, as it quantifies how well a model predicts the data while automatically incorporating a complexity penalty through the prior's influence on the integration.^[2] Specifically, the Bayes factor, which is the ratio of the marginal likelihoods of two competing models, serves as a measure of evidence in favor of one model over the other, updating prior odds to posterior odds without requiring nested models.^[3] It is essential for Bayesian model averaging, where posterior model probabilities are computed as p(M_k \mid D) = \frac{p(D \mid M_k) p(M_k)}{\sum_l p(D \mid M_l) p(M_l)}, allowing inference that accounts for model uncertainty.^[1] Applications span fields like hydrology,^[1] phylogenetics,^[4] and cosmology,^[5] where it aids in hypothesis testing and hyperparameter optimization. Computing the marginal likelihood exactly is often intractable for complex models due to the high-dimensional integral, necessitating approximations such as Laplace's method, thermodynamic integration, or Markov chain Monte Carlo (MCMC) estimators like the harmonic mean.^[1] These methods balance accuracy and computational feasibility, with thermodynamic integration noted for its consistency in environmental modeling contexts.^[1] The sensitivity to prior choices underscores its interpretive challenges, as it encodes Occam's razor by favoring parsimonious models that fit the data well.^[2]

Definition and Basics

Formal Definition

The marginal likelihood, denoted as p(y), is formally defined as the marginal probability of the observed data y, obtained by integrating the joint distribution of the data and the model parameters \theta over the parameter space:

p(y) = \int p(y, \theta) \, d\theta = \int p(y \mid \theta) p(\theta) \, d\theta,

where p(y \mid \theta) is the likelihood function and p(\theta) is the prior distribution over the parameters \theta.^[6] This integral represents the prior predictive distribution of the data, averaging the likelihood across all possible parameter values weighted by the prior.^[6] The process of marginalization in this context involves integrating out the parameters \theta, which are treated as nuisance parameters, to yield a probability for the data y that is independent of any specific parameter values.^[6] This eliminates dependence on \theta by averaging over its uncertainty as specified by the prior, resulting in a quantity that summarizes the model's predictive content for the observed data without conditioning on particular estimates of \theta.^[6] In contrast to the joint probability p(y, \theta) = p(y \mid \theta) p(\theta), which depends on both the data and parameters, the marginal likelihood p(y) marginalizes away \theta and thus provides the unconditional probability of the data under the model.^[6] This distinction underscores how marginalization transforms the parameter-dependent joint into a data-only marginal.^[6] The term "marginal likelihood" is used interchangeably with "marginal probability" or "evidence" to refer to p(y), particularly in contexts emphasizing its role as a model-level summary.^[7]

Bayesian Interpretation

In Bayesian statistics, the marginal likelihood represents the predictive probability of the observed data under a given model, obtained by averaging the likelihood over all possible parameter values weighted by their prior distribution. This integration marginalizes out the parameters, providing a coherent measure of how well the model explains the data while incorporating prior beliefs about parameter uncertainty. As such, it serves as the Bayesian evidence for the model, distinct from conditional measures that fix parameters at specific values.^[8] The marginal likelihood functions as a key indicator of model plausibility, where a higher value suggests the model offers a better overall fit to the data after accounting for the full range of parameter uncertainty encoded in the prior. Unlike point estimates, this averaging process naturally balances goodness-of-fit with the model's capacity to generalize, implicitly favoring parsimonious models that do not overfit by spreading probability mass too thinly across parameters. In model comparison, it enables direct assessment of competing hypotheses by quantifying the relative support each model receives from the data alone.^[8] In contrast to maximum likelihood estimation, which maximizes the likelihood at a single point estimate of the parameters and thus relies on plug-in predictions that ignore uncertainty, the marginal likelihood integrates over the entire parameter space. This integration implicitly penalizes model complexity through the prior's influence, as overly flexible models tend to dilute the evidence by assigning low density to the data under broad parameter ranges, whereas maximum likelihood can favor intricate models that fit noise without such restraint. Consequently, marginal likelihood promotes more robust inference in finite samples by embedding Occam's razor directly into the evaluation.^[2] Historically, the marginal likelihood emerged as a central component in Bayesian model assessment through its role in computing Bayes factors, which compare the evidence for rival models and update prior odds to posterior odds in a coherent manner. This framework, formalized in seminal work on Bayes factors, provided a principled alternative to frequentist testing for evaluating scientific theories.^[8]

Mathematical Framework

Expression in Parametric Models

In parametric statistical models, the marginal likelihood integrates out the model parameters to obtain the probability of the observed data under the model specification. Consider a parametric model M indexed by a parameter vector \theta \in \Theta, where \Theta is the parameter space. The marginal likelihood is then expressed as

p(y \mid M) = \int_{\Theta} p(y \mid \theta, M) \, \pi(\theta \mid M) \, d\theta,

where p(y \mid \theta, M) denotes the likelihood function and \pi(\theta \mid M) is the prior distribution over the parameters given the model.^[8] This formulation arises naturally in Bayesian inference as the normalizing constant for the posterior distribution.^[8] When the parameter space \Theta is discrete, the continuous integral is replaced by a summation:

p(y \mid M) = \sum_{\theta \in \Theta} p(y \mid \theta, M) \, \pi(\theta \mid M).

This discrete form is applicable in scenarios such as finite mixture models with a discrete number of components or models with categorical parameters.^[8] The dimensionality of the integral or sum corresponds directly to the dimension of \theta, reflecting the number of parameters being marginalized out; higher dimensionality increases computational demands but maintains the core structure of the expression.^[8] A concrete example illustrates this in a simple univariate normal model, where the observed data y follows y \sim \mathcal{N}(\mu, \sigma^2) with known \sigma^2, and the prior on the mean is \mu \sim \mathcal{N}(0, 1). The marginal likelihood involves the integral

p(y \mid M) = \int_{-\infty}^{\infty} \mathcal{N}(y \mid \mu, \sigma^2) \, \mathcal{N}(\mu \mid 0, 1) \, d\mu.

Completing the square in the exponent yields a closed-form normal distribution for the marginal:

p(y \mid M) = \mathcal{N}(y \mid 0, \sigma^2 + 1).

However, in the more general univariate normal model with unknown variance and a conjugate normal-inverse-gamma prior on (\mu, \sigma^2), the full marginal likelihood over both parameters results in a closed-form Student's t-distribution for y, highlighting how prior choices enable analytical tractability.

Relation to Posterior and Likelihood

In Bayesian inference, the marginal likelihood plays a central role in Bayes' theorem, which expresses the posterior distribution as
p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)},
where p(y \mid \theta) is the likelihood function, p(\theta) is the prior distribution, and p(y) is the marginal likelihood of the data y. This formulation updates prior beliefs about the parameters \theta with observed data to yield the posterior p(\theta \mid y).^[6] The marginal likelihood p(y) serves as the normalizing constant—or evidence—that ensures the posterior integrates to 1 over the parameter space, transforming the unnormalized product p(y \mid \theta) p(\theta) into a proper probability density. Computationally, it is obtained by integrating the joint density of data and parameters: p(y) = \int p(y \mid \theta) p(\theta) \, d\theta. This integration averages the likelihood across all possible parameter values weighted by the prior, providing a data-dependent measure of model plausibility independent of specific \theta.^[6] Unlike the conditional likelihood p(y \mid \theta), which conditions on fixed parameters and evaluates model fit for particular \theta, the marginal likelihood marginalizes the full likelihood over the prior distribution, incorporating parameter uncertainty from the outset. This contrasts with the profiled likelihood in frequentist settings, where nuisance parameters are eliminated by maximization rather than integration, leading to a point-estimate-based adjustment without prior weighting.^[6]^[9] The marginal likelihood thus quantifies the total predictive uncertainty in the data under the model, encompassing both the variability in the likelihood due to unknown parameters and the prior's influence on that variability, whereas the likelihood alone fixes \theta and ignores such averaging. This broader uncertainty measure supports coherent probabilistic reasoning in Bayesian frameworks.^[6]

Computation Methods

Analytical Approaches

Analytical approaches to computing the marginal likelihood rely on cases where the integral over the parameter space can be evaluated exactly in closed form, which occurs primarily under the use of conjugate priors.^[10] Conjugate priors are distributions from the same family as the posterior, allowing the marginal likelihood to be derived as a normalizing constant without numerical integration.^[11] This solvability holds when the likelihood and prior combine to yield a posterior in the same parametric family, simplifying the integration to known functions like the beta or gamma integrals.^[10] A classic example is the beta-binomial model, where the binomial likelihood for coin flips is paired with a beta prior on the success probability \theta. The marginal likelihood is then given by

p(y \mid n, \alpha, \beta) = \binom{n}{y} \frac{B(\alpha + y, \beta + n - y)}{B(\alpha, \beta)},

where B denotes the beta function, representing the integral over \theta.^[12] This closed-form expression arises directly from the conjugacy, enabling exact Bayesian inference for binary data.^[13] In Gaussian models, conjugate priors such as the normal-inverse-Wishart facilitate analytical marginal likelihoods, often resulting in a multivariate Student's t-distribution for the data. For a multivariate normal likelihood with unknown mean \mu and precision \Lambda, the marginal distribution of the data is

p(D) = \pi^{-nd/2} \frac{\Gamma_d(\nu_n/2)}{\Gamma_d(\nu_0/2)} \frac{|\Lambda_0|^{\nu_0/2}}{|\Lambda_n|^{\nu_n/2}} \left( \frac{\kappa_0}{\kappa_n} \right)^{d/2},

where \nu_n = \nu_0 + n, \kappa_n = \kappa_0 + n, \Lambda_n = \Lambda_0 + S + \frac{\kappa_0 n}{\kappa_n} (\bar{x} - \mu_0)(\bar{x} - \mu_0)^T , S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T , d is the dimensionality, and n is the number of observations.^[10] This highlights the tractability in linear Gaussian settings. However, such analytical solutions are rare, particularly in high-dimensional settings or with non-conjugate priors, where the parameter integral becomes intractable and necessitates numerical methods.^[14] These limitations stem from the exponential growth in integration complexity as dimensionality increases, restricting exact computations to low-dimensional or specially structured models.^[15]

Numerical and Approximation Techniques

When exact analytical computation of the marginal likelihood is infeasible, such as in non-conjugate models with high-dimensional parameter spaces, numerical and approximation techniques become essential for estimation. These methods leverage sampling or asymptotic expansions to approximate the integral \int p(y \mid \theta) p(\theta) \, d\theta, balancing computational feasibility with accuracy. Monte Carlo-based approaches, in particular, provide unbiased or consistent estimators but often require careful tuning to manage variance, while deterministic methods offer faster but potentially biased approximations suitable for large-scale applications. Monte Carlo methods, including importance sampling, form a foundational class of estimators for the marginal likelihood. The importance sampling estimator approximates the integral as \hat{p}(y) \approx \frac{1}{N} \sum_{i=1}^N \frac{p(y \mid \theta_i) p(\theta_i)}{q(\theta_i)}, where \{\theta_i\}_{i=1}^N are samples drawn from a proposal distribution q(\theta) chosen to approximate the posterior p(\theta \mid y). This self-normalized form ensures consistency under mild conditions on q, though high variance arises if q poorly covers the posterior support, necessitating techniques like adaptive proposals or multiple importance sampling to improve efficiency in complex models.^[16] Markov Chain Monte Carlo (MCMC) methods extend these ideas by generating dependent samples from the posterior to estimate the marginal likelihood without direct prior sampling. Bridge sampling, for instance, uses samples from both the prior and posterior to compute ratios via an optimal bridge function, yielding \hat{p}(y) = \frac{1}{N} \sum_{i=1}^N \frac{p(y \mid \theta_i^\text{prior}) p(\theta_i^\text{prior})}{g(\theta_i^\text{prior} \mid y)} \cdot \frac{1}{M} \sum_{j=1}^M \frac{g(\theta_j^\text{post} \mid y)}{p(y \mid \theta_j^\text{post}) p(\theta_j^\text{post})}, where g minimizes estimation variance; this approach is particularly robust for multimodal posteriors. The harmonic mean estimator, another posterior-based method, approximates p(y) as the reciprocal of the average inverse likelihood over posterior samples, \hat{p}(y) \approx \left( \frac{1}{N} \sum_{i=1}^N \frac{1}{p(y \mid \theta_i)} \right)^{-1}, but suffers from infinite variance in heavy-tailed cases, prompting stabilized variants.^[17]^[18] Deterministic approximations, such as the Laplace method, provide closed-form estimates by exploiting local behavior around the posterior mode. This technique approximates the log-posterior as quadratic via its Hessian matrix H at the mode \theta^*, leading to p(y) \approx p(y \mid \theta^*) p(\theta^*) (2\pi)^{d/2} |H|^{-1/2}, where d is the parameter dimension; the approximation improves asymptotically as n \to \infty for large samples but can bias results in small-data or highly nonlinear settings. It is computationally efficient, requiring only optimization and Hessian evaluation, and serves as a building block for higher-order corrections in moderate dimensions.^[19] Variational inference offers a scalable lower bound on the log-marginal likelihood through optimization, framing estimation as minimizing the Kullback-Leibler divergence between a tractable variational distribution q(\theta) and the true posterior. The evidence lower bound (ELBO) states \log p(y) \geq \mathbb{E}_q [\log p(y, \theta)] - \mathbb{E}_q [\log q(\theta)], maximized by adjusting q (often mean-field or structured forms) via stochastic gradients; this bound is tight when q \approx p(\theta \mid y) and enables fast inference in massive datasets, though it underestimates the true value and requires careful family selection to avoid loose bounds.^[20] Recent advancements, including those post-2020, have enhanced these techniques for scalability in deep learning models, where parameter spaces exceed millions of dimensions. Annealed importance sampling (AIS) refines importance sampling by introducing intermediate distributions bridging prior and posterior, with differentiable variants enabling end-to-end optimization of annealing schedules for tighter estimates in generative models.^[21]^[22] Sequential Monte Carlo (SMC) methods, propagating particles through annealed sequences with resampling, have been adapted for deep Bayesian networks, achieving unbiased marginal likelihoods via thermodynamic integration while integrating neural proposals for efficiency in high-dimensional tasks like variational autoencoders.^[23] More recent methods, such as generalized stepping-stone sampling (2024), improve efficiency in specific domains like pulsar timing analysis.^[24] These developments prioritize variance reduction and GPU acceleration, facilitating model comparison in neural architectures.

Applications in Statistics

Model Comparison and Selection

One key application of the marginal likelihood in Bayesian statistics is model comparison, where it serves as the basis for the Bayes factor, a measure of the relative evidence provided by the data for two competing models. The Bayes factor BF_{12} comparing model M_1 to model M_2 is defined as the ratio of their marginal likelihoods:
BF_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)},
where y denotes the observed data.^[25] This ratio quantifies how much more likely the data are under M_1 than under M_2, after integrating out model parameters via their priors, thereby providing a coherent framework for hypothesis testing and model selection without relying on arbitrary significance thresholds.^[25] For nested models, where M_1 is a special case of M_2 (e.g., by imposing parameter restrictions), the Savage-Dickey density ratio offers a convenient way to compute the Bayes factor using posterior and prior densities evaluated at the boundary values of the restricted parameters. Specifically, the Bayes factor is given by
BF_{12} = \frac{p(\theta \mid y, M_2)}{\pi(\theta \mid M_2)},
evaluated at the values of \theta (the restricted parameter) that define the nesting boundary, under the encompassing model M_2, assuming the prior under M_2 matches the prior under M_1 for the common parameters. This approach simplifies computation by avoiding full marginal likelihood estimation for both models, though it requires careful prior specification to ensure validity.^[26] The marginal likelihood inherently implements Occam's razor in model selection by favoring parsimonious models that adequately fit the data, as more complex models must allocate prior probability mass across larger parameter spaces, effectively penalizing overparameterization unless the data strongly support the added complexity.^[25] In practice, Bayes factors are interpreted using guidelines such as those proposed by Kass and Raftery, where values between 1 and 3 provide barely worth mentioning evidence, 3 to 20 indicate positive evidence, 20 to 150 strong evidence, and greater than 150 very strong evidence in favor of the numerator model (e.g., BF_{12} > 10 suggests substantial support for M_1).^[25] However, Bayes factors exhibit sensitivity to the choice of priors, which can lead to varying conclusions if priors are not chosen judiciously, and their computation becomes prohibitively expensive for high-dimensional or large-scale models, often necessitating approximations like those from MCMC methods.^[25]

Inference in Hierarchical Models

In Bayesian hierarchical models, the marginal likelihood facilitates inference by marginalizing over lower-level parameters to focus on hyperparameters. Consider a setup with observed data y, group-specific parameters \theta, and global hyperparameters \psi; the marginal likelihood is expressed as

p(y \mid \psi) = \int p(y \mid \theta, \psi) \, p(\theta \mid \psi) \, d\theta.

This integral integrates out the variability in \theta across groups, yielding a likelihood for \psi that captures the overall structure induced by the hierarchy. Such marginalization enables posterior inference on \psi, such as via Markov chain Monte Carlo (MCMC) on the reduced parameter space, which is particularly valuable in models with many groups where full joint inference would be computationally prohibitive. The marginal likelihood also underpins posterior predictive checks in hierarchical settings, where it serves as the normalizing constant for the posterior distribution. The posterior predictive distribution for new data y^* is

p(y^* \mid y) = \int p(y^* \mid \theta) \, p(\theta \mid y) \, d\theta,

with p(\theta \mid y) \propto p(y \mid \theta) p(\theta) and the marginal p(y) ensuring proper normalization. This framework allows assessment of model fit by simulating replicated datasets from the hierarchy and comparing them to observed y, accounting for parameter uncertainty across levels. In empirical Bayes methods, the marginal likelihood is maximized to estimate hyperparameters directly from the data, as in \hat{\psi} = \arg\max_{\psi} p(y \mid \psi). This approach yields data-driven priors that shrink individual estimates toward a global mean, improving stability in sparse or high-dimensional scenarios. In genomics, for instance, empirical Bayes using marginal maximization has enabled efficient inference for thousands of gene expressions in microarray experiments, providing posterior means with reduced variance compared to independent analyses. Relative to full Bayesian MCMC over all parameters and hyperparameters, this method often reduces effective dimensionality, accelerating inference while maintaining shrinkage benefits, though it approximates the full posterior. Despite these advantages, computing the marginal likelihood in multi-level hierarchical models poses significant challenges due to the need to evaluate high-dimensional integrals, which grow intractable with added hierarchy depth. In such cases, variational methods approximate the posterior by optimizing a lower bound on the marginal, enabling scalable inference in complex structures like topic models or phylogenetic trees. When exact computation fails, numerical approximation techniques from broader computational frameworks are typically invoked to estimate these integrals reliably.

Historical Development

Origins in Bayesian Theory

The concept of marginal likelihood traces its roots to the foundational principles of Bayesian inference articulated by Thomas Bayes in his posthumously published 1763 essay, "An Essay towards solving a Problem in the Doctrine of Chances." In this work, Bayes explored inverse probability, seeking to determine the probability of a cause given observed effects, and implicitly relied on marginalization over possible parameter values to normalize the posterior distribution. This approach integrated prior beliefs with data to yield updated probabilities, laying the groundwork for the marginal likelihood as the predictive probability of the data under a model, though Bayes did not explicitly term it as such. The explicit formulation of marginal likelihood as a measure of evidence emerged in Harold Jeffreys' seminal 1939 book, Theory of Probability. Jeffreys advanced Bayesian hypothesis testing by defining the marginal likelihood—the integral of the likelihood function over the prior distribution—as the evidential support for a hypothesis, independent of specific parameter values. This formulation enabled objective comparisons between competing scientific theories, using non-informative priors to avoid subjective bias, and positioned the marginal likelihood as central to inductive reasoning in statistics.^[27] In the mid-20th century, the 1950s and 1960s saw further development through the introduction of Bayes factors, which directly incorporate marginal likelihoods to quantify relative model support. I.J. Good, in his 1950 monograph Probability and the Weighing of Evidence, coined the term "Bayes factor" as the ratio of marginal likelihoods under alternative hypotheses, providing a scale for weighing evidential strength in decision-making. Building on this, D.V. Lindley's 1957 paper "A Statistical Paradox" demonstrated the use of marginal probabilities for Bayesian model choice, illustrating how they could lead to counterintuitive results when comparing nested models and emphasizing their role in resolving ambiguities in hypothesis testing. Concurrently, A.W.F. Edwards contributed to this era's neo-Bayesian revival by integrating marginal likelihood concepts with likelihood-based inference, as explored in his later writings on the foundations of statistical evidence. Prior to 1980, the practical application of marginal likelihood remained constrained by computational limitations, confining its use largely to analytically tractable cases involving conjugate priors, where closed-form expressions for the marginal could be derived. This focus on conjugate families, such as the beta-binomial or normal-normal models, allowed exact computation without numerical integration, but restricted broader adoption in complex, non-conjugate scenarios.^[28]

Key Advancements and Modern Usage

The advent of Markov chain Monte Carlo (MCMC) methods in the 1980s and 1990s revolutionized Bayesian computation, enabling the estimation of marginal likelihoods in complex models where analytical integration was infeasible. A pivotal advancement was Chib's 1995 method, which computes the exact marginal likelihood using posterior samples from Gibbs sampling by rearranging Bayes' theorem to express it as the ratio of the posterior ordinate to the prior and likelihood evaluated at a high-density point. This approach, applicable to models with data augmentation, provided a reliable estimator without requiring additional sampling, facilitating model comparison in hierarchical settings.^[29] In the 2000s, asymptotic approximations gained prominence for their computational efficiency in large datasets, bridging Bayesian and frequentist paradigms. The Bayesian Information Criterion (BIC), originally proposed by Schwarz in 1978, serves as a consistent estimator of the log marginal likelihood, approximating it as \log p(y \mid M) \approx \ell_{\max} - \frac{k}{2} \log n, where \ell_{\max} is the maximized log-likelihood, k is the number of parameters, and n is the sample size. This approximation, derived under regularity conditions for large n, links marginal likelihood estimation to penalization criteria like AIC, promoting its use in high-dimensional model selection despite its frequentist origins.^[30] The 2010s and 2020s witnessed the integration of marginal likelihood estimation into machine learning frameworks, enhancing hyperparameter optimization and uncertainty quantification. In Gaussian processes (GPs), the marginal likelihood—often maximized via its log form—guides kernel and noise parameter selection, balancing data fit and model complexity as detailed in foundational treatments of GP regression.^[31] For Bayesian neural networks, the marginal likelihood, or evidence, underpins variational approximations to assess model generalization, with techniques like the evidence lower bound (ELBO) providing scalable proxies amid the rise of deep learning.^[2] As of 2025, marginal likelihood computation plays a central role in scalable Bayesian deep learning, particularly through black-box variational inference (BBVI), which optimizes the ELBO to approximate the evidence in high-dimensional models without model-specific derivations. However, ongoing debates highlight its sensitivity to prior choices in large datasets, where even weakly informative priors can disproportionately influence estimates, prompting robustness analyses and alternative criteria like the widely applicable information criterion (WAIC). Influential software packages have democratized these methods: Stan implements bridge sampling for MCMC-based marginal likelihood estimation, while PyMC supports integrated nested Laplace approximations and sequential Monte Carlo for automatic computation in probabilistic programming workflows.^[32]

References

[1]
[PDF] Evaluating marginal likelihood with thermodynamic integration ...
The marginal likelihood, also called integrated likelihood or Bayesian evidence, measures overall model fit, i.e., to what extent that the data, D, can be ...
[2]
[PDF] Bayesian Model Selection, the Marginal Likelihood, and ...
The marginal likelihood, or Bayesian evidence, is the probability of generating observations from a prior, used for hypothesis testing and model selection.
[3]
[PDF] Lecture 6: Bayesian Hypothesis Testing Continued - Stat@Duke
Sep 14, 2021 · Bayes factor: is a ratio of marginal likelihoods and it provides a weight of evidence in the data in favor of one model over another.
[4]
[PDF] Bayesian Data Analysis Third edition (with errors fixed as of 20 ...
... Bayesian Data Analysis. Third edition. (with errors fixed as of 20 February 2025) ... marginal likelihood has a particularly simple form. The marginal ...
[5]
On the marginal likelihood and cross-validation - Oxford Academic
Summary. In Bayesian statistics, the marginal likelihood, also known as the evidence, is used to evaluate model fit as it quantifies the joint probability.
[6]
[PDF] Bayes Factors - Robert E. Kass; Adrian E. Raftery
Oct 14, 2003 · Aitkin (1991) described so-called "posterior Bayes factors" in which the marginal likelihood pr(DH) defined by. Equation (2), which is the prior ...
[7]
[PDF] Marginal vs. Profile Likelihood for systematic uncertainties
May 12, 2011 · Marginal likelihood uses Bayesian averaging to eliminate uncertainty in a parameter, while profile likelihood treats it as a free parameter ...
[8]
[PDF] Conjugate Bayesian analysis of the Gaussian distribution
Oct 3, 2007 · The use of conjugate priors allows all the results to be derived in closed form.
[9]
[PDF] A Compendium of Conjugate Priors - Applied Mathematics Consulting
For each conjugate prior relationship the explicit form of the conjugate prior, the likelihood function, and the relevant sufficient statistics are presented.
[10]
6 Inferring a Binomial Probability via Exact Mathematical Analysis
This chapter presents an example of how to do Bayesian inference using pure analytical mathematics without any approximations.
[11]
Chapter 3 The Beta-Binomial Bayesian Model - Bayes Rules!
We can reflect this reality and conduct a more nuanced Bayesian analysis by constructing a continuous prior probability model of π. A reasonable prior is ...
[12]
Estimating the Marginal Likelihood Using the Arithmetic Mean Identity
Abstract. In this paper we propose a conceptually straightforward method to estimate the marginal data density value (also called the marginal likelihood).
[13]
19 Dubious Ways to Compute the Marginal Likelihood of a ...
We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets.<|control11|><|separator|>
[14]
[PDF] The Development of Bayesian Statistics - Columbia University
Jan 13, 2022 · Bayes' theorem is a mathematical identity of conditional probability, and applied Bayesian inference dates back to Laplace in the late 1700s ...
[15]
Importance Sampling Schemes for Evidence Approximation in ...
Abstract. The marginal likelihood is a central tool for drawing Bayesian infer- ence about the number of components in mixture models.Missing: seminal | Show results with:seminal
[16]
[PDF] Simulating Normalizing Constants: From Importance Sampling to ...
We consider two methods of computing the poste- rior density, or marginal likelihood, or normalizing constant, as a function of 0: (i) the usual approach of ...
[17]
[PDF] Approximate Bayesian Inference with the Weighted Likelihood ...
Oct 14, 2003 · Thus, the marginal likelihood may be estimated by the harmonic mean of the likelihoods of a sample from the posterior distribution. This is true ...
[18]
[PDF] Accurate Approximations for Posterior Moments and Marginal ...
Apr 18, 2003 · Laplace's method can also be used to approximate marginal posterior ... Tierney and Kadane: Accurate Approximations for Posterior Moments.
[19]
[PDF] An Introduction to Variational Methods for Graphical Models
Abstract. This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks ...
[20]
[PDF] Annealed Importance Sampling - arXiv
Sep 1, 1998 · Annealed importance sampling uses Markov chain transitions for an annealing sequence, inspired by simulated annealing to handle isolated modes, ...
[21]
[PDF] Differentiable Annealed Importance Sampling and the ... - NIPS papers
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable ...Missing: post- | Show results with:post-
[22]
[PDF] Adversarial Sequential Monte Carlo - Bayesian Deep Learning
Sequential Monte Carlo, or particle filtering, is a popular class of methods for Bayesian inference that approximate an intractable target distribution by ...
[23]
[PDF] Bayes Factors Robert E. Kass; Adrian E. Raftery Journal of the ...
Mar 5, 2007 · This is the paper 'Bayes Factors' by Robert E. Kass and Adrian E. Raftery, published in the Journal of the American Statistical Association, ...
[24]
[PDF] A tutorial on the Savage–Dickey method - Eric-Jan Wagenmakers
Jan 12, 2010 · The Savage-Dickey method is a method to compute Bayesian hypothesis tests for nested models, using a density ratio, and is used to carry out ...
[25]
[PDF] Harold Jeffreys's Theory of Probability Revisited - arXiv
Probability (1939) has had a unique impact on the Bayesian commu- nity and is now considered to be one of the main classics in Bayesian. Statistics as well ...
[26]
Marginal Likelihood From the Metropolis–Hastings Output
This article provides a framework for estimating the marginal likelihood for the purpose of Bayesian model comparisons.
[27]
The Bayesian information criterion: background, derivation, and ...
Dec 14, 2011 · The Bayesian information criterion (BIC), introduced by Schwarz,1 is derived to serve as an asymptotic approximation to a transformation of the ...
[28]
[PDF] Gaussian Processes for Machine Learning
... Marginal Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 29 ... Gaussian Processes for Machine Learning presents one of the most important.
[29]
Bayes Factors and Marginal Likelihood — PyMC example gallery
Jan 15, 2023 · The “Bayesian way” to compare models is to compute the marginal likelihood of each model, ie the probability of the observed data y given the M k model.