
Posterior predictive distribution

In Bayesian statistics, the posterior predictive distribution is the probability distribution of unobserved or future data given the observed data, obtained by integrating the likelihood of the new data over the posterior distribution of the model parameters. Formally, it is defined as p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) p(\theta \mid y) \, d\theta, where y represents the observed data, \tilde{y} denotes the future or replicated data, and \theta are the model parameters, assuming conditional independence between observed and future data given the parameters. This distribution incorporates both the uncertainty in parameter estimates from the posterior and the inherent stochasticity in the data-generating process, resulting in predictions that are typically more variable than those based solely on point estimates of the parameters. The posterior predictive distribution serves as a foundational tool in Bayesian inference for two primary purposes: model checking and prediction. In model checking, it enables the simulation of replicated datasets y^{\text{rep}} from the fitted model, which are then compared to the observed data using test statistics T(y, \theta) to assess fit; discrepancies, quantified via posterior predictive p-values, can indicate model inadequacies such as outliers or systematic biases. For prediction, it provides probabilistic predictions for new observations by averaging over the posterior, as seen in applications such as Bayesian linear regression, where the predictive distribution follows a multivariate Student-t distribution with location \tilde{X} \hat{\beta}, a scale matrix involving the posterior variance, and n - k degrees of freedom. This approach extends to hierarchical models, such as those used in clinical trials, where it accounts for multilevel variability and supports sensitivity analyses across different parameterizations. Overall, the posterior predictive distribution bridges prior knowledge, observed evidence, and future observations, making it indispensable for robust prediction and inference across diverse fields ranging from the social sciences to environmental modeling.

Fundamentals

Definition and Bayesian Context

In Bayesian inference, the prior distribution \pi(\theta) encodes initial beliefs about the unknown parameters \theta before observing any data. The likelihood p(y \mid \theta) then quantifies the probability of the observed data y given those parameters. Updating these with the data yields the posterior distribution \pi(\theta \mid y) \propto \pi(\theta) p(y \mid \theta), which represents the refined beliefs about \theta after incorporating the evidence from y. The posterior predictive distribution extends this framework to forecast unobserved future data \tilde{y} conditional on the observed data y. It is formally defined as p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) \, \pi(\theta \mid y) \, d\theta, where the integral averages the conditional distribution of new data over the entire posterior uncertainty in \theta. This approach inherently accounts for parameter variability, providing a full probabilistic prediction rather than a point estimate. The motivation for the posterior predictive distribution lies in its ability to generate predictions that reflect both model structure and epistemic uncertainty, enabling robust assessments of future observations without assuming fixed parameters. By marginalizing over \theta, it avoids overconfidence in any single parameter value and supports decision-making under uncertainty. The concept traces its roots to Pierre-Simon Laplace's 18th-century work on inverse probability, where he first explored updating probabilities based on data to infer causes and predict outcomes. It was formalized within modern Bayesian statistics during the 20th century, notably in foundational treatments that emphasized predictive inference as a core application of the posterior.
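In practice the integral is often approximated by composition sampling: draw \theta from the posterior, then draw \tilde{y} from the likelihood at that draw. The following minimal sketch illustrates this for a normal likelihood with known variance and an assumed normal posterior; the numerical values are illustrative assumptions, not taken from any particular source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration of the two-step recipe "draw theta from the
# posterior, then draw y_tilde from the likelihood". We assume a normal
# likelihood with known sigma and a conjugate normal posterior N(mu_n, tau_n^2)
# whose summaries are simply taken as given (illustrative values).
mu_n, tau_n, sigma = 1.8, 0.3, 1.0

theta_draws = rng.normal(mu_n, tau_n, size=100_000)   # theta ~ p(theta | y)
y_tilde = rng.normal(theta_draws, sigma)              # y_tilde ~ p(y_tilde | theta)

# The empirical spread reflects both parameter and sampling uncertainty:
print(y_tilde.var(), sigma**2 + tau_n**2)             # close to the analytic value
```

Because the draws mix sampling noise with posterior uncertainty about \theta, their empirical variance approaches \sigma^2 + \tau_n^2 rather than \sigma^2 alone.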

Prior Predictive Distribution

The prior predictive distribution, denoted p(\tilde{y}), is the marginal distribution of a new data point \tilde{y} obtained by integrating the likelihood over the prior distribution of the parameters \theta: p(\tilde{y}) = \int p(\tilde{y} \mid \theta) \pi(\theta) \, d\theta. This formulation arises from marginalizing the joint distribution p(\tilde{y}, \theta) = p(\tilde{y} \mid \theta) \pi(\theta) with respect to \theta, yielding the unconditional distribution of the data under the prior alone. This distribution encodes the researcher's beliefs about possible outcomes before any observations are made, serving as a tool for eliciting and validating prior specifications in Bayesian models. It allows practitioners to simulate hypothetical datasets from the prior to assess whether the implied data-generating process aligns with domain knowledge or expected variability. A simple example occurs in the conjugate case of a normal likelihood with a normal prior. Suppose the prior is \theta \sim \mathcal{N}(\mu_0, \tau_0^2) and the likelihood for a new observation is \tilde{y} \mid \theta \sim \mathcal{N}(\theta, \sigma^2) with known variance \sigma^2. The prior predictive then simplifies to a closed-form normal distribution: \tilde{y} \sim \mathcal{N}(\mu_0, \sigma^2 + \tau_0^2), reflecting the combined uncertainty from the prior and sampling variances. This result highlights how conjugacy facilitates analytical tractability for prior predictions.
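The closed form \tilde{y} \sim \mathcal{N}(\mu_0, \sigma^2 + \tau_0^2) can be checked by forward simulation from the prior; the sketch below assumes illustrative hyperparameter values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal sketch of prior predictive simulation under the conjugate
# normal-normal model from the text (all numbers are illustrative).
mu0, tau0, sigma = 0.0, 2.0, 1.0      # prior mean/sd and known sampling sd

theta = rng.normal(mu0, tau0, size=50_000)   # theta ~ N(mu0, tau0^2)
y_new = rng.normal(theta, sigma)             # y_new | theta ~ N(theta, sigma^2)

# Simulated draws should match the closed form N(mu0, sigma^2 + tau0^2):
print(y_new.mean(), y_new.var())             # approx. 0.0 and 5.0
```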

Posterior Predictive Distribution

The posterior predictive distribution describes the conditional probability of future or unobserved data \tilde{y} given observed data y, obtained by marginalizing over the posterior distribution of the model parameters \theta. It is formally expressed as p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) \, p(\theta \mid y) \, d\theta, where the posterior p(\theta \mid y) = \frac{\pi(\theta) p(y \mid \theta)}{m(y)} with prior \pi(\theta) and marginal likelihood m(y) = \int \pi(\theta) p(y \mid \theta) \, d\theta. This formulation arises naturally in Bayesian inference as a way to generate predictions that fully incorporate updated beliefs about \theta after observing y. A key property of the posterior predictive distribution is its integration of parameter uncertainty, which accounts for both the variability in the data-generating process and the remaining doubt about \theta after conditioning on y; consequently, it yields predictions that are generally wider than those derived from plug-in likelihood estimates using point estimates of \theta. Under standard regularity conditions and a correctly specified model, as the amount of observed data grows, the posterior predictive distribution asymptotically approximates the true distribution of future observations, providing a bridge to frequentist interpretations in large samples. Computing the posterior predictive distribution involves evaluating the integral, which admits closed-form solutions in conjugate models but becomes intractable in non-conjugate or high-dimensional settings, often requiring numerical approximations such as Markov chain Monte Carlo (MCMC) methods to sample from the posterior and then simulate \tilde{y} from the conditional likelihood. As a basic illustrative example, consider a binomial likelihood for the number of successes y in n trials with unknown success probability \theta, paired with a conjugate Beta(\alpha, \beta) prior; the resulting posterior is Beta(\alpha + y, \beta + n - y), and the posterior predictive distribution for the number of successes \tilde{y} in m future trials follows a beta-binomial distribution with parameters \alpha + y, \beta + n - y, and m. This example highlights how the posterior predictive smooths the binomial probabilities, reflecting overdispersion due to uncertainty in \theta.
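The beta-binomial posterior predictive and its overdispersion relative to a plug-in binomial can be illustrated with SciPy's betabinom distribution; the counts and hyperparameters below are illustrative assumptions.

```python
from scipy.stats import betabinom, binom

# Sketch of the beta-binomial posterior predictive described above
# (all numbers are illustrative). Observed: y successes in n trials.
alpha, beta_, n, y = 2, 2, 20, 14
a_post, b_post = alpha + y, beta_ + n - y    # Beta posterior parameters

m = 10                                        # future trials
post_pred = betabinom(m, a_post, b_post)      # p(y_tilde | y)

# Compare with a "plug-in" binomial using the posterior mean of theta:
theta_hat = a_post / (a_post + b_post)
plug_in = binom(m, theta_hat)

print(post_pred.var(), plug_in.var())         # the posterior predictive is wider
```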

Comparisons and Implications

Differences from Prior Predictive Distribution

The prior predictive distribution, denoted as p(\tilde{y}) = \int p(\tilde{y} \mid \omega) p(\omega) \, d\omega, encapsulates the uncertainty in future observations \tilde{y} arising from both the prior beliefs about parameters \omega and the inherent stochasticity in the data-generating process, without any conditioning on observed data y. In contrast, the posterior predictive distribution, p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \omega) p(\omega \mid y) \, d\omega, conditions predictions on the observed data, thereby incorporating the evidence to update parameter uncertainty via the posterior p(\omega \mid y). This fundamental distinction means the prior predictive reflects a broader range of plausible outcomes driven solely by subjective prior beliefs, while the posterior predictive "shrinks" toward the observed data, leveraging learning from y to refine expectations. The posterior predictive can be understood as a conditional form of the prior predictive, expressed through the relationship p(\tilde{y} \mid y) = \frac{p(\tilde{y}, y)}{p(y)}, where the joint distribution p(\tilde{y}, y) = \int p(\tilde{y} \mid \omega) p(y \mid \omega) p(\omega) \, d\omega links the two, and p(y) is the marginal likelihood serving as a normalizing constant. This updating process transforms the unconditional prior predictive into a data-informed version, effectively averaging the likelihood over the posterior rather than the prior. A key implication for prediction is that the posterior predictive distribution typically exhibits lower predictive variance compared to its prior counterpart, due to the concentration of the posterior around values consistent with the data, which reduces the spread induced by parameter uncertainty. For instance, in a normal linear model with unknown mean and known variance, the prior predictive variance includes the full prior variance of the mean plus the data variance, whereas the posterior predictive variance substitutes the smaller posterior variance of the mean, leading to tighter intervals for future predictions. To illustrate these differences visually, consider a simple univariate normal model where the parameter \mu follows a broad normal prior (e.g., mean 0, variance 10). The prior predictive density for a new observation \tilde{y} is a normal distribution centered at 0 with high variance (prior variance plus observation variance), appearing as a wide, flat curve. After observing data y (e.g., a sample mean of 2), the posterior predictive density shifts its center toward 2 and narrows significantly, reflecting reduced uncertainty and a more peaked distribution that aligns closely with the data. Such plots highlight how conditioning on observations updates the predictive distribution from diffuse prior expectations to more precise, evidence-based forecasts. This contrast underscores the posterior predictive's role in model checking, where replicated data from it are compared to observed y to assess fit.
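The univariate illustration above can be reproduced in a few lines using the standard conjugate update for a normal mean with known variance; the sample size and hyperparameters in the sketch below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the univariate normal illustration (numbers illustrative):
# prior mu ~ N(0, 10), known observation variance sigma^2 = 1,
# observed sample of size n with mean ybar = 2.
mu0, tau0_sq, sigma_sq = 0.0, 10.0, 1.0
n, ybar = 20, 2.0

# Conjugate posterior for mu:
tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
mu_n = tau_n_sq * (mu0 / tau0_sq + n * ybar / sigma_sq)

prior_pred = norm(mu0, np.sqrt(sigma_sq + tau0_sq))    # wide, centred at 0
post_pred = norm(mu_n, np.sqrt(sigma_sq + tau_n_sq))   # narrow, centred near 2

print(prior_pred.std(), post_pred.std())               # e.g. ~3.32 vs ~1.02
```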

Role in Model Checking and Selection

Posterior predictive checks (PPCs) provide a Bayesian approach to model evaluation by simulating replicated data sets \tilde{y} from the posterior predictive distribution p(\tilde{y} | y) and comparing them to the observed data y using a test statistic T(y), such as means, variances, or tail probabilities, to assess overall model fit. This method integrates over the posterior distribution of parameters \theta, p(\tilde{y} | y) = \int p(\tilde{y} | \theta) p(\theta | y) \, d\theta, allowing researchers to identify systematic discrepancies that indicate model misspecification, such as inadequate capture of data variability or dependence structures. For instance, the posterior predictive p-value, defined as \Pr[T(\tilde{y}) \geq T(y) \mid y], quantifies the probability of observing a discrepancy at least as extreme as the actual data under the fitted model; values near 0 or 1 suggest poor fit. In Bayesian model selection, the posterior predictive distribution contributes to criteria that evaluate predictive performance across competing models, favoring those with higher expected log predictive densities. Methods like the widely applicable information criterion (WAIC) and leave-one-out cross-validation (LOO) approximate the out-of-sample predictive accuracy \mathbb{E}[\log p(\tilde{y} \mid y)], where WAIC decomposes into the log pointwise predictive density minus a variance penalty, and LOO uses importance sampling on posterior draws to estimate leave-one-out predictive densities without refitting the model. These metrics enable ranking of models by their ability to predict new data, with higher values indicating better generalization; for example, in comparing hierarchical models, WAIC or LOO can select the structure that balances fit and complexity more effectively than in-sample likelihoods. A practical example arises in Bayesian linear regression, where PPCs simulate \tilde{y} from the posterior predictive under a normal error model and examine whether the resulting residuals or 95% predictive intervals align with observed patterns, such as uniform coverage or no heteroscedasticity in the discrepancies. If the observed residuals fall outside the distribution of simulated ones, it signals issues like omitted variables or incorrect error assumptions. Despite their utility, PPCs exhibit limitations, including sensitivity to prior specifications, where informative priors can distort the posterior predictive distribution and lead to misleading fit assessments, particularly in low-data regimes. Additionally, generating sufficient replicates for reliable comparisons incurs high computational cost in complex models, often requiring MCMC simulation and potentially thousands of posterior draws, which can be prohibitive without approximations.
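A minimal posterior predictive check might proceed as in the sketch below, which uses the sample variance as the test statistic for a normal model with assumed known unit variance; the data, posterior draws, and hyperparameters are all illustrative assumptions rather than any specific published example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of a posterior predictive check for a normal model with unknown mean
# and assumed known sd = 1, using the sample variance as the test statistic T(y).
# The posterior draws for theta would normally come from MCMC or a conjugate
# update; here they are generated directly for illustration.
y = rng.normal(0.0, 1.5, size=50)             # observed data (illustrative)
theta_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)

T_obs = y.var(ddof=1)
T_rep = np.empty(len(theta_draws))
for s, theta in enumerate(theta_draws):
    y_rep = rng.normal(theta, 1.0, size=len(y))   # replicate under the model
    T_rep[s] = y_rep.var(ddof=1)

ppp = np.mean(T_rep >= T_obs)   # posterior predictive p-value
print(ppp)                      # near 0 here: the model understates the variability
```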

Formulation in Exponential Families

Prior Predictive in Exponential Families

The exponential family provides a unifying framework for many common probability distributions, parameterized in the form
p(y \mid \theta) = h(y) \exp\left\{ \eta(\theta)^\top T(y) - A(\theta) \right\},
where \eta(\theta) is the natural parameter, T(y) is the sufficient statistic, h(y) is the base measure, and A(\theta) is the log-normalizer ensuring integrability. This parameterization facilitates analytical tractability in Bayesian inference, particularly when paired with conjugate priors that preserve the family structure upon updating.
The prior predictive distribution for a new observation \tilde{y} under an exponential-family likelihood integrates over the prior \pi(\theta):
p(\tilde{y}) = \int h(\tilde{y}) \exp\left\{ \eta(\theta)^\top T(\tilde{y}) - A(\theta) \right\} \pi(\theta) \, d\theta.
This often simplifies to a closed form when using conjugate priors, which are chosen to match the exponential-family structure, such as a normal prior for a normal likelihood with known variance. In the normal-normal case, the prior predictive distribution is again a normal distribution, reflecting the marginalization over the uncertain mean parameter.
A prominent closed-form example arises in the Poisson distribution, an exponential family member with natural parameter \eta(\theta) = \log \theta and sufficient statistic T(y) = y, where a gamma prior on the rate \theta \sim \Gamma(\alpha, \beta) yields a negative binomial prior predictive distribution for \tilde{y}:
p(\tilde{y}) = \frac{\Gamma(\tilde{y} + \alpha)}{\tilde{y}! \Gamma(\alpha)} \left( \frac{\beta}{1 + \beta} \right)^\alpha \left( \frac{1}{1 + \beta} \right)^{\tilde{y}}.
This distribution has mean \alpha / \beta and variance \alpha (1 + \beta) / \beta^2, exceeding the Poisson variance (which equals its mean) because the prior uncertainty in the rate contributes additional spread.
The prior predictive in exponential families typically exhibits heavier tails than the conditional likelihood p(\tilde{y} \mid \theta), as the integration over uncertainty in \theta introduces additional variability, broadening the predictive distribution and enhancing robustness to model misspecification.
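The gamma-Poisson prior predictive above coincides with a negative binomial distribution, which can be verified numerically; the sketch below assumes illustrative hyperparameters and uses SciPy's nbinom, gamma, and poisson distributions.

```python
import numpy as np
from scipy.stats import nbinom, gamma, poisson

rng = np.random.default_rng(3)

# Sketch of the gamma-Poisson prior predictive (illustrative hyperparameters):
# theta ~ Gamma(alpha, rate=beta), y_tilde | theta ~ Poisson(theta).
alpha, beta = 3.0, 1.5

# Closed form: negative binomial with size alpha and success prob beta / (1 + beta)
pred = nbinom(alpha, beta / (1.0 + beta))

# Monte Carlo check by simulating the two-stage process:
theta = gamma(alpha, scale=1.0 / beta).rvs(200_000, random_state=rng)
y_sim = poisson(theta).rvs(random_state=rng)

print(pred.mean(), y_sim.mean())   # both ~ alpha / beta = 2.0
print(pred.var(), y_sim.var())     # both ~ alpha (1 + beta) / beta^2 ~ 3.33
```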

Posterior Predictive in Exponential Families

In exponential families, the use of conjugate priors enables closed-form expressions for the posterior predictive distribution, facilitating exact inference without relying on approximation methods. The likelihood for observed data y = (y_1, \dots, y_N) takes the form p(y | \eta) = \prod_{i=1}^N h(y_i) \exp\{ \eta^T T(y_i) - A(\eta) \}, where \eta is the natural parameter, T(y) is the sufficient statistic, A(\eta) is the log-partition function, and h(y) is the base measure. A conjugate prior for \eta is given by p(\eta | \nu, n) = H(\nu, n) \exp\{ \nu^T \eta - n A(\eta) \}, where H(\nu, n) is the normalizing constant, \nu encodes prior sufficient statistics, and n reflects a prior sample size. Upon observing the data, the posterior updates straightforwardly to p(\eta | y, \nu, n) = H(\nu', n') \exp\{ {\nu'}^T \eta - n' A(\eta) \}, with updated hyperparameters \nu' = \nu + \sum_{i=1}^N T(y_i) and n' = n + N. This preservation of the conjugate family form allows the posterior predictive distribution for a new observation \tilde{y} to be derived as p(\tilde{y} | y) = \int p(\tilde{y} | \eta) p(\eta | y) \, d\eta = h(\tilde{y}) \frac{H(\nu' + T(\tilde{y}), n' + 1)}{H(\nu', n')}. This expression yields analytically tractable distributions specific to the exponential family member, such as the beta-binomial for binomial likelihoods with beta priors or the Student-t for normal likelihoods with appropriate conjugate priors on the mean and variance. A prominent example is the Bernoulli likelihood with a beta prior. For N independent trials with success probability \theta, the likelihood is a product of Bernoulli terms, and the conjugate prior \theta \sim \text{Beta}(\alpha, \beta) (where \nu = (\alpha - 1, \beta - 1)^T in the natural parameterization) updates to a posterior \theta \mid y \sim \text{Beta}(\alpha', \beta'), with \alpha' = \alpha + \sum y_i and \beta' = \beta + N - \sum y_i. The resulting posterior predictive for a new set of M trials is the beta-binomial distribution: p(\tilde{y} | y) = \binom{M}{\tilde{y}} \frac{B(\alpha' + \tilde{y}, \beta' + M - \tilde{y})}{B(\alpha', \beta')}, which accounts for both data variability and parameter uncertainty. Another key case involves the normal likelihood with a conjugate prior on the mean and precision. For observations y_i \sim \mathcal{N}(\mu, \sigma^2) with known \sigma^2, a normal prior \mu \sim \mathcal{N}(\mu_0, \sigma_0^2) leads to a normal posterior predictive. However, when incorporating uncertainty in the variance via a normal-inverse-gamma prior (conjugate for the mean and precision), the posterior predictive distribution for a new observation is a Student-t: \tilde{y} | y \sim t_{\nu_{\text{post}}}(\mu_{\text{post}}, \sigma_{\text{post}}^2), where the degrees of freedom \nu_{\text{post}}, location \mu_{\text{post}}, and scale \sigma_{\text{post}}^2 update based on the data and prior hyperparameters, and the variance of this distribution is \sigma_{\text{post}}^2 \frac{\nu_{\text{post}}}{\nu_{\text{post}} - 2}. This heavier-tailed predictive reflects epistemic uncertainty in both parameters. The primary advantages of this framework lie in its computational tractability: the integrals for the posterior and the predictive are exact and avoid simulation-based methods like MCMC, enabling efficient inference even for moderate datasets. This exactness is particularly valuable in hierarchical models or when multiple predictions are needed, as it provides closed-form predictive distributions without approximation error.
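The Student-t case can be made concrete with the standard normal-inverse-gamma conjugate updates; the sketch below uses illustrative data and hyperparameters, and the hyperparameter names (mu0, kappa0, a0, b0) follow a common textbook convention rather than any specific source.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Sketch of the normal model with a conjugate normal-inverse-gamma prior
# (mu and sigma^2 both unknown); data and hyperparameters are illustrative.
mu0, kappa0, a0, b0 = 0.0, 1.0, 2.0, 2.0
y = rng.normal(1.0, 2.0, size=30)
n, ybar = len(y), y.mean()

# Standard conjugate updates for the normal-inverse-gamma prior:
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
a_n = a0 + n / 2.0
b_n = b0 + 0.5 * np.sum((y - ybar) ** 2) \
      + kappa0 * n * (ybar - mu0) ** 2 / (2.0 * kappa_n)

# Posterior predictive for one new observation: Student-t
df = 2.0 * a_n
scale = np.sqrt(b_n * (kappa_n + 1.0) / (a_n * kappa_n))
post_pred = stats.t(df, loc=mu_n, scale=scale)

print(post_pred.mean(), post_pred.std())   # heavier-tailed than a plug-in normal
```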

Joint Predictive Distribution and Marginal Likelihood

In Bayesian inference within exponential families equipped with conjugate priors, the joint predictive distribution for observed data y = (y_1, \dots, y_N) and future data \tilde{y} is given by
p(\tilde{y}, y) = \int p(\tilde{y} \mid \theta) p(y \mid \theta) \pi(\theta) \, d\theta,
where \pi(\theta) is the prior density on the parameter \theta. This integral represents the marginal probability of the combined data (y, \tilde{y}).
For likelihoods in the exponential family form p(y_i \mid \eta) = h(y_i) \exp(\eta T(y_i) - A(\eta)), where T(y_i) is the sufficient statistic, \eta the natural parameter, h the base measure, and A the log-normalizer, the conjugate prior takes the form p(\eta) = H(\tau, n_0) \exp(\tau \cdot \eta - n_0 A(\eta)), with hyperparameters \tau and n_0, and H the normalizing constant. In this setup, the joint predictive admits a closed-form expression via updated sufficient statistics:
p(\tilde{y}, y) = \left[ \prod_{i=1}^N h(y_i) \right] h(\tilde{y}) \frac{H(\tau + T(y) + T(\tilde{y}), n_0 + N + 1)}{H(\tau, n_0)},
where T(y) = \sum_{i=1}^N T(y_i) and T(\tilde{y}) is the sufficient statistic for the future data. This form arises because the joint treats (y, \tilde{y}) as a single augmented sample from the exponential family, updating the prior hyperparameters accordingly.
The marginal likelihood, or evidence, for the observed data is
m(y) = p(y) = \int p(y \mid \theta) \pi(\theta) \, d\theta = \left[ \prod_{i=1}^N h(y_i) \right] \frac{H(\tau + T(y), n_0 + N)}{H(\tau, n_0)},
which is computed as the ratio of normalizing constants from the posterior to the prior, reflecting the change in sufficient statistics induced by the data.
The posterior predictive distribution relates directly to these quantities as p(\tilde{y} \mid y) = p(\tilde{y}, y) / m(y), which simplifies to
p(\tilde{y} \mid y) = h(\tilde{y}) \frac{H(\tau + T(y) + T(\tilde{y}), n_0 + N + 1)}{H(\tau + T(y), n_0 + N)}
in the conjugate case, facilitating closed-form, ratio-based updates in Bayesian inference.
A canonical example occurs in the gamma-Poisson model, where the likelihood is Poisson with rate \theta and the prior is Gamma(\alpha, \beta), a conjugate pair for the exponential-family representation of the Poisson. Here, the joint predictive for the observed counts y (with sum s = \sum y_i) and a future count \tilde{y} retains the gamma-Poisson closed form, and the implied posterior predictive for \tilde{y} is negative binomial with updated shape \alpha + s and a probability parameter determined by the total exposure \beta + N, demonstrating how the conjugate form extends the marginal likelihood to the augmented data.
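The ratio identity p(\tilde{y} \mid y) = p(\tilde{y}, y) / m(y) can be verified numerically for the gamma-Poisson model; the sketch below implements the closed-form log marginal likelihood and compares the ratio with the negative binomial posterior predictive, using illustrative counts and hyperparameters.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

# Sketch of the ratio identity p(y_tilde | y) = p(y_tilde, y) / m(y)
# for the gamma-Poisson model (illustrative hyperparameters and data).
alpha, beta = 2.0, 1.0
y = np.array([3, 1, 4, 2, 2])
N, s = len(y), y.sum()

def log_marginal(counts, a, b):
    """log m(y) for i.i.d. Poisson counts with a Gamma(a, rate=b) prior."""
    n, total = len(counts), counts.sum()
    return (-np.sum(gammaln(counts + 1))
            + gammaln(a + total) - gammaln(a)
            + a * np.log(b) - (a + total) * np.log(b + n))

y_tilde = 3
log_joint = log_marginal(np.append(y, y_tilde), alpha, beta)   # p(y_tilde, y)
log_m = log_marginal(y, alpha, beta)                            # m(y)

ratio = np.exp(log_joint - log_m)
closed_form = nbinom(alpha + s, (beta + N) / (beta + N + 1)).pmf(y_tilde)
print(ratio, closed_form)   # equal: the ratio is the posterior predictive pmf
```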

Relation to Gibbs Sampling

Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm that approximates the posterior distribution \pi(\theta \mid y) by iteratively drawing samples from the full conditional distributions of the parameters given the observed data y and the current values of the other parameters. For a model with parameters \theta = (\theta_1, \dots, \theta_p), the process initializes values for all \theta_j and then cycles through updates: at each iteration s, sample \theta_j^{(s)} \sim p(\theta_j \mid \theta_{-j}^{(s-1)}, y) for j = 1, \dots, p, where \theta_{-j} denotes all parameters except \theta_j; after a burn-in period, the samples \{\theta^{(s)}\}_{s=1}^S from the chain converge in distribution to draws from the joint posterior. This method exploits conditional independencies in the model to generate dependent samples that marginally approximate the target posterior without requiring the full joint density. To compute the posterior predictive distribution p(\tilde{y} \mid y), which integrates the likelihood over the posterior, \int p(\tilde{y} \mid \theta) \pi(\theta \mid y) \, d\theta, Gibbs sampling provides an empirical approximation via Monte Carlo. After obtaining posterior samples \{\theta^{(s)}\}_{s=1}^S from the chain, generate replicated data by drawing \tilde{y}^{(s)} \sim p(\tilde{y} \mid \theta^{(s)}) for each s, then estimate the predictive density or its moments as the empirical average, such as p(\tilde{y} \mid y) \approx \frac{1}{S} \sum_{s=1}^S p(\tilde{y} \mid \theta^{(s)}). This nested sampling approach, first drawing from the posterior via Gibbs updates and then from the conditional predictive, yields a sample \{\tilde{y}^{(s)}\}_{s=1}^S whose distribution approximates the true posterior predictive, enabling summaries like density estimates or discrepancy measures for model assessment. In non-conjugate settings, where the posterior lacks a closed form and direct integration over high-dimensional \theta is infeasible, Gibbs sampling offers a robust alternative by relying only on tractable full conditionals rather than the intractable joint. It scales to complex, high-dimensional models by iteratively updating blocks of parameters, avoiding the curse of dimensionality in marginalization, and converges to the correct posterior under mild conditions, though practical diagnostics like trace plots are used to ensure chain mixing. For instance, in a hierarchical Bayesian model such as a Gaussian mixture with unknown means, variances, and mixing proportions, Gibbs sampling cycles through latent cluster assignments and parameter updates: sample assignments z_i conditional on the current means \mu_k, mixing proportions \pi_k, and data x_i, then update each \mu_k conditional on its assigned points; posterior samples of \{\mu_k, \sigma^2, \pi\} then generate predictive replicates by sampling \tilde{z} \sim \text{Categorical}(\{\pi_k^{(s)}\}) and \tilde{x} \mid \tilde{z} \sim \mathcal{N}(\mu_{\tilde{z}}^{(s)}, \sigma^{2(s)}) for comparison against held-out data. This process, often run for thousands of iterations after burn-in, facilitates posterior predictive checks in multilevel structures where conjugate updates fail.
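The two-stage recipe, Gibbs draws for the parameters followed by one predictive draw per retained iteration, is illustrated by the sketch below for a normal model with semi-conjugate priors on the mean and variance; the data, hyperparameters, and chain length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Minimal Gibbs sampler sketch for y_i ~ N(mu, sigma^2) with semi-conjugate
# priors mu ~ N(mu0, tau0^2) and sigma^2 ~ Inv-Gamma(a0, b0); all values
# below are illustrative, not taken from the text.
y = rng.normal(2.0, 1.5, size=40)
n, ybar = len(y), y.mean()
mu0, tau0_sq, a0, b0 = 0.0, 100.0, 2.0, 2.0

S, burn = 5000, 1000
mu, sig2 = ybar, y.var()
y_rep = []

for s in range(S):
    # mu | sigma^2, y  (normal full conditional)
    v = 1.0 / (1.0 / tau0_sq + n / sig2)
    m = v * (mu0 / tau0_sq + n * ybar / sig2)
    mu = rng.normal(m, np.sqrt(v))
    # sigma^2 | mu, y  (inverse-gamma full conditional)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sig2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)
    if s >= burn:
        # one posterior predictive replicate per retained draw
        y_rep.append(rng.normal(mu, np.sqrt(sig2)))

y_rep = np.array(y_rep)
print(y_rep.mean(), y_rep.std())   # summaries of the approximate posterior predictive
```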

Connection to Predictive Inference Techniques

The posterior predictive distribution can be approximated using variational inference, which optimizes the evidence lower bound (ELBO) to obtain a tractable variational posterior, enabling fast computation of predictive estimates at the expense of potential bias from the approximating family. This approach directly targets the posterior predictive in some formulations, learning an amortized approximation that improves predictive calibration over standard posterior-focused variational methods. The Laplace approximation provides an alternative by fitting a Gaussian around the mode of the log-posterior, yielding an asymptotically accurate approximation to the posterior that facilitates closed-form or efficient numerical evaluation of the predictive integral, especially suitable for high-dimensional models with large datasets where posterior normality applies. In latent Gaussian models, integrated nested Laplace approximations extend this to compute posterior marginals and predictives deterministically without sampling, offering scalability for complex spatial and temporal data. Importance sampling estimates the posterior predictive by drawing samples from a proposal distribution, such as the prior predictive, and reweighting them with importance ratios to match the target posterior, reducing variance when the proposal is well-chosen. This method proves particularly useful in low signal-to-noise regimes for posterior predictives, where optimized proposals mitigate estimation difficulties compared to direct averaging. The posterior predictive connects to cross-validation through leave-one-out (LOO) procedures, where it approximates the expected predictive density under leave-one-out posteriors, enabling efficient model assessment without repeated full model refits via Pareto-smoothed importance sampling. Such approximations leverage posterior samples to compute LOO expectations, providing asymptotically equivalent predictive checks to exact cross-validation while remaining fully Bayesian.
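As a simple illustration of the importance-sampling idea, the self-normalized estimator below reweights prior draws by the likelihood to approximate the posterior predictive in a beta-binomial setting where the exact answer is available for comparison; all numbers are illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom, betabinom

rng = np.random.default_rng(6)

# Sketch of self-normalized importance sampling for the posterior predictive,
# using the prior as the proposal (illustrative beta-binomial setting).
alpha, beta = 2.0, 2.0
n, y = 25, 17          # observed successes in n trials
m = 10                 # future trials

theta = rng.beta(alpha, beta, size=200_000)   # draws from the prior proposal
w = binom(n, theta).pmf(y)                    # importance weights (likelihood)

y_tilde = np.arange(m + 1)
# Weighted average of p(y_tilde | theta) over prior draws approximates the
# posterior expectation, i.e. the posterior predictive pmf.
is_est = np.array([np.sum(w * binom(m, theta).pmf(k)) / np.sum(w) for k in y_tilde])
exact = betabinom(m, alpha + y, beta + n - y).pmf(y_tilde)

print(np.max(np.abs(is_est - exact)))   # small discrepancy due to Monte Carlo error
```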

    May 30, 2024 · Predictive posterior densities (PPDs) are of interest in approximate Bayesian inference. Typically, these are estimated by simple Monte Carlo (MC) averages.