
Conjugate prior

In Bayesian probability theory, a conjugate prior is defined as a prior distribution such that, when combined with a given likelihood function, the resulting posterior distribution belongs to the same family as the prior, enabling straightforward analytical updates without numerical integration. This property simplifies Bayesian inference by preserving the distributional form, allowing the hyperparameters of the prior to be adjusted based on observed data as if incorporating pseudo-observations. The concept of conjugate priors was introduced by Howard Raiffa and Robert Schlaifer in their 1961 book Applied Statistical Decision Theory, where it was formalized to facilitate decision-making under uncertainty in Bayesian frameworks. Their work emphasized "natural conjugate" priors tailored to specific likelihoods, particularly within exponential families of distributions, which dominate modern applications due to their mathematical tractability. Over time, conjugate priors have become a cornerstone of Bayesian statistics, especially in scenarios requiring closed-form solutions, though they are sometimes critiqued for limiting flexibility compared to non-conjugate alternatives.

Conjugate priors offer key advantages, including computational efficiency and intuitive parameterization (such as interpreting hyperparameters as prior sample sizes or means), which aids in eliciting prior beliefs from experts. Common examples include the beta distribution as a conjugate prior for the Bernoulli or binomial likelihood (for modeling success probabilities), the gamma distribution for the Poisson likelihood (for rates), and the normal distribution for the normal likelihood with known variance (for means). For multivariate cases or unknown variances, extensions such as the inverse-Wishart or normal-inverse-gamma priors are frequently used in hierarchical models.

Fundamentals

Definition

In Bayesian inference, a conjugate prior refers to a family of prior distributions such that, when updated with observed data via the likelihood function, the resulting posterior belongs to the same parametric family as the prior. This property ensures that the updating process preserves the distributional form, facilitating analytical tractability in probabilistic modeling. The concept of conjugacy was introduced by Howard Raiffa and Robert Schlaifer in their 1961 book Applied Statistical Decision Theory, where it emerged as a tool within decision-theoretic frameworks to streamline prior-to-posterior transitions. This historical development emphasized conjugacy's utility in applied settings, particularly for decision-making under uncertainty. Conjugate priors simplify Bayesian updating by yielding a closed-form expression for the posterior distribution, thereby circumventing the need for numerical integration or approximation techniques to compute the marginal likelihood. In contrast to non-conjugate priors, which often necessitate more computationally intensive methods such as Markov chain Monte Carlo (MCMC) for posterior inference, conjugacy provides an algebraic convenience without being essential for conducting valid Bayesian analysis.

Mathematical Formulation

In Bayesian inference, the posterior distribution of the parameter \theta given data x is formally defined as \pi(\theta \mid x) = \frac{L(\theta \mid x) \pi(\theta)}{m(x)}, where L(\theta \mid x) denotes the likelihood function, \pi(\theta) is the prior density, and m(x) = \int L(\theta \mid x) \pi(\theta) \, d\theta is the marginal likelihood that serves as the normalizing constant. This formulation arises directly from Bayes' theorem and ensures that the posterior is a proper probability distribution.

A conjugate prior is characterized by the property that, if the prior \pi(\theta) belongs to a specific parametric family \Pi, then the posterior \pi(\theta \mid x) also belongs to the same family \Pi, typically with updated hyperparameters that reflect the incorporation of the observed data. This preservation of functional form simplifies computation, as the normalizing constant m(x) can often be evaluated analytically within the family.

For likelihood functions belonging to the exponential family, the conjugacy mechanism can be derived explicitly. The likelihood takes the form L(\theta \mid x) \propto \exp\left\{ \eta(\theta)^\top t(x) - A(\eta(\theta)) \right\}, where \eta(\theta) is the natural parameter, t(x) is the sufficient statistic, and A(\cdot) is the log-normalizer. The conjugate prior is then chosen to mimic this structure: \pi(\theta) \propto \exp\left\{ \eta(\theta)^\top \tau - \nu A(\eta(\theta)) \right\}, with hyperparameters \tau representing prior pseudo-sufficient statistics and \nu a prior sample size parameter. The posterior follows by substitution into the general Bayes update. Assuming x consists of n i.i.d. observations with total sufficient statistic t(x) = \sum_{i=1}^n t(x_i), \pi(\theta \mid x) \propto L(\theta \mid x) \pi(\theta) \propto \exp\left\{ \eta(\theta)^\top (\tau + t(x)) - (\nu + n) A(\eta(\theta)) \right\}. This demonstrates how conjugacy preserves the exponential family form, with updated hyperparameters \tau' = \tau + t(x) and \nu' = \nu + n, while the marginal likelihood m(x) is obtained by integrating over \theta and adjusting for the change in the normalizing constant induced by the updated parameters. The role of the normalizing constant in the prior and likelihood ensures that the posterior remains properly normalized without requiring separate computation in many cases.
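To make the update concrete, the following short Python sketch (an illustrative verification, not part of the original formulation) computes the posterior for a beta prior and binomial likelihood in two ways: by direct numerical application of Bayes' theorem on a grid, and by the closed-form conjugate update of the hyperparameters. The specific numbers (a Beta(2, 2) prior and 7 successes in 10 trials) are the ones reused in the Basic Example below.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothetical setup: Beta(2, 2) prior on theta, binomial likelihood
# with s = 7 successes out of n = 10 trials.
a0, b0, n, s = 2.0, 2.0, 10, 7
theta = np.linspace(1e-6, 1 - 1e-6, 20001)

prior = beta.pdf(theta, a0, b0)
likelihood = binom.pmf(s, n, theta)

# Generic Bayes update on a grid: posterior = likelihood * prior / m(x),
# approximating the marginal likelihood m(x) by a Riemann sum.
unnorm = likelihood * prior
m_x = np.sum(unnorm) * (theta[1] - theta[0])
posterior_grid = unnorm / m_x

# Conjugacy gives the same posterior in closed form: Beta(a0 + s, b0 + n - s).
posterior_exact = beta.pdf(theta, a0 + s, b0 + n - s)
print(np.max(np.abs(posterior_grid - posterior_exact)))  # ~0 up to numerical error
```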

Interpretations

Pseudo-observations Interpretation

The pseudo-observations interpretation of conjugate priors conceptualizes the hyperparameters as counts derived from fictitious or imaginary data points that encapsulate prior beliefs about the model parameters. This analogy transforms the abstract prior into a more tangible, data-like representation, facilitating an intuitive understanding of how prior information influences posterior inference. In this view, the conjugate prior acts as if it were based on a hypothetical dataset, which is then augmented by the actual observed data during Bayesian updating.

A prominent example arises in the pairing of a Beta(α, β) prior with a binomial likelihood, where α and β are the shape parameters. Here, α − 1 can be interpreted as the number of pseudo-successes and β − 1 as the number of pseudo-failures from a hypothetical prior sample. This framing aligns the prior with an equivalent set of imaginary trials that reflect the anticipated behavior of the success probability before any real data are encountered. Bayesian updating under this conjugate pair proceeds by incorporating the real data into the pseudo-observations: if the observed data consist of s successes and f failures, the posterior distribution becomes Beta(α + s, β + f). This update rule effectively combines the virtual prior data with the actual observations, yielding a posterior that reflects both sources of information in a weighted manner. The process reinforces the data-like nature of the prior, as the hyperparameters are simply accumulated alongside the empirical counts.

This interpretation offers significant advantages for building intuition in Bayesian analysis, particularly by enabling practitioners to conceptualize the prior as a form of "data-driven" shrinkage. The posterior estimates are pulled toward the prior mean in proportion to the relative strengths of the pseudo-sample (governed by α + β) and the real sample size, mimicking how frequentist shrinkage estimators pull individual estimates toward a common central value. Such a framing aids in eliciting and communicating priors, as domain experts can specify hyperparameters by analogy to past or simulated data experiences.

Despite its intuitive appeal, the pseudo-observations analogy has limitations, as these virtual data points are not genuine observations and serve only as a metaphor for the prior's influence. A common pitfall occurs when the prior strength, quantified by α + β, is directly equated to an equivalent real sample size without adjustment; in reality, the effective prior sample size is often closer to α + β − 2 for certain measures of informativeness, potentially leading to over- or under-estimation of the prior's impact on the posterior. This misinterpretation can distort assessments of model robustness, especially in scenarios where the prior's assumed data-generating process diverges from the actual likelihood.
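The following minimal Python sketch (with hypothetical hyperparameters and counts chosen for illustration) makes the pseudo-observation bookkeeping explicit: the prior hyperparameters are simply added to the observed counts, and the posterior mean comes out as the pseudo-sample-weighted average described above.

```python
def beta_binomial_update(alpha, beta_, successes, failures):
    """Fold real counts into the Beta 'pseudo-data': posterior Beta(alpha + s, beta + f)."""
    return alpha + successes, beta_ + failures

# Hypothetical prior worth alpha + beta = 10 pseudo-trials, centred on 0.3.
alpha0, beta0 = 3.0, 7.0
s, f = 12, 8                              # observed: 12 successes, 8 failures

a_n, b_n = beta_binomial_update(alpha0, beta0, s, f)   # Beta(15, 15)

prior_mean = alpha0 / (alpha0 + beta0)                 # 0.30
sample_prop = s / (s + f)                              # 0.60
post_mean = a_n / (a_n + b_n)                          # 0.50

# Shrinkage view: posterior mean = weighted average of prior mean and data,
# with weights proportional to the pseudo-sample size and the real sample size.
w_prior = (alpha0 + beta0) / (alpha0 + beta0 + s + f)  # 10 / 30
print(post_mean, w_prior * prior_mean + (1 - w_prior) * sample_prop)  # both 0.5
```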

Dynamical Systems Interpretation

The conjugate updating process in Bayesian inference can be interpreted as a discrete dynamical system, where the prior represents the initial state, and each new data observation acts as an input that drives a transition to the posterior state. This transition preserves the distributional family due to conjugacy, ensuring that the posterior remains within the same form as the prior. Such a view highlights the iterative nature of Bayesian updating, analogous to state transitions in time-discrete systems, where the "state" is the set of hyperparameters characterizing the belief distribution.

This dynamical perspective connects directly to recursive filtering techniques, extending the classical Kalman filter, which relies on Gaussian conjugacy for exact linear updates in state-space models, to non-Gaussian settings. In these analogs, conjugacy facilitates closed-form posterior updates without approximation, enabling efficient sequential inference even under nonlinear or multimodal dynamics. For instance, in tracking applications, the prior-to-posterior map serves as a filter step that incorporates measurement likelihoods while maintaining tractable computations.

Mathematically, within the exponential family framework, the state can be represented by a vector of hyperparameters, often the natural parameters \eta, which evolve linearly upon observing new data. Specifically, the posterior natural parameter is given by \eta' = \eta + T(x), where T(x) denotes the sufficient statistics extracted from the data x. This additive update rule embodies the linear dynamics of the system, with the hyperparameters serving as the evolving state vector that accumulates evidence over sequential observations. In time-series analysis, this interpretation underpins sequential Bayesian estimation methods, such as dynamic generalized linear models, where conjugacy ensures computational tractability across multiple time steps. Each update incorporates incoming data while propagating the updated hyperparameter state forward, making the approach well suited to real-time filtering and adaptive modeling in evolving systems such as financial series or sensor networks.
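A small Python sketch (with made-up hyperparameters and data) illustrates this state-space view for the gamma-Poisson pair: the hyperparameter vector is the state, and each observation triggers the additive transition \eta' = \eta + T(x).

```python
import numpy as np

# State = gamma hyperparameters (shape, rate) for a Poisson rate lambda.
# Each observation x drives the additive transition
#   shape <- shape + x,   rate <- rate + 1,
# which is the eta' = eta + T(x) update in the exponential-family view.
state = np.array([2.0, 1.0])            # hypothetical Gamma(2, 1) prior

def filter_step(state, x):
    """One conjugate 'filter step': fold a single count into the state."""
    return state + np.array([float(x), 1.0])

stream = [3, 1, 4, 2, 2]                # hypothetical sequence of counts
for x in stream:
    state = filter_step(state, x)
    shape, rate = state
    print(f"after x={x}: posterior mean of lambda = {shape / rate:.3f}")
```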

Examples

Basic Example

A foundational illustration of conjugacy involves estimating the success probability \theta in a sequence of independent Bernoulli trials, where the prior on \theta is a beta distribution, \theta \sim \text{Beta}(\alpha, \beta), with shape parameters \alpha > 0 and \beta > 0. This setup is conjugate because the likelihood follows a binomial distribution: given n trials with s successes, the likelihood is p(\text{data} \mid \theta) \propto \theta^s (1 - \theta)^{n - s}, and the resulting posterior remains a beta distribution, specifically \theta \mid \text{data} \sim \text{Beta}(\alpha + s, \beta + n - s).

To see the updating process step by step, consider initial parameters \alpha = 2 and \beta = 2, yielding a prior mean of \mu_{\text{prior}} = \frac{\alpha}{\alpha + \beta} = 0.5, which reflects symmetry around equal success probability, and a prior variance of \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = \frac{1}{20} = 0.05. Suppose the data consist of n = 10 trials with s = 7 successes, so the sample proportion is \hat{\theta} = 0.7. The posterior parameters become \alpha' = 2 + 7 = 9 and \beta' = 2 + 10 - 7 = 5, giving a posterior mean of \mu_{\text{post}} = \frac{9}{9 + 5} \approx 0.643, which lies between the prior mean (0.5) and the sample proportion (0.7). This posterior mean can be expressed as a weighted average: \mu_{\text{post}} = \left( \frac{\alpha + \beta}{\alpha + \beta + n} \right) \mu_{\text{prior}} + \left( \frac{n}{\alpha + \beta + n} \right) \hat{\theta}, where the prior's effective sample size \alpha + \beta = 4 receives weight 4/14 ≈ 0.286 and the data receive weight 10/14 ≈ 0.714, pulling the estimate toward the observed successes. The posterior variance shrinks to \frac{9 \times 5}{(14)^2 \times 15} \approx 0.015, a reduction of about 69% from the prior variance, demonstrating how additional data concentrate the belief around the updated estimate.

In visualization, the prior beta density is U-shaped for small \alpha and \beta (e.g., \alpha = \beta = 0.5), spreading uncertainty across [0, 1]; after observing data, the posterior shifts and narrows, with the mode at \frac{\alpha + s - 1}{\alpha + \beta + n - 2} aligning closer to \hat{\theta} as n grows, while conjugacy ensures the form stays beta for straightforward computation without numerical integration. This ease arises from the parameter addition rule, where the prior "pseudo-counts" of \alpha successes and \beta failures simply add to the observed s and n - s. This Bernoulli-binomial-beta example serves as an archetype for modeling discrete binary outcomes, such as coin flips or pass/fail events, highlighting how conjugacy facilitates intuitive updates by treating the prior as additional data, a principle formalized in early Bayesian decision theory.
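The numbers in this example can be reproduced with a few lines of Python using scipy (an illustrative check, with the same \alpha = \beta = 2 prior and 7 successes in 10 trials):

```python
from scipy.stats import beta

a0, b0 = 2, 2          # prior Beta(2, 2)
n, s = 10, 7           # data: 7 successes in 10 trials

prior = beta(a0, b0)
post = beta(a0 + s, b0 + n - s)        # posterior Beta(9, 5)

print(prior.mean(), prior.var())        # 0.5, 0.05
print(post.mean(), post.var())          # ~0.643, ~0.015

# Posterior mean as a weighted average of prior mean and sample proportion.
w = (a0 + b0) / (a0 + b0 + n)           # 4/14
print(w * prior.mean() + (1 - w) * s / n)   # ~0.643

# 95% equal-tailed credible interval for theta.
print(post.interval(0.95))
```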

Practical Example

A practical application of conjugate priors involves modeling the daily number of visits to a website, which follows a Poisson distribution with rate parameter \lambda representing the expected number of visits per day. The Gamma distribution is the conjugate prior for \lambda under this likelihood, enabling straightforward Bayesian updating. Consider a prior distribution \lambda \sim \text{Gamma}(\alpha = 2, \beta = 1) in the shape-rate parameterization, which has mean 2/1 = 2 and variance 2/1^2 = 2, reflecting a moderately informative prior equivalent to pseudo-observations of 2 visits over 1 day. After observing visit counts over 5 days totaling 12 visits (i.e., \sum y_i = 12, n = 5), the posterior distribution is \lambda \mid \mathbf{y} \sim \text{Gamma}(2 + 12, 1 + 5) = \text{Gamma}(14, 6), with mean 14/6 \approx 2.33 and variance 14/6^2 \approx 0.39. This posterior indicates an updated estimate of daily visits slightly higher than the prior mean, incorporating the observed data. The 95% credible interval for \lambda is obtained from the 0.025 and 0.975 quantiles of the \text{Gamma}(14, 6) distribution, yielding approximately (1.28, 3.71).

For inference on future observations, the posterior predictive distribution for a new day's visit count Y integrates out \lambda: P(Y = y \mid \mathbf{y}) = \int \text{Poisson}(y \mid \lambda) \, \text{Gamma}(\lambda \mid 14, 6) \, d\lambda, which follows a Negative Binomial distribution with shape r = 14 and success probability p = 6/(6 + 1) = 6/7. This predictive distribution has mean 14/6 \approx 2.33 (matching the posterior mean of \lambda) and variance (14/6) \times (7/6) \approx 2.72, accounting for both Poisson variability and posterior uncertainty in \lambda. The use of conjugate priors here provides closed-form expressions for the posterior and predictive distributions, facilitating exact probabilistic inference and avoiding the computational expense of numerical methods like Markov chain Monte Carlo (MCMC), which are necessary for non-conjugate prior-likelihood pairs in similar count models.
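These quantities can be verified with scipy (an illustrative check; note that scipy's gamma distribution is parameterized by shape and scale, so scale = 1/rate):

```python
from scipy.stats import gamma, nbinom

a0, b0 = 2.0, 1.0            # prior Gamma(shape=2, rate=1)
n, total = 5, 12             # 5 days, 12 visits in total

a_n, b_n = a0 + total, b0 + n                 # posterior Gamma(14, 6)
post = gamma(a_n, scale=1.0 / b_n)            # scipy uses scale = 1/rate

print(post.mean(), post.var())                # ~2.333, ~0.389
print(post.interval(0.95))                    # equal-tailed interval, ~(1.28, 3.71)

# Posterior predictive for a new day's count: Negative Binomial with
# r = a_n and success probability p = b_n / (b_n + 1).
pred = nbinom(a_n, b_n / (b_n + 1.0))
print(pred.mean(), pred.var())                # ~2.333, ~2.722
```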

Conjugate Distributions

Discrete Likelihoods

Conjugate priors for discrete likelihoods are particularly useful in Bayesian inference for models involving categorical outcomes, binary events, or count data, where the observations take values on a finite or countably infinite support. These pairs facilitate analytical derivation of the posterior distribution by maintaining the same family after updating with observed data. The following table summarizes key conjugate prior pairs for common discrete likelihood distributions, including the likelihood's form and parameters, the conjugate prior family with its hyperparameters, and the posterior update rule based on observed data.
| Likelihood | Parameters | Prior Family | Hyperparameters | Posterior Update |
| --- | --- | --- | --- | --- |
| Binomial (number of successes in n trials) | p (success probability) | Beta | \alpha, \beta > 0 | Beta(\alpha + s, \beta + n - s), where s is the number of observed successes |
| Multinomial (counts across k categories in n trials) | \mathbf{p} = (p_1, \dots, p_k) with \sum p_i = 1 | Dirichlet | \boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_k) with \alpha_i > 0 | Dirichlet(\boldsymbol{\alpha} + \mathbf{x}), where \mathbf{x} = (x_1, \dots, x_k) are the observed counts |
| Poisson (count of events in a fixed interval) | \lambda > 0 (rate) | Gamma | Shape \alpha > 0, rate \beta > 0 | Gamma(\alpha + \sum x_i, \beta + n), where n is the number of observations and \sum x_i is the total count (the Poisson data are discrete, but \lambda is continuous) |
| Geometric (number of trials until first success) | p (success probability) | Beta | \alpha, \beta > 0 | Beta(\alpha + 1, \beta + f) for a single observation, where f is the number of observed failures |
| Negative binomial (number of failures before r successes, r known) | p (success probability) | Beta | \alpha, \beta > 0 | Beta(\alpha + n r, \beta + \sum f_i), where n is the number of observations and \sum f_i is the total number of observed failures across observations |
These pairs are applicable when data consist of categorical choices (e.g., multinomial for multi-class outcomes with finite support) or non-negative integer counts (e.g., Poisson or geometric for infinite-support scenarios like event occurrences or waiting times). Conjugacy simplifies updates by adding the observed sufficient statistics to the hyperparameters, enabling exact posteriors without numerical methods. An edge case arises in sampling without replacement, modeled by the hypergeometric distribution (drawing n items from a population of size N containing K successes, estimating the proportion K/N). Here, the beta-binomial pair serves as an approximate conjugate treatment: the proportion is given a Beta(\alpha, \beta) prior, and the posterior updates to Beta(\alpha + s, \beta + n - s), where s is the number of observed successes, though this assumes an effectively binomial (with-replacement) structure that is reasonable only for large populations.
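As an illustration of the table's update rules, the following Python snippet (with arbitrary example counts) performs a Dirichlet-multinomial update, the multi-category generalization of the beta-binomial case:

```python
import numpy as np

# Symmetric Dirichlet(1, 1, 1) prior over three category probabilities
# (a hypothetical choice), updated with observed category counts.
alpha = np.array([1.0, 1.0, 1.0])
counts = np.array([18, 5, 7])            # observed counts per category

alpha_post = alpha + counts              # posterior Dirichlet(19, 6, 8)
post_mean = alpha_post / alpha_post.sum()
print(alpha_post)                        # updated concentration parameters
print(post_mean)                         # posterior mean of each p_i
```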

Continuous Likelihoods

In Bayesian inference, conjugate priors for continuous likelihoods are particularly valuable for distributions in the exponential family, where the posterior remains in the same family as the prior, facilitating analytical updates. Common pairs arise for location-scale families, such as the normal distribution, where the prior encodes pseudo-observations that combine with the data via weighted averages. The following table summarizes key conjugate prior pairs for prominent continuous likelihoods, focusing on the likelihood form, the prior distribution with its hyperparameters, and the posterior hyperparameter transformations. These updates assume independent and identically distributed observations x_1, \dots, x_n from the likelihood.
| Likelihood | Parameter(s) | Prior | Posterior Hyperparameters |
| --- | --- | --- | --- |
| Normal: x_i \sim \mathcal{N}(\mu, \sigma^2) (known \sigma^2) | Mean \mu | Normal: \mu \sim \mathcal{N}(\mu_0, \sigma^2 / \kappa_0) | \mu \mid x \sim \mathcal{N}\left( \frac{\kappa_0 \mu_0 + n \bar{x}}{\kappa_0 + n}, \frac{\sigma^2}{\kappa_0 + n} \right), where \bar{x} = n^{-1} \sum x_i and \kappa_n = \kappa_0 + n |
| Normal: x_i \sim \mathcal{N}(\mu, \sigma^2) (unknown \mu, \sigma^2) | Mean \mu and variance \sigma^2 | Normal-inverse-gamma: \mu \mid \sigma^2 \sim \mathcal{N}(\mu_0, \sigma^2 / \kappa_0), \sigma^2 \sim \text{IG}(\nu_0/2, \nu_0 \sigma_0^2 / 2) | \mu \mid \sigma^2, x \sim \mathcal{N}\left( \frac{\kappa_0 \mu_0 + n \bar{x}}{\kappa_n}, \frac{\sigma^2}{\kappa_n} \right), \sigma^2 \mid x \sim \text{IG}(\nu_n/2, \nu_n \sigma_n^2 / 2), with \kappa_n = \kappa_0 + n, \nu_n = \nu_0 + n, \bar{x} = n^{-1} \sum x_i, and \sigma_n^2 = \frac{\nu_0 \sigma_0^2 + \sum (x_i - \bar{x})^2 + \frac{\kappa_0 n}{\kappa_n} (\bar{x} - \mu_0)^2}{\nu_n} |
| Exponential: x_i \sim \text{Exp}(\lambda) (rate \lambda) | Rate \lambda | Gamma: \lambda \sim \text{Gamma}(\alpha_0, \beta_0) | \lambda \mid x \sim \text{Gamma}(\alpha_0 + n, \beta_0 + \sum x_i) |
| Gamma: x_i \sim \text{Gamma}(\alpha, \beta) (known shape \alpha, rate \beta) | Rate \beta | Gamma: \beta \sim \text{Gamma}(a_0, b_0) | \beta \mid x \sim \text{Gamma}(a_0 + n \alpha, b_0 + \sum x_i) |
| Normal: x_i \sim \mathcal{N}(\mu, \sigma^2) (known \mu) | Variance \sigma^2 | Inverse-gamma: \sigma^2 \sim \text{IG}(\alpha_0, \beta_0) | \sigma^2 \mid x \sim \text{IG}(\alpha_0 + n/2, \beta_0 + \frac{1}{2} \sum (x_i - \mu)^2) |
| Student's t: marginal posterior of \mu from the normal likelihood with normal-inverse-gamma prior | Location \mu (marginal) | Implied by the normal-inverse-gamma prior on (\mu, \sigma^2) | \mu \mid x \sim t_{\nu_n}\left( \mu_n, \frac{\sigma_n^2}{\kappa_n} \right), with parameters as defined in the unknown-mean-and-variance row above |
These pairs exploit the structure of exponential-family densities, ensuring that the sufficient statistics update the hyperparameters additively. For instance, in the normal case with known variance, the prior strength \kappa_0 represents the equivalent number of prior observations, and the posterior mean is a precision-weighted average of the prior mean and the sample mean. Similarly, for the exponential likelihood, the gamma prior's shape \alpha_0 acts as a count of prior observations, updated by the number of new observations n. Multivariate extensions maintain conjugacy for location-scale families; for the multivariate normal likelihood x_i \sim \mathcal{N}(\mu, \Sigma) with unknown mean \mu and covariance \Sigma, the normal-inverse-Wishart prior is conjugate, where \mu \mid \Sigma \sim \mathcal{N}(\mu_0, \Sigma / \kappa_0) and \Sigma \sim \text{IW}(\nu_0, S_0), yielding posterior updates \kappa_n = \kappa_0 + n, \mu_n = \frac{\kappa_0 \mu_0 + n \bar{x}_n}{\kappa_n}, \nu_n = \nu_0 + n, and S_n = S_0 + \sum (x_i - \bar{x}_n)(x_i - \bar{x}_n)^\top + \frac{\kappa_0 n}{\kappa_n} (\bar{x}_n - \mu_0)(\bar{x}_n - \mu_0)^\top. The Wishart distribution serves as the conjugate prior for the precision matrix in such models, analogous to the gamma prior for a univariate precision. Not all continuous likelihoods admit simple conjugate priors; for example, the Weibull distribution with both shape and scale unknown lacks a natural conjugate prior, often requiring numerical methods such as Markov chain Monte Carlo for posterior inference.
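For the first row of the table (normal likelihood with known variance), a short Python sketch of the precision-weighted update reads as follows; the prior strength \kappa_0, prior mean \mu_0, and data values are hypothetical numbers chosen for illustration:

```python
import numpy as np

def normal_known_variance_update(mu0, kappa0, sigma2, x):
    """Posterior for a normal mean with known variance sigma2 and prior
    mu ~ N(mu0, sigma2 / kappa0):
        mu_n  = (kappa0 * mu0 + n * xbar) / (kappa0 + n)
        var_n = sigma2 / (kappa0 + n)
    """
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    return mu_n, sigma2 / kappa_n

# Hypothetical example: prior worth kappa0 = 4 observations at mu0 = 10,
# known variance sigma2 = 9, and six new measurements.
data = [12.1, 9.8, 11.4, 10.9, 12.5, 11.0]
mu_n, var_n = normal_known_variance_update(mu0=10.0, kappa0=4.0, sigma2=9.0, x=data)
print(mu_n, var_n)   # precision-weighted compromise between prior mean and xbar
```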

References

1. Lecture 21 — Prior distributions: 21.1 Conjugate priors and improper priors [PDF lecture notes].
2. A Compendium of Conjugate Priors [PDF]. Applied Mathematics Consulting.
3. Conjugate Prior — an overview. ScienceDirect Topics.
4. Conjugate prior: Definition, explanation and examples. StatLect.
5. Raiffa, H., and Schlaifer, R. (1961). Applied Statistical Decision Theory. Division of Research, Graduate School of Business Administration, Harvard University (Google Books).
6. ML, MAP, and Bayesian — The Holy Trinity of Parameter Estimation [PDF lecture notes] (2023).
7. Conjugate Priors for Exponential Families. Project Euclid.
8. Chapter 9: The exponential family — Conjugate priors [PDF]. People @ EECS.
9. Murphy, K. P. Machine Learning: A Probabilistic Perspective [PDF].
10. Bayesian statistics: a concise introduction [PDF] (2007). UBC Computer Science.
11. Bayesian Data Analysis, third edition [PDF].
12. Approximate Gaussian Conjugacy: Parametric Recursive Filtering ... (2017).
13. Dynamic Generalized Linear Models and Bayesian Forecasting [PDF].
14. 18.05 S22 Reading 15: Conjugate priors — Beta and normal [PDF].
15. Applied Statistical Decision Theory [PDF]. Gwern.
16. Posteriors, conjugacy, and exponential families ... arXiv:1410.6843 (2014).
17. Gamma Distribution Applet/Calculator. Department of Statistics and Actuarial Science, University of Iowa.
18. Bayesian Inference for the Negative Binomial Distribution via ... [PDF].
19. 12.1.1 Poisson-Gamma Conjugate [PDF lecture notes].
20. The Conjugate Prior for the Normal Distribution [PDF] (2010).
21. STAT 535, Chapter 5: More Conjugate Priors [PDF lecture notes].
22. Jeffreys priors; Priors for the multivariate Gaussian [PDF] (2010). People @ EECS.
23. Innovations in lifetime data analysis and missing value imputations [PDF] (2025).