Bayes factor
The Bayes factor is a key quantity in Bayesian statistics that compares the relative support for two competing hypotheses or models given observed data, defined as the ratio of the marginal likelihood of the data under one model to that under the other.[1] It serves as an updating factor on prior odds: a Bayes factor B_{10} = 6, for instance, indicates that the data are six times more likely under the alternative hypothesis H_1 than under the null hypothesis H_0.[1]

Originating in the work of Harold Jeffreys in 1935, the concept built on earlier contributions from Dorothy Wrinch and J.B.S. Haldane in the 1920s and 1930s, with Jeffreys formalizing it as a tool for scientific inference in his influential book Theory of Probability.[1] The term "Bayes factor" was later popularized by Robert E. Kass and Adrian E. Raftery, whose 1995 review established its use for model selection and hypothesis testing.

Unlike frequentist p-values, which only assess evidence against a null hypothesis, the Bayes factor provides a symmetric measure that can quantify evidence in favor of either hypothesis, distinguishing between absence of evidence and evidence of absence.[1] It is particularly advantageous for comparing non-nested models and is robust to optional stopping in data collection, making it suitable for sequential experimental designs.[1]

Jeffreys proposed a heuristic scale for interpreting Bayes factor magnitudes as strength of evidence; in a widely used modern adaptation of that scale, values between 1 and 3 indicate "anecdotal" support for H_1, 3 to 10 "moderate" evidence, 10 to 30 "strong" evidence, 30 to 100 "very strong" evidence, and greater than 100 "extreme" evidence, with the reciprocals applying to support for H_0.[1] This scale, while subjective, has been widely adopted and refined in fields such as psychology, neuroscience, and economics for model comparison tasks. Bayes factors often require numerical approximation because marginal likelihoods are intractable in complex models, but they remain central to Bayesian decision-making and evidence accumulation.

Mathematical Foundations
Definition
The Bayes factor is a statistical measure used in Bayesian inference to quantify the relative evidence provided by observed data for one model over another competing model. It was introduced by Harold Jeffreys as a tool for objective hypothesis testing within a Bayesian framework.[2] Mathematically, the Bayes factor in favor of model M_1 over model M_0 given data D, denoted BF_{10}, is defined as the ratio of the marginal likelihoods under each model:

BF_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)}

The marginal likelihood p(D \mid M) for a model M with parameters \theta is obtained by integrating the likelihood over the prior distribution of the parameters:

p(D \mid M) = \int p(D \mid \theta, M) \, p(\theta \mid M) \, d\theta

This integration averages the model's predictive performance across all plausible parameter values, weighted by the prior, providing a summary of the model's overall fit to the data independent of specific parameter estimates.[2] A common notational convention is BF_{01} = 1 / BF_{10}, which reverses the comparison to favor M_0 (often the null model) over M_1. The Bayes factor plays a central role in Bayesian model comparison by directly comparing the predictive adequacy of competing models on the observed data, facilitating model selection without relying on point estimates or frequentist criteria.[2]
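To make the definition concrete, the sketch below numerically evaluates both marginal likelihoods for a simple binomial experiment, comparing a point null H_0: \theta = 0.5 against an alternative H_1 with a uniform Beta(1, 1) prior on \theta. The data, the choice of prior, and the SciPy-based implementation are illustrative assumptions, not taken from the cited sources.

```python
# Minimal sketch (illustrative assumptions, not from the cited sources):
# Bayes factor for a binomial experiment, comparing a point null
# H0: theta = 0.5 against H1: theta ~ Beta(1, 1) (uniform prior).
from scipy import integrate, stats

k, n = 61, 100        # hypothetical data: 61 successes in 100 trials
theta0 = 0.5          # fixed parameter value under the point null H0

# Marginal likelihood under H0: the likelihood at the fixed theta0
# (no parameter uncertainty to integrate over).
m0 = stats.binom.pmf(k, n, theta0)

# Marginal likelihood under H1: average the likelihood over the prior,
# p(D | M1) = integral of p(D | theta, M1) * p(theta | M1) d(theta).
def integrand(theta):
    return stats.binom.pmf(k, n, theta) * stats.beta.pdf(theta, 1, 1)

m1, _ = integrate.quad(integrand, 0.0, 1.0)

bf10 = m1 / m0        # evidence for H1 over H0
bf01 = 1.0 / bf10     # reversed comparison, BF01 = 1 / BF10
print(f"BF10 = {bf10:.3f}, BF01 = {bf01:.3f}")
```

The same pattern extends to more complex models, where the one-dimensional quadrature would typically be replaced by a numerical approximation of the marginal likelihood.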
Relationship to Bayes' Theorem

The Bayes factor emerges directly from Bayes' theorem as a key component in updating the probabilities of competing models based on observed data. Bayes' theorem states that the posterior probability of a model M_i given data D is P(M_i \mid D) \propto P(D \mid M_i) \, P(M_i), where P(D \mid M_i) is the marginal likelihood under the model and P(M_i) is the prior probability. For two models M_1 and M_2, the ratio of posterior model probabilities, known as the posterior odds, is therefore

\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)},

with the second factor on the right-hand side defining the Bayes factor BF_{12}.[2] This formulation shows that the Bayes factor is a multiplier that converts the prior odds into the posterior odds, encapsulating how the data shift belief between models.[2] By isolating \frac{P(D \mid M_1)}{P(D \mid M_2)}, the Bayes factor measures the relative support for each model provided solely by the data, disentangling this evidential contribution from subjective prior beliefs about the models' plausibility.[2] This separation allows the Bayes factor to function as an objective summary of the data's evidential value within the Bayesian updating process, applicable across diverse modeling contexts.[2]

The derivation of the Bayes factor highlights a fundamental distinction between point-null hypotheses and composite models. Under a point-null hypothesis, such as M_0: \theta = \theta_0, the marginal likelihood P(D \mid M_0) reduces to the likelihood evaluated at the fixed parameter value, as there is no parameter uncertainty to integrate over.[2] In contrast, for a composite model M_1 with parameters varying over a continuous space, P(D \mid M_1) requires integrating the likelihood over a prior distribution on the parameters to average out that uncertainty, as outlined in the definition of the marginal likelihood above.[2] This difference affects the computational form of the Bayes factor but preserves its role in the posterior odds equation.[2] Harold Jeffreys pioneered the application of the Bayes factor within this framework in the first edition of his 1939 monograph Theory of Probability.[3][2]
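As a small illustration of the updating identity above, the following sketch multiplies assumed prior odds by a Bayes factor to obtain posterior odds and the corresponding posterior model probability; the specific numbers are hypothetical.

```python
# Small sketch of the updating identity: posterior odds = prior odds * BF.
# The prior odds and Bayes factor values below are hypothetical.
prior_odds = 0.25                     # P(M1) / P(M0) before seeing the data
bf10 = 6.0                            # data are six times more likely under M1
posterior_odds = prior_odds * bf10    # = 1.5
posterior_prob_m1 = posterior_odds / (1.0 + posterior_odds)  # = 0.6
print(posterior_odds, posterior_prob_m1)
```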
Interpretation

Evidence Scales
The interpretation of the Bayes factor (BF) relies on standardized scales that categorize its magnitude into qualitative levels of evidence for one model (say, the alternative M_1) over another (say, the null M_0). These scales provide a heuristic framework for assessing evidential strength, though they are not universally fixed.[4] A seminal scale was proposed by Harold Jeffreys, which divides BF values into grades by orders of magnitude, reserving "decisive" evidence for the largest values. Jeffreys' scale, as commonly referenced, is as follows:

| BF_{10} | Evidence against M_0 |
|---|---|
| > 100 | Decisive |
| 30–100 | Very strong |
| 10–30 | Strong |
| 3–10 | Substantial |
| 1–3 | Barely worth mentioning |
Kass and Raftery later proposed a revised scale, often reported on the 2 ln(BF_{10}) scale, which relabels the intermediate grades:

| BF_{10} | 2 ln(BF_{10}) | Evidence against M_0 |
|---|---|---|
| > 150 | > 10 | Very strong |
| 20–150 | 6–10 | Strong |
| 3–20 | 2–6 | Positive |
| 1–3 | 0–2 | Barely worth mentioning |
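For illustration only, the sketch below maps a BF_{10} value onto the Jeffreys grades tabulated above and also reports the 2 ln(BF_{10}) transform used in the second table; the function name and the treatment of boundary values are assumptions, not a standard API.

```python
# Illustrative sketch: mapping BF10 onto the Jeffreys grades tabulated above,
# plus the 2 ln(BF10) transform used in the second table. The function name
# and the handling of boundary values are assumptions, not a standard API.
import math

def jeffreys_label(bf10: float) -> str:
    """Return the qualitative Jeffreys grade for evidence against M0."""
    if bf10 > 100:
        return "Decisive"
    if bf10 > 30:
        return "Very strong"
    if bf10 > 10:
        return "Strong"
    if bf10 > 3:
        return "Substantial"
    if bf10 > 1:
        return "Barely worth mentioning"
    return "Favours M0; interpret 1/BF10 on the same scale"

bf10 = 6.0
print(jeffreys_label(bf10), round(2 * math.log(bf10), 2))  # Substantial 3.58
```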