Bayesian network
A Bayesian network, also known as a belief network or Bayes net, is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).[1] In this structure, nodes correspond to random variables (which may be discrete or continuous) and directed edges signify direct probabilistic influences or conditional dependencies between variables, while the absence of edges encodes conditional independencies among non-adjacent nodes.[2] The full joint probability distribution over the variables is compactly encoded as the product of each variable's conditional probability given its parent variables in the graph, enabling efficient representation and manipulation of complex probability distributions that would otherwise require exponential storage.[1]

The concept of Bayesian networks was formalized and popularized by Judea Pearl in the 1980s, building on earlier work in probabilistic reasoning and graphical models dating back to the 1920s with Sewall Wright's path analysis in genetics.[3] Pearl's seminal contributions, including his 1988 book Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, introduced belief propagation algorithms for exact inference in these networks, allowing probabilities to be updated in light of new evidence.[3] This framework leverages Bayes' theorem to perform reasoning under uncertainty, making it a cornerstone of artificial intelligence and machine learning for handling incomplete or noisy data.[4]

Bayesian networks support two primary tasks: inference, which computes posterior probabilities or marginals given observed evidence, using exact algorithms such as variable elimination or junction tree methods, or approximate methods such as Markov chain Monte Carlo when exact inference is intractable; and learning, which estimates the network structure and parameters from data using approaches such as score-based search or constraint-based discovery.[1] These capabilities make Bayesian networks particularly suited to domains requiring causal modeling and decision-making under uncertainty.[5]

Notable applications span medicine, where they aid diagnostic systems by integrating symptoms, test results, and disease probabilities, for example in the Pathfinder system for pathology diagnosis;[6] biology, for inferring gene regulatory networks from expression data; and risk assessment, such as environmental modeling or financial forecasting. In engineering and operations research, they facilitate fault diagnosis in complex systems and reliability analysis.[7] Recent advances integrate Bayesian networks with deep learning for scalable inference on large datasets, enhancing their utility in modern AI applications such as natural language processing and autonomous systems.[8]

Introduction
Basic Definition
A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies through a directed acyclic graph (DAG), where each node corresponds to a random variable and each directed edge indicates a direct probabilistic influence from one variable to another.[9] The primary purpose of a Bayesian network is to compactly encode the joint probability distribution over the variables by specifying a factorization into local conditional probability distributions, one for each variable given its direct predecessors in the graph, thereby enabling efficient representation and reasoning under uncertainty.[9] Bayesian networks support updating beliefs about unobserved variables through the incorporation of evidence on observed ones, leveraging Bayes' theorem to revise probabilities coherently across the model. The term was coined by Judea Pearl in 1985 to highlight its foundations in Bayesian conditioning and its applicability to evidential and causal reasoning in artificial intelligence and statistics.[10]

Illustrative Example
A common illustrative example of a Bayesian network is a model for diagnosing lung cancer based on patient symptoms and risk factors. The network consists of four binary variables: Smoking (indicating whether the patient smokes), Cancer (indicating presence of lung cancer), Cough (indicating the presence of a cough), and XRay (indicating a positive X-ray result). Each variable can take values yes or no. The directed edges capture causal relationships: an arrow from Smoking to Cancer reflects that smoking increases the risk of cancer, while arrows from Cancer to Cough and from Cancer to XRay reflect that cancer causes these symptoms and test outcomes. This structure forms a directed acyclic graph (DAG), depicted textually as:

Smoking → Cancer → Cough
                 ↘ XRay

The DAG ensures no cycles and encodes the probabilistic dependencies among the variables.

To construct this network, begin by defining the nodes for the relevant variables and their states. Next, draw directed edges based on domain knowledge of causal or associative influences, such as medical evidence linking smoking to cancer and cancer to observable symptoms. Finally, specify a conditional probability table (CPT) for each node, quantifying the probabilities given its parents. For root nodes like Smoking with no parents, the CPT is simply the prior probability distribution. For other nodes, the CPT lists probabilities conditioned on all combinations of parent states. Since the variables are binary, most CPTs are 2x1 or 2x2 tables. Example CPTs, drawn from standard medical diagnosis illustrations, are as follows (probabilities sum to 1 for each conditioning case):

CPT for Smoking (prior):
| Smoking | Probability |
|---|---|
| yes | 0.3 |
| no | 0.7 |

CPT for Cancer given Smoking:

| Cancer \ Smoking | Smoking = yes | Smoking = no |
|---|---|---|
| yes | 0.10 | 0.01 |
| no | 0.90 | 0.99 |

CPT for Cough given Cancer:

| Cough \ Cancer | Cancer = yes | Cancer = no |
|---|---|---|
| yes | 0.80 | 0.30 |
| no | 0.20 | 0.70 |

CPT for XRay given Cancer:

| XRay \ Cancer | Cancer = yes | Cancer = no |
|---|---|---|
| yes | 0.90 | 0.20 |
| no | 0.10 | 0.80 |
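To make these numbers concrete, the following sketch in plain Python (no libraries; the dictionary and function names are illustrative, and the probabilities are copied from the tables above) evaluates the joint distribution as the product of CPT entries and computes a posterior by enumeration:

```python
# CPTs from the tables above; tuple keys list the child's value first.
p_smoking = {"yes": 0.3, "no": 0.7}
p_cancer_given_smoking = {("yes", "yes"): 0.10, ("no", "yes"): 0.90,   # (cancer, smoking)
                          ("yes", "no"): 0.01, ("no", "no"): 0.99}
p_cough_given_cancer = {("yes", "yes"): 0.80, ("no", "yes"): 0.20,     # (cough, cancer)
                        ("yes", "no"): 0.30, ("no", "no"): 0.70}
p_xray_given_cancer = {("yes", "yes"): 0.90, ("no", "yes"): 0.10,      # (xray, cancer)
                       ("yes", "no"): 0.20, ("no", "no"): 0.80}

def joint(s, c, co, x):
    """P(Smoking=s, Cancer=c, Cough=co, XRay=x) via the network factorization."""
    return (p_smoking[s]
            * p_cancer_given_smoking[(c, s)]
            * p_cough_given_cancer[(co, c)]
            * p_xray_given_cancer[(x, c)])

# Posterior P(Cancer = yes | Cough = yes, XRay = yes), marginalizing out Smoking.
numerator = sum(joint(s, "yes", "yes", "yes") for s in ("yes", "no"))
evidence = sum(joint(s, c, "yes", "yes") for s in ("yes", "no") for c in ("yes", "no"))
print(numerator / evidence)  # about 0.32
```

With these CPTs, observing both a cough and a positive X-ray raises the probability of cancer from its prior marginal P(Cancer = yes) = 0.3 × 0.10 + 0.7 × 0.01 = 0.037 to roughly 0.32.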
Graphical and Structural Foundations
Directed Acyclic Graph Representation
A Bayesian network is represented by a directed acyclic graph (DAG), in which each node corresponds to a random variable that may be either discrete or continuous. Directed edges in the graph connect pairs of nodes, indicating a direct conditional dependency: an edge from node X to node Y signifies that Y directly depends on X, conditional on the other direct influencers of Y. This structure compactly encodes the qualitative relationships among the variables, capturing how the state of one variable influences others without specifying the full joint distribution.[4]

The acyclicity of the graph is a fundamental requirement, ensuring that no directed cycles exist that could lead to inconsistencies in dependency resolution during probability propagation. Without cycles, the DAG admits a topological ordering of its nodes, a linear arrangement in which every directed edge points from an earlier node to a later one, facilitating ordered processing of dependencies from "root" causes to effects. This property follows from graph theory: a directed graph is acyclic if and only if it possesses at least one topological ordering, which can be constructed via algorithms such as depth-first search that detect and prevent cycles.

Within the DAG, key structural relationships define the dependency hierarchy. The parents of a node are the nodes with edges directed into it, representing its immediate conditioning variables. Conversely, the children of a node are the nodes with edges directed out of it, denoting variables directly affected by the node. The ancestors of a node are all nodes from which it can be reached along directed paths, forming the extended set of influencers through transitive dependencies. These relations highlight the hierarchical nature of the model, in which influences propagate unidirectionally along paths.

Qualitatively, the directed edges symbolize direct influences or associative links between variables, often lending themselves to interpretations of mechanistic or conditional effects in practical domains.[4] However, the graph structure encodes dependencies without inherently establishing causation, which requires additional assumptions or data.
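The sketch below (plain Python; the function names and the depth-first formulation are illustrative choices, not a canonical implementation) computes these structural relations and a topological ordering for the lung cancer DAG from the earlier example:

```python
def structural_relations(edges):
    """Parents, children, and ancestors for a DAG given as (parent, child) pairs."""
    parents, children, nodes = {}, {}, set()
    for a, b in edges:
        children.setdefault(a, set()).add(b)
        parents.setdefault(b, set()).add(a)
        nodes.update((a, b))

    def ancestors(node):
        # All nodes from which a directed path leads into `node`.
        found, stack = set(), list(parents.get(node, ()))
        while stack:
            p = stack.pop()
            if p not in found:
                found.add(p)
                stack.extend(parents.get(p, ()))
        return found

    return parents, children, {n: ancestors(n) for n in nodes}

def topological_order(edges):
    """Depth-first topological ordering; raises ValueError if a directed cycle exists."""
    children, nodes = {}, set()
    for a, b in edges:
        children.setdefault(a, []).append(b)
        nodes.update((a, b))
    order, state = [], {}  # state: 1 = on the current DFS path, 2 = finished

    def visit(n):
        if state.get(n) == 1:
            raise ValueError("graph has a directed cycle")
        if state.get(n) != 2:
            state[n] = 1
            for ch in children.get(n, []):
                visit(ch)
            state[n] = 2
            order.append(n)

    for n in nodes:
        visit(n)
    return order[::-1]  # reverse postorder puts every parent before its children

# Lung cancer example DAG: Smoking -> Cancer -> {Cough, XRay}.
dag = [("Smoking", "Cancer"), ("Cancer", "Cough"), ("Cancer", "XRay")]
print(topological_order(dag))                 # e.g. ['Smoking', 'Cancer', 'XRay', 'Cough']
parents, children, ancestors = structural_relations(dag)
print(parents["Cancer"], ancestors["XRay"])   # {'Smoking'} and {'Cancer', 'Smoking'} (set order may vary)
```

The visit function raises an error when it returns to a node already on the current search path, which is exactly the cycle detection mentioned above.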
Conditional Independencies and Markov Properties

In Bayesian networks, the directed acyclic graph (DAG) structure encodes conditional independencies among the random variables, enabling compact representation of joint probability distributions by capturing only the essential probabilistic dependencies. These independencies arise from the topological properties of the graph, where the presence or absence of directed edges signifies direct influences, and the overall arrangement implies broader separation conditions that translate into probabilistic independence statements. This encoding is formalized through Markov properties, which bridge the graphical representation and the underlying probability distribution.

The local Markov property provides a foundational building block for these independencies, stating that each variable in the network is conditionally independent of its non-descendants given its parents.[11] Formally, for a variable X_i with parents \mathrm{Pa}(X_i), X_i \perp \mathrm{ND}(X_i) \mid \mathrm{Pa}(X_i), where \mathrm{ND}(X_i) denotes the set of non-descendants of X_i. This property ensures that once the immediate causes (parents) are known, the variable does not depend on other preceding variables in the graph, reflecting the causal or influential flow along directed paths. It implies conditional independencies, such as a variable being screened off from its non-descendants by its parents, and extends to more general cases through the global Markov property.

Building on the local property, the global Markov property extends these independencies to arbitrary subsets of variables, asserting that if a set S separates two disjoint sets A and B in the graph (meaning no active path connects A and B given S), then A \perp B \mid S holds in the distribution.[12] This separation is determined graphically via criteria such as d-separation, which identifies blocked paths in the DAG. The global property thus captures the full spectrum of conditional independencies implied by the network structure, allowing inference about distant variables conditioned on intervening ones.

A Bayesian network's DAG serves as an I-map (independence map) for the joint distribution, meaning the graph faithfully represents a subset of the true conditional independencies in the probability space, though it may not capture all of them (i.e., it is not necessarily a perfect map).[12] This mapping ensures that every graphical independence corresponds to a genuine probabilistic one, providing a minimal yet sufficient structure for reasoning under uncertainty, as established in foundational work on plausible inference.

Formal Definitions and Properties
Factorization of Joint Probability Distribution
In a Bayesian network, the joint probability distribution over a set of random variables X_1, X_2, \dots, X_n represented by a directed acyclic graph (DAG) factors into a product of conditional probabilities, each depending only on the variable's direct predecessors, or parents, in the graph.[13] This factorization is given by

P(X_1, \dots, X_n) = \prod_{i=1}^n P(X_i \mid \mathrm{Pa}(X_i)),

where \mathrm{Pa}(X_i) denotes the set of parents of X_i in the DAG.[13]

This representation derives from the chain rule of probability, which expands the joint distribution as P(X_1, \dots, X_n) = \prod_{i=1}^n P(X_i \mid X_1, \dots, X_{i-1}); assuming the variables are indexed in a topological order (parents before children), the conditioning set for each X_i can be restricted to its parents \mathrm{Pa}(X_i) because of the conditional independencies encoded in the graph structure.[13] The local Markov property justifies this restriction by implying that each variable is conditionally independent of its non-descendants given its parents.[13]

For discrete variables, each conditional probability P(X_i \mid \mathrm{Pa}(X_i)) is specified via a conditional probability table (CPT), which enumerates the probabilities for all possible values of X_i given every combination of parent values.[9] For continuous variables, these conditionals are instead represented by probability density functions that describe the distribution of X_i given its parents.[14]

The resulting factorization provides a minimal I-map of the joint distribution, meaning the graph's independencies form a subset of those in the true distribution, with no redundant edges; removing any edge would assert a conditional independence that need not hold in the distribution.[15] This minimal structure ensures an efficient yet complete encoding of the probabilistic dependencies implied by the network.[15]
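For instance, for the lung cancer network in the illustrative example above, the chain rule over the ordering (Smoking, Cancer, Cough, XRay) gives

P(\mathrm{Smoking}, \mathrm{Cancer}, \mathrm{Cough}, \mathrm{XRay}) = P(\mathrm{Smoking}) \, P(\mathrm{Cancer} \mid \mathrm{Smoking}) \, P(\mathrm{Cough} \mid \mathrm{Smoking}, \mathrm{Cancer}) \, P(\mathrm{XRay} \mid \mathrm{Smoking}, \mathrm{Cancer}, \mathrm{Cough}),

and the independencies encoded by the DAG reduce each conditioning set to the node's parents:

P(\mathrm{Smoking}, \mathrm{Cancer}, \mathrm{Cough}, \mathrm{XRay}) = P(\mathrm{Smoking}) \, P(\mathrm{Cancer} \mid \mathrm{Smoking}) \, P(\mathrm{Cough} \mid \mathrm{Cancer}) \, P(\mathrm{XRay} \mid \mathrm{Cancer}).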
Local Markov Property

The local Markov property is a fundamental independence assumption in Bayesian networks that captures the directed dependencies encoded by the directed acyclic graph (DAG). Formally, given a Bayesian network consisting of random variables X = \{X_v : v \in V\} and a DAG G = (V, A), the property states that for every node v \in V, X_v is conditionally independent of its non-descendants given its parents:

X_v \perp\!\!\!\perp \left( V \setminus (\{v\} \cup \mathrm{De}(v)) \right) \mid \mathrm{Pa}(v),
where \mathrm{De}(v) denotes the descendants of v (excluding v), and \mathrm{Pa}(v) are the direct parents of v.[16] This means that once the immediate causes (parents) of X_v are known, X_v provides no additional information about variables that do not descend from it.[17] The proof of the local Markov property follows from the joint probability factorization of the Bayesian network. The joint distribution factorizes as
P(X) = \prod_{v \in V} P(X_v \mid \mathrm{Pa}(v)).
Let \mathrm{ND} denote the non-descendants of v, so \mathrm{ND} = V \setminus (\{v\} \cup \mathrm{De}(v)), and consider the set S = \mathrm{ND} \cup \{v\}. The marginal distribution P(S) is obtained by summing (or integrating) out the descendants \mathrm{De}(v); processing the descendants in reverse topological order, each factor P(X_u \mid \mathrm{Pa}(u)) with u \in \mathrm{De}(v) sums to 1, leaving P(S) = \prod_{u \in S} P(X_u \mid \mathrm{Pa}(u)). For u \in \mathrm{ND}, \mathrm{Pa}(u) \subseteq \mathrm{ND}, because v is not an ancestor of any node in \mathrm{ND} (otherwise that node would be a descendant of v). Moreover, none of the conditional distributions for nodes in \mathrm{ND} depend on X_v, as there are no edges from v into \mathrm{ND}. Thus,

P(S) = \left[ \prod_{u \in \mathrm{ND}} P(X_u \mid \mathrm{Pa}(u)) \right] \times P(X_v \mid \mathrm{Pa}(v)),

where the first product does not involve X_v. Summing out X_v removes the final factor, so P(\mathrm{ND}) equals that first product, and dividing P(S) by P(\mathrm{ND}) yields P(X_v \mid \mathrm{ND}) = P(X_v \mid \mathrm{Pa}(v)), establishing the independence.[16][17]

This property has key implications for the minimality of the network structure. It ensures that the DAG is a minimal I-map (independence map) for the distribution, meaning every edge represents a direct dependence: removing an edge would assert a conditional independence that the distribution need not satisfy, while a redundant edge (for example, from a non-parent non-descendant directly to v) would introduce a dependence not warranted by the parents alone. Consequently, the graph avoids superfluous arcs, promoting a parsimonious representation of the probabilistic dependencies.[16]

In comparison to undirected graphical models like Markov random fields, the local Markov property in Bayesian networks emphasizes directional causality through parent conditioning, whereas in undirected models a variable is conditionally independent of non-adjacent variables given its immediate neighbors, without distinguishing directionality or ancestry. This directed formulation allows Bayesian networks to model asymmetric influences and temporal or causal orders more explicitly than their undirected counterparts.[18]
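As a small numerical illustration (a minimal sketch in plain Python, reusing the CPT numbers from the lung cancer example; the function names are illustrative), the local Markov property can be checked directly: Cough has parent Cancer and non-descendants Smoking and XRay, so P(Cough | Cancer, Smoking, XRay) should not depend on Smoking or XRay.

```python
# CPT entries from the lung cancer example above; tuple keys list the child's value first.
P_S  = {"yes": 0.3, "no": 0.7}
P_C  = {("yes", "yes"): 0.10, ("no", "yes"): 0.90, ("yes", "no"): 0.01, ("no", "no"): 0.99}  # (cancer, smoking)
P_Co = {("yes", "yes"): 0.80, ("no", "yes"): 0.20, ("yes", "no"): 0.30, ("no", "no"): 0.70}  # (cough, cancer)
P_X  = {("yes", "yes"): 0.90, ("no", "yes"): 0.10, ("yes", "no"): 0.20, ("no", "no"): 0.80}  # (xray, cancer)

def joint(s, c, co, x):
    return P_S[s] * P_C[(c, s)] * P_Co[(co, c)] * P_X[(x, c)]

def p_cough_given_rest(co, c, s, x):
    """P(Cough = co | Cancer = c, Smoking = s, XRay = x), computed from the joint."""
    total = sum(joint(s, c, v, x) for v in ("yes", "no"))
    return joint(s, c, co, x) / total

# For each value of the parent Cancer, the conditional is identical for every
# combination of the non-descendants Smoking and XRay, matching P(Cough | Cancer).
for c in ("yes", "no"):
    values = {round(p_cough_given_rest("yes", c, s, x), 9)
              for s in ("yes", "no") for x in ("yes", "no")}
    print(c, values)  # one-element sets: {0.8} for "yes", {0.3} for "no"
```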
d-Separation Criterion
The d-separation criterion, introduced by Judea Pearl, is a graphical test for determining conditional independencies in a Bayesian network represented as a directed acyclic graph (DAG). It specifies conditions under which two disjoint sets of nodes, X and Y, are conditionally independent given a third disjoint set Z, denoted as X \perp Y \mid Z. This criterion relies on analyzing undirected paths in the DAG to identify whether evidence in Z blocks all information flow between X and Y.

Formally, sets X and Y are d-separated by Z, written X \perp_d Y \mid Z, if every undirected path from a node in X to a node in Y is blocked by Z. A path is blocked by Z if, for at least one triple of consecutive nodes i - k - j along the path, one of the following holds:[19]

- Serial (head-to-tail) connection: i \to k \to j, and k \in Z.
- Diverging (tail-to-tail) connection: i \leftarrow k \to j, and k \in Z.
- Converging (head-to-head) connection: i \to k \leftarrow j, and neither k nor any descendant of k is in Z.
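The following compact Python sketch implements this path-based test by brute-force enumeration of undirected paths (suitable only for small graphs; practical systems typically use more efficient procedures such as the "Bayes ball" or moralization-based algorithms, and all names here are illustrative):

```python
def descendants(node, children):
    """All nodes reachable from `node` via directed edges (excluding node itself)."""
    seen, stack = set(), [node]
    while stack:
        for ch in children.get(stack.pop(), []):
            if ch not in seen:
                seen.add(ch)
                stack.append(ch)
    return seen

def is_d_separated(x, y, z, edges):
    """True if nodes x and y are d-separated by the set z in the DAG given as
    (parent, child) pairs. Enumerates all undirected simple paths, so it is
    intended only for small illustrative graphs."""
    children, neighbors = {}, {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def blocked(path):
        # A path is blocked if at least one consecutive triple i - k - j blocks it.
        for i, k, j in zip(path, path[1:], path[2:]):
            if k in children.get(i, []) and k in children.get(j, []):
                # Converging (head-to-head): blocked unless k or a descendant of k is observed.
                if k not in z and not (descendants(k, children) & z):
                    return True
            elif k in z:
                # Serial or diverging connection: blocked when k is observed.
                return True
        return False

    def undirected_paths(current, visited):
        if current == y:
            yield visited
            return
        for nxt in neighbors.get(current, ()):
            if nxt not in visited:
                yield from undirected_paths(nxt, visited + [nxt])

    return all(blocked(p) for p in undirected_paths(x, [x]))

# Lung cancer example: Smoking -> Cancer, Cancer -> Cough, Cancer -> XRay.
dag = [("Smoking", "Cancer"), ("Cancer", "Cough"), ("Cancer", "XRay")]
print(is_d_separated("Cough", "XRay", {"Cancer"}, dag))  # True: common cause observed
print(is_d_separated("Smoking", "Cough", set(), dag))    # False: open serial path
```

Applied to the lung cancer DAG, conditioning on Cancer blocks the diverging path between Cough and XRay, while Smoking and Cough remain connected through the unblocked serial path when nothing is observed.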