Conditional dependence
In probability theory, conditional dependence describes a relationship between two or more random variables or events where their statistical dependence persists or emerges even after accounting for the influence of one or more conditioning variables. Specifically, for random variables X and Y given Z, conditional dependence holds if the conditional probability P(X | Y, Z) differs from P(X | Z) for some values where P(Y, Z) > 0, meaning knowledge of Y provides additional information about X beyond what Z alone offers. This contrasts with unconditional dependence, where P(X, Y) ≠ P(X)P(Y), and can manifest in scenarios where variables appear independent marginally but become dependent upon conditioning, or vice versa, as seen in examples like disease indicators (e.g., malaria and bacterial infection) that are independent overall but dependent given the presence of fever.[1]

Conditional dependence plays a central role in probabilistic modeling, particularly in graphical models such as Bayesian networks, where it helps encode complex joint distributions through conditional relationships and d-separation criteria to identify independencies.[2] In causal inference and machine learning, measuring conditional dependence is essential for tasks like feature selection, where irrelevant variables are screened out given others, and causal discovery, which distinguishes direct effects from spurious correlations. Various metrics have been developed to quantify it, including kernel-based approaches using reproducing kernel Hilbert spaces for non-linear dependencies and simple coefficients based on mutual information for practical computation in high dimensions.[3] These concepts underpin advancements in artificial intelligence, enabling efficient inference in large-scale systems by exploiting conditional structures to reduce computational complexity.[4]

Core Concepts
Definition
Conditional dependence refers to a relationship between random variables or events where the probability distribution of one is influenced by the other, even after incorporating information from a conditioning variable or set.[5] Intuitively, it arises when knowing the outcome of one variable alters the expected behavior of another, despite accounting for the conditioning factor, reflecting a residual association not explained by the conditioner alone.[6] Formally, two random variables X and Y are conditionally dependent given a third variable Z (with P(Z = z) > 0) if there exist values x, y, z in their supports such that P(X = x, Y = y \mid Z = z) \neq P(X = x \mid Z = z) \, P(Y = y \mid Z = z).[5] This inequality indicates that the joint conditional distribution does not factorize into the product of the marginal conditionals, signifying dependence.[7]

Unlike unconditional (marginal) dependence, which assesses association without conditioning, conditional dependence can emerge or disappear based on the conditioner; notably, X and Y may be unconditionally independent yet conditionally dependent given Z, as in collider bias where Z is a common effect of X and Y, inducing spurious association upon conditioning.[8] Conversely, unconditional dependence may vanish under certain conditioning, highlighting the context-specific nature of probabilistic relationships.[5] The concept was first formalized within modern probability theory in the early 20th century, building on Andrei Kolmogorov's axiomatic foundations established in 1933, which provided the rigorous framework for conditional probabilities underlying dependence relations.
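To make the definition concrete, the following minimal Python sketch checks the defining inequality on a small, hypothetical joint distribution in which X and Y are independent fair Bernoulli variables and Z records whether at least one of them equals 1 (so Z is a common effect of X and Y). The numbers and helper function are illustrative only, not taken from the cited sources.

```python
# Minimal sketch: check the defining inequality for conditional dependence.
# Hypothetical joint distribution: X, Y independent Bernoulli(0.5), Z = max(X, Y).
joint = {
    (0, 0, 0): 0.25,
    (0, 1, 1): 0.25,
    (1, 0, 1): 0.25,
    (1, 1, 1): 0.25,
}

def prob(predicate):
    """Total probability of all outcomes (x, y, z) satisfying the predicate."""
    return sum(p for (x, y, z), p in joint.items() if predicate(x, y, z))

z = 1
p_z = prob(lambda x, y, zz: zz == z)
p_xy_given_z = prob(lambda x, y, zz: x == 1 and y == 1 and zz == z) / p_z
p_x_given_z = prob(lambda x, y, zz: x == 1 and zz == z) / p_z
p_y_given_z = prob(lambda x, y, zz: y == 1 and zz == z) / p_z

# 1/3 versus (2/3)*(2/3) = 4/9: the factorization fails, so X and Y are
# conditionally dependent given Z = 1, even though they are marginally independent.
print(p_xy_given_z, p_x_given_z * p_y_given_z)
```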
Relation to Unconditional Dependence
Unconditional dependence between two random variables X and Y occurs when their joint probability distribution does not factorize into the product of their marginal distributions, that is, when P(X, Y) \neq P(X) P(Y). This contrasts with conditional dependence, which, as defined earlier, evaluates the joint distribution relative to a conditioning variable Z. In essence, unconditional dependence captures marginal associations without additional context, while conditional dependence reveals how these associations may alter given knowledge of Z.

Conditioning on Z can induce conditional independence from unconditional dependence, particularly in scenarios involving a common cause. For instance, if Z directly influences both X and Y (as in a directed acyclic graph where arrows point from Z to X and from Z to Y), X and Y exhibit unconditional dependence due to their shared origin, but become conditionally independent given Z, as the influence of the common cause is accounted for.[5] This structure, known as a common cause or fork, illustrates how conditioning removes spurious associations propagated through Z.[9]

Conversely, conditioning can induce conditional dependence where unconditional independence previously held, a phenomenon exemplified by the V-structure in directed acyclic graphs. In a V-structure, arrows converge on Z from both X and Y (i.e., X \to Z \leftarrow Y), rendering X and Y unconditionally independent since they lack a direct path of influence.[5] However, conditioning on Z—the common effect—creates a dependence between X and Y, as observing Z provides evidence that selects paths linking the two causes through the collider at Z.[9] This is the basis for "explaining away," where evidence for one cause (say, X) reduces the likelihood of the alternative cause (Y) given the observed effect Z, thereby inducing negative conditional dependence between the causes.

Overall, conditioning on Z can thus create new dependencies, remove existing ones, or even invert the direction of association between X and Y, fundamentally altering the dependence structure depending on the underlying causal relationships.[5] These dynamics underscore the importance of graphical models like directed acyclic graphs in visualizing how marginal and conditional dependencies interact.[9]
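Both directions can be checked numerically. The sketch below, which is illustrative rather than drawn from the cited sources, simulates a fork (Z causes both X and Y) and a collider (X and Y both cause Z) with Gaussian noise, then compares the marginal correlation of X and Y with a residual-based partial correlation given Z (a linear proxy for conditioning, developed further in the measures section below).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def partial_corr(x, y, z):
    """Correlation of x and y after removing the best linear fit on z (residual method)."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Fork: Z -> X and Z -> Y. X and Y are marginally dependent but
# (approximately) conditionally independent given Z.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
print("fork:     corr(X,Y) =", round(np.corrcoef(x, y)[0, 1], 3),
      " partial corr given Z =", round(partial_corr(x, y, z), 3))

# Collider: X -> Z <- Y. X and Y are marginally independent, but
# conditioning on Z induces a (strongly negative) dependence.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.1 * rng.normal(size=n)
print("collider: corr(X,Y) =", round(np.corrcoef(x, y)[0, 1], 3),
      " partial corr given Z =", round(partial_corr(x, y, z), 3))
```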
Formal Framework
Probabilistic Formulation
In probability theory, conditional dependence between two events A and B given a third event C with P(C) > 0 is defined as the failure of the factorization P(A \cap B \mid C) = P(A \mid C) P(B \mid C), that is, the inequality P(A \cap B \mid C) \neq P(A \mid C) P(B \mid C), where the conditional probability is given by P(A \mid C) = P(A \cap C)/P(C).[10] This inequality indicates that the occurrence of A affects the probability of B (or vice versa) even after accounting for C.[11]

For random variables, consider X, Y, and Z defined on a common probability space; the joint conditional probability mass or density function encapsulates the probabilistic structure. Specifically, the joint conditional distribution satisfies P(X, Y \mid Z) = P(X \mid Y, Z) P(Y \mid Z), derived from the chain rule for conditional probabilities: starting from the joint distribution P(X, Y, Z) = P(X \mid Y, Z) P(Y, Z) = P(X \mid Y, Z) P(Y \mid Z) P(Z), dividing by P(Z) yields the conditional form, assuming P(Z) > 0.[12] Conditional dependence holds when P(X \mid Y, Z) \neq P(X \mid Z) for some admissible values, or equivalently when P(X, Y \mid Z) \neq P(X \mid Z) P(Y \mid Z). Unconditional dependence arises as the special case where Z is a constant event with probability 1.[10]

In the discrete case, for random variables taking values in countable sets, the conditional joint probability mass function is p_{X,Y \mid Z}(x,y \mid z) = p_{X,Y,Z}(x,y,z) / p_Z(z) for p_Z(z) > 0, and the marginal conditionals are p_{X \mid Z}(x \mid z) = \sum_y p_{X,Y \mid Z}(x,y \mid z) and similarly for Y. Dependence occurs if p_{X,Y \mid Z}(x,y \mid z) \neq p_{X \mid Z}(x \mid z) p_{Y \mid Z}(y \mid z) for some x, y, z with p_Z(z) > 0.[13]

For continuous random variables with joint density f_{X,Y,Z}, the conditional joint density is f_{X,Y \mid Z}(x,y \mid z) = f_{X,Y,Z}(x,y,z) / f_Z(z) for f_Z(z) > 0, with marginal conditionals f_{X \mid Z}(x \mid z) = \int f_{X,Y \mid Z}(x,y \mid z) \, dy and analogously for Y. Conditional dependence is present when f_{X,Y \mid Z}(x,y \mid z) \neq f_{X \mid Z}(x \mid z) f_{Y \mid Z}(y \mid z) for some x, y, z with f_Z(z) > 0.[12]

From an axiomatic perspective in measure-theoretic probability, conditional dependence is framed using sigma-algebras. Let (\Omega, \mathcal{F}, P) be a probability space, and let \sigma(X), \sigma(Y), \sigma(Z) be the sigma-algebras generated by measurable functions X, Y, Z: \Omega \to \mathbb{R}, respectively. The random variables X and Y are conditionally dependent given Z if \sigma(X) and \sigma(Y) are not conditionally independent given \sigma(Z), meaning there exist events A \in \sigma(X), B \in \sigma(Y) such that P(A \cap B \mid \sigma(Z)) \neq P(A \mid \sigma(Z)) P(B \mid \sigma(Z)) on a set of positive probability, where conditional probability given a sigma-algebra is defined via the Radon-Nikodym derivative of the restricted measures.[14] Equivalently, X and Y are conditionally dependent given Z when there exist bounded measurable functions f on the range of X and g on the range of Y such that E[f(X) g(Y) \mid \sigma(Z)] \neq E[f(X) \mid \sigma(Z)] E[g(Y) \mid \sigma(Z)] on a set of positive probability. This setup ensures the formulation aligns with Kolmogorov's axioms extended to conditional expectations.[14]
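For the discrete case, the formulas above translate directly into array operations. The following sketch (illustrative code with a made-up example distribution; the helper name conditional_dependence_gap is hypothetical) takes a joint probability mass function stored as a 3-D NumPy array p[x, y, z], forms the conditional joint p_{X,Y|Z} and the conditional marginals p_{X|Z} and p_{Y|Z}, and reports the largest violation of the factorization.

```python
import numpy as np

def conditional_dependence_gap(p_xyz):
    """Largest |p(x,y|z) - p(x|z)p(y|z)| over all x, y and all z with p(z) > 0.

    p_xyz: 3-D array of joint probabilities indexed as p[x, y, z], summing to 1.
    A strictly positive return value indicates conditional dependence of X and Y given Z.
    """
    p_z = p_xyz.sum(axis=(0, 1))                      # p(z)
    valid = p_z > 0
    p_xy_given_z = p_xyz[:, :, valid] / p_z[valid]    # p(x, y | z)
    p_x_given_z = p_xy_given_z.sum(axis=1)            # p(x | z), shape (nx, nz)
    p_y_given_z = p_xy_given_z.sum(axis=0)            # p(y | z), shape (ny, nz)
    product = p_x_given_z[:, None, :] * p_y_given_z[None, :, :]
    return np.max(np.abs(p_xy_given_z - product))

# Example: a common cause. Z is a fair bit; given Z = z, X and Y are independent
# Bernoulli(0.9 if z == 1 else 0.1). X and Y are conditionally independent given Z
# (although marginally dependent), so the gap is ~0 up to floating-point roundoff.
p = np.zeros((2, 2, 2))
for z, q in ((0, 0.1), (1, 0.9)):
    for x in (0, 1):
        for y in (0, 1):
            p[x, y, z] = 0.5 * (q if x else 1 - q) * (q if y else 1 - q)
print(conditional_dependence_gap(p))
```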
Measure of Conditional Dependence
One prominent measure of conditional dependence is the conditional mutual information, denoted I(X; Y \mid Z), which quantifies the amount of information shared between random variables X and Y after conditioning on Z.[15] Defined in terms of entropies as I(X; Y \mid Z) = H(X \mid Z) + H(Y \mid Z) - H(X, Y \mid Z), where H(X \mid Z) is the conditional entropy of X given Z measuring the remaining uncertainty in X after observing Z, and similarly for the other terms, this metric captures the expected reduction in uncertainty about one variable from knowing the other, conditional on Z.[15] It equals zero if and only if X and Y are conditionally independent given Z, providing a symmetric, non-negative measure applicable to both discrete and continuous variables without assuming linearity.[15]

For jointly Gaussian random variables, partial correlation offers a computationally efficient alternative, measuring the correlation between X and Y after removing the linear effects of Z. The partial correlation coefficient is given by \rho_{XY \cdot Z} = \frac{\rho_{XY} - \rho_{XZ} \rho_{YZ}}{\sqrt{(1 - \rho_{XZ}^2)(1 - \rho_{YZ}^2)}}, where \rho_{XY}, \rho_{XZ}, and \rho_{YZ} are the pairwise Pearson correlation coefficients.[16] Under Gaussian assumptions, \rho_{XY \cdot Z} = 0 if and only if X and Y are conditionally independent given Z, enabling straightforward hypothesis tests for dependence via its standardized distribution.[16]

For non-linear dependencies, rank-based measures such as conditional Kendall's tau and conditional Spearman's rho extend unconditional rank correlations to the conditional setting. Conditional Kendall's tau assesses the concordance probability between X and Y given Z, providing a robust, distribution-free measure of monotonic dependence that ranges from -1 to 1.[17] Similarly, conditional Spearman's rho evaluates the correlation of ranks after conditioning, suitable for detecting non-linear associations in non-Gaussian data.[18] Kernel-based approaches, like the conditional Hilbert-Schmidt Independence Criterion (HSIC), embed variables into reproducing kernel Hilbert spaces to detect arbitrary dependence forms, with the criterion equaling zero under conditional independence and otherwise positive, scaled by kernel choices.[2]

These measures have specific limitations tied to their assumptions and practicality. Partial correlation assumes linearity and Gaussianity, potentially underestimating non-linear dependencies, while requiring inversion of covariance matrices that scales cubically with the dimension of Z.[16] Conditional mutual information, though versatile, demands entropy estimation, which is computationally intensive for high dimensions and sensitive to sample size in continuous cases.[15] Rank-based metrics like conditional Kendall's tau and Spearman's rho are robust to outliers but may lack power against weak or non-monotonic relations, and kernel methods such as conditional HSIC suffer from the curse of dimensionality due to kernel matrix computations, often requiring careful hyperparameter tuning.[17][18][2]
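As a concrete illustration of two of these measures, the sketch below (illustrative code, not from the cited sources) computes conditional mutual information for a discrete joint distribution directly from the entropy identity I(X; Y \mid Z) = H(X \mid Z) + H(Y \mid Z) - H(X, Y \mid Z), and evaluates the partial correlation coefficient from three hypothetical pairwise Pearson correlations using the formula given above.

```python
import numpy as np

def conditional_mutual_information(p_xyz):
    """I(X; Y | Z) in bits for a discrete joint pmf stored as p[x, y, z]."""
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    p_z = p_xyz.sum(axis=(0, 1))
    p_xz = p_xyz.sum(axis=1)
    p_yz = p_xyz.sum(axis=0)
    # H(X|Z) = H(X,Z) - H(Z), and similarly for the other conditional entropies.
    h_x_given_z = entropy(p_xz) - entropy(p_z)
    h_y_given_z = entropy(p_yz) - entropy(p_z)
    h_xy_given_z = entropy(p_xyz) - entropy(p_z)
    return h_x_given_z + h_y_given_z - h_xy_given_z

def partial_correlation(r_xy, r_xz, r_yz):
    """rho_{XY.Z} from the three pairwise Pearson correlations."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Collider example: X, Y independent fair bits, Z = max(X, Y).
# I(X; Y | Z) is about 0.19 bits even though I(X; Y) = 0.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, max(x, y)] = 0.25
print(conditional_mutual_information(p))

# Hypothetical pairwise correlations, for illustration only.
print(partial_correlation(0.5, 0.7, 0.7))   # approximately 0.02
```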
Properties and Theorems
Basic Properties
Conditional dependence exhibits symmetry: if random variables X and Y are conditionally dependent given Z, then Y and X are also conditionally dependent given Z. This property arises directly from the definitional equivalence p(x \mid y, z) \neq p(x \mid z) if and only if p(y \mid x, z) \neq p(y \mid z).[11]

Measures of conditional dependence, such as conditional mutual information I(X; Y \mid Z), possess non-negativity, satisfying I(X; Y \mid Z) \geq 0, with equality holding if and only if X and Y are conditionally independent given Z. This non-negativity stems from the interpretation of conditional mutual information as a Kullback-Leibler divergence, which is inherently non-negative. Additionally, conditional mutual information is symmetric, as I(X; Y \mid Z) = I(Y; X \mid Z).[19]

Conditional dependence does not carry over to unconditional dependence: dependence between X and Y given Z does not imply dependence between X and Y unconditionally. A counterexample arises when X and Y are marginally independent yet become dependent upon conditioning on Z, such as when Z acts as a common effect (collider) of X and Y.[20]

Conditional dependence integrates with marginal distributions through the chain rule of probability, which expresses the joint distribution p(x, y, z) as a product of conditional probabilities, such as p(x, y, z) = p(z) p(x \mid z) p(y \mid x, z). In this factorization, conditional dependence between X and Y given Z manifests in the term p(y \mid x, z) deviating from p(y \mid z), thereby aggregating local dependencies into the overall joint structure while preserving the marginals.[21]
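A quick numerical check of two of these properties is sketched below for an arbitrary, made-up, strictly positive joint distribution: the chain-rule factorization p(x, y, z) = p(z) p(x \mid z) p(y \mid x, z) reconstructs the joint exactly, and the dependence criterion gives the same verdict whether stated in terms of X given Y or Y given X. This is illustrative code, not part of the cited sources.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 3, 2))
p /= p.sum()                                      # arbitrary strictly positive joint p[x, y, z]

p_z = p.sum(axis=(0, 1))                          # p(z)
p_xz = p.sum(axis=1)                              # p(x, z)
p_x_given_z = p_xz / p_z                          # p(x | z)
p_y_given_xz = p / p_xz[:, None, :]               # p(y | x, z)
p_y_given_z = p.sum(axis=0) / p_z                 # p(y | z)
p_x_given_yz = p / p.sum(axis=0)[None, :, :]      # p(x | y, z)

# Chain rule: p(z) p(x|z) p(y|x,z) reproduces the joint.
reconstructed = p_z[None, None, :] * p_x_given_z[:, None, :] * p_y_given_xz
print(np.allclose(reconstructed, p))              # True

# Symmetry: "p(x|y,z) differs from p(x|z) somewhere" and "p(y|x,z) differs from
# p(y|z) somewhere" always agree for well-defined conditionals.
dep_xy = not np.allclose(p_x_given_yz, p_x_given_z[:, None, :])
dep_yx = not np.allclose(p_y_given_xz, p_y_given_z[None, :, :])
print(dep_xy, dep_yx)                             # both True for this random joint
```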
Key Theorems
The Hammersley-Clifford theorem establishes a foundational link between conditional independence structures in Markov random fields and the factorization of their joint distributions. Specifically, for a finite undirected graph G = (V, E) and random variables X_V = (X_v)_{v \in V} with strictly positive joint probability distribution P(X_V) > 0 that satisfies the local Markov property with respect to G—meaning that each X_v is conditionally independent of X_{V \setminus (N(v) \cup \{v\})} given X_{N(v)}, where N(v) is the set of neighbors of v—the distribution admits a factorization over the maximal cliques \mathcal{C} of G: P(X_V) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C), where Z is the normalizing constant and each \psi_C is a non-negative potential function defined on the variables in clique C. This implies that the conditional dependence relations encoded by the graph's separation properties are fully captured by interactions within cliques, enabling the representation of complex dependence structures through local potentials in graphical models. A high-level proof outline proceeds by constructing the potentials iteratively from the conditional distributions implied by the Markov property, ensuring the product reproduces the joint via telescoping factorization and normalization; positivity guarantees that all conditional distributions used in the construction are well-defined, and the theorem can fail for distributions with zero probabilities.[22]

The decomposition property governs how conditional independence over composite sets implies independence over subsets, with direct implications for conditional dependence as its contrapositive. For conditional independence, if X \perp\!\!\!\perp (Y, W) \mid Z, then X \perp\!\!\!\perp Y \mid Z and X \perp\!\!\!\perp W \mid Z. Equivalently, for conditional dependence (the negation), if X \not\perp\!\!\!\perp Y \mid Z or X \not\perp\!\!\!\perp W \mid Z (i.e., X depends on at least one of Y or W given Z), then X \not\perp\!\!\!\perp (Y, W) \mid Z. This property, part of the semi-graphoid axioms, ensures that dependence of X on either component given Z is inherited by the composite (Y, W), although the converse need not hold. A proof sketch for the independence direction uses marginalization: integrate the joint conditional density p(x, y, w \mid z) = p(x \mid z) p(y, w \mid z) over w to obtain p(x, y \mid z) = p(x \mid z) p(y \mid z), and similarly for the other subset; the dependence contrapositive follows immediately.[23]

The intersection property further characterizes compositions of conditional independences, again with nuanced implications for dependence. For conditional independence under strictly positive distributions, if X \perp\!\!\!\perp Y \mid Z \cup W and X \perp\!\!\!\perp W \mid Z \cup Y, then X \perp\!\!\!\perp (Y, W) \mid Z. This axiom completes the graphoid properties, allowing inference of broader independences from restricted ones, but it fails without positivity—in distributions with zero probabilities, the property may not hold, so conditional dependences can persist that are not implied by the graph structure. For conditional dependence, the contrapositive reads: if X \not\perp\!\!\!\perp (Y, W) \mid Z, then either X \not\perp\!\!\!\perp Y \mid Z \cup W or X \not\perp\!\!\!\perp W \mid Z \cup Y; without positivity, however, joint dependence need not propagate to either restricted dependence, complicating graphical representations.
A high-level proof sketch relies on the definition: from X \perp\!\!\!\perp Y \mid Z \cup W, p(x \mid y, z, w) = p(x \mid z, w); from X \perp\!\!\!\perp W \mid Z \cup Y, p(x \mid y, z, w) = p(x \mid y, z). Hence p(x \mid z, w) = p(x \mid y, z) for all admissible y and w, so this common value depends on neither y nor w; averaging over w given z shows it equals p(x \mid z), yielding p(x \mid y, z, w) = p(x \mid z), which is the desired joint independence. Positivity ensures all conditionals are well-defined via Bayes' rule without division by zero. In information-theoretic terms, the chain rule I(X; (Y, W) \mid Z) = I(X; W \mid Z) + I(X; Y \mid Z \cup W) = I(X; Y \mid Z) + I(X; W \mid Z \cup Y) shows that the two premises force I(X; (Y, W) \mid Z) = I(X; W \mid Z) = I(X; Y \mid Z); positivity is then needed to conclude that this common value is zero.[23]
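The positivity caveat can be seen in a standard counterexample, checked numerically below (an illustrative sketch, not from the cited sources): let X = Y = W be a single fair bit copied three times, with Z trivial. The code verifies that X is conditionally independent of Y given W and of W given Y, yet X depends on the pair (Y, W), so the intersection property fails for this non-positive distribution.

```python
from itertools import product

# Degenerate joint distribution: X = Y = W, each value 0 or 1 with probability 1/2.
# Many outcomes have probability zero, so the distribution is not strictly positive.
joint = {(x, y, w): 0.5 if x == y == w else 0.0
         for x, y, w in product((0, 1), repeat=3)}

def cond_indep(a_idx, b_idx, c_idx):
    """Check whether components a and b are conditionally independent given component c,
    where the indices refer to positions in the (x, y, w) outcome tuples."""
    def marg(pred):
        return sum(p for o, p in joint.items() if pred(o))
    for c in (0, 1):
        p_c = marg(lambda o: o[c_idx] == c)
        if p_c == 0:
            continue
        for a, b in product((0, 1), repeat=2):
            p_ab = marg(lambda o: o[a_idx] == a and o[b_idx] == b and o[c_idx] == c) / p_c
            p_a = marg(lambda o: o[a_idx] == a and o[c_idx] == c) / p_c
            p_b = marg(lambda o: o[b_idx] == b and o[c_idx] == c) / p_c
            if abs(p_ab - p_a * p_b) > 1e-12:
                return False
    return True

X, Y, W = 0, 1, 2
print(cond_indep(X, Y, W))   # True: X independent of Y given W
print(cond_indep(X, W, Y))   # True: X independent of W given Y
# Jointly, however, (Y, W) determines X, so X depends on the pair (Y, W):
p_x1_given_yw = joint[(1, 1, 1)] / sum(p for o, p in joint.items() if o[1] == 1 and o[2] == 1)
p_x1 = sum(p for o, p in joint.items() if o[0] == 1)
print(p_x1_given_yw, p_x1)   # 1.0 versus 0.5: the factorization fails
```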
Examples and Illustrations
Elementary Example
Consider two fair coins flipped independently, resulting in random variables X and Y, where 1 denotes heads and 0 denotes tails, each with P(X=1) = P(Y=1) = 0.5. Define Z = X \oplus Y (the XOR operation), so Z = 0 if the outcomes match (both heads or both tails) and Z = 1 if they differ. In this setup, Z acts as a signal of outcome consistency, indicating whether the two flips match (Z=0) or disagree (Z=1).

Marginally, X and Y are independent, as their joint distribution factors: P(X,Y) = P(X)P(Y), with each of the four outcomes (X,Y) = (0,0), (0,1), (1,0), (1,1) having probability 0.25. Consequently, P(X=1,Y=1) = 0.25 = P(X=1)P(Y=1). Also, P(Z=0) = P(Z=1) = 0.5.

However, conditioning on Z=0 induces dependence between X and Y. The conditional joint probabilities are P(X=0,Y=0 \mid Z=0) = 0.5, P(X=1,Y=1 \mid Z=0) = 0.5, and P(X=0,Y=1 \mid Z=0) = P(X=1,Y=0 \mid Z=0) = 0. The marginals are P(X=0 \mid Z=0) = P(X=1 \mid Z=0) = 0.5 and similarly for Y. Thus, P(X=1,Y=1 \mid Z=0) = 0.5 \neq 0.25 = P(X=1 \mid Z=0) P(Y=1 \mid Z=0), demonstrating conditional dependence. A similar inequality holds for Z=1. The full joint probability distribution over X, Y, Z is given in the following table:

| X | Y | Z | P(X,Y,Z) |
|---|---|---|---|
| 0 | 0 | 0 | 0.25 |
| 0 | 1 | 1 | 0.25 |
| 1 | 0 | 1 | 0.25 |
| 1 | 1 | 0 | 0.25 |
The conditional joint distribution of X and Y given Z = 0 is shown below:

|     | Y=0 | Y=1 |
|---|---|---|
| X=0 | 0.5 | 0 |
| X=1 | 0 | 0.5 |
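The probabilities in this example can also be reproduced by direct simulation. The following minimal sketch (illustrative code, not part of the cited sources) draws many pairs of fair coin flips, sets Z = X XOR Y, and estimates the marginal and conditional quantities discussed above; the estimates converge to the exact values 0.25, 0.5, and 0.25 appearing in the tables.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
z = x ^ y                          # Z = X XOR Y

# Marginal independence: P(X=1, Y=1) is close to P(X=1) * P(Y=1) = 0.25.
print(np.mean((x == 1) & (y == 1)), np.mean(x == 1) * np.mean(y == 1))

# Conditional dependence given Z = 0:
# P(X=1, Y=1 | Z=0) is about 0.5, while P(X=1 | Z=0) * P(Y=1 | Z=0) is about 0.25.
match = (z == 0)
p_joint = np.mean((x == 1) & (y == 1) & match) / np.mean(match)
p_x = np.mean((x == 1) & match) / np.mean(match)
p_y = np.mean((y == 1) & match) / np.mean(match)
print(p_joint, p_x * p_y)
```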