Causal Markov condition
The Causal Markov Condition (CMC), also known as the Markov assumption in causal modeling, is a core principle in causal inference stating that, for a set of variables \mathbf{V} whose causal structure is represented by a directed acyclic graph (DAG) \mathbf{G}, each variable X_i \in \mathbf{V} is conditionally independent of all its non-descendants given its direct causes, or parents \mathbf{PA}(X_i), in \mathbf{G}.[1] Formally, this is expressed as X_i \perp\!\!\!\perp \mathrm{NonDesc}(X_i) \mid \mathbf{PA}(X_i), where \mathrm{NonDesc}(X_i) denotes the non-descendants of X_i.[2] This condition implies that the joint probability distribution over \mathbf{V} factorizes according to the DAG structure, P(\mathbf{v}) = \prod_i P(v_i \mid \mathbf{pa}_i), linking graphical causal representations directly to observable probabilistic dependencies.[2] Developed within the framework of structural causal models (SCMs) by Judea Pearl, the CMC serves as a foundational axiom that enables the translation of causal diagrams into testable conditional independence statements, facilitating inference about interventions and counterfactuals from data.[3]

The CMC assumes acyclicity of the causal graph (no feedback loops), mutual independence of the exogenous error terms (disturbances) driving each variable, and often causal sufficiency (no unmeasured confounders among the observed variables).[1] Violations of these assumptions, such as latent common causes, require extensions like latent variable models or acyclic directed mixed graphs (ADMGs).[1]

The CMC underpins key methodologies in causal discovery and estimation, including the d-separation criterion for reading off conditional independencies from the graph and the do-calculus for identifying causal effects via adjustment sets (e.g., the back-door criterion).[2] For instance, it ensures that conditioning on a variable's parents screens off spurious associations, allowing unbiased estimation of direct effects.[3] When paired with the faithfulness assumption, which states that the only conditional independencies in the distribution are those entailed by the graph, the CMC supports algorithms such as PC (Peter-Clark) for learning causal structures from observational data.[4] However, its empirical validity depends on the absence of unmodeled influences, making it a subject of philosophical debate regarding determinism, invariance, and the completeness of causal models.[5]

Prerequisites
Conditional Independence
Conditional independence is a key concept in probability theory that describes a relationship between random variables in which the dependence between two variables is removed or "screened off" by a third. For disjoint sets of random variables X, Y, and Z, X is conditionally independent of Y given Z, denoted X \perp Y \mid Z, if P(X \mid Y, Z) = P(X \mid Z) for all values where P(Y, Z) > 0. Equivalently, this holds if P(X, Y \mid Z) = P(X \mid Z) P(Y \mid Z).[6][7] This notion generalizes unconditional independence, which is the special case where Z provides no additional information (e.g., Z is empty or constant), reducing to P(X, Y) = P(X) P(Y). Conditional independence allows for more nuanced modeling of dependencies in multivariate distributions, enabling the joint probability P(X, Y, Z) to factor as P(Z) P(X \mid Z) P(Y \mid Z), which significantly simplifies computation and storage for high-dimensional data by breaking complex joint distributions into conditional factors.[8][9]

A simple example illustrates this: suppose a box contains one fair coin and one two-headed coin, and one is selected at random and flipped twice. Let A be the event of heads on the first flip, B heads on the second, and C the selection of the fair coin. Unconditionally, P(A \cap B) = 5/8 \neq (3/4)(3/4) = P(A) P(B), so A and B are dependent. However, given C, P(A \mid B, C) = 1/2 = P(A \mid C), confirming A \perp B \mid C, since the two flips are independent once the selected coin is known (a simulation of this example is sketched after the axioms below). Conversely, consider rolling two fair dice, where X is the outcome of the first, Y the second, and Z their sum; unconditionally X and Y are independent, but conditioning on Z (e.g., the sum being 7) introduces dependence.[6]

The structure of conditional independence is governed by the semi-graphoid axioms, a minimal set of properties satisfied by probabilistic conditional independence that allow inference of new independencies from known ones. These include:
- Symmetry: X \perp Y \mid Z \iff Y \perp X \mid Z
- Decomposition: If X \perp (Y \cup W) \mid Z, then X \perp Y \mid Z and X \perp W \mid Z
- Weak union: If X \perp (Y \cup W) \mid Z, then X \perp Y \mid (Z \cup W)
- Contraction: If X \perp Y \mid (Z \cup W) and X \perp W \mid Z, then X \perp (Y \cup W) \mid Z
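The screening-off in the coin-and-box example can be checked numerically. The following Python sketch is a minimal Monte Carlo illustration, not taken from the cited sources; the function name, seed, and trial count are arbitrary choices. It estimates the relevant probabilities with and without conditioning on the coin choice C.

```python
import random

def simulate(n_trials=200_000, seed=0):
    """Monte Carlo check of the coin-and-box example: two flips of a randomly
    chosen coin (one fair, one two-headed) are dependent unconditionally,
    but independent given which coin was selected."""
    rng = random.Random(seed)
    joint = {"A": 0, "B": 0, "AB": 0}          # unconditional counts
    fair = {"n": 0, "A": 0, "B": 0, "AB": 0}   # counts restricted to trials with the fair coin
    for _ in range(n_trials):
        fair_coin = rng.random() < 0.5          # event C: the fair coin is chosen
        p_heads = 0.5 if fair_coin else 1.0
        a = rng.random() < p_heads              # event A: heads on flip 1
        b = rng.random() < p_heads              # event B: heads on flip 2
        joint["A"] += a
        joint["B"] += b
        joint["AB"] += a and b
        if fair_coin:
            fair["n"] += 1
            fair["A"] += a
            fair["B"] += b
            fair["AB"] += a and b
    # Unconditionally: P(A and B) ~ 5/8, while P(A)P(B) ~ 9/16, so A and B are dependent.
    print("P(A&B)   =", joint["AB"] / n_trials,
          "  P(A)P(B)     =", (joint["A"] / n_trials) * (joint["B"] / n_trials))
    # Given C: P(A and B | C) ~ 1/4 = P(A|C)P(B|C), so A is independent of B given C.
    print("P(A&B|C) =", fair["AB"] / fair["n"],
          "  P(A|C)P(B|C) =", (fair["A"] / fair["n"]) * (fair["B"] / fair["n"]))

simulate()
```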
Directed Acyclic Graphs
In causal inference, a directed acyclic graph (DAG) serves as the foundational structure for modeling causal relationships among variables. It consists of nodes representing random variables and directed edges indicating direct causal influences from one variable to another, with the critical property of acyclicity ensuring that no directed cycles exist in the graph. Acyclicity rules out circular causation and feedback loops, allowing for a clear temporal or hierarchical ordering of influences.

Key components of a causal DAG include the notions of parents, children, ancestors, descendants, and non-descendants relative to each node. Parents of a node are the variables with edges pointing directly into it, signifying immediate causal effects; children are the nodes it points directly into, representing variables directly affected. Ancestors comprise all nodes from which there is a directed path to the given node, while descendants are all nodes reachable from it by a directed path; non-descendants are all other variables that are not its descendants, a set that includes its ancestors as well as unrelated variables.

Constructing a causal DAG involves specifying edges only for direct causal mechanisms while adhering to rules that maintain acyclicity. Edges are drawn based on domain knowledge of causal influences, and omitted where no direct effect is hypothesized, which implicitly encodes assumptions of conditional independence. Cycles must be avoided to eliminate feedback loops that would complicate causal interpretation, often by arranging variables in a topological ordering in which causes precede effects.

Basic graphical criteria in DAGs include paths, chains, forks, and colliders, which describe the structural arrangements of edges. A path is any sequence of edges connecting nodes, regardless of direction. Chains represent serial connections, such as X \to Z \to Y, indicating mediation. Forks depict diverging influences from a common cause, like X \leftarrow Z \to Y, while colliders show converging effects on a common effect, as in X \to Z \leftarrow Y. These structures provide the syntactic building blocks for analyzing causal pathways without implying probabilistic dependencies.
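These structural notions can be made concrete in a few lines of code. The sketch below is an illustrative helper only; the dictionary-based graph encoding and function names are choices made here, not part of any cited formalism. It encodes a small DAG as an adjacency mapping and derives parents, descendants, and non-descendants for each node.

```python
from itertools import chain

# A small causal DAG written as parent -> list of children:
# a fork at Z (Z -> X, Z -> Y) followed by a chain X -> W.
children = {"Z": ["X", "Y"], "X": ["W"], "Y": [], "W": []}

def descendants(node, graph):
    """All nodes reachable from `node` via directed paths (excluding the node itself)."""
    seen, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen

def parents(node, graph):
    """All nodes with an edge pointing directly into `node`."""
    return {p for p, kids in graph.items() if node in kids}

nodes = set(chain(children, *children.values()))
for v in sorted(nodes):
    desc = descendants(v, children)
    non_desc = nodes - desc - {v}   # everything that is neither v nor a descendant of v
    print(f"{v}: parents={parents(v, children)}, descendants={desc}, non-descendants={non_desc}")
```

Running this shows, for example, that Z is a non-descendant of X even though it is an ancestor of X, matching the definition above.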
Definition
Local Causal Markov Condition
The local causal Markov condition is a fundamental assumption in causal inference that links the structure of a causal directed acyclic graph (DAG) to the conditional independencies in the associated probability distribution. Formally, in a causal DAG G over variables V, every variable X \in V is conditionally independent of its non-descendants given its parents: X \perp \mathrm{NonDesc}(X) \mid \mathrm{Pa}(X), where \mathrm{Pa}(X) denotes the set of direct causes (parents) of X in G, i.e., the variables with edges pointing into X. The condition posits that the parents of X fully screen off X from all other variables that are not causally affected by X.

The term "non-descendants" refers to the set of variables in V \setminus \{X\} that are not reachable from X via any directed path in G; this set includes the ancestors of X (variables from which X is reachable) and any unrelated variables, but excludes the descendants of X (variables influenced by X). In other words, non-descendants encompass all variables whose probabilistic relationship with X is mediated solely by the causal structure upstream of or parallel to X, with no downstream effects from X.

This local property emerges from the assumptions underlying structural causal models: direct causation is represented by edges in the DAG, and each variable X depends functionally only on its parents and an independent exogenous noise term U_X, with no unobserved confounding between mechanisms. Specifically, the model specifies X = f_X(\mathrm{Pa}(X), U_X), where the noise terms U are mutually independent across variables. Given the parents, the noise U_X renders X independent of its non-descendants: any common influences are captured by the parents, and the mutual independence of the noise terms ensures no residual dependencies.

Under the local causal Markov condition, the joint probability distribution P(V) over the variables factorizes recursively according to the DAG structure: P(V) = \prod_{X \in V} P(X \mid \mathrm{Pa}(X)). This factorization follows directly from the chain rule of probability and the conditional independencies imposed by the condition, enabling complex causal systems to be represented through local mechanisms.
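As an illustration, the following Python code is a minimal linear-Gaussian sketch, not drawn from the cited sources; the coefficients and the partial-correlation test are arbitrary choices made for exposition. It simulates a structural causal model over the DAG with the fork Z \to X, Z \to Y and the chain X \to W, and checks the independence X \perp Y \mid Z predicted by the local condition (Y is a non-descendant of X, and Z is X's only parent).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Linear SCM consistent with the DAG Z -> X, Z -> Y, X -> W:
# each variable is a function of its parents plus an independent noise term.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = -0.5 * Z + rng.normal(size=n)
W = 1.2 * X + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c
    (approximately zero implies conditional independence in the Gaussian case)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# X and Y are marginally correlated through the fork at Z, but conditioning
# on X's parent Z screens off the non-descendant Y, as the local condition predicts.
print("corr(X, Y)            =", round(np.corrcoef(X, Y)[0, 1], 3))
print("partial corr(X, Y | Z) =", round(partial_corr(X, Y, Z), 3))
```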
Global Causal Markov Condition
The global causal Markov condition provides a comprehensive characterization of the conditional independencies in a causal directed acyclic graph (DAG). For a causal DAG G over a set of variables V, and any disjoint subsets A, B, C \subseteq V, the condition states that if there is no active path between A and B when conditioning on C in G (i.e., A and B are d-separated given C), then A is conditionally independent of B given C in the joint distribution P(V), denoted as:

A \perp_d B \mid C \; [G] \implies A \perp B \mid C.
This formulation applies to arbitrary disjoint sets of nodes, in contrast to the node-specific focus of the local causal Markov condition, thereby capturing the full set of independencies encoded by the graph's structure.[10][11] The d-separation criterion serves as the graphical test for these independencies, determining whether all paths between A and B are blocked by C; such blocking prevents informational flow and implies the corresponding probabilistic independence under the condition.[10] For DAGs, the global causal Markov condition is equivalent to the local causal Markov condition, which serves as its foundational building block. The proof of this equivalence proceeds by showing that the local condition implies a recursive factorization of the joint distribution:
P(V) = \prod_{X \in V} P(X \mid \mathrm{Pa}_G(X)),

where \mathrm{Pa}_G(X) denotes the parents of node X in G. This factorization encodes all conditional independencies via the graph's topology, directly yielding the global set-based independencies through the properties of d-separation.[10][12]
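The d-separation test itself is purely graphical and can be automated. The sketch below is an illustrative implementation using the standard moralization criterion (restrict to the ancestral graph of A \cup B \cup C, moralize, delete C, and test connectivity); the graph encoding and function name are choices made here and are not taken from the cited sources. Its output lists exactly the independencies that the global causal Markov condition guarantees to hold in P(V).

```python
from itertools import combinations

def d_separated(graph, a, b, conditioning):
    """Decide whether node sets `a` and `b` are d-separated given `conditioning`
    in a DAG. `graph` maps each node to the set of its parents."""
    relevant = set(a) | set(b) | set(conditioning)
    # 1. Ancestral subgraph: the relevant nodes and all of their ancestors.
    ancestors, stack = set(), list(relevant)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(graph[node])
    # 2. Moralize: undirected parent-child edges plus edges between co-parents.
    edges = set()
    for child in ancestors:
        pars = [p for p in graph[child] if p in ancestors]
        for p in pars:
            edges.add(frozenset((p, child)))
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))
    # 3. Delete the conditioning nodes and check whether a can still reach b.
    allowed = ancestors - set(conditioning)
    reached, stack = set(), [n for n in a if n in allowed]
    while stack:
        node = stack.pop()
        if node in reached:
            continue
        reached.add(node)
        for edge in edges:
            if node in edge:
                (other,) = edge - {node}
                if other in allowed:
                    stack.append(other)
    return not (reached & set(b))

# Example: Z is a common cause of X and Y, and W is a common effect (collider).
dag = {"Z": set(), "X": {"Z"}, "Y": {"Z"}, "W": {"X", "Y"}}
print(d_separated(dag, {"X"}, {"Y"}, {"Z"}))        # True: fork blocked, collider path closed
print(d_separated(dag, {"X"}, {"Y"}, set()))        # False: open path X <- Z -> Y
print(d_separated(dag, {"X"}, {"Y"}, {"Z", "W"}))   # False: conditioning on the collider W opens X -> W <- Y
```

Under the global causal Markov condition, the first query implies X \perp Y \mid Z in any distribution generated by this DAG, while the other two queries carry no such guarantee.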