Conditional probability distribution
In probability theory and statistics, a conditional probability distribution is the probability distribution of a random variable given the realized value of one or more other random variables, quantifying the likelihood of various outcomes under specified conditions.[1] It arises from the rules of conditional probability and is fundamental for modeling dependencies between variables in bivariate or multivariate settings.[2]

For discrete random variables X and Y, the conditional probability mass function (PMF) of X given Y = y is defined as P_{X|Y}(x|y) = \frac{P_{X,Y}(x,y)}{P_Y(y)} for P_Y(y) > 0, where P_{X,Y}(x,y) is the joint PMF and P_Y(y) is the marginal PMF of Y.[3] This conditional PMF satisfies the properties of a valid PMF: it is non-negative and sums to 1 over all possible x.[2] Similarly, for continuous random variables, the conditional probability density function (PDF) of X given Y = y is f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} for f_Y(y) > 0, where f_{X,Y}(x,y) is the joint PDF and f_Y(y) is the marginal PDF of Y; this conditional PDF integrates to 1 over the real line.[1] The joint distribution can be recovered as the product of the conditional and the marginal: P_{X,Y}(x,y) = P_{X|Y}(x|y) \cdot P_Y(y) in the discrete case, and analogously in the continuous case.[3]

Key properties include non-symmetry (P_{X|Y} \neq P_{Y|X} in general) and the special case of independence: the random variables are independent if and only if the conditional distribution equals the unconditional (marginal) distribution, i.e., P_{X|Y}(x|y) = P_X(x) for all x and y.[2] Conditional distributions enable computation of expectations, variances, and other moments restricted to subpopulations, such as the conditional mean E[X|Y=y], which plays a central role in regression and Bayesian inference.[1] They are widely applied in fields such as statistics, machine learning, and decision theory to update beliefs based on new evidence.[3]
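As an illustrative sketch of the continuous-case formula above, the following Python snippet numerically forms f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y) and checks that it integrates to 1; the joint density f(x,y) = x + y on the unit square is an assumed toy example, not drawn from the cited sources.

```python
import numpy as np

# Assumed toy joint density on the unit square: f(x, y) = x + y for 0 <= x, y <= 1
def joint_pdf(x, y):
    return x + y

xs = np.linspace(0.0, 1.0, 2001)   # grid over the support of X
dx = xs[1] - xs[0]

def marginal_y(y):
    # f_Y(y) = integral of f(x, y) dx over [0, 1], approximated by a Riemann sum
    return float(np.sum(joint_pdf(xs, y)) * dx)

y0 = 0.3
cond_pdf = joint_pdf(xs, y0) / marginal_y(y0)   # f_{X|Y}(x | y0) on the grid

print(marginal_y(y0))                 # ~0.8, i.e. y0 + 1/2 for this toy density
print(float(np.sum(cond_pdf) * dx))   # ~1.0: the conditional PDF integrates to 1
```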
Fundamentals

Definition
In probability theory, a conditional probability distribution describes the probability distribution of a random variable conditional on the value of another random variable or the occurrence of a specific event, effectively restricting the probability measure to that conditioning information. This concept extends the basic notion of conditional probability, defined for events, to the entire distribution of a random variable, allowing probabilities to be assessed across its possible outcomes given partial knowledge about related variables.[4]

Formally, consider random variables X and Y defined on a probability space (\Omega, \mathcal{F}, P). The conditional distribution of X given Y = y is given by the family of conditional probabilities P(X \in A \mid Y = y) for all measurable sets A \subseteq \mathbb{R}, where y lies in the support of Y. This defines a probability measure on the space of X, normalized so that the total probability sums to 1 over the possible values or integrates to 1 over the range of X.[5] Unlike the unconditional (marginal) distribution of X, which captures the overall probabilities without additional constraints, the conditional distribution incorporates the observed value y to update and refine these probabilities, often leading to a narrower or shifted spread depending on the dependence between X and Y. This updating process is fundamental to inference and modeling in probabilistic systems.[2]

The foundational elements are probability spaces, comprising a sample space \Omega of possible outcomes, an event algebra \mathcal{F}, and a probability measure P: \mathcal{F} \to [0,1] satisfying Kolmogorov's axioms, together with random variables as measurable functions from \Omega to \mathbb{R}. These prerequisites enable the rigorous construction of conditional distributions.[6]
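To make the family of probabilities P(X \in A \mid Y = y) concrete in the simplest (discrete) setting, the sketch below uses an assumed toy joint PMF and evaluates one such conditional probability by summing the joint PMF over A and dividing by the marginal p_Y(y); the numbers are purely illustrative.

```python
# Assumed toy joint PMF p_{X,Y}(x, y) on a finite sample space (values sum to 1)
joint = {(0, 0): 0.10, (1, 0): 0.25, (2, 0): 0.15,
         (0, 1): 0.20, (1, 1): 0.05, (2, 1): 0.25}

def conditional_prob(A, y):
    """P(X in A | Y = y) = sum over x in A of p_{X,Y}(x, y), divided by p_Y(y) > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)                 # marginal p_Y(y)
    p_A_and_y = sum(p for (x, yy), p in joint.items() if yy == y and x in A)
    return p_A_and_y / p_y

# P(X in {1, 2} | Y = 0) = (0.25 + 0.15) / 0.50 = 0.8
print(conditional_prob({1, 2}, y=0))
```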
Notation and Interpretation

The standard notation for a conditional probability distribution distinguishes between discrete and continuous random variables. For discrete random variables X and Y, the conditional probability mass function is denoted P_{X|Y}(x|y) = P(X = x \mid Y = y), which gives the probability that X takes the value x given that Y takes the value y.[3] For continuous random variables, the conditional probability density function is denoted f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y), representing the density of X at x given Y = y.[1] In a more general sense, P(X \mid Y) refers to the entire conditional distribution of X given Y, which is a random probability measure that depends on the value of Y.[7] The vertical bar | is the conventional symbol used to denote conditioning in these notations, separating the conditioned variable from the conditioning one, as in P(X \mid Y = y).[1]

Intuitively, a conditional probability distribution can be interpreted as updating prior beliefs about a random variable based on new information from the conditioning variable; for instance, in weather forecasting, the distribution of rainfall amounts given a measured temperature refines predictions by restricting possibilities to scenarios consistent with that temperature.[8][9]

Conditioning can apply to events or to random variables, leading to distinct notational forms. When conditioning on an event A with positive probability, the notation P(X \mid A) describes the distribution of X restricted to outcomes in A, often using indicator functions to formalize the event.[1] In contrast, P(X \mid Y) conditions on the random variable Y, yielding a family of distributions parameterized by the possible values of Y, which is essential for handling continuous cases where individual values have zero probability.[3] In degenerate cases, such as when the conditioning set has zero measure in continuous spaces, the conditional distribution may be represented using the Dirac delta function \delta, which concentrates the probability mass at a point while preserving integrability; this generalized function keeps the formalism consistent even for singular distributions.[7]
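As a small sketch of the event-versus-variable distinction drawn above, the snippet below conditions a single fair die roll X on the event A = \{X is even\} (an assumed example): the unconditional PMF is restricted to A and renormalized by P(A), which is the discrete meaning of P(X \mid A).

```python
from fractions import Fraction

# Unconditional PMF of a fair six-sided die roll X
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Conditioning on the event A = {X is even}: restrict to A and renormalize by P(A)
A = {2, 4, 6}
p_A = sum(pmf[x] for x in A)                      # P(A) = 1/2
cond_on_event = {x: pmf[x] / p_A for x in A}      # P(X = x | A) = 1/3 for each x in A

print(cond_on_event)   # {2: Fraction(1, 3), 4: Fraction(1, 3), 6: Fraction(1, 3)}
```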
Discrete Case

Conditional Probability Mass Function
In the discrete case, the conditional probability mass function (PMF) of a random variable X given another discrete random variable Y = y extends the fundamental concept of conditional probability to the probability mass over the support of X.[10] For y in the support of Y, where p_Y(y) > 0, the conditional PMF is defined as p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, where p_{X,Y}(x,y) is the joint PMF of X and Y, and p_Y(y) is the marginal PMF of Y.[10][11] This formula follows directly from the definition of conditional probability applied to the events \{X = x\} and \{Y = y\}: P(X = x \mid Y = y) = P(X = x, Y = y) / P(Y = y). Equivalently, the joint PMF can be factored as p_{X,Y}(x,y) = p_{X|Y}(x|y) \, p_Y(y), and solving for the conditional term yields the expression above; this factorization is consistent with the law of total probability, since summing over x recovers the marginal p_Y(y) = \sum_x p_{X,Y}(x,y).[12][13]

The support of the conditional PMF p_{X|Y}(\cdot|y) consists of all x such that p_{X,Y}(x,y) > 0, and it is zero elsewhere; thus the domain of the conditional distribution may vary with the specific value of y, but it always forms a valid PMF summing to 1 over its support.[11][14]

To compute the conditional PMF, consider the outcomes of two independent fair six-sided dice, with X the result of the first die and Y the result of the second; the joint PMF is uniform at p_{X,Y}(x,y) = 1/36 for x, y = 1, \dots, 6. The marginal PMF of Y is p_Y(y) = 1/6 for each y. Thus, for any fixed y, the conditional PMF is p_{X|Y}(x|y) = (1/36) / (1/6) = 1/6 for x = 1, \dots, 6, independent of y. This can be visualized in the joint PMF table below, where the conditional PMF for a specific y (e.g., y = 3) is obtained by dividing the corresponding column of the joint table by the marginal p_Y(3) = 1/6; a computational sketch of this division follows the table.

| x \backslash y | 1 | 2 | 3 | ... | 6 |
|---|---|---|---|---|---|
| 1 | 1/36 | 1/36 | 1/36 | ... | 1/36 |
| 2 | 1/36 | 1/36 | 1/36 | ... | 1/36 |
| ... | ... | ... | ... | ... | ... |
| 6 | 1/36 | 1/36 | 1/36 | ... | 1/36 |
| Marginal p_Y(y) | 1/6 | 1/6 | 1/6 | ... | 1/6 |
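The division described above can be carried out directly, as in the following sketch: it builds the uniform joint PMF of the two dice, divides the entries for Y = 3 by the marginal p_Y(3), and confirms that the conditional PMF is 1/6 for every x and sums to 1.

```python
from fractions import Fraction

# Joint PMF of two independent fair dice: p_{X,Y}(x, y) = 1/36 for x, y = 1..6
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def conditional_pmf_of_x(y):
    """p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y) for each x."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)   # marginal p_Y(y) = 1/6
    return {x: joint[(x, y)] / p_y for x in range(1, 7)}

cond = conditional_pmf_of_x(3)
print(cond[1])              # 1/6: dividing 1/36 by 1/6, the same for every x
print(sum(cond.values()))   # 1: a valid PMF
```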
Examples and Applications
A basic example of a conditional PMF for dependent discrete variables uses a joint probability table. Suppose X and Y are binary random variables with the joint PMF shown below; the conditional PMFs it yields are computed in the sketch that follows the table.

| X \backslash Y | 0 | 1 |
|---|---|---|
| 0 | 0.3 | 0.2 |
| 1 | 0.1 | 0.4 |
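A short sketch of the computation this table sets up: dividing each column by its marginal p_Y(y) gives the conditional PMFs of X given Y = 0 and Y = 1, which differ from each other and from the marginal of X, so X and Y are dependent.

```python
# Joint PMF from the table above, keyed as (x, y)
joint = {(0, 0): 0.3, (0, 1): 0.2,
         (1, 0): 0.1, (1, 1): 0.4}

for y in (0, 1):
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)   # p_Y(0) = 0.4, p_Y(1) = 0.6
    cond = {x: joint[(x, y)] / p_y for x in (0, 1)}
    print(f"p_X|Y(x | y={y}):", cond)

# Output: {0: 0.75, 1: 0.25} for y = 0 and {0: 0.333..., 1: 0.666...} for y = 1.
# The conditional PMF changes with y (and differs from the marginal p_X(0) = p_X(1) = 0.5),
# so X and Y are dependent.
```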