
Conditional expectation

In probability theory, the conditional expectation of a random variable X given another random variable Y = y is defined as the expected value of X with respect to its conditional distribution given Y = y. This concept provides a way to compute averages under partial information about the random outcome, generalizing the unconditional expectation to scenarios where conditioning on events or other variables refines the prediction. Formally, for discrete random variables, the conditional expectation is given by \mathbb{E}[X \mid Y = y] = \sum_x x \cdot P(X = x \mid Y = y), while for continuous cases, it takes the form \mathbb{E}[X \mid Y = y] = \int x \cdot f_{X \mid Y}(x \mid y) \, dx, where f_{X \mid Y} is the conditional density function. In the more general measure-theoretic framework, the conditional expectation \mathbb{E}[X \mid \mathcal{G}] with respect to a sub-σ-algebra \mathcal{G} is the (almost surely) unique \mathcal{G}-measurable random variable that satisfies \int_A \mathbb{E}[X \mid \mathcal{G}] \, dP = \int_A X \, dP for all A \in \mathcal{G}, relying on the Radon-Nikodym theorem. This rigorous definition was first established by Andrey Kolmogorov in his 1933 monograph Foundations of the Theory of Probability, which axiomatized probability using measure theory and introduced conditional expectation as a projection onto the space of \mathcal{G}-measurable functions.

Key properties of conditional expectation include linearity: \mathbb{E}[aX + bZ \mid Y] = a \mathbb{E}[X \mid Y] + b \mathbb{E}[Z \mid Y] for constants a, b, and the tower property (or law of iterated expectations): \mathbb{E}[\mathbb{E}[X \mid Y]] = \mathbb{E}[X], which underscores its role in breaking down complex expectations hierarchically. Additionally, if X is \mathcal{G}-measurable, then \mathbb{E}[X \mid \mathcal{G}] = X, reflecting that full information about X yields the variable itself.

Conditional expectation is pivotal in numerous fields, enabling the computation of expected values under constraints or additional data, such as in Bayesian inference, where it represents posterior means. It forms the basis for advanced stochastic processes, including martingales—where the conditional expectation of the future value equals the current value—and is essential in filtering theory, optimal prediction, and financial mathematics for modeling asset prices under uncertainty. In statistics, it underpins regression analysis, where the conditional expectation function describes the relationship between predictors and responses.

Examples

Dice Rolling

Consider the experiment of rolling two fair six-sided dice, where each die is independent and uniformly distributed over the outcomes {1, 2, 3, 4, 5, 6}. Let X denote the sum of the numbers shown on the two dice. The unconditional expectation E[X] is 7, as it equals the sum of the expected values of the individual dice, each of which has E[\text{die}] = (1+2+3+4+5+6)/6 = 3.5. Now suppose we are given the information that the first die shows a 1. This restricts the possible outcomes to the six equally likely cases where the second die shows 1 through 6, yielding sums of 2, 3, 4, 5, 6, or 7. The conditional expectation E[X \mid \text{first die} = 1] is the average over this restricted sample space: E[X \mid \text{first die} = 1] = \frac{2 + 3 + 4 + 5 + 6 + 7}{6} = \frac{27}{6} = 4.5. This value, 4.5, contrasts with the unconditional expectation of 7, illustrating how the additional information about the first die being 1 updates our prediction of the total sum downward, since the first die's contribution is now fixed at a low value of 1 while the second die remains unbiased. In essence, the conditional expectation serves as the best (in the mean squared error sense) prediction of X given the partial information from the first die, averaging the possible outcomes consistent with that information.
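This calculation can be checked by brute-force enumeration. The following Python sketch lists all 36 equally likely outcomes of the two dice and averages the sum both unconditionally and over only the outcomes consistent with the first die showing 1, using exact arithmetic from the standard-library fractions module:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Unconditional expectation of the sum: average over all outcomes.
e_sum = Fraction(sum(d1 + d2 for d1, d2 in outcomes), len(outcomes))

# Conditional expectation given that the first die shows 1:
# average the sum over only the outcomes consistent with that information.
restricted = [(d1, d2) for d1, d2 in outcomes if d1 == 1]
e_sum_given_1 = Fraction(sum(d1 + d2 for d1, d2 in restricted), len(restricted))

print(e_sum)          # 7
print(e_sum_given_1)  # 9/2, i.e. 4.5
```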

Rainfall Data

In a practical setting, conditional expectation can be illustrated using historical monthly rainfall data collected over multiple years in a temperate region. Let X represent the rainfall amount in a given month, measured in inches. The conditional expectation E[X \mid \text{summer months}] is computed as the average of the observed rainfall values specifically for the summer months (June, July, and August), providing an empirical estimate of the expected rainfall given that the month falls in summer. This approach leverages the law of total expectation in a data-driven manner, where the overall expected rainfall is a weighted average of the seasonal conditional expectations. For instance, consider a small sample of observed summer rainfall values from historical records: 2.5 inches, 3.0 inches, and 1.8 inches. The conditional expectation is then (2.5 + 3.0 + 1.8)/3 ≈ 2.4 inches, representing the best estimate of typical summer rainfall based on these observations. To highlight how conditioning on the season reduces variability compared to the unconditional case, examine the following table of sample monthly rainfall data from one year, categorized by season. The unconditional monthly mean across all months is approximately 2.8 inches, with higher spread due to seasonal differences. In contrast, the summer conditional mean of about 2.4 inches shows less variation within that season, illustrating how conditional expectation narrows the focus and typically lowers the variance of predictions within the conditioned event.
| Month | Season | Rainfall (inches) |
|---|---|---|
| June | Summer | 2.5 |
| July | Summer | 3.0 |
| August | Summer | 1.8 |
| December | Winter | 4.2 |
| January | Winter | 3.5 |
| February | Winter | 2.0 |
The summer values exhibit a sample standard deviation of about 0.6 inches, lower than the overall sample's roughly 0.9 inches, demonstrating the variability reduction achieved by conditioning. This empirical method parallels the averaging in discrete cases like dice rolls but applies to continuous measurements derived from real-world observations.
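A short Python sketch reproduces these empirical figures; the rainfall numbers are the illustrative values from the table above, not real measurements:

```python
import statistics

# Sample data mirroring the table: (month, season, rainfall in inches).
data = [
    ("June", "Summer", 2.5),
    ("July", "Summer", 3.0),
    ("August", "Summer", 1.8),
    ("December", "Winter", 4.2),
    ("January", "Winter", 3.5),
    ("February", "Winter", 2.0),
]

all_values = [r for _, _, r in data]
summer_values = [r for _, season, r in data if season == "Summer"]

# Unconditional mean vs. the conditional mean given a summer month.
print(round(statistics.mean(all_values), 2))     # ~2.83
print(round(statistics.mean(summer_values), 2))  # ~2.43

# Conditioning typically reduces spread: compare sample standard deviations.
print(round(statistics.stdev(all_values), 2))    # ~0.92
print(round(statistics.stdev(summer_values), 2)) # ~0.60
```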

History

Early Ideas in Conditional Probability

The origins of conditional probability concepts can be traced to the mid-17th century, particularly through the correspondence between Blaise Pascal and Pierre de Fermat on the "problem of points." In 1654, prompted by the gambler Chevalier de Méré, Pascal and Fermat addressed the question of fairly dividing stakes in an interrupted game of chance, such as one where players alternate tossing a coin until one reaches a required number of successes. Their solutions implicitly relied on conditional probabilities by evaluating the likelihood of each player winning from the current state onward, based on the remaining rounds needed, rather than restarting the game. For instance, if a game required three successes and one player had two while the other had one, they calculated the division by considering the probabilities of outcomes conditional on the interruption point, effectively apportioning stakes proportional to these chances.

A significant advancement came with Thomas Bayes' posthumously published 1763 essay, "An Essay towards Solving a Problem in the Doctrine of Chances," which introduced what is now called Bayes' theorem to formalize conditional reasoning. Bayes derived the rule for updating probabilities based on observed evidence, stating that the probability of cause A given effect B is proportional to the probability of B given A times the prior probability of A, normalized by the total probability of B: P(A|B) = \frac{P(B|A) P(A)}{P(B)}. This theorem provided a systematic way to compute conditional probabilities without directly specifying expectations, laying essential groundwork for later probabilistic inference by reversing the direction of conditioning from effect to cause.

Pierre-Simon Laplace further extended these ideas in his 1812 Théorie Analytique des Probabilités, applying conditional and inverse probabilities to astronomical problems such as estimating planetary masses and perturbations in celestial mechanics. Laplace used these methods to weigh observations conditionally on prior assumptions, producing estimates that functioned as weighted averages of possible values—foreshadowing modern conditional expectations—particularly in analyzing the stability of the solar system through probabilistic corrections to observational data.

Modern Formalization

The modern formalization of conditional expectation arose in the early 20th century as part of the axiomatization of probability within the framework of measure theory. Andrey Kolmogorov's 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (translated as Foundations of the Theory of Probability) established probability as a measure-theoretic discipline and introduced conditional expectation E[X \mid \mathcal{G}] as the Radon-Nikodym derivative of the measure \nu(A) = \int_A X \, dP with respect to the restriction of P to the sub-σ-algebra \mathcal{G} \subseteq \mathcal{F}, where (\Omega, \mathcal{F}, P) is the underlying probability space and X is an integrable random variable. This definition generalized conditional expectations to arbitrary σ-algebras, providing a precise tool for incorporating partial information in probabilistic models beyond finite or countable sample spaces.

In the ensuing decades, particularly during the 1940s and 1950s, scholars such as Joseph L. Doob advanced this foundation by embedding conditional expectation into the theory of martingales and stochastic processes. Doob's work formalized martingales as sequences of random variables where E[X_{n+1} \mid \mathcal{F}_n] = X_n, leveraging conditional expectations to study convergence, optional stopping, and the behavior of stochastic processes under incomplete information. These developments, building on Kolmogorov's axioms, transformed conditional expectation from a definitional construct into a cornerstone for analyzing time-dependent phenomena in probability, including diffusion processes and filtering problems.

By the 1950s, these abstract concepts gained wider accessibility through authoritative textbooks that highlighted their practical utility. William Feller's An Introduction to Probability Theory and Its Applications, Volume I (1950) introduced conditional probabilities and expectations in the setting of discrete sample spaces, underscoring their role in predictive modeling and optimal estimation, such as in sequential analysis and renewal theory. This pedagogical emphasis helped integrate conditional expectation into mainstream probability education, facilitating its application in fields like statistics and engineering.

Definitions

Conditioning on an Event

The conditional expectation of an integrable random variable X given an event A with P(A) > 0 is defined by E[X \mid A] = \frac{1}{P(A)} \int_A X \, dP = \frac{E[X \mathbf{1}_A]}{P(A)}, where \mathbf{1}_A denotes the indicator function of A. This expression represents the average of X restricted to the outcomes in A, normalized by the probability of A. The quantity E[X \mid A] is a constant (degenerate random variable) that satisfies the fundamental defining property of conditional expectation in this context: \int_A E[X \mid A] \, dP = \int_A X \, dP. Similarly, E[X \mid A^c] = \frac{E[X \mathbf{1}_{A^c}]}{P(A^c)} is defined for the complementary event A^c with P(A^c) > 0, and it satisfies \int_{A^c} E[X \mid A^c] \, dP = \int_{A^c} X \, dP. More generally, the random variable E[X \mid \sigma(A)] equals E[X \mid A] on A and E[X \mid A^c] on A^c; it is \sigma(A)-measurable and satisfies \int_G E[X \mid \sigma(A)] \, dP = \int_G X \, dP for all G \in \sigma(A). These properties confirm that E[X \mid A] preserves the integral of X over the partition of the sample space induced by A. To illustrate, consider a uniform probability space \Omega = \{0, 1\} with P(\{\omega\}) = \frac{1}{2} for each \omega \in \Omega, and let X(\omega) = \omega (a Bernoulli random variable with parameter \frac{1}{2}). The event A = \{X > 0\} = \{1\} has P(A) = \frac{1}{2}. Then \mathbf{1}_A(\omega) = X(\omega), so E[X \mathbf{1}_A] = E[X^2] = E[X] = \frac{1}{2} (since X^2 = X). Thus, E[X \mid A] = \frac{\frac{1}{2}}{\frac{1}{2}} = 1, which is the value of X on A. This example highlights how the conditional expectation collapses to the realized value when X is the indicator of A itself. This notion of conditioning on a single event serves as the foundational case, which extends to conditioning on random variables in more general settings.
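As a sanity check on this definition, the Python sketch below (the helper name is illustrative) computes E[X \mid A] = E[X \mathbf{1}_A]/P(A) on a finite sample space, taking X to be the sum of two fair dice and A the event that the first die shows 1, and verifies that E[X \mid \sigma(A)] integrates back to E[X]:

```python
from fractions import Fraction
from itertools import product

def cond_exp_given_event(omega, prob, X, A):
    """E[X | A] = E[X * 1_A] / P(A) on a finite sample space."""
    p_A = sum(prob[w] for w in omega if A(w))
    return sum(X(w) * prob[w] for w in omega if A(w)) / p_A

omega = list(product(range(1, 7), repeat=2))     # two fair dice
prob = {w: Fraction(1, 36) for w in omega}
X = lambda w: w[0] + w[1]                        # the sum
A = lambda w: w[0] == 1                          # "first die shows 1"

e_A = cond_exp_given_event(omega, prob, X, A)                    # 9/2
e_Ac = cond_exp_given_event(omega, prob, X, lambda w: not A(w))  # 15/2
p_A = Fraction(1, 6)

# E[X | sigma(A)] integrates to E[X]: P(A) E[X|A] + P(A^c) E[X|A^c] = 7.
assert p_A * e_A + (1 - p_A) * e_Ac == 7
print(e_A, e_Ac)
```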

Discrete Random Variables

In the discrete case, consider two random variables X and Y defined on a common probability space, where Y takes values in a countable set \{y_i : i \in I\} with P(Y = y_i) > 0 for each i. The conditional expectation of X given Y = y_i, denoted E[X \mid Y = y_i], is defined as the expected value of X under the conditional probability mass function of X given Y = y_i: E[X \mid Y = y_i] = \sum_{x} x \, P(X = x \mid Y = y_i), where the sum is over the support of X, and the conditional probability is given by P(X = x \mid Y = y_i) = P(X = x, Y = y_i) / P(Y = y_i). The conditional expectation E[X \mid Y] is then a random variable on the original probability space, expressed as E[X \mid Y](\omega) = \sum_{i} E[X \mid Y = y_i] \, 1_{\{Y = y_i\}}(\omega) for each outcome \omega, where 1_{\{Y = y_i\}} is the indicator function of the event \{Y = y_i\}. This construction yields a step function that is constant on each atom \{Y = y_i\} of the partition induced by Y. A key property verifying this definition is that E[X \mid Y] satisfies the integral condition for each i: E[ E[X \mid Y] \, 1_{\{Y = y_i\}} ] = E[ X \, 1_{\{Y = y_i\}} ]. To see this, note that E[ E[X \mid Y] \, 1_{\{Y = y_i\}} ] = E[X \mid Y = y_i] \, P(Y = y_i), while the right side expands to \sum_x x \, P(X = x, Y = y_i), and substituting the definition of E[X \mid Y = y_i] confirms equality. This holds under the assumption that E[|X|] < \infty. When Y is an indicator random variable of an event A, the construction reduces to conditioning on the event A. For a concrete illustration linking to the dice rolling example, suppose two fair six-sided dice are rolled independently, letting D_1 be the outcome of the first die and S = D_1 + D_2 the sum. Then E[S \mid D_1 = k] = k + 3.5 for k = 1, \dots, 6, since E[D_2] = 3.5 by independence, so E[S \mid D_1] = D_1 + 3.5.
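A brief Python sketch mirrors this construction: it builds the joint probability mass function of (D_1, S), forms the conditional pmf of S given D_1 = k, and confirms the formula E[S \mid D_1 = k] = k + 3.5 exactly:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (D1, S) for two fair dice, S = D1 + D2.
joint = {}  # (d1, s) -> P(D1 = d1, S = s)
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(d1, d1 + d2)] = joint.get((d1, d1 + d2), 0) + Fraction(1, 36)

def cond_exp_sum_given_first(k):
    """E[S | D1 = k] = sum over s of s * P(S = s | D1 = k)."""
    p_k = sum(p for (d1, _), p in joint.items() if d1 == k)        # P(D1 = k)
    return sum(s * p for (d1, s), p in joint.items() if d1 == k) / p_k

for k in range(1, 7):
    assert cond_exp_sum_given_first(k) == Fraction(2 * k + 7, 2)   # k + 3.5
print([float(cond_exp_sum_given_first(k)) for k in range(1, 7)])
# [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
```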

Continuous Random Variables

For continuous random variables, the concept of conditional expectation extends the discrete case by relying on probability densities rather than mass functions, allowing computation through integration over the support. Suppose X and Y are jointly continuous random variables with joint probability density function f_{X,Y}(x,y), and let Y have marginal density f_Y(y). The conditional expectation of X given Y = y, denoted E[X \mid Y = y], is defined as E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X \mid Y}(x \mid y) \, dx, where the conditional density f_{X \mid Y}(x \mid y) is given by f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} provided f_Y(y) > 0. This formulation requires the joint distribution to be absolutely continuous with respect to Lebesgue measure on \mathbb{R}^2, ensuring the existence of densities, and f_Y(y) to be positive on the relevant support to avoid division by zero. The conditional expectation E[X \mid Y] is itself a random variable, defined as a measurable function of Y with respect to the \sigma-algebra generated by Y, denoted \sigma(Y). It satisfies the characterizing property that for any bounded measurable function g, E\left[ E[X \mid Y] \, g(Y) \right] = E\left[ X \, g(Y) \right]. This property ensures E[X \mid Y] captures the best prediction of X based on Y in an averaged sense, analogous to the discrete case but adapted to continuous spaces. A concrete computation arises when X and Y are jointly normal (bivariate Gaussian) with means \mu_X and \mu_Y, variances \sigma_X^2 and \sigma_Y^2, and correlation coefficient \rho. In this case, E[X \mid Y = y] = \mu_X + \rho \frac{\sigma_X}{\sigma_Y} (y - \mu_Y), which is a linear function of y, highlighting the affine relationship in Gaussian settings.
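The Gaussian formula can be verified numerically. The Python sketch below (with arbitrarily chosen parameter values) approximates E[X \mid Y = y] by integrating x against the conditional density on a fine grid and compares the result with the closed-form expression:

```python
import numpy as np

# Assumed bivariate normal parameters, chosen only for illustration.
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 0.5, 0.6

def joint_density(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    quad = ((x - mu_x) ** 2 / sigma_x ** 2
            - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
            + (y - mu_y) ** 2 / sigma_y ** 2)
    norm = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-quad / (2 * (1 - rho ** 2))) / norm

y0 = -1.5
x = np.linspace(mu_x - 10 * sigma_x, mu_x + 10 * sigma_x, 200_001)
dx = x[1] - x[0]
f_xy = joint_density(x, y0)
f_y = f_xy.sum() * dx                      # marginal density f_Y(y0)
e_numeric = (x * f_xy).sum() * dx / f_y    # E[X | Y = y0] via a Riemann sum

e_closed = mu_x + rho * (sigma_x / sigma_y) * (y0 - mu_y)
print(e_numeric, e_closed)                 # both approximately 2.2
```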

L² Random Variables

In the context of a probability space (\Omega, \mathcal{F}, P), the space L^2(\Omega, \mathcal{F}, P) of square-integrable random variables forms a Hilbert space with inner product \langle X, Y \rangle = E[XY]. For a sub-σ-algebra \mathcal{G} \subseteq \mathcal{F}, the subspace L^2(\Omega, \mathcal{G}, P) is a closed subspace of this Hilbert space. The conditional expectation E[X \mid \mathcal{G}] of X \in L^2(\Omega, \mathcal{F}, P) is the orthogonal projection of X onto L^2(\Omega, \mathcal{G}, P), unique by Hilbert space theory. The defining property of this projection is orthogonality in L^2: E[(X - E[X \mid \mathcal{G}]) Z] = 0 for all Z \in L^2(\Omega, \mathcal{G}, P). This ensures that the residual X - E[X \mid \mathcal{G}] is orthogonal to every element of the subspace. Existence of the projection follows from the completeness of L^2 and the Hilbert projection theorem, which guarantees a unique minimizer of the distance \|X - Y\|_{L^2} over Y \in L^2(\Omega, \mathcal{G}, P). Uniqueness holds up to equivalence classes modulo null sets, as any discrepancy on a set of positive probability would contradict the orthogonality. This L^2 formulation defines conditional expectation in the mean-square sense, emphasizing approximation in the L^2 norm rather than pointwise values. It avoids reliance on densities or explicit integrals, unlike the formulations for continuous or discrete variables, and serves as a foundational case for broader measure-theoretic extensions.
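A Monte Carlo sketch illustrates the projection property. In the assumed toy model below, X = Y^2 plus independent noise, so E[X \mid Y] = Y^2, and this conditional mean attains a smaller mean squared error than the other functions of Y that are tried:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.uniform(-1, 1, size=n)
X = Y ** 2 + rng.normal(0.0, 0.3, size=n)   # noise independent of Y

# Candidate sigma(Y)-measurable predictors of X.
candidates = {
    "E[X | Y] = Y^2": Y ** 2,
    "best constant E[X]": np.full(n, X.mean()),
    "an affine competitor": 1 / 3 + 0.5 * Y,
}
for name, pred in candidates.items():
    print(f"{name:>22s}: MSE = {np.mean((X - pred) ** 2):.4f}")
# The conditional mean achieves MSE close to the noise variance 0.09;
# the other predictors are strictly worse.
```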

σ-Algebra Conditioning

The conditional expectation with respect to a sub-σ-algebra provides the most general formulation, applicable to any integrable random variable without requiring finite variance. Consider a probability space (\Omega, \mathcal{F}, P) and an integrable random variable X \in L^1(\Omega, \mathcal{F}, P). For a sub-σ-algebra \mathcal{G} \subset \mathcal{F}, the conditional expectation E[X \mid \mathcal{G}] is defined as the \mathcal{G}-measurable random variable Y satisfying \int_G Y \, dP = \int_G X \, dP for all G \in \mathcal{G}. This characterizing property arises from applying the Radon-Nikodym theorem to the signed measure \nu(G) = \int_G X \, dP for G \in \mathcal{G}, which is absolutely continuous with respect to the restriction of P to \mathcal{G}. The Radon-Nikodym theorem guarantees the existence of such a Y, and uniqueness holds up to P-almost sure equality. In the special case of conditioning on a random variable Y, E[X \mid Y] is defined as E[X \mid \sigma(Y)], where \sigma(Y) denotes the σ-algebra generated by Y. This framework accommodates conditioning on non-denumerable information structures, such as filtrations \{\mathcal{F}_t\}_{t \geq 0} in stochastic processes, where \mathcal{F}_t represents the accumulated information up to time t and enables the analysis of adapted processes. For X \in L^2, this conditional expectation coincides with the orthogonal projection of X onto the closed subspace of \mathcal{G}-measurable square-integrable random variables.
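When \mathcal{G} is generated by a finite partition, the partial-averaging property can be checked exhaustively. The Python sketch below builds E[X \mid \mathcal{G}] as block averages over the partition induced by the first of two dice and verifies \int_G E[X \mid \mathcal{G}] \, dP = \int_G X \, dP for every set G in the generated σ-algebra:

```python
from fractions import Fraction
from itertools import chain, combinations, product

omega = list(product(range(1, 7), repeat=2))        # two fair dice
P = {w: Fraction(1, 36) for w in omega}
X = {w: w[0] + w[1] for w in omega}                 # the sum

# Partition generating G: blocks {D1 = k}, k = 1, ..., 6.
blocks = [[w for w in omega if w[0] == k] for k in range(1, 7)]

# E[X | G] is constant on each block, equal to the block average of X.
cond = {}
for block in blocks:
    avg = sum(X[w] * P[w] for w in block) / sum(P[w] for w in block)
    for w in block:
        cond[w] = avg

# Every set in G is a union of blocks; verify partial averaging on all of them.
for r in range(len(blocks) + 1):
    for chosen in combinations(blocks, r):
        G = list(chain.from_iterable(chosen))
        assert sum(cond[w] * P[w] for w in G) == sum(X[w] * P[w] for w in G)
print("partial averaging holds on all", 2 ** len(blocks), "sets of the sigma-algebra")
```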

Properties

Linearity and Tower Property

The linearity of conditional expectation is a fundamental algebraic property that mirrors the linearity of unconditional expectation. For constants a, b \in \mathbb{R} and integrable random variables X, Y on a probability space (\Omega, \mathcal{F}, P), the conditional expectation with respect to a sub-\sigma-algebra \mathcal{G} \subset \mathcal{F} satisfies \mathbb{E}[aX + bY \mid \mathcal{G}] = a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}] almost surely. This holds because conditional expectation is defined as the \mathcal{G}-measurable random variable Z = \mathbb{E}[X \mid \mathcal{G}] satisfying \int_A Z \, dP = \int_A X \, dP for all A \in \mathcal{G}, and linearity of the integral extends directly to linear combinations: for any A \in \mathcal{G}, \int_A (aX + bY) \, dP = a \int_A X \, dP + b \int_A Y \, dP = a \int_A \mathbb{E}[X \mid \mathcal{G}] \, dP + b \int_A \mathbb{E}[Y \mid \mathcal{G}] \, dP, which implies the result by uniqueness of the conditional expectation. This property facilitates computations involving sums and scalar multiples in conditional settings, such as in filtering problems or risk assessment models.

Another key structural property is the tower property, also known as the law of iterated expectations, which governs nested conditioning on nested σ-algebras. If \mathcal{H} \subset \mathcal{G} \subset \mathcal{F} and X is integrable, then \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}] almost surely. To see this, let Z = \mathbb{E}[X \mid \mathcal{G}], which is \mathcal{G}-measurable and integrable. For any B \in \mathcal{H}, \int_B Z \, dP = \int_B X \, dP, by the defining property of Z applied over sets in \mathcal{G} (which includes all of \mathcal{H} since \mathcal{H} \subset \mathcal{G}). Therefore \int_B \mathbb{E}[Z \mid \mathcal{H}] \, dP = \int_B Z \, dP = \int_B X \, dP for all B \in \mathcal{H}, so \mathbb{E}[Z \mid \mathcal{H}] satisfies the integral condition characterizing \mathbb{E}[X \mid \mathcal{H}], yielding the equality by uniqueness. This double application of the integral characterization underscores the property's reliance on the projection-like nature of conditional expectation onto coarser information structures.

A concrete illustration arises in the discrete case with nested conditioning variables. Suppose X, Y, Z are random variables where conditioning on Y provides finer information than conditioning on Z, such as Z representing coarse outcomes and Y finer partitions (so that \sigma(Z) \subseteq \sigma(Y)). Then \mathbb{E}[\mathbb{E}[X \mid Y] \mid Z] = \mathbb{E}[X \mid Z], as the tower property simplifies the computation by collapsing the inner conditioning onto the coarser one. For instance, if X is the outcome of a multi-stage experiment, Y the result after the first stage, and Z an even coarser summary, this allows efficient hierarchical evaluation without recomputing full joint distributions. These properties enable the decomposition of complex conditioning hierarchies into manageable steps, essential for applications like sequential decision-making under uncertainty or multi-level stochastic modeling, where information accrues progressively across filtrations.
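The tower property can be verified exactly on a small finite space. The Python sketch below uses three independent fair dice, with Y = (D_1, D_2) carrying finer information than Z = D_1, and checks that E[E[X \mid Y] \mid Z] = E[X \mid Z] for X = D_1 + D_2 + D_3:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=3))   # three fair dice
P = Fraction(1, 216)                           # uniform probability of each outcome

def cond_exp(value, grouping):
    """E[value | grouping] as a dict: grouping label -> conditional mean."""
    totals, weights = {}, {}
    for w in omega:
        g = grouping(w)
        totals[g] = totals.get(g, 0) + value(w) * P
        weights[g] = weights.get(g, 0) + P
    return {g: totals[g] / weights[g] for g in totals}

X = lambda w: w[0] + w[1] + w[2]
Y = lambda w: (w[0], w[1])                     # finer information
Z = lambda w: w[0]                             # coarser information

e_X_given_Y = cond_exp(X, Y)
inner = lambda w: e_X_given_Y[Y(w)]            # E[X | Y] as a random variable
lhs = cond_exp(inner, Z)                       # E[ E[X | Y] | Z ]
rhs = cond_exp(X, Z)                           # E[X | Z]
assert lhs == rhs
print(rhs)                                     # each value equals k + 7 for Z = k
```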

Monotonicity and Integrability

The conditional expectation operator preserves the order of random variables. Specifically, if X and Y are integrable random variables on a probability space (\Omega, \mathcal{F}, P) such that X \leq Y almost surely, then E[X \mid \mathcal{G}] \leq E[Y \mid \mathcal{G}] almost surely for any sub-\sigma-algebra \mathcal{G} \subseteq \mathcal{F}. This monotonicity property follows from the linearity of conditional expectation: since Y - X \geq 0 almost surely and E[Y - X \mid \mathcal{G}] \geq 0 (as shown below for non-negative variables), it implies E[Y \mid \mathcal{G}] - E[X \mid \mathcal{G}] \geq 0 almost surely. A closely related fact is the preservation of positivity. If X \geq 0 almost surely and E[|X|] < \infty, then E[X \mid \mathcal{G}] \geq 0 almost surely. To see this directly, note that the set A = \{E[X \mid \mathcal{G}] < 0\} belongs to \mathcal{G}, and \int_A E[X \mid \mathcal{G}] \, dP = \int_A X \, dP \geq 0, which forces P(A) = 0. This property ensures that conditional expectations align with intuitive notions of averaging non-negative quantities. Conditional expectations also satisfy Jensen's inequality for convex functions. Let \phi: \mathbb{R} \to \mathbb{R} be convex, and suppose X is integrable with E[|\phi(X)|] < \infty. Then \phi(E[X \mid \mathcal{G}]) \leq E[\phi(X) \mid \mathcal{G}] almost surely. For X \in L^1, this follows from representing the convex \phi as the supremum of its supporting affine functions and applying linearity and monotonicity to each; in the L^2 setting, where conditional expectation is the orthogonal projection onto the subspace of \mathcal{G}-measurable square-integrable functions, the result can also be obtained by applying the unconditional Jensen's inequality to the conditional distribution. These order-preserving properties extend to integrability bounds. For integrable X, the absolute value of the conditional expectation is bounded by the conditional expectation of the absolute value: |E[X \mid \mathcal{G}]| \leq E[|X| \mid \mathcal{G}] almost surely. This follows by applying monotonicity to the inequalities -|X| \leq X \leq |X|, which give -E[|X| \mid \mathcal{G}] \leq E[X \mid \mathcal{G}] \leq E[|X| \mid \mathcal{G}]. Taking unconditional expectations further gives E[|E[X \mid \mathcal{G}]|] \leq E[|X|], confirming that conditional expectation is a contraction on L^1 and does not worsen integrability.
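A quick Monte Carlo sketch illustrates the conditional Jensen inequality with \phi(x) = x^2; the model below, a normal variable whose mean depends on a small discrete conditioning variable, is an assumed toy example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
Y = rng.integers(0, 4, size=n)                   # discrete conditioning variable
X = rng.normal(loc=Y.astype(float), scale=1.0)   # conditional mean y, variance 1

for y in range(4):
    xs = X[Y == y]
    lhs = xs.mean() ** 2        # phi(E[X | Y = y])
    rhs = (xs ** 2).mean()      # E[phi(X) | Y = y]
    print(y, round(lhs, 3), round(rhs, 3), lhs <= rhs)
# The gap rhs - lhs is the conditional variance, here about 1 for every y.
```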

Connections

To Regression

In the framework of L^2 random variables, the conditional expectation E[Y \mid X] serves as the optimal predictor of Y given X in the mean squared error sense. Specifically, it minimizes E[(Y - g(X))^2] over all \sigma(X)-measurable functions g, as E[Y \mid X] is the orthogonal projection of Y onto the closed subspace of L^2 consisting of \sigma(X)-measurable random variables; the error Y - E[Y \mid X] is then orthogonal to this subspace, ensuring the minimum is attained uniquely up to almost sure equivalence. This minimization property positions conditional expectation at the core of regression analysis, where predicting Y from X aims to reduce prediction error. If X and Y are jointly normally distributed, E[Y \mid X = x] takes the explicit linear form \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - \mu_X), aligning precisely with the ordinary least squares (OLS) regression line; without joint normality, the regression function E[Y \mid X] is generally nonlinear, allowing for more flexible modeling beyond linear assumptions. To illustrate in the simple linear case, suppose Y = \beta_0 + \beta_1 X + \epsilon where \epsilon is independent of X with E[\epsilon] = 0 and \operatorname{Var}(\epsilon) = \sigma^2. Then E[Y \mid X = x] = \beta_0 + \beta_1 x, and the law of total variance decomposes \operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(E[Y \mid X]), yielding \operatorname{Var}(Y) = \sigma^2 + \beta_1^2 \operatorname{Var}(X); here, \operatorname{Var}(E[Y \mid X]) quantifies the variance explained by X, while E[\operatorname{Var}(Y \mid X)] = \sigma^2 captures the residual variability. Historically, this probabilistic perspective on conditional expectation links to the Gauss-Markov theorem, which establishes that, under assumptions of linearity, no serial correlation, homoscedasticity, and exogeneity (i.e., E[\epsilon \mid X] = 0), the OLS estimator is the best linear unbiased estimator (BLUE) for the coefficients; in this view, OLS recovers the conditional mean when the model is correctly specified as linear.
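A simulation sketch (with assumed parameter values) illustrates both points: OLS applied to data from a correctly specified linear model recovers the conditional mean E[Y \mid X = x] = \beta_0 + \beta_1 x, and the variance decomposition \operatorname{Var}(Y) = \sigma^2 + \beta_1^2 \operatorname{Var}(X) holds numerically:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
b0, b1, sigma = 2.0, -1.5, 0.7                  # assumed true parameters
X = rng.normal(0.0, 1.2, size=n)
Y = b0 + b1 * X + rng.normal(0.0, sigma, size=n)

# Closed-form OLS estimates for simple linear regression.
slope = np.cov(X, Y, bias=True)[0, 1] / X.var()
intercept = Y.mean() - slope * X.mean()
print(round(intercept, 3), round(slope, 3))     # close to 2.0 and -1.5

# Law of total variance: Var(Y) vs sigma^2 + b1^2 Var(X).
print(round(Y.var(), 3), round(sigma ** 2 + b1 ** 2 * X.var(), 3))
```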

To Martingales

In martingale theory, conditional expectation serves as the defining property for a class of processes known as martingales. A martingale is a sequence of random variables (M_n)_{n \geq 0} adapted to an increasing filtration (\mathcal{F}_n)_{n \geq 0} of σ-algebras such that \mathbb{E}[|M_n|] < \infty for all n and \mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = M_n for each n \geq 0. This condition implies that the process has no predictable drift, making it a model for fair games or unbiased evolution under accumulating information. A key construction linking conditional expectation directly to martingales is Doob's martingale. For an integrable random variable X and a filtration (\mathcal{F}_t)_{t \geq 0}, the process defined by M_t = \mathbb{E}[X \mid \mathcal{F}_t] forms a martingale, since the tower property gives \mathbb{E}[M_t \mid \mathcal{F}_s] = M_s for s \leq t. This process captures the progressive revelation of information about X through the filtration, preserving the martingale property at each step. The optional sampling theorem extends this framework to stopping times, which are random times adapted to the filtration. For a martingale M and a stopping time \tau that is bounded or satisfies suitable uniform integrability conditions, the theorem asserts that \mathbb{E}[M_\tau] = \mathbb{E}[M_0]. This result, central to analyzing processes that halt at random times, relies on the martingale property derived from conditional expectations. Martingales constructed via conditional expectations find applications in modeling fair games and solving optimal stopping problems. In a fair game, where each bet has zero expected gain, the gambler's fortune process is a martingale, and the optional sampling theorem implies that the expected fortune at any admissible stopping time equals the initial amount. For instance, in the gambler's ruin problem—a symmetric simple random walk on \{0, 1, \dots, N\} starting at i (with 0 < i < N) that absorbs at 0 or N, with fair steps (p = 1/2)—the position S_n is a martingale. Applying optional sampling at the absorption time \tau yields the probability of reaching N before 0 as i/N.
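A simulation sketch of the gambler's ruin argument: the symmetric walk is a martingale, so optional sampling gives E[S_\tau] = i, and since S_\tau takes only the values 0 and N, it follows that P(S_\tau = N) = i/N. The estimate below should be close to 0.3 for the assumed values i = 3 and N = 10:

```python
import numpy as np

rng = np.random.default_rng(7)
N, i, trials = 10, 3, 100_000

hits_N = 0
for _ in range(trials):
    s = i
    while 0 < s < N:                           # fair +/-1 steps until absorption
        s += 1 if rng.random() < 0.5 else -1
    hits_N += (s == N)

p_hat = hits_N / trials
print(round(p_hat, 3), i / N)                  # estimated P(hit N first) vs i/N
print(round(p_hat * N, 3), i)                  # estimated E[S_tau] vs the start i
```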
