
Conditional expectation

In probability theory, the conditional expectation of a random variable X given another random variable Y = y is defined as the expected value of X with respect to its conditional distribution given Y = y. This concept provides a way to compute averages under partial information about the random outcome, generalizing the unconditional expectation to scenarios where conditioning on events or other variables refines the prediction. Formally, for discrete random variables, the conditional expectation is given by \mathbb{E}[X \mid Y = y] = \sum_x x \cdot P(X = x \mid Y = y), while for continuous cases, it takes the form \mathbb{E}[X \mid Y = y] = \int x \cdot f_{X \mid Y}(x \mid y) \, dx, where f_{X \mid Y} is the conditional density function. In the more general measure-theoretic framework, the conditional expectation \mathbb{E}[X \mid \mathcal{G}] with respect to a sub-σ-algebra \mathcal{G} is the (almost surely) unique \mathcal{G}-measurable random variable that satisfies \int_A \mathbb{E}[X \mid \mathcal{G}] \, dP = \int_A X \, dP for all A \in \mathcal{G}, relying on the Radon-Nikodym theorem. This rigorous definition was first established by Andrey Kolmogorov in his 1933 monograph Foundations of the Theory of Probability, which axiomatized probability using measure theory and introduced conditional expectation as a projection onto the space of \mathcal{G}-measurable functions.

Key properties of conditional expectation include linearity: \mathbb{E}[aX + bZ \mid Y] = a \mathbb{E}[X \mid Y] + b \mathbb{E}[Z \mid Y] for constants a, b, and the tower property (or law of iterated expectations): \mathbb{E}[\mathbb{E}[X \mid Y]] = \mathbb{E}[X], which underscores its role in breaking down complex expectations hierarchically. Additionally, if X is \mathcal{G}-measurable, then \mathbb{E}[X \mid \mathcal{G}] = X, reflecting that full information about X yields the variable itself.

Conditional expectation is pivotal in numerous fields, enabling the computation of expected values under constraints or additional data, such as in Bayesian inference, where it represents posterior means. It forms the basis for advanced stochastic processes, including martingales—where the conditional expectation of the future value equals the current value—and is essential in filtering theory, optimal prediction, and financial mathematics for modeling asset prices under uncertainty. In statistics, it underpins regression analysis, where the conditional expectation function describes the relationship between predictors and responses.

Examples

Dice Rolling

Consider the experiment of rolling two fair six-sided dice, where each die is independent and uniformly distributed over the outcomes {1, 2, 3, 4, 5, 6}. Let X denote the sum of the numbers shown on the two dice. The unconditional expectation E[X] is 7, as it equals the sum of the expected values of the individual dice, each of which has E[\text{die}] = (1+2+3+4+5+6)/6 = 3.5. Now suppose we are given the information that the first die shows a 1. This restricts the possible outcomes to the six equally likely cases where the second die shows 1 through 6, yielding sums of 2, 3, 4, 5, 6, or 7. The conditional expectation E[X \mid \text{first die} = 1] is the average over this restricted sample space: E[X \mid \text{first die} = 1] = \frac{2 + 3 + 4 + 5 + 6 + 7}{6} = \frac{27}{6} = 4.5. This value, 4.5, contrasts with the unconditional expectation of 7, illustrating how the additional information about the first die being 1 updates our prediction of the total sum downward, since the first die's contribution is now fixed at a low value of 1 while the second die remains unbiased. In essence, the conditional expectation serves as the best (in the mean squared error sense) prediction of X given the partial information from the first die, averaging the possible outcomes consistent with that information.
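This calculation can be checked by brute-force enumeration. The following Python sketch lists all 36 equally likely outcomes of the two dice and averages the sum both unconditionally and over only the outcomes consistent with the first die showing 1, using exact arithmetic from the standard-library fractions module:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Unconditional expectation of the sum: average over all outcomes.
e_sum = Fraction(sum(d1 + d2 for d1, d2 in outcomes), len(outcomes))

# Conditional expectation given that the first die shows 1:
# average the sum over only the outcomes consistent with that information.
restricted = [(d1, d2) for d1, d2 in outcomes if d1 == 1]
e_sum_given_1 = Fraction(sum(d1 + d2 for d1, d2 in restricted), len(restricted))

print(e_sum)          # 7
print(e_sum_given_1)  # 9/2, i.e. 4.5
```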

Rainfall Data

In a practical setting, conditional expectation can be illustrated using historical monthly rainfall data collected over multiple years in a temperate region. Let X represent the rainfall amount in a given month, measured in inches. The conditional expectation E[X \mid \text{summer months}] is computed as the average of the observed rainfall values specifically for the summer months (June, July, and August), providing an empirical estimate of the expected rainfall given that the month falls in summer. This approach leverages the law of total expectation in a data-driven manner, where the overall expected rainfall is a weighted average of the seasonal conditional expectations. For instance, consider a small sample of observed summer rainfall values from historical records: 2.5 inches, 3.0 inches, and 1.8 inches. The conditional expectation is then (2.5 + 3.0 + 1.8)/3 ≈ 2.4 inches, representing the best estimate of typical summer rainfall based on these observations. To highlight how conditioning on the season reduces variability compared to the unconditional case, examine the following table of sample monthly rainfall data from one year, categorized by season. The unconditional monthly mean across all months is approximately 2.8 inches, with higher spread due to seasonal differences. In contrast, the summer conditional mean of about 2.4 inches shows less variation within that season, illustrating how conditional expectation narrows the focus and typically lowers the variance of predictions within the conditioned event.
| Month | Season | Rainfall (inches) |
|---|---|---|
| June | Summer | 2.5 |
| July | Summer | 3.0 |
| August | Summer | 1.8 |
| December | Winter | 4.2 |
| January | Winter | 3.5 |
| February | Winter | 2.0 |
The summer values exhibit a sample standard deviation of about 0.6 inches, lower than the overall sample's roughly 0.9 inches, demonstrating the variability reduction achieved by conditioning. This empirical method parallels the averaging in discrete cases like dice rolls but applies to continuous measurements derived from real-world observations.
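A short Python sketch reproduces these empirical figures; the rainfall numbers are the illustrative values from the table above, not real measurements:

```python
import statistics

# Sample data mirroring the table: (month, season, rainfall in inches).
data = [
    ("June", "Summer", 2.5),
    ("July", "Summer", 3.0),
    ("August", "Summer", 1.8),
    ("December", "Winter", 4.2),
    ("January", "Winter", 3.5),
    ("February", "Winter", 2.0),
]

all_values = [r for _, _, r in data]
summer_values = [r for _, season, r in data if season == "Summer"]

# Unconditional mean vs. the conditional mean given a summer month.
print(round(statistics.mean(all_values), 2))     # ~2.83
print(round(statistics.mean(summer_values), 2))  # ~2.43

# Conditioning typically reduces spread: compare sample standard deviations.
print(round(statistics.stdev(all_values), 2))    # ~0.92
print(round(statistics.stdev(summer_values), 2)) # ~0.60
```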

History

Early Ideas in Conditional Probability

The origins of conditional probability concepts can be traced to the mid-17th century, particularly through the correspondence between Blaise Pascal and Pierre de Fermat on the "problem of points." In 1654, prompted by the gambler Chevalier de Méré, Pascal and Fermat addressed the question of fairly dividing stakes in an interrupted game of chance, such as one where players alternate tossing a coin until one reaches a required number of successes. Their solutions implicitly relied on conditional probabilities by evaluating the likelihood of each player winning from the current state onward, based on the remaining rounds needed, rather than restarting the game. For instance, if a game required three successes and one player had two while the other had one, they calculated the division by considering the probabilities of outcomes conditional on the interruption point, effectively apportioning stakes proportional to these chances.

A significant advancement came with Thomas Bayes' posthumously published 1763 essay, "An Essay towards Solving a Problem in the Doctrine of Chances," which introduced what is now called Bayes' theorem to formalize conditional reasoning. Bayes derived the rule for updating probabilities based on observed evidence, stating that the probability of cause A given effect B is proportional to the probability of B given A times the prior probability of A, normalized by the total probability of B: P(A|B) = \frac{P(B|A) P(A)}{P(B)}. This theorem provided a systematic way to compute conditional probabilities without directly specifying expectations, laying essential groundwork for later probabilistic inference by reversing the direction of conditioning from effect to cause.

Pierre-Simon Laplace further extended these ideas in his 1812 Théorie Analytique des Probabilités, applying conditional and inverse probabilities to astronomical problems such as estimating planetary masses and perturbations in celestial mechanics. Laplace used these methods to weigh observations conditionally on prior assumptions, producing estimates that functioned as weighted averages of possible values—foreshadowing modern conditional expectations—particularly in analyzing the stability of the solar system through probabilistic corrections to observational data.

Modern Formalization

The modern formalization of conditional expectation arose in the early 20th century as part of the axiomatization of probability within the framework of measure theory. Andrey Kolmogorov's 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (translated as Foundations of the Theory of Probability) established probability as a measure-theoretic discipline and introduced conditional expectation E[X \mid \mathcal{G}] as the Radon-Nikodym derivative of the measure \nu(A) = \int_A X \, dP with respect to the restriction of P to the sub-σ-algebra \mathcal{G} \subseteq \mathcal{F}, where (\Omega, \mathcal{F}, P) is the underlying probability space and X is an integrable random variable. This definition generalized conditional expectations to arbitrary σ-algebras, providing a precise tool for incorporating partial information in probabilistic models beyond finite or countable sample spaces.

In the ensuing decades, particularly during the 1940s and 1950s, scholars such as Joseph L. Doob advanced this foundation by embedding conditional expectation into the theory of martingales and stochastic processes. Doob's work formalized martingales as sequences of random variables where E[X_{n+1} \mid \mathcal{F}_n] = X_n, leveraging conditional expectations to study convergence, optional stopping, and the behavior of stochastic processes under incomplete information. These developments, building on Kolmogorov's axioms, transformed conditional expectation from a definitional construct into a cornerstone for analyzing time-dependent phenomena in probability, including diffusion processes and filtering problems.

By the 1950s, these abstract concepts gained wider accessibility through authoritative textbooks that highlighted their practical utility. William Feller's An Introduction to Probability Theory and Its Applications, Volume I (1950) introduced conditional probabilities and expectations in the setting of discrete sample spaces, underscoring their role in predictive modeling and optimal estimation, such as in sequential analysis and renewal theory. This pedagogical emphasis helped integrate conditional expectation into mainstream probability education, facilitating its application in fields like statistics and engineering.

Definitions

Conditioning on an Event

The conditional expectation of an integrable random variable X given an event A with P(A) > 0 is defined by E[X \mid A] = \frac{1}{P(A)} \int_A X \, dP = \frac{E[X \mathbf{1}_A]}{P(A)}, where \mathbf{1}_A denotes the indicator function of A. This expression represents the average of X restricted to the outcomes in A, normalized by the probability of A. The quantity E[X \mid A] is a constant (degenerate random variable) that satisfies the fundamental defining property of conditional expectation in this context: \int_A E[X \mid A] \, dP = \int_A X \, dP. Similarly, E[X \mid A^c] = \frac{E[X \mathbf{1}_{A^c}]}{P(A^c)} is defined for the complementary event A^c with P(A^c) > 0, and it satisfies \int_{A^c} E[X \mid A^c] \, dP = \int_{A^c} X \, dP. More generally, the random variable E[X \mid \sigma(A)] equals E[X \mid A] on A and E[X \mid A^c] on A^c; it is \sigma(A)-measurable and satisfies \int_G E[X \mid \sigma(A)] \, dP = \int_G X \, dP for all G \in \sigma(A). These properties confirm that E[X \mid A] preserves the integral of X over the partition of the sample space induced by A. To illustrate, consider a uniform probability space \Omega = \{0, 1\} with P(\{\omega\}) = \frac{1}{2} for each \omega \in \Omega, and let X(\omega) = \omega (a Bernoulli random variable with parameter \frac{1}{2}). The event A = \{X > 0\} = \{1\} has P(A) = \frac{1}{2}. Then \mathbf{1}_A(\omega) = X(\omega), so E[X \mathbf{1}_A] = E[X^2] = E[X] = \frac{1}{2} (since X^2 = X). Thus, E[X \mid A] = \frac{\frac{1}{2}}{\frac{1}{2}} = 1, which is the value of X on A. This example highlights how the conditional expectation collapses to the realized value when X is the indicator of A itself. This notion of conditioning on a single event serves as the foundational case, which extends to conditioning on random variables in more general settings.
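As a sanity check on this definition, the Python sketch below (the helper name is illustrative) computes E[X \mid A] = E[X \mathbf{1}_A]/P(A) on a finite sample space, taking X to be the sum of two fair dice and A the event that the first die shows 1, and verifies that E[X \mid \sigma(A)] integrates back to E[X]:

```python
from fractions import Fraction
from itertools import product

def cond_exp_given_event(omega, prob, X, A):
    """E[X | A] = E[X * 1_A] / P(A) on a finite sample space."""
    p_A = sum(prob[w] for w in omega if A(w))
    return sum(X(w) * prob[w] for w in omega if A(w)) / p_A

omega = list(product(range(1, 7), repeat=2))     # two fair dice
prob = {w: Fraction(1, 36) for w in omega}
X = lambda w: w[0] + w[1]                        # the sum
A = lambda w: w[0] == 1                          # "first die shows 1"

e_A = cond_exp_given_event(omega, prob, X, A)                    # 9/2
e_Ac = cond_exp_given_event(omega, prob, X, lambda w: not A(w))  # 15/2
p_A = Fraction(1, 6)

# E[X | sigma(A)] integrates to E[X]: P(A) E[X|A] + P(A^c) E[X|A^c] = 7.
assert p_A * e_A + (1 - p_A) * e_Ac == 7
print(e_A, e_Ac)
```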

Discrete Random Variables

In the discrete case, consider two random variables X and Y defined on a common probability space, where Y takes values in a countable set \{y_i : i \in I\} with P(Y = y_i) > 0 for each i. The conditional expectation of X given Y = y_i, denoted E[X \mid Y = y_i], is defined as the expected value of X under the conditional probability mass function of X given Y = y_i: E[X \mid Y = y_i] = \sum_{x} x \, P(X = x \mid Y = y_i), where the sum is over the support of X, and the conditional probability is given by P(X = x \mid Y = y_i) = P(X = x, Y = y_i) / P(Y = y_i). The conditional expectation E[X \mid Y] is then a random variable on the original probability space, expressed as E[X \mid Y](\omega) = \sum_{i} E[X \mid Y = y_i] \, 1_{\{Y = y_i\}}(\omega) for each outcome \omega, where 1_{\{Y = y_i\}} is the indicator function of the event \{Y = y_i\}. This construction yields a step function that is constant on each atom \{Y = y_i\} of the partition induced by Y. A key property verifying this definition is that E[X \mid Y] satisfies the integral condition for each i: E[ E[X \mid Y] \, 1_{\{Y = y_i\}} ] = E[ X \, 1_{\{Y = y_i\}} ]. To see this, note that E[ E[X \mid Y] \, 1_{\{Y = y_i\}} ] = E[X \mid Y = y_i] \, P(Y = y_i), while the right side expands to \sum_x x \, P(X = x, Y = y_i), and substituting the definition of E[X \mid Y = y_i] confirms equality. This holds under the assumption that E[|X|] < \infty. When Y is an indicator random variable of an event A, the construction reduces to conditioning on the event A. For a concrete illustration linking to the dice rolling example, suppose two fair six-sided dice are rolled independently, letting D_1 be the outcome of the first die and S = D_1 + D_2 the sum. Then E[S \mid D_1 = k] = k + 3.5 for k = 1, \dots, 6, since E[D_2] = 3.5 by independence, so E[S \mid D_1] = D_1 + 3.5.
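A brief Python sketch mirrors this construction: it builds the joint probability mass function of (D_1, S), forms the conditional pmf of S given D_1 = k, and confirms the formula E[S \mid D_1 = k] = k + 3.5 exactly:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (D1, S) for two fair dice, S = D1 + D2.
joint = {}  # (d1, s) -> P(D1 = d1, S = s)
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(d1, d1 + d2)] = joint.get((d1, d1 + d2), 0) + Fraction(1, 36)

def cond_exp_sum_given_first(k):
    """E[S | D1 = k] = sum over s of s * P(S = s | D1 = k)."""
    p_k = sum(p for (d1, _), p in joint.items() if d1 == k)        # P(D1 = k)
    return sum(s * p for (d1, s), p in joint.items() if d1 == k) / p_k

for k in range(1, 7):
    assert cond_exp_sum_given_first(k) == Fraction(2 * k + 7, 2)   # k + 3.5
print([float(cond_exp_sum_given_first(k)) for k in range(1, 7)])
# [4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
```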

Continuous Random Variables

For continuous random variables, the concept of conditional expectation extends the discrete case by relying on probability densities rather than mass functions, allowing computation through integration over the support. Suppose X and Y are jointly continuous random variables with joint probability density function f_{X,Y}(x,y), and let Y have marginal density f_Y(y). The conditional expectation of X given Y = y, denoted E[X \mid Y = y], is defined as E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X \mid Y}(x \mid y) \, dx, where the conditional density f_{X \mid Y}(x \mid y) is given by f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} provided f_Y(y) > 0. This formulation requires the joint distribution to be absolutely continuous with respect to Lebesgue measure on \mathbb{R}^2, ensuring the existence of densities, and f_Y(y) to be positive on the relevant support to avoid division by zero. The conditional expectation E[X \mid Y] is itself a random variable, defined as a measurable function of Y with respect to the \sigma-algebra generated by Y, denoted \sigma(Y). It satisfies the characterizing property that for any bounded measurable function g, E\left[ E[X \mid Y] \, g(Y) \right] = E\left[ X \, g(Y) \right]. This property ensures E[X \mid Y] captures the best prediction of X based on Y in an averaged sense, analogous to the discrete case but adapted to continuous spaces. A concrete computation arises when X and Y are jointly normal (bivariate Gaussian) with means \mu_X and \mu_Y, variances \sigma_X^2 and \sigma_Y^2, and correlation coefficient \rho. In this case, E[X \mid Y = y] = \mu_X + \rho \frac{\sigma_X}{\sigma_Y} (y - \mu_Y), which is a linear function of y, highlighting the affine relationship in Gaussian settings.
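The Gaussian formula can be verified numerically. The Python sketch below (with arbitrarily chosen parameter values) approximates E[X \mid Y = y] by integrating x against the conditional density on a fine grid and compares the result with the closed-form expression:

```python
import numpy as np

# Assumed bivariate normal parameters, chosen only for illustration.
mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 2.0, 0.5, 0.6

def joint_density(x, y):
    """Bivariate normal density f_{X,Y}(x, y)."""
    quad = ((x - mu_x) ** 2 / sigma_x ** 2
            - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y)
            + (y - mu_y) ** 2 / sigma_y ** 2)
    norm = 2 * np.pi * sigma_x * sigma_y * np.sqrt(1 - rho ** 2)
    return np.exp(-quad / (2 * (1 - rho ** 2))) / norm

y0 = -1.5
x = np.linspace(mu_x - 10 * sigma_x, mu_x + 10 * sigma_x, 200_001)
dx = x[1] - x[0]
f_xy = joint_density(x, y0)
f_y = f_xy.sum() * dx                      # marginal density f_Y(y0)
e_numeric = (x * f_xy).sum() * dx / f_y    # E[X | Y = y0] via a Riemann sum

e_closed = mu_x + rho * (sigma_x / sigma_y) * (y0 - mu_y)
print(e_numeric, e_closed)                 # both approximately 2.2
```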

L² Random Variables

In the context of a probability space (\Omega, \mathcal{F}, P), the space L^2(\Omega, \mathcal{F}, P) of square-integrable random variables forms a Hilbert space with inner product \langle X, Y \rangle = E[XY]. For a sub-σ-algebra \mathcal{G} \subseteq \mathcal{F}, the subspace L^2(\Omega, \mathcal{G}, P) is a closed subspace of this Hilbert space. The conditional expectation E[X \mid \mathcal{G}] of X \in L^2(\Omega, \mathcal{F}, P) is the orthogonal projection of X onto L^2(\Omega, \mathcal{G}, P), unique by Hilbert space theory. The defining property of this projection is orthogonality in L^2: E[(X - E[X \mid \mathcal{G}]) Z] = 0 for all Z \in L^2(\Omega, \mathcal{G}, P). This ensures that the residual X - E[X \mid \mathcal{G}] is orthogonal to every element of the subspace. Existence of the projection follows from the completeness of L^2 and the Hilbert projection theorem, which guarantees a unique minimizer of the distance \|X - Y\|_{L^2} over Y \in L^2(\Omega, \mathcal{G}, P). Uniqueness holds up to equivalence classes modulo null sets, as any discrepancy on a set of positive probability would contradict the orthogonality. This L^2 formulation defines conditional expectation in the mean-square sense, emphasizing approximation in the L^2 norm rather than pointwise values. It avoids reliance on densities or explicit integrals, unlike the formulations for continuous or discrete variables, and serves as a foundational case for broader measure-theoretic extensions.
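A Monte Carlo sketch illustrates the projection property. In the assumed toy model below, X = Y^2 plus independent noise, so E[X \mid Y] = Y^2, and this conditional mean attains a smaller mean squared error than the other functions of Y that are tried:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.uniform(-1, 1, size=n)
X = Y ** 2 + rng.normal(0.0, 0.3, size=n)   # noise independent of Y

# Candidate sigma(Y)-measurable predictors of X.
candidates = {
    "E[X | Y] = Y^2": Y ** 2,
    "best constant E[X]": np.full(n, X.mean()),
    "an affine competitor": 1 / 3 + 0.5 * Y,
}
for name, pred in candidates.items():
    print(f"{name:>22s}: MSE = {np.mean((X - pred) ** 2):.4f}")
# The conditional mean achieves MSE close to the noise variance 0.09;
# the other predictors are strictly worse.
```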

σ-Algebra Conditioning

The conditional expectation with respect to a sub-σ-algebra provides the most general formulation, applicable to any integrable random variable without requiring finite variance. Consider a probability space (\Omega, \mathcal{F}, P) and an integrable random variable X \in L^1(\Omega, \mathcal{F}, P). For a sub-σ-algebra \mathcal{G} \subset \mathcal{F}, the conditional expectation E[X \mid \mathcal{G}] is defined as the \mathcal{G}-measurable random variable Y satisfying \int_G Y \, dP = \int_G X \, dP for all G \in \mathcal{G}. This characterizing property arises from applying the Radon-Nikodym theorem to the signed measure \nu(G) = \int_G X \, dP for G \in \mathcal{G}, which is absolutely continuous with respect to the restriction of P to \mathcal{G}. The Radon-Nikodym theorem guarantees the existence of such a Y, and uniqueness holds up to P-almost sure equality. In the special case of conditioning on a random variable Y, E[X \mid Y] is defined as E[X \mid \sigma(Y)], where \sigma(Y) denotes the σ-algebra generated by Y. This framework accommodates conditioning on non-denumerable information structures, such as filtrations \{\mathcal{F}_t\}_{t \geq 0} in stochastic processes, where \mathcal{F}_t represents the accumulated information up to time t and enables the analysis of adapted processes. For X \in L^2, this conditional expectation coincides with the orthogonal projection of X onto the closed subspace of \mathcal{G}-measurable square-integrable random variables.
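When \mathcal{G} is generated by a finite partition, the partial-averaging property can be checked exhaustively. The Python sketch below builds E[X \mid \mathcal{G}] as block averages over the partition induced by the first of two dice and verifies \int_G E[X \mid \mathcal{G}] \, dP = \int_G X \, dP for every set G in the generated σ-algebra:

```python
from fractions import Fraction
from itertools import chain, combinations, product

omega = list(product(range(1, 7), repeat=2))        # two fair dice
P = {w: Fraction(1, 36) for w in omega}
X = {w: w[0] + w[1] for w in omega}                 # the sum

# Partition generating G: blocks {D1 = k}, k = 1, ..., 6.
blocks = [[w for w in omega if w[0] == k] for k in range(1, 7)]

# E[X | G] is constant on each block, equal to the block average of X.
cond = {}
for block in blocks:
    avg = sum(X[w] * P[w] for w in block) / sum(P[w] for w in block)
    for w in block:
        cond[w] = avg

# Every set in G is a union of blocks; verify partial averaging on all of them.
for r in range(len(blocks) + 1):
    for chosen in combinations(blocks, r):
        G = list(chain.from_iterable(chosen))
        assert sum(cond[w] * P[w] for w in G) == sum(X[w] * P[w] for w in G)
print("partial averaging holds on all", 2 ** len(blocks), "sets of the sigma-algebra")
```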

Properties

Linearity and Tower Property

The linearity of conditional expectation is a fundamental algebraic property that mirrors the linearity of unconditional expectation. For constants a, b \in \mathbb{R} and integrable random variables X, Y on a probability space (\Omega, \mathcal{F}, P), the conditional expectation with respect to a sub-\sigma-algebra \mathcal{G} \subset \mathcal{F} satisfies \mathbb{E}[aX + bY \mid \mathcal{G}] = a \mathbb{E}[X \mid \mathcal{G}] + b \mathbb{E}[Y \mid \mathcal{G}] almost surely. This holds because conditional expectation is defined as the \mathcal{G}-measurable random variable Z = \mathbb{E}[X \mid \mathcal{G}] satisfying \int_A Z \, dP = \int_A X \, dP for all A \in \mathcal{G}, and linearity of the integral extends directly to linear combinations: for any A \in \mathcal{G}, \int_A (aX + bY) \, dP = a \int_A X \, dP + b \int_A Y \, dP = a \int_A \mathbb{E}[X \mid \mathcal{G}] \, dP + b \int_A \mathbb{E}[Y \mid \mathcal{G}] \, dP, which implies the result by uniqueness of the conditional expectation. This property facilitates computations involving sums and scalar multiples in conditional settings, such as in filtering problems or risk assessment models.

Another key structural property is the tower property, also known as the law of iterated expectations, which governs nested conditioning on nested σ-algebras. If \mathcal{H} \subset \mathcal{G} \subset \mathcal{F} and X is integrable, then \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}] almost surely. To see this, let Z = \mathbb{E}[X \mid \mathcal{G}], which is \mathcal{G}-measurable and integrable. For any B \in \mathcal{H}, \int_B Z \, dP = \int_B X \, dP, by the defining property of Z applied over sets in \mathcal{G} (which includes all of \mathcal{H} since \mathcal{H} \subset \mathcal{G}). Therefore \int_B \mathbb{E}[Z \mid \mathcal{H}] \, dP = \int_B Z \, dP = \int_B X \, dP for all B \in \mathcal{H}, so \mathbb{E}[Z \mid \mathcal{H}] satisfies the integral condition characterizing \mathbb{E}[X \mid \mathcal{H}], yielding the equality by uniqueness. This double application of the integral characterization underscores the property's reliance on the projection-like nature of conditional expectation onto coarser information structures.

A concrete illustration arises in the discrete case with nested conditioning variables. Suppose X, Y, Z are random variables where conditioning on Y provides finer information than conditioning on Z, such as Z representing coarse outcomes and Y finer partitions (so that \sigma(Z) \subseteq \sigma(Y)). Then \mathbb{E}[\mathbb{E}[X \mid Y] \mid Z] = \mathbb{E}[X \mid Z], as the tower property simplifies the computation by collapsing the inner conditioning onto the coarser one. For instance, if X is the outcome of a multi-stage experiment, Y the result after the first stage, and Z an even coarser summary, this allows efficient hierarchical evaluation without recomputing full joint distributions. These properties enable the decomposition of complex conditioning hierarchies into manageable steps, essential for applications like sequential decision-making under uncertainty or multi-level stochastic modeling, where information accrues progressively across filtrations.
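The tower property can be verified exactly on a small finite space. The Python sketch below uses three independent fair dice, with Y = (D_1, D_2) carrying finer information than Z = D_1, and checks that E[E[X \mid Y] \mid Z] = E[X \mid Z] for X = D_1 + D_2 + D_3:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=3))   # three fair dice
P = Fraction(1, 216)                           # uniform probability of each outcome

def cond_exp(value, grouping):
    """E[value | grouping] as a dict: grouping label -> conditional mean."""
    totals, weights = {}, {}
    for w in omega:
        g = grouping(w)
        totals[g] = totals.get(g, 0) + value(w) * P
        weights[g] = weights.get(g, 0) + P
    return {g: totals[g] / weights[g] for g in totals}

X = lambda w: w[0] + w[1] + w[2]
Y = lambda w: (w[0], w[1])                     # finer information
Z = lambda w: w[0]                             # coarser information

e_X_given_Y = cond_exp(X, Y)
inner = lambda w: e_X_given_Y[Y(w)]            # E[X | Y] as a random variable
lhs = cond_exp(inner, Z)                       # E[ E[X | Y] | Z ]
rhs = cond_exp(X, Z)                           # E[X | Z]
assert lhs == rhs
print(rhs)                                     # each value equals k + 7 for Z = k
```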

Monotonicity and Integrability

The conditional expectation operator preserves the order of random variables. Specifically, if X and Y are integrable random variables on a probability space (\Omega, \mathcal{F}, P) such that X \leq Y almost surely, then E[X \mid \mathcal{G}] \leq E[Y \mid \mathcal{G}] almost surely for any sub-\sigma-algebra \mathcal{G} \subseteq \mathcal{F}. This monotonicity property follows from the linearity of conditional expectation: since Y - X \geq 0 almost surely and E[Y - X \mid \mathcal{G}] \geq 0 (as shown below for non-negative variables), it implies E[Y \mid \mathcal{G}] - E[X \mid \mathcal{G}] \geq 0 almost surely. A closely related fact is the preservation of positivity. If X \geq 0 almost surely and E[|X|] < \infty, then E[X \mid \mathcal{G}] \geq 0 almost surely. To see this directly, note that the set A = \{E[X \mid \mathcal{G}] < 0\} belongs to \mathcal{G}, and \int_A E[X \mid \mathcal{G}] \, dP = \int_A X \, dP \geq 0, which forces P(A) = 0. This property ensures that conditional expectations align with intuitive notions of averaging non-negative quantities. Conditional expectations also satisfy Jensen's inequality for convex functions. Let \phi: \mathbb{R} \to \mathbb{R} be convex, and suppose X is integrable with E[|\phi(X)|] < \infty. Then \phi(E[X \mid \mathcal{G}]) \leq E[\phi(X) \mid \mathcal{G}] almost surely. For X \in L^1, this follows from representing the convex \phi as the supremum of its supporting affine functions and applying linearity and monotonicity to each; in the L^2 setting, where conditional expectation is the orthogonal projection onto the subspace of \mathcal{G}-measurable square-integrable functions, the result can also be obtained by applying the unconditional Jensen's inequality to the conditional distribution. These order-preserving properties extend to integrability bounds. For integrable X, the absolute value of the conditional expectation is bounded by the conditional expectation of the absolute value: |E[X \mid \mathcal{G}]| \leq E[|X| \mid \mathcal{G}] almost surely. This follows by applying monotonicity to the inequalities -|X| \leq X \leq |X|, which give -E[|X| \mid \mathcal{G}] \leq E[X \mid \mathcal{G}] \leq E[|X| \mid \mathcal{G}]. Taking unconditional expectations further gives E[|E[X \mid \mathcal{G}]|] \leq E[|X|], confirming that conditional expectation is a contraction on L^1 and does not worsen integrability.
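A quick Monte Carlo sketch illustrates the conditional Jensen inequality with \phi(x) = x^2; the model below, a normal variable whose mean depends on a small discrete conditioning variable, is an assumed toy example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
Y = rng.integers(0, 4, size=n)                   # discrete conditioning variable
X = rng.normal(loc=Y.astype(float), scale=1.0)   # conditional mean y, variance 1

for y in range(4):
    xs = X[Y == y]
    lhs = xs.mean() ** 2        # phi(E[X | Y = y])
    rhs = (xs ** 2).mean()      # E[phi(X) | Y = y]
    print(y, round(lhs, 3), round(rhs, 3), lhs <= rhs)
# The gap rhs - lhs is the conditional variance, here about 1 for every y.
```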

Connections

To Regression

In the framework of L^2 random variables, the conditional expectation E[Y \mid X] serves as the optimal predictor of Y given X in the mean squared error sense. Specifically, it minimizes E[(Y - g(X))^2] over all \sigma(X)-measurable functions g, as E[Y \mid X] is the orthogonal projection of Y onto the closed subspace of L^2 consisting of \sigma(X)-measurable random variables; the error Y - E[Y \mid X] is then orthogonal to this subspace, ensuring the minimum is attained uniquely up to almost sure equivalence. This minimization property positions conditional expectation at the core of regression analysis, where predicting Y from X aims to reduce prediction error. If X and Y are jointly normally distributed, E[Y \mid X = x] takes the explicit linear form \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - \mu_X), aligning precisely with the ordinary least squares (OLS) regression line; without joint normality, the regression function E[Y \mid X] is generally nonlinear, allowing for more flexible modeling beyond linear assumptions. To illustrate in the simple linear case, suppose Y = \beta_0 + \beta_1 X + \epsilon where \epsilon is independent of X with E[\epsilon] = 0 and \operatorname{Var}(\epsilon) = \sigma^2. Then E[Y \mid X = x] = \beta_0 + \beta_1 x, and the law of total variance decomposes \operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(E[Y \mid X]), yielding \operatorname{Var}(Y) = \sigma^2 + \beta_1^2 \operatorname{Var}(X); here, \operatorname{Var}(E[Y \mid X]) quantifies the variance explained by X, while E[\operatorname{Var}(Y \mid X)] = \sigma^2 captures the residual variability. Historically, this probabilistic perspective on conditional expectation links to the Gauss-Markov theorem, which establishes that, under assumptions of linearity, no serial correlation, homoscedasticity, and exogeneity (i.e., E[\epsilon \mid X] = 0), the OLS estimator is the best linear unbiased estimator (BLUE) for the coefficients; in this view, OLS recovers the conditional mean when the model is correctly specified as linear.
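A simulation sketch (with assumed parameter values) illustrates both points: OLS applied to data from a correctly specified linear model recovers the conditional mean E[Y \mid X = x] = \beta_0 + \beta_1 x, and the variance decomposition \operatorname{Var}(Y) = \sigma^2 + \beta_1^2 \operatorname{Var}(X) holds numerically:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
b0, b1, sigma = 2.0, -1.5, 0.7                  # assumed true parameters
X = rng.normal(0.0, 1.2, size=n)
Y = b0 + b1 * X + rng.normal(0.0, sigma, size=n)

# Closed-form OLS estimates for simple linear regression.
slope = np.cov(X, Y, bias=True)[0, 1] / X.var()
intercept = Y.mean() - slope * X.mean()
print(round(intercept, 3), round(slope, 3))     # close to 2.0 and -1.5

# Law of total variance: Var(Y) vs sigma^2 + b1^2 Var(X).
print(round(Y.var(), 3), round(sigma ** 2 + b1 ** 2 * X.var(), 3))
```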

To Martingales

In martingale theory, conditional expectation serves as the defining property for a class of processes known as martingales. A martingale is a sequence of random variables (M_n)_{n \geq 0} adapted to an increasing filtration (\mathcal{F}_n)_{n \geq 0} of σ-algebras such that \mathbb{E}[|M_n|] < \infty for all n and \mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = M_n for each n \geq 0. This condition implies that the process has no predictable drift, making it a model for fair games or unbiased evolution under accumulating information. A key construction linking conditional expectation directly to martingales is Doob's martingale. For an integrable random variable X and a filtration (\mathcal{F}_t)_{t \geq 0}, the process defined by M_t = \mathbb{E}[X \mid \mathcal{F}_t] forms a martingale, since the tower property gives \mathbb{E}[M_t \mid \mathcal{F}_s] = M_s for s \leq t. This process captures the progressive revelation of information about X through the filtration, preserving the martingale property at each step. The optional sampling theorem extends this framework to stopping times, which are random times adapted to the filtration. For a martingale M and a stopping time \tau that is bounded or satisfies suitable uniform integrability conditions, the theorem asserts that \mathbb{E}[M_\tau] = \mathbb{E}[M_0]. This result, central to analyzing processes that halt at random times, relies on the martingale property derived from conditional expectations. Martingales constructed via conditional expectations find applications in modeling fair games and solving optimal stopping problems. In a fair game, where each bet has zero expected gain, the gambler's fortune process is a martingale, and the optional sampling theorem implies that the expected fortune at any admissible stopping time equals the initial amount. For instance, in the gambler's ruin problem—a symmetric simple random walk on \{0, 1, \dots, N\} starting at i (with 0 < i < N) that absorbs at 0 or N, with fair steps (p = 1/2)—the position S_n is a martingale. Applying optional sampling at the absorption time \tau yields the probability of reaching N before 0 as i/N.
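A simulation sketch of the gambler's ruin argument: the symmetric walk is a martingale, so optional sampling gives E[S_\tau] = i, and since S_\tau takes only the values 0 and N, it follows that P(S_\tau = N) = i/N. The estimate below should be close to 0.3 for the assumed values i = 3 and N = 10:

```python
import numpy as np

rng = np.random.default_rng(7)
N, i, trials = 10, 3, 100_000

hits_N = 0
for _ in range(trials):
    s = i
    while 0 < s < N:                           # fair +/-1 steps until absorption
        s += 1 if rng.random() < 0.5 else -1
    hits_N += (s == N)

p_hat = hits_N / trials
print(round(p_hat, 3), i / N)                  # estimated P(hit N first) vs i/N
print(round(p_hat * N, 3), i)                  # estimated E[S_tau] vs the start i
```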
