Expected value
In probability theory, the expected value (also known as the mathematical expectation, expectation, or simply the mean) of a random variable is a measure of central tendency that represents the long-run average value of the random variable over infinitely many independent repetitions of the associated experiment. For a discrete random variable X taking values x_i with probabilities p_i, the expected value is E[X] = \sum x_i p_i; for a continuous random variable with probability density function f(x), it is E[X] = \int_{-\infty}^{\infty} x f(x) \, dx.[1] This concept quantifies the "average" outcome weighted by the likelihood of each possibility, distinguishing it from the most probable value, and serves as a cornerstone for understanding distributions in statistics.

The concept of expected value originated in the 17th century from analyses of games of chance, with Christiaan Huygens introducing it in 1657 in his treatise De Ratiociniis in Ludo Aleae to compute fair divisions in interrupted games; it was later formalized by Abraham de Moivre in 1718 and advanced by Pierre-Simon Laplace in 1814.[2]

Key properties of expected value underpin its utility across disciplines, with linearity of expectation being particularly notable: for any random variables R_1 and R_2 and constants a_1, a_2, E[a_1 R_1 + a_2 R_2] = a_1 E[R_1] + a_2 E[R_2], holding even without independence between the variables.[3] This property enables efficient computations in complex scenarios, such as using indicator random variables, where E[I_A] = P(A) for an event A.[3]

In statistics, expected value defines the population mean \mu, guiding hypothesis testing and confidence intervals; in economics and finance, it informs risk assessment by calculating weighted averages of potential profits and costs, as in net present value analyses for investments where outcomes are probabilistic.[4] For instance, in evaluating a drilling project, expected value aggregates probabilities of dry holes (70%) versus successful yields (30%) to determine long-term viability, potentially yielding positive returns like $425,000 on average despite variability.[4]

Beyond these core applications, expected value extends to decision theory and optimization, where it maximizes utility under uncertainty, as in expected utility theory for rational choice.[5] It also appears in the analysis of algorithms, such as the coupon collector problem, where the expected number of trials to gather all n types is n H_n (with H_n the n-th harmonic number), approximately n \ln n + \gamma n for large n, illustrating its role in computational complexity.[3] Overall, expected value remains indispensable for modeling uncertainty, from insurance pricing to expectations computed in machine learning, always balancing probability against payoff.[4]
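As an informal check of the coupon collector figure above, the following sketch (a hypothetical Python example, not drawn from the cited sources; the helper coupons_needed is invented for illustration) compares a simulated average number of trials with n H_n:

```python
import random

def coupons_needed(n):
    """Draw coupons uniformly at random until all n types have been seen."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        trials += 1
    return trials

n, runs = 20, 10_000
empirical = sum(coupons_needed(n) for _ in range(runs)) / runs
harmonic = sum(1 / k for k in range(1, n + 1))  # H_n

print(f"empirical mean trials: {empirical:.2f}")
print(f"n * H_n:               {n * harmonic:.2f}")  # ~ n ln n + gamma * n
```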
History and Etymology
Historical Development
The concept of expected value emerged in the mid-17th century amid efforts to resolve disputes in gambling, particularly through the correspondence between Blaise Pascal and Pierre de Fermat in 1654. Prompted by the Chevalier de Méré, they addressed the "problem of points," which involved fairly dividing stakes in an interrupted game of chance, such as dice or cards, based on the probabilities of completing the game. Their exchange, preserved in letters, laid foundational principles for calculating fair shares proportional to winning chances, marking the inception of systematic probability reasoning applied to expectations in games.[6]

Building on this, Christiaan Huygens formalized the idea in his 1657 treatise De Ratiociniis in Ludo Aleae, the first published work on probability theory. Huygens introduced mathematical expectation as the value a player could reasonably anticipate from a game, using it to analyze fair divisions and advantages in various chance scenarios, such as lotteries and dice rolls. His propositions equated expectation to the weighted average of possible outcomes, providing a practical tool for gamblers and establishing expectation as a core probabilistic concept.[7]

Jacob Bernoulli advanced the notion significantly in his posthumously published 1713 work Ars Conjectandi, extending expectations beyond simple games to broader combinatorial outcomes and moral certainty. Bernoulli demonstrated how repeated trials converge to the expected value, introducing the law of large numbers as a theorem justifying the reliability of expectations in empirical settings. His analysis connected expectations to binomial expansions, influencing applications in annuities and demographics.[8]

Abraham de Moivre further refined these ideas in his 1718 book The Doctrine of Chances, where he developed approximations linking expectations to the binomial distribution for large numbers of trials. De Moivre's methods allowed estimation of expected outcomes in complex scenarios, bridging combinatorial probability with continuous approximations and enhancing the precision of expectation calculations in insurance and gaming.[9]

The modern rigorous framework for expected value was established by Andrey Kolmogorov in his 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung, which axiomatized probability theory using measure theory. Kolmogorov integrated expectation as the Lebesgue integral of a random variable over the probability space, unifying discrete and continuous cases within a general abstract setting and enabling its application across mathematics and sciences.[10]
Etymology
The term "expectation" in probability theory originated in the 17th century, deriving from the Latin expectatio, which was introduced in Frans van Schooten's 1657 Latin translation of Christiaan Huygens' treatise De ratiociniis in ludo aleae. This work, based on Huygens' unpublished Dutch manuscript Van Rekeningh in Spelen van Gluck (1656), addressed problems in games of chance, where the concept denoted the anticipated monetary gain a player could reasonably foresee from fair play. The Latin root exspectatio, from the verb exspectare meaning "to look out for" or "to await," aligned with the gambling context of awaiting outcomes, emphasizing a balanced anticipation rather than mere hope.[11][12]

In French, the parallel term espérance mathématique ("mathematical hope" or "mathematical expectation") first appeared in a letter by Gabriel Cramer dated May 21, 1728, marking its initial documented use with the modern probabilistic meaning. This phrasing influenced subsequent works, including Pierre-Simon Laplace's adoption of espérance in Théorie analytique des probabilités (1812), where it signified the weighted average outcome. Meanwhile, in German mathematical literature, Erwartungswert ("expected value") emerged as an equivalent, with roots traceable to early 18th-century translations; for instance, Jakob Bernoulli employed related Latin expressions like valor expectationis (value of expectation) in Ars Conjectandi (1713) to describe anticipated gains, and occasionally mediocris to denote the mean or average value in probabilistic calculations.[11][13][14]

The English usage evolved further in the 19th century, with Augustus De Morgan coining "mathematical expectation" in An Essay on Probabilities (1838) to formalize the numerical aspect of the concept. By the 20th century, "expected value" supplanted "expectation" in many English texts to underscore its role as a precise average, avoiding connotations of subjective anticipation; this shift is evident in works like Arne Fisher's The Mathematical Theory of Probabilities (1915), which used the term to highlight the mean of a random variable's distribution.[11]
Notations and Terminology
Standard Notations
The standard notation for the expected value of a random variable X is E[X], where E stands for expectation. Alternative notations include \mathcal{E}(X) or \mathbb{E}[X], the latter often using blackboard bold to distinguish it in printed texts. The integral form \int x \, dF(x) represents the expected value in terms of the cumulative distribution function F.[15] For conditional expectation, the notation E[X \mid Y] is commonly used, indicating the expected value of X given the random variable Y. In statistics, the expected value of a random variable is frequently denoted by \mu, representing the population mean.[16] For multiple random variables, the expectation of the product XY is written E[XY] (occasionally E[X, Y]).[17]
Related Concepts
Variance serves as a fundamental measure of the dispersion or spread of a random variable's values around its expected value, quantifying the average squared deviation from the mean. Formally, for a random variable X, the variance is defined as \operatorname{Var}(X) = E[(X - E[X])^2], which captures the second central moment of the distribution.[18] This concept highlights how expected value acts as the central tendency from which variability is assessed, with higher variance indicating greater unpredictability in outcomes relative to the mean.[19]

Covariance extends this idea to pairs of random variables, measuring the joint variability between them by assessing how deviations from their respective expected values tend to align. It is defined as \operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] for random variables X and Y, where positive values suggest that above-average occurrences in one variable correspond with above-average occurrences in the other, indicating positive association.[20] Conceptually, covariance links the expected values of X and Y to their shared fluctuations, providing insight into dependence without assuming linearity.[21]

The moment-generating function (MGF) of a random variable X, denoted M_X(t) = E[e^{tX}], encapsulates all moments of the distribution, with the expected value E[X] corresponding to the first moment, obtained by differentiating the MGF and evaluating at t = 0. This relation underscores expected value as the foundational moment from which higher-order moments like variance derive.[22] In essence, the MGF provides a generating tool where expected value emerges as the first derivative, facilitating analysis of distributional properties.[23]

In statistics, the sample mean represents an empirical average computed from observed data, serving as an estimator of the theoretical expected value, which is the population parameter defined probabilistically. While the sample mean varies with each realization of the data, the expected value remains fixed as the long-term average under repeated sampling.[24] This distinction emphasizes that expected value is an intrinsic property of the random variable's distribution, whereas the sample mean approximates it through finite observations.[25]

The law of large numbers conceptually ties these ideas together by stating that, under suitable conditions, the sample mean converges to the expected value as the number of independent observations increases, justifying the use of empirical averages to infer theoretical expectations. This convergence, often in probability or almost surely, illustrates how repeated sampling diminishes the influence of variability around the expected value.[26] Thus, it bridges the gap between the abstract expected value and practical statistical inference.[27]
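A minimal sketch (hypothetical Python using only NumPy, not from the cited sources) ties these concepts together: it estimates E[X] by a sample mean, the variance as the mean squared deviation, and E[X] again from a central finite difference of the empirical MGF at t = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # E[X] = 2, Var(X) = 4

sample_mean = x.mean()                         # estimates E[X]
sample_var = ((x - sample_mean) ** 2).mean()   # estimates E[(X - E[X])^2]

def mgf(t):
    """Empirical moment-generating function E[e^{tX}] over the sample."""
    return np.exp(t * x).mean()

h = 1e-4
mgf_mean = (mgf(h) - mgf(-h)) / (2 * h)        # M'(0) recovers E[X]

print(sample_mean, sample_var, mgf_mean)       # ~2, ~4, ~2
```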
Core Definitions
Finite Discrete Random Variables
A finite discrete random variable X takes on a finite number of distinct values x_1, x_2, \dots, x_n in the real numbers, each occurring with probability P(X = x_i) = p_i > 0, where the probability mass function satisfies \sum_{i=1}^n p_i = 1. The expected value E[X], also known as the mean or first moment, is defined as the sum E[X] = \sum_{i=1}^n x_i p_i. This formulation arises in the axiomatic foundations of probability, where the expectation captures the center of mass of the distribution under a discrete uniform measure scaled by probabilities.[28][29]

The expected value serves as a weighted average of the possible outcomes, with the probabilities p_i acting as weights that reflect their relative likelihoods; if all p_i = 1/n, it reduces to the arithmetic mean of the x_i. This interpretation aligns with the law of large numbers, indicating that the sample average from many independent repetitions of the experiment converges to E[X].[30]

For a fair six-sided die, where X denotes the face value shown and each outcome from 1 to 6 has probability 1/6, the expected value is E[X] = \sum_{k=1}^6 k \cdot \frac{1}{6} = \frac{21}{6} = 3.5. This result implies that, over many rolls, the average outcome approaches 3.5, even though no single roll yields this value.[28]

Consider a biased coin flip where X is the payoff: +5 for heads (with P(\text{heads}) = 0.6) and -5 for tails (with P(\text{tails}) = 0.4). The expected value is E[X] = 0.6 \cdot 5 + 0.4 \cdot (-5) = 3 - 2 = 1. In repeated plays, the average payoff would thus approach +1 per flip.[31]
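The two worked examples above can be checked mechanically. The sketch below (hypothetical Python; the helper expectation is invented for illustration) computes the exact weighted averages and a simulated long-run die average:

```python
from fractions import Fraction
import random

# Exact weighted averages for the two worked examples.
die = {k: Fraction(1, 6) for k in range(1, 7)}
coin = {5: Fraction(3, 5), -5: Fraction(2, 5)}  # +5 w.p. 0.6, -5 w.p. 0.4

def expectation(pmf):
    """E[X] = sum of value * probability over the support."""
    return sum(x * p for x, p in pmf.items())

print(expectation(die))   # 7/2 = 3.5
print(expectation(coin))  # 1

# Long-run average of simulated die rolls approaches 3.5.
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))
```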
Countable Discrete Random Variables
For a countable discrete random variable X taking values in a countable set \{x_i : i \in \mathbb{Z}\}, the expected value is defined as E[X] = \sum_{i=-\infty}^{\infty} x_i P(X = x_i), provided the series converges absolutely, meaning \sum_{i=-\infty}^{\infty} |x_i| P(X = x_i) < \infty.[32] This absolute convergence ensures the sum is well-defined regardless of the enumeration of the support, distinguishing it from the finite case, where simple summation always applies without convergence concerns.[32] The expectation exists and is finite if and only if \sum |x_i| P(X = x_i) < \infty, which is equivalent to both the positive part \sum_{x_i > 0} x_i P(X = x_i) < \infty and the negative part \sum_{x_i < 0} |x_i| P(X = x_i) < \infty.[32]

A classic example is the geometric distribution, modeling the number of failures before the first success in independent Bernoulli trials with success probability p \in (0,1], where P(X = k) = p (1-p)^k for k = 0, 1, 2, \dots. Here, E[X] = \frac{1-p}{p}, and the series converges due to the exponential decay of probabilities.[33] Another is the Poisson distribution with parameter \lambda > 0, where P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} for k = 0, 1, 2, \dots, yielding E[X] = \lambda, with convergence assured by the factorial growth in the denominator.[34]

A finite expectation may fail to exist for distributions with heavy tails, where probabilities decay too slowly, causing the series \sum |x_i| P(X = x_i) to diverge. For instance, consider P(X = n) = \frac{1}{n(n+1)} for n = 1, 2, \dots, which satisfies normalization but leads to E[X] = \sum_{n=1}^{\infty} \frac{n}{n(n+1)} = \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty, so no finite expectation exists.[32]
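To see the convergence behavior described above, a small sketch (hypothetical Python; truncated_mean is an invented helper) approximates the geometric and Poisson means by partial sums and shows the divergence of the heavy-tailed example:

```python
import math

def truncated_mean(pmf, N):
    """Partial sum of sum_k k * p(k) for k = 0..N."""
    return sum(k * pmf(k) for k in range(N + 1))

p, lam = 0.3, 4.0
geom = lambda k: p * (1 - p) ** k                                # failures before first success
pois = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)

print(truncated_mean(geom, 500), (1 - p) / p)  # both ~ 2.333
print(truncated_mean(pois, 100), lam)          # both ~ 4.0

# Heavy tail: p(n) = 1/(n(n+1)) normalizes, but sum n * p(n) grows like log N.
heavy = lambda n: 1 / (n * (n + 1))
for N in (10, 1_000, 100_000):
    print(N, sum(n * heavy(n) for n in range(1, N + 1)))  # diverges as N grows
```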
Continuous Random Variables
For a continuous random variable X with probability density function f(x), the expected value E[X] is defined as the Lebesgue integral E[X] = \int_{-\infty}^{\infty} x f(x) \, dx, provided the integral exists.[35][36] This requires that f(x) \geq 0 for all x, \int_{-\infty}^{\infty} f(x) \, dx = 1, and absolute convergence of the integral, i.e., \int_{-\infty}^{\infty} |x| f(x) \, dx < \infty.[35][36] Without absolute convergence, the expected value is undefined, even if the principal value exists.[36]

An equivalent expression for E[X] can be obtained using the cumulative distribution function F(x) = \int_{-\infty}^{x} f(t) \, dt, which facilitates computation in cases where differentiating the CDF to obtain f(x) is cumbersome: E[X] = \int_{0}^{\infty} [1 - F(x)] \, dx - \int_{-\infty}^{0} F(x) \, dx. This tail formula decomposes the expectation into contributions from the positive and negative parts of X, with each integral representing the expected contribution from the respective tail of the distribution.[37]

A classic example is the uniform distribution on the interval [a, b], where a < b and the density is f(x) = \frac{1}{b-a} for x \in [a, b] and 0 otherwise. The expected value is E[X] = \int_{a}^{b} x \cdot \frac{1}{b-a} \, dx = \frac{a + b}{2}, the midpoint of the interval, reflecting the symmetry of the distribution.[35]

For the exponential distribution with rate parameter \lambda > 0, the density is f(x) = \lambda e^{-\lambda x} for x \geq 0 and 0 otherwise. The expected value is E[X] = \int_{0}^{\infty} x \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}, which corresponds to the mean waiting time in a Poisson process with rate \lambda.[35][38] Using the tail formula, since F(x) = 1 - e^{-\lambda x} for x \geq 0, it simplifies to \int_{0}^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda}, confirming the result without direct integration against the density.[37]
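Both routes to E[X] for the exponential distribution can be checked numerically. The sketch below (hypothetical Python, using scipy.integrate.quad) evaluates the density-based definition and the tail formula:

```python
import numpy as np
from scipy.integrate import quad

lam = 0.5  # exponential rate; E[X] should be 1/lam = 2

# Direct definition: integral of x * f(x) over [0, inf).
density_mean, _ = quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)

# Tail formula: integral of P(X > x) = 1 - F(x) = e^{-lam x} over [0, inf).
tail_mean, _ = quad(lambda x: np.exp(-lam * x), 0, np.inf)

print(density_mean, tail_mean)  # both ~ 2.0
```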
Advanced Definitions
General Real-Valued Random Variables
In measure-theoretic probability, the expected value of a real-valued random variable X: \Omega \to \mathbb{R} defined on a probability space (\Omega, \mathcal{F}, P) is given by the Lebesgue integral E[X] = \int_{\Omega} X(\omega) \, dP(\omega), provided this integral exists.[39] This definition is equivalent to the integral with respect to the cumulative distribution function F_X of X, E[X] = \int_{-\infty}^{\infty} x \, dF_X(x), where the integral is understood in the Riemann–Stieltjes sense.[40][41]

The expected value E[X] exists and is finite if and only if E[|X|] < \infty, where E[|X|] = \int_{\Omega} |X(\omega)| \, dP(\omega). If E[X^+] = \infty while E[X^-] < \infty (or vice versa), where X^+ = \max(X, 0) and X^- = -\min(X, 0), then E[X] may be defined as +\infty (respectively -\infty), although the absolute expectation E[|X|] is then infinite.

This measure-theoretic formulation unifies the cases of discrete and continuous random variables: for discrete X taking values in a countable set, the expectation reduces to an integral with respect to the counting measure on that set, recovering the summation form; for continuous X, it corresponds to integration with respect to Lebesgue measure weighted by the density (when it exists).[39]

As an illustration, consider a general Bernoulli random variable X on (\Omega, \mathcal{F}, P) such that X(\omega) = 1 if \omega \in A \in \mathcal{F} with P(A) = p \in [0,1] and X(\omega) = 0 otherwise. Then E[X] = \int_{\Omega} X(\omega) \, dP(\omega) = 1 \cdot P(A) + 0 \cdot P(A^c) = p, and E[|X|] = p < \infty.
Infinite Expected Values
In probability theory, the expected value E[X] of a real-valued random variable X is defined as E[X^+] - E[X^-], where X^+ = \max(X, 0) and X^- = -\min(X, 0) are the positive and negative parts, respectively.[42] If E[X^+] = +\infty and E[X^-] < \infty, then E[X] = +\infty; similarly, E[X] = -\infty if E[X^-] = +\infty and E[X^+] < \infty.[42] The expectation is undefined if both E[X^+] = +\infty and E[X^-] = +\infty.[42]

A classic illustration of an infinite expected value is the St. Petersburg paradox, first posed by Nicolaus Bernoulli in a 1713 letter and later analyzed by Daniel Bernoulli in 1738.[43] In this game, a fair coin is flipped until the first heads appears on the k-th trial, yielding a payoff of 2^k units; the expected value is \sum_{k=1}^\infty 2^k \cdot (1/2)^k = \sum_{k=1}^\infty 1 = +\infty.[43] Despite this infinite expectation, rational agents typically value the game at only a finite amount, often due to considerations of utility or risk aversion rather than the raw expectation.[43]

Examples of distributions with infinite or undefined expectations include the Cauchy distribution and certain Pareto distributions. For the standard Cauchy distribution with probability density function f(x) = \frac{1}{\pi(1 + x^2)} for x \in \mathbb{R}, the expectation is undefined because both \int_{-\infty}^0 |x| f(x) \, dx = +\infty and \int_0^\infty x f(x) \, dx = +\infty. Similarly, for a Pareto distribution with shape parameter \alpha \leq 1 and minimum value x_m > 0, the density is f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}} for x \geq x_m, and the expectation E[X] = +\infty since the integral \int_{x_m}^\infty x f(x) \, dx diverges.

Such infinite expectations have significant implications, particularly in limit theorems and applications. For instance, the strong law of large numbers fails to converge to a finite limit when the expectation is infinite; for nonnegative random variables with E[X] = +\infty, the sample average \bar{X}_n satisfies \bar{X}_n \to +\infty almost surely as n \to \infty.[42] In finance and risk management, distributions with infinite means, such as heavy-tailed Pareto models for losses or returns, challenge traditional risk measures like value-at-risk, as extreme events dominate and standard averaging breaks down, necessitating alternative approaches like tail-dependence analysis or infinite-mean estimators.[44]
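Simulation makes the failure of averaging vivid. In the sketch below (hypothetical Python with NumPy, not from the cited sources), sample means of St. Petersburg payoffs keep growing with the sample size, and Cauchy sample means refuse to settle:

```python
import numpy as np

rng = np.random.default_rng(1)

def st_petersburg(n):
    """Payoff 2^k, where k is the flip index of the first heads."""
    k = rng.geometric(0.5, size=n)  # flips until first heads (k >= 1)
    return 2.0 ** k

for n in (10**3, 10**5, 10**7):
    print(n, st_petersburg(n).mean())  # grows roughly like log2(n), never settles

# Cauchy sample means do not converge either: E[X] is undefined.
for n in (10**3, 10**5, 10**7):
    print(n, rng.standard_cauchy(n).mean())
```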
Properties
Basic Properties
The expected value, often denoted E[X] for a random variable X, possesses several fundamental algebraic properties that underpin its utility in probability theory. These properties hold under minimal assumptions, such as the finiteness of the expected value, and apply to both discrete and continuous random variables. They are derived directly from the definitions of expected value as a sum or integral, leveraging the linearity of summation and integration.

One of the most essential properties is linearity of expectation, which states that for any constants a and b and random variables X and Y (which may be dependent or independent), E[aX + bY] = a E[X] + b E[Y]. This holds regardless of the joint distribution of X and Y, making it particularly powerful for computations involving sums of random variables. The proof follows from the definition: in the discrete case, E[aX + bY] = \sum (a x_i + b y_j) P(X = x_i, Y = y_j) = a \sum x_i P(X = x_i, Y = y_j) + b \sum y_j P(X = x_i, Y = y_j) = a E[X] + b E[Y], using the linearity of sums; a similar argument applies to integrals in the continuous case.

Another basic property is monotonicity: if X \leq Y almost surely (i.e., with probability 1) and both expected values are finite, then E[X] \leq E[Y]. This follows by applying linearity to E[Y - X] = E[Y] - E[X] and noting that Y - X \geq 0 almost surely, which implies E[Y - X] \geq 0 (see non-negativity below). As a proof sketch, in the discrete case the sum \sum (y_j - x_i) P(X = x_i, Y = y_j) \geq 0 since each term is non-negative; integration yields the continuous analog.

Non-negativity asserts that if X \geq 0 almost surely, then E[X] \geq 0 (assuming finiteness). The proof is immediate from the definition, as the sum or integral of non-negative terms weighted by non-negative probabilities cannot be negative. This property extends naturally to the expected value of a constant: for any constant c, E[c] = c, since the random variable equals c with probability 1 and the sum or integral simplifies directly to c.

A useful consequence arises with indicator random variables. For an event A, the indicator 1_A (which equals 1 if A occurs and 0 otherwise) has E[1_A] = P(A), directly from the definition since E[1_A] = 1 \cdot P(A) + 0 \cdot (1 - P(A)) = P(A). This connection highlights how expected value generalizes probability measures.
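The two properties most used in practice, linearity under dependence and E[1_A] = P(A), can be illustrated numerically. A minimal sketch (hypothetical Python with NumPy; the variables are chosen deliberately dependent):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)
y = x ** 2 + rng.normal(size=1_000_000)  # deliberately dependent on x

a, b = 3.0, -2.0
lhs = (a * x + b * y).mean()
rhs = a * x.mean() + b * y.mean()
print(lhs, rhs)  # agree up to sampling noise despite the dependence

# Indicator variable: E[1_A] = P(A), here with A = {X > 1}.
indicator = (x > 1).astype(float)
print(indicator.mean())  # ~ P(X > 1) = 0.1587 for a standard normal
```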
Inequalities
Markov's inequality is a fundamental result in probability theory that bounds the tail probability of a non-negative random variable using its expected value. For a non-negative random variable X with finite expectation and any a > 0, P(X \geq a) \leq \frac{E[X]}{a}. This inequality holds under the assumption that E[X] < \infty, and it applies to both discrete and continuous random variables. The proof relies on the integral representation of the expectation for non-negative X: E[X] = \int_0^\infty P(X \geq t) \, dt. Since P(X \geq t) is non-increasing, E[X] \geq \int_0^a P(X \geq t) \, dt \geq a \cdot P(X \geq a), leading directly to the bound. For non-negative integer-valued X and integer a, a similar summation argument uses E[X] = \sum_{k=1}^\infty P(X \geq k) \geq a \cdot P(X \geq a). Equality holds if P(X = 0) + P(X = a) = 1.

Chebyshev's inequality extends Markov's result to bound deviations from the mean using the variance. For a random variable X with finite mean \mu = E[X] and variance \sigma^2 = \operatorname{Var}(X) < \infty, and for any k > 0, P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}. This assumes the existence of the second moment E[X^2] < \infty, and the bound is distribution-free. The proof follows by applying Markov's inequality to the non-negative random variable Y = (X - \mu)^2: P(|X - \mu| \geq k \sigma) = P(Y \geq k^2 \sigma^2) \leq E[Y] / (k^2 \sigma^2) = \sigma^2 / (k^2 \sigma^2) = 1/k^2. For k > 1, equality is attained by the three-point distribution with P(X = \mu \pm k\sigma) = \frac{1}{2k^2} and P(X = \mu) = 1 - \frac{1}{k^2}.

Jensen's inequality relates the expected value of a function to the function of the expected value for convex functions. If \phi is a convex function and X is a random variable with finite expectation E[X] < \infty, then \phi(E[X]) \leq E[\phi(X)], provided E[|\phi(X)|] < \infty. For concave \phi, the inequality reverses. The standard proof uses a supporting line at E[X]: convexity guarantees a constant c with \phi(x) \geq \phi(E[X]) + c\,(x - E[X]) for all x, and taking expectations of both sides yields E[\phi(X)] \geq \phi(E[X]). For twice-differentiable \phi, non-negativity of \phi'' implies convexity and supports the result via Taylor expansion. Equality holds if \phi is affine on the support of X or if X is constant almost surely.

Hölder's inequality generalizes the Cauchy–Schwarz inequality to bound the expectation of products using conjugate exponents. For random variables X and Y with finite moments E[|X|^p] < \infty and E[|Y|^q] < \infty, where p > 1 and q = p/(p-1) (so 1/p + 1/q = 1), |E[XY]| \leq \left( E[|X|^p] \right)^{1/p} \left( E[|Y|^q] \right)^{1/q}. The case p = q = 2 recovers Cauchy–Schwarz. The proof employs Young's inequality for products: for non-negative a and b, ab \leq \frac{a^p}{p} + \frac{b^q}{q}. Applying this with a = |X| / (E[|X|^p])^{1/p} and b = |Y| / (E[|Y|^q])^{1/q} and taking expectations gives E[ab] \leq \frac{1}{p} + \frac{1}{q} = 1, which rearranges to the stated bound. Equality holds when |X|^p and |Y|^q are proportional almost surely.
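Markov's and Chebyshev's bounds can be compared against empirical tail frequencies. A short sketch (hypothetical Python with NumPy, using an exponential sample with mean and variance equal to 1):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)  # E[X] = 1, Var(X) = 1

# Markov: P(X >= a) <= E[X] / a for a > 0.
for a in (2.0, 5.0, 10.0):
    print(a, (x >= a).mean(), "<=", x.mean() / a)

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2.
mu, sigma = x.mean(), x.std()
for k in (2.0, 3.0):
    print(k, (np.abs(x - mu) >= k * sigma).mean(), "<=", 1 / k**2)
```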
Convergence and Limits
In probability theory, the expected value of a sequence of random variables does not necessarily converge to the expected value of the limit under mere pointwise or probabilistic convergence, necessitating specific conditions to interchange limits and expectations. These conditions arise from measure-theoretic foundations and ensure the preservation of integrability and the validity of limit operations on expectations.

The monotone convergence theorem provides one such condition for non-negative sequences. Specifically, if (X_n)_{n=1}^\infty is a sequence of non-negative random variables such that X_n \uparrow X almost surely (i.e., 0 \leq X_1(\omega) \leq X_2(\omega) \leq \cdots \leq X(\omega) for almost all \omega), then \mathbb{E}[X_n] \uparrow \mathbb{E}[X].[45] This theorem guarantees that the expectations increase monotonically to the expectation of the limit, allowing the interchange of limit and expectation under monotonicity.

A more general result is the dominated convergence theorem, which relaxes the monotonicity requirement at the cost of an integrability bound. If X_n \to X almost surely, and there exists a random variable Y with \mathbb{E}[|Y|] < \infty such that |X_n| \leq Y almost surely for all n, then \mathbb{E}[X_n] \to \mathbb{E}[X] and \mathbb{E}[|X_n - X|] \to 0.[45] In probabilistic terms, the almost sure convergence can be weakened to convergence in probability under the same domination condition. This theorem is pivotal for establishing convergence of expectations in settings where sequences are bounded by an integrable dominator, such as in stochastic processes or limit theorems.

Even without domination or monotonicity, uniform integrability offers a sufficient condition for interchanging limits and expectations. A sequence (X_n) is uniformly integrable if \lim_{c \to \infty} \sup_n \mathbb{E}[|X_n| \mathbf{1}_{|X_n| \geq c}] = 0. If X_n \to X almost surely, \mathbb{E}[|X_n|] < \infty for all n, and (X_n) is uniformly integrable, then \mathbb{E}[X] < \infty and \mathbb{E}[X_n] \to \mathbb{E}[X].[46] Uniform integrability controls the contribution of large tails uniformly across the sequence, ensuring L¹ convergence and thus the desired limit for expectations; it is equivalent to the condition that \mathbb{E}[|X_n - X|] \to 0 under almost sure convergence.[46]

Fatou's lemma provides an inequality rather than an equality, serving as a foundational tool for proving the above theorems. For a sequence of non-negative random variables X_n \geq 0, it states that \mathbb{E}[\liminf_{n \to \infty} X_n] \leq \liminf_{n \to \infty} \mathbb{E}[X_n].[45] This lower semicontinuity of the expectation functional holds without additional assumptions beyond non-negativity, bounding the expectation of the limit inferior by the limit inferior of the expectations.

Convergence in probability alone does not suffice to preserve expectations, as illustrated by counterexamples where the mass of the distribution "escapes" to infinity. Consider a uniform random variable U on [0,1], and define X_n = n if U \leq 1/n and X_n = 0 otherwise. Then X_n \to 0 in probability, since \mathbb{P}(|X_n| > \epsilon) = 1/n \to 0 for any \epsilon > 0, but \mathbb{E}[X_n] = n \cdot (1/n) = 1 \not\to 0.[47] This "spiking" or "moving bump" phenomenon highlights the need for tail control, as the rare but large values prevent expectation convergence despite probabilistic convergence to zero.
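The counterexample above is easy to reproduce. In the sketch below (hypothetical Python with NumPy), the event {X_n ≠ 0} becomes rare while the sample mean of X_n stays near 1:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(size=1_000_000)

# X_n = n on {U <= 1/n}, else 0: X_n -> 0 in probability, yet E[X_n] = 1.
for n in (10, 1_000, 100_000):
    x_n = np.where(u <= 1 / n, float(n), 0.0)
    # P(X_n != 0) shrinks like 1/n, but the mean hovers near 1 (noisily for large n).
    print(n, (x_n > 0).mean(), x_n.mean())
```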
Expected Values of Distributions
Discrete Distributions
The expected value of a discrete random variable X with probability mass function p(x) is given by E[X] = \sum_x x \, p(x), where the sum is over the support of X. For the Bernoulli distribution, X takes values 0 or 1 with success probability p, so the PMF is p(0) = 1 - p and p(1) = p. The expected value is E[X] = 0 \cdot (1 - p) + 1 \cdot p = p.[48]

The binomial distribution models the number of successes in n independent Bernoulli trials, each with success probability p. The PMF is p(x) = \binom{n}{x} p^x (1 - p)^{n - x} for x = 0, 1, \dots, n. The expected value follows from the linearity of expectation applied to the sum of n indicator variables, yielding E[X] = np.[48]

The negative binomial distribution counts the number of failures before the r-th success in independent Bernoulli trials with success probability p. The PMF is p(x) = \binom{x + r - 1}{x} p^r (1 - p)^x for x = 0, 1, 2, \dots. The expected value is E[X] = r(1 - p)/p, derived by viewing X as the sum of r independent geometric random variables, each counting failures before a success.[49]

The Poisson distribution with parameter \lambda > 0 models the number of events in a fixed interval, with PMF p(k) = \frac{\lambda^k e^{-\lambda}}{k!} for k = 0, 1, 2, \dots. The expected value is E[X] = \sum_{k=0}^\infty k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda, recognizing the remaining sum as e^\lambda.[50]

The geometric distribution, in the convention of trials until the first success, has PMF p(x) = (1 - p)^{x-1} p for x = 1, 2, 3, \dots, where p is the success probability. The expected value is E[X] = \sum_{x=1}^\infty x (1 - p)^{x-1} p = \frac{1}{p}, obtained by differentiating the geometric series \sum_{x=0}^\infty q^x = \frac{1}{1 - q} at q = 1 - p.[51]

These expected values are summarized in the table below, followed by a brief numerical check.

| Distribution | Parameters | Expected Value E[X] |
|---|---|---|
| Bernoulli | p \in (0,1) | p |
| Binomial | n \in \mathbb{N}, p \in (0,1) | np |
| Negative Binomial | r \in \mathbb{N}, p \in (0,1) | r(1-p)/p |
| Poisson | \lambda > 0 | \lambda |
| Geometric | p \in (0,1) | 1/p |
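As a cross-check of the table, the sketch below (hypothetical Python using scipy.stats; note that scipy's geom follows the trials-until-success convention and nbinom counts failures before the r-th success, matching the text) prints each library mean next to the closed form:

```python
from scipy import stats

n, p, r, lam = 10, 0.3, 4, 2.5

print(stats.bernoulli.mean(p), p)                       # p
print(stats.binom.mean(n, p), n * p)                    # np
print(stats.nbinom.mean(r, p), r * (1 - p) / p)         # failures before r-th success
print(stats.poisson.mean(lam), lam)                     # lambda
print(stats.geom.mean(p), 1 / p)                        # trials until first success
```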
Continuous Distributions
For continuous random variables, the expected value is defined as the integral of the product of the variable and its probability density function (pdf) over the entire real line, provided the integral converges: E[X] = \int_{-\infty}^{\infty} x f(x) \, dx, where f(x) is the pdf.[52] This contrasts with discrete cases by replacing summation with integration, providing the long-run average value under the distribution.[52]

Common continuous distributions have closed-form expected values derived through direct integration. For the uniform distribution on [a, b] with pdf f(x) = \frac{1}{b-a} for a \leq x \leq b (and 0 otherwise), the expected value is E[X] = \int_a^b x \cdot \frac{1}{b-a} \, dx = \frac{a + b}{2}.[52] For the exponential distribution with rate parameter \lambda > 0 and pdf f(x) = \lambda e^{-\lambda x} for x \geq 0 (and 0 otherwise), integration by parts yields E[X] = \int_0^{\infty} x \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}.[53]

The normal distribution N(\mu, \sigma^2), with pdf f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), has expected value E[X] = \mu, as the mean parameter directly locates the distribution's center, verifiable by symmetry or completing the square in the integral.[54] For the gamma distribution with shape \alpha > 0 and scale \beta > 0, pdf f(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^\alpha \Gamma(\alpha)} for x > 0 (and 0 otherwise), the expected value integrates to E[X] = \alpha \beta.[55] Similarly, the beta distribution on [0, 1] with shape parameters \alpha > 0 and \beta > 0, pdf f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} where B is the beta function, gives E[X] = \frac{\alpha}{\alpha + \beta} via beta function properties in the integral.[56]

The following table summarizes the parameters and expected values for these distributions; a numerical check follows the table.

| Distribution | Parameters | Expected Value E[X] |
|---|---|---|
| Uniform | a, b (a < b) | \frac{a + b}{2} |
| Exponential | \lambda > 0 | \frac{1}{\lambda} |
| Normal | \mu, \sigma^2 > 0 | \mu |
| Gamma | \alpha > 0, \beta > 0 | \alpha \beta |
| Beta | \alpha > 0, \beta > 0 | \frac{\alpha}{\alpha + \beta} |
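A similar cross-check for the continuous table (hypothetical Python using scipy.stats; note that scipy parameterizes the exponential by scale = 1/\lambda and the gamma by shape and scale):

```python
from scipy import stats

a, b = 1.0, 5.0
lam = 0.5
mu, sigma = 2.0, 3.0
alpha, beta = 2.0, 3.0

print(stats.uniform.mean(loc=a, scale=b - a), (a + b) / 2)      # midpoint
print(stats.expon.mean(scale=1 / lam), 1 / lam)                 # 1/lambda
print(stats.norm.mean(loc=mu, scale=sigma), mu)                 # mu
print(stats.gamma.mean(alpha, scale=beta), alpha * beta)        # shape * scale
print(stats.beta.mean(alpha, beta), alpha / (alpha + beta))
```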
Computation and Extensions
Numerical Computation
When closed-form expressions for the expected value of a random variable are unavailable or computationally intractable, numerical methods provide approximations by leveraging sampling, integration techniques, or series approximations. These approaches are essential in fields like finance, physics, and machine learning, where distributions may be complex or high-dimensional.[57]

Monte Carlo simulation offers a straightforward way to estimate the expected value by generating independent samples from the underlying distribution. For a random variable X with distribution F, the estimator is the sample mean \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i, where the x_i are drawn from F; it converges to E[X] as n \to \infty by the law of large numbers. This method is unbiased and widely used for its simplicity in multidimensional settings.[57]

Importance sampling enhances Monte Carlo estimation, particularly for rare events or expectations involving heavy-tailed distributions, by drawing samples from a proposal distribution g that is easier to sample from and reweighting them to match the target distribution f. The estimator becomes \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i \frac{f(x_i)}{g(x_i)}, reducing variance when g approximates the behavior of f in the regions of interest. This technique, rooted in variance reduction strategies, is crucial for efficient computation in risk analysis and particle physics simulations.

For continuous random variables where the density f(x) is known, numerical integration approximates E[X] = \int x f(x) \, dx using quadrature rules. Methods like Simpson's rule divide the integration domain into subintervals and apply polynomial approximations, yielding high accuracy for smooth functions with error scaling as O(h^4), where h is the step size. Gaussian quadrature, which chooses optimal nodes and weights, is particularly effective for expectations over finite intervals, and is exact for polynomials up to degree 2n - 1 with n points. These techniques are implemented in standard libraries for reliable one-dimensional computations.[58]

In discrete cases with countable support, the expected value is an infinite sum E[X] = \sum_{k=1}^\infty k p_k, which can be approximated by truncating at a finite N such that the tail \sum_{k=N+1}^\infty k p_k is bounded below a tolerance. Error bounds rely on tail estimates, such as geometric decay if probabilities decrease exponentially, ensuring the remainder is less than \epsilon with a controlled truncation level N. Adaptive strategies adjust N dynamically based on partial sums to balance accuracy and efficiency.

Software tools facilitate these computations in practice. In Python, libraries like NumPy provide functions such as numpy.mean for Monte Carlo sample averages, and SciPy offers scipy.integrate.quad for quadrature-based expectations. Similarly, R includes mean for simulations and integrate for numerical integration in its base distribution, with extensions like mc2d for advanced Monte Carlo variance reduction. These implementations handle large-scale approximations efficiently without requiring custom code.
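As an illustration of plain Monte Carlo versus importance sampling, the sketch below (hypothetical Python; the target E[X \cdot 1\{X > 4\}] for a standard normal X and the shifted-exponential proposal are invented for illustration) reweights tail samples and compares both estimates against a quadrature reference:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(5)

# Target: E[X * 1{X > 4}] for a standard normal X -- a rare-event expectation,
# so plain Monte Carlo wastes almost all of its samples.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density

# Plain Monte Carlo: very few samples land beyond 4.
x = rng.normal(size=1_000_000)
plain = np.where(x > 4, x, 0.0).mean()

# Importance sampling: propose from an exponential shifted to start at 4,
# then reweight each draw by f(y) / g(y).
y = 4 + rng.exponential(scale=1.0, size=1_000_000)
g = np.exp(-(y - 4))                 # proposal density at y
weighted = (y * f(y) / g).mean()

exact, _ = quad(lambda t: t * f(t), 4, np.inf)  # quadrature reference
print(plain, weighted, exact)        # weighted is far less noisy than plain
```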
Error analysis is vital for assessing reliability. For the Monte Carlo estimator, the variance is \frac{\mathrm{Var}(X)}{n}, leading to a standard error of \sqrt{\frac{\mathrm{Var}(X)}{n}}; confidence intervals follow from the central limit theorem, approximating \hat{\mu} \pm z_{\alpha/2} \sqrt{\frac{s^2}{n}} where s^2 estimates \mathrm{Var}(X) and z_{\alpha/2} is the normal quantile. Importance sampling reduces this variance but requires checking effective sample size via weight diagnostics to avoid instability. Quadrature errors are deterministic and bounded by rule-specific formulas, while truncation errors use remainder theorems for guarantees. These metrics guide the choice of sample size or grid resolution to achieve desired precision.[57][58]
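A minimal sketch of the Monte Carlo error analysis described above (hypothetical Python; the sample size and the 1.96 normal quantile are chosen for a roughly 95% interval):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=10_000)  # true mean is 2.0

mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the sample mean
z = 1.96                               # ~95% normal quantile

print(f"estimate {mean:.3f} +/- {z * se:.3f}")  # interval usually covers 2.0
```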