
Probability space

A probability space is a fundamental concept in probability theory that formalizes the modeling of random phenomena, defined as a triple (\Omega, \mathcal{F}, P), where \Omega is the sample space representing all possible outcomes, \mathcal{F} is a \sigma-algebra of measurable events (subsets of \Omega), and P is a probability measure assigning non-negative probabilities to events in \mathcal{F}, with the entire space receiving probability 1. This framework ensures a rigorous, axiomatic approach to probability, enabling the analysis of complex stochastic processes across fields like statistics, physics, and finance. The sample space \Omega captures the totality of outcomes in an experiment, such as drawing a ball from an urn where \Omega = \{\text{red}, \text{blue}\}, while the \sigma-algebra \mathcal{F} specifies the collection of events to which probabilities can be assigned, satisfying closure under complements and countable unions to handle infinite or continuous cases. The probability measure P: \mathcal{F} \to [0,1] quantifies the likelihood of events, adhering to Kolmogorov's axioms: non-negativity (P(A) \geq 0 for all A \in \mathcal{F}), normalization (P(\Omega) = 1), and countable additivity (P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i) for disjoint events A_i). These components allow probability spaces to model both discrete and continuous distributions. This axiomatic foundation was established by Andrey Kolmogorov in his 1933 monograph Foundations of the Theory of Probability (originally published in German as Grundbegriffe der Wahrscheinlichkeitsrechnung), which unified disparate probabilistic concepts under measure theory, providing a logically consistent basis that resolved earlier inconsistencies in classical and frequentist approaches. In practice, probability spaces are central to probability theory and its applications.

Fundamentals

Introduction

A probability space forms the foundational structure in modern probability theory, providing a rigorous mathematical framework for modeling uncertainty and randomness. This concept originated with Andrey Kolmogorov's seminal 1933 work, Grundbegriffe der Wahrscheinlichkeitsrechnung, which axiomatized probability using measure theory to unify the previously disparate treatments of discrete and continuous probabilities. Prior to this, probability calculations often relied on ad hoc methods suited to specific cases, but Kolmogorov's approach established a general foundation that connected empirical observations of frequencies to abstract mathematical principles. The primary purpose of a probability space is to serve as a mathematical model for random experiments, enabling the clear distinction between individual sample outcomes, collections of such outcomes known as events, and the assignment of probabilities to those events. By formalizing these elements, it allows probabilists to analyze the likelihood of occurrences in a consistent manner, bridging intuitive notions of chance with precise computations applicable across diverse fields such as statistics, physics, and finance. This modeling capability ensures that probabilities reflect both long-run frequencies in repeated trials and the inherent unpredictability of single events. At its core, the intuition behind a probability space lies in conceptualizing the sample space as the collection of all possible outcomes from an experiment, with events represented as subsets of this space and probability functioning as a measure quantifying their relative likelihood. This structure assumes basic knowledge of set theory, including notions of sets and subsets, to build toward more advanced topics like the \sigma-algebras explored in subsequent sections.

Basic Components

A probability space consists of three fundamental components that provide the foundation for modeling uncertainty in random experiments: the sample space, the event space, and the probability assignment. These elements work together to describe all possible outcomes, the observable groupings of those outcomes, and the likelihoods associated with them, respectively. The sample space, denoted \Omega, is the set encompassing all possible outcomes of a random experiment. It represents the universal collection of results that could occur, and it may be finite, countably infinite, or uncountably infinite depending on the nature of the experiment. The event space, often denoted \Sigma or \mathcal{F}, is a collection of subsets of the sample space \Omega that correspond to the events of interest: measurable groupings of outcomes that can be observed or queried. This collection must be structured to allow logical combinations of events, being closed under complements and countable unions (and thus countable intersections); these properties ensure that if one event occurs, related events can also be meaningfully defined. Such a structure is known as a \sigma-algebra, though its formal properties are detailed later. The probability assignment, denoted P, is a function that maps each event in the event space to a real number between 0 and 1, indicating the likelihood or "degree of belief" in that event occurring. Informally, it satisfies key requirements: P(\Omega) = 1, reflecting certainty that some outcome in the sample space will occur; P(\emptyset) = 0, as the empty set (impossible event) has no chance of occurring; and for a countable collection of disjoint (mutually exclusive) events, the probability of their union equals the sum of their individual probabilities, capturing countable additivity. These conditions ensure the assignment behaves intuitively as a measure of chance.
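To make these components concrete, here is a minimal sketch in Python of a finite probability space for a fair six-sided die; the names omega, prob, and P are illustrative choices rather than standard library objects, and exact fractions are used so the requirements can be checked without rounding error.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = {1, 2, 3, 4, 5, 6}                    # sample space
prob = {w: Fraction(1, 6) for w in omega}     # mass on each outcome

def P(event):
    """Probability of an event (a subset of omega) by additivity."""
    return sum(prob[w] for w in event)

# The event space here is the full power set of omega.
events = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

assert P(omega) == 1                           # normalization
assert P(set()) == 0                           # impossible event
assert all(0 <= P(e) <= 1 for e in events)     # probabilities lie in [0, 1]
evens, odds = {2, 4, 6}, {1, 3, 5}             # disjoint events
assert P(evens | odds) == P(evens) + P(odds)   # additivity for disjoint events
```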

Formal Framework

General Definition

A probability space is formally defined as a triple (\Omega, \Sigma, P), where \Omega is the sample space representing the set of all possible outcomes of a random experiment, \Sigma is a \sigma-algebra of subsets of \Omega (called events), and P: \Sigma \to [0,1] is a probability measure assigning probabilities to events. The \sigma-algebra \Sigma provides the structure for measurable events and is defined as a collection of subsets of \Omega that includes \Omega and the empty set \emptyset, and is closed under complements (if E \in \Sigma, then \Omega \setminus E \in \Sigma) and countable unions (if \{E_i\}_{i=1}^\infty \subseteq \Sigma, then \bigcup_{i=1}^\infty E_i \in \Sigma). It is also closed under countable intersections as a consequence of the closure under complements and unions. The probability measure P satisfies the Kolmogorov axioms, which form the foundational principles of modern probability theory: \begin{align*} &(1) && P(E) \geq 0 && \text{for all } E \in \Sigma, \\ &(2) && P(\Omega) = 1, \\ &(3) && P\left( \bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty P(E_i) && \text{for any countable collection of pairwise disjoint events } \{E_i\}_{i=1}^\infty \subseteq \Sigma. \end{align*} This general framework applies to both discrete and continuous cases, as detailed in subsequent sections.
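The closure axioms can be checked mechanically for small finite examples. The following hedged sketch tests whether a family of subsets of a finite \Omega forms a \sigma-algebra; for finite \Omega, closure under complements and pairwise unions suffices, since countable unions reduce to finite ones. The function name is_sigma_algebra is invented for this illustration.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms on a finite Omega, where closure
    under complements and pairwise unions implies full closure."""
    fam = {frozenset(s) for s in family}
    if frozenset(omega) not in fam or frozenset() not in fam:
        return False
    for a in fam:
        if frozenset(omega) - a not in fam:        # complement closure
            return False
    for a, b in combinations(fam, 2):
        if a | b not in fam:                       # union closure
            return False
    return True

omega = {1, 2, 3, 4}
# The sigma-algebra generated by the partition {{1,2},{3,4}}:
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
# Fails: the complement of {1} is missing from the family.
print(is_sigma_algebra(omega, [set(), {1}, omega]))             # False
```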

Probability Measure

In a probability space (\Omega, \Sigma, P), the probability measure P is a function that assigns to each event A \in \Sigma a real number P(A) between 0 and 1, representing the probability of A occurring. This measure is distinguished from a general measure by its normalization property: P(\Omega) = 1, ensuring the total probability over the sample space is unity. The core defining property of P is countable additivity, which states that if \{E_n\}_{n=1}^\infty is a countable collection of pairwise disjoint events in \Sigma, then P\left( \bigcup_{n=1}^\infty E_n \right) = \sum_{n=1}^\infty P(E_n). This property extends finite additivity to infinite collections, allowing the measure to handle uncountably many outcomes in continuous spaces while maintaining consistency. Non-negativity, P(A) \geq 0 for all A \in \Sigma, and the normalization P(\Omega) = 1 complete the axiomatic foundation established by Kolmogorov. From these axioms, several derived properties follow. Monotonicity holds: if A \subseteq B, then P(A) \leq P(B), as B = A \cup (B \setminus A) and the sets are disjoint. Subadditivity is also implied: for any A, B \in \Sigma, P(A \cup B) \leq P(A) + P(B), with equality if A and B are disjoint. These ensure the measure behaves intuitively for unions and inclusions. The construction of P often begins with finite additivity on a simpler collection, such as an algebra of sets in discrete cases where P counts outcomes proportionally. For general spaces, Carathéodory's extension theorem provides a method to extend a finitely additive, non-negative set function \mu on an algebra \mathcal{A} (with \mu(\Omega) = 1) to a countably additive probability measure on the generated \sigma-algebra \sigma(\mathcal{A}), via the outer measure \mu^*(E) = \inf \left\{ \sum \mu(A_i) : E \subseteq \bigcup A_i, A_i \in \mathcal{A} \right\} and identifying measurable sets. This theorem guarantees existence for probability measures like the Lebesgue measure on [0,1], starting from interval lengths. Uniqueness of P on \Sigma is ensured if it is specified on a \pi-system \mathcal{P} (closed under finite intersections) that generates \Sigma, by the uniqueness theorem for measures: any two probability measures agreeing on \mathcal{P} coincide on \sigma(\mathcal{P}). This relies on the \pi-\lambda theorem, showing the collection where measures agree forms a \lambda-system containing \mathcal{P}, hence equals \sigma(\mathcal{P}).
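The derived properties can be illustrated numerically. This sketch reuses a fair-die space (an illustrative choice) to exhibit the disjoint decomposition behind monotonicity and to check subadditivity together with inclusion-exclusion.

```python
from fractions import Fraction

prob = {w: Fraction(1, 6) for w in range(1, 7)}   # fair six-sided die
P = lambda e: sum(prob[w] for w in e)

A, B = {1, 2}, {1, 2, 3}
# Monotonicity via the disjoint decomposition B = A ∪ (B \ A):
assert P(B) == P(A) + P(B - A) and P(A) <= P(B)

C, D = {1, 2, 3}, {3, 4}
assert P(C | D) <= P(C) + P(D)                    # subadditivity
assert P(C | D) == P(C) + P(D) - P(C & D)         # inclusion-exclusion
```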

Special Cases

Discrete Probability Spaces

A discrete probability space is a specialization of the general probability space where the sample space \Omega is a countable set, either finite or countably infinite. This structure aligns with the axiomatic foundations of probability theory, where the sample space consists of outcomes that can be enumerated. In such spaces, every subset of \Omega is measurable, so the \sigma-algebra \mathcal{F} is the power set of \Omega, comprising all possible subsets. The probability measure P on a discrete probability space assigns a non-negative probability p_\omega = P(\{\omega\}) \geq 0 to each singleton \{\omega\} for \omega \in \Omega, satisfying the normalization condition \sum_{\omega \in \Omega} p_\omega = 1. This measure extends additively to any event E \subseteq \Omega by P(E) = \sum_{\omega \in E} p_\omega, ensuring that the Kolmogorov axioms of non-negativity, normalization, and countable additivity are met automatically due to the countable nature of the space. A classic example of constructing a probability space is the uniform distribution on a finite sample space, such as the outcomes of rolling a fair six-sided die, where \Omega = \{1, 2, 3, 4, 5, 6\} and p_\omega = \frac{1}{6} for each \omega \in \Omega. For countably infinite spaces, the geometric distribution provides an illustration: let \Omega = \{0, 1, 2, \dots \} represent the number of failures before the first success in independent trials with success probability p \in (0,1), and set p_k = (1-p)^k p for k \in \Omega, which sums to 1 over the natural numbers, as checked numerically in the sketch below. Discrete probability spaces offer the advantage of straightforward computation, as probabilities of events can be calculated using finite or convergent infinite sums without requiring integration or advanced measure-theoretic tools. This simplicity facilitates explicit calculations and simulations in applications like combinatorics and algorithm analysis.
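As a quick numeric check of the geometric example, the following sketch sums the masses p_k = (1-p)^k p over a truncated range; the values p = 0.3 and cutoff N = 200 are arbitrary illustrative choices, and the leftover tail is exactly (1-p)^N.

```python
p, N = 0.3, 200
total = sum((1 - p) ** k * p for k in range(N))
print(total)                          # ≈ 1.0, short of 1 by the tail (1-p)^N
assert abs(total - 1) <= (1 - p) ** N + 1e-12
```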

Continuous Probability Spaces

In continuous probability spaces, the sample space \Omega is uncountable, such as the interval [0,1] or \mathbb{R}^n, representing outcomes that form a continuum rather than isolated points. The associated \sigma-algebra \mathcal{F} is typically the Borel \sigma-algebra generated by the open sets in the standard topology on \mathbb{R}^n, or the Lebesgue \sigma-algebra, which is its completion with respect to Lebesgue measure. The probability measure P: \mathcal{F} \to [0,1] is often taken to be absolutely continuous with respect to the Lebesgue measure \mu on \mathbb{R}^n. By the Radon-Nikodym theorem, under this absolute continuity, P admits a representation via a density f: \mathbb{R}^n \to [0,\infty) that is measurable and integrable, such that P(E) = \int_E f \, d\mu for all E \in \mathcal{F}, with the normalization condition \int_{\mathbb{R}^n} f \, d\mu = 1. This density f uniquely determines P up to \mu-almost everywhere equivalence. A defining property of such spaces is the absence of point masses: for any singleton \{\omega\} \in \mathcal{F}, P(\{\omega\}) = 0, reflecting the diffuse nature of the measure across the continuum. Continuous probability spaces are thus non-atomic, meaning no single outcome carries positive probability.
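A sketch of the identity P(E) = \int_E f \, d\mu for the density f(x) = 2x on [0,1] (so P([a,b]) = b^2 - a^2); the midpoint-rule integrator is an illustrative stand-in for the Lebesgue integral, not a library routine.

```python
def P_interval(a, b, f=lambda x: 2 * x, n=100_000):
    """Approximate P([a, b]) = ∫_a^b f(x) dx by the midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(P_interval(0.0, 1.0))     # ≈ 1.0: normalization ∫ f dμ = 1
print(P_interval(0.25, 0.5))    # ≈ 0.5² − 0.25² = 0.1875
print(P_interval(0.5, 0.5))     # 0.0: singletons carry no mass
```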

Non-Atomic Probability Spaces

A non-atomic probability space, also known as an atomless probability space, is a probability space (\Omega, \mathcal{F}, P) where the probability measure P satisfies the condition that for every event A \in \mathcal{F} with P(A) > 0, there exists a subevent B \in \mathcal{F} such that B \subset A and 0 < P(B) < P(A). This property ensures that no indivisible "atoms" exist in the space, meaning the measure can be subdivided arbitrarily without concentrating positive probability on single points or irreducible sets. Non-atomic probability spaces are measure-theoretically isomorphic to the unit interval [0,1] equipped with the Lebesgue measure, up to a null set. This equivalence, known as the isomorphism theorem for standard probability spaces, establishes that any separable, complete, non-atomic probability space can be mapped onto [0,1] in a way that preserves the measure structure, generalizing the uniform distribution on the interval. A key result characterizing non-atomic spaces is the Lyapunov convexity theorem, which states that for a non-atomic vector measure taking values in a finite-dimensional Euclidean space, the range of the measure, namely the set \{ \mu(E) : E \in \Sigma \} where \Sigma is the \sigma-algebra, is compact and convex. This convexity property arises from the atomless nature of the underlying measure and has significant implications for optimization and control theory by ensuring that intermediate values in the range can be achieved through suitable partitions of sets. The canonical example of a non-atomic probability space is the unit interval [0,1] with the Borel \sigma-algebra and the Lebesgue measure, where subsets of any positive measure can be split into subintervals with measures filling the continuum between 0 and the original measure. More generally, any space with a measure absolutely continuous with respect to Lebesgue measure on \mathbb{R}^n, such as Gaussian distributions on \mathbb{R}, inherits this non-atomic structure, excluding discrete point masses. In applications, non-atomic probability spaces model infinite-player games in cooperative game theory, where players form a continuum without individual significance, as developed in the framework of non-atomic games. Here, coalitions are measurable sets in the space, and the atomless property ensures that no single player affects outcomes, facilitating the extension of value concepts like the Shapley value to such settings.

Advanced Properties

Completeness

A probability space (\Omega, \Sigma, P) is complete if, for every null set N \in \Sigma with P(N) = 0, every subset A \subset N belongs to \Sigma and satisfies P(A) = 0. This property guarantees that all negligible events, meaning subsets of sets with probability zero, are treated as measurable and assigned zero probability, preventing subtle measurability issues in subsequent analyses. The completion of an arbitrary probability space (\Omega, \Sigma, P) involves constructing a larger σ-algebra \bar{\Sigma} that incorporates all subsets of null sets. Specifically, \bar{\Sigma} consists of all sets of the form B \Delta C, where B \in \Sigma and C is a subset of some null set N \in \Sigma with P(N) = 0, or equivalently, all unions B \cup D with B \in \Sigma and D \subset N for such an N. The probability measure is then extended to \bar{P}(B \cup D) = P(B), ensuring \bar{P} agrees with P on \Sigma. This augmentation results in the complete probability space (\Omega, \bar{\Sigma}, \bar{P}), where \bar{\Sigma} is a σ-algebra containing \Sigma. Every probability space admits a completion, which is unique up to sets of measure zero; this follows from standard measure-theoretic extensions that preserve the original measure on the initial σ-algebra. For instance, Lebesgue measure, defined on the Lebesgue σ-algebra (the completion of the Borel σ-algebra), is inherently complete; restricting it to the unit interval yields the canonical complete probability space ([0,1], \mathcal{L}, \lambda). Completeness plays a vital role in ensuring the measurability of limits in sequences of events or functions, particularly in the context of almost sure convergence, where convergence holds except on sets of probability zero. Without completeness, limits might fail to be measurable even if they agree almost everywhere with measurable objects, undermining key results in stochastic processes and integration theory. Although the completion process enlarges the σ-algebra, potentially complicating explicit verification of measurability, the standard extension preserves σ-additivity and other measure properties, avoiding violations of the core axioms.

Standard Extensions

Standard extensions of probability spaces include constructions that combine multiple spaces or extend them to infinite dimensions while preserving key probabilistic properties. The product probability space for a finite collection of independent probability spaces (\Omega_i, \Sigma_i, P_i) for i = 1, \dots, n is defined on the Cartesian product \Omega = \prod_{i=1}^n \Omega_i, equipped with the product \sigma-algebra \bigotimes_{i=1}^n \Sigma_i generated by the measurable rectangles \prod_{i=1}^n A_i where A_i \in \Sigma_i. The product measure P = \prod_{i=1}^n P_i is the unique probability measure satisfying P\left( \prod_{i=1}^n A_i \right) = \prod_{i=1}^n P_i(A_i) for all such rectangles, and extends by \sigma-additivity to the full product \sigma-algebra; this construction assumes the spaces are \sigma-finite to ensure uniqueness, a condition automatic for probability measures. For infinite products, direct construction is more subtle due to potential inconsistencies, but the framework applies similarly when finite-dimensional marginals are consistent. The Kolmogorov extension theorem addresses infinite products by guaranteeing the existence and uniqueness of a probability measure under appropriate consistency conditions. Given a sequence of probability measures \{\mu_n\}_{n=1}^\infty on (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n)) that are consistent, meaning for every n \geq 1, k \geq 1, and Borel set E \subset \mathbb{R}^n, \mu_{n+k}(E \times \mathbb{R}^k) = \mu_n(E), there exists a unique probability measure \mu on the infinite product space (\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty)), where \mathcal{B}(\mathbb{R}^\infty) is the \sigma-algebra generated by cylinder sets, such that the finite-dimensional marginals satisfy \mu(E \times \mathbb{R}^\infty) = \mu_n(E) for all n and Borel E \subset \mathbb{R}^n. This theorem, in its basic form for \mathbb{R}-valued spaces, extends to more general settings and is pivotal for rigorous constructions beyond finite dimensions. Standard probability spaces provide a canonical form for many extensions, often isomorphic to the unit interval [0,1] equipped with Lebesgue measure or, more broadly, to Polish spaces (separable complete metric spaces) with their Borel \sigma-algebra and a Borel probability measure. A key result is the isomorphism theorem, which asserts that every separable, complete, non-atomic probability space, where separability means the \sigma-algebra is countably generated modulo null sets, completeness ensures all subsets of null sets are measurable, and non-atomicity means no atoms exist (an atom being a set of positive measure all of whose measurable subsets have measure zero or full measure), is isomorphic to ([0,1], \mathcal{B}([0,1]), m), with m the Lebesgue measure. The isomorphism is a measure-preserving bijection (modulo null sets) between the spaces, preserving the \sigma-algebra structure. These extensions find essential applications in stochastic processes, particularly for defining measures on path spaces. The Kolmogorov extension theorem enables the construction of processes in continuous time, such as Gaussian processes indexed by \mathbb{R}_{\geq 0}, by specifying consistent finite-dimensional distributions with given mean and covariance functions, yielding a unique probability measure on the space of càdlàg or continuous paths. Similarly, standard space isomorphisms simplify the analysis of process realizations by mapping them to the unit interval, facilitating computations in areas like ergodic theory and stochastic analysis.
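For the finite product construction, a small sketch: the product of a fair-coin space and a fair-die space, with the product measure checked on a measurable rectangle. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Product measure on singletons of the product sample space:
prod_mass = {(c, d): coin[c] * die[d] for c, d in product(coin, die)}
P = lambda e: sum(prod_mass[w] for w in e)

# A measurable rectangle A × B receives P1(A) · P2(B):
A, B = {"H"}, {2, 4, 6}
rect = set(product(A, B))
assert P(rect) == sum(coin[c] for c in A) * sum(die[d] for d in B)
assert P(set(prod_mass)) == 1        # normalization of the product measure
```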

Illustrative Examples

Discrete Examples

A classic finite discrete probability space is provided by the experiment of flipping a fair coin once. The sample space is \Omega = \{H, T\}, where H denotes heads and T denotes tails. The \sigma-algebra \Sigma is the power set of \Omega, consisting of \emptyset, \{H\}, \{T\}, and \{H, T\}. The probability measure P is defined by P(\{H\}) = \frac{1}{2} and P(\{T\}) = \frac{1}{2}, which extends to all events in \Sigma via additivity, such as P(\{H, T\}) = 1. Another finite example arises from rolling a fair six-sided die. Here, \Omega = \{1, 2, 3, 4, 5, 6\}, with \Sigma again the power set of \Omega. The uniform probability measure assigns P(\{k\}) = \frac{1}{6} for each k \in \Omega, ensuring equal likelihood for each face and extending additively to subsets, for instance P(\{1, 2, 3\}) = \frac{1}{2}. This setup models scenarios with equally probable discrete outcomes. Extending beyond finite spaces, consider a sequence of independent Bernoulli trials, each with success probability p \in (0,1). The sample space is \Omega = \{0,1\}^{\mathbb{N}}, the set of all infinite sequences of 0s (failure) and 1s (success); note that this \Omega is uncountable, so the construction leaves the strictly discrete setting even though each individual trial is discrete. The \sigma-algebra \Sigma is the product \sigma-algebra generated by cylinder sets, which are sets defined by fixing finitely many coordinates. The probability measure P is the infinite product of Bernoulli measures, where for a cylinder set specified by outcomes in the first n trials, P is the product \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i} with x_i \in \{0,1\}, and extended to all of \Sigma via Carathéodory's extension theorem to ensure consistency. A genuinely countably infinite example arises from a Poisson point process, focusing on the number of arrivals in a fixed interval. The sample space is \Omega = \{0, 1, 2, \dots \}, the non-negative integers representing possible counts. The \sigma-algebra \Sigma is the power set of \Omega. The probability measure is given by P(\{k\}) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0,1,2,\dots, where \lambda > 0 is the expected number of arrivals; this arises as the distribution of the count when interarrival times are independent exponential random variables with rate \lambda. Each of these examples constitutes a valid probability space, as the measure P satisfies P(\Omega) = 1 and countable additivity: for any countable collection of disjoint events \{A_i\}_{i=1}^\infty \subseteq \Sigma, P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i). In the finite cases of the coin and die, additivity reduces to the finite case, while the Bernoulli-sequence and Poisson examples require the full countable property.
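A short sketch checking the Poisson masses e^{-\lambda} \lambda^k / k! on a truncated support; \lambda = 2.5 and the cutoff N = 60 are illustrative choices, with the tail beyond N negligible at this scale.

```python
import math

lam, N = 2.5, 60
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(N)]
print(sum(pmf))        # ≈ 1.0: normalization over the truncated support
print(sum(pmf[:3]))    # P(at most 2 arrivals) = P({0}) + P({1}) + P({2})
```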

Continuous Examples

Continuous probability spaces typically feature an uncountable sample space Ω, often a subset of the real line or a higher-dimensional Euclidean space, equipped with the Borel σ-algebra generated by open sets, and a probability measure P defined via a density with respect to Lebesgue measure. These spaces are non-atomic, meaning no single point has positive probability, which aligns with their continuous nature. A fundamental example is the uniform distribution on the unit interval, where the sample space is Ω = [0,1], the σ-algebra Σ is the Borel σ-algebra on [0,1], and the probability measure is P(E) = λ(E) for Borel sets E ⊆ [0,1], with λ denoting the Lebesgue measure normalized to total probability 1. This setup models scenarios requiring equal likelihood across a continuum, such as selecting a random point in a unit-length segment. The density is f(x) = 1 for x ∈ [0,1] and 0 otherwise, ensuring ∫_{[0,1]} f(x) dx = 1. Another canonical continuous space arises from the standard normal distribution, with Ω = ℝ, the Borel σ-algebra on ℝ, and P(E) = ∫_E φ(x) dx, where the density is given by \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) for x ∈ ℝ. This distribution is central to many statistical applications due to its symmetry and the central limit theorem, which approximates sums of independent random variables by normals. The measure assigns probability via integration over intervals or sets, reflecting the bell-shaped density concentrated around the mean of 0 with variance 1. The exponential distribution provides a continuous model for waiting times, particularly interarrival times in a Poisson process with rate λ > 0. Here, Ω = [0, ∞), Σ is the Borel σ-algebra on [0, ∞), and the density is f(x) = λ e^{-λx} for x ≥ 0, so P(E) = ∫_E λ e^{-λx} dx for Borel sets E. Its memoryless property makes it ideal for renewal theory, where the probability of waiting beyond time t depends only on the time since the last event. The total probability integrates to 1, as ∫_0^∞ λ e^{-λx} dx = 1. For multivariate cases, the uniform distribution on the unit square exemplifies a product probability space. The sample space is Ω = [0,1] × [0,1], with the product Borel σ-algebra, and the measure is the two-dimensional Lebesgue measure (area) normalized to 1, so P(E) equals the area of Borel sets E ⊆ Ω. This construction extends the univariate uniform distribution by taking products, modeling pairs of independent uniform random variables, such as the coordinates of a random point in the square. The density is f(x,y) = 1 for (x,y) ∈ [0,1] × [0,1]. Probabilities in these spaces are computed through integrals of the density over events. For instance, in the uniform space on [0,1], the probability that a random variable X falls in (0.2, 0.8) is P(0.2 < X < 0.8) = ∫_{0.2}^{0.8} 1 dx = 0.6. Similarly, for the standard normal, P(-1 < X < 1) ≈ 0.6827 via numerical integration of φ(x), establishing about 68% probability within one standard deviation. For the exponential with λ=1, P(X > 1) = ∫_1^∞ e^{-x} dx = e^{-1} ≈ 0.3679, illustrating the tail decay. These integral computations highlight how continuous measures quantify likelihood without discrete enumeration.
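The three worked probabilities above can be reproduced directly; a sketch assuming scipy is available, whose uniform, norm, and expon objects default to exactly the distributions used here.

```python
from scipy.stats import uniform, norm, expon

print(uniform.cdf(0.8) - uniform.cdf(0.2))   # 0.6 = P(0.2 < X < 0.8)
print(norm.cdf(1) - norm.cdf(-1))            # ≈ 0.6827, one standard deviation
print(expon.sf(1))                           # e⁻¹ ≈ 0.3679, tail P(X > 1)
```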

Connections to Probability Theory

Random Variables

In the context of a probability space (\Omega, \Sigma, P), a random variable X is defined as a measurable function from \Omega to the real numbers \mathbb{R}, where measurability requires that for every Borel set B \subseteq \mathbb{R}, the preimage X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} belongs to the \sigma-algebra \Sigma. This ensures that events defined by the values of X are observable within the probability space. Typically, the real line is equipped with the Borel \sigma-algebra generated by the open sets of \mathbb{R}, making X a bridge between the abstract sample space and concrete numerical outcomes. The random variable X induces a probability measure on \mathbb{R}, known as the pushforward or distribution measure P_X, defined by P_X(B) = P(X^{-1}(B)) for every Borel set B \subseteq \mathbb{R}. This measure P_X captures the probabilistic structure transferred from the original space to the range of X, allowing probabilities of intervals or sets in \mathbb{R} to be computed via the underlying probability P. The \sigma-algebra generated by X, denoted \sigma(X), is the smallest \sigma-algebra on \Omega that makes X measurable; it consists precisely of sets of the form X^{-1}(B) for Borel B \subseteq \mathbb{R}. This generated \sigma-algebra represents the information revealed by observing X. The expectation of a random variable X, assuming it exists (i.e., E[|X|] < \infty), is given by the Lebesgue integral E[X] = \int_{\Omega} X(\omega) \, dP(\omega), which generalizes the intuitive notion of average value. In discrete probability spaces, where \Omega is countable and P assigns masses to points, this reduces to a sum: E[X] = \sum_{\omega \in \Omega} X(\omega) P(\{\omega\}). For continuous spaces, such as those with Lebesgue measure, the expectation often takes the form of an integral over \mathbb{R} with respect to the induced density, emphasizing the shift from summation to integration based on the underlying space type. This integral definition unifies the treatment across different space types and underpins further probabilistic computations.
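A discrete sketch of the pushforward measure and the expectation sum, using the square of a fair die roll as the random variable; all names are illustrative.

```python
from fractions import Fraction

omega = range(1, 7)
P_point = {w: Fraction(1, 6) for w in omega}   # fair-die masses
X = lambda w: w * w                            # X(ω) = ω², square of the roll

# Pushforward P_X(B) = P(X⁻¹(B)), tabulated on the finite range of X:
P_X = {}
for w in omega:
    P_X[X(w)] = P_X.get(X(w), Fraction(0)) + P_point[w]

E_X = sum(X(w) * P_point[w] for w in omega)        # E[X] = Σ X(ω) P({ω})
assert E_X == sum(x * p for x, p in P_X.items())   # same value via P_X
print(E_X)                                         # 91/6
```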

Probability Distributions

In a probability space (\Omega, \mathcal{F}, P), a random variable X: \Omega \to \mathbb{R} induces a probability distribution, which is the pushforward measure P_X(B) = P(X^{-1}(B)) for Borel sets B \subseteq \mathbb{R}. This distribution fully describes the probabilistic behavior of X on \mathbb{R}, independent of the underlying space \Omega. The cumulative distribution function (CDF) provides a complete characterization of the distribution via F_X(x) = P(X \leq x) for x \in \mathbb{R}. This function is right-continuous and non-decreasing, satisfying \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. Distributions fall into discrete, continuous, singular continuous, and mixed types. For discrete distributions, the probability mass function satisfies p(x) = P(X = x), yielding a step-function CDF with jumps at support points. Continuous distributions admit a probability density function f(x) \geq 0 such that F_X(x) = \int_{-\infty}^x f(t) \, dt and \int_{-\infty}^\infty f(t) \, dt = 1. Singular continuous distributions have continuous CDFs yet are supported on sets of Lebesgue measure zero, exemplified by the Cantor distribution whose CDF is the Cantor function, constant on the complement of the Cantor set but increasing on it. Mixed distributions decompose into a combination of these components. The distribution is precisely the law of X, and any distribution can be realized by a random variable on a suitable space. The Skorokhod representation theorem illustrates this: weak convergence of distributions \mu_n \to \mu on \mathbb{R} implies the existence of random variables X_n with laws \mu_n on a common probability space such that X_n \to X almost surely, where X has law \mu. Key features like moments and the characteristic function derive directly from the distribution. The k-th moment is \mathbb{E}[X^k] = \int_{-\infty}^\infty x^k \, dF_X(x) when finite. The characteristic function is \phi_X(t) = \mathbb{E}[e^{itX}] = \int_{-\infty}^\infty e^{itx} \, dF_X(x) for t \in \mathbb{R}, uniquely determining F_X via continuity theorems. Convergence of distributions induced by sequences of random variables on a probability space often manifests as weak convergence, where \mu_n \to \mu if F_n(x) \to F(x) at continuity points of F, provided the sequence is tight to ensure a proper limit.
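As a small worked instance, the characteristic function of a Bernoulli(p) variable is the two-term sum \phi_X(t) = (1-p) + p e^{it}, and its derivative at 0 recovers i\,\mathbb{E}[X]; the numeric differentiation below is purely illustrative.

```python
import cmath

p = 0.3
phi = lambda t: (1 - p) + p * cmath.exp(1j * t)   # E[e^{itX}] for Bernoulli(p)

h = 1e-6
deriv = (phi(h) - phi(-h)) / (2 * h)              # central difference ≈ φ'(0)
print(deriv / 1j)                                 # ≈ 0.3 = E[X], since φ'(0) = iE[X]
```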

Event Relations

In a probability space (\Omega, \mathcal{F}, P), events are elements of the \sigma-algebra \mathcal{F}, and relations between them are defined using the probability measure P. The conditional probability of an event A \in \mathcal{F} given another event B \in \mathcal{F} with P(B) > 0 is given by P(A \mid B) = \frac{P(A \cap B)}{P(B)}, which corresponds to restricting the measure to the subspace \Omega_B = \{\omega \in \Omega : \omega \in B\} and normalizing by P(B). Two events A, B \in \mathcal{F} are independent if their joint occurrence does not affect the individual probabilities, formally P(A \cap B) = P(A) P(B). This relation extends to collections of events or, more generally, to \sigma-algebras \mathcal{G}, \mathcal{H} \subseteq \mathcal{F} being independent if P(G \cap H) = P(G) P(H) for all G \in \mathcal{G}, H \in \mathcal{H}. Mutually exclusive events, also called disjoint events, satisfy A_i \cap A_j = \emptyset for i \neq j, and for a countable collection \{A_i\}_{i=1}^\infty \subseteq \mathcal{F}, the probability of their union is the sum of the individual probabilities: P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i), following from the countable additivity of P. Bayes' theorem follows directly from the definition of conditional probability: for events A, B \in \mathcal{F} with P(A) > 0 and P(B) > 0, P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}, allowing inversion of conditional probabilities. Filtrations provide a framework for event relations in time-dependent settings, defined as an increasing family of sub-\sigma-algebras \{\mathcal{F}_t\}_{t \geq 0} of \mathcal{F} such that \mathcal{F}_s \subseteq \mathcal{F}_t for s < t, representing the evolving information available about events up to time t. Basic event conditioning with respect to a filtration involves restricting probabilities to \mathcal{F}_t-measurable events.
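A numeric sketch of conditional probability, Bayes' theorem, and an independence check on a fair-die space; the events are arbitrary illustrative choices.

```python
from fractions import Fraction

P = lambda e: Fraction(len(e), 6)          # uniform measure on {1,...,6}
A, B = {2, 4, 6}, {4, 5, 6}                # "even" and "at least 4"

P_A_given_B = P(A & B) / P(B)              # (2/6)/(3/6) = 2/3
P_B_given_A = P(A & B) / P(A)
assert P_A_given_B == P_B_given_A * P(A) / P(B)   # Bayes' theorem

print(P(A & B) == P(A) * P(B))             # False: A and B are not independent
```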