
Probability

Probability is a branch of mathematics and statistics that quantifies the likelihood of events occurring, providing a numerical measure ranging from 0 (impossible) to 1 (certain). This measure enables the analysis of random phenomena, such as the outcomes of experiments or games, by assigning probabilities to possible events within a defined sample space. The concept originated in the 17th century amid efforts to resolve gambling disputes, notably through the 1654 correspondence between French mathematicians Blaise Pascal and Pierre de Fermat, who developed methods for fairly dividing stakes in interrupted games of chance. The rigorous mathematical foundation of probability theory was established in 1933 by Russian mathematician Andrey Kolmogorov, who axiomatized it using measure theory to create a consistent framework applicable to both finite and infinite sample spaces.

Kolmogorov's three axioms define a probability measure P on a sigma-algebra of events: (1) non-negativity, where P(E) ≥ 0 for any event E; (2) normalization, where P(Ω) = 1 for the entire sample space Ω; and (3) countable additivity, where for mutually exclusive events E₁, E₂, ..., P(∪ Eᵢ) = Σ P(Eᵢ). These axioms underpin core concepts like conditional probability, which updates likelihoods based on new evidence (P(A|B) = P(A ∩ B)/P(B)), and random variables, functions mapping outcomes to real numbers whose distributions describe probabilistic behavior.

Probability theory has profoundly influenced diverse fields, including statistics for inference and hypothesis testing, physics for modeling statistical mechanics and quantum phenomena, finance for risk assessment and option pricing, and machine learning for predictive algorithms. Interpretations vary, with the frequentist view treating probability as long-run relative frequency in repeated trials, and the Bayesian approach viewing it as a subjective degree of belief updated via Bayes' theorem (P(H|E) = [P(E|H) P(H)] / P(E)). Central limit theorems demonstrate how sums of random variables approximate normal distributions, facilitating practical approximations in large-scale sampling and estimation.

Historical Development

Early Concepts and Games of Chance

The roots of probabilistic reasoning emerged from ancient games of chance, where participants grappled intuitively with uncertainty through dice and similar implements. In Mesopotamia and Egypt, polyhedral objects resembling dice, including cubes and astragali (knucklebones from sheep), date back to the third millennium BCE and were employed in both recreational gaming and divinatory practices, introducing early notions of uncertain outcomes. Similarly, in ancient Rome, tesserae—bone or ivory cubes with marked faces—facilitated widespread gambling in social settings from the Republican era onward, with literary references highlighting the cultural acceptance of such risks. These activities, while not formalized mathematically, fostered practical awareness of odds and fairness in wagering, setting the stage for later analytical developments.

A pivotal advancement occurred in the 16th century with Gerolamo Cardano's Liber de Ludo Aleae, composed in the mid-16th century (completed around 1564) but published posthumously in 1663. This treatise provided the first systematic analysis of gambling probabilities, focusing on dice and card games to compute odds by enumerating favorable versus total outcomes—for instance, determining the chances of throwing specific sums with three dice. Cardano distinguished between games of pure chance and those involving skill, and he derived ratios such as the 1:1 odds for an even outcome in a single die roll, emphasizing equiprobable cases as a core principle. His work, influenced by his own gambling experiences, marked a shift from anecdotal observations to methodical enumeration, though it remained unpublished during his lifetime due to concerns over promoting vice.

The 1654 exchange of letters between Pascal and Fermat addressed the "problem of points," a longstanding dilemma concerning the equitable division of stakes in an unfinished game. Posed to Pascal by the gambler Chevalier de Méré, the issue involved scenarios like two players nearing victory in a point-based contest when external interruption occurs, requiring a fair split based on remaining chances. Fermat proposed enumerating all possible future plays to weight divisions proportionally, while Pascal refined this with recursive methods, such as solving for cases where one player needs two points and the other three in a best-of-five game. Their correspondence, spanning July to August 1654, resolved the problem through combinatorial enumeration, establishing precedents for handling incomplete trials in chance events.

Christiaan Huygens extended these ideas in his 1657 pamphlet De Ratiociniis in Ludo Aleae, which formalized expected value as a tool for evaluating fairness in games. Huygens posited that the value of a chance equals the sum of possible payoffs weighted by their likelihoods, exemplified by his computation for a game yielding 3, 4, or 5 units with equal probability: the expectation is \frac{3+4+5}{3} = 4. Applied to dice throws and the problem of points, this concept quantified long-run averages, deeming a game fair when each player's expected net gain is zero. Huygens' tract, published as an appendix to a mathematics text by Frans van Schooten, synthesized prior analyses into a coherent framework for moral and practical decision-making in uncertain ventures.
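The division Pascal and Fermat arrived at can be reproduced by brute enumeration. The sketch below is a minimal illustration rather than their historical presentation: it assumes fair rounds and enumerates the at most four remaining rounds when one player needs two points and the other three; the function name fair_share is purely illustrative.

```python
# A minimal sketch of Fermat-style enumeration for the problem of points:
# player A needs 2 more points, player B needs 3, and each round is fair.
from itertools import product

def fair_share(a_needs=2, b_needs=3):
    """Fraction of the stakes owed to player A under exhaustive enumeration."""
    horizon = a_needs + b_needs - 1          # at most this many further rounds decide the game
    outcomes = list(product("AB", repeat=horizon))
    a_wins = sum(1 for seq in outcomes if seq.count("A") >= a_needs)
    return a_wins / len(outcomes)

print(fair_share())   # 11/16 = 0.6875, so the stakes split 11:5 in A's favour
```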

Foundations in the 17th and 18th Centuries

The foundations of probability as a mathematical discipline solidified in the 17th and 18th centuries through pioneering works that introduced key theorems and analytical tools, building on earlier ideas from games of chance to establish rigorous probabilistic laws. Jacob Bernoulli's posthumously published Ars Conjectandi (1713) marked a seminal advancement by formalizing the law of large numbers, which states that for independent trials each with success probability p, the sample proportion of successes converges to p as the number of trials increases without bound. This theorem provided the first mathematical justification for using empirical frequencies to estimate true probabilities, demonstrating convergence with explicit error bounds derived from combinatorial arguments.

Abraham de Moivre further advanced probabilistic approximations in his Doctrine of Chances (1738 edition), where he derived the normal curve as an asymptotic approximation to the binomial distribution for large n. This work enabled efficient computation of tail probabilities in repeated trials, approximating the binomial probability mass function P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} via the integral of a Gaussian density centered at np with variance np(1-p). De Moivre's approximation, refined using Stirling's formula for factorials, laid groundwork for central limit theorem developments and practical applications in annuities and risk assessment.

Thomas Bayes's essay, published posthumously in 1763, introduced the concept of inverse probability, providing a precursor to the modern Bayesian formula P(A|B) = \frac{P(B|A) P(A)}{P(B)} through proportional reasoning on conditional chances. In addressing how to infer the probability of an event from observed outcomes, Bayes framed the problem using uniform priors over possible success probabilities, yielding a method to update beliefs based on data via ratios of likelihoods to marginal probabilities. This approach shifted focus from direct forward probabilities to inferential reasoning, influencing later Bayesian inference.

Pierre-Simon Laplace's Théorie Analytique des Probabilités (1812) synthesized and expanded these ideas by developing generating functions—power series representations of probability distributions—and asymptotic methods for approximating complex integrals in probabilistic sums. Laplace employed generating functions to solve recurrence relations for binomial and multinomial coefficients, deriving expansions that facilitated large-sample approximations beyond de Moivre's normal curve. His asymptotic techniques, including Laplace's method for integrals, provided error estimates for these approximations, establishing probability as a branch of mathematical analysis with tools for statistical inference and error theory.
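De Moivre's normal approximation to the binomial can be checked numerically. The sketch below uses illustrative numbers and a continuity correction (a modern refinement, not part of de Moivre's original statement) to compare the exact binomial cumulative probability with the Gaussian approximation centered at np with variance np(1-p).

```python
# Sketch comparing an exact binomial CDF with de Moivre's normal approximation.
import math

def binom_cdf(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 100, 0.5, 55
exact = binom_cdf(k, n, p)
approx = normal_cdf(k + 0.5, n * p, math.sqrt(n * p * (1 - p)))  # continuity correction
print(exact, approx)   # both are approximately 0.864
```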

Axiomatic Formalization in the 20th Century

In the late 19th century, Georg Cantor's development of set theory provided essential tools for handling infinite collections, which began to intersect with probability by enabling precise definitions of event spaces. Paradoxes like Bertrand's paradox, introduced by Joseph Bertrand in 1889 to challenge uniform probability assignments in continuous settings, exposed limitations in classical approaches and spurred the need for rigorous foundations. Émile Borel advanced this by applying measure theory to probability around 1905, linking sets to measurable quantities—such as assigning probabilities proportional to the lengths of intervals—and addressing skepticism toward continuous probabilities through concepts like denumerable probabilities and almost sure convergence. This measure-theoretic groundwork, building on 18th-century laws of large numbers for empirical justification, culminated in Andrey Kolmogorov's 1933 monograph Foundations of the Theory of Probability, which axiomatized the field by defining probability as a non-negative, normalized measure on a sigma-algebra of subsets of a sample space. Kolmogorov's framework resolved earlier ambiguities by treating probabilities as measures, allowing theorems on limits and convergence to be derived deductively, and it remains the cornerstone of modern probability theory.

Parallel developments in stochastic processes enriched the axiomatic base: in the 1920s, Norbert Wiener rigorously constructed Brownian motion as a continuous-time process with independent Gaussian increments, proving its existence and paving the way for modeling random paths in physics and beyond. In the 1930s, Paul Lévy extended this by introducing martingale conditions for dependent random variables, as detailed in his 1935 and 1937 works, which formalized convergence theorems essential for analyzing sums and paths in stochastic settings. Kolmogorov's axioms also facilitated broader applications in statistics, notably through the Neyman-Pearson lemma of 1933, which identifies the likelihood ratio test as uniformly most powerful for simple hypothesis testing under controlled error rates, influencing advancements in inference and decision theory.

Interpretations of Probability

Classical Interpretation

The classical interpretation of probability defines it as the ratio of the number of favorable outcomes to the total number of possible outcomes in a sample space of equally likely alternatives. This approach assumes a symmetric setup where each outcome has the same probability of occurrence, providing an objective measure based on combinatorial structure rather than empirical observation. For instance, the probability of obtaining heads when flipping a fair coin is \frac{1}{2}, since there is one favorable outcome out of two equally possible results.

This interpretation traces its formal origin to the work of Pierre-Simon Laplace in the late 18th and early 19th centuries, particularly in his Théorie Analytique des Probabilités (1812), where he articulated probability as arising from equally possible cases in games of chance. Laplace built on earlier ideas from mathematicians like Abraham de Moivre but emphasized the principle of insufficient reason, positing that in the absence of information favoring one outcome over another, all possibilities should be deemed equiprobable. He applied this to analyze uncertainties in lotteries and astronomical predictions, establishing it as a foundational tool for deterministic yet probabilistic reasoning.

Common examples illustrate the method's reliance on enumeration. In rolling a fair six-sided die, the probability of landing on an even number (2, 4, or 6) is \frac{3}{6} = \frac{1}{2}, as three outcomes are favorable out of six total. For the sum of two fair dice equaling 7, there are six favorable combinations (1+6, 2+5, 3+4, 4+3, 5+2, 6+1) out of 36 possible outcomes, yielding P = \frac{6}{36} = \frac{1}{6}. Similarly, drawing a card of a specified suit from a standard 52-card deck has probability \frac{13}{52} = \frac{1}{4}, assuming a uniform shuffle.

Despite its elegance for symmetric scenarios, the classical interpretation has significant limitations when outcomes lack inherent equiprobability. It cannot apply to cases like a loaded die, where symmetry is absent and probabilities deviate from equality, nor to phenomena such as weather prediction, which involve complex, non-combinatorial factors without a finite set of equally likely states. This reliance on a priori symmetry often leads to circularity in defining "equal likelihood," restricting its utility beyond idealized games.
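As a concrete check of the classical enumeration described above, the sketch below counts equally likely outcomes for two fair dice and recovers the 1/6 probability for a sum of 7; the variable names are illustrative.

```python
# Classical-style enumeration: equally likely outcomes for two fair dice,
# counting favourable cases for a sum of 7.
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))       # 36 equally likely ordered pairs
favourable = [o for o in outcomes if sum(o) == 7]     # (1,6), (2,5), ..., (6,1)
print(Fraction(len(favourable), len(outcomes)))       # 1/6
```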

Frequentist Interpretation

The frequentist interpretation defines the probability of an event as the limiting relative frequency with which that event occurs in an infinite sequence of trials under fixed conditions. Formally, if an experiment is repeated indefinitely, the probability p is given by \lim_{n \to \infty} \frac{k}{n} = p, where n is the number of trials and k is the number of times the event occurs. This approach emphasizes an objective, empirical basis for probability, derived solely from observable data rather than subjective beliefs.

This interpretation was notably advanced by John Venn in his 1866 work The Logic of Chance, where he argued for probability as the frequency of attributes in a series of instances, drawing on empirical observations to ground probabilistic reasoning. Richard von Mises further formalized it in 1919 by introducing axioms for random sequences, known as "collectives," which ensure that relative frequencies converge in any subsequence selected by a place-selection rule, providing a rigorous foundation for randomness in infinite trials. These developments built on earlier ideas, linking to the law of large numbers for justification of convergence.

In practice, this applies to repeatable experiments, such as coin flips, where the proportion of heads approaches 0.5 as the number of tosses increases, illustrating convergence to the true probability. Similarly, in statistical sampling, confidence intervals estimate population parameters based on the long-run coverage frequency, meaning that if the sampling process is repeated many times, the interval will contain the true value in approximately the stated proportion of cases (e.g., 95%).

Critics argue that the frequentist view is impractical because it relies on an unattainable number of trials, making the limit a theoretical construct rather than something directly observable. It also struggles with non-repeatable or unique events, such as the probability of a one-off historical or future occurrence, where no long-run frequency can be established, rendering the interpretation inapplicable to many real-world scenarios.
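A simple simulation conveys the frequentist picture of convergence, though any finite run only approximates the idealized limit. The sketch below, with an arbitrarily chosen random seed, tracks the running relative frequency of heads in simulated fair-coin flips.

```python
# Running relative frequency of heads in simulated fair-coin flips.
import random

random.seed(1)
heads, flips = 0, 0
for checkpoint in (10, 100, 1_000, 10_000, 100_000):
    while flips < checkpoint:
        heads += random.random() < 0.5   # True counts as 1
        flips += 1
    print(checkpoint, heads / flips)     # relative frequency drifts toward 0.5
```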

Bayesian Interpretation

The Bayesian interpretation treats probability as a measure of subjective degree of belief or credence in a proposition, which can be updated rationally in light of new evidence. This view contrasts with objective interpretations by emphasizing personal judgments, while still adhering to the formal rules of probability to ensure coherence. It posits that probabilities represent an agent's state of knowledge, which evolves through iterative updates rather than fixed empirical frequencies.

This subjective approach was rigorously formalized in the early 20th century through arguments linking probabilities to rational betting behavior under uncertainty. Frank P. Ramsey, in his 1926 essay "Truth and Probability," proposed that degrees of belief could be quantified by considering an individual's willingness to bet on propositions, using utility theory to derive numerical probabilities from coherent preferences. Building on this, Bruno de Finetti in his 1937 work "Foresight: Its Logical Laws, Its Subjective Sources" employed Dutch book arguments to demonstrate that any set of beliefs violating the axioms of probability would allow a bookmaker to construct a series of bets guaranteeing a loss regardless of outcomes, thus compelling subjective probabilities to conform to mathematical probability.

The core process of belief updating in the Bayesian framework is governed by Bayes' theorem, which combines prior beliefs with observed evidence to yield posterior beliefs. Formally, the theorem derives from the definition of conditional probability: for events H (the hypothesis) and E (the evidence), P(H \cap E) = P(H|E) P(E) = P(E|H) P(H), so rearranging gives P(H|E) = \frac{P(E|H) P(H)}{P(E)}, where P(H) is the prior probability of the hypothesis, P(E|H) is the likelihood (probability of the evidence given the hypothesis), and P(E) is the total probability of the evidence, often computed as P(E) = P(E|H) P(H) + P(E|\neg H) P(\neg H). This can equivalently be expressed in odds form for intuitive updating: the posterior odds \frac{P(H|E)}{P(\neg H|E)} = \frac{P(E|H)}{P(E|\neg H)} \times \frac{P(H)}{P(\neg H)}, where the prior odds are multiplied by the likelihood ratio \frac{P(E|H)}{P(E|\neg H)}, a factor that quantifies how much the evidence favors H over its alternative.

A classic example illustrates this updating in a medical testing scenario. Suppose a rare disease affects 1% of the population (P(D) = 0.01), a diagnostic test has 99% sensitivity (P(+|D) = 0.99), and 95% specificity (P(-|\neg D) = 0.95, so P(+|\neg D) = 0.05). For a positive test result (+), the posterior probability P(D|+) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} \approx 0.167, or about 16.7%. In odds terms, the prior odds are 1:99; the likelihood ratio is \frac{0.99}{0.05} = 19.8, yielding posterior odds of approximately 1:5, reflecting how the low disease prevalence tempers the test's evidential strength. This demonstrates Bayesian updating's ability to revise beliefs about a hypothesis, such as disease presence, after observing data like a test outcome.

Modern developments in Bayesian statistics address subjectivity concerns through objective Bayes methods, which seek non-informative or reference priors to minimize prior influence on inferences. Harold Jeffreys introduced such priors in his 1939 book Theory of Probability, proposing priors proportional to the square root of the Fisher information to achieve approximate invariance under parameter reparameterization, thereby providing a foundation for objective Bayesian analysis in scientific inference.
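The medical-test update above can be reproduced directly from the formulas, in both probability and odds form. The sketch below uses the section's numbers (1% prevalence, 99% sensitivity, 95% specificity); the variable names are illustrative.

```python
# Bayesian update for the medical-test example, in probability and odds form.
prior, sens, spec = 0.01, 0.99, 0.95

evidence = sens * prior + (1 - spec) * (1 - prior)     # P(+) via total probability
posterior = sens * prior / evidence                    # P(D | +)

prior_odds = prior / (1 - prior)                       # 1 : 99
likelihood_ratio = sens / (1 - spec)                   # 19.8
posterior_odds = prior_odds * likelihood_ratio         # 0.2, i.e. about 1 : 5

print(round(posterior, 3), round(posterior_odds, 3))   # 0.167, 0.2
```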

Mathematical Foundations

Kolmogorov Axioms

The Kolmogorov axioms form the foundational framework for modern probability theory, providing a rigorous axiomatic basis that abstracts probability from specific interpretations and applies universally. These three axioms define the probability measure P on a collection of events within a sample space, ensuring that probability behaves as a countably additive measure with values in the unit interval.

The first axiom states that for any event A, P(A) \geq 0. The bound P(A) \leq 1 follows from the other axioms, since P(A) + P(A^c) = P(\Omega) = 1 and P(A^c) \geq 0, establishing that probabilities are non-negative and bounded above by certainty. The second axiom requires that P(\Omega) = 1, where \Omega denotes the entire sample space, reflecting the certainty that the sample space encompasses all possible outcomes. The third axiom specifies countable additivity: for any countable collection of pairwise disjoint events \{A_i\}_{i=1}^\infty, the probability of their union is the sum of their individual probabilities, i.e., P\left( \bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty P(A_i). This axiom extends finite additivity to infinite collections, enabling the handling of continuous spaces and limits in probability calculations.

From these axioms, several basic properties can be derived. The probability of the empty set is zero: P(\emptyset) = 0, obtained by applying countable additivity to a sequence of empty sets, whose union is again empty, so that P(\emptyset) = \sum P(\emptyset) forces P(\emptyset) = 0. The probability of the complement of an event A, denoted A^c, satisfies P(A^c) = 1 - P(A), following from the additivity of the disjoint events A and A^c, whose union is \Omega. Monotonicity holds as well: if A \subseteq B, then P(A) \leq P(B), since B = A \cup (B \setminus A) with A and B \setminus A disjoint, and P(B \setminus A) \geq 0. These derivations confirm the coherence of the axiom system for finite and countable cases.

The Kolmogorov axioms are notable for their generality and neutrality, providing a consistent mathematical foundation that accommodates diverse interpretations of probability, such as frequentist and subjective views, without favoring any particular one. This neutrality ensures that probability theory remains a unified branch of measure theory, applicable across mathematical and applied contexts.
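For a finite sample space, the axioms and the derived properties can be checked mechanically. The sketch below assumes a fair six-sided die as the measure and verifies non-negativity, normalization, finite additivity for disjoint events, and the complement rule; the helper P is illustrative.

```python
# Mechanical check of the (finite form of the) Kolmogorov axioms for a fair die.
omega = {"1", "2", "3", "4", "5", "6"}
p = {o: 1 / 6 for o in omega}             # probability of each elementary outcome

def P(event):
    """Measure of an event = sum of the probabilities of its outcomes."""
    return sum(p[o] for o in event)

assert all(P({o}) >= 0 for o in omega)                 # non-negativity
assert abs(P(omega) - 1) < 1e-12                       # normalization
A, B = {"2", "4", "6"}, {"1", "3"}                     # disjoint events
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12           # finite additivity
assert abs(P(omega - A) - (1 - P(A))) < 1e-12          # derived complement rule
print("axioms hold for this finite measure")
```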

Probability Spaces and Events

In probability theory, the foundational structure for modeling random phenomena is provided by a probability space, formally defined as a triple (\Omega, \mathcal{F}, P), where \Omega represents the sample space, \mathcal{F} is a \sigma-algebra of subsets of \Omega serving as the event space, and P: \mathcal{F} \to [0,1] is a probability measure satisfying the Kolmogorov axioms. This framework, introduced by Andrey Kolmogorov in 1933, enables the rigorous assignment of probabilities to events while ensuring consistency with measure-theoretic principles. The sample space \Omega consists of all possible outcomes of a random experiment or process, ranging from discrete sets like the faces of a die to uncountable continua such as real numbers representing measurement errors.

The \sigma-algebra \mathcal{F}, also called the event algebra, is a collection of subsets of \Omega that includes \emptyset and \Omega itself and is closed under complementation (if A \in \mathcal{F}, then \Omega \setminus A \in \mathcal{F}) and countable unions (if A_n \in \mathcal{F} for n \in \mathbb{N}, then \bigcup_{n=1}^\infty A_n \in \mathcal{F}). These closure properties ensure that \mathcal{F} is also closed under countable intersections and differences, allowing for the formation of complex events from simpler ones while maintaining the structure needed for probability assignments. Events are precisely the measurable sets in \mathcal{F}, to which the probability measure P assigns values between 0 and 1, with P(\Omega) = 1 and P(\emptyset) = 0.

Specific choices of \mathcal{F} depend on the nature of \Omega. In discrete probability spaces, where \Omega is finite or countably infinite—such as the set \{1, 2, 3, 4, 5, 6\} for a fair die roll—the power set of \Omega (all possible subsets) serves as \mathcal{F}, enabling probabilities to be defined directly on every subset. For continuous spaces, such as \Omega = \mathbb{R} modeling a continuous random variable, \mathcal{F} is typically the Borel \sigma-algebra \mathcal{B}(\mathbb{R}), generated by the open intervals of \mathbb{R} (i.e., the smallest \sigma-algebra containing all open sets). This construction ensures measurability for intervals and their limits, crucial for integrating probability densities over the real line.

A key concept within this structure is the independence of \sigma-algebras, which extends the notion of independent events to subcollections of events. Two sub-\sigma-algebras \mathcal{F}_1 and \mathcal{F}_2 of \mathcal{F} are independent if, for every A \in \mathcal{F}_1 and B \in \mathcal{F}_2, the probability measure satisfies P(A \cap B) = P(A) P(B). This property captures the lack of informational overlap between the substructures, allowing joint probabilities to factorize and facilitating the analysis of composite systems. The probability space framework builds directly on the Kolmogorov axioms by applying them to this structured setup of outcomes and events.

Random Variables and Expectation

In probability theory, a random variable is a measurable function X: \Omega \to \mathbb{R} defined on a probability space (\Omega, \mathcal{F}, P), where for every real number a, the preimage set \{\omega \in \Omega \mid X(\omega) \leq a\} belongs to the sigma-algebra \mathcal{F}. This measurability ensures that probabilities can be assigned to events defined by the values of X, such as the cumulative distribution function F_X(a) = P(X \leq a), which is non-decreasing, right-continuous, and satisfies \lim_{a \to -\infty} F_X(a) = 0 and \lim_{a \to \infty} F_X(a) = 1. Random variables are categorized as discrete if their range is countable, meaning X takes on a finite or countably infinite set of values, or continuous if the distribution is absolutely continuous with respect to Lebesgue measure, admitting a probability density function f_X. In the discrete case, probabilities are assigned via a probability mass function p_X(x_i) = P(X = x_i) for each possible value x_i; in the continuous case, the density f_X(x) satisfies \int_{-\infty}^{\infty} f_X(x) \, dx = 1 and P(a < X \leq b) = \int_a^b f_X(x) \, dx.

The expected value, or expectation, of a random variable X, denoted E[X], quantifies its average or mean value and is defined generally as the Lebesgue integral E[X] = \int_{\Omega} X(\omega) \, dP(\omega), provided the integral exists (i.e., E[|X|] < \infty). Equivalently, using the distribution function, E[X] = \int_{-\infty}^{\infty} x \, dF_X(x). For a discrete random variable with possible values x_i and probabilities p_i = P(X = x_i), this simplifies to E[X] = \sum_i x_i p_i. For a continuous random variable with density f_X, it becomes E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx. These forms align through the Stieltjes integral representation, bridging discrete and continuous cases.

Key properties of expectation include linearity: for constants a, b \in \mathbb{R} and random variables X, Y (with finite expectations), E[aX + bY] = a E[X] + b E[Y]. This holds regardless of dependence between X and Y. Additionally, if X \geq 0 almost surely (i.e., P(X < 0) = 0), then E[X] \geq 0, reflecting the non-negativity of the integral with respect to the probability measure.

The variance of X, denoted \operatorname{Var}(X), measures the spread of X around its mean \mu = E[X] and is defined as \operatorname{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2, assuming E[X^2] < \infty. This expression derives from the second moment E[X^2], with the alternative form highlighting the difference between the expected squared value and the square of the expectation. Variance is always non-negative, \operatorname{Var}(X) \geq 0, and equals zero if and only if X is constant almost surely.
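The discrete definitions can be computed directly. The sketch below evaluates E[X] = \sum_i x_i p_i and \operatorname{Var}(X) = E[X^2] - (E[X])^2 for a fair die, using exact fractions.

```python
# Expectation and variance of a fair six-sided die from the discrete definitions.
from fractions import Fraction

values = range(1, 7)
pmf = {x: Fraction(1, 6) for x in values}

mean = sum(x * pmf[x] for x in values)                # E[X] = 7/2
second_moment = sum(x**2 * pmf[x] for x in values)    # E[X^2] = 91/6
variance = second_moment - mean**2                    # Var(X) = 35/12
print(mean, variance)
```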

Core Probability Rules

Addition and Union Rules

The addition rule provides a fundamental method for calculating the probability of the union of events, which represents the occurrence of at least one of the events. For two events A and B in a probability space, the probability of their union is expressed as P(A \cup B) = P(A) + P(B) - P(A \cap B). This formula corrects for the overlap by subtracting the probability of the intersection, as the individual probabilities P(A) and P(B) would otherwise double-count the cases where both events occur simultaneously.

When the events are mutually exclusive, meaning their intersection is empty (A \cap B = \emptyset), the intersection probability is zero, simplifying the addition rule to P(A \cup B) = P(A) + P(B). This case aligns directly with the countable additivity axiom in probability theory, which extends to any finite or countably infinite collection of pairwise disjoint events A_i, stating that P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i) for finite n, where A_i \cap A_j = \emptyset for all i \neq j.

For non-disjoint events, the inclusion-exclusion principle generalizes the addition rule to the union of n events A_1, A_2, \dots, A_n. The probability is given by P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i) - \sum_{1 \leq i < j \leq n} P(A_i \cap A_j) + \sum_{1 \leq i < j < k \leq n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P\left(\bigcap_{i=1}^n A_i\right). This alternating sum systematically adds the probabilities of single events, subtracts pairwise intersections to correct for overcounting, adds triple intersections, and continues up to the full intersection, ensuring accurate computation of the union measure. The principle originates from early combinatorial work but applies rigorously to probability measures.

A practical example illustrates the addition rule: consider drawing two cards sequentially without replacement from a standard 52-card deck, and compute the probability of at least one ace. Let A_1 be the event that the first card is an ace (P(A_1) = 4/52 = 1/13) and A_2 the event that the second card is an ace. By the law of total probability, P(A_2) = P(A_2 \mid A_1^c) \cdot P(A_1^c) + P(A_2 \mid A_1) \cdot P(A_1) = (4/51)(48/52) + (3/51)(4/52) = 204/2652 = 1/13 as well, and P(A_1 \cap A_2) = (4/52)(3/51) = 12/2652 = 1/221. Applying the addition rule, P(A_1 \cup A_2) = \frac{1}{13} + \frac{1}{13} - \frac{1}{221} = \frac{17 + 17 - 1}{221} = \frac{33}{221} \approx 0.149. This matches the complementary calculation 1 - P(\text{no aces}) = 1 - (48/52)(47/51) = 1 - \frac{2256}{2652} = \frac{33}{221} \approx 0.149, confirming the rule's validity for small unions.
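The card calculation can also be confirmed by brute-force enumeration of ordered two-card draws, which is feasible here (52 × 51 = 2652 cases). The sketch below is an illustrative check rather than an efficient method.

```python
# Brute-force check of the "at least one ace" probability for two draws
# without replacement from a 52-card deck containing 4 aces.
from itertools import permutations
from fractions import Fraction

deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(52), 2))               # ordered pairs of distinct positions
hits = sum(1 for i, j in draws if deck[i] == "ace" or deck[j] == "ace")
print(Fraction(hits, len(draws)))                      # 33/221 ≈ 0.149
```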

Multiplication and Independence

The multiplication rule expresses the probability of the joint occurrence of two events in terms of a marginal probability and a conditional probability. For events A and B in a probability space, where P(A) > 0, the probability of their intersection is given by P(A \cap B) = P(A) \cdot P(B \mid A). This relation follows directly from the definition of conditional probability within the axiomatic framework of probability theory. The rule extends to the intersection of multiple events A_1, A_2, \dots, A_n as P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1}), allowing sequential computation of joint probabilities through successive conditioning.

Statistical independence arises as a special case of the multiplication rule when the conditional probability equals the unconditional probability, simplifying joint probability calculations. Two events A and B are independent if P(A \cap B) = P(A) \cdot P(B). This equivalence holds because P(B \mid A) = P(B) whenever P(A) > 0, indicating that the occurrence of A provides no information about B. The concept extends to collections of events A_1, A_2, \dots, A_n, which are mutually independent if the probability of the intersection of any subcollection equals the product of their individual probabilities.

For random variables, independence is defined analogously: random variables X and Y are independent if their joint cumulative distribution function satisfies F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y) for all x, y, or equivalently for their probability density or mass functions. Independence implies uncorrelatedness, meaning \operatorname{Cov}(X,Y) = 0 and E[XY] = E[X]E[Y], but the converse does not hold—uncorrelated random variables may still be dependent. For example, let X be uniformly distributed on [-1, 1] and Y = X^2; then \operatorname{Cov}(X,Y) = 0 because E[XY] = E[X^3] = 0 = E[X]E[Y], yet X and Y are dependent since Y is a deterministic function of X.

Distinguishing pairwise independence from mutual independence is crucial for multiple events, as pairwise independence (every pair satisfies the independence condition) does not imply mutual independence for the entire collection. Consider three Bernoulli random variables X, Y, Z each with parameter 1/2, where X and Y are independent and Z = X \oplus Y (exclusive or); the pairs (X,Y), (X,Z), and (Y,Z) are independent, but X, Y, Z are not mutually independent because P(X = Y = Z = 1) = 0 \neq (1/2)^3.

Illustrative examples highlight these concepts in practice. Successive fair coin flips represent independent events, as the outcome of one flip does not influence the next, so the probability of heads on the second flip remains 1/2 regardless of the first. In contrast, drawing balls without replacement from an urn introduces dependence: the probability of drawing a red ball on the second draw depends on the color of the first, violating independence.
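The XOR example can be verified by listing the four equally likely atoms of the joint distribution. The sketch below checks pairwise independence and the failure of mutual independence; the helper P is illustrative.

```python
# Pairwise but not mutual independence: X, Y fair Bernoulli, Z = X xor Y.
from itertools import product
from fractions import Fraction

atoms = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]   # 4 atoms, each probability 1/4

def P(pred):
    return Fraction(sum(1 for a in atoms if pred(*a)), len(atoms))

for i, j in [(0, 1), (0, 2), (1, 2)]:                           # every pair is independent
    joint = P(lambda *v: v[i] == 1 and v[j] == 1)
    assert joint == P(lambda *v: v[i] == 1) * P(lambda *v: v[j] == 1)

print(P(lambda x, y, z: x == y == z == 1))                      # 0, not (1/2)**3 = 1/8
```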

Conditional Probability

Conditional probability is a fundamental concept in probability theory that measures the probability of an event occurring given that another event has already occurred, providing a way to update probabilities based on new information. Formally, the conditional probability of event A given event B, denoted P(A \mid B), is defined as P(A \mid B) = \frac{P(A \cap B)}{P(B)}, provided that P(B) > 0. This definition arises from the need to restrict the sample space to the outcomes where B occurs, normalizing the joint probability by the marginal probability of B to ensure the resulting measure is a valid probability measure over the conditioned space.

The properties of conditional probability extend this definition to more complex scenarios. One key property is the chain rule, which generalizes the joint probability of multiple events through successive conditioning. For three events A, B, and C, the chain rule states P(A \cap B \cap C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B). This can be derived by applying the definition iteratively: first, P(A \cap B) = P(A) \cdot P(B \mid A), and then P(A \cap B \cap C) = P(A \cap B) \cdot P(C \mid A \cap B); substituting the prior expression yields the full form. The chain rule is essential for modeling sequential dependencies in probability spaces and forms the basis for algorithms in statistical inference.

A classic example illustrating conditional reasoning is the Monty Hall problem, named after the host of the game show Let's Make a Deal. In this setup, a contestant chooses one of three doors, behind one of which is a car (the prize) and behind the others goats. The host, knowing what is behind each door, opens a goat door among the unchosen ones. The conditional probability that the car is behind the originally chosen door, given the host's reveal, is 1/3, while switching to the remaining door yields 2/3. This counterintuitive result stems from updating the initial uniform probabilities with the host's action, which provides information favoring the unchosen doors.

In diagnostic testing, conditional probability quantifies the reliability of medical tests. Consider a test for a disease with prevalence P(D) = 0.01, sensitivity P(+ \mid D) = 0.99 (true positive rate), and specificity P(- \mid \neg D) = 0.95 (true negative rate). The probability of having the disease given a positive result, P(D \mid +), is approximately 0.167, calculated via Bayes' theorem as P(D \mid +) = \frac{P(+ \mid D) P(D)}{P(+)}, where P(+) = P(+ \mid D) P(D) + P(+ \mid \neg D) P(\neg D) = 0.99 \times 0.01 + 0.05 \times 0.99 = 0.0594. This example underscores how low prevalence can lead to high false positive rates, emphasizing the need for conditional analysis in interpreting test outcomes.
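The Monty Hall probabilities can be approximated by simulation under the standard assumptions (the car is placed uniformly at random and the host always opens a goat door among the unchosen ones). The following sketch is illustrative; the exact values are 1/3 for staying and 2/3 for switching.

```python
# Monte Carlo sketch of the Monty Hall problem.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens some door that is neither the contestant's pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

random.seed(0)
print(play(switch=False), play(switch=True))   # ≈ 0.333 vs ≈ 0.667
```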

Advanced Probability Concepts

Bayes' Theorem

Bayes' theorem provides a fundamental method for updating the probability of a hypothesis based on new evidence, by inverting conditional probabilities. Formally, for events A and B where P(B) > 0, the theorem states that P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}, where P(A) is the prior probability of A, P(B \mid A) is the likelihood of observing B given A, and P(B) is the marginal probability of B. When the sample space is partitioned into mutually exclusive and exhaustive events A_i (i.e., \bigcup A_i = \Omega and A_i \cap A_j = \emptyset for i \neq j), the denominator expands via the law of total probability as P(B) = \sum_i P(B \mid A_i) P(A_i).

The proof follows directly from the definition of conditional probability. Since P(A \mid B) = \frac{P(A \cap B)}{P(B)} and P(B \mid A) = \frac{P(A \cap B)}{P(A)}, rearranging the latter yields P(A \cap B) = P(B \mid A) P(A). Substituting into the former gives the theorem's formula. An equivalent form expresses the theorem in terms of odds: the posterior odds of A given B, defined as \frac{P(A \mid B)}{P(\neg A \mid B)}, equal the prior odds \frac{P(A)}{P(\neg A)} multiplied by the likelihood ratio \frac{P(B \mid A)}{P(B \mid \neg A)}. This form highlights how evidence adjusts initial beliefs multiplicatively.

Although named after Thomas Bayes, who outlined an early version in an essay communicated and published posthumously in 1763, the theorem received its full mathematical development and broader application by Pierre-Simon Laplace in his 1774 memoir on the probability of causes. Laplace's work formalized the inversion of conditional probabilities for inference from observed events, establishing the theorem's role in scientific reasoning.

A classic application arises in medical testing, where the theorem computes the probability of having a disease given a positive test result. Suppose a disease affects 1% of the population (P(D) = 0.01), and a test has 99% sensitivity (P(+ \mid D) = 0.99) and 95% specificity (P(- \mid \neg D) = 0.95), so the false positive rate is P(+ \mid \neg D) = 0.05. The posterior probability P(D \mid +) is then \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} \approx 0.17, indicating only about a 17% chance of disease despite a positive test, due to the low prevalence.

In spam email filtering, Bayes' theorem underpins probabilistic classifiers that estimate the likelihood an email is spam given its words. For instance, if 40% of emails are spam (P(S) = 0.4), and the word "free" appears in 60% of spam (P(F \mid S) = 0.6) but only 2% of non-spam (P(F \mid \neg S) = 0.02), then P(S \mid F) = \frac{0.6 \times 0.4}{0.6 \times 0.4 + 0.02 \times 0.6} \approx 0.95. This high posterior supports flagging the email as spam, and real systems extend this to multiple words via naive independence assumptions.
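The single-word spam example reduces to one application of the law of total probability and Bayes' theorem. The sketch below reproduces the ≈ 0.95 posterior from the stated numbers; extending to many words would multiply per-word likelihoods under the naive independence assumption.

```python
# Single-feature spam example: P(S) = 0.4, P(free | S) = 0.6, P(free | not S) = 0.02.
p_spam, p_word_spam, p_word_ham = 0.4, 0.6, 0.02

p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)   # P(F) by total probability
posterior = p_word_spam * p_spam / p_word                   # P(S | F) by Bayes' theorem
print(round(posterior, 3))                                  # ≈ 0.952
```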

Law of Total Probability

The law of total probability provides a method to compute the unconditional probability of an event by conditioning on a partition of the sample space. Specifically, if \{B_i\}_{i=1}^n is a partition of the sample space \Omega (meaning the B_i are mutually exclusive and their union is \Omega), and A \subseteq \Omega is an event, then P(A) = \sum_{i=1}^n P(A \mid B_i) P(B_i). This formula holds because the events A \cap B_i are disjoint and their union is A, so P(A) = \sum P(A \cap B_i) = \sum P(A \mid B_i) P(B_i) by the multiplication rule.

In the continuous case, suppose B is a continuous random variable with density f_B(b), and A is an event whose occurrence depends on b. The law extends to P(A) = \int_{-\infty}^{\infty} P(A \mid B = b) f_B(b) \, db. This integral form arises analogously by replacing the discrete sum with an integral over the support of B, using the continuous definition of conditional probability.

The law derives from the chain rule of probability, which states P(A \cap B_i) = P(A \mid B_i) P(B_i), combined with additivity for disjoint events: P(A) = \sum P(A \cap B_i). This connection highlights how the law bridges conditional and unconditional probabilities without assuming independence.

A practical example involves predicting rain tomorrow based on seasonal rain probabilities. Suppose the seasons form a partition: winter (B_1, probability 0.25, rain probability 0.6), spring (B_2, probability 0.25, rain 0.4), summer (B_3, probability 0.25, rain 0.1), and fall (B_4, probability 0.25, rain 0.3). Then P(\text{rain}) = (0.6)(0.25) + (0.4)(0.25) + (0.1)(0.25) + (0.3)(0.25) = 0.35. In mixture models, the law computes the marginal density of an observation x as f(x) = \sum_k \pi_k f_k(x), where \{\pi_k\} are mixing weights (probabilities of component k) and f_k are component densities; this sums conditional densities weighted by priors to yield the overall marginal distribution.

The law also underpins marginalization in joint distributions. For discrete random variables X and Y, the marginal probability mass function p_X(x) = \sum_y p_{X,Y}(x,y) = \sum_y p_{X \mid Y}(x \mid y) p_Y(y), which applies the law by partitioning over Y's values. Similarly, for continuous variables, f_X(x) = \int f_{X,Y}(x,y) \, dy = \int f_{X \mid Y}(x \mid y) f_Y(y) \, dy. This process "sums out" the conditioning variable to obtain the marginal.
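The seasonal-rain example is a direct instance of the discrete form of the law. The sketch below sums the conditional rain probabilities weighted by the season probabilities; the data structure is illustrative.

```python
# Discrete law of total probability applied to the seasonal-rain example.
seasons = {            # season: (P(season), P(rain | season))
    "winter": (0.25, 0.6),
    "spring": (0.25, 0.4),
    "summer": (0.25, 0.1),
    "fall":   (0.25, 0.3),
}
p_rain = sum(p_s * p_rain_given_s for p_s, p_rain_given_s in seasons.values())
print(round(p_rain, 2))   # 0.35
```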

Central Limit Theorem

The Central Limit Theorem (CLT) asserts that, under suitable conditions, the sum of a large number of independent random variables, when appropriately standardized, converges in distribution to a normal distribution, regardless of the underlying distribution of the individual variables. This result underpins much of statistical inference by justifying the use of normal approximations for sums and averages. Formally, consider a sequence of independent and identically distributed random variables X_1, X_2, \dots, X_n, each with finite mean \mu = \mathbb{E}[X_i] and positive variance \sigma^2 = \mathrm{Var}(X_i) < \infty. Let S_n = \sum_{i=1}^n X_i denote their sum. Then the standardized random variable Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}} converges in distribution to a standard normal random variable Z \sim N(0,1) as n \to \infty, meaning that for any continuity point x of the standard normal cumulative distribution function \Phi, \mathbb{P}(Z_n \leq x) \to \Phi(x).

A standard proof of the CLT relies on characteristic functions, which are the Fourier transforms of probability distributions and facilitate analysis of convolutions for sums of independent variables. The characteristic function of each X_i is \phi(t) = \mathbb{E}[e^{itX_i}], and for the centered and scaled X_i' = (X_i - \mu)/\sigma, it admits a Taylor expansion \phi(t) = 1 - t^2/2 + o(t^2) near t=0 due to the finite variance. The characteristic function of Z_n is then [\phi(t/\sqrt{n})]^n, which, upon taking the logarithm and expanding, converges pointwise to e^{-t^2/2}, the characteristic function of N(0,1). By Lévy's continuity theorem, this implies convergence in distribution.

For broader applicability beyond identical distributions, the Lindeberg-Feller theorem extends the CLT to independent random variables X_{n,i} (for i=1,\dots,n) with \mathbb{E}[X_{n,i}] = 0 and \mathrm{Var}(X_{n,i}) = \sigma_{n,i}^2, where the total variance s_n^2 = \sum_{i=1}^n \sigma_{n,i}^2 \to \infty. The Lindeberg condition requires that for every \varepsilon > 0, \frac{1}{s_n^2} \sum_{i=1}^n \mathbb{E}\left[ X_{n,i}^2 \mathbf{1}_{\{|X_{n,i}| > \varepsilon s_n\}} \right] \to 0 as n \to \infty, ensuring no single variable dominates the sum. Under this condition and the uniform asymptotic negligibility \max_i \sigma_{n,i}^2 / s_n^2 \to 0, the normalized sum \sum_{i=1}^n X_{n,i} / s_n converges in distribution to N(0,1). This condition captures the intuitive notion that contributions from large deviations must become negligible relative to the overall scale.

The CLT has profound implications for sampling distributions in statistics. For instance, the distribution of the sample mean \bar{X}_n = S_n / n from a population with mean \mu and variance \sigma^2 is approximately normal with mean \mu and variance \sigma^2 / n for large n, enabling approximations for confidence intervals even when the population distribution is unknown. In polling, this justifies modeling the sampling error in estimated proportions (e.g., voter support) as normally distributed, where the standard error \sqrt{p(1-p)/n} (with p the true proportion) quantifies uncertainty, allowing predictions of election outcomes within a margin of error.

To quantify the rate of convergence in the CLT, the Berry-Esseen theorem provides a uniform bound on the difference between the cumulative distribution function F_n of Z_n and \Phi. For i.i.d. random variables with \mathbb{E}[|X_i - \mu|^3] = \rho < \infty, there exists a universal constant C > 0 (e.g., C \approx 0.56) such that \sup_x |F_n(x) - \Phi(x)| \leq \frac{C \rho}{\sigma^3 \sqrt{n}}.
This bound, of order O(1/\sqrt{n}), indicates that the normal approximation improves with sample size, with the third moment \rho influencing the speed; tighter constants and extensions exist for non-i.i.d. cases under Lindeberg conditions. The theorem builds on the notions of expectation and variance for random variables, providing an asymptotic link to the normal distribution for their linear combinations.
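The convergence described by the CLT and bounded by Berry-Esseen can be observed empirically. The sketch below assumes i.i.d. exponential(1) summands (mean 1, variance 1) and an arbitrary seed, and compares an empirical tail probability of the standardized sum with the corresponding normal value; the sample sizes are illustrative.

```python
# Simulation sketch of the CLT for sums of i.i.d. exponential(1) random variables.
import math
import random

def standardized_sum(n):
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n * 1.0) / math.sqrt(n * 1.0)          # (S_n - n*mu) / (sigma * sqrt(n))

random.seed(0)
n, trials = 50, 20_000
zs = [standardized_sum(n) for _ in range(trials)]
empirical = sum(z <= 1.0 for z in zs) / trials
normal = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))      # Phi(1) ≈ 0.841
print(round(empirical, 3), round(normal, 3))           # the two values should be close
```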

Applications of Probability

Statistics and Data Analysis

In inferential statistics, probability provides the foundation for drawing conclusions from sample data about broader populations. Hypothesis testing evaluates whether observed data are consistent with a proposed null hypothesis H_0, often using the p-value, which Ronald A. Fisher defined as the probability of obtaining data at least as extreme as those observed, assuming H_0 is true. This measure quantifies the evidence against H_0, with small p-values indicating incompatibility, though Fisher emphasized its role in assessing evidential strength rather than strict decision-making. The Neyman-Pearson framework complements this by focusing on decision procedures that control the Type I error rate (false positive probability) at a fixed level \alpha while maximizing power against specific alternatives. Their 1933 lemma establishes that the likelihood ratio test is uniformly most powerful for simple hypotheses, providing a probabilistic criterion for test optimality in binary decision contexts.

Confidence intervals extend this probabilistic reasoning by constructing intervals around point estimates that contain the true parameter with a specified long-run coverage probability. Jerzy Neyman formalized this in 1937, defining a (1 - \alpha)-confidence interval as a random interval whose probability of covering the unknown parameter equals or exceeds 1 - \alpha over repeated samples from the population. For large samples, normal approximations—often justified by the central limit theorem—enable practical computation, such as the interval \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} for a population mean, where z_{\alpha/2} is the standard normal quantile. This frequentist guarantee ensures long-run reliability without assigning probability to the parameter itself, distinguishing it from Bayesian credible intervals.

Regression analysis employs probabilistic models to quantify relationships between variables while accounting for uncertainty. In linear regression, the model posits Y = X\beta + \epsilon, where \epsilon represents errors independently distributed as \epsilon \sim \mathcal{N}(0, \sigma^2 I), assuming homoscedasticity and normality. This Gaussian error structure, first justified by Carl Friedrich Gauss in 1809 as the error distribution under which least-squares estimates maximize the probability of the observed data, allows maximum likelihood estimation and inference on coefficients and predictions. The framework facilitates hypothesis tests for coefficients and prediction intervals, with the normality assumption enabling exact t-distributions for small samples under additional conditions.

Post-2020 advancements in probabilistic graphical models have strengthened their application to causal inference in data analysis. Bayesian networks, directed acyclic graphs encoding conditional dependencies among variables, facilitate probabilistic reasoning about interventions via do-calculus. Recent reviews highlight integrations with scalable algorithms for high-dimensional data, enabling robust causal effect estimation in observational studies, such as quantifying treatment impacts while adjusting for confounders. These developments, building on Pearl's foundational work, emphasize hybrid approaches combining exact inference with approximations for real-world applications. As of 2025, key trends include integration of causal inference with large language models and advancements in causal inference for dynamic data environments.
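The large-sample interval \bar{x} \pm z_{\alpha/2} \sigma/\sqrt{n} can be computed in a few lines. The sketch below uses made-up numbers (sample mean 5.2, known \sigma = 1.5, n = 100) purely for illustration.

```python
# Large-sample 95% confidence interval for a mean with known sigma.
import math

x_bar, sigma, n = 5.2, 1.5, 100
z = 1.96                                   # standard normal quantile for alpha = 0.05
half_width = z * sigma / math.sqrt(n)
print((round(x_bar - half_width, 3), round(x_bar + half_width, 3)))   # (4.906, 5.494)
```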

Physics and Quantum Mechanics

In physics, probability plays a foundational role in statistical mechanics, where it underpins the description of macroscopic phenomena emerging from the behavior of large numbers of microscopic particles. Ludwig Boltzmann developed this framework in the late 19th century, introducing the concept of microstates—distinct configurations of a system consistent with given macroscopic constraints. The probability of a particular macrostate is proportional to the number of microstates W that realize it, assuming all microstates are equally likely (the postulate of equal a priori probabilities). This probabilistic interpretation resolves the apparent irreversibility of thermodynamic processes, such as the second law, by linking it to the overwhelming likelihood of entropy-increasing transitions in isolated systems.

A key outcome of Boltzmann's approach is the entropy formula S = k \ln W, where S is the entropy, k is Boltzmann's constant, and W is the multiplicity of microstates. This equation quantifies the disorder or uncertainty in a macrostate: higher W corresponds to greater entropy and thus higher probability for the macrostate. Derived from combinatorial considerations of gas molecules' positions and velocities, it establishes probability as the bridge between deterministic microscopic dynamics and statistical macroscopic laws, influencing fields from thermodynamics to information theory.

In quantum mechanics, probability assumes a more intrinsic and counterintuitive form, diverging from classical notions through the Born rule, proposed by Max Born in 1926. The rule states that upon measurement of an observable, the probability P of obtaining a particular outcome corresponding to an eigenvector of the operator is given by the squared magnitude of the projection of the system's state vector \psi onto that eigenvector: P = |\langle \phi | \psi \rangle|^2, where \phi is the eigenvector. This probabilistic interpretation resolved the issue of continuous wave functions yielding discrete measurement results, transforming quantum theory from a deterministic wave evolution into a fundamentally probabilistic process. Unlike classical probabilities, quantum probabilities arise from non-commutative operators in Hilbert space, leading to phenomena like entanglement where joint probabilities cannot be factored as in commutative classical algebras.

The Einstein-Podolsky-Rosen (EPR) paradox, articulated in 1935, highlighted tensions between quantum probability and classical intuitions of locality and realism. Einstein, Podolsky, and Rosen considered entangled particle pairs where measuring one instantly determines the other's state, regardless of distance, seemingly implying "spooky action at a distance" that violates relativistic causality. They argued this showed the incompleteness of quantum mechanics, as the probabilistic wave function fails to assign definite values to all observables simultaneously, necessitating hidden variables to restore determinism. John Bell in 1964 formalized this critique by deriving inequalities that any local hidden-variable theory must satisfy for correlated measurements on entangled systems. Quantum mechanics violates these Bell inequalities, as confirmed experimentally, demonstrating that quantum probabilities exhibit non-local correlations incompatible with classical independence assumptions—outcomes on separated particles are not statistically independent.

Recent developments in the 2020s have extended quantum probability to open systems, where interactions with the environment introduce decoherence, suppressing quantum superpositions and yielding classical-like probabilities. In open quantum systems, described by master equations like the Lindblad form, probability distributions evolve under non-unitary dynamics, capturing dissipation and noise.
Decoherence theory, advanced through environmental monitoring models, explains the emergence of classical reality from quantum probabilities without wave function collapse, with rates determined by system-bath couplings. These frameworks, applied in quantum computing and sensing, reveal how quantum probabilities in open settings enable robust information processing despite environmental interference. In 2025, breakthroughs such as Quantinuum's Helios quantum computer have improved error correction in open systems, enhancing probabilistic modeling for scalable quantum AI applications.
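At its simplest, the Born rule turns state-vector amplitudes into measurement probabilities. The sketch below, for a single qubit in an equal superposition (an illustrative state, not tied to any particular experiment), squares the amplitude magnitudes and checks that they sum to one.

```python
# Minimal Born-rule sketch: measurement probabilities in the computational basis
# are the squared magnitudes of the state's complex amplitudes.
import math

psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]        # |psi> = (|0> + i|1>) / sqrt(2)
probs = [abs(a) ** 2 for a in psi]
print([round(p, 3) for p in probs], round(sum(probs), 3))   # [0.5, 0.5] 1.0
```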

Machine Learning and AI

Probability forms the cornerstone of machine learning and artificial intelligence, enabling models to handle uncertainty, make predictions under incomplete information, and optimize decisions in stochastic environments. In these fields, probabilistic frameworks quantify the likelihood of outcomes, allowing algorithms to learn from data distributions rather than deterministic rules. This approach underpins supervised, unsupervised, and reinforcement learning paradigms, where probability distributions model data generation, parameter estimation, and state transitions.

Probabilistic models, such as the naive Bayes classifier, exemplify probability's role in classification tasks. The classifier computes the posterior probability of a class given features using the formula P(\text{class} \mid \text{features}) \propto P(\text{features} \mid \text{class}) P(\text{class}), assuming feature independence conditional on the class to simplify computation. This draws from Bayes' theorem and has proven robust even when independence assumptions are violated, achieving near-optimal performance under zero-one loss in many scenarios. Despite its simplicity, naive Bayes remains a benchmark for text classification and spam detection due to its efficiency and interpretability.

In deep learning, probability addresses uncertainty through Bayesian neural networks, which treat network weights as random variables with prior distributions, yielding predictive distributions rather than point estimates. This framework mitigates overconfidence and overfitting by integrating evidence from data to update posteriors, as pioneered in early works on Bayesian learning for neural networks. Scalable approximations like variational inference further enable this by optimizing an evidence lower bound to approximate the posterior via gradient-based methods, allowing application to large-scale models. For instance, stochastic variational inference uses mini-batches to approximate posteriors efficiently, facilitating uncertainty quantification in tasks like image recognition.

Reinforcement learning relies on probability via Markov decision processes (MDPs), which model environments as sequences of states, actions, and rewards with transition probabilities P(s' \mid s, a) defining the likelihood of moving to a next state s' from state s under action a. Formulated in the mid-20th century, MDPs provide a foundation for value iteration and policy optimization, enabling agents to maximize expected cumulative rewards in uncertain settings like robotics and game playing.

Recent advances in generative AI, particularly diffusion models up to 2025, leverage stochastic processes for high-fidelity data synthesis. These models employ a forward process that gradually adds noise to data, followed by a reverse process trained to denoise and reconstruct samples from a simple prior distribution. Denoising diffusion probabilistic models, a seminal variant, have driven breakthroughs in image and video generation, outperforming prior generative approaches in sample quality and diversity while maintaining probabilistic guarantees. In 2025, diffusion models have demonstrated superiority over autoregressive models in data-constrained settings, maintaining performance across high repetition levels without overfitting.
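A toy naive Bayes classifier makes the proportionality formula concrete. The sketch below uses hypothetical priors and per-word likelihoods (all numbers are made up) and assumes conditional independence of two binary word features given the class.

```python
# Toy naive Bayes sketch for spam classification with two binary word features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {                      # P(word present | class), made-up values
    "spam": {"free": 0.6, "meeting": 0.05},
    "ham":  {"free": 0.02, "meeting": 0.30},
}

def posterior(features):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for word, present in features.items():
            p = likelihoods[c][word]
            score *= p if present else (1 - p)   # conditional independence assumption
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior({"free": True, "meeting": False}))   # heavily favours "spam"
```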

Philosophical and Conceptual Relations

Probability and Randomness

Probability serves as a mathematical framework for quantifying uncertainty about future events or unknown states, providing a measure of likelihood that ranges from 0 (impossible) to 1 (certain). In contrast, randomness refers to the apparent lack of pattern or predictability in outcomes, where events occur without discernible regularities or causes, challenging the notion of a fully predictable universe. This distinction is epitomized in Pierre-Simon Laplace's thought experiment of a perfectly informed intelligence, known as Laplace's demon: an intellect that, knowing the precise positions and momenta of all particles at one instant, could compute the entire past and future, rendering randomness illusory under strict determinism. However, critiques of this view highlight that even classical mechanics' determinism is undermined by practical limits on knowledge and computation, introducing effective randomness where perfect prediction becomes infeasible.

Ontologically, interpretations of probability diverge on whether it reflects objective physical properties or subjective epistemic states. Karl Popper's propensity interpretation, introduced in the 1950s, posits probability as an objective, dispositional tendency inherent in physical situations, akin to a physical propensity for outcomes rather than mere frequency or belief. This contrasts with epistemic interpretations, which view probability as a measure of rational degree of belief or evidential support, relative to the knower's information and updated via Bayes' theorem, as in Bayesian frameworks. Propensity theory aims to ground probability in reality independently of observers, addressing single-case events like a specific die roll's outcome as having an intrinsic tendency, while epistemic views emphasize uncertainty due to incomplete knowledge.

Chaos theory illustrates how deterministic systems can exhibit seemingly random, probabilistic behavior, blurring the line between order and indeterminacy. In chaotic dynamics, nonlinear equations govern evolution predictably in principle, yet sensitive dependence on initial conditions amplifies tiny perturbations into vastly different outcomes, rendering long-term prediction practically impossible. A classic example is weather modeling, as demonstrated by Edward Lorenz in 1963: minor rounding errors in computational inputs lead to divergent forecasts, mimicking randomness despite underlying determinism. This "deterministic chaos" shows that probabilistic descriptions are often pragmatic necessities for complex systems, even without true indeterminism.

Modern philosophical debates explore randomness and probability's roles in free will and emergence, questioning whether indeterminacy enables genuine freedom or merely unpredictability. In free will discussions, randomness—whether from quantum propensities or chaotic amplification—challenges determinism by suggesting non-determined choices, yet critics argue true agency requires more than chance, as random events lack intentional control. Complexity science extends this to emergent phenomena in self-organizing systems, where probabilistic models capture macro-level randomness arising from deterministic micro-interactions, influencing views on emergence and causation without resolving the tension with determinism. These debates underscore probability's dual role as both a tool for modeling uncertainty and a lens for ontological inquiries into reality's fabric.

Probability in Decision Theory

In decision theory, probability plays a central role in modeling choices under uncertainty by quantifying beliefs about outcomes and enabling the computation of expected utilities to guide rational action. The expected utility framework posits that a decision-maker selects the action that maximizes the expectation of a utility function over possible outcomes, weighted by their probabilities. This approach formalizes how individuals or agents should behave when facing risky prospects, assuming rationality defined by consistency in preferences.

The foundational axiomatization of expected utility theory was provided by John von Neumann and Oskar Morgenstern in their 1944 work Theory of Games and Economic Behavior, where they derived it from four axioms: completeness, transitivity, continuity, and independence of preferences over lotteries. Under these axioms, the utility of an action a is given by the formula U(a) = \sum_i p_i \, u(o_i), where p_i are the probabilities of outcomes o_i, and u is a von Neumann-Morgenstern utility function unique up to positive affine transformations. This representation ensures that preferences over lotteries (probabilistic mixtures of outcomes) align with maximizing expected utility, providing a normative standard for rational choice under risk, where probabilities are objective.

Leonard J. Savage extended this framework in 1954 to handle uncertainty, where probabilities are not given but must be elicited as subjective degrees of belief, linking decision theory to Bayesian probability. Savage's axioms—such as the sure-thing principle, which requires preferences to depend only on relevant states—yield subjective expected utility maximization, where acts (mappings from states to outcomes) are evaluated via U(a) = \sum_s \pi(s) \, u(a(s)), with \pi(s) as subjective probabilities over states s. This bridges objective risk and subjective uncertainty, positing that rational agents update beliefs via Bayes' rule and act to maximize subjective expected utility.

A classic illustration of expected utility's role is the resolution of the St. Petersburg paradox, posed in 1713, where a game with infinite expected monetary value (fair coin flips until the first head, with payoff 2^k if the head occurs on toss k) leads to a finite valuation when utility is logarithmic in wealth, as proposed by Daniel Bernoulli in 1738. For initial wealth w, the expected utility is \sum_{k=1}^\infty (1/2^k) \ln(w + 2^k), which converges to a finite value, explaining why rational agents reject infinite bets due to the diminishing marginal utility of wealth. This demonstrates how probability, combined with concave utility functions, resolves apparent irrationalities in valuing infinite-expectation gambles.

While expected utility theory provides a normative ideal, empirical deviations prompted behavioral extensions, notably prospect theory by Daniel Kahneman and Amos Tversky in 1979, which incorporates probability weighting to better describe observed choices. In prospect theory, decision weights \pi(p_i) distort objective probabilities p_i, overweighting low probabilities and underweighting moderate ones, such that value is computed as \sum \pi(p_i) v(x_i), where v is a value function with reference dependence and loss aversion. This model captures phenomena like the Allais paradox, where individuals violate the independence axiom, highlighting probability's distorted role in actual decision-making under risk.
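Bernoulli's resolution can be checked numerically: truncating the series \sum_k (1/2^k) \ln(w + 2^k) shows that it converges. The sketch below uses an arbitrary initial wealth and truncation point purely for illustration.

```python
# Numerical check that the St. Petersburg game has finite expected log utility.
import math

def expected_log_utility(wealth, terms=200):
    # Payoff 2^k occurs with probability 1/2^k (first head on toss k).
    return sum((0.5 ** k) * math.log(wealth + 2 ** k) for k in range(1, terms + 1))

print(round(expected_log_utility(wealth=10.0), 4))   # a finite value, despite infinite expected payoff
```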

References

1. A brief introduction to probability - PMC, NIH
2. [PDF] Fermat and Pascal on Probability - University of York
3. [PDF] Foundations of the Theory of Probability, A. N. Kolmogorov - University of York
4. Kolmogorov axioms of probability - The Book of Statistical Proofs
5. [PDF] Grinstead and Snell's Introduction to Probability
6. Gaming Dice and Dice for Prognostication in the ... - JSTOR
7. The Origin of Probability and the Problem of Points
8. Decoding Cardano's Liber de Ludo Aleae - ScienceDirect
9. [PDF] Some Laws and Problems of Classical Probability and How ...
10. July 1654: Pascal's Letters to Fermat on the "Problem of Points"
11. [PDF] The Pascal-Fermat Correspondence
12. [PDF] Christiani Hugenii Libellus De Ratiociniis in Ludo Aleae ...
13. Christiaan Huygens (1629-1695)
14. [PDF] Jakob Bernoulli, On the Law of Large Numbers, Translated into ...
15. [PDF] De Moivre on the Law of Normal Probability - University of York
16. LII. An essay towards solving a problem in the doctrine of chances, by the late Rev. Mr. Bayes
17. The Early Development of Set Theory
18. Bertrand's Paradox and the Principle of Indifference
19. Borel and the Emergence of Probability on the Mathematical Scene ...
20. [PDF] The Axiomatic Melting Pot - arXiv
21. [PDF] Brownian Motion, 1.1: Wiener Process
22. [PDF] How Paul Lévy Saw Jean Ville and Martingales - JEHPS
23. [PDF] Some History of Optimality - Rice Statistics
24. Interpretations of Probability - Stanford Encyclopedia of Philosophy
25. The Interpretation of Probability: Still an Open Issue? - MDPI
26. Classical Probability - An Overview - ScienceDirect Topics
27. John Venn - The Information Philosopher
28. [PDF] "Truth and Probability" (1926), Frank Ramsey
29. [PDF] Bruno de Finetti - Foresight: Its Logical Laws, Its Subjective ...
30. Bayes' Theorem: Examples, Tables, and Proof Sketches - Stanford Encyclopedia of Philosophy
31. Harold Jeffreys's Theory of Probability Revisited - Project Euclid
32. [PDF] Theory of Probability - University of Texas at Austin
33. [PDF] Foundations of the Theory of Probability - Internet Archive
34. Multi-event Probability: Addition Rule - Data Science Discovery
35. Probability Axioms - Wolfram MathWorld
36. Inclusion-Exclusion Principle - Wolfram MathWorld
37. [PDF] Reminder No. 1: Uncorrelated vs. Independent
38. [PDF] Pairwise vs. Three-way Independence
39. [PDF] An Introduction to Probability Theory and Its Applications, Feller
40. Bayes' Rule - Probability Course
41. 6. Odds and Addends - Think Bayes
42. Bayes or Laplace? An Examination of the Origin and Early Applications of Bayes' Theorem - Archive for History of Exact Sciences
43. Screening Test Errors (Bayes' Theorem) - StatsDirect
44. [PDF] Notes on Naive Bayes Classifiers for Spam Filtering - Washington
45. [PDF] Conditional Probability, Independence and Bayes' Theorem, Class 3 ...
46. [PDF] Bayesian Updating with Continuous Priors, Class 13, 18.05 ...
47. [PDF] Chapter 2, Discrete Probability, 2.2: Conditional Probability
48. [PDF] Mixture Models and the EM Algorithm
49. Probability Reference Sheet
50. Central Limit Theorem - StatLect
51. [PDF] Two Proofs of the Central Limit Theorem
52. [PDF] A Probabilistic Proof of the Lindeberg-Feller Central Limit Theorem
53. [PDF] Central Limit Theorem and the Law of Large Numbers, Class 6 ...
54. [PDF] Berry-Esseen Bounds for Independent Random Variables
55. Statistical Methods for Research Workers - R. A. Fisher
56. P Values and Ronald Fisher - Brereton, Analytical Science Journals
57. IX. On the Problem of the Most Efficient Tests of Statistical Hypotheses
58. [PDF] Outline of a Theory of Statistical Estimation Based on the Classical ...
59. Gauss on Least-Squares and Maximum-Likelihood Estimation
60. Introducing Causal Inference Using Bayesian Networks and do-Calculus
61. Bayesian Causal Inference: A Critical Review
62. Boltzmann's Work in Statistical Physics
63. Large Deviations and the Boltzmann Entropy Formula - Project Euclid
64. [PDF] Quantum Mechanics of Collision Processes, Max Born
65. [PDF] Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? - Einstein, Podolsky, and Rosen
66. [PDF] On the Einstein Podolsky Rosen Paradox
67. Open Systems, Quantum Probability, and Logic for Quantum-like Modeling
68. [PDF] A Philosophical Essay on Probabilities
69. Laplace's Demon - The Information Philosopher
70. [PDF] The Propensity Interpretation of Probability, Karl R. Popper
71. Karl Popper: Philosophy of Science
72. Chaos - Stanford Encyclopedia of Philosophy
73. Introduction to Chaos, Predictability and Ensemble Forecasts - ECMWF
74. Randomness and Nondeterminism: From Genes to Free Will ...
75. Chance versus Randomness - Stanford Encyclopedia of Philosophy
76. Theory of Games and Economic Behavior: 60th Anniversary Commemorative Edition, von Neumann and Morgenstern - JSTOR
77. [PDF] The Foundations of Statistics (Second Revised Edition)
78. Exposition of a New Theory on the Measurement of Risk - JSTOR
79. Prospect Theory: An Analysis of Decision under Risk, Kahneman and Tversky - JSTOR