
Probability distribution

A probability distribution is a mathematical function that assigns probabilities to the possible outcomes of a random variable, quantifying the likelihood of each outcome occurring in a probabilistic experiment.^[1] For discrete random variables, which take on countable values such as integers, the distribution is described by a probability mass function p(x) where p(x) \geq 0 for all x and \sum p(x) = 1 over all possible values, ensuring the total probability sums to unity.^[1] In contrast, for continuous random variables, which can take any value in a continuum, the distribution is given by a probability density function f(x) where f(x) \geq 0 and \int_{-\infty}^{\infty} f(x) \, dx = 1, with probabilities computed as integrals over intervals rather than at single points.^[1] Probability distributions form the foundation of statistical inference and modeling uncertainty across diverse fields, enabling predictions and decision-making under randomness.^[2]

Common discrete distributions include the binomial distribution, which models the number of successes in a fixed number of independent Bernoulli trials with success probability p, having mean np and variance np(1-p), often applied to scenarios like quality control or voting outcomes.^[3] The Poisson distribution describes the count of rare events in a fixed interval, with parameter \lambda (the mean rate), mean \lambda, and variance \lambda, widely used in queueing theory, reliability engineering, and modeling arrivals such as customer traffic or accidents.^[3]

Among continuous distributions, the normal distribution (or Gaussian) is paramount due to the central limit theorem, which states that the suitably standardized sum of many independent, identically distributed random variables with finite variance is approximately normal, regardless of the shape of their common distribution; it features a bell-shaped density f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, with mean \mu and variance \sigma^2, and applies to phenomena like measurement errors, stock returns, and biological traits.^[3] The exponential distribution models the time between independent events in a Poisson process, with density f(x) = \lambda e^{-\lambda x} for x \geq 0, rate parameter \lambda, mean 1/\lambda, and variance 1/\lambda^2, commonly used for lifetimes, service times, and inter-arrival durations in telecommunications or manufacturing.^[3]^[4] Other notable distributions, such as the uniform (equal probability over an interval) and gamma (generalizing exponential for sums of waiting times), further extend modeling capabilities in simulations, risk assessment, and scientific data analysis.^[3]

Fundamentals

Introduction

A probability distribution is a mathematical function that describes the possible outcomes of a random variable and assigns probabilities to those outcomes, providing a complete characterization of the uncertainty inherent in random processes.^[1] This framework allows for the quantification of likelihoods, enabling predictions about the behavior of systems influenced by chance, from simple experiments to complex natural phenomena.^[5] The origins of probability distributions trace back to the 17th century, when mathematicians Blaise Pascal and Pierre de Fermat exchanged correspondence in 1654 to resolve problems arising from games of chance, such as dividing stakes in interrupted dice games.^[6] Their work laid the groundwork for systematic approaches to calculating odds and expectations in gambling scenarios, marking the birth of probability as a mathematical discipline.^[6] The field was later formalized in a rigorous axiomatic framework by Andrey Kolmogorov in his 1933 monograph Foundations of the Theory of Probability, which defined probability measures on abstract spaces and unified disparate ideas into a coherent theory.^[7] Probability distributions play a central role across diverse fields by modeling randomness in real-world data and processes. 
In statistics, they underpin inference, hypothesis testing, and estimation techniques essential for drawing conclusions from samples.^[1] In physics, distributions describe particle behaviors and thermodynamic systems, such as the Maxwell-Boltzmann distribution for molecular speeds.^[5] Finance relies on them for risk assessment and option pricing, as seen in models like the Black-Scholes framework, which assumes log-normally distributed asset prices (equivalently, normally distributed log returns).^[8] In machine learning, probability distributions form the basis for algorithms in supervised and unsupervised learning, facilitating tasks like generative modeling and uncertainty quantification.^[9]

Distributions are broadly classified into discrete and continuous types, reflecting the nature of the random variable's possible values. Discrete distributions apply to scenarios with countable outcomes, such as the number of heads in a series of coin flips, where each specific count has a nonzero probability.^[1] Continuous distributions, in contrast, handle uncountable outcomes over intervals, like human heights measured in real numbers, where probabilities are assigned to ranges rather than exact points.^[1] The cumulative distribution function serves as a fundamental tool for unifying these cases, capturing the probability that the random variable is less than or equal to a given value.^[1]

Definition

In probability theory, a random variable is a measurable function X: \Omega \to \mathbb{R} defined on a probability space (\Omega, \mathcal{F}, P), where \Omega is the sample space, \mathcal{F} is a \sigma-algebra of events, and P is a probability measure, such that for every real number a, the set \{\omega \in \Omega : X(\omega) < a\} belongs to \mathcal{F}.^[10] This measurability ensures that probabilities of events defined in terms of X can be consistently assigned. The probability distribution of a random variable X is the induced probability measure \mu on the Borel \sigma-algebra of \mathbb{R}, defined by \mu(B) = P(X^{-1}(B)) for every Borel set B \subseteq \mathbb{R}, which assigns probabilities to the possible outcomes or ranges of X.^[10] This distribution satisfies Kolmogorov's axioms: non-negativity, meaning \mu(B) \geq 0 for all Borel sets B; additivity for disjoint countable unions, \mu\left(\bigcup_{n=1}^\infty B_n\right) = \sum_{n=1}^\infty \mu(B_n) if the B_n are disjoint; and normalization, \mu(\mathbb{R}) = 1.^[10] In general, the probability distribution describes the law of X, where for discrete random variables it is given by the probabilities P(X = x) at each point x in the support, and for continuous random variables by a density function f such that probabilities are obtained via integration over intervals.^[10] The total probability over the support satisfies

\int_{\mathbb{R}} P(X \in dx) = 1,

ensuring the measure is normalized across all possible outcomes.^[10]
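
As an illustrative sketch, the induced measure \mu(B) = P(X^{-1}(B)) can be computed directly on a small finite probability space. The example assumes two fair dice with X equal to their sum; the names omega, P, X, and induced_measure are chosen here for illustration and do not come from the cited sources.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice (an assumed toy example).
omega = list(product(range(1, 7), repeat=2))
# Probability measure P: each of the 36 outcomes is equally likely.
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]


def induced_measure(B):
    """mu(B) = P(X^{-1}(B)): total probability of the outcomes that X maps into B."""
    return sum((P[w] for w in omega if X(w) in B), Fraction(0))


print(induced_measure({7}))                # 1/6
print(induced_measure(set(range(2, 13))))  # 1
```

Exact fractions make it easy to confirm normalization and additivity over disjoint sets without floating-point round-off.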

Terminology

A random variable is a function that assigns a real number to each outcome in a probability space, mapping the sample space to the real numbers.^[11] Random variables are classified as discrete if their possible values form a countable set, such as the integers, or continuous if they can take any value in a continuous interval of the real numbers.^[12]^[13] The support of a probability distribution is the smallest closed set of points such that the probability of the random variable taking a value outside this set is zero, representing the set where the distribution assigns positive probability.^[14]^[15] Parameters of a probability distribution are numerical characteristics that define its shape and location, such as the mean and variance, which respectively indicate the central tendency and spread of the distribution.^[16]^[17] For discrete random variables, the probability mass function (PMF) is the function that assigns to each possible value the probability that the random variable equals that value.^[18]^[19] For continuous random variables, the probability density function (PDF) is a non-negative function whose integral over any interval gives the probability that the random variable falls within that interval; such distributions are absolutely continuous with respect to the Lebesgue measure, meaning the cumulative distribution function is the integral of the PDF.^[20] The expectation, also known as the mean, of a random variable is the weighted average of its possible values, where the weights are the probabilities.^[21]^[22] The variance measures the expected squared deviation of the random variable from its mean, quantifying the dispersion of the distribution.^[23]^[24] Two random variables are independent if the occurrence of one does not affect the probability distribution of the other, formally meaning that the joint probability is the product of the marginal probabilities for all pairs of values.^[25]^[26] The cumulative distribution function serves 
as a unifying concept that defines the probability that the random variable is less than or equal to a given value, applicable to both discrete and continuous cases.^[13]

Cumulative Distribution Function

Properties

The cumulative distribution function (CDF) of a random variable X, denoted F_X(x), is defined as F_X(x) = P(X \leq x) for x \in \mathbb{R}, mapping to the interval [0, 1].^[27] This function encapsulates the probability that X takes a value less than or equal to x, providing a complete probabilistic description of the distribution.^[28] Key properties of the CDF include non-decreasing monotonicity, right-continuity, and specific boundary behaviors. Specifically, F_X(x) is non-decreasing, meaning that if x_1 < x_2, then F_X(x_1) \leq F_X(x_2), reflecting the accumulation of probability as x increases.^[27] It is right-continuous at every point, so \lim_{y \to x^+} F_X(y) = F_X(x).^[28] The limits satisfy \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1, ensuring the total probability sums to 1 over the entire real line.^[27] For discrete distributions, the CDF exhibits jumps at points where the random variable has positive probability mass, with the jump size equal to P(X = x); in contrast, for continuous distributions, the CDF is continuous everywhere.^[28] The explicit form of the CDF depends on whether the distribution is discrete or continuous. For a discrete random variable with probability mass function p_k = P(X = k), the CDF is given by

F_X(x) = \sum_{k \leq x} p_k,

summing the probabilities up to x.^[27] For a continuous random variable with probability density function f(t), it is

F_X(x) = \int_{-\infty}^x f(t) \, dt,

representing the integral of the density from negative infinity to x.^[27] The CDF uniquely determines the probability distribution of X, as any two random variables with the same CDF induce identical probability measures.^[29] This uniqueness theorem ensures that the CDF serves as the canonical representation for characterizing distributions in probability theory.^[30]
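
The discrete formula F_X(x) = \sum_{k \leq x} p_k can be sketched in a few lines. This is a minimal illustration for a finite support, assuming a fair six-sided die; make_discrete_cdf and die_cdf are hypothetical names introduced here.

```python
from bisect import bisect_right
from itertools import accumulate


def make_discrete_cdf(support, probs):
    """Return F(x) = P(X <= x) for a finite discrete distribution."""
    pairs = sorted(zip(support, probs))
    xs = [x for x, _ in pairs]
    cum = list(accumulate(p for _, p in pairs))  # running sums of the PMF

    def cdf(x):
        i = bisect_right(xs, x)  # number of support points <= x
        return cum[i - 1] if i > 0 else 0.0

    return cdf


# Fair six-sided die (illustrative choice of distribution).
die_cdf = make_discrete_cdf(range(1, 7), [1 / 6] * 6)

# The step function is flat between support points and jumps by P(X = k) at each k.
for x in (0, 2.999, 3, 3.5, 6):
    print(x, die_cdf(x))
```

Using bisect_right counts the support points that are less than or equal to x, which is what makes the resulting step function right-continuous.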

Relation to Other Functions

The cumulative distribution function (CDF) F_X(x) = P(X \leq x) serves as a foundational tool for deriving other key functions that characterize the distribution of a random variable X. For continuous random variables, the probability density function (PDF) f_X(x) is directly related to the CDF through differentiation, where f_X(x) = \frac{d}{dx} F_X(x), assuming the CDF is absolutely continuous and differentiable.^[27] This relationship implies that the CDF can be recovered by integrating the PDF: F_X(x) = \int_{-\infty}^x f_X(t) \, dt.^[31] For discrete random variables, the probability mass function (PMF) p_X(x) = P(X = x) is obtained via finite differences of the CDF, specifically p_X(x) = F_X(x) - \lim_{t \uparrow x} F_X(t), which for support on integers simplifies to p_X(x) = F_X(x) - F_X(x-1).^[19] Conversely, the CDF is the cumulative sum of the PMF: F_X(x) = \sum_{t \leq x} p_X(t).^[27] Another important function derived from the CDF is the quantile function, defined as the generalized inverse Q_X(p) = F_X^{-1}(p) = \inf \{ x : F_X(x) \geq p \} for p \in (0,1), which provides the value below which a proportion p of the distribution lies.^[32] This function is particularly useful for computing percentiles and in simulation methods, such as inverse transform sampling, where uniform random variables are transformed to follow the target distribution. The quantile function inverts the CDF in the sense that F_X(Q_X(p)) \geq p and Q_X(F_X(x)) \leq x, with equality holding under continuity and strict monotonicity.^[32] The survival function, often denoted S_X(x) = 1 - F_X(x), represents the probability P(X > x) and is widely used in reliability engineering and survival analysis to model the probability of an event not occurring by time x.^[33] It complements the CDF by focusing on tail probabilities and is non-increasing, with S_X(x) approaching 0 as x goes to infinity. 
Additionally, the CDF enables straightforward computation of probabilities over intervals: for any real numbers a < b, P(a < X \leq b) = F_X(b) - F_X(a), which holds for both continuous and discrete cases because the event \{X \leq b\} is the disjoint union of \{X \leq a\} and \{a < X \leq b\}.^[34] This property makes the CDF a convenient tool for computing interval probabilities.^[27]
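
The relationships among the CDF, survival function, quantile function, and interval probabilities can be checked numerically. The sketch below assumes an exponential distribution, whose CDF F(x) = 1 - e^{-\lambda x} has a closed-form inverse, and adds a generalized inverse for a finite discrete distribution; all function names are illustrative.

```python
import math


def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return -math.expm1(-lam * x) if x >= 0 else 0.0


def exp_survival(x, lam):
    """S(x) = 1 - F(x) = P(X > x)."""
    return 1.0 - exp_cdf(x, lam)


def exp_quantile(p, lam):
    """Q(p) = -ln(1 - p) / lam, the inverse of F on (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return -math.log1p(-p) / lam


def interval_prob(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a) for a < b."""
    return cdf(b) - cdf(a)


def discrete_quantile(support, probs, p, tol=1e-12):
    """Generalized inverse Q(p) = inf{x : F(x) >= p} for a finite discrete law."""
    running = 0.0
    pairs = sorted(zip(support, probs))
    for x, mass in pairs:
        running += mass
        if running >= p - tol:  # tolerance guards against float round-off
            return x
    return pairs[-1][0]


lam = 2.0
print(exp_cdf(exp_quantile(0.3, lam), lam))               # close to 0.3
print(interval_prob(lambda x: exp_cdf(x, lam), 0.5, 1.0))
print(discrete_quantile(range(1, 7), [1 / 6] * 6, 0.5))   # 3
```

For the fair die, F(3) = 0.5, so the smallest x with F(x) \geq 0.5 is 3, while any p slightly above 0.5 moves the quantile to 4.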

Discrete Probability Distributions

Definition and Examples

A discrete probability distribution describes the probabilities associated with a random variable whose possible values form a countable set, such as the integers or a finite list. In this framework, the probability that the random variable X takes a specific value x is given by P(X = x) = p(x), where p is the probability mass function satisfying p(x) \geq 0 for all x and \sum p(x) = 1 over the support, ensuring the total probability sums to unity.^[35]^[18] These distributions assign positive probability only to countable points, with zero probability for intervals between points. The cumulative distribution function is a step function, jumping at each point with positive probability.^[36] Prominent examples include the Bernoulli distribution, which models a single trial with success probability p (e.g., coin flip, where P(X=1) = p and P(X=0) = 1-p), and the discrete uniform distribution on \{1, 2, \dots, n\}, which assigns equal probability 1/n to each integer (e.g., die roll).^[37]^[38]

Probability Mass Function

The probability mass function (PMF) of a discrete random variable X is defined as the function p(x) = P(X = x), which assigns to each possible value x in the support of X the probability that X equals x.^[18] This function fully characterizes the distribution of X, providing the probabilities for all discrete outcomes.^[39] The PMF satisfies two fundamental properties: p(x) \geq 0 for all x in the sample space, ensuring non-negative probabilities, and \sum_{x} p(x) = 1, guaranteeing that the total probability over all possible outcomes is unity.^[18] The support of the PMF consists of the set of all x where p(x) > 0, which may be finite, countably infinite, or a subset of the integers.^[19] Key properties of the PMF enable the computation of important distributional characteristics. The expected value, or mean, of X is given by E[X] = \sum_{x} x \, p(x), representing the long-run average value of the random variable. The variance, which measures the spread of the distribution, is \operatorname{Var}(X) = E[X^2] - (E[X])^2, where E[X^2] = \sum_{x} x^2 \, p(x).^[23] To compute the PMF for specific distributions, standard formulas are applied. For the binomial distribution with parameters n (number of trials) and p (success probability), the PMF is p(k) = \binom{n}{k} p^k (1-p)^{n-k} for k = 0, 1, \dots, n. For the Poisson distribution with parameter \lambda (average rate), it is p(k) = e^{-\lambda} \frac{\lambda^k}{k!} for k = 0, 1, 2, \dots .^[40] The PMF relates to the probability generating function (PGF) of X, defined as G(s) = \sum_{x} p(x) s^x = E[s^X], which encodes the probabilities as coefficients in its power series expansion and facilitates calculations for sums of independent random variables.^[41]
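
The binomial and Poisson formulas above, together with the sums for E[X] and \operatorname{Var}(X), can be checked numerically. The following is a minimal sketch with illustrative parameter values (n = 10, p = 0.3, \lambda = 4); the Poisson sums are truncated at k = 59, which is an assumption that the neglected tail is negligible for this \lambda.

```python
import math


def binomial_pmf(k, n, p):
    """p(k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """p(k) = exp(-lam) lam^k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def mean_and_variance(values, probs):
    """E[X] = sum x p(x); Var(X) = E[X^2] - (E[X])^2."""
    mean = sum(x * px for x, px in zip(values, probs))
    second = sum(x * x * px for x, px in zip(values, probs))
    return mean, second - mean**2


n, p = 10, 0.3
ks = range(n + 1)
binom = [binomial_pmf(k, n, p) for k in ks]
print(sum(binom))                    # close to 1
print(mean_and_variance(ks, binom))  # close to (n p, n p (1 - p)) = (3.0, 2.1)

lam = 4.0
ks_pois = range(60)                  # truncated support (assumption: tail is negligible)
pois = [poisson_pmf(k, lam) for k in ks_pois]
print(mean_and_variance(ks_pois, pois))  # close to (lam, lam)
```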

Continuous Probability Distributions

Definition and Examples

A continuous probability distribution, specifically an absolutely continuous one, describes the probabilities associated with a random variable whose possible values form an uncountable set, such as an interval on the real line. In this framework, the probability that the random variable X falls within an open interval (a, b) is computed as the integral \int_a^b f(x) \, dx, where f is the probability density function satisfying f(x) \geq 0 for all x and \int_{-\infty}^{\infty} f(x) \, dx = 1.^[42]^[43] These distributions exhibit absolute continuity with respect to the Lebesgue measure, implying no point masses: the probability assigned to any single point is zero, P(X = x) = 0 for all x. The cumulative distribution function for such a distribution arises as the integral of the density function up to a given point.^[43]

Prominent examples include the uniform distribution on the interval [a, b], which assigns equal probability to every point within that bounded range, modeling scenarios like random selection from a continuous uniform source.^[44] The exponential distribution, parameterized by a rate \lambda > 0, captures waiting times between independent events occurring at a constant average rate, such as inter-arrival times in a Poisson process.^[45] The normal distribution, defined by mean \mu \in \mathbb{R} and standard deviation \sigma > 0, produces the characteristic bell-shaped curve and serves as a foundational model for phenomena where values cluster symmetrically around the center, underpinning much of inferential statistics.

Probability Density Function

For a continuous random variable X with support over the real numbers, the probability density function (PDF), denoted f(x), is a non-negative function such that the probability that X falls within an interval [a, b] is given by the integral \int_a^b f(x) \, dx, rather than the value of the function at a specific point.^[31] Unlike probabilities in discrete distributions, the PDF value f(x) at any point does not represent a probability and can exceed 1, as it measures density rather than likelihood at a point; the probability of X equaling exactly any single value is zero.^[31] The interpretation of the PDF emphasizes that probabilities are determined by the area under the curve over an interval, providing a geometric understanding of continuous outcomes.^[46] Key properties of the PDF include normalization, where \int_{-\infty}^{\infty} f(x) \, dx = 1, ensuring the total probability over the entire support is unity, and non-negativity, f(x) \geq 0 for all x.^[46] The expected value (mean) \mu is computed as \mu = \int_{-\infty}^{\infty} x f(x) \, dx, and the variance \sigma^2 as \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx, which quantify central tendency and spread using weighted integrals over the density.^[47] A classic example is the uniform distribution on the interval [a, b], where the PDF is constant:

f(x) = \begin{cases} \frac{1}{b - a} & a \leq x \leq b, \\ 0 & \text{otherwise}. \end{cases}

This reflects equal likelihood across the interval, with the height \frac{1}{b - a} ensuring the area integrates to 1.^[31] Another fundamental case is the exponential distribution with rate parameter \lambda > 0, modeling waiting times or lifetimes, with PDF:

f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0, \\ 0 & \text{otherwise}. \end{cases}

Here, the density decays exponentially, capturing memoryless properties in processes like radioactive decay.^[47]
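
The normalization, mean, and variance integrals can be approximated numerically. The sketch below uses a simple midpoint rule and an exponential density with \lambda = 2; the truncation point 40/\lambda and the number of subintervals are assumptions made for illustration, not recommendations.

```python
import math


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


lam = 2.0


def exp_pdf(x):
    """Exponential density lam * exp(-lam * x) for x >= 0, zero elsewhere."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


upper = 40.0 / lam  # beyond this point the remaining tail mass is about exp(-40)
total = integrate(exp_pdf, 0.0, upper)
mean = integrate(lambda x: x * exp_pdf(x), 0.0, upper)
variance = integrate(lambda x: (x - mean) ** 2 * exp_pdf(x), 0.0, upper)

print(total, mean, variance)       # close to 1, 1/lam, 1/lam**2
print(uniform_pdf(0.25, 0.0, 0.5)) # 2.0: a density value may exceed 1
```

The last line illustrates that a density is not a probability: on [0, 0.5] the uniform density equals 2, yet the area under it is still 1.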

Other Types of Distributions

Singular Distributions

Singular distributions, also known as singular continuous distributions, are probability distributions that are neither discrete nor absolutely continuous with respect to the Lebesgue measure.^[48] Their cumulative distribution function (CDF) is continuous and non-decreasing, but the distribution has no probability density function (PDF): it assigns all of its probability to a set of Lebesgue measure zero while having no point masses.^[49] This contrasts with absolutely continuous distributions, where the CDF is the integral of a density function. A key property of singular distributions is that the derivative of their CDF is zero almost everywhere with respect to the Lebesgue measure, yet the CDF still increases from 0 to 1, concentrating probability on "pathological" sets like fractals. These distributions are mutually singular with the Lebesgue measure, meaning there exists a set of measure zero that carries all the probability mass.^[50] Unlike discrete distributions, they have no atoms, ensuring the CDF has no jumps.

The Cantor distribution provides a canonical example of a singular continuous distribution. It is supported on the ternary Cantor set in [0,1], a compact set of Lebesgue measure zero constructed by iteratively removing middle-third intervals.^[51] The CDF of the Cantor distribution, known as the Cantor-Lebesgue function or devil's staircase, is constant on each removed interval and increases continuously from 0 to 1, with all of its increase occurring on the Cantor set. The result is a continuous function that is differentiable with derivative zero at every point outside the Cantor set, and hence almost everywhere, yet is not constant. It maps the unit interval onto [0,1] even though its growth is confined to a set of Lebesgue measure zero, illustrating how probability can be distributed without a density. 
In general, any probability distribution on the real line can be uniquely decomposed into a mixture of a discrete component (with point masses), an absolutely continuous component (with a PDF), and a singular continuous component, as per the Lebesgue decomposition theorem.^[49] The singular part captures distributions that evade both atomic and density-based descriptions, highlighting the richness of measure-theoretic probability.^[48]
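
The Cantor-Lebesgue function can be evaluated from the ternary expansion of x. The following is an approximate sketch under stated assumptions: it reads a fixed number of ternary digits (depth = 40) using floating-point arithmetic, and the name cantor_cdf is introduced here for illustration.

```python
def cantor_cdf(x, depth=40):
    """Approximate the Cantor-Lebesgue function at x using `depth` ternary digits."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)  # next ternary digit of x (0, 1, or 2)
        x -= digit
        if digit == 1:
            # First ternary digit equal to 1: it becomes a final binary 1 and
            # later digits are ignored, which is why the CDF is flat on the
            # removed middle thirds.
            return value + scale
        value += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2.0
    return value


print(cantor_cdf(0.4), cantor_cdf(0.6))  # 0.5 0.5 (flat across the first removed interval)
print(cantor_cdf(0.25))                  # close to 1/3
```

The point 0.25 lies in the Cantor set (its ternary expansion is 0.020202...), and the function maps it to the binary number 0.010101..., which equals 1/3.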

Multivariate Distributions

A multivariate probability distribution describes the joint behavior of multiple random variables, extending the univariate case to vector-valued outcomes. For a random vector \mathbf{X} = (X_1, \dots, X_n) taking values in \mathbb{R}^n, the joint cumulative distribution function (CDF) is defined as F(x_1, \dots, x_n) = P(X_1 \leq x_1, \dots, X_n \leq x_n), which fully characterizes the distribution and is non-decreasing in each argument, tending to 0 as any single argument goes to -\infty and to 1 as all arguments go to \infty.^[52] For discrete random vectors, the joint probability mass function (PMF) p(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n) specifies probabilities at each point in the support, summing to 1 over all possible outcomes. In the continuous case, the joint probability density function (PDF) f(x_1, \dots, x_n) satisfies \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n) \, dx_1 \cdots dx_n = 1, and the joint CDF relates to it via F(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(u_1, \dots, u_n) \, du_1 \cdots du_n.^[52]

Marginal distributions are derived from the joint by eliminating variables not of interest, providing the univariate or lower-dimensional laws. For a continuous bivariate case with joint PDF f_{X,Y}(x,y), the marginal PDF of X is f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy, assuming the integral exists; in the discrete case, summation replaces integration.^[53] This process generalizes to higher dimensions by integrating or summing over the unwanted coordinates, yielding the marginal CDF or PMF for the retained variables. Marginals capture individual behaviors but lose information on dependencies among the variables.

Independence of random variables means that the outcome of one carries no information about the others, formalized by requiring the joint distribution to factor into its marginals. 
Specifically, X_1, \dots, X_n are mutually independent if the joint CDF equals the product of marginal CDFs: F(x_1, \dots, x_n) = F_1(x_1) \cdots F_n(x_n), or equivalently for PMFs/PDFs: p(x_1, \dots, x_n) = p_1(x_1) \cdots p_n(x_n) or f(x_1, \dots, x_n) = f_1(x_1) \cdots f_n(x_n).^[54] This property simplifies computations: the expectation of a product of functions of independent variables factors into the product of the individual expectations, and the variance of a sum of independent variables equals the sum of their variances.

Prominent examples include the multivariate normal distribution, which generalizes the univariate normal to vectors with mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}, capturing linear correlations through its covariance matrix and elliptical density contours, and arising as the limit in the multivariate central limit theorem.^[55] Copulas provide a flexible framework for modeling dependence separately from marginals, as per Sklar's theorem, which states that any joint CDF F with marginal CDFs F_1, \dots, F_n can be expressed as F(x_1, \dots, x_n) = C(F_1(x_1), \dots, F_n(x_n)), where C is a copula (a multivariate CDF with uniform [0,1] marginals), allowing construction of diverse dependence structures like tail dependence in finance or risk assessment.^[56]
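
Marginalization and the factorization test for independence are easy to sketch for a finite bivariate PMF. The example below stores a joint PMF as a dictionary keyed by (x, y) pairs; this representation and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def marginals(joint):
    """Sum the joint PMF over the other coordinate to obtain p_X and p_Y."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)


def is_independent(joint, tol=1e-12):
    """Check p(x, y) = p_X(x) * p_Y(y) for every pair of marginal support points."""
    px, py = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
        for x in px
        for y in py
    )


# Two fair coins tossed separately: the joint PMF factors.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# Two perfectly correlated coins: same marginals, but no factorization.
dependent = {(0, 0): 0.5, (1, 1): 0.5}

print(marginals(dependent))         # ({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5})
print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

Both joint PMFs have identical marginals, which illustrates that marginals alone do not determine the dependence structure.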

Advanced Characterizations

Kolmogorov Axioms

The Kolmogorov axioms form the rigorous mathematical foundation of modern probability theory, establishing it as a branch of measure theory and providing the basis for defining probability distributions. These axioms ensure that probabilities behave consistently as a countably additive measure on a structured space of events, allowing for the precise modeling of uncertainty in both discrete and continuous settings. By abstracting probability from empirical frequencies to an axiomatic system, they enable the derivation of all key properties of distributions without reliance on specific interpretations of probability. A probability space, the fundamental structure underlying this theory, consists of a triple (\Omega, \mathcal{F}, P), where \Omega is the sample space representing all possible outcomes, \mathcal{F} is a \sigma-algebra of subsets of \Omega (known as events), and P: \mathcal{F} \to [0, 1] is a probability measure that assigns a non-negative real number to each event, with P(\Omega) = 1.^[57] The \sigma-algebra \mathcal{F} ensures closure under countable unions, intersections, and complements, providing a complete framework for defining events and their probabilities. Random variables are then introduced as measurable functions X: \Omega \to \mathbb{R}, meaning that for every Borel set B \subseteq \mathbb{R}, the preimage X^{-1}(B) \in \mathcal{F}.^[57] The probability measure P satisfies three axioms:

Non-negativity: For every event A \in \mathcal{F}, P(A) \geq 0.
This ensures probabilities represent non-negative extents of possibility.^[58]
Normalization: P(\Omega) = 1.
This normalizes the total probability of the entire sample space to unity.^[58]
Countable additivity: If \{A_i\}_{i=1}^\infty \subseteq \mathcal{F} is a countable collection of pairwise disjoint events (i.e., A_i \cap A_j = \emptyset for i \neq j), then P\left( \bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty P(A_i). This axiom extends finite additivity to countable collections, crucial for handling infinite sample spaces in continuous distributions.^[58]

These axioms directly extend to probability distributions: for a random variable X on the probability space (\Omega, \mathcal{F}, P), the induced distribution is the probability measure P_X on (\mathbb{R}, \mathcal{B}(\mathbb{R})), the Borel \sigma-algebra on the real line, defined by P_X(B) = P(X^{-1}(B)) for every B \in \mathcal{B}(\mathbb{R}). This P_X satisfies the Kolmogorov axioms as a probability measure, unifying discrete, continuous, and singular distributions under a common measure-theoretic framework.^[57] The axioms were formalized by Andrey Kolmogorov in his seminal 1933 treatise Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability), which provided the first axiomatic treatment of probability independent of classical or frequentist interpretations.^[58]

Moment-Generating Functions

The moment-generating function (MGF) of a random variable X is defined as M_X(t) = \mathbb{E}[e^{tX}], where the expectation is taken with respect to the distribution of X. For a discrete random variable with probability mass function p(x), this expands to M_X(t) = \sum_x e^{tx} p(x); for a continuous random variable with probability density function f(x), it is M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x) \, dx. The MGF is said to exist if it is finite for all t in some open interval containing 0.^[59]^[60] Key properties of the MGF include its ability to generate moments through differentiation. Specifically, the n-th derivative of M_X(t) evaluated at t = 0 yields the n-th raw moment: M_X^{(n)}(0) = \mathbb{E}[X^n]. For instance, the first derivative at 0 gives the mean M_X'(0) = \mathbb{E}[X] = \mu, and the second derivative provides \mathbb{E}[X^2], from which the variance can be computed as M_X''(0) - [M_X'(0)]^2. Additionally, if the MGF exists in a neighborhood of 0, it uniquely determines the distribution of X, meaning two random variables with the same MGF have identical distributions.^[59]^[60]^[61] The MGF facilitates important operations on distributions, particularly for sums of independent random variables. If X and Y are independent, then the MGF of their sum is the product of their individual MGFs: M_{X+Y}(t) = M_X(t) M_Y(t). This property extends to any finite sum of independent variables, enabling the derivation of the distribution of convolutions without direct integration. For distribution identification, comparing MGFs can confirm equality; for example, the MGF of a normal distribution with mean \mu and variance \sigma^2 is M(t) = \exp(\mu t + \sigma^2 t^2 / 2), which uniquely characterizes it among common distributions.^[59]^[61]^[60] Despite these advantages, MGFs have limitations: not all distributions possess an MGF, particularly those with heavy tails where \mathbb{E}[e^{tX}] diverges for any t \neq 0. 
A classic example is the Cauchy distribution, for which the MGF does not exist, necessitating alternatives like characteristic functions. This restriction means MGFs are most useful for distributions with finite moments of all orders, such as those in the exponential family.^[59]^[60]
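
The moment and product properties can be checked numerically using the normal MGF M(t) = \exp(\mu t + \sigma^2 t^2 / 2) quoted above. The sketch approximates the derivatives at 0 by central finite differences; the step size h and the parameter values are illustrative assumptions, and the results are approximate.

```python
import math


def normal_mgf(t, mu, sigma):
    """MGF of a normal distribution: exp(mu t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)


def moments_from_mgf(mgf, h=1e-3):
    """Approximate M'(0) = E[X] and M''(0) = E[X^2] by central differences."""
    first = (mgf(h) - mgf(-h)) / (2.0 * h)
    second = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return first, second


mu, sigma = 1.5, 2.0
mean, second_moment = moments_from_mgf(lambda t: normal_mgf(t, mu, sigma))
variance = second_moment - mean**2
print(mean, variance)  # close to mu = 1.5 and sigma**2 = 4.0

# Product rule for independent X ~ N(1, 1) and Y ~ N(2, 3^2):
# M_X(t) M_Y(t) matches the MGF of N(1 + 2, 1 + 9).
t = 0.3
lhs = normal_mgf(t, 1.0, 1.0) * normal_mgf(t, 2.0, 3.0)
rhs = normal_mgf(t, 3.0, math.sqrt(10.0))
print(lhs, rhs)
```

Because the product of the two MGFs is again a normal MGF, uniqueness implies that the sum of independent normal variables is normal with the means and variances added.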

Computation and Generation

Random Number Generation

Random number generation for probability distributions, also known as random variate generation, is a fundamental computational technique used to simulate samples from specified distributions, enabling Monte Carlo simulations, statistical modeling, and other numerical methods. These simulations rely on generating sequences of pseudorandom numbers that approximate true randomness, typically starting from a uniform distribution on [0,1) as the foundational source. The process transforms these uniform variates into samples from the target distribution using algorithmic methods that ensure the generated values follow the desired probability density or mass function.^[62]

Pseudorandom number generators (PRNGs) produce deterministic sequences that appear random and uniformly distributed, serving as the basis for all variate generation. A classic example is the linear congruential generator (LCG), defined by the recurrence X_{n+1} = (a X_n + c) \mod m, where a, c, and m are chosen parameters that determine the period and statistical properties of the sequence; the output is typically scaled to [0,1) as U_n = X_n / m. LCGs are simple and fast but exhibit detectable patterns if parameters are poorly selected, limiting their use in high-dimensional applications. For improved quality, the Mersenne Twister algorithm generates uniform pseudorandom numbers with a very long period of 2^{19937} - 1 and excellent equidistribution properties across up to 623 dimensions, making it widely adopted in software libraries for its balance of speed and reliability.^[63]

One general method for generating variates from a continuous distribution with cumulative distribution function (CDF) F is inverse transform sampling, which exploits the probability integral transform: generate U \sim \text{Uniform}(0,1), then set X = F^{-1}(U), where F^{-1} is the quantile function (generalized inverse of the CDF). 
This approach is exact when the inverse exists in closed form but can be computationally intensive otherwise, often requiring numerical inversion techniques.^[62] For the exponential distribution with rate parameter \lambda > 0, whose CDF is F(x) = 1 - e^{-\lambda x} for x \geq 0, inversion gives X = -\frac{\ln(1 - U)}{\lambda}; because 1 - U is itself uniform on (0,1), this simplifies to X = -\frac{\ln(U)}{\lambda}, providing an efficient way to simulate interarrival times in Poisson processes.^[62]

Rejection sampling offers a versatile alternative for distributions where the inverse is unavailable or inefficient: candidates are proposed from an easily sampled distribution g (the proposal density) and accepted with probability proportional to the ratio f/g of the target density to the proposal density. Specifically, choose a constant M \geq \sup_x f(x)/g(x), generate X \sim g and U \sim \text{Uniform}(0,1), and accept X if U \leq f(X)/(M g(X)); rejected proposals are discarded and the process repeats until acceptance. This method, originally proposed for generating normal variates, guarantees correct sampling from f at the cost of potential inefficiency if the acceptance rate 1/M is low.^[64]

For the standard normal distribution, the Box-Muller transform provides an exact method using two independent uniform variates: generate U_1, U_2 \sim \text{Uniform}(0,1), then compute Z_1 = \sqrt{-2 \ln U_1} \cos(2\pi U_2) and Z_2 = \sqrt{-2 \ln U_1} \sin(2\pi U_2), yielding a pair of independent standard normal variates. The basic form requires one sine and one cosine evaluation per pair; a polar variant (the Marsaglia polar method) avoids the trigonometric evaluations by rejection sampling points in the unit disk. The transform is particularly useful for generating Gaussian noise in simulations.^[65]
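The generation methods described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production generator: the LCG constants (a = 1664525, c = 1013904223, m = 2^32) are one commonly quoted parameter set chosen here as an assumption, all function names are hypothetical, and the rejection example assumes the target density f(x) = 6x(1 - x) on [0, 1] with a uniform proposal and M = 1.5. The logarithms use 1 - U instead of U, which has the same distribution and avoids evaluating ln(0).

```python
import math
import random


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield uniforms on [0, 1) from X_{n+1} = (a X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m


def exponential_inverse_transform(u, lam):
    """Map a uniform u in [0, 1) to an Exponential(lam) variate."""
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - u) / lam


def box_muller(u1, u2):
    """Turn two uniforms into two independent standard normal variates."""
    r = math.sqrt(-2.0 * math.log(1.0 - u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def rejection_sample(f, sample_g, g, M, rng):
    """Draw one variate from density f using proposal g with f <= M * g."""
    while True:
        x = sample_g()
        # Accept the candidate with probability f(x) / (M * g(x)).
        if rng.random() <= f(x) / (M * g(x)):
            return x


rng = random.Random(0)  # fixed seed so the run is reproducible

lcg_draws = [u for u, _ in zip(lcg(12345), range(1000))]
exp_draws = [exponential_inverse_transform(rng.random(), 2.0) for _ in range(20000)]
normal_draws = [
    z for _ in range(10000) for z in box_muller(rng.random(), rng.random())
]
beta_draws = [
    rejection_sample(lambda x: 6.0 * x * (1.0 - x), rng.random, lambda x: 1.0, 1.5, rng)
    for _ in range(5000)
]
```

Python's own `random` module is built on the Mersenne Twister, so in practice the uniform source would usually be `random.random()` rather than a hand-written LCG; the LCG is included only to mirror the recurrence in the text.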

Fitting Distributions to Data

Fitting a probability distribution to data is a fundamental process in statistics that involves selecting an appropriate distributional form and estimating its parameters to best represent the observed sample. This enables modeling of underlying phenomena, hypothesis testing, and prediction based on empirical evidence. The choice of distribution often relies on domain knowledge or exploratory data analysis, while parameter estimation and validation use rigorous statistical procedures to ensure the fit is reliable and not due to chance.^[66]

One of the most widely used methods for parameter estimation is maximum likelihood estimation (MLE), introduced by Ronald A. Fisher in 1922. MLE seeks to find the parameter values \theta that maximize the likelihood function, defined as the product of the probability density (or mass) functions evaluated at the observed data points:

L(\theta) = \prod_{i=1}^n f(x_i \mid \theta),

where f(x_i \mid \theta) is the pdf or pmf of the distribution, and n is the sample size. To simplify computation, especially for large n, the log-likelihood \ell(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta) is maximized instead; because the logarithm is monotonically increasing, both functions attain their maximum at the same \theta. Under regularity conditions, MLE estimators are consistent, asymptotically normal, and asymptotically efficient, making them a cornerstone of modern statistical inference.^[66]

An alternative approach is the method of moments, pioneered by Karl Pearson in 1895. This technique equates the population moments (theoretical expectations derived from the distribution) to the corresponding sample moments and solves the resulting system of equations for the parameters. For a distribution with k parameters, the first k moments are typically used; for instance, the first moment equation sets the population mean \mu(\theta) equal to the sample mean \bar{x}, and the second sets the population variance \mathbb{E}[(X - \mu)^2] equal to its sample counterpart \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2. While simpler to compute than MLE, especially for non-differentiable likelihoods, method-of-moments estimators can be less efficient, particularly for small samples or asymmetric distributions.^[67]

After estimating parameters, goodness-of-fit tests assess whether the fitted distribution adequately describes the data. The Kolmogorov-Smirnov (KS) test, developed by Andrey Kolmogorov in 1933 and extended by Nikolai Smirnov in 1939, evaluates the maximum deviation between the empirical cumulative distribution function (ECDF) \hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{x_i \leq x\} and the theoretical CDF F(x \mid \hat{\theta}):

D_n = \sup_x |\hat{F}_n(x) - F(x \mid \hat{\theta})|.

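The test statistic D_n is compared to critical values from its distribution under the null hypothesis that the data follow F; large values of D_n (small p-values) lead to rejection. The classical critical values assume that F is fully specified in advance; when \hat{\theta} is estimated from the same data they are conservative, and adjusted critical values (as in the Lilliefors test) or simulation are needed. For discrete or binned data, the chi-squared goodness-of-fit test, proposed by Karl Pearson in 1900, partitions the data into categories and computes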

\chi^2 = \sum_{j=1}^m \frac{(O_j - E_j)^2}{E_j},

where O_j are the observed frequencies and E_j = n F(k_j \mid \hat{\theta}) - n F(k_{j-1} \mid \hat{\theta}) are the expected frequencies under the fitted distribution, for m bins with boundaries k_0 < k_1 < \dots < k_m. Asymptotically, the statistic follows a chi-squared distribution with m - k - 1 degrees of freedom, where k is the number of estimated parameters, allowing assessment of fit adequacy. These tests help identify mismatches, such as in tail behavior or multimodality, guiding model refinement.^[68]^[69]

A practical example is fitting the normal distribution to data, where MLE provides closed-form solutions: the parameter estimates are the sample mean \hat{\mu} = \bar{x} and the variance estimate \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2. These coincide with the method-of-moments estimates when sample moments are computed with divisor n; the variance estimate differs from the unbiased sample variance, which divides by n - 1 instead. Subsequent KS or chi-squared tests can then check whether the normal assumption holds; deviations might suggest alternatives such as a t-distribution for heavier tails. This approach is common in quality control and finance for modeling symmetric, bell-shaped data.^[66]
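The normal-fitting example above can be traced numerically. The sketch below computes the closed-form MLE and then the KS statistic D_n against the fitted CDF, using only the Python standard library. The data values and function names are hypothetical, chosen purely for illustration, and the code stops at the statistic itself: turning D_n into a p-value when the parameters were estimated from the same sample needs adjusted critical values or simulation, which is not shown.

```python
import math
from statistics import NormalDist


def fit_normal_mle(xs):
    """Return the MLE (mu_hat, sigma_hat) for a normal sample."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # divisor n, not n - 1
    return mu, math.sqrt(var)


def ks_statistic(xs, cdf):
    """Return D_n = sup_x |ECDF(x) - cdf(x)| for the sample xs."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # The ECDF jumps at x, so check the gap just after and just before.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d


# Hypothetical measurements, for illustration only.
data = [4.9, 5.1, 5.0, 4.7, 5.3, 5.2, 4.8, 5.0]
mu_hat, sigma_hat = fit_normal_mle(data)
d_n = ks_statistic(data, NormalDist(mu_hat, sigma_hat).cdf)
```

Because the ECDF is a step function, the supremum is attained at a data point, which is why the loop only needs to compare the fitted CDF with the ECDF immediately before and after each observation.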

Common Distributions and Applications

Bernoulli and Binomial

The Bernoulli distribution models the outcome of a single binary experiment, where success occurs with probability p (and failure with probability 1-p), taking the value 1 for success and 0 for failure.^[70] Named after Swiss mathematician Jacob Bernoulli, it serves as the foundational discrete distribution for binary events.^[2] The probability mass function (PMF) is given by

P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\},

where 0 < p < 1.^[71] The expected value (mean) is E[X] = p, and the variance is \operatorname{Var}(X) = p(1-p).^[70]

The binomial distribution extends the Bernoulli to the number of successes in n independent Bernoulli trials, each with the same success probability p, and arises as the sum of n independent and identically distributed (i.i.d.) Bernoulli random variables.^[72] Its PMF is

P(K = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n,

where \binom{n}{k} denotes the binomial coefficient.^[70] The mean is E[K] = np and the variance is \operatorname{Var}(K) = np(1-p), reflecting the additive properties of the underlying Bernoullis.^[70] For large n, the binomial distribution is well approximated by a normal distribution with matching mean and variance, a consequence of the central limit theorem that simplifies probability computations; a common rule of thumb requires np \geq 10 and n(1-p) \geq 10.^[73]

Applications of the Bernoulli and binomial distributions commonly arise in scenarios involving repeated binary outcomes with a fixed number of trials. For instance, the number of heads in n fair coin flips follows a binomial distribution with p = 0.5.^[74] In quality control, the binomial models the count of defective items in a batch of n inspected products, where each has a defect probability p.^[75]

The moment-generating function (MGF) of the binomial distribution is

M_K(t) = (1 - p + p e^t)^n,

which facilitates derivation of moments and cumulants.^[61]
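As a rough numerical check of the formulas above, the following sketch evaluates the binomial PMF directly, recovers the mean np and variance np(1-p) by summation, and compares an exact cumulative probability with its normal approximation. The values n = 100, p = 0.5, and the cutoff 55 are arbitrary choices for illustration, and the continuity correction (evaluating the normal CDF at 55.5) is a standard refinement that the text does not mention.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.5  # illustrative values; np = n(1-p) = 50 >= 10
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Moments computed from the PMF itself.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

# P(K <= 55): exact sum versus normal approximation with continuity correction.
exact = sum(pmf[:56])
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(55.5)
```

With these values the summed mean and variance should agree with np = 50 and np(1-p) = 25 up to floating-point error, and the two cumulative probabilities should be close, which is what the rule of thumb np \geq 10 and n(1-p) \geq 10 is meant to ensure.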

Poisson and Exponential

The Poisson distribution models the number of events occurring within a fixed interval of time or space when the events occur independently of one another at a known constant mean rate; it is particularly suited to rare events. It is a discrete probability distribution defined for non-negative integers k = 0, 1, 2, \dots, with probability mass function

p(k) = \frac{e^{-\lambda} \lambda^k}{k!},

where \lambda > 0 is the rate parameter representing the average number of events in the interval.^[76] The expected value (mean) of a Poisson random variable X is \mathbb{E}[X] = \lambda, and its variance is \mathrm{Var}(X) = \lambda, so the distribution is equidispersed: the variance equals the mean.^[77] This makes it suitable for modeling scenarios like the number of defects in manufacturing or arrivals at a service point, where events are independent and the probability of more than one event in a small subinterval is negligible.^[76]

Closely related, the exponential distribution describes the time between consecutive events in a Poisson process, serving as its continuous counterpart for interarrival times. It is defined for x \geq 0 with probability density function

f(x) = \lambda e^{-\lambda x},

where \lambda > 0 is the rate parameter, and the mean is \mathbb{E}[X] = 1/\lambda.^[78] A defining feature is its memoryless property: the conditional probability that the waiting time exceeds s + t given it has already exceeded s equals the unconditional probability of exceeding t, formally P(X > s + t \mid X > s) = P(X > t) for s, t \geq 0.^[79] This implies that the distribution of remaining time is independent of elapsed time, a property shared by no other continuous distribution, and it facilitates modeling of systems without "wear-out" effects.^[78]

The Poisson and exponential distributions are fundamentally linked through the Poisson process, a counting process where the number of events up to time t follows a Poisson distribution with parameter \lambda t, and the interarrival times between events are independent and exponentially distributed with rate \lambda.^[80] In this framework, if N(t) denotes the number of events by time t, then the waiting time from any time s until the next event is exponential with the same rate \lambda, a consequence of the memoryless property.^[81]

These distributions find wide application in modeling rare or random events across fields. In queueing theory, the Poisson distribution captures customer arrivals at service facilities, while the exponential models service or interarrival times, enabling analysis of system performance like wait times in banks or call centers.^[82] Radioactive decay processes are often Poisson, counting particle emissions over time, with exponential interdecay intervals reflecting the probabilistic nature of atomic instability. 
Similarly, website traffic can be approximated as Poisson for visitor counts per unit time, aiding in server capacity planning, though real-world deviations may require extensions for burstiness.^[83]

Normal and Related Distributions

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that arises frequently in natural and social sciences due to its symmetry and bell-shaped curve. It is parameterized by two values: the mean \mu, which determines the location of the peak, and the variance \sigma^2, which controls the spread. The probability density function (PDF) of a normal random variable X \sim N(\mu, \sigma^2) is given by

f(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),

for x \in (-\infty, \infty). The normalizing constant ensures that the density integrates to 1, and the exponential term makes the tails decay rapidly away from \mu. The mean of the distribution is \mu, and the variance is \sigma^2; it is often the first distribution considered when modeling symmetric data.^[84]

A key property of the normal distribution is the empirical rule, which quantifies the concentration of probability around the mean: approximately 68% of the data falls within one standard deviation (\mu \pm \sigma), 95% within two standard deviations (\mu \pm 2\sigma), and 99.7% within three standard deviations (\mu \pm 3\sigma). This rule follows from evaluating the cumulative distribution function and provides a quick way to assess outliers or expected ranges without integration. For instance, in a standard normal distribution (\mu = 0, \sigma = 1), these intervals are (-1, 1), (-2, 2), and (-3, 3); although the support is the whole real line, only about 0.3% of the probability lies outside the last of them.^[85]

Several important distributions are derived from the normal, forming a family used in hypothesis testing and confidence intervals. The chi-squared distribution with k degrees of freedom arises as the sum of squares of k independent standard normal random variables, i.e., if Z_1, \dots, Z_k \sim N(0,1) independently, then \sum_{i=1}^k Z_i^2 \sim \chi^2_k. It is right-skewed, supported on [0, \infty), and serves as a building block for variance estimation. The Student's t-distribution with r degrees of freedom is the ratio of a standard normal to the square root of an independent chi-squared divided by its degrees of freedom: T = Z / \sqrt{U/r}, where Z \sim N(0,1) and U \sim \chi^2_r. It has heavier tails than the normal, approaching normality as r increases, and is crucial for small-sample inference. 
The F-distribution with parameters m and n is the ratio of two independent chi-squared variables divided by their degrees of freedom: F = (U/m) / (V/n), where U \sim \chi^2_m and V \sim \chi^2_n; it is also right-skewed and used to compare variances.^[86]^[87]^[88]

The central limit theorem (CLT) underpins the ubiquity of the normal distribution: for independent and identically distributed (i.i.d.) random variables X_1, \dots, X_n with finite mean \mu and variance \sigma^2 > 0, the standardized sum ( \sum_{i=1}^n X_i - n\mu ) / (\sigma \sqrt{n}) converges in distribution to a standard normal as n \to \infty. This convergence holds whatever the underlying distribution of the X_i, provided the variance is finite, which explains why sample means are often approximately normal even for non-normal data. Standard proofs use characteristic functions (or moment-generating functions when they exist), but the theorem's practical impact lies in enabling normal approximations for large samples.^[89]

Applications of the normal distribution abound in fields requiring symmetric modeling. In measurement sciences, it describes errors in instruments or observations, assuming deviations are equally likely to be positive or negative with constant variance, as seen in precision engineering contexts. IQ scores are standardized to follow N(100, 15^2), so about 68% of scores fall between 85 and 115 and percentile rankings follow from z-scores. In finance, daily or monthly portfolio returns are often approximated as normal for risk assessment, though real data show fatter tails; this assumption facilitates value-at-risk calculations, and the Black-Scholes option pricing model assumes normally distributed log-returns.^[17]^[85]^[90]
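The empirical rule and the central limit theorem described in this section can both be checked with a short simulation. The sketch below, using only the Python standard library, computes P(|Z| \leq k) for a standard normal through the error function and then standardizes sums of exponential draws, a deliberately skewed starting distribution. The sample sizes, the seed, and the choice of Exponential(1) summands are assumptions made for illustration.

```python
import math
import random


def within_k_sigma(k):
    """P(|Z| <= k) for standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2.0))


def standardized_sum(n, rng):
    """Standardized sum of n Exponential(1) draws (mean 1, variance 1)."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n) / math.sqrt(n)


rng = random.Random(42)  # fixed seed so the run is reproducible

# Empirical-rule probabilities for k = 1, 2, 3 standard deviations.
rule = [within_k_sigma(k) for k in (1, 2, 3)]

# CLT check: share of standardized sums that land within one unit of zero.
z = [standardized_sum(50, rng) for _ in range(5000)]
share_within_one = sum(abs(v) <= 1.0 for v in z) / len(z)
```

If the normal approximation is working, `share_within_one` should land near the first entry of `rule`, even though each individual summand is far from normal.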

Uniform and Others

The continuous uniform distribution models scenarios where all outcomes within a finite interval are equally likely. Its probability density function is given by f(x) = \frac{1}{b - a}, \quad a \leq x \leq b, where a and b are the lower and upper bounds of the interval, respectively. The mean is \mu = \frac{a + b}{2} and the variance is \sigma^2 = \frac{(b - a)^2}{12}.^[91] This distribution is foundational for assuming uniformity in bounded continuous spaces, such as modeling random selections from a fixed range. A key application of the uniform distribution lies in random number generation and Monte Carlo simulations, where it serves as the basis for sampling to approximate integrals, estimate expectations, or model uncertainty in computational experiments.^[92]^[93]

The gamma distribution generalizes waiting time models beyond the exponential case, with shape parameter \alpha > 0 and rate parameter \beta > 0. Its probability density function is f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0, yielding mean \mu = \frac{\alpha}{\beta} and variance \sigma^2 = \frac{\alpha}{\beta^2}.^[94] When \alpha = 1, it reduces to the exponential distribution. It is particularly useful for modeling aggregate waiting times, such as the total time until multiple events occur in a Poisson process.^[95]

The beta distribution is defined on the interval [0, 1] and is ideal for modeling proportions or probabilities. 
With shape parameters \alpha > 0 and \beta > 0, its probability density function is f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1, where B(\alpha, \beta) is the beta function; the mean is \mu = \frac{\alpha}{\alpha + \beta} and variance is \sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.^[96] In Bayesian statistics, it acts as a conjugate prior for binomial likelihoods, enabling efficient posterior updates for parameters representing success probabilities.^[97]

The log-normal distribution arises in contexts of multiplicative processes, where the logarithm of the variable follows a normal distribution. If Y = e^X with X \sim \mathcal{N}(\mu, \sigma^2), then Y is log-normal with parameters \mu and \sigma^2, mean e^{\mu + \sigma^2 / 2}, and variance e^{2\mu + \sigma^2} (e^{\sigma^2} - 1). It commonly models phenomena like stock prices, where returns accumulate multiplicatively over time.^[98]
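Two of the uses mentioned in this section lend themselves to a brief sketch: Monte Carlo integration driven by uniform draws, and the conjugate beta update for a binomial success probability. The update rule used here (add the observed successes to \alpha and the failures to \beta) is the standard conjugacy result, which the text refers to but does not spell out. The integrand x^2, the Beta(1, 1) prior, the counts of 7 successes and 3 failures, and the function names are all assumptions chosen for illustration.

```python
import random


def monte_carlo_integral(h, a, b, n, rng):
    """Estimate the integral of h over [a, b] from n Uniform(a, b) draws."""
    total = sum(h(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


def beta_posterior(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    s = alpha + beta
    return alpha * beta / (s**2 * (s + 1))


rng = random.Random(1)  # fixed seed so the run is reproducible

# The integral of x^2 over [0, 1] is 1/3; the estimate should be close to it.
estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 20000, rng)

# A Beta(1, 1) prior (uniform on [0, 1]) updated with 7 successes, 3 failures.
post_alpha, post_beta = beta_posterior(1.0, 1.0, successes=7, failures=3)
post_mean = beta_mean(post_alpha, post_beta)
post_var = beta_variance(post_alpha, post_beta)
```

The posterior here is Beta(8, 4), whose mean 8/12 sits between the prior mean of 1/2 and the observed success rate of 7/10, showing how the prior's influence shrinks as data accumulate.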

References

[1]
1.3.6.1. What is a Probability Distribution
Discrete Distributions, The mathematical definition of a discrete probability function, p(x), is a function that satisfies the following properties.
[2]
Probability distributions - Foundations in Data Science
Types of Probability Distributions. There are two main types of probability distributions: discrete and continuous. Discrete probability distributions apply ...
[3]
Summary of Important Probability Distributions
[4]
[PDF] 3 Probability Distributions
The family of exponential distributions provides probability models that are very widely used in engineering and science disciplines to describe time-to-event ...
[5]
Probability Distributome: A Web Computational Infrastructure for ...
Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena.
[6]
[PDF] FERMAT AND PASCAL ON PROBABILITY - University of York
The problem was proposed to Pascal and Fermat, probably in 1654, by the Chevalier de. Méré, a gambler who is said to have had unusual ability “even for the ...
[7]
Foundations of the Theory of Probability : A. N. Kolmogorov
Dec 14, 2021 · The purpose of this monograph is to give an axiomatic foundation for the theory of probability. The author set himself the task of putting in their natural ...
[8]
Current and Emerging Research Opportunities in Probability
The application of probability to finance has revolutionized an industry. In ... sources had led to increased opportunities for the application of probability ...
[9]
[PDF] Machine Learning: A Probabilistic Perspective. - Noiselab@UCSD
This book, 'Machine Learning: A Probabilistic Perspective', covers supervised and unsupervised learning, and basic concepts in machine learning.
[10]
Foundations of the theory of probability : Kolmogorov, A. N
[11]
[PDF] Probability Theory
Definition: Discrete Random Variable. A random variable X : Ω → S is said to be discrete if S is finite or countable. If X : Ω → S is discrete, then the ...
[12]
Lesson 7: Discrete Random Variables | STAT 414
To learn the formal definition of a discrete random variable. To learn the formal definition of a discrete probability mass function.
[13]
[PDF] Review of Probability Theory - CS229
In the case of discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) ...
[14]
[PDF] Review of Random Variables
Support: Definition. • Definition: The support of a random variable X is defined as the set of numbers that are possible values of the random variable.
[15]
[PDF] Chapter 3 Review of Probability - Statistics & Data Science
For any random variable, the term support is used to refer to the set of possible real numbers defined by the mapping from the physical experimental outcomes to ...
[16]
[PDF] Common Probability Distributions
In general, the CDF can take any form as long as it defines a valid probability statement, such that. 0 ≤ F(x) ≤ 1 for any x ∈ S and F(a) ≤ F(b) for all a ≤ b.
[17]
The Normal Distribution - Utah State University
A normal distribution has two parameters, the mean μ, and the variance σ². The mean can be any real number and the variance can be any non-negative ...
[18]
7.2 - Probability Mass Functions | STAT 414 - STAT ONLINE
The probability mass function, P ( X = x ) = f ( x ) , of a discrete random variable X is a function that satisfies the following properties:.
[19]
[PDF] Mass Functions and Density Functions - Arizona Math
Definition 2. The (probability) mass function of a discrete random variable X is fX(x) = P{X = x}. The mass function has two basic properties: • fX(x) ≥ 0 for ...
[20]
[PDF] Continuous and Absolutely Continuous Random Variables
Definition: A random variable X is continuous if Pr(X=x) = 0 for all x. Definition: A random variable X is absolutely continuous if there exists a function f(x ...
[21]
[PDF] 6.042J Chapter 18: Expectation - MIT OpenCourseWare
Roughly, the expectation is the average value of the random variable where each value is weighted according to its probability. Formally, the expected value ( ...
[22]
[PDF] Expectation & Variance 1 Expectation - cs.Princeton
May 5, 2025 · The expectation of a random variable is its average value, where each value is weighted according to the probability that it comes up.
[23]
8.4 - Variance of X | STAT 414 - STAT ONLINE
The expected variation in the random variable, as quantified by its variance and standard deviation, is much larger than the expected variation in the random ...
[24]
[PDF] 18.05 S22 Reading 5a: Variance of Discrete Random Variables
Definition: If X is a random variable with mean E[X] = 𝜇, then the variance of X is defined by Var(X) = E[(X − 𝜇)^2].
[25]
[PDF] Independent Random Variables
Jul 24, 2017 · Symmetry of Independence. Independence is symmetric. That means that if random variables X and Y are independent, X is independent of Y and Y ...
[26]
Lesson 24: Several Independent Random Variables | STAT 414
And, since X ¯ , as defined above, is a function of those independent random variables, it too must be a random variable with a certain probability distribution ...
[27]
[PDF] Random Variables and Probability Distributions - Kosuke Imai
Feb 22, 2006 · A distribution function. F(x) of a random variable X satisfies the following properties. 1. limx→−∞ F(x)=0 and limx→∞ F(x)=1. 2. F is increasing ...
[28]
[PDF] Topic 7 Random Variables and Distribution Functions - Arizona Math
A distribution function FX has the property that it is right continuous, starts at 0, ends at 1, and does not decrease with increasing values of x. In ...
[29]
[PDF] Lecture 3: Random Variables & CDFs
Jan 28, 2020 · From Proposition 3.1 and Carathéodory's extension theorem it follows that the CDF FX uniquely defines. PX the probability measure induced by X.
[30]
[PDF] Econ 2110, fall 2016, Part Ib Review of Probability Theory
▷ The cumulative distribution function (CDF) of a random variable X is ... ▷ The CDF uniquely determines the probability measure PX.
[31]
14.1 - Probability Density Functions | STAT 414 - STAT ONLINE
A probability density function (p.d.f.) is a curve for continuous random variables, where the area under the curve represents probability. It is an integrable ...
[32]
[PDF] STA 611: Introduction to Mathematical Statistics Lecture 3 - Stat@Duke
The Cumulative distribution function (cdf) of a random variable X is F(x) ... We define the quantile function of X as F^{-1}(p) = the smallest x such ...
[33]
[PDF] 23.0 Survival Analysis - Stat@Duke
The survival function is S(t)=1 − F(t), or the probability that a person or machine or a business lasts longer than t time units. Here F(t) is the usual.
[34]
Continuous Random Variables - Probability - Utah State University
→ The support of a random variable consists of the interval or intervals where the density function is non-zero. Example: Log Ride. Suppose the distribution of ...
[35]
Continuous random variable | Definition, examples, explanation
A random variable X is said to be continuous if and only if the probability that it will belong to an interval [a, b] can be expressed as an ...
[36]
3. Continuous Distributions - Random Services
If c ∈ (0, ∞) then f defined by f(x) = (1/c) g(x) for x ∈ S defines a probability density function for an absolutely continuous distribution on (S, 𝒮) ...
[37]
Uniform Distribution | Definition - Probability Course
In particular, we have the following definition: A continuous random variable X is said to have a Uniform distribution over the interval [a,b], shown as X∼ ...
[38]
Exponential Distribution | Definition | Memoryless Random Variable
The exponential distribution is one of the widely used continuous distributions. It is often used to model the time elapsed between events.
[39]
[PDF] 18.05 S22 Reading 4a: Discrete Random Variables
Definition: The probability mass function (pmf) of a discrete random variable is the function 𝑝(𝑎) = 𝑃 (𝑋 = 𝑎). Note: 1. We always have 0 ≤ 𝑝(𝑎) ≤ 1.
[40]
12.3 - Poisson Properties | STAT 414 - STAT ONLINE
Theorem. The probability mass function f(x) = e^{−λ} λ^x / x! for a Poisson random variable is a valid p.m.f.
[41]
[PDF] Our Friends: Transforms - Columbia University
Given a random variable X with a probability mass function ... the probability generating function of X (really of its probability distribution) is the generating.
[42]
[PDF] Probability density functions - Properties - MIT OpenCourseWare
Properties. Examples. • Expectation and its properties. The expected value rule. Linearity. • Variance and its properties. • Uniform and exponential random ...
[43]
[PDF] Continuous Distributions - Stanford University
Jul 17, 2017 · To preserve the axioms that guarantee P(a ≤ X ≤ b) is a probability, the following properties must also hold: 0 ≤ P(a ≤ X ≤ b) ≤ 1. P(−∞ < X < ∞) ...
[44]
[PDF] A Summary of Random Variables1 - USC Dornsife
• a continuous random variable ξ is called absolutely continuous if there exists an non-negative. integrable function fξ = fξ(x), called the probability ...
[45]
[PDF] Theory of Probability - University of Texas at Austin
Definition 6.13 (Singular distributions) A distribution which has no atoms and is singular with respect to the Lebesgue measure is called singular. Example ...
[46]
[PDF] Unit 6: Distribution Functions
Example of a singular continuous distribution. • The Cantor distribution on [0,1] supported on the Cantor middle third. Its. CDF is called the Cantor staircase.
[47]
[PDF] Probability Models with Discrete and Continuous Parts
Feb 21, 2022 · It is singular continuous, because it assigns probability one to the Cantor set, which has Lebesgue measure, i.e. length, zero. The devil's ...
[48]
Random vectors - StatLect
Definition A random vector is continuous (or absolutely continuous) if and only if. its support is not countable; there is a function , called the joint ...
[49]
Marginal probability density function | Definition, derivation, examples
This is called marginal probability density function, to distinguish it from the joint probability density function, which depicts the multivariate distribution ...
[50]
Independent random variables - StatLect
Y to be independent, their joint distribution function must be equal to the product of their marginal distribution functions.
[51]
Multivariate normal distribution | Properties, proofs, exercises
In its general form, it describes the joint distribution of a random vector that can be represented as a linear transformation of a standard MV-N vector. The ...
[52]
[PDF] An Introduction to Copulas
Sep 12, 2005 · Copulas are functions that enable us to separate the marginal distributions from the dependency structure of a given multivariate distribution.
[53]
[PDF] Kolmogorov and Probability Theory - CORE
More precisely, a probability space is a measure space with total mass equal to one and a random variable is a real-valued measurable function: (i) A ...
[54]
[PDF] FOUNDATIONS THEORY OF PROBABILITY - University of York
THEORY OF PROBABILITY. BY. A.N. KOLMOGOROV. Second English Edition. TRANSLATION EDITED BY. NATHAN MORRISON. WITH AN ADDED BIBLIOGRPAHY BY. A.T. BHARUCHA-REID.
[55]
[PDF] Moment Generating Functions - MIT OpenCourseWare
Moment generating functions, and their close relatives (probability generating functions and characteristic functions) provide an alternative way of rep- ...
[56]
Moment generating function | Definition, properties, examples
The moment generating function (mgf) is a function often used to characterize the distribution of a random variable.
[57]
Lesson 9: Moment Generating Functions | STAT 414 - STAT ONLINE
Special functions, called moment-generating functions can sometimes make finding the mean and variance of a random variable simpler. In this lesson, we'll first ...
[58]
[PDF] Non- Uni form - Random Variate Generation - FSU Computer Science
Library of Congress Cataloging in Publication Data. Devroye, Luc. Non-uniform random variate generation. Bibliography: p. Includes index. 1. Random variables.
[59]
[PDF] Mersenne Twister: A 623-dimensionally equidistributed uniform ...
In this paper, a new algorithm named Mersenne Twister (MT) for generating uniform pseudoran- dom numbers is proposed. For a particular choice of parameters, ...
[60]
[PDF] Various Techniques Used in Connection With Random Digits - MCNP
Various Techniques Used in Connection With Random Digits. By John von Neumann. Summary written by George E. Forsythe. In manual computing methods today ...
[61]
A Note on the Generation of Random Normal Deviates - Project Euclid
June, 1958 A Note on the Generation of Random Normal Deviates. G. E. P. Box, Mervin E. Muller · DOWNLOAD PDF + SAVE TO MY LIBRARY. Ann. Math. Statist.
[62]
On the mathematical foundations of theoretical statistics - Journals
Fisher R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing ...
[63]
[PDF] Contributions to the Mathematical Theory of Evolution. II. Skew ...
By KARL PEARSON, University College, London. Communicated by Professor HENRICI, F.R.S. Received December 19, 1894; read January 24, 1895. PLATES 7-16 ...
[64]
Kolmogorov, A. (1933) Sulla determinazione empirica di una legge ...
Jul 25, 2019 · Kolmogorov, A. (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.
[65]
The significance probability of the smirnov two-sample test
In 1939 N. V. Smirnov proposed the following rank-order test for the two-sample problem. Let x_1, ..., x_m and y_1, ..., y_n be samples of independent ...
[66]
[PDF] Bernoulli and Binomial Random Variables
Jul 10, 2017 · A Bernoulli random variable is the simplest kind of random variable. It can take on two values,. 1 and 0. It takes on a 1 if an experiment with ...
[67]
[PDF] ECE 302: Lecture 3.6 Bernoulli Random Variables
Definition. Let X be a Bernoulli random variable. Then, the PMF of X is pX(0) = 1 − p, pX(1) = p, where 0 < p < 1 is called the Bernoulli parameter. We ...
[68]
Bernoulli & Binomial Random Variables - Data Science Discovery
In other words, the Binomial Distribution is the sum of n independent Bernoulli random variables. Just like a Bernoulli random variable, random variables ...
[69]
28.1 - Normal Approximation to Binomial | STAT 414
We will now focus on using the normal distribution to approximate binomial probabilities. The Central Limit Theorem is the tool that allows us to do so.
[70]
[PDF] Bernoulli Experiments, Binomial Distribution
Flip a coin 12 times, count the number of heads. Here n = 12. Each flip is a trial. It is reasonable to assume the trials are independent. Each trial has two ...
[71]
[PDF] Lecture 9: Commonly Used Distributions - Columbia University
. Examples: ◦ Flipping a fair coin and X = 1 represents the heads. p = 1/2. ◦ Quality control: Defective/Good. ◦ Clinical trial: Survival/death.
[72]
[PDF] Chapter 4 The Poisson Distribution
The mean of the Poisson is its parameter θ; i.e. µ = θ. This can be proven using calculus and a similar argument shows that the variance of a Poisson is ...
[73]
Lesson 12: The Poisson Distribution - STAT ONLINE
To explore the key properties, such as the moment-generating function, mean and variance, of a Poisson random variable. ... 12.2 - Finding Poisson Probabilities.
[74]
[PDF] 1 Review of the exponential distribution
As n → ∞, Y_n converges in distribution to a r.v. Y having the exponential distribution with rate λ (we use the tail probabilities): P(Y_n > x) = P(K_n > nx)
[75]
[PDF] Theorem The exponential distribution has the memoryless ...
The exponential distribution has the memoryless property, meaning P(X>s + t | X>t) = P(X>s) or P(X>s + t) = P(X>s)P(X>t).
[76]
[PDF] 1 IEOR 6711: Notes on the Poisson Process
But since the interarrival times have an exponential distribution, they have the memoryless property and thus your waiting time, A(s) = t_{N(s)+1} − s, until the ...
[77]
[PDF] ECE 302: Lecture 4.6 Exponential Random Variable
An exponential random variable is the inter-arrival time between two consecutive Poisson events. That is, how much time it takes to go from N Poisson counts to.
[78]
[PDF] queueing theory with applications and special consideration to ...
What are examples of Poisson processes? Radioactive decay is one example. If you take the transformation of one of the atoms in the radioactive sample as an ...
[79]
[PDF] Stat03.wxmx Poisson(λ) Distribution - CSULB
Feb 23, 2024 · It is computed using the formula μ=Σx P(x). The variance σ^2 and standard deviation σ of a discrete random variable X are numbers that indicate ...
[80]
Normal Distribution | Gaussian | Normal random variables | PDF
If Z is a standard normal random variable and X = σZ + μ, then X is a normal random variable with mean μ and variance σ², i.e., X ∼ N(μ, σ²). Conversely, if X ∼ N(μ ...
[81]
2.2.7 - The Empirical Rule | STAT 200
The 95% Rule states that approximately 95% of observations fall within two standard deviations of the mean on a normal distribution.
[82]
[PDF] I. Chi-squared Distributions
Definition: The chi-squared distribution with k degrees of freedom is the distribution of a random variable that is the sum of the squares of k independent.
[83]
26.4 - Student's t Distribution | STAT 414 - STAT ONLINE
Definition. If Z ∼ N(0, 1) and U ∼ χ²(r) are independent, then the random variable T = Z / \sqrt{U/r} follows a t-distribution with r degrees of freedom ...
[84]
4.2 - The F-Distribution | STAT 415
The confidence interval for the ratio of two variances requires the use of the probability distribution known as the F-distribution.
[85]
[PDF] Central Limit Theorem
Aug 7, 2017 · In summary, the central limit theorem explains that both the average of IID random variables and the sum of IID random variables are normal.
[86]
[PDF] Chapter 1 Asset Returns - Princeton University
The histograms also show that the distribution for the monthly returns is closer to a normal distribution than those for the weekly returns and the daily ...
[87]
14.6 - Uniform Distributions | STAT 414 - STAT ONLINE
A continuous random variable has a uniform distribution if its probability density function is defined by two constants, and the most common case is when and.
[88]
1.3.6.6.2. Uniform Distribution
One of the most important applications of the uniform distribution is in the generation of random numbers. That is, almost all random number generators generate ...
[89]
[PDF] 33. MONTE CARLO TECHNIQUES
Sampling the uniform distribution. Most Monte Carlo sampling or integration techniques assume a “random number generator,” which generates uniform ...
[90]
8.1.6.5. Gamma - Information Technology Laboratory
The following plots give examples of gamma PDF, CDF and failure rate shapes. Shapes for gamma data, Plot of gamma PDF's with different shape parameters. Gamma ...
[91]
[PDF] Parameter Estimation Fitting Probability Distributions Method of ...
Gamma Distribution as Sum of IID Random Variables. The Gamma distribution models the total waiting time for k successive events where each event has a waiting ...
[92]
1.3.6.6.17. Beta Distribution - Information Technology Laboratory
The beta function has the formula B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1} \, dt. The case where a = 0 and b = 1 is called the standard beta distribution.
[93]
[PDF] STAT 535: Chapter 3: The Beta-Binomial Bayesian Model
The Beta-Binomial Bayesian model uses a prior distribution for parameters, and the posterior distribution is calculated using Bayes' rule. The prior for π is a ...
[94]
[PDF] Lognormal Model for Stock Prices - UCSD Math
What follows is a simple but important model that will be the basis for a later study of stock prices as a geometric Brownian motion.
