Probability mass function

A probability mass function (PMF), also known as a probability function or frequency function, is a mathematical function that describes the probability distribution of a discrete random variable by assigning a non-negative probability to each possible value that the variable can take. For a discrete random variable X taking values in a countable set, the PMF is typically denoted p_X(x) = P(X = x), where P(X = x) represents the probability that X equals exactly x. The PMF must satisfy two fundamental properties: first, p_X(x) \geq 0 for all x in the support of X, ensuring probabilities are non-negative; second, the sum of p_X(x) over all possible values x equals 1, \sum p_X(x) = 1, which guarantees that the total probability is conserved. These properties make the PMF a valid probability distribution over discrete outcomes, fully characterizing the distribution and enabling the computation of expected values, variances, and other statistical moments.

In contrast to the probability density function used for continuous random variables, the PMF provides the actual probability mass at discrete points rather than a density over an interval, and the cumulative distribution function derived from the PMF is a step function that jumps by p_X(x) at each point x.

Common examples include the PMF of the binomial distribution, which models the number of successes in a fixed number of independent trials, and the Poisson distribution, which describes the number of events occurring in a fixed interval. The concept is central to probability theory and statistics, and it finds applications across the sciences and engineering for modeling countable phenomena.
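As a simple worked example, consider a fair six-sided die. Its PMF assigns p_X(x) = 1/6 for each x \in \{1, 2, 3, 4, 5, 6\} and p_X(x) = 0 otherwise, so the probabilities are non-negative and \sum_{x=1}^{6} p_X(x) = 6 \cdot \tfrac{1}{6} = 1. The expected value follows directly from the PMF as E[X] = \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = 3.5.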

Fundamentals

Definition

In probability theory, the probability mass function (PMF) of a discrete random variable X is defined as the function p_X: S \to [0,1] that assigns to each possible outcome x in the support set S the probability p_X(x) = P(X = x), where S is the countable set consisting of all values that X can take with positive probability. A fundamental requirement of the PMF is that the probabilities over the entire support sum to unity: \sum_{x \in S} p_X(x) = 1. This normalization ensures that the PMF fully describes the probability distribution of X. The PMF applies specifically to discrete random variables, where probabilities are concentrated at isolated points of the support S, in contrast to continuous random variables, which require probability density functions that integrate over intervals. The support S is typically finite or countably infinite and comprises exactly those points where p_X(x) > 0.
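As an illustrative sketch (not part of any standard library), a PMF over a finite support can be represented as a Python dictionary mapping each value to its probability, and the two defining conditions can be checked directly:

    def is_valid_pmf(pmf, tol=1e-12):
        """Check the defining conditions of a PMF given as {value: probability}."""
        non_negative = all(p >= 0 for p in pmf.values())    # p_X(x) >= 0 for all x
        normalized = abs(sum(pmf.values()) - 1.0) <= tol    # sum over the support equals 1
        return non_negative and normalized

    # PMF of a fair six-sided die: support S = {1, ..., 6}, each value has probability 1/6
    die_pmf = {x: 1/6 for x in range(1, 7)}
    print(is_valid_pmf(die_pmf))   # True
    print(die_pmf[3])              # P(X = 3) = 0.1666...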

Properties

The probability mass function (PMF) of a discrete random variable X, denoted p_X(x), must satisfy non-negativity, meaning p_X(x) \geq 0 for all x in the state space S, as probabilities cannot be negative by the axioms of probability. This ensures that the assigned probabilities represent valid measures of likelihood. Additionally, each individual probability satisfies 0 \leq p_X(x) \leq 1, since no single event can have a probability exceeding the total probability of the sample space.

The normalization property requires that \sum_{x \in S} p_X(x) = 1, reflecting the fact that the events \{X = x\} for x \in S form a partition of the sample space, and by the law of total probability (or countable additivity for infinite supports), their probabilities sum to the probability of the entire space, which is 1. This condition guarantees that the PMF fully accounts for all possible outcomes without overlap or omission.

For a given discrete random variable, the PMF is uniquely determined, as it is defined directly by p_X(x) = P(X = x) for each x, and the probabilities P(X = x) are uniquely specified by the underlying probability measure. The effective support of the PMF is the set \{x \in S \mid p_X(x) > 0\}, which identifies the values that the random variable can actually attain with positive probability, while p_X(x) = 0 for all other x \in S. This support set encapsulates the possible realizations of X under the given distribution. The expected value of X can be computed from the PMF as E[X] = \sum_{x \in S} x \, p_X(x), provided the sum converges absolutely.
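Continuing the dictionary-based sketch from the Definition section (the helper names are illustrative), the expectation and variance formulas translate directly into weighted sums over the support:

    def expected_value(pmf):
        """E[X] = sum of x * p_X(x) over the support."""
        return sum(x * p for x, p in pmf.items())

    def variance(pmf):
        """Var(X) = E[(X - E[X])^2], computed from the same PMF."""
        mean = expected_value(pmf)
        return sum((x - mean) ** 2 * p for x, p in pmf.items())

    die_pmf = {x: 1/6 for x in range(1, 7)}
    print(expected_value(die_pmf))   # 3.5
    print(variance(die_pmf))         # about 2.9167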

Relationships

Cumulative distribution function

The cumulative distribution function (CDF) of a discrete random variable X with probability mass function p_X and support S is defined as F_X(x) = P(X \leq x) = \sum_{\substack{y \leq x \\ y \in S}} p_X(y), where the sum accumulates the probabilities assigned by the PMF up to and including x. This function maps any real number x to the interval [0, 1], representing the total probability that X takes a value less than or equal to x. For discrete random variables, the CDF has a step-function form, remaining constant between points of the support S and featuring discontinuous jumps precisely at those points where p_X(y) > 0. The magnitude of the jump at a point y \in S equals p_X(y), reflecting the probability concentrated there, while the function is right-continuous at every point. This stepwise increase ensures that F_X(x) approaches 1 as x tends to infinity and equals 0 for all x below the smallest element of S (when such an element exists).

The PMF can be recovered from the CDF through the relation p_X(x) = F_X(x) - F_X(x^-), where F_X(x^-) denotes the left-hand limit of the CDF at x, capturing the size of the jump at x. This difference directly corresponds to the increments of the CDF: each PMF value p_X(x) quantifies the vertical rise at that point, allowing the original discrete distribution to be reconstructed solely from its cumulative form.
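Both directions of this relationship can be sketched numerically; in the hypothetical helpers below, the PMF is accumulated into the step-function CDF, and the PMF is then recovered from the jump sizes:

    def cdf_from_pmf(pmf):
        """Return the sorted support points paired with F_X(x) = P(X <= x)."""
        total, cdf = 0.0, []
        for x in sorted(pmf):
            total += pmf[x]            # jump of size p_X(x) at x
            cdf.append((x, total))
        return cdf

    def pmf_from_cdf(cdf):
        """Recover p_X(x) = F_X(x) - F_X(x^-) as the jump at each support point."""
        pmf, previous = {}, 0.0
        for x, F in cdf:
            pmf[x] = F - previous      # size of the jump at x
            previous = F
        return pmf

    die_pmf = {x: 1/6 for x in range(1, 7)}
    steps = cdf_from_pmf(die_pmf)      # [(1, 0.1667), (2, 0.3333), ..., (6, 1.0)]
    recovered = pmf_from_cdf(steps)    # equals die_pmf up to floating-point error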

Probability density function

In contrast to the probability mass function (PMF) for discrete random variables, the probability density function (PDF), denoted f_X(x), describes the probability distribution of a continuous random variable X. It is a non-negative integrable function defined over the support of X such that the integral over the entire real line equals 1, i.e., \int_{-\infty}^{\infty} f_X(x) \, dx = 1. The probability that X falls within an interval (a, b) is given by the area under the PDF curve over that interval: P(a < X < b) = \int_a^b f_X(x) \, dx.

A fundamental distinction between the PMF and PDF lies in their interpretation and normalization. The PMF p_X(x) directly assigns probabilities to discrete points, summing to 1 across all possible outcomes, \sum_x p_X(x) = 1, with each p_X(x) representing P(X = x) \leq 1. In contrast, the PDF provides a density rather than a probability at each point, integrating to 1 over the continuous domain, and its values f_X(x) can exceed 1, as they measure relative likelihood per unit length rather than absolute probability. This density-based approach reflects the infinite divisibility of continuous spaces, where probability is accumulated over intervals rather than assigned to isolated values.

For continuous random variables governed by a PDF, the probability of the variable taking any exact single value is zero: P(X = x) = 0 for any specific x, because the integral over an arbitrarily small interval around x approaches zero. This property underscores the inapplicability of PMFs to continuous cases, as no positive probability can be assigned to individual points without violating the total probability of 1.

In certain limiting scenarios, discrete distributions described by PMFs can be approximated by continuous ones described by PDFs. For instance, as the number of trials n in a binomial distribution grows large while the success probability p is fixed, the binomial PMF is well approximated by the PDF of a normal distribution with mean np and variance np(1-p), by the de Moivre-Laplace theorem (a special case of the central limit theorem), enabling the use of continuous approximations for large-scale discrete processes. The cumulative distribution function serves as a unifying framework that accommodates both PMFs and PDFs, defining F_X(x) = P(X \leq x) for either type of random variable.
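A brief numerical sketch of these points, assuming SciPy is available (scipy.stats.binom, scipy.stats.norm, and scipy.stats.uniform), with illustrative parameter values:

    from scipy.stats import binom, norm, uniform

    # Normal approximation to the binomial PMF for large n.
    n, p = 1000, 0.5
    mean, std = n * p, (n * p * (1 - p)) ** 0.5
    k = 500
    exact = binom.pmf(k, n, p)                  # P(X = k) from the binomial PMF
    approx = norm.pdf(k, loc=mean, scale=std)   # normal density N(np, np(1-p)) at k
    print(exact, approx)                        # both approximately 0.0252

    # A density value may exceed 1: the uniform density on [0, 0.5] equals 2 on its support.
    print(uniform.pdf(0.25, loc=0, scale=0.5))  # 2.0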

Examples

Finite support

A probability mass function (PMF) with finite support is defined for a discrete random variable that can take only a finite number of possible values, making it straightforward to enumerate all probabilities directly. These distributions are fundamental for modeling scenarios with limited outcomes, such as coin flips or dice rolls, where the total probability sums to 1 across the support.

The Bernoulli distribution is the simplest example, representing a single binary trial with outcomes 0 (failure) or 1 (success), parameterized by the success probability p \in [0,1]. Its PMF is p_X(x) = p^x (1-p)^{1-x} for x = 0, 1. This distribution models binary real-world events, such as a fair coin landing heads (success) with probability p = 0.5; the PMF assigns probability p to x = 1 and 1-p to x = 0.

The binomial distribution extends the Bernoulli distribution to n independent trials, counting the number of successes k = 0, 1, \dots, n, each trial having the same success probability p. Its PMF is p_X(k) = \binom{n}{k} p^k (1-p)^{n-k} for k = 0, 1, \dots, n, where \binom{n}{k} is the binomial coefficient. A binomial random variable arises as the sum of n independent Bernoulli random variables, modeling aggregates such as the number of heads in n coin flips.

The discrete uniform distribution applies when all outcomes in a finite set S with |S| = m elements are equally likely, parameterized by the support size. Its PMF is p_X(x) = \frac{1}{m} for x \in S. This models fair dice or random selection from a finite list without bias, ensuring uniform probability across the support.

The multinomial distribution generalizes the binomial to r categories over n trials, with category probabilities p_1, \dots, p_r summing to 1, and assigns probability to count vectors (k_1, \dots, k_r) with \sum k_i = n. Its PMF is p_{\mathbf{X}}(\mathbf{k}) = \frac{n!}{k_1! \cdots k_r!} p_1^{k_1} \cdots p_r^{k_r}, describing finite categorical outcomes such as distributing n items into r bins.
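These finite-support PMFs can be evaluated numerically; the sketch below assumes SciPy (scipy.stats) is available and uses arbitrary illustrative parameter values:

    from scipy.stats import bernoulli, binom, randint, multinomial

    print(bernoulli.pmf(1, p=0.5))          # 0.5: P(success) for a fair coin
    print(binom.pmf(3, n=10, p=0.5))        # C(10, 3) * 0.5**10, about 0.1172
    print(randint.pmf(4, low=1, high=7))    # 1/6: discrete uniform on {1, ..., 6} (high is exclusive)
    print(multinomial.pmf([2, 1, 1], n=4, p=[0.5, 0.25, 0.25]))  # 4!/(2!1!1!) * 0.5**2 * 0.25 * 0.25 = 0.1875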

Infinite support

Discrete probability distributions with infinite support assign positive probabilities to a countably infinite set of outcomes, typically the non-negative integers, while ensuring the total probability sums to 1 over all possible values. This requires the infinite series \sum_{k=0}^{\infty} p_X(k) = 1, with the probabilities p_X(k) decreasing rapidly enough for the series to converge. Such distributions model phenomena with no upper bound on outcomes, like the number of occurrences of an event in an unbounded time frame.

The Poisson distribution is a canonical example, with probability mass function p_X(k) = \frac{\lambda^k e^{-\lambda}}{k!} for k = 0, 1, 2, \dots, where \lambda > 0 is the rate parameter representing the average number of events per interval. Its mean is \lambda, and it is used to model rare events or counts, such as radioactive decays or arrivals in a queueing system. The infinite sum converges because the factorial growth of the denominator overpowers the exponential growth of \lambda^k in the numerator.

The geometric distribution describes the number of trials until the first success in independent Bernoulli trials with success probability p \in (0,1]. Its PMF is p_X(k) = (1-p)^{k-1} p for k = 1, 2, 3, \dots (or, in the shifted form that counts failures before the first success, starting at k = 0), with mean 1/p (the shifted form has mean (1-p)/p). It models waiting times, such as the number of coin flips until the first heads. Convergence of the sum to 1 follows from the geometric series formula.

The negative binomial distribution generalizes the geometric distribution to the number of trials until the r-th success, where r is a positive integer. Its PMF is p_X(k) = \binom{k-1}{r-1} p^r (1-p)^{k-r} for k = r, r+1, r+2, \dots, with mean r/p (or r(1-p)/p when only the failures are counted). It extends waiting-time models to multiple successes, such as the number of inspections required in quality control before a fixed number of defects is observed. The probabilities sum to 1 via the negative binomial series expansion.
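As a numerical check that these infinite-support PMFs are properly normalized, the partial sums below approach 1 as more terms are included. The sketch assumes SciPy for the Poisson and geometric cases and evaluates the trials-until-r-th-success form of the negative binomial directly from its formula (scipy.stats.nbinom uses the failure-count parameterization instead):

    from math import comb
    from scipy.stats import poisson, geom

    lam, p, r = 3.0, 0.4, 2

    print(sum(poisson.pmf(k, lam) for k in range(50)))        # ~1.0
    print(sum(geom.pmf(k, p) for k in range(1, 100)))         # ~1.0 (support starts at k = 1)

    def nbinom_trials_pmf(k, r, p):
        """P(the r-th success occurs on trial k) = C(k-1, r-1) * p**r * (1-p)**(k-r)."""
        return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

    print(sum(nbinom_trials_pmf(k, r, p) for k in range(r, 200)))  # ~1.0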

Extensions

Multivariate case

In the multivariate case, the probability mass function is extended to describe the joint distribution of multiple random variables. For two discrete random variables X and Y taking values in countable sets, the joint probability mass function is defined as p_{X,Y}(x,y) = P(X = x, Y = y) for each pair (x, y) in the support, where p_{X,Y}(x,y) \geq 0 and the normalization condition \sum_x \sum_y p_{X,Y}(x,y) = 1 holds to ensure the probabilities sum to unity. This bivariate formulation serves as the foundation for higher-dimensional cases, where the PMF of a random vector \mathbf{X} = (X_1, \dots, X_n) is p_{\mathbf{X}}(\mathbf{x}) = P(X_1 = x_1, \dots, X_n = x_n), satisfying \sum_{\mathbf{x}} p_{\mathbf{X}}(\mathbf{x}) = 1.

Marginal probability mass functions are derived from the joint PMF by summing over the unwanted variables. Specifically, the marginal PMF of X is p_X(x) = \sum_y p_{X,Y}(x,y), where the sum runs over all possible values of Y, and likewise p_Y(y) = \sum_x p_{X,Y}(x,y). In the general multivariate setting, the marginal PMF of any subset of variables is obtained by summing the joint PMF over the complementary variables, preserving the univariate properties as a special case.

Conditional probability mass functions capture dependencies between variables. The conditional PMF of Y given X = x is p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)} whenever p_X(x) > 0, and it satisfies the properties of a valid PMF for each fixed x. This definition extends to multivariate conditionals, such as p_{Y,Z|X}(y,z|x) = \frac{p_{X,Y,Z}(x,y,z)}{p_X(x)} for p_X(x) > 0, allowing analysis of partial dependencies in higher dimensions.

Independence in the multivariate context means that the joint PMF factors into the product of the marginal PMFs. X and Y are independent if and only if p_{X,Y}(x,y) = p_X(x) p_Y(y) for all x, y. More generally, for \mathbf{X} = (X_1, \dots, X_n), mutual independence holds if p_{\mathbf{X}}(\mathbf{x}) = \prod_{i=1}^n p_{X_i}(x_i) for all \mathbf{x}, which simplifies computation and modeling in practical applications.
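A minimal sketch of these operations, assuming NumPy is available; the joint PMF below is an arbitrary illustrative table for X \in \{0, 1\} (rows) and Y \in \{0, 1, 2\} (columns):

    import numpy as np

    joint = np.array([[0.10, 0.20, 0.10],
                      [0.15, 0.25, 0.20]])    # joint PMF p_{X,Y}(x, y)
    assert np.isclose(joint.sum(), 1.0)       # normalization over all pairs

    p_X = joint.sum(axis=1)                   # marginal of X: sum over y
    p_Y = joint.sum(axis=0)                   # marginal of Y: sum over x

    p_Y_given_X1 = joint[1] / p_X[1]          # conditional PMF of Y given X = 1
    assert np.isclose(p_Y_given_X1.sum(), 1.0)

    # Independence would require p_{X,Y}(x, y) = p_X(x) p_Y(y) for every cell.
    independent = np.allclose(joint, np.outer(p_X, p_Y))
    print(p_X, p_Y, p_Y_given_X1, independent)   # independent is False for this table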

Measure-theoretic formulation

In measure theory, the probability mass function arises as part of the rigorous treatment of discrete random variables on a probability space. A probability space is a triple (\Omega, \mathcal{F}, P), where \Omega is a countable sample space, \mathcal{F} is the power set of \Omega (which is a \sigma-algebra since \Omega is countable), and P: \mathcal{F} \to [0,1] is a probability measure satisfying P(\Omega) = 1 and countable additivity. A discrete random variable X on this space is a measurable function X: \Omega \to S, where S is a countable set serving as the state space, or support.

Such an X induces a pushforward measure \mu (also denoted P_X) on the measurable space (S, \mathcal{P}(S)), where \mathcal{P}(S) is the power set of S, via the construction \mu(A) = P(X^{-1}(A)) for any A \subseteq S. The induced measure \mu assigns point masses to singletons in S, with the probability mass function p_X defined by p_X(x) = \mu(\{x\}) = P(X^{-1}(\{x\})) = P(X = x) for each x \in S. These point masses satisfy the normalization condition \sum_{x \in S} p_X(x) = \mu(S) = 1, ensuring that \mu is a probability measure. Equivalently, p_X is the Radon-Nikodym derivative of \mu with respect to the counting measure on S, which assigns to finite sets a mass equal to their cardinality and to infinite sets infinite mass.

This formulation extends naturally to cases with infinite support, where S is countably infinite; there, the counting measure on S is \sigma-finite (as S is a countable union of finite-mass sets), and the induced measure \mu remains a probability measure with the same point-mass structure, provided the series \sum p_X(x) converges to 1.
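A small computational sketch of the pushforward construction, with an entirely illustrative sample space and map: the induced point masses p_X(x) = P(X^{-1}(\{x\})) are obtained by summing P over each preimage.

    from collections import defaultdict

    P = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}   # probability measure on Omega = {a, b, c, d}
    X = {"a": 0, "b": 1, "c": 1, "d": 2}           # random variable X: Omega -> S = {0, 1, 2}

    p_X = defaultdict(float)
    for omega, prob in P.items():
        p_X[X[omega]] += prob                      # accumulate the mass of the preimage X^{-1}({x})

    print(dict(p_X))              # {0: 0.2, 1: 0.4, 2: 0.4}
    print(sum(p_X.values()))      # 1.0 (up to floating-point rounding), i.e. mu(S) = 1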