Marginal distribution
In probability theory and statistics, a marginal distribution is the probability distribution of a single random variable, or of a subset of random variables, derived from a joint probability distribution by summing or integrating out the probabilities of the other variables.[1] This process discards information about the excluded variables, providing the unconditional probability distribution of the variable(s) of interest.[2] For discrete random variables, the marginal probability mass function is obtained by summing the joint probabilities over all possible values of the other variables, as in f_X(x) = \sum_y f_{X,Y}(x,y).[1] In the continuous case, the marginal probability density function results from integrating the joint density, such as f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy.[3]
Marginal distributions are fundamental in multivariate analysis because they allow researchers to focus on individual variables without modeling their interactions, which is essential for tasks such as hypothesis testing, simulation, and model simplification.[4] For instance, in a bivariate joint distribution table, the marginal probabilities appear in the row and column totals, representing the distribution of each variable alone.[5] Marginal distributions differ from conditional distributions, which account for the value of another variable, and from the full joint distribution, which captures all interdependencies.[6] Understanding marginals is crucial in fields such as Bayesian inference, where summing out latent variables yields predictive distributions, and in machine learning, where hidden states are marginalized out of probabilistic models.[7]
The concept extends to higher dimensions, where marginalizing over multiple variables produces the distribution of any subset of the variables, facilitating dimensionality reduction in complex datasets.[8] A marginal distribution fully determines the moments of its own variable, such as its mean and variance, and expectations with respect to a marginal can be computed from the joint distribution via iterated sums or integrals, consistent with the law of total expectation.[4] In practice, computing marginals analytically is straightforward for simple cases but often requires numerical methods or approximations for high-dimensional or non-standard distributions.[3]
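The continuous formula above can be checked numerically. The following Python sketch is a minimal illustration, assuming NumPy and SciPy are available; the bivariate normal joint density and its correlation value are arbitrary choices made only for this example and are not part of the definition.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Illustrative joint density f_{X,Y}: a bivariate normal with correlation 0.6.
# Any integrable joint density could be used in its place.
rho = 0.6
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def marginal_pdf_x(x):
    """Marginal density f_X(x) = integral of f_{X,Y}(x, y) over y, computed numerically."""
    value, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)
    return value

# For this joint density the marginal of X is the standard normal,
# so the numerically integrated values should match stats.norm.pdf.
for x in (-1.0, 0.0, 2.0):
    print(x, marginal_pdf_x(x), stats.norm.pdf(x))
```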
Definition
General concept
In probability theory and statistics, a marginal distribution is the probability distribution of one or more random variables from a larger set, derived from their joint distribution by summing or integrating out the probabilities associated with the remaining variables.[9] This process, known as marginalization, effectively isolates the distribution of the variables of interest while preserving the total probability mass or density.[10] The resulting marginal distribution captures the behavior of the selected variables without regard to the specific values of the others, making it a fundamental tool for simplifying multivariate analyses.[11]
The term "marginal distribution" originates from the practice of recording totals in the margins of joint probability tables, a convention that emerged in early 20th-century statistics with the development of contingency table analysis by Karl Pearson around 1900.[12] Intuitively, obtaining a marginal distribution is akin to collapsing a multi-dimensional contingency table into a lower-dimensional one by summing the entries along rows or columns, thereby focusing on the totals for the variables of interest.[13] In general, for two random variables X and Y with joint distribution P(X, Y), the marginal distribution of X, denoted P(X), is obtained by marginalizing over Y.[14] This framework extends naturally to subsets of any collection of random variables, providing a way to extract univariate or lower-dimensional distributions from more complex joint structures.[10]
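For example, given three discrete random variables X, Y, and Z with joint PMF p_{X,Y,Z}(x,y,z), summing out one or two of the variables yields the bivariate marginal p_{X,Y}(x,y) = \sum_z p_{X,Y,Z}(x,y,z) or the univariate marginal p_X(x) = \sum_y \sum_z p_{X,Y,Z}(x,y,z), depending on which variables are removed; in the continuous case the sums are replaced by integrals over the corresponding densities.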
Discrete case
In the discrete case, the marginal distribution of a random variable X from the joint distribution of discrete random variables X and Y is defined by its probability mass function (PMF), given by p_X(x) = \sum_y p_{X,Y}(x,y), where the sum runs over all possible values of Y and p_{X,Y}(x,y) is the joint PMF.[1][15] This formula extracts the distribution of X by aggregating the joint probabilities across the support of Y. Similarly, the marginal PMF of Y is p_Y(y) = \sum_x p_{X,Y}(x,y).[16][8]
The marginal cumulative distribution function (CDF) of the discrete random variable X is then obtained by summing the marginal PMF up to x: F_X(x) = \sum_{k \leq x} p_X(k), where the sum is over all points k in the support of X that are less than or equal to x.[17][18] This step-function CDF fully characterizes the marginal distribution, reflecting the countable nature of the outcomes.
To compute a marginal PMF in practice, one constructs a joint PMF table representing p_{X,Y}(x,y) over the finite or countable supports of X and Y, then sums the entries along the rows (for p_X(x)) or down the columns (for p_Y(y)), as illustrated below. For instance, consider a simple bivariate discrete distribution in which X takes values {1, 2} and Y takes values {a, b}, with the following joint PMF table:

| | y = a | y = b | Marginal p_X(x) |
|---|---|---|---|
| x=1 | 0.2 | 0.3 | 0.5 |
| x=2 | 0.1 | 0.4 | 0.5 |
| Marginal p_Y(y) | 0.3 | 0.7 | 1.0 |
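
The marginal totals in this table can be reproduced directly. The following is a minimal Python sketch, assuming NumPy is available; the array and variable names are illustrative only.

```python
import numpy as np

# Joint PMF p_{X,Y}(x, y) from the table above:
# rows correspond to x in {1, 2}, columns to y in {a, b}.
joint = np.array([[0.2, 0.3],
                  [0.1, 0.4]])

p_x = joint.sum(axis=1)   # marginal PMF of X: sum across y -> [0.5, 0.5]
p_y = joint.sum(axis=0)   # marginal PMF of Y: sum across x -> [0.3, 0.7]

# Marginal CDF of X, F_X(x) = sum of p_X(k) for k <= x -> [0.5, 1.0]
F_x = np.cumsum(p_x)

print(p_x, p_y, F_x)
```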