A probability density function (PDF), often denoted as f(x), is a nonnegative function that describes the relative likelihood for a continuous random variable X to take on values in a given range, where the actual probability of X falling within an interval [a, b] is computed as the integral \int_a^b f(x) \, dx.[1] Unlike discrete distributions, the PDF assigns zero probability to any exact single value, emphasizing densities rather than point masses, and it must integrate to 1 over its entire support to form a valid probability distribution.[2]

Key properties of a PDF include nonnegativity (f(x) \geq 0 for all x) and normalization (\int_{-\infty}^{\infty} f(x) \, dx = 1), ensuring it represents a proper probability measure.[2] The PDF is closely related to the cumulative distribution function (CDF) F(x) = P(X \leq x), which for continuous variables is the antiderivative of the PDF, i.e., F(x) = \int_{-\infty}^x f(t) \, dt, allowing probabilities to be derived from either function.[3] These functions underpin continuous probability models, such as the uniform distribution on [0, 1] where f(x) = 1 for x \in [0, 1], or the normal distribution with its bell-shaped curve centered at the mean.[1]

In statistical applications, PDFs are essential for modeling real-world phenomena like measurement errors, waiting times, or physical quantities, enabling computations of expectations, variances, and other moments via integrals involving the PDF.[3] For instance, the expected value E[X] = \int_{-\infty}^{\infty} x f(x) \, dx quantifies the average outcome, while higher moments describe spread and shape.[2] This framework extends to multivariate cases, where joint PDFs describe dependencies between multiple continuous variables, forming the basis for advanced topics in probability theory and data analysis.[3]
Fundamental Concepts
Definition
In probability theory, a probability density function (PDF) describes the likelihood of a continuous random variable taking on a particular value within its range, though the probability at any single point is zero. For a continuous random variable X with an absolutely continuous probability distribution, the PDF, denoted f_X(x), is a non-negative function such that the probability that X lies in an interval (a, b) is given by the integral

P(a < X \leq b) = \int_a^b f_X(x) \, dx

for any a < b, where the subscript X indicates the density associated with the random variable X.[1][2][4]

The non-negativity requirement ensures f_X(x) \geq 0 for all x in the real line, reflecting that densities cannot be negative as they represent relative likelihoods.[2][1] Additionally, the PDF must satisfy the normalization condition

\int_{-\infty}^{\infty} f_X(x) \, dx = 1,

which guarantees that the total probability over the entire sample space is unity.[1][2][4]

The support of the PDF is the set of points where f_X(x) > 0, which corresponds to the values over which the random variable X has positive density, distinguishing it from the full range of possible outcomes where the density may be zero outside this set.[1][4] This notation and structure form the foundational framework for univariate continuous distributions.[2]
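As a numerical illustration of these defining conditions, the following Python sketch (not drawn from the cited sources; it assumes NumPy and SciPy are available, and the standard normal density and the interval endpoints are arbitrary example choices) checks nonnegativity, normalization, and an interval probability by quadrature.

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    # Standard normal density, used here as an example PDF f_X(x).
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Normalization: the density must integrate to 1 over the whole real line.
total, _ = quad(f, -np.inf, np.inf)

# Interval probability: P(a < X <= b) is the integral of f over (a, b].
a, b = -1.0, 1.0
p_ab, _ = quad(f, a, b)

print("integral over R:", round(total, 6))    # ~ 1.0
print("P(-1 < X <= 1):", round(p_ab, 6))      # ~ 0.682689
```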
Interpretation and Properties
The probability density function (PDF) of a continuous random variable X, denoted f_X(x), provides the relative likelihood of X taking on values near a specific point x, but it does not represent the probability at that exact point, which is always zero for continuous distributions.[1] Instead, the probability that X falls within an interval (a, b) is given by the area under the PDF curve over that interval, P(a < X < b) = \int_a^b f_X(x) \, dx.[5] This interpretation emphasizes that the PDF describes a density, where higher values of f_X(x) indicate regions of greater concentration of probability, but actual probabilities require integration to accumulate the area.[1]

A fundamental property of any PDF is normalization, ensuring that the total probability across the entire real line sums to one: \int_{-\infty}^{\infty} f_X(x) \, dx = 1.[5] This condition guarantees that the PDF functions as a valid weighting mechanism for the distribution, akin to a weighted average where the weights are the densities integrated over all possible outcomes.[1]

The PDF must also satisfy non-negativity, f_X(x) \geq 0 for all x, and be integrable over the real line, meaning the integral exists and is finite.[5] Continuity is not required; the PDF may exhibit discontinuities at certain points, provided it remains integrable.[6] Additionally, the mode of the distribution is defined as the value x that maximizes f_X(x), representing the point of highest density.[7]

The expectation of a function g(X) of the random variable, E[g(X)], is computed using the PDF as E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx, assuming g is integrable with respect to the PDF.[5] For the simple case of the expected value E[X], this becomes E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx. To derive this, consider partitioning the real line into small intervals of width \Delta x_i around points x_i, where the probability mass in each interval approximates f_X(x_i) \Delta x_i. The contribution to the expectation from each interval is roughly x_i \cdot f_X(x_i) \Delta x_i, analogous to a Riemann sum for the discrete case. Summing over all intervals and taking the limit as \Delta x_i \to 0 yields the integral form, which serves as the continuous analog of the discrete expectation \sum x_i p(x_i).[1] This integral represents the long-run average value of X under repeated sampling from the distribution.[5]
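To make the Riemann-sum argument concrete, the short sketch below (a hedged illustration assuming an Exp(1) density, a truncated integration range, and grid sizes chosen arbitrarily) shows the discretized sum \sum_i x_i f(x_i) \Delta x approaching the exact mean of 1 as the grid is refined.

```python
import numpy as np

f = lambda x: np.exp(-x)   # Exp(1) density on [0, inf); its exact mean is 1

# Riemann-sum approximation of E[X] = sum_i x_i * f(x_i) * dx on a truncated grid;
# the tail beyond x = 30 carries negligible probability mass here.
for n in (10, 100, 10_000):
    x, dx = np.linspace(0, 30, n, retstep=True)
    print(f"n = {n:6d}: E[X] approx {np.sum(x * f(x) * dx):.4f}")
# The approximations converge to the exact integral value 1 as dx -> 0.
```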
Univariate Probability Density Functions
Relation to Cumulative Distribution Function
The cumulative distribution function (CDF) of a univariate random variable X, denoted F_X(x), is defined as F_X(x) = P(X \leq x). For an absolutely continuous random variable with probability density function (PDF) f_X, this CDF takes the explicit form

F_X(x) = \int_{-\infty}^x f_X(t) \, dt,

which represents the accumulated probability up to x as the area under the PDF curve from negative infinity to x.

Conversely, if the CDF F_X is absolutely continuous, then by the fundamental theorem of calculus, the PDF exists and is given by the derivative f_X(x) = \frac{d}{dx} F_X(x) almost everywhere with respect to Lebesgue measure.[8] Absolute continuity of the CDF is a key condition that guarantees this differentiability, ensuring that F_X can be expressed as the integral of its derivative and excluding distributions with singular components, such as those concentrated on sets of Lebesgue measure zero (e.g., Dirac delta distributions).[9]

This bidirectional relationship allows the CDF to be recovered from the PDF via the integral formula provided above, while the PDF can be obtained from the CDF through differentiation when absolute continuity holds. The absolute continuity assumption aligns with the standard framework for continuous distributions, where the PDF fully characterizes the probability measure.[10]
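Both directions of this relationship can be checked numerically. The sketch below (illustrative only; the exponential distribution with rate 2 and the grid spacing are arbitrary choices) builds the CDF by cumulative trapezoidal integration of the PDF and then recovers the PDF by numerical differentiation.

```python
import numpy as np

lam = 2.0
x = np.linspace(0.0, 5.0, 2001)
pdf = lam * np.exp(-lam * x)                     # f_X(x) on x >= 0

# CDF from PDF: F_X(x) is the integral of f_X from the lower end of the support
# up to x, approximated here by a cumulative trapezoidal rule.
cdf = np.concatenate(([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2 * np.diff(x))))

# PDF from CDF: differentiate numerically, recovering f_X almost everywhere.
pdf_back = np.gradient(cdf, x)

print("max CDF error:", np.max(np.abs(cdf - (1 - np.exp(-lam * x)))))
print("max PDF reconstruction error:", np.max(np.abs(pdf_back - pdf)))
```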
Absolutely Continuous Distributions
Absolutely continuous distributions are characterized by probability density functions (PDFs) that integrate to 1 over the real line, providing a smooth description of probability concentrations for continuous random variables. Common families of univariate PDFs illustrate this concept through specific parametric forms that model diverse phenomena, such as uniform outcomes in bounded intervals or decay processes in reliability analysis. These distributions are foundational in statistics and probability theory, enabling precise calculations of probabilities via integration of the PDF.

The uniform distribution represents equal likelihood across a finite interval, serving as a baseline for many sampling methods. Its PDF is given by

f(x) = \frac{1}{b - a}, \quad a \leq x \leq b,

where a < b are the location parameters defining the interval endpoints.[11] Outside this support, f(x) = 0. This form ensures the total probability is 1, as the height is constant and the width is b - a.

The normal distribution, also known as the Gaussian distribution, models symmetric, bell-shaped data around a central value, ubiquitous in natural and social sciences due to the central limit theorem. Its PDF is

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),

defined for all real x, with parameters \mu \in \mathbb{R} (mean or location) and \sigma > 0 (standard deviation or scale).[12] The standard normal case sets \mu = 0 and \sigma = 1.

The exponential distribution captures memoryless waiting times, such as inter-arrival times in Poisson processes, with probabilities decreasing exponentially. Its PDF is

f(x) = \lambda \exp(-\lambda x), \quad x \geq 0,

where \lambda > 0 is the rate parameter.[13] For x < 0, f(x) = 0, and the mean is 1/\lambda.

The gamma distribution generalizes the exponential to model positive, skewed data like rainfall amounts or lifetimes, encompassing the exponential as a special case when the shape parameter is 1. Its PDF is

f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x), \quad x > 0,

with shape parameter \alpha > 0 and rate parameter \beta > 0, where \Gamma denotes the gamma function.[14] This parameterization allows flexibility in tail behavior and variance.

These families emerged in the 19th and early 20th centuries through contributions from mathematicians like Pierre-Simon Laplace, who advanced the normal distribution in error theory around 1812, and Karl Pearson, who formalized aspects of the gamma distribution in his 1895 work on continuous frequency curves.[15][16]
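The sketch below (parameter values chosen arbitrarily for illustration; it relies on the scipy.stats implementations of the same four families) evaluates each density and confirms numerically that it integrates to 1 over its support.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Illustrative parameter choices for the four families discussed above.
densities = {
    "uniform(a=0, b=2)":      stats.uniform(loc=0, scale=2).pdf,
    "normal(mu=1, sigma=2)":  stats.norm(loc=1, scale=2).pdf,
    "exponential(lambda=3)":  stats.expon(scale=1/3).pdf,       # scipy's scale = 1/lambda
    "gamma(alpha=2, beta=5)": stats.gamma(a=2, scale=1/5).pdf,  # scipy's scale = 1/beta
}

for name, pdf in densities.items():
    # A wide finite range covers essentially all the mass; the break points help quad
    # handle the jump discontinuities of the uniform density at 0 and 2.
    total, _ = quad(pdf, -50, 50, points=[0.0, 2.0], limit=200)
    print(f"{name:24s} integrates to {total:.6f}")
```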
Moments and Characteristic Function
The raw moments of a univariate random variable X with probability density function f_X(x) are defined as the expected values \mu_n = \mathbb{E}[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) \, dx for n = 1, 2, \dots, assuming the integral exists.[17] The first raw moment \mu_1 corresponds to the mean \mu = \mathbb{E}[X], while higher-order raw moments capture additional distributional information. Central moments, which measure deviations from the mean, are given by \mathbb{E}[(X - \mu)^n] = \int_{-\infty}^{\infty} (x - \mu)^n f_X(x) \, dx.[17]

The variance, as the second central moment, quantifies the spread of the distribution and is expressed as \operatorname{Var}(X) = \mathbb{E}[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x) \, dx, or equivalently, \operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2.[17] These moment formulas provide a way to summarize key features of the distribution directly from the density function, with the integrals converging under appropriate integrability conditions on f_X(x).[17]

The characteristic function of X, denoted \phi_X(t) = \mathbb{E}[e^{itX}], offers a Fourier transform-based representation of the distribution and is defined via the integral \phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx for real t.[18] This function encodes all probabilistic information about X, including its moments: if the n-th moment exists, then \mathbb{E}[X^n] = (-i)^n \frac{d^n}{dt^n} \phi_X(0), where derivatives are evaluated at t = 0.[18][19]

Under suitable conditions, such as absolute continuity of the distribution, the characteristic function uniquely determines the probability density function, as distinct distributions yield distinct characteristic functions.[18][19] This uniqueness theorem facilitates inversion techniques to recover the density from \phi_X(t), underscoring its role in probabilistic analysis.[18]
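A numerical sketch of these relations (illustrative only, using an exponential distribution with rate 2 and a finite-difference estimate of the derivative; the variable names are arbitrary) computes the first two raw moments, the variance, and the characteristic function, then checks the first-moment identity E[X] = -i \phi_X'(0).

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)     # Exp(2) density on [0, inf)

# Raw moments mu_n = E[X^n] = integral of x^n f(x) dx.
moment = lambda n: quad(lambda x: x**n * f(x), 0, np.inf)[0]
mean = moment(1)
var = moment(2) - mean**2
print(f"mean = {mean:.4f} (exact 0.5), variance = {var:.4f} (exact 0.25)")

# Characteristic function phi(t) = E[e^{itX}], split into real and imaginary parts.
def phi(t):
    re = quad(lambda x: np.cos(t * x) * f(x), 0, np.inf)[0]
    im = quad(lambda x: np.sin(t * x) * f(x), 0, np.inf)[0]
    return re + 1j * im

# Moment identity: E[X] = -i * phi'(0), with phi'(0) estimated by a central difference.
h = 1e-4
dphi0 = (phi(h) - phi(-h)) / (2 * h)
print(f"-i * phi'(0) = {(-1j * dphi0).real:.4f}")
```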
Connection to Discrete and Mixed Distributions
Probability Mass Functions
In contrast to the probability density function (PDF) used for continuous random variables, the probability mass function (PMF) describes the distribution of a discrete random variable X taking values in a countable set, such as the integers. The PMF, denoted p_X(k) = P(X = k), assigns a probability to each possible value k of X, satisfying p_X(k) \geq 0 for all k and \sum_k p_X(k) = 1. Unlike PDFs, which involve integrals to compute probabilities over intervals, PMFs directly provide probabilities at discrete points without integration, as the total probability is the sum over all points.[20][21]

The connection between PMFs and PDFs arises in the limiting case where a discrete distribution approximates a continuous one. Consider discretizing the real line into bins of width \Delta; the PMF value at a point k approximates the PDF via p_X(k) \approx f_X(k) \Delta, where f_X is the underlying density. As \Delta \to 0, the sum \sum_k p_X(k) g(k) over a function g converges to the integral \int f_X(x) g(x) \, dx, illustrating how discrete probabilities become continuous densities in the limit. This approximation is fundamental in deriving continuous distributions from discrete models, such as histograms converging to smooth densities.[21][1]

For discrete distributions, an informal way to represent the PMF as a "density" uses the Dirac delta function, allowing treatment within the continuous framework. The PMF can be expressed as f_X(x) = \sum_k p_X(k) \delta(x - k), where \delta is the Dirac delta satisfying \int \delta(x - k) g(x) \, dx = g(k) for a test function g. This sum of weighted deltas places probability mass at discrete points, enabling integrals like P(a < X \leq b) = \int_a^b f_X(x) \, dx = \sum_{k \in (a,b]} p_X(k), though the Dirac delta is a distribution rather than a classical function. Such representations are useful in advanced probability for unifying discrete and continuous cases.[22][23]

A simple example is the Bernoulli distribution, a discrete random variable X with P(X=0) = 1-p and P(X=1) = p for 0 < p < 1, so the PMF is p_X(k) = (1-p)^{1-k} p^k for k=0,1. To place this within the continuous framework, the discrete masses at 0 and 1 can be modeled by deltas weighted by 1-p and p, or smeared into narrow uniform densities of height (1-p)/\Delta and p/\Delta over intervals of length \Delta, yielding a PDF that integrates to the original probabilities as \Delta \to 0. The smeared uniform densities only approximate the distribution for finite \Delta, whereas the Dirac delta representation captures the exact discrete nature precisely.[20][21]
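The discretization relation p_X(k) \approx f_X(x_k) \Delta can be seen numerically. In the sketch below (a rough illustration; the standard normal, the bin widths, and the truncation at \pm 5 are arbitrary choices), exact bin probabilities F(x + \Delta) - F(x) are compared with the density-times-width approximation as \Delta shrinks.

```python
import numpy as np
from scipy import stats

# Discretize a standard normal into bins of width delta; the exact mass in each bin,
# F(x + delta) - F(x), should approach f(x_mid) * delta as delta -> 0.
for delta in (1.0, 0.1, 0.01):
    edges = np.arange(-5, 5 + delta, delta)
    mids = (edges[:-1] + edges[1:]) / 2
    pmf = np.diff(stats.norm.cdf(edges))        # exact bin probabilities
    approx = stats.norm.pdf(mids) * delta       # density-times-width approximation
    print(f"delta = {delta:5.2f}: max |pmf - f*delta| = {np.max(np.abs(pmf - approx)):.2e}")
```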
Generalization to Signed and Complex Measures
In measure theory, the concept of a probability density function generalizes beyond positive probability measures through the Radon-Nikodym theorem, which defines the density of one measure with respect to another. Specifically, if P and \mu are \sigma-finite measures on a measurable space (\Omega, \mathcal{F}) with P \ll \mu (i.e., P is absolutely continuous with respect to \mu), then there exists a \mu-integrable function f: \Omega \to \mathbb{R} such that P(A) = \int_A f \, d\mu for all A \in \mathcal{F}, and this f is unique up to \mu-almost everywhere equality; here, f = \frac{dP}{d\mu} is the Radon-Nikodym derivative, serving as the generalized density.[24] In the standard probabilistic setting, \mu is the Lebesgue measure on \mathbb{R}^d, and f is nonnegative with \int f \, d\mu = 1, but the framework allows extensions where these constraints do not hold.[25]

For mixed distributions, the Lebesgue decomposition theorem provides a canonical breakdown of any probability measure P on (\mathbb{R}, \mathcal{B}) with respect to Lebesgue measure \lambda. It states that P decomposes uniquely as P = P_{ac} + P_d + P_{sc}, where P_{ac} \ll \lambda (absolutely continuous part, admitting a density f_{ac} such that P_{ac}(A) = \int_A f_{ac} \, d\lambda), P_d is atomic (discrete part, concentrated on countable points with masses summing to at most 1), and P_{sc} is singular continuous (neither absolutely continuous nor atomic, supported on sets of Lebesgue measure zero but without point masses).[26] This decomposition captures mixed distributions, where both continuous and discrete components contribute, such as the distribution of a random variable that is continuous on an interval but has a point mass at a boundary; the density then refers only to the P_{ac} component, while the full description requires specifying all parts.[27]

Signed measures extend densities to settings where the total variation may not normalize to 1 and where f can take negative values. A signed measure \nu on (\Omega, \mathcal{F}) is absolutely continuous with respect to a positive measure \mu if |\nu| \ll \mu, where |\nu| is the total variation measure; by the Radon-Nikodym theorem for signed measures, there then exists an integrable f: \Omega \to \mathbb{R} (possibly negative) such that \nu(A) = \int_A f \, d\mu for all A \in \mathcal{F}, with \int |f| \, d\mu = |\nu|(\Omega) < \infty.[25] In discrepancy theory, such signed densities arise in analyzing the deviation between empirical point distributions and uniform measures; for instance, the local discrepancy function for a point set in [0,1]^d can be expressed as the integral of a signed density f with respect to Lebesgue measure, quantifying how well the points approximate uniformity, where \int f \, d\lambda measures net signed mass rather than probability.[28]

Complex measures and densities further generalize the framework, particularly in applications requiring phase information. A complex measure \nu on (\Omega, \mathcal{F}) is a countably additive set function with values in \mathbb{C}, decomposable via the Jordan decomposition into positive and negative imaginary parts; if \nu \ll \mu for a positive \mu, the Radon-Nikodym derivative f = \frac{d\nu}{d\mu} is a complex-valued integrable function such that \nu(A) = \int_A f \, d\mu.[29] In quantum mechanics, the wave function \psi: \mathbb{R}^d \to \mathbb{C} acts as such a complex density with respect to Lebesgue measure, where the probability density is given by |\psi(x)|^2, normalized so that \int |\psi(x)|^2 \, d\lambda(x) = 1, but \psi itself encodes interference effects through its complex phases; this structure ensures that probabilities are real and nonnegative while allowing the amplitude \psi to be complex.[30] Similar complex densities appear in signal processing for analytic representations, where the complex envelope of a bandpass signal has a density whose modulus squared yields the instantaneous power, facilitating computations in the frequency domain.[31]
Multivariate Probability Density Functions
Joint Densities
In the multivariate setting, the joint probability density function (PDF) describes the distribution of a random vector \mathbf{X} = (X_1, \dots, X_n)^\top, where each X_i is a continuous random variable. The joint PDF, denoted f_{\mathbf{X}}(x_1, \dots, x_n), is a non-negative function defined on \mathbb{R}^n such that for any Borel set A \subseteq \mathbb{R}^n, the probability P(\mathbf{X} \in A) equals the integral of f_{\mathbf{X}} over A.[32]

Specifically, for a rectangular region defined by vectors \mathbf{a} = (a_1, \dots, a_n)^\top and \mathbf{b} = (b_1, \dots, b_n)^\top with a_i < b_i for all i, the probability P(\mathbf{a} < \mathbf{X} \leq \mathbf{b}) is given by

P(\mathbf{a} < \mathbf{X} \leq \mathbf{b}) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f_{\mathbf{X}}(x_1, \dots, x_n) \, dx_1 \cdots dx_n.

A fundamental property is that f_{\mathbf{X}}(x_1, \dots, x_n) \geq 0 for all (x_1, \dots, x_n) \in \mathbb{R}^n, and the total integral over the entire space is 1:

\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, \dots, x_n) \, dx_1 \cdots dx_n = 1.

These conditions ensure that the joint PDF normalizes the probabilities correctly across the n-dimensional space.[33][34]

The support of the joint PDF consists of the hyperregions in \mathbb{R}^n where f_{\mathbf{X}} > 0, which may be the entire space or a lower-dimensional subset depending on the distribution; outside this support, the density is zero, concentrating the probability mass accordingly.[34]

To obtain the marginal PDF of a single component, say X_1, integrate the joint PDF over the other variables:

f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, x_2, \dots, x_n) \, dx_2 \cdots dx_n.

This process reduces the multivariate density to a univariate one.[35]

A prominent example is the bivariate normal distribution for n=2, where the joint PDF of \mathbf{X} = (X_1, X_2)^\top with means \mu_1, \mu_2, variances \sigma_1^2, \sigma_2^2, and correlation \rho is

f_{\mathbf{X}}(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} - \frac{2\rho (x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} \right] \right),

for |\rho| < 1; this form extends the univariate normal to capture dependence between variables.[36]
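As a numerical check (not taken from the cited sources; the parameter values, the evaluation point, and the rectangle are arbitrary), the sketch below writes out the bivariate normal formula above, compares it pointwise with scipy's implementation, and integrates it over a rectangle to obtain a probability.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

# Illustrative bivariate normal parameters.
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.5

def f(x1, x2):
    # Joint PDF written out from the bivariate normal formula in the text.
    q = ((x1 - mu1) ** 2 / s1 ** 2 + (x2 - mu2) ** 2 / s2 ** 2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2))
    return np.exp(-q / (2 * (1 - rho ** 2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# The formula agrees pointwise with scipy's implementation of the same density.
cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
rv = multivariate_normal(mean=[mu1, mu2], cov=cov)
print(f(0.3, -0.7), rv.pdf([0.3, -0.7]))

# Probability of the rectangle (-1, 1] x (0, 2], obtained by integrating the joint PDF;
# the inner variable of dblquad is x2, the outer is x1.
p_rect, _ = dblquad(lambda x2, x1: f(x1, x2), -1, 1, 0, 2)
print(f"P(-1 < X1 <= 1, 0 < X2 <= 2) = {p_rect:.6f}")
```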
Marginal and Conditional Densities
In multivariate probability distributions, the marginal probability density function (PDF) of a single component X_i is obtained by integrating the joint PDF f_{\mathbf{X}}(\mathbf{x}) over all other variables. For a random vector \mathbf{X} = (X_1, \dots, X_n), the marginal PDF is given by

f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, \dots, x_n) \, dx_1 \cdots \widehat{dx_i} \cdots dx_n,

where the integral is taken over all variables except X_i, denoted as dx_{-i}.[37] This process "marginalizes out" the dependence on the other components, yielding the univariate distribution of X_i alone.[38]

The conditional PDF describes the distribution of one or more variables given the values of others. For continuous random variables X and Y with joint PDF f_{X,Y}(x,y) and marginal PDF f_Y(y) > 0, the conditional PDF of X given Y = y is

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.[39]

This ratio normalizes the joint density along the conditioning variable, providing the density of X restricted to the event Y = y.[40] In the multivariate case, the conditional PDF of a subset of variables given the rest follows analogously by dividing the joint PDF by the marginal PDF of the conditioning subset.[41]

A key consequence is the chain rule for joint PDFs, which decomposes the multivariate density into a product of marginal and conditional densities. For \mathbf{X} = (X_1, \dots, X_n), the joint PDF satisfies

f_{\mathbf{X}}(x_1, \dots, x_n) = f_{X_1}(x_1) \prod_{i=2}^n f_{X_i | X_1, \dots, X_{i-1}}(x_i | x_1, \dots, x_{i-1}),

assuming all conditional densities are well-defined.[42] This factorization, akin to the law of total probability, allows recursive computation of joint probabilities from sequential conditionals and is foundational for modeling high-dimensional distributions.[43]

As an illustration, consider the bivariate normal distribution, where \mathbf{X} = (X, Y) follows a joint normal PDF with mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}. The marginal PDF of X is then univariate normal with mean \mu_X and variance \sigma_{XX}, the corresponding elements from \boldsymbol{\mu} and \boldsymbol{\Sigma}.[44] This property holds because integration of the quadratic form in the bivariate normal exponent yields the univariate form, preserving normality under marginalization.
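The sketch below illustrates marginalization and conditioning numerically for an assumed bivariate normal joint density (the covariance values are arbitrary illustration choices); the marginal is obtained by integrating out the second variable, and the conditional as the ratio of joint to marginal.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

# Assumed joint density: X has variance 1, Y has variance 4, correlation 0.6.
cov = np.array([[1.0, 1.2], [1.2, 4.0]])
joint = multivariate_normal(mean=[0.0, 0.0], cov=cov)

def marginal_x(x):
    # f_X(x): integrate the joint density over y, i.e. marginalize out Y.
    return quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)[0]

def conditional_x_given_y(x, y):
    # f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y), with f_Y computed the same way.
    f_y = quad(lambda t: joint.pdf([t, y]), -np.inf, np.inf)[0]
    return joint.pdf([x, y]) / f_y

# The numerical marginal matches the known N(0, 1) marginal of X.
print(marginal_x(0.5), norm.pdf(0.5))
# The conditional of X given Y = 1 is N(0.3, 0.64); compare at x = 0.3.
print(conditional_x_given_y(0.3, 1.0), norm.pdf(0.3, loc=0.3, scale=0.8))
```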
Independence and Copulas
In the context of multivariate probability density functions, two continuous random variables X and Y with joint density f_{X,Y}(x,y) are statistically independent if and only if the joint density factors into the product of the marginal densities, that is,

f_{X,Y}(x,y) = f_X(x) f_Y(y)

for all x, y in the respective supports of X and Y. This factorization implies that the occurrence of one variable provides no information about the other, allowing separate parameterization of each marginal distribution without affecting the joint structure.

Under independence, expectations of products of functions of these variables simplify significantly. Specifically, for measurable functions g and h,

\mathbb{E}[g(X) h(Y)] = \mathbb{E}[g(X)] \mathbb{E}[h(Y)],

provided the expectations exist. This property extends to higher moments and facilitates computations in scenarios like variance of sums, where \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y), highlighting the absence of covariance.

While independence assumes no dependence, real-world multivariate data often exhibits complex dependencies that cannot be captured by simple products of marginals. Copulas provide a framework to model such dependencies separately from the marginal distributions. According to Sklar's theorem, any multivariate cumulative distribution function (CDF) F_{X,Y}(x,y) can be expressed as

F_{X,Y}(x,y) = C(F_X(x), F_Y(y)),

where C is a copula (a joint CDF on [0,1]^2 with uniform marginals) and F_X, F_Y are the marginal CDFs. For absolutely continuous distributions, the copula density c(u,v) exists and is given by

c(u,v) = \frac{\partial^2 C(u,v)}{\partial u \partial v},

yielding the joint density as f_{X,Y}(x,y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y). This decomposition allows flexible modeling: marginals can be fitted empirically or parametrically, while the copula captures the dependence structure.

A prominent example is the Gaussian copula, derived from the multivariate normal distribution. For bivariate normal marginals with correlation \rho, the copula C(u,v; \rho) is the CDF of the transformed uniforms via the inverse normal CDF, preserving linear correlation in the latent Gaussian space. This copula is widely used in finance for modeling joint defaults, as it links arbitrary marginals (e.g., non-normal) to a Gaussian dependence pattern.
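The density form of Sklar's theorem can be sketched in code. The example below is an illustration under assumed choices (correlation 0.7, an Exp(1) marginal for X and a standard normal marginal for Y, loose integration tolerances for speed): it builds a joint density from a Gaussian copula density times the two marginal densities and checks that it integrates to roughly 1.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import norm, expon, multivariate_normal

rho = 0.7                                              # assumed copula correlation
biv = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

def copula_density(u, v):
    # Gaussian copula density c(u, v) = phi_2(z_u, z_v; rho) / (phi(z_u) * phi(z_v)).
    zu, zv = norm.ppf(u), norm.ppf(v)
    return biv.pdf([zu, zv]) / (norm.pdf(zu) * norm.pdf(zv))

def joint_density(x, y):
    # Density form of Sklar's theorem: f(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y),
    # coupling an Exp(1) marginal for X with a standard normal marginal for Y.
    return copula_density(expon.cdf(x), norm.cdf(y)) * expon.pdf(x) * norm.pdf(y)

# Sanity check: the constructed joint density integrates to about 1. The x-range
# starts slightly above 0 so norm.ppf is never evaluated at exactly 0.
total, _ = dblquad(lambda y, x: joint_density(x, y), 1e-6, 15, -8, 8,
                   epsabs=1e-4, epsrel=1e-4)
print("total mass:", round(total, 4))
```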
Transformations and Derived Distributions
Change of Variables Formula
In probability theory, the change of variables formula provides a method to determine the probability density function (PDF) of a transformed random variable when the transformation is invertible and sufficiently smooth. Consider a random vector \mathbf{X} with PDF f_{\mathbf{X}}(\mathbf{x}) defined on a support in \mathbb{R}^n, and let \mathbf{Y} = g(\mathbf{X}) where g is a diffeomorphism, meaning it is continuously differentiable, invertible, and has a continuously differentiable inverse. The PDF of \mathbf{Y}, denoted f_{\mathbf{Y}}(\mathbf{y}), is given by

f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}\left(g^{-1}(\mathbf{y})\right) \left| \det J_{g^{-1}}(\mathbf{y}) \right|,

where J_{g^{-1}}(\mathbf{y}) is the Jacobian matrix of the inverse transformation evaluated at \mathbf{y}.[45][46]

The Jacobian matrix J_{g^{-1}}(\mathbf{y}) is the n \times n matrix whose (i,j)-th entry is the partial derivative \frac{\partial x_i}{\partial y_j}, with \mathbf{x} = g^{-1}(\mathbf{y}). The determinant of this matrix, \det J_{g^{-1}}(\mathbf{y}), quantifies the local scaling of volumes under the transformation from the \mathbf{x}-space to the \mathbf{y}-space, ensuring that the integral of the PDF over any region remains a valid probability by adjusting for the distortion in infinitesimal volumes. This formula applies uniformly to both univariate (n=1) and multivariate cases, reducing to the absolute value of the derivative in the scalar setting.[45][46]

The absolute value around the Jacobian determinant is essential because the transformation may reverse orientation, causing the determinant to be negative, but the PDF must remain non-negative to integrate to 1 over its support. Without the absolute value, the formula could yield negative densities in regions where the mapping flips the coordinate system, violating the fundamental properties of a PDF.[45][46]

A sketch of the proof relies on the change of variables theorem from multivariable calculus. The distribution of \mathbf{Y} satisfies \mathbb{P}(\mathbf{Y} \in B) = \int_{g^{-1}(B)} f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} for any measurable set B. Substituting the change of variables \mathbf{x} = g^{-1}(\mathbf{y}) into the integral yields d\mathbf{x} = \left| \det J_{g^{-1}}(\mathbf{y}) \right| d\mathbf{y}, so \mathbb{P}(\mathbf{Y} \in B) = \int_B f_{\mathbf{X}}(g^{-1}(\mathbf{y})) \left| \det J_{g^{-1}}(\mathbf{y}) \right| \, d\mathbf{y}, which defines the PDF of \mathbf{Y}. This holds under the assumptions of differentiability and invertibility, preserving the total probability measure.[45][46]
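The formula can be verified numerically for a specific nonlinear diffeomorphism. The sketch below is illustrative only: the map g(x_1, x_2) = (e^{x_1}, x_1 + x_2), the standard bivariate normal input, and the test rectangle are all arbitrary choices. It compares a Monte Carlo estimate of a rectangle probability in y-space with the integral of the transformed density.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
f_X = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf   # density of X

# Diffeomorphism g(x1, x2) = (exp(x1), x1 + x2); its inverse is (log y1, y2 - log y1)
# and |det J_g| = exp(x1) = y1, so the change-of-variables formula gives:
def f_Y(y1, y2):
    x1, x2 = np.log(y1), y2 - np.log(y1)
    return f_X([x1, x2]) / y1

# Monte Carlo: simulate X, push it through g, and estimate a rectangle probability.
x = rng.standard_normal((1_000_000, 2))
y1, y2 = np.exp(x[:, 0]), x[:, 0] + x[:, 1]
mc = np.mean((y1 > 0.5) & (y1 <= 2.0) & (y2 > -1.0) & (y2 <= 1.0))

# Same probability from the transformed density (inner variable is y2, outer is y1).
integral, _ = dblquad(lambda v, u: f_Y(u, v), 0.5, 2.0, -1.0, 1.0)
print(f"Monte Carlo: {mc:.4f}, change-of-variables integral: {integral:.4f}")
```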
Scalar to Scalar Transformations
In the context of univariate random variables, consider a continuous random variable X with probability density function (PDF) f_X(x) defined on some support, and let Y = g(X) where g: \mathbb{R} \to \mathbb{R} is a differentiable function. The PDF of Y, denoted f_Y(y), can be derived using the method of transformations, which preserves probability mass under the mapping.[47]

For a strictly monotonic g, the transformation is one-to-one, and the support of Y is the image g(\{x : f_X(x) > 0\}), mapping intervals in the support of X directly to intervals in the support of Y. If g is strictly increasing, then g has an inverse g^{-1}, and

f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{1}{g'(g^{-1}(y))}, \quad y \in g(\operatorname{supp}(X)).

If g is strictly decreasing, the formula adjusts for the direction of mapping by incorporating the absolute value:

f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{1}{|g'(g^{-1}(y))|}, \quad y \in g(\operatorname{supp}(X)).

This ensures the PDF remains non-negative and integrates to 1, as the derivative term accounts for the stretching or compression of probability densities under the transformation.[47][48]

For non-monotonic g, the transformation is not one-to-one, and multiple values of x may map to the same y. In such cases, the PDF of Y is obtained by summing contributions from each branch of the inverse:

f_Y(y) = \sum_k \frac{f_X(x_k)}{|g'(x_k)|},

where the sum is over all x_k such that g(x_k) = y and f_X(x_k) > 0. The support of Y is then the range of g over the support of X, but intervals may fold or overlap, requiring careful identification of the preimages for each y. This approach, often derived via the cumulative distribution function technique for verification, handles cases like quadratic or absolute value transformations.[49][50]

As an illustrative example, suppose X \sim \operatorname{Exp}(\lambda) with PDF f_X(x) = \lambda e^{-\lambda x} for x > 0, and let Y = -\log(X). Here, g(x) = -\log(x) is strictly decreasing and differentiable on (0, \infty), with inverse g^{-1}(y) = e^{-y} and g'(x) = -1/x, so |g'(x)| = 1/x. The support of X is (0, \infty), which maps to (-\infty, \infty) under g. Substituting into the monotonic formula yields

f_Y(y) = \lambda e^{-\lambda e^{-y}} \cdot e^{-y}, \quad y \in (-\infty, \infty).

This is the PDF of a Gumbel distribution with location parameter \log(\lambda) and scale parameter 1, a type of extreme value distribution related to transformations of exponential variables in extreme value theory.[51][52]
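The exponential-to-Gumbel example can be checked by simulation. The sketch below (an illustration with an arbitrarily chosen rate \lambda = 1.5) draws from \operatorname{Exp}(\lambda), applies g(x) = -\log(x), and compares a histogram of the results with the derived density and with scipy's Gumbel density at location \log(\lambda).

```python
import numpy as np
from scipy import stats

lam = 1.5                                   # assumed rate for the exponential example
rng = np.random.default_rng(1)

# Simulate X ~ Exp(lam) and apply the strictly decreasing map g(x) = -log(x).
x = rng.exponential(scale=1 / lam, size=500_000)
y = -np.log(x)

# Density derived in the text via the monotone change-of-variables formula.
f_Y = lambda t: lam * np.exp(-lam * np.exp(-t)) * np.exp(-t)

# Compare a histogram of the simulated Y with the derived density, and the derived
# density with scipy's right-skewed Gumbel at location log(lam), scale 1.
hist, edges = np.histogram(y, bins=200, range=(-4, 8), density=True)
mids = (edges[:-1] + edges[1:]) / 2
print("max |histogram - derived density|:",
      float(np.max(np.abs(hist - f_Y(mids)))))
print("max |derived - gumbel_r(loc=log(lam))|:",
      float(np.max(np.abs(f_Y(mids) - stats.gumbel_r.pdf(mids, loc=np.log(lam))))))
```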
Vector to Vector Transformations
In the context of multivariate probability density functions, vector-to-vector transformations describe how the joint PDF of a random vector \mathbf{X} \in \mathbb{R}^n changes under an invertible mapping \mathbf{Y} = \mathbf{g}(\mathbf{X}), where \mathbf{g} is a differentiable diffeomorphism preserving the dimension n.[53] This extends the univariate change of variables by incorporating the full Jacobian matrix to account for multidimensional distortions.[54]

For linear transformations, consider \mathbf{Y} = A\mathbf{X} + \mathbf{b}, where A is an n \times n invertible matrix and \mathbf{b} \in \mathbb{R}^n. The joint PDF of \mathbf{Y} is given by

f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(A^{-1}(\mathbf{y} - \mathbf{b})) \cdot \frac{1}{|\det A|},

where |\det A| scales the density to preserve total probability mass.[55] This formula arises because the constant Jacobian determinant \det A adjusts for the volume change induced by the linear mapping.[56]

Nonlinear vector-to-vector transformations follow a similar principle but with a position-dependent Jacobian. For a general diffeomorphism \mathbf{Y} = \mathbf{g}(\mathbf{X}), the joint PDF is

f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(\mathbf{g}^{-1}(\mathbf{y})) \cdot \frac{1}{|\det J_{\mathbf{g}}(\mathbf{g}^{-1}(\mathbf{y}))|},

where J_{\mathbf{g}} denotes the Jacobian matrix of \mathbf{g}.[53] This ensures the transformation remains valid only for bijective mappings that maintain the n-dimensional support without collapse or expansion.[54]

An illustrative example is the rotation of a bivariate normal distribution, where \mathbf{X} \sim \mathcal{N}_2(\boldsymbol{\mu}, \Sigma) undergoes an orthogonal transformation \mathbf{Y} = R\mathbf{X}, with R a rotation matrix satisfying R^T R = I and |\det R| = 1. The joint PDF form is preserved, yielding \mathbf{Y} \sim \mathcal{N}_2(R\boldsymbol{\mu}, R\Sigma R^T), as the unit determinant implies no density scaling beyond the covariance adjustment.[57] This property highlights rotational invariance in the isotropic case where \Sigma = \sigma^2 I.[58]
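The rotation example can be illustrated directly. The sketch below (arbitrary mean, covariance, and rotation angle, assumed only for illustration) rotates samples from a bivariate normal, checks that the sample covariance is close to R\Sigma R^T, and confirms that the density value is preserved at corresponding points since |\det R| = 1.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Anisotropic bivariate normal and a rotation by 30 degrees.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Sample X, rotate to get Y = R X, and compare the sample covariance with R Sigma R^T.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ R.T
print(np.round(np.cov(y, rowvar=False), 3))
print(np.round(R @ Sigma @ R.T, 3))

# With |det R| = 1 the density is preserved pointwise: f_Y(R x0) = f_X(x0).
x0 = np.array([0.3, 0.7])
f_X = multivariate_normal(mu, Sigma).pdf
f_Y = multivariate_normal(R @ mu, R @ Sigma @ R.T).pdf
print(f_X(x0), f_Y(R @ x0))    # the two values agree
```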
Operations on Independent Random Variables
Sums and Convolutions
When two continuous random variables X and Y are independent, the probability density function (PDF) of their sum Z = X + Y is obtained through the convolution of their individual PDFs. This arises because the joint PDF factors as f_{X,Y}(x,y) = f_X(x) f_Y(y), allowing the marginal PDF of Z to be computed by integrating over the possible values of X (or Y). Specifically, the PDF f_Z(z) is given by the integral

f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x) \, dx,

which represents the convolution f_X * f_Y.[59]

The independence assumption is crucial, as it simplifies the joint distribution and enables this direct integral form; without independence, the PDF of the sum would require the full joint PDF and a more general transformation approach. The convolution operation is associative, meaning that for independent random variables X, Y, and W, the PDF of (X + Y) + W equals that of X + (Y + W), facilitating extensions to sums of multiple variables. Additionally, the characteristic function of the sum multiplies under independence: \phi_{X+Y}(t) = \phi_X(t) \phi_Y(t), providing an alternative method to derive the PDF via Fourier inversion (as detailed in the Moments and Characteristic Function section).[59]

A concrete example illustrates this: the sum of two independent exponential random variables with the same rate parameter \lambda > 0 has a gamma PDF with shape parameter 2 and rate \lambda. If X \sim \operatorname{Exp}(\lambda) and Y \sim \operatorname{Exp}(\lambda), then f_X(x) = \lambda e^{-\lambda x} for x \geq 0 and similarly for f_Y, yielding

f_Z(z) = \int_{0}^{z} \lambda e^{-\lambda x} \lambda e^{-\lambda (z - x)} \, dx = \lambda^2 z e^{-\lambda z}, \quad z \geq 0,

which is the gamma(2, \lambda) density. This result generalizes to the sum of n such exponentials following a gamma(n, \lambda) distribution.[59]
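The exponential example can be reproduced by discretizing the convolution integral. The sketch below (an approximation on an arbitrary grid, with \lambda = 1 assumed) convolves the Exp(\lambda) density with itself numerically and compares the result with the gamma(2, \lambda) density.

```python
import numpy as np
from scipy import stats
from scipy.signal import fftconvolve

lam = 1.0
dx = 0.001
x = np.arange(0, 20, dx)
f = lam * np.exp(-lam * x)                  # Exp(lam) density sampled on a grid

# Discrete approximation of the convolution (f * f)(z) = integral of f(t) f(z - t) dt.
conv = fftconvolve(f, f)[: len(x)] * dx

# Compare with the gamma(2, lam) density lam^2 z e^{-lam z}.
gamma2 = stats.gamma(a=2, scale=1 / lam).pdf(x)
print("max abs difference:", float(np.max(np.abs(conv - gamma2))))
```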
Products and Quotients
When two independent continuous random variables X and Y with probability density functions f_X(x) and f_Y(y) are considered, the probability density function of their product Z = XY is derived via transformation of variables, yielding

f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y\left(\frac{z}{x}\right) \frac{1}{|x|} \, dx,

where the integral is over x \neq 0 to account for the Jacobian of the transformation. This formula arises from the joint density f_{X,Y}(x,y) = f_X(x) f_Y(y) due to independence, followed by integrating over the curve xy = z with the appropriate change-of-variable adjustment. The factor 1/|x| emerges as the absolute value of the derivative in the transformation, ensuring the density integrates to unity.

For the quotient W = X/Y of the same independent variables, the probability density function is

f_W(w) = \int_{-\infty}^{\infty} f_X(w y) f_Y(y) |y| \, dy,

obtained similarly by transforming to the relation x = w y and incorporating the Jacobian determinant |y| from the change of variables. Independence ensures the joint density separates, allowing the marginal PDF of W to be expressed as this integral over y, with the |y| term adjusting for the scaling in the transformation.

An alternative method for the product, particularly when X > 0 and Y > 0, employs a logarithmic transformation: let U = \log X and V = \log Y, so \log Z = U + V. The PDF of the sum U + V is found via convolution, and the PDF of Z is then recovered by the scalar transformation formula applied to the exponential back-transformation Z = e^{U+V}, leveraging the monotone nature of the exponential function. This approach highlights the separability enabled by independence, mirroring additive structures in logarithmic scale while requiring positivity for the logs to be defined.
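A short numerical sketch (illustrative only; it assumes two Exp(1) variables, an arbitrary evaluation point, and a crude narrow-window Monte Carlo density estimate) evaluates the product and quotient integrals above and compares them with simulation.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(3)

def f(t):
    # Exp(1) density, zero outside the positive half-line.
    return np.exp(-t) if t > 0 else 0.0

def product_pdf(z):
    # f_Z(z) = integral of f_X(x) f_Y(z / x) / |x| dx over x > 0.
    return quad(lambda x: f(x) * f(z / x) / x, 0, np.inf, limit=200)[0]

def quotient_pdf(w):
    # f_W(w) = integral of f_X(w y) f_Y(y) |y| dy over y > 0.
    return quad(lambda y: f(w * y) * f(y) * y, 0, np.inf, limit=200)[0]

# Monte Carlo check: estimate each density near a point from a narrow window.
x, y = rng.exponential(size=(2, 1_000_000))
for pdf, sample, label in [(product_pdf, x * y, "product"), (quotient_pdf, x / y, "quotient")]:
    pt = 0.5
    mc = np.mean(np.abs(sample - pt) < 0.01) / 0.02
    print(f"{label}: formula {pdf(pt):.3f}, Monte Carlo {mc:.3f}")
```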
Examples of Specific Derived Distributions
One prominent example of a derived distribution arises from the quotient of two independent standard normal random variables. Let $X \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,1)$ be independent, and define $Z = X/Y$. The probability density function of $Z$ is given by
f_Z(z) = \frac{1}{\pi (1 + z^2)}, \quad -\infty < z < \infty,

which is the standard Cauchy distribution.[60] This result, first systematically studied by Geary in 1930, highlights how ratios of normals can produce heavy-tailed distributions without finite moments.[61]

Another illustrative case is the sum of independent uniform random variables on [0,1]. For n such variables U_1, \dots, U_n, the sum S_n = \sum_{i=1}^n U_i follows the Irwin-Hall distribution, whose PDF is a piecewise polynomial expressed as

f_{S_n}(x) = \frac{1}{(n-1)!} \sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k} (x - k)^{n-1}, \quad 0 \leq x \leq n.

This density, derived through repeated convolution, starts as a triangular form for n=2 and evolves into a smoother, bell-shaped curve approximating a normal distribution for large n by the central limit theorem.[62]

The product of two independent exponential random variables also yields a non-standard form. Consider X \sim \operatorname{Exp}(1) and Y \sim \operatorname{Exp}(1), independent, with Z = XY. The PDF of Z is

f_Z(z) = 2 K_0(2 \sqrt{z}), \quad z > 0,

where K_0 is the modified Bessel function of the second kind of order zero. This distribution, while not belonging to the gamma family, shares some tail behaviors reminiscent of gamma distributions and arises in contexts like reliability analysis for series systems.[63]
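Two of these results can be reproduced by simulation. The sketch below (an illustration with arbitrary sample sizes and bin choices) compares a histogram of the ratio of two independent standard normals with the standard Cauchy density, and a histogram of the sum of four uniforms with the Irwin-Hall formula above.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

# Ratio of two independent standard normals compared with the standard Cauchy density.
z = rng.standard_normal(1_000_000) / rng.standard_normal(1_000_000)
counts, edges = np.histogram(z, bins=400, range=(-10, 10))
mids = (edges[:-1] + edges[1:]) / 2
emp = counts / (z.size * (edges[1] - edges[0]))    # empirical density (all samples counted)
cauchy = 1 / (np.pi * (1 + mids ** 2))
print("ratio of normals vs Cauchy, max abs gap:", float(np.max(np.abs(emp - cauchy))))

# Irwin-Hall density for the sum of n uniforms on [0, 1], as given in the text.
def irwin_hall_pdf(x, n):
    k = np.arange(int(np.floor(x)) + 1)
    terms = (-1.0) ** k * np.array([math.comb(n, int(j)) for j in k]) * (x - k) ** (n - 1)
    return terms.sum() / math.factorial(n - 1)

n = 4
s = rng.random((1_000_000, n)).sum(axis=1)
hist, edges = np.histogram(s, bins=200, range=(0, n), density=True)
mids = (edges[:-1] + edges[1:]) / 2
ih = np.array([irwin_hall_pdf(m, n) for m in mids])
print("sum of 4 uniforms vs Irwin-Hall, max abs gap:", float(np.max(np.abs(hist - ih))))
```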