
Degenerate distribution

In probability theory, a degenerate distribution is a probability distribution concentrated entirely on a single point, where a random variable takes a fixed value c with probability 1. It corresponds to the Dirac delta measure centered at c. This makes it a deterministic case with no randomness, serving as a trivial or baseline example in statistical modeling. For a univariate real-valued random variable X, the probability mass function assigns P(X = c) = 1 and P(X = x) = 0 for all x \neq c, while the cumulative distribution function is a step function jumping from 0 to 1 at c. The expected value (mean) is exactly c, and the variance is 0, reflecting the absence of variability. In the multivariate setting, a random vector X = (X_1, \dots, X_k) with k > 1 has a degenerate distribution if there exists a non-zero vector a such that a^T X equals a constant with probability 1, implying the support lies on a lower-dimensional affine subspace. Degenerate distributions often arise as limiting cases of non-degenerate distributions in convergence theorems, such as when a sequence of random variables converges in probability to a constant. They play a key role in foundational results like the law of large numbers, where the sample mean converges to its expectation, yielding a degenerate limit at that value. Although of little interest as practical models of variability, they provide essential theoretical insight into probability measures and distribution theory.

Fundamentals

Definition

A degenerate distribution is a probability distribution that assigns probability 1 to a single point, known as the degenerate point, in the sample space and probability 0 to all other outcomes. This makes it the distribution of a constant random variable, where the outcome is deterministic with no randomness involved. In measure-theoretic probability, a degenerate distribution is formally defined as the Dirac measure \delta_x at a point x, which places all mass at x. The support of the distribution is the set \{x\}, so that for a random variable X following this distribution, P(X = x) = 1. In contrast to non-degenerate distributions, which exhibit variability across multiple outcomes, a degenerate distribution has zero variance and lacks any spread, effectively collapsing the probability mass to a single value. This property distinguishes it as a boundary case in distribution theory, often arising in limiting scenarios.
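The definition above can be sketched numerically; this is a minimal illustration in NumPy, where the helper names `pmf` and `sample` are purely illustrative, not part of any standard library API.

```python
import numpy as np

def pmf(x, c):
    """P(X = x) for the degenerate distribution at c: 1 at c, 0 elsewhere."""
    return 1.0 if x == c else 0.0

def sample(c, size):
    """Drawing from the distribution is deterministic: every draw is c."""
    return np.full(size, c)

c = 3.0
draws = sample(c, 10_000)
print(pmf(c, c), pmf(2.0, c))      # 1.0 0.0
print(draws.mean(), draws.var())   # 3.0 0.0
```

The empirical mean equals c exactly and the variance is exactly zero, matching the zero-spread property stated above.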

Probability Measures

The Dirac measure, denoted \delta_x, is a fundamental probability measure associated with the degenerate distribution concentrated at a point x in a measurable space (S, \mathcal{S}). It is defined such that for any measurable set A \in \mathcal{S}, \delta_x(A) = 1 if x \in A and \delta_x(A) = 0 otherwise. This construction ensures that \delta_x assigns the entire probability mass of 1 to the singleton \{x\}, making it a valid probability measure since \delta_x(S) = 1. In the context of probability theory, the Dirac measure represents the distribution of a deterministic random variable that takes the value x with probability 1. For a discrete random variable X following a degenerate distribution at a point a, the probability mass function (PMF) is given by p_X(k) = 1 if k = a and p_X(k) = 0 otherwise, for all k in the sample space. This PMF fully captures the measure-theoretic structure, where the probability is entirely concentrated at the single point a, in line with the Dirac measure \delta_a. In the continuous setting, a degenerate distribution does not admit a true probability density function with respect to Lebesgue measure, as the support is a single point of measure zero. However, the Dirac delta function \delta(x - a) serves as a generalized density, satisfying the normalization condition \int_{-\infty}^{\infty} \delta(x - a) \, dx = 1. This generalized function acts as the continuous analog of the Dirac measure, enabling the representation of expectations through integration. Consequently, for any measurable function f, the expected value with respect to the degenerate distribution is E[f(X)] = f(a), reflecting the concentration of the probability mass at a. This follows directly from the sifting property of the Dirac measure or delta function, where \int f(y) \, \delta_x(dy) = f(x).

Univariate Case

Cumulative Distribution Function

The cumulative distribution function (CDF) of a univariate degenerate random variable X that takes the value a with probability 1 is given by F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ 1 & \text{if } x \geq a. \end{cases} This form reflects the concentration of all probability mass at the single point a. The CDF F_X(x) is non-decreasing and right-continuous, as required for any valid CDF, with a single jump discontinuity of height 1 at x = a. The left-hand limit at a is 0, and the right-hand limit is 1, while \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. This step-function behavior arises because the distribution assigns no probability to any interval not containing a, and full probability to those that do. Graphically, the CDF appears as a horizontal line at height 0 for all x < a, followed by a vertical jump to height 1 at x = a, and then remains constant at 1 for x > a. This representation underscores the deterministic nature of the degenerate distribution. The function is equivalently expressed using the indicator function as F_X(x) = I_{\{x \geq a\}}, where I denotes the indicator that equals 1 if the condition holds and 0 otherwise; it is also a shifted Heaviside step function \theta(x - a).
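The step-function CDF can be written in one line as the indicator of x \geq a; the helper name `degenerate_cdf` below is illustrative.

```python
import numpy as np

def degenerate_cdf(x, a):
    """F_X(x) = 0 for x < a and 1 for x >= a (indicator of {x >= a})."""
    return np.where(np.asarray(x) >= a, 1.0, 0.0)

a = 1.0
xs = np.array([0.0, 0.999, 1.0, 2.0])
print(degenerate_cdf(xs, a))  # [0. 0. 1. 1.]
```

Note that the value at x = a itself is 1, not 0, reflecting the right-continuity of the jump.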

Moments and Characteristics

The expected value of a univariate degenerate random variable X concentrated at a point a \in \mathbb{R} is E[X] = a, as the distribution assigns probability 1 to the value a. The variance follows directly as \operatorname{Var}(X) = E[(X - a)^2] = 0, reflecting the complete lack of dispersion in the distribution. Higher-order raw moments are given by E[X^k] = a^k for any positive integer k, since X = a with probability 1. The central moments \mu_k = E[(X - a)^k] are zero for all k \geq 2, while the first central moment is zero by definition; this underscores the deterministic nature of the distribution, where no variability affects moment calculations beyond the mean. The characteristic function of X is \phi(t) = E[e^{itX}] = e^{ita} for t \in \mathbb{R}, which is the Fourier transform of the Dirac delta measure at a. All measures of location, including the median, mode, and quantiles, coincide at a, as the cumulative distribution function jumps from 0 to 1 exactly at this point. The Shannon entropy H(X) = -\sum p(x) \log p(x) = 0, since the distribution is fully concentrated on a single outcome with probability 1, indicating zero uncertainty.
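These formulas can be spot-checked directly; every value below follows from X = a with probability 1, and the choices a = 2 and t = 0.7 are arbitrary illustrations.

```python
import numpy as np

a = 2.0

# Raw moments: E[X^k] = a^k.
raw = [a ** k for k in range(1, 5)]            # [2.0, 4.0, 8.0, 16.0]

# Central moments: E[(X - a)^k] = 0 for all k >= 1.
central = [(a - a) ** k for k in range(1, 5)]  # all 0.0

# Characteristic function: phi(t) = exp(i t a), which has |phi(t)| = 1.
t = 0.7
phi = np.exp(1j * t * a)
assert np.isclose(abs(phi), 1.0)

# Shannon entropy: -1 * log(1) = 0, i.e. zero uncertainty.
entropy = -1.0 * np.log(1.0)
assert entropy == 0.0
```

The unit-modulus characteristic function e^{ita} never decays in |t|, which is another way of seeing that the distribution has no density.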

Multivariate Case

Geometric Interpretation

In the multivariate setting, a degenerate distribution places its entire probability measure on a lower-dimensional affine subspace of the n-dimensional Euclidean space \mathbb{R}^n, where the dimension k of this subspace satisfies k < n. This concentration arises due to linear dependencies among the random variables, restricting the possible realizations to a proper subset that does not span the full space. Geometrically, the support forms a flat structure such as a point, line, plane, or higher-dimensional hyperplane embedded within \mathbb{R}^n, with the distribution behaving as a non-degenerate probability measure only along this subspace. To illustrate in two dimensions, consider a degenerate distribution with k=0, where the support is a single point, assigning probability 1 to a fixed location like (c, c); for k=1, the support reduces to a line, such as all points satisfying x_1 + x_2 = a for some constant a, forming a one-dimensional manifold; in contrast, a non-degenerate case with k=2 would have support filling the entire plane. These examples highlight how degeneracy collapses the geometric extent, preventing the distribution from having positive density across the full ambient space. This structure extends the univariate degenerate distribution, which concentrates on a single point as a 0-dimensional case in \mathbb{R}^1. From a measure-theoretic perspective, a degenerate multivariate distribution is singular with respect to the Lebesgue measure \lambda_n on \mathbb{R}^n whenever k < n, as the support has \lambda_n-measure zero and the distribution assigns probability only to sets intersecting this lower-dimensional subspace. The extent of this degeneracy is captured by the codimension n - k, which quantifies the "deficiency" in dimensionality relative to the full space, influencing properties such as the impossibility of defining a density function over \mathbb{R}^n.
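The k = 1 example on the line x_1 + x_2 = a can be simulated directly; the construction X_2 = a - X_1 below is one illustrative way to produce such a degenerate vector.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 1.0

# X1 is free along the line; X2 = a - X1 forces x1 + x2 = a exactly,
# so the support is a 1-dimensional affine subspace of R^2.
x1 = rng.normal(size=5_000)
X = np.column_stack([x1, a - x1])

# Every sample lies on the line, a set of Lebesgue measure zero in R^2,
# so no joint density on the plane can exist.
assert np.allclose(X.sum(axis=1), a)
```

Along the line itself the distribution is perfectly ordinary (here, a univariate normal in the free coordinate), which is what "non-degenerate only along the subspace" means in practice.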

Covariance Matrix Properties

In the multivariate case, the covariance matrix \Sigma of a degenerate distribution supported on an r-dimensional affine subspace of \mathbb{R}^n (with r < n) is singular, meaning its determinant is zero, and its rank is exactly r, reflecting the lower-dimensional nature of the support. This rank deficiency arises because the random vector X lies in a proper affine subspace, preventing the distribution from having full-dimensional support in \mathbb{R}^n. As a symmetric positive semi-definite matrix, \Sigma satisfies x^T \Sigma x \geq 0 for all x \in \mathbb{R}^n, but it is not positive definite due to the existence of non-trivial vectors in its null space. The eigenvalues of \Sigma consist of exactly n - r zeros and r positive values, with the positive eigenvalues determining the spread along the support directions, as per the spectral theorem for symmetric matrices. This structure underscores the semi-definiteness: the zero eigenvalues correspond to directions orthogonal to the support where there is no variance. A degenerate random vector X \in \mathbb{R}^n can be represented as X = \mu + A Y, where Y \in \mathbb{R}^r follows a non-degenerate distribution (e.g., multivariate normal with positive definite covariance), \mu \in \mathbb{R}^n is the location vector, and A is an n \times r matrix of full column rank r. The covariance matrix then takes the form \operatorname{Var}(X) = A \operatorname{Var}(Y) A^T, which inherits the rank r from A and \operatorname{Var}(Y), ensuring the singularity of \Sigma. This parametrization highlights how the degeneracy propagates through linear mappings from a lower-dimensional space. Such rank deficiency implies linear dependence among the components of X, affecting covariances and precluding mutual independence unless the dependence is trivial. For instance, if the second component satisfies X_2 = c X_1 + d for constants c, d, then \operatorname{Cov}(X_1, X_2) = c \operatorname{Var}(X_1), illustrating how the off-diagonal entries capture the deterministic relationship.
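The representation X = \mu + A Y can be checked numerically; this sketch assumes \operatorname{Var}(Y) = I_r, so the population covariance of X is A A^T, and the dimensions n = 3, r = 2 are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 3, 2
A = rng.normal(size=(n, r))   # full column rank r (almost surely)

# With Var(Y) = I_r, the covariance of X = mu + A Y is A A^T:
# a 3x3 matrix of rank 2, hence singular.
Sigma = A @ A.T
assert np.linalg.matrix_rank(Sigma) == r
assert np.isclose(np.linalg.det(Sigma), 0.0)

# Eigenvalues: n - r of them are (numerically) zero, r are positive,
# matching the spectral description above.
eigvals = np.sort(np.linalg.eigvalsh(Sigma))
assert np.isclose(eigvals[0], 0.0) and np.all(eigvals[1:] > 0)
```

The zero eigenvalue's eigenvector spans the direction orthogonal to the supporting plane, where the distribution has no variance.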

Applications and Examples

Linear Transformations

Linear transformations of random variables can lead to degenerate distributions when the transformation effectively eliminates variability. A simple case occurs with a constant transformation, where Y = a for some fixed constant a; here, Y follows a degenerate distribution concentrated at a, regardless of the distribution of any underlying random variables. This reflects the zero-variance property inherent to degenerate distributions. More generally, consider a linear transformation Y = c X + d, where X is a random variable and c, d are constants. If c = 0, then Y = d with probability 1, yielding a degenerate distribution at d. Conversely, if X itself is degenerate at some value m, then Y is degenerate at c m + d, preserving the point-mass nature through the transformation. An illustrative example arises in linear regression models. When the residual term \varepsilon is identically zero, the model achieves a perfect fit, with all residuals degenerate at zero; in this scenario, the observed response values exactly equal the predicted values, resulting in no variability in the errors. In the multivariate setting, degeneracy manifests when applying an affine transformation Z = A X + b, where X is a random vector in \mathbb{R}^n, A is an m \times n matrix with rank r < m, and b is a constant vector. The resulting distribution of Z is degenerate, supported solely on an affine subspace of dimension at most r.
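Both univariate cases above can be verified directly; the parameter values (c = 0, d = 7, m = 3) are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=10_000)   # non-degenerate input

# Case 1: c = 0 eliminates all variability, so Y is a point mass at d.
c, d = 0.0, 7.0
Y = c * X + d
assert np.all(Y == d) and Y.var() == 0.0

# Case 2: X degenerate at m maps to a point mass at c*m + d.
m = 3.0
X_deg = np.full(1_000, m)
Z = 2.0 * X_deg + 1.0      # degenerate at 2*3 + 1 = 7
assert np.all(Z == 7.0) and Z.var() == 0.0
```

In both cases the output variance is exactly zero, not merely small: no sampling noise survives the transformation.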

Limit Distributions

A sequence of probability distributions F_n converges in distribution to a degenerate distribution \delta_a if F_n(x) \to 0 for every x < a and F_n(x) \to 1 for every x > a; convergence is required only at continuity points of the limiting CDF, so the behavior at x = a itself, the sole discontinuity, is unrestricted. This form of convergence in distribution captures the concentration of probability mass at the point a, where the limiting random variable equals a with probability 1. A prominent example occurs in the normal distribution family: as the variance parameter \sigma^2 \to 0 in the N(\mu, \sigma^2) distribution, the probability mass concentrates entirely at \mu, yielding the degenerate distribution \delta_\mu. This limiting behavior illustrates how non-degenerate distributions with shrinking spread approach degeneracy. In statistical estimation, a sequence of estimators \hat{\theta}_n is consistent for the true parameter \theta if \hat{\theta}_n converges in probability to \theta, implying that the limiting distribution of \hat{\theta}_n is degenerate at \theta. This ensures that, for large sample sizes, the estimator's variability diminishes, placing all probabilistic weight on the true value. The law of large numbers provides another key instance: for independent and identically distributed random variables X_1, X_2, \dots with finite mean \mu, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i converges almost surely (and thus in probability and in distribution) to \mu, resulting in the degenerate limiting distribution \delta_\mu. Under finite variance conditions, the weak law of large numbers establishes this via Chebyshev's inequality, highlighting the asymptotic certainty of the average.
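The shrinking-variance normal example and the law of large numbers can both be observed empirically; the window half-width `eps`, the variance schedule, and the seeds below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps = 4.0, 0.1

# As sigma -> 0, the fraction of N(mu, sigma^2) draws landing outside
# the fixed window (mu - eps, mu + eps) shrinks toward zero:
# the mass concentrates at mu.
fracs = []
for sigma in (1.0, 0.1, 0.01):
    draws = rng.normal(mu, sigma, size=100_000)
    fracs.append(np.mean(np.abs(draws - mu) >= eps))
assert fracs[0] > fracs[1] > fracs[2] == 0.0

# LLN: the sample mean of i.i.d. draws piles up at mu as n grows,
# so its distribution degenerates at mu.
big = rng.normal(mu, 1.0, size=1_000_000)
assert abs(big.mean() - mu) < 0.01
```

At sigma = 0.01 the window is ten standard deviations wide, so essentially no draw escapes it, mirroring the degenerate limit \delta_\mu.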
