
Degenerate distribution

In probability theory, a degenerate distribution is a probability distribution concentrated entirely on a single point, where a random variable takes a fixed value c with probability 1. It corresponds to the Dirac delta measure centered at c. This makes it a deterministic case with no randomness, serving as a trivial or baseline example in statistical modeling. For a univariate real-valued random variable X, the probability mass function assigns P(X = c) = 1 and P(X = x) = 0 for all x \neq c, while the cumulative distribution function is a step function jumping from 0 to 1 at c. The expected value (mean) is exactly c, and the variance is 0, reflecting the absence of variability. In the multivariate setting, a random vector X = (X_1, \dots, X_k) with k > 1 has a degenerate distribution if there exists a non-zero vector a such that a^T X equals a constant with probability 1, implying the support lies on a lower-dimensional affine subspace. Degenerate distributions often arise as limiting cases of non-degenerate distributions in convergence theorems, such as when a sequence of random variables converges in probability to a constant. They play a key role in foundational results like the law of large numbers, where the sample mean converges to its expectation, yielding a degenerate limit at that value. Although of little interest as practical models of variability, they provide essential theoretical insight into probability measures and distribution theory.

Fundamentals

Definition

A degenerate distribution is a probability distribution that assigns probability 1 to a single point, known as the degenerate point, in the sample space and probability 0 to all other outcomes. This makes it the distribution of a constant random variable, where the outcome is deterministic with no randomness involved. In measure-theoretic probability, a degenerate distribution is formally defined as the Dirac measure \delta_x at a point x, which places all mass at x. The support of the distribution is the set \{x\}, so that for a random variable X following this distribution, P(X = x) = 1. In contrast to non-degenerate distributions, which exhibit variability across multiple outcomes, a degenerate distribution has zero variance and lacks any spread, effectively collapsing the probability mass to a single value. This property distinguishes it as a boundary case in distribution theory, often arising in limiting scenarios.
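The definition above can be sketched numerically; this is a minimal illustration in NumPy, where the helper names `pmf` and `sample` are purely illustrative, not part of any standard library API.

```python
import numpy as np

def pmf(x, c):
    """P(X = x) for the degenerate distribution at c: 1 at c, 0 elsewhere."""
    return 1.0 if x == c else 0.0

def sample(c, size):
    """Drawing from the distribution is deterministic: every draw is c."""
    return np.full(size, c)

c = 3.0
draws = sample(c, 10_000)
print(pmf(c, c), pmf(2.0, c))      # 1.0 0.0
print(draws.mean(), draws.var())   # 3.0 0.0
```

The empirical mean equals c exactly and the variance is exactly zero, matching the zero-spread property stated above.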

Probability Measures

The Dirac measure, denoted \delta_x, is a fundamental probability measure associated with the degenerate distribution concentrated at a point x in a measurable space (S, \mathcal{S}). It is defined such that for any measurable set A \in \mathcal{S}, \delta_x(A) = 1 if x \in A and \delta_x(A) = 0 otherwise. This construction ensures that \delta_x assigns the entire probability mass of 1 to the singleton \{x\}, making it a valid probability measure since \delta_x(S) = 1. In the context of probability theory, the Dirac measure represents the distribution of a deterministic random variable that takes the value x with probability 1. For a discrete random variable X following a degenerate distribution at a point a, the probability mass function (PMF) is given by p_X(k) = 1 if k = a and p_X(k) = 0 otherwise, for all k in the sample space. This PMF fully captures the measure-theoretic structure, where the probability is entirely concentrated at the single point a, in line with the Dirac measure \delta_a. In the continuous setting, a degenerate distribution does not admit a true probability density function with respect to Lebesgue measure, as the support is a single point of measure zero. However, the Dirac delta function \delta(x - a) serves as a generalized density, satisfying the normalization condition \int_{-\infty}^{\infty} \delta(x - a) \, dx = 1. This generalized function acts as the continuous analog of the Dirac measure, enabling the representation of expectations through integration. Consequently, for any measurable function f, the expected value with respect to the degenerate distribution is E[f(X)] = f(a), reflecting the concentration of the probability mass at a. This follows directly from the sifting property of the Dirac measure or delta function, where \int f(y) \, \delta_x(dy) = f(x).

Univariate Case

Cumulative Distribution Function

The cumulative distribution function (CDF) of a univariate degenerate random variable X that takes the value a with probability 1 is given by F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ 1 & \text{if } x \geq a. \end{cases} This form reflects the concentration of all probability mass at the single point a. The CDF F_X(x) is non-decreasing and right-continuous, as required for any valid CDF, with a single jump discontinuity of height 1 at x = a. The left-hand limit at a is 0, and the right-hand limit is 1, while \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1. This step-function behavior arises because the distribution assigns no probability to any interval not containing a, and full probability to those that do. Graphically, the CDF appears as a horizontal line at height 0 for all x < a, followed by a vertical jump to height 1 at x = a, and then remains constant at 1 for x > a. This representation underscores the deterministic nature of the degenerate distribution. The function is equivalently expressed using the indicator function as F_X(x) = I_{\{x \geq a\}}, where I denotes the indicator that equals 1 if the condition holds and 0 otherwise; it is also a shifted Heaviside step function \theta(x - a).
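The step-function CDF can be written in one line as the indicator of x \geq a; the helper name `degenerate_cdf` below is illustrative.

```python
import numpy as np

def degenerate_cdf(x, a):
    """F_X(x) = 0 for x < a and 1 for x >= a (indicator of {x >= a})."""
    return np.where(np.asarray(x) >= a, 1.0, 0.0)

a = 1.0
xs = np.array([0.0, 0.999, 1.0, 2.0])
print(degenerate_cdf(xs, a))  # [0. 0. 1. 1.]
```

Note that the value at x = a itself is 1, not 0, reflecting the right-continuity of the jump.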

Moments and Characteristics

The expected value of a univariate degenerate random variable X concentrated at a point a \in \mathbb{R} is E[X] = a, as the distribution assigns probability 1 to the value a. The variance follows directly as \operatorname{Var}(X) = E[(X - a)^2] = 0, reflecting the complete lack of dispersion in the distribution. Higher-order raw moments are given by E[X^k] = a^k for any positive integer k, since X = a with probability 1. The central moments \mu_k = E[(X - a)^k] are zero for all k \geq 2, while the first central moment is zero by definition; this underscores the deterministic nature of the distribution, where no variability affects moment calculations beyond the mean. The characteristic function of X is \phi(t) = E[e^{itX}] = e^{ita} for t \in \mathbb{R}, which is the Fourier transform of the Dirac delta measure at a. All measures of location, including the median, mode, and quantiles, coincide at a, as the cumulative distribution function jumps from 0 to 1 exactly at this point. The Shannon entropy H(X) = -\sum p(x) \log p(x) = 0, since the distribution is fully concentrated on a single outcome with probability 1, indicating zero uncertainty.
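These formulas can be spot-checked directly; every value below follows from X = a with probability 1, and the choices a = 2 and t = 0.7 are arbitrary illustrations.

```python
import numpy as np

a = 2.0

# Raw moments: E[X^k] = a^k.
raw = [a ** k for k in range(1, 5)]            # [2.0, 4.0, 8.0, 16.0]

# Central moments: E[(X - a)^k] = 0 for all k >= 1.
central = [(a - a) ** k for k in range(1, 5)]  # all 0.0

# Characteristic function: phi(t) = exp(i t a), which has |phi(t)| = 1.
t = 0.7
phi = np.exp(1j * t * a)
assert np.isclose(abs(phi), 1.0)

# Shannon entropy: -1 * log(1) = 0, i.e. zero uncertainty.
entropy = -1.0 * np.log(1.0)
assert entropy == 0.0
```

The unit-modulus characteristic function e^{ita} never decays in |t|, which is another way of seeing that the distribution has no density.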

Multivariate Case

Geometric Interpretation

In the multivariate setting, a degenerate distribution places its entire probability measure on a lower-dimensional affine subspace of the n-dimensional Euclidean space \mathbb{R}^n, where the dimension k of this subspace satisfies k < n. This concentration arises due to linear dependencies among the random variables, restricting the possible realizations to a proper subset that does not span the full space. Geometrically, the support forms a flat structure such as a point, line, plane, or higher-dimensional hyperplane embedded within \mathbb{R}^n, with the distribution behaving as a non-degenerate probability measure only along this subspace. To illustrate in two dimensions, consider a degenerate distribution with k=0, where the support is a single point, assigning probability 1 to a fixed location like (c, c); for k=1, the support reduces to a line, such as all points satisfying x_1 + x_2 = a for some constant a, forming a one-dimensional manifold; in contrast, a non-degenerate case with k=2 would have support filling the entire plane. These examples highlight how degeneracy collapses the geometric extent, preventing the distribution from having positive density across the full ambient space. This structure extends the univariate degenerate distribution, which concentrates on a single point as a 0-dimensional case in \mathbb{R}^1. From a measure-theoretic perspective, a degenerate multivariate distribution is singular with respect to the Lebesgue measure \lambda_n on \mathbb{R}^n whenever k < n, as the support has \lambda_n-measure zero and the distribution assigns probability only to sets intersecting this lower-dimensional subspace. The extent of this degeneracy is captured by the codimension n - k, which quantifies the "deficiency" in dimensionality relative to the full space, influencing properties such as the impossibility of defining a density function over \mathbb{R}^n.
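The k = 1 example on the line x_1 + x_2 = a can be simulated directly; the construction X_2 = a - X_1 below is one illustrative way to produce such a degenerate vector.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 1.0

# X1 is free along the line; X2 = a - X1 forces x1 + x2 = a exactly,
# so the support is a 1-dimensional affine subspace of R^2.
x1 = rng.normal(size=5_000)
X = np.column_stack([x1, a - x1])

# Every sample lies on the line, a set of Lebesgue measure zero in R^2,
# so no joint density on the plane can exist.
assert np.allclose(X.sum(axis=1), a)
```

Along the line itself the distribution is perfectly ordinary (here, a univariate normal in the free coordinate), which is what "non-degenerate only along the subspace" means in practice.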

Covariance Matrix Properties

In the multivariate case, the covariance matrix \Sigma of a degenerate distribution supported on an r-dimensional affine subspace of \mathbb{R}^n (with r < n) is singular, meaning its determinant is zero, and its rank is exactly r, reflecting the lower-dimensional nature of the support. This rank deficiency arises because the random vector X lies in a proper affine subspace, preventing the distribution from having full-dimensional support in \mathbb{R}^n. As a symmetric positive semi-definite matrix, \Sigma satisfies x^T \Sigma x \geq 0 for all x \in \mathbb{R}^n, but it is not positive definite due to the existence of non-trivial vectors in its null space. The eigenvalues of \Sigma consist of exactly n - r zeros and r positive values, with the positive eigenvalues determining the spread along the support directions, as per the spectral theorem for symmetric matrices. This structure underscores the semi-definiteness: the zero eigenvalues correspond to directions orthogonal to the support where there is no variance. A degenerate random vector X \in \mathbb{R}^n can be represented as X = \mu + A Y, where Y \in \mathbb{R}^r follows a non-degenerate distribution (e.g., multivariate normal with positive definite covariance), \mu \in \mathbb{R}^n is the location vector, and A is an n \times r matrix of full column rank r. The covariance matrix then takes the form \operatorname{Var}(X) = A \operatorname{Var}(Y) A^T, which inherits the rank r from A and \operatorname{Var}(Y), ensuring the singularity of \Sigma. This parametrization highlights how the degeneracy propagates through linear mappings from a lower-dimensional space. Such rank deficiency implies linear dependence among the components of X, affecting covariances and precluding mutual independence unless the dependence is trivial. For instance, if the second component satisfies X_2 = c X_1 + d for constants c, d, then \operatorname{Cov}(X_1, X_2) = c \operatorname{Var}(X_1), illustrating how the off-diagonal entries capture the deterministic relationship.
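The representation X = \mu + A Y can be checked numerically; this sketch assumes \operatorname{Var}(Y) = I_r, so the population covariance of X is A A^T, and the dimensions n = 3, r = 2 are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 3, 2
A = rng.normal(size=(n, r))   # full column rank r (almost surely)

# With Var(Y) = I_r, the covariance of X = mu + A Y is A A^T:
# a 3x3 matrix of rank 2, hence singular.
Sigma = A @ A.T
assert np.linalg.matrix_rank(Sigma) == r
assert np.isclose(np.linalg.det(Sigma), 0.0)

# Eigenvalues: n - r of them are (numerically) zero, r are positive,
# matching the spectral description above.
eigvals = np.sort(np.linalg.eigvalsh(Sigma))
assert np.isclose(eigvals[0], 0.0) and np.all(eigvals[1:] > 0)
```

The zero eigenvalue's eigenvector spans the direction orthogonal to the supporting plane, where the distribution has no variance.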

Applications and Examples

Linear Transformations

Linear transformations of random variables can lead to degenerate distributions when the transformation effectively eliminates variability. A simple case occurs with a constant transformation, where Y = a for some fixed constant a; here, Y follows a degenerate distribution concentrated at a, regardless of the distribution of any underlying random variables. This reflects the zero-variance property inherent to degenerate distributions. More generally, consider a linear transformation Y = c X + d, where X is a random variable and c, d are constants. If c = 0, then Y = d with probability 1, yielding a degenerate distribution at d. Conversely, if X itself is degenerate at some value m, then Y is degenerate at c m + d, preserving the point-mass nature through the transformation. An illustrative example arises in linear regression models. When the residual term \varepsilon is identically zero, the model achieves a perfect fit, with all residuals degenerate at zero; in this scenario, the observed response values exactly equal the predicted values, resulting in no variability in the errors. In the multivariate setting, degeneracy manifests when applying an affine transformation Z = A X + b, where X is a random vector in \mathbb{R}^n, A is an m \times n matrix with rank r < m, and b is a constant vector. The resulting distribution of Z is degenerate, supported solely on an affine subspace of dimension at most r.
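Both univariate cases above can be verified directly; the parameter values (c = 0, d = 7, m = 3) are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=10_000)   # non-degenerate input

# Case 1: c = 0 eliminates all variability, so Y is a point mass at d.
c, d = 0.0, 7.0
Y = c * X + d
assert np.all(Y == d) and Y.var() == 0.0

# Case 2: X degenerate at m maps to a point mass at c*m + d.
m = 3.0
X_deg = np.full(1_000, m)
Z = 2.0 * X_deg + 1.0      # degenerate at 2*3 + 1 = 7
assert np.all(Z == 7.0) and Z.var() == 0.0
```

In both cases the output variance is exactly zero, not merely small: no sampling noise survives the transformation.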

Limit Distributions

A sequence of probability distributions F_n converges in distribution to a degenerate distribution \delta_a if F_n(x) \to 0 for every x < a and F_n(x) \to 1 for every x > a; convergence is required only at continuity points of the limiting CDF, so the behavior at x = a itself, the sole discontinuity, is unrestricted. This form of convergence in distribution captures the concentration of probability mass at the point a, where the limiting random variable equals a with probability 1. A prominent example occurs in the normal distribution family: as the variance parameter \sigma^2 \to 0 in the N(\mu, \sigma^2) distribution, the probability mass concentrates entirely at \mu, yielding the degenerate distribution \delta_\mu. This limiting behavior illustrates how non-degenerate distributions with shrinking spread approach degeneracy. In statistical estimation, a sequence of estimators \hat{\theta}_n is consistent for the true parameter \theta if \hat{\theta}_n converges in probability to \theta, implying that the limiting distribution of \hat{\theta}_n is degenerate at \theta. This ensures that, for large sample sizes, the estimator's variability diminishes, placing all probabilistic weight on the true value. The law of large numbers provides another key instance: for independent and identically distributed random variables X_1, X_2, \dots with finite mean \mu, the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i converges almost surely (and thus in probability and in distribution) to \mu, resulting in the degenerate limiting distribution \delta_\mu. Under finite variance conditions, the weak law of large numbers establishes this via Chebyshev's inequality, highlighting the asymptotic certainty of the average.
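The shrinking-variance normal example and the law of large numbers can both be observed empirically; the window half-width `eps`, the variance schedule, and the seeds below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps = 4.0, 0.1

# As sigma -> 0, the fraction of N(mu, sigma^2) draws landing outside
# the fixed window (mu - eps, mu + eps) shrinks toward zero:
# the mass concentrates at mu.
fracs = []
for sigma in (1.0, 0.1, 0.01):
    draws = rng.normal(mu, sigma, size=100_000)
    fracs.append(np.mean(np.abs(draws - mu) >= eps))
assert fracs[0] > fracs[1] > fracs[2] == 0.0

# LLN: the sample mean of i.i.d. draws piles up at mu as n grows,
# so its distribution degenerates at mu.
big = rng.normal(mu, 1.0, size=1_000_000)
assert abs(big.mean() - mu) < 0.01
```

At sigma = 0.01 the window is ten standard deviations wide, so essentially no draw escapes it, mirroring the degenerate limit \delta_\mu.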
