Euclidean distance matrix

A Euclidean distance matrix (EDM) is a symmetric n \times n matrix D = (d_{ij}) whose off-diagonal entries represent the Euclidean distances between a finite set of n points \{x_1, \dots, x_n\} in some \mathbb{R}^k, with d_{ij} = \|x_i - x_j\| for i \neq j and d_{ii} = 0. In many mathematical contexts, particularly in optimization and distance geometry, the term EDM specifically refers to the matrix of squared Euclidean distances, d_{ij} = \|x_i - x_j\|^2, to facilitate linear algebraic treatments and avoid square roots in computations. This squared form is a hollow symmetric matrix and is closely related to Gram matrices via the inner products of the points, centered at the origin. Key properties of EDMs include their symmetry, non-negativity of entries, and adherence to the triangle inequality, ensuring they embed isometrically into Euclidean space.

A fundamental characterization, due to Schoenberg, states that a symmetric matrix with zero diagonal is an EDM (in the squared sense) if and only if a certain double centering of it—specifically, \tau(D) = -\frac{1}{2} (I - \frac{1}{n} \mathbf{1}\mathbf{1}^T) D (I - \frac{1}{n} \mathbf{1}\mathbf{1}^T)—is positive semidefinite, with the rank of this matrix giving the embedding dimension of the points. This theorem, originally from 1935, provides a practical test for whether a given distance matrix arises from points in Euclidean space and underpins algorithms for EDM completion, where partial distance information is filled in to recover full configurations.

EDMs play a central role in distance geometry, a field originating in the early twentieth century with contributions from mathematicians like Karl Menger and later formalized by Schoenberg, focusing on realizing geometric structures solely from distance data. Notable applications include determining molecular conformations in chemistry, where nuclear magnetic resonance (NMR) data yields partial EDMs for molecules; localizing nodes in wireless sensor networks using measured distances; and classical multidimensional scaling (MDS) for dimensionality reduction in statistics, where EDMs help project high-dimensional data into lower-dimensional spaces while preserving distances. These uses highlight the matrix's utility in bridging abstract metric spaces with concrete Euclidean embeddings, with ongoing research in semidefinite optimization for efficient computation and in rigidity theory for ensuring unique realizations.

Fundamentals

Definition

In mathematics, a Euclidean distance matrix is an n \times n symmetric matrix D = (d_{ij}) whose entries represent the pairwise squared distances between a configuration of n points in a k-dimensional Euclidean space \mathbb{R}^k. Formally, for points x_1, \dots, x_n \in \mathbb{R}^k, the entry d_{ij} is defined as d_{ij} = \|x_i - x_j\|_2^2, where \|\cdot\|_2 denotes the Euclidean norm \|v\|_2 = \sqrt{v \cdot v}. This construction yields a matrix with zero diagonal entries (d_{ii} = 0) and nonnegative off-diagonal entries that satisfy symmetry (d_{ij} = d_{ji}).

Such matrices arise directly from any configuration of points in Euclidean space, capturing the geometry of their relative positions. For instance, when k=1, the points lie on a line, and the squared distances reflect one-dimensional separations along that line; in higher dimensions like k=2 or k=3, the matrix encodes planar or spatial arrangements, respectively, with the embedding dimension k potentially up to n-1 for generic points. The matrix provides a complete summary of the point set's structure without reference to coordinates, making it useful in fields such as machine learning, statistics, and computational chemistry.

Unlike a general distance matrix, which need only satisfy the axioms of a metric (nonnegativity, symmetry, the triangle inequality, and zero diagonal), a Euclidean distance matrix must additionally ensure that the distances are realizable by points in some Euclidean space—a property known as Euclidean embeddability. This stricter condition distinguishes Euclidean distance matrices from those derived from non-Euclidean metrics, such as those on manifolds or graphs, and was first characterized by Schoenberg. A fundamental relation for these distances is the expansion of each entry: d_{ij} = \|x_i\|_2^2 + \|x_j\|_2^2 - 2 x_i \cdot x_j, which connects the squared distances to the inner products (or Gram matrix) of the points and facilitates algebraic manipulations.
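The expansion above also yields a direct vectorized computation of the squared-distance matrix. The following minimal NumPy sketch is illustrative only; the function name edm and the columns-as-points convention are our own choices, not standard notation.

```python
import numpy as np

def edm(X):
    """Squared Euclidean distance matrix of the columns of X (shape k x n).

    Uses d_ij = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j, so no explicit
    loop over pairs of points is needed.
    """
    G = X.T @ X                # Gram matrix of inner products
    g = np.diag(G)             # squared norms ||x_i||^2
    return g[:, None] + g[None, :] - 2 * G

# Three points on the line at positions 0, 1, 3 (k = 1, n = 3).
X = np.array([[0.0, 1.0, 3.0]])
print(edm(X))                  # [[0. 1. 9.], [1. 0. 4.], [9. 4. 0.]]
```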

Examples

To illustrate the concept of a Euclidean distance matrix, consider three points on a one-dimensional line at positions x_1 = 0, x_2 = 1, and x_3 = 3. The pairwise squared distances are d_{12} = 1, d_{13} = 9, and d_{23} = 4, yielding the matrix D = \begin{pmatrix} 0 & 1 & 9 \\ 1 & 0 & 4 \\ 9 & 4 & 0 \end{pmatrix}. This matrix captures the spacing along the line, where the squared norm \|x_i - x_j\|_2^2 determines each entry.

In two dimensions, an equilateral triangle with side length 1 provides another simple configuration. Place the vertices at coordinates (0,0), (1,0), and (0.5, \sqrt{3}/2). The squared distances between each pair are all 1, resulting in the matrix D = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}. Such a matrix reflects the uniform spacing characteristic of regular polygons in the plane.

For a higher-dimensional case, consider four points forming a tetrahedron in three-dimensional space at coordinates (0,0,0), (1,0,0), (0,1,0), and (0,0,1). The pairwise squared distances are 1 between the origin and each of the other points, and 2 between the pairs of axis points, giving the matrix D = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 2 & 2 \\ 1 & 2 & 0 & 2 \\ 1 & 2 & 2 & 0 \end{pmatrix}. This example demonstrates how the matrix encodes the geometry of non-planar configurations.

In geographic applications, Euclidean distance matrices often represent squared distances derived from the coordinates of cities on a planar map, enabling applications such as route optimization or spatial clustering.
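These examples are straightforward to verify numerically. The sketch below (an illustration, with our own variable names) reconstructs the tetrahedron matrix from the stated coordinates by forming all pairwise difference vectors.

```python
import numpy as np

# Tetrahedron vertices as columns: the origin and the three unit axis points.
X = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
diff = X[:, :, None] - X[:, None, :]   # all pairwise difference vectors x_i - x_j
D = np.sum(diff**2, axis=0)            # squared lengths of the differences
print(D.astype(int))
# [[0 1 1 1]
#  [1 0 2 2]
#  [1 2 0 2]
#  [1 2 2 0]]
```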

Algebraic Properties

Symmetry and Diagonal

A Euclidean distance matrix D = [d_{ij}] is symmetric, meaning d_{ij} = d_{ji} for all indices i, j = 1, \dots, n, because the Euclidean distance between two points x_i and x_j in \mathbb{R}^p is undirected and thus equals the distance between x_j and x_i. This symmetry arises directly from the definition d_{ij} = \|x_i - x_j\|_2^2, where the squared Euclidean norm satisfies \|x_i - x_j\|_2^2 = \|x_j - x_i\|_2^2.

The diagonal entries of D are zero, so d_{ii} = 0 for all i = 1, \dots, n, as the distance from a point to itself is zero: d_{ii} = \|x_i - x_i\|_2^2 = 0. This property ensures that D is a hollow matrix, with no non-zero elements on the main diagonal.

As a consequence of these properties, D is a real symmetric matrix with zero trace, since the trace is the sum of the diagonal entries, each of which is zero: \operatorname{tr}(D) = \sum_{i=1}^n d_{ii} = 0. The symmetry and zero diagonal together position D within the subspace of hollow symmetric matrices, facilitating algebraic operations such as eigenvalue decompositions used in downstream applications.

Metric Properties

A Euclidean distance matrix encodes the squared pairwise distances between points in a Euclidean space, from which the actual distances \delta_{ij} = \sqrt{d_{ij}} inherit the properties of that space. These properties ensure that the distances form a valid metric, facilitating applications in statistics, optimization, and machine learning. Specifically, the distances satisfy the axioms of positivity and the triangle inequality, which are fundamental to metric spaces.

The positivity axiom requires that distances are nonnegative, with \delta_{ij} \geq 0 for all i, j, and strict positivity \delta_{ij} > 0 for distinct points i \neq j. This follows directly from the definition \delta_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \|, where \| \cdot \| denotes the Euclidean norm, which is zero only for the zero vector and positive otherwise for distinct points \mathbf{x}_i \neq \mathbf{x}_j. Equivalently, for the squared distances, d_{ij} \geq 0 with d_{ij} = 0 if and only if \mathbf{x}_i = \mathbf{x}_j, and the diagonal entries are zero (d_{ii} = 0), reflecting zero distance from a point to itself.

The triangle inequality states that \delta_{ij} \leq \delta_{ik} + \delta_{kj} for all indices i, j, k. This property is inherited from the underlying Euclidean norm and holds for any Euclidean distance matrix. To outline the proof, consider the vectors in the underlying space: \delta_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \|. The Euclidean norm satisfies the triangle inequality \| \mathbf{u} + \mathbf{v} \| \leq \| \mathbf{u} \| + \| \mathbf{v} \| for any vectors \mathbf{u}, \mathbf{v}. Substituting \mathbf{u} = \mathbf{x}_i - \mathbf{x}_k and \mathbf{v} = \mathbf{x}_k - \mathbf{x}_j yields \| (\mathbf{x}_i - \mathbf{x}_k) + (\mathbf{x}_k - \mathbf{x}_j) \| \leq \| \mathbf{x}_i - \mathbf{x}_k \| + \| \mathbf{x}_k - \mathbf{x}_j \|, simplifying to \delta_{ij} \leq \delta_{ik} + \delta_{kj}. The full proof of the norm's triangle inequality relies on the Cauchy-Schwarz inequality applied to the inner product expansion.

In distinction from general metrics, the Euclidean distance is specifically induced by an inner product on the underlying vector space, where the norm derives from \| \mathbf{x} \| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} and the distance from the induced metric d(\mathbf{x}, \mathbf{y}) = \| \mathbf{x} - \mathbf{y} \|. This inner product structure underpins additional algebraic properties of Euclidean distance matrices, such as their relation to Gram matrices.
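As a concrete check of these axioms, the short sketch below (illustrative; the helper name satisfies_triangle is ours) verifies the triangle inequality for the distances \delta_{ij} = \sqrt{d_{ij}} over all ordered triples of indices.

```python
import numpy as np
from itertools import permutations

def satisfies_triangle(D, tol=1e-12):
    """Check delta_ij <= delta_ik + delta_kj over all triples,
    where delta is the entrywise square root of the squared-distance matrix D."""
    delta = np.sqrt(D)
    n = D.shape[0]
    return all(delta[i, j] <= delta[i, k] + delta[k, j] + tol
               for i, j, k in permutations(range(n), 3))

# Squared distances of the collinear points 0, 1, 3.
D = np.array([[0., 1., 9.],
              [1., 0., 4.],
              [9., 4., 0.]])
print(satisfies_triangle(D))   # True: e.g. delta_13 = 3 <= delta_12 + delta_23 = 1 + 2
```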

Reconstruction and Inner Products

Relation to Gram Matrix

The Gram matrix G of a set of points \{ \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n \} in Euclidean space is defined as the n \times n symmetric matrix with entries G_{ij} = \mathbf{x}_i \cdot \mathbf{x}_j, representing the inner products between the position vectors. If the points are collected as columns in a matrix X \in \mathbb{R}^{m \times n}, then G = X^T X, which is positive semidefinite.

The distance matrix D with entries d_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \| is algebraically linked to G through the squared distances: d_{ij}^2 = G_{ii} + G_{jj} - 2 G_{ij}. This equation arises directly from expanding the squared norm \| \mathbf{x}_i - \mathbf{x}_j \|^2 = \| \mathbf{x}_i \|^2 + \| \mathbf{x}_j \|^2 - 2 \mathbf{x}_i \cdot \mathbf{x}_j, where G_{ii} = \| \mathbf{x}_i \|^2 and G_{jj} = \| \mathbf{x}_j \|^2. In matrix notation, let \mathbf{g} = \operatorname{diag}(G) denote the column vector of diagonal entries of G, and let \mathbf{1} be the all-ones vector of length n, so J = \mathbf{1} \mathbf{1}^T is the all-ones matrix. The squared distance matrix D^{(2)} (with entries d_{ij}^2) then satisfies D^{(2)} = \mathbf{g} \mathbf{1}^T + \mathbf{1} \mathbf{g}^T - 2 G. This form highlights the linear dependence of the squared distance matrix on the Gram matrix entries, enabling transformations between them under appropriate conditions.

This connection between distance matrices and inner products has roots in classical geometry, with early recognition appearing in the context of Cayley-Menger determinants, introduced by Arthur Cayley in 1841 for computing volumes from distances and later extended by Karl Menger in 1928 to characterize embeddability in Euclidean space.
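The identity D^{(2)} = \mathbf{g} \mathbf{1}^T + \mathbf{1} \mathbf{g}^T - 2G can be confirmed numerically; the following sketch (a random test instance of our own choosing) compares it against directly computed squared distances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))      # five random points in R^3, stored as columns
G = X.T @ X                          # Gram matrix G = X^T X
g = np.diag(G)[:, None]              # column vector of squared norms G_ii
one = np.ones((5, 1))

D2 = g @ one.T + one @ g.T - 2 * G   # D^(2) = g 1^T + 1 g^T - 2G
direct = np.array([[np.sum((X[:, i] - X[:, j])**2) for j in range(5)]
                   for i in range(5)])
print(np.allclose(D2, direct))       # True: the identity holds entrywise
```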

Centering and Reconstruction

The centering matrix plays a crucial role in reconstructing the Gram matrix and point coordinates from a Euclidean distance matrix, assuming the points are translated so that their centroid lies at the origin. This matrix, denoted H = I - \frac{1}{n} \mathbf{1}\mathbf{1}^T, where I is the n \times n identity matrix, n is the number of points, and \mathbf{1} is the all-ones vector, projects the data onto the subspace orthogonal to the all-ones vector, effectively centering the configuration by subtracting the mean from each coordinate.

Given a Euclidean distance matrix D = (d_{ij}) with entries d_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \|, the centered Gram matrix B can be reconstructed using the formula B = -\frac{1}{2} H (D \circ D) H, where \circ denotes the Hadamard (element-wise) product, and D \circ D is the matrix of squared distances. This matrix B represents the inner products \langle \mathbf{x}_i, \mathbf{x}_j \rangle of the centered points \{\mathbf{x}_i\}, enabling the recovery of the configuration up to rotation and reflection. The operation assumes the distance matrix corresponds to points in Euclidean space with centroid at the origin, ensuring B is positive semidefinite with rank equal to the embedding dimension.

To obtain the coordinates, perform the eigendecomposition B = V \Lambda V^T, where \Lambda is the diagonal matrix of eigenvalues (retaining only the non-negative ones) and V contains the corresponding orthonormal eigenvectors. The centered point coordinates are then given by X = V \Lambda^{1/2}, an n \times r matrix whose rows (or columns, depending on convention) represent the points in r-dimensional space, where r = \operatorname{rank}(B) is the intrinsic dimension. This reconstruction is unique up to orthogonal transformations, since any orthogonal transformation of the coordinates yields the same distance matrix.
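A minimal sketch of this reconstruction, under the conventions above (the function name centered_coordinates and the eigenvalue cutoff are our choices), recovers a congruent copy of the unit equilateral triangle from its distance matrix.

```python
import numpy as np

def centered_coordinates(D):
    """Recover centered coordinates from a matrix D of (non-squared) distances.

    B = -1/2 H (D o D) H is the centered Gram matrix; its eigendecomposition
    B = V Lambda V^T gives coordinates X = V Lambda^{1/2}, unique up to
    orthogonal transformations.
    """
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (D * D) @ H            # D * D is the Hadamard product D o D
    w, V = np.linalg.eigh(B)
    keep = w > 1e-10                       # retain strictly positive eigenvalues
    return V[:, keep] * np.sqrt(w[keep])   # rows are points in R^rank(B)

# Unit equilateral triangle: all pairwise distances equal 1.
D = np.ones((3, 3)) - np.eye(3)
Y = centered_coordinates(D)                # a 3 x 2 congruent copy, centroid at 0
print(np.round(np.linalg.norm(Y[0] - Y[1]), 6))   # 1.0, as required
```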

Embeddings and Representations

Multidimensional Scaling

Classical multidimensional scaling (MDS), also known as principal coordinates analysis, is a method for embedding a set of points represented by a Euclidean distance matrix into a low-dimensional Euclidean space while preserving the interpoint distances exactly. Introduced by Warren S. Torgerson in 1952, classical MDS transforms the distance matrix into coordinates that realize an exact Euclidean embedding, making it particularly suited for true Euclidean distance matrices, where the configuration is uniquely determined up to rigid transformations. This method leverages the algebraic relation between distances and inner products to recover the point configuration without requiring the original coordinates.

The core of classical MDS relies on reconstructing the centered Gram matrix B from the squared Euclidean distance matrix D^{(2)}, as detailed in the centering process, to obtain the inner products of the points relative to their centroid. Specifically, B = -\frac{1}{2} \mathcal{J} D^{(2)} \mathcal{J}, where \mathcal{J} is the centering matrix, yields a matrix whose eigendecomposition provides the coordinates. The positive eigenvalues of B determine the dimensionality k = \operatorname{rank}(B), with the principal coordinates formed by scaling the top k eigenvectors by the square roots of their corresponding eigenvalues, resulting in points in \mathbb{R}^k that exactly reproduce the input distances. For a true Euclidean distance matrix, this achieves zero stress, meaning the reconstructed distances match the original ones precisely, unlike approximate variants that minimize a stress function for non-Euclidean dissimilarities.

Computationally, classical MDS proceeds in the following steps: first, square all entries of the distance matrix to form D^{(2)}; second, apply double centering to compute B; third, perform eigendecomposition B = V \Lambda V^T, retaining only the positive eigenvalues and their eigenvectors; and finally, construct the coordinate matrix X as the product of the eigenvector matrix (truncated to k dimensions) and the diagonal matrix of square-rooted eigenvalues. This process is efficient for moderate-sized matrices and forms the basis for exact embeddings in \mathbb{R}^k, as sketched below.

Classical MDS finds extensive applications in psychometrics, where it embeds perceptual or similarity data from surveys—such as judgments of stimulus proximities—into interpretable spatial maps that reveal underlying psychological structures. In data visualization, it is used to project high-dimensional datasets onto two or three dimensions for exploratory analysis, facilitating the identification of clusters or patterns in fields like social sciences and bioinformatics while preserving pairwise distances.
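The four computational steps translate directly into code. The following is a minimal sketch of classical MDS under the conventions above (the function name and the random test instance are our own); the final check confirms the zero-stress property on a true Euclidean distance matrix.

```python
import numpy as np

def classical_mds(Delta, k=None):
    """Classical (Torgerson) MDS for a matrix Delta of non-squared distances.

    Steps: square the entries, double-center, eigendecompose, and scale the
    top eigenvectors by the square roots of their eigenvalues.
    """
    n = Delta.shape[0]
    D2 = Delta**2                                  # step 1: squared distances
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ D2 @ J                          # step 2: double centering
    w, V = np.linalg.eigh(B)                       # step 3: eigendecomposition
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]
    if k is None:
        k = int(np.sum(w > 1e-10))                 # keep the positive eigenvalues
    return V[:, :k] * np.sqrt(np.maximum(w[:k], 0.0))   # step 4: coordinates

# Zero-stress check: distances of four random planar points are reproduced exactly.
rng = np.random.default_rng(1)
P = rng.standard_normal((4, 2))
Delta = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
X = classical_mds(Delta)
print(np.allclose(np.linalg.norm(X[:, None] - X[None, :], axis=-1), Delta))  # True
```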

Uniqueness Conditions

In \mathbb{R}^k, a configuration of n > k+1 points that affinely spans \mathbb{R}^k is rigid, meaning the pairwise distances uniquely determine the positions up to Euclidean motions, which include rotations, translations, and reflections. This rigidity ensures that the embedding derived from the corresponding Euclidean distance matrix (EDM) cannot flex or deform while preserving all distances, provided the points are in general position without unintended dependencies. For generic point configurations, the realization of an EDM is unique up to congruence if the matrix has full affine dimension k and satisfies global rigidity conditions, such as the underlying graph being redundantly rigid and (k+1)-connected. A key theorem states that such a generic framework is globally rigid in \mathbb{R}^k if it admits an equilibrium stress matrix of rank n - (k + 1), implying no non-congruent realizations exist that satisfy the distance constraints. These conditions prevent local minima in reconstruction algorithms like multidimensional scaling, where multiple embeddings might otherwise arise from incomplete or noisy data.

The degrees of freedom for uniquely determining n points in \mathbb{R}^k up to isometry total nk - \frac{k(k+1)}{2}, accounting for the k translational and \frac{k(k-1)}{2} rotational degrees of freedom in the Euclidean group. When the EDM provides \binom{n}{2} distance constraints that are consistent and sufficient to match this count of independent parameters, the configuration is fixed modulo isometries, assuming full rank.

Non-uniqueness occurs in degenerate cases, such as when all points are collinear, reducing the effective dimension to 1 despite an ambient space of higher k; here, partial distance data admits multiple non-congruent embeddings, such as reflections of subsets of points across a hyperplane through the line. Similarly, configurations with points lying on a lower-dimensional affine subspace, such as coplanar points in \mathbb{R}^3, allow flexing or flipping that preserves the specified distances but violates full-dimensional spanning, leading to infinitely many realizations up to congruence.
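As a quick check of this parameter count (the helper name dof below is hypothetical, used only for illustration), compare the degrees of freedom with the number of distances an EDM supplies.

```python
from math import comb

def dof(n, k):
    """Parameters needed to pin down n points in R^k up to isometry:
    n*k coordinates minus k translations and k*(k-1)/2 rotations."""
    return n * k - k * (k + 1) // 2

# For n = 4 points in the plane (k = 2): 8 - 3 = 5 free parameters, while a
# full EDM supplies C(4,2) = 6 distances, so the distances overdetermine the
# configuration (one distance is implied by the others).
print(dof(4, 2), comb(4, 2))   # 5 6
```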

Characterizations

Positive Semidefiniteness Criteria

A fundamental algebraic test for verifying whether a symmetric matrix with zero diagonal entries corresponds to the pairwise distances among a set of points in Euclidean space is Schoenberg's criterion. This criterion establishes a necessary and sufficient condition based on positive semidefiniteness of a transformed matrix. Let D = (d_{ij}) \in \mathbb{R}^{n \times n} be a symmetric matrix with d_{ii} = 0 for all i and nonnegative off-diagonal entries. Then D is the Euclidean distance matrix for some points in \mathbb{R}^k (with k \leq n-1) if and only if the matrix B = -\frac{1}{2} H (D \circ D) H is positive semidefinite, where H = I_n - \frac{1}{n} \mathbf{e} \mathbf{e}^T is the centering matrix, \mathbf{e} is the n-dimensional all-ones vector, and D \circ D denotes the entrywise (Hadamard) product of D with itself, yielding the matrix of squared distances. The matrix B represents the Gram matrix of the centered configuration of points.

When the input matrix already consists of squared distances, say K = (k_{ij}) \in \mathbb{R}^{n \times n} with k_{ii} = 0, the condition simplifies: K is a squared Euclidean distance matrix if and only if the doubly centered matrix -\frac{1}{2} H K H has non-negative eigenvalues (i.e., is positive semidefinite). This ensures that the distances satisfy the properties required for embedding in Euclidean space. The rank of the positive semidefinite matrix B provides the minimal embedding dimension k = \operatorname{rank}(B) \leq n-1, corresponding to the affine dimension of the point configuration.

Schoenberg's criterion was introduced in his seminal 1935 paper, which explored the embedding of metric spaces arising from Euclidean spaces into Hilbert space via transformations of the metric.
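The criterion is straightforward to apply numerically. The sketch below (illustrative; the function name is_edm and the tolerance are our choices) tests a matrix of squared distances and also reports the embedding dimension as the number of positive eigenvalues of B.

```python
import numpy as np

def is_edm(K, tol=1e-10):
    """Schoenberg test for a symmetric hollow matrix K of squared distances.

    K is a squared Euclidean distance matrix iff B = -1/2 H K H is positive
    semidefinite; the number of positive eigenvalues of B is the minimal
    embedding dimension.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ K @ H
    w = np.linalg.eigvalsh(B)
    return bool(w.min() >= -tol), int(np.sum(w > tol))

# Collinear points 0, 1, 3: a valid EDM, embedding dimension 1.
print(is_edm(np.array([[0., 1., 9.], [1., 0., 4.], [9., 4., 0.]])))  # (True, 1)
# Violates the triangle inequality of the square roots (3 > 1 + 1): not an EDM.
print(is_edm(np.array([[0., 1., 9.], [1., 0., 1.], [9., 1., 0.]])))  # (False, ...)
```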

Distance Matrix Completion

In the Euclidean distance matrix completion problem, a partial symmetric matrix with zero diagonal entries is provided, specifying squared distances d_{ij} for only a subset of pairs (i,j), and the goal is to fill in the missing entries to obtain a full Euclidean distance matrix that corresponds to the squared distances between points in Euclidean space. This ensures the completed matrix admits an embedding into a low-dimensional Euclidean space, preserving the given distances exactly. The problem is nontrivial because not all partial matrices admit such completions, and finding one requires verifying embeddability conditions while optimizing the unspecified entries.

Existence of a completion depends on the partial entries satisfying certain submatrix conditions derived from extensions of Schoenberg's theorem, specifically that every fully specified principal submatrix must itself be a valid Euclidean distance matrix, meaning its associated Gram matrix is positive semidefinite after centering. These criteria extend the positive semidefiniteness checks for complete matrices to incomplete ones, with a key result stating that completions exist if the underlying graph of specified entries is chordal, allowing positive definite realizations via perfect elimination orderings. For general graphs, the problem is NP-hard, but chordal structures guarantee polynomial-time solvability through iterative completion of cliques.

Algorithms for solving the completion typically rely on semidefinite programming (SDP) relaxations, which formulate the problem as minimizing a convex objective subject to linear constraints on the partial entries and semidefiniteness of the implied Gram matrix. A seminal SDP-based approach uses facial reduction to handle rank constraints efficiently, providing exact solutions when feasible and high-quality approximations otherwise, with computational complexity scaling as O(n^3) for n points in moderate dimensions. More recent nonconvex methods, such as Riemannian optimization on low-rank manifolds, offer scalable alternatives for large-scale instances by avoiding full SDP solves.

A primary application of Euclidean distance matrix completion is in sensor network localization, where partial distance measurements between sensors (e.g., via ranging signals) are used to infer full positions in \mathbb{R}^d, enabling completion of the distance matrix to reconstruct the network geometry even from noisy or incomplete links. This formulation unifies anchor-based and anchor-free localization, treating all nodes symmetrically and leveraging semidefinite programming to achieve global optimality under mild noise assumptions, with demonstrated robustness in simulations involving up to hundreds of sensors.
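An SDP formulation of this kind can be sketched in a few lines. The following illustration assumes the CVXPY modeling library is available; the instance, variable names, and the trace-minimization objective (a common heuristic that biases toward low embedding dimension) are our own choices, not the method of any specific paper. The optimization runs over the centered Gram matrix, with each observed squared distance imposed as a linear constraint.

```python
import numpy as np
import cvxpy as cp   # assumes the CVXPY modeling library is installed

# Hypothetical instance: the tetrahedron EDM with one distance unobserved.
n = 4
D_partial = np.array([[0., 1., 1., 1.],
                      [1., 0., 2., np.nan],   # d_24 is missing
                      [1., 2., 0., 2.],
                      [1., np.nan, 2., 0.]])
mask = ~np.isnan(D_partial)

G = cp.Variable((n, n), PSD=True)             # Gram matrix of the unknown points
constraints = [cp.sum(G, axis=1) == 0]        # centering: G 1 = 0
for i in range(n):
    for j in range(i + 1, n):
        if mask[i, j]:
            # Each observed entry pins down G_ii + G_jj - 2 G_ij = d_ij.
            constraints.append(G[i, i] + G[j, j] - 2 * G[i, j] == D_partial[i, j])

# Trace minimization returns *a* valid low-rank completion, not necessarily
# the original configuration (it tends to flatten the point set).
prob = cp.Problem(cp.Minimize(cp.trace(G)), constraints)
prob.solve()

g = np.diag(G.value)
D_completed = g[:, None] + g[None, :] - 2 * G.value
print(np.round(D_completed, 3))               # the missing d_24 is filled in
```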
