Euclidean distance
The Euclidean distance between two points in Euclidean space is the length of the straight-line segment connecting them, computed as the square root of the sum of the squared differences between their corresponding coordinates.[1] For points P = (x_1, x_2, \dots, x_n) and Q = (y_1, y_2, \dots, y_n) in n-dimensional space, this is given by the formulad(P, Q) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}. [2] In two dimensions, it corresponds directly to the hypotenuse in the Pythagorean theorem for the right triangle formed by the coordinate differences.[3] This distance metric satisfies the standard properties of a metric: it is non-negative (d(P, Q) \geq 0, with equality if and only if P = Q), symmetric (d(P, Q) = d(Q, P)), and obeys the triangle inequality (d(P, R) \leq d(P, Q) + d(Q, R) for any points P, Q, R).[1] Equivalent to the \ell^2-norm (or L^2-norm) of the vector difference P - Q, it generalizes the ordinary distance in the plane and space to arbitrary finite dimensions, serving as the foundational measure in Euclidean geometry.[2] The concept underpins key geometric structures, such as circles (sets of points at fixed distance from a center) and orthogonality (vectors with zero distance projection).[3] Beyond pure mathematics, Euclidean distance finds extensive applications across disciplines. In physics, it quantifies displacements and trajectories in classical mechanics.[1] In computer science and machine learning, it powers algorithms like k-nearest neighbors for classification and clustering in data analysis.[4] Fields such as sensor network localization, molecular conformation determination in chemistry, and multidimensional scaling in statistics rely on it to reconstruct positions from distance data or analyze similarities in high-dimensional spaces.[5] These uses highlight its role in bridging theoretical geometry with practical problem-solving.[4]
Definition and Formulas
One dimension
In one dimension, the Euclidean distance between two points x and y on the real line is defined as the absolute difference |x - y|.[6] This formulation arises as a special case of the Pythagorean theorem applied to a degenerate right triangle where one leg has zero length, reducing the distance to \sqrt{(x - y)^2 + 0^2} = |x - y|.[7] For instance, the Euclidean distance between the points 3 and 7 is |7 - 3| = 4.[6] This one-dimensional distance intuitively represents the straight-line separation between the points along the number line, providing the simplest measure of how far apart they are without any deviation or curvature.[7] It serves as the foundational concept for extending the Euclidean distance to higher dimensions, such as two dimensions where the Pythagorean theorem fully comes into play.[7]Two dimensions
In two dimensions, the Euclidean distance measures the straight-line separation between two points in the Cartesian plane, extending the one-dimensional case where equal y-coordinates reduce to a difference along the x-axis.[8] Consider two points P = (x_1, y_1) and Q = (x_2, y_2) in the plane. The Euclidean distance d(P, Q) is given by the formula d(P, Q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. This expression arises directly from the Pythagorean theorem, which states that in a right-angled triangle, the square of the hypotenuse equals the sum of the squares of the other two sides.[9][2] To derive the distance, draw a horizontal line segment from P to the point (x_2, y_1), with length \Delta x = |x_2 - x_1|, and a vertical line segment from (x_2, y_1) to Q, with length \Delta y = |y_2 - y_1|. These segments form the legs of a right triangle, where the line segment from P to Q is the hypotenuse. By the Pythagorean theorem, the hypotenuse length is \sqrt{(\Delta x)^2 + (\Delta y)^2}, yielding the distance formula.[8][9] For example, the distance between the points (0, 0) and (3, 4) is \sqrt{(3-0)^2 + (4-0)^2} = \sqrt{9 + 16} = \sqrt{25} = 5.[10] The Euclidean distance inherits the units of the coordinate system, such as meters if x and y represent spatial measurements, ensuring physical consistency.[11] It also scales linearly under uniform scaling of the coordinates by a factor k > 0, meaning d(kP, kQ) = k \cdot d(P, Q), preserving relative proportions in the plane.[2]Higher dimensions
In n-dimensional Euclidean space, denoted \mathbb{R}^n, the Euclidean distance between two points, represented as vectors \mathbf{x} = (x_1, x_2, \dots, x_n) and \mathbf{y} = (y_1, y_2, \dots, y_n), is defined using the Euclidean norm of their difference vector \mathbf{x} - \mathbf{y}.[11][12] The formula is: d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_{i=1}^n (x_i - y_i)^2} This expression arises from the vector difference, where the squared differences in each coordinate are summed before taking the square root.[11] The summation over coordinates generalizes the Pythagorean theorem to multiple dimensions (or axes), treating the space as a multi-axial extension where the total distance is the hypotenuse across all perpendicular directions.[12] This reduces to the two-dimensional case when n=2.[11] For a concrete illustration in three dimensions, consider the points (1, 2, 3) and (4, 5, 6). The differences in coordinates are $4-1=3, $5-2=3, and $6-3=3. Squaring these gives $9 + 9 + 9 = 27, and the distance is \sqrt{27} = 3\sqrt{3} \approx 5.196.[1] Computing the Euclidean distance requires evaluating the sum of n terms, resulting in O(n) time complexity.[13]General formula
The Euclidean distance between two points x and y in a Hilbert space is defined as d(x, y) = \sqrt{\langle x - y, x - y \rangle}, where \langle \cdot, \cdot \rangle denotes the inner product on the space.[14][15] This formula arises from the norm induced by the inner product, where the norm of a vector v is given by \|v\| = \sqrt{\langle v, v \rangle}; applying this to the difference vector v = x - y yields the distance as d(x, y) = \|x - y\|.[16] The definition applies to any finite-dimensional Euclidean space, providing a coordinate-free characterization that unifies distances across dimensions without reliance on explicit coordinates.[16] In such spaces, the standard inner product corresponds to the dot product of coordinate representations.[16] The positive definiteness of the inner product—ensuring \langle v, v \rangle > 0 for all nonzero v—guarantees that the resulting distances are real and nonnegative, with d(x, y) = 0 if and only if x = y.[16]Geometric and Metric Properties
Geometric interpretation
The Euclidean distance between two points in Euclidean space represents the length of the straight-line segment connecting them, embodying the intuitive notion of the shortest path in a flat, uncurved geometry. This concept originates from classical Euclidean geometry, where space is assumed to be homogeneous and isotropic, allowing direct measurement along the geodesic that is a straight line. In this framework, any deviation from the straight path would increase the total length, as established by the properties of straight lines in plane and solid geometry.[17] Geometrically, the Euclidean distance can be visualized as the hypotenuse of a right triangle formed by the differences in the coordinates of the two points. For instance, in two dimensions, consider two lattice points such as (0,0) and (3,4); the horizontal and vertical separations form the legs of the triangle, and the straight-line distance is the hypotenuse spanning these differences, measuring 5 units. This interpretation extends naturally to higher dimensions, where in three-dimensional space, the distance between points like (0,0,0) and (1,1,1) traces the space diagonal of a cube, again as the hypotenuse generalized across the coordinate axes. Such visualizations highlight how the distance captures the direct, minimal separation without intermediate curves or bends.[17][18] A key property of Euclidean distance is its preservation under isometries, which are rigid motions such as translations, rotations, and reflections that maintain the structure of the space. These transformations do not alter distances between points, ensuring that the geometric interpretation remains invariant; for example, rotating a pair of points around an axis leaves their separation unchanged. This invariance underscores the foundational role of Euclidean distance in defining the geometry of flat space.[19]Metric space axioms
The Euclidean distance, defined on the vector space \mathbb{R}^n by the general formula d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2} for \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n), satisfies the axioms of a metric space, thereby establishing \mathbb{R}^n as a metric space under this distance function.[20] The non-negativity axiom holds: d(\mathbf{x}, \mathbf{y}) \geq 0 for all \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, with equality if and only if \mathbf{x} = \mathbf{y}. This follows directly from the definition, as the expression under the square root is a sum of squares, which is nonnegative and zero precisely when each x_i = y_i.[21] Symmetry is also satisfied: d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x}) for all \mathbf{x}, \mathbf{y} \in \mathbb{R}^n. This is immediate, since (x_i - y_i)^2 = (y_i - x_i)^2 for each i.[20] The triangle inequality states that d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z}) for all \mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{R}^n. To verify this, consider the vectors \mathbf{u} = \mathbf{x} - \mathbf{y} and \mathbf{v} = \mathbf{y} - \mathbf{z}, so \mathbf{x} - \mathbf{z} = \mathbf{u} + \mathbf{v}. The inequality d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z}) then reduces to \|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|, where \|\cdot\| denotes the Euclidean norm. This follows from the Cauchy-Schwarz inequality: |\mathbf{u} \cdot \mathbf{v}| \leq \|\mathbf{u}\| \|\mathbf{v}\|, which implies \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2\mathbf{u} \cdot \mathbf{v} + \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 + 2\|\mathbf{u}\| \|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2. Taking square roots yields the desired result, with equality when \mathbf{u} and \mathbf{v} are linearly dependent and point in the same direction.[21][22] In \mathbb{R}^n, the Euclidean metric is unique up to isometry, meaning any other metric that induces the standard Euclidean geometry on the space is equivalent via a distance-preserving bijection.[23]Squared Euclidean Distance
Definition and computation
The squared Euclidean distance between two points \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n) in n-dimensional Euclidean space is defined as the sum of the squared differences of their corresponding coordinates: d^2(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2 = \|\mathbf{x} - \mathbf{y}\|^2, where \|\cdot\|^2 denotes the square of the Euclidean norm.[11] This formulation omits the square root operation required in the standard Euclidean distance, which simplifies both manual and algorithmic computations.[24] The avoidance of the square root provides key computational advantages, particularly in programming and numerical methods, as the square root function introduces additional processing overhead and potential floating-point precision issues.[25] For example, in iterative algorithms like k-means clustering, using the squared form accelerates distance calculations across large datasets without altering the relative ordering of distances.[25] Additionally, it facilitates gradient computations in optimization problems, where the partial derivative with respect to \mathbf{x} is simply \nabla_{\mathbf{x}} d^2(\mathbf{x}, \mathbf{y}) = 2(\mathbf{x} - \mathbf{y}), enabling efficient updates in methods such as gradient descent. To illustrate, consider the points (0,0) and (3,4) in two dimensions: the squared Euclidean distance is (3-0)^2 + (4-0)^2 = 9 + 16 = 25. The standard Euclidean distance is the square root of this value.Relation to Euclidean distance
The squared Euclidean distance d^2(\mathbf{x}, \mathbf{y}) is a strictly increasing transformation of the standard Euclidean distance d(\mathbf{x}, \mathbf{y}) for non-negative values, since the squaring function is monotonic on [0, \infty). Consequently, for any points \mathbf{x}, \mathbf{y}, \mathbf{z}, d(\mathbf{x}, \mathbf{y}) < d(\mathbf{x}, \mathbf{z}) if and only if d^2(\mathbf{x}, \mathbf{y}) < d^2(\mathbf{x}, \mathbf{z}), preserving the relative ordering of distances.[11] This monotonic relationship makes minimizing d^2(\mathbf{x}, \mathbf{y}) mathematically equivalent to minimizing d(\mathbf{x}, \mathbf{y}) in optimization problems, as the square root operation does not alter the location of minima. In particular, this equivalence underpins least squares optimization, where the objective function minimizes the sum of squared Euclidean distances (or errors) between observed data and model predictions, facilitating closed-form solutions via linear algebra.[26] Despite these advantages, the squared Euclidean distance fails to qualify as a true metric because it violates the triangle inequality: for some points \mathbf{x}, \mathbf{y}, \mathbf{z}, d^2(\mathbf{x}, \mathbf{z}) > d^2(\mathbf{x}, \mathbf{y}) + d^2(\mathbf{y}, \mathbf{z}). A simple counterexample in one dimension is the points x = 0, y = 0.5, z = 1: here, d^2(x, z) = 1 but d^2(x, y) + d^2(y, z) = 0.25 + 0.25 = 0.5.[27] The squared Euclidean distance is commonly employed in algorithms where absolute scale is irrelevant and only comparative ordering or relative proximity matters, such as nearest-neighbor search or k-means clustering, as it avoids the computational overhead of square roots while yielding identical rankings.Generalizations and Extensions
To non-Euclidean spaces
In non-Euclidean spaces, distance measures adapt to intrinsic geometry defined by curvature, replacing the flat Euclidean metric with path lengths along geodesics. In Riemannian manifolds, the metric tensor g_{ij} governs local geometry, with the infinitesimal arc length given byds = \sqrt{g_{ij} \, dx^i \, dx^j}.
The distance between points p and q is the infimum of \int ds over all smooth curves connecting them, representing the geodesic length.[28] A key example is the sphere, where positive curvature necessitates great-circle distances for surface travel, as straight Euclidean lines (chords) underestimate paths. The haversine formula computes this geodesic distance d between points at latitudes \phi_1, \phi_2 and longitudes \lambda_1, \lambda_2 (in radians) on a sphere of radius R:
a = \sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos \phi_1 \cos \phi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right),
d = 2R \arcsin(\sqrt{a}).
This avoids numerical instability in cosine-based alternatives for small angles.[29] In practice, global positioning systems (GPS) on Earth employ geodesic distances based on ellipsoidal models such as WGS84 for navigation, as Euclidean distances fail over scales comparable to the planet's radius, leading to errors in route optimization and positioning.[30][31] Hyperbolic spaces, exhibiting negative curvature, further diverge, with the Poincaré disk model confining the geometry to the unit disk |z| < 1. The distance between points z_1 and z_2 is
d(z_1, z_2) = 2 \artanh \left| \frac{z_2 - z_1}{1 - \overline{z_1} z_2} \right|,
or equivalently
d(z_1, z_2) = \cosh^{-1} \left( 1 + \frac{2 |z_1 - z_2|^2}{(1 - |z_1|^2)(1 - |z_2|^2)} \right).
These forms arise from the model's conformal metric ds = \frac{2 |dz|}{1 - |z|^2}, emphasizing exponential expansion away from the origin.[32] Overall, Euclidean distance approximates these only locally in low-curvature regimes, breaking down globally where geodesics diverge or converge non-linearly.[33]