
Minkowski distance

The Minkowski distance, also known as the \ell_p distance, is a metric defined in an n-dimensional normed vector space to measure the distance between two points \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n), generalizing common distance measures like the Euclidean and Manhattan distances through a tunable parameter p \geq 1. The formula is given by d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}, which satisfies the properties of a metric, including non-negativity, symmetry, and the triangle inequality for p \geq 1. This distance is named after the German mathematician Hermann Minkowski, who introduced foundational concepts related to norms and convex bodies in his 1910 work Geometrie der Zahlen, laying the groundwork for such generalized metrics in the geometry of numbers.

Special cases of the Minkowski distance correspond to familiar metrics: when p = 1, it reduces to the Manhattan distance (or taxicab distance), d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n |x_i - y_i|, which measures distance along axis-aligned paths; for p = 2, it becomes the Euclidean distance, d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}, representing the straight-line distance in Euclidean space; and as p \to \infty, it approaches the Chebyshev distance, d_\infty(\mathbf{x}, \mathbf{y}) = \max_i |x_i - y_i|, focusing on the maximum coordinate difference. These variations allow the Minkowski distance to adapt to different data characteristics, such as emphasizing large differences (higher p) or cumulative deviations (lower p).

In applications, the Minkowski distance is widely used in fields like machine learning for tasks such as k-nearest neighbors classification, clustering algorithms (e.g., k-means variants), and similarity search in high-dimensional data, where the choice of p influences sensitivity to outliers or feature scales. It also appears in optimization and related quantitative fields, often requiring normalization of features to ensure meaningful comparisons across dimensions. The metric's flexibility stems from its roots in functional analysis, where it induces the \ell_p-norm on vectors, enabling rigorous study of convergence and approximation in infinite-dimensional spaces.

Fundamentals

Definition

The Minkowski distance, also known as the L_p metric, is a measure of the distance between two points in a normed vector space, parameterized by a positive real number p. For two points \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n) in \mathbb{R}^n, the Minkowski distance of order p (where 1 \leq p < \infty) is defined as D_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}. The parameter p controls the sensitivity of the distance to the differences in individual coordinates: larger values of p emphasize larger deviations more heavily relative to smaller ones, while the requirement p \geq 1 ensures that the function satisfies the properties of a metric, including the triangle inequality. This distance arises directly from the L_p norm of the difference vector \mathbf{x} - \mathbf{y}, where the L_p norm is given by \|\mathbf{z}\|_p = \left( \sum_{i=1}^n |z_i|^p \right)^{1/p} for \mathbf{z} \in \mathbb{R}^n, so D_p(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_p. For the limiting case p = \infty, the Minkowski distance is defined as D_\infty(\mathbf{x}, \mathbf{y}) = \max_{1 \leq i \leq n} |x_i - y_i|, which corresponds to the L_\infty norm and measures the maximum absolute difference across coordinates. To illustrate, consider the points (0,0) and (3,4) in \mathbb{R}^2: for p=1, D_1 = |3-0| + |4-0| = 7; for p=2, D_2 = \sqrt{3^2 + 4^2} = 5; and for p=\infty, D_\infty = \max(3,4) = 4.
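
A minimal sketch (not part of the original text) of how this definition translates into code; the helper name minkowski is illustrative:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p; p = float('inf') gives the Chebyshev case."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))             # 7.0  (Manhattan)
print(minkowski(x, y, 2))             # 5.0  (Euclidean)
print(minkowski(x, y, float("inf")))  # 4    (Chebyshev)
```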

Properties

The Minkowski distance d_p(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_p for 1 \leq p \leq \infty satisfies the axioms of a metric in finite-dimensional Euclidean spaces. It is non-negative, with d_p(\mathbf{x}, \mathbf{y}) \geq 0 and d_p(\mathbf{x}, \mathbf{y}) = 0 if and only if \mathbf{x} = \mathbf{y}, due to the properties of the L_p norm. It is also symmetric, as d_p(\mathbf{x}, \mathbf{y}) = d_p(\mathbf{y}, \mathbf{x}), since the norm of \mathbf{x} - \mathbf{y} equals the norm of \mathbf{y} - \mathbf{x}. The triangle inequality d_p(\mathbf{x}, \mathbf{z}) \leq d_p(\mathbf{x}, \mathbf{y}) + d_p(\mathbf{y}, \mathbf{z}) holds for p \geq 1, established by applying Minkowski's inequality to the difference vectors: \|\mathbf{x} - \mathbf{z}\|_p = \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\|_p \leq \|\mathbf{x} - \mathbf{y}\|_p + \|\mathbf{y} - \mathbf{z}\|_p. Minkowski's inequality, which states that for vectors \mathbf{u}, \mathbf{v} in \mathbb{R}^n and 1 \leq p \leq \infty, \|\mathbf{u} + \mathbf{v}\|_p \leq \|\mathbf{u}\|_p + \|\mathbf{v}\|_p, is proven using Hölder's inequality and relies on the convexity of the function t \mapsto |t|^p for p \geq 1. This confirms that the L_p norm induces a metric topology for p \geq 1.

In finite-dimensional spaces, all L_p norms for 1 \leq p \leq \infty are equivalent, meaning there exist positive constants c_1, c_2 (depending on the dimension n) such that c_1 \|\mathbf{x}\|_q \leq \|\mathbf{x}\|_p \leq c_2 \|\mathbf{x}\|_q for all \mathbf{x} \in \mathbb{R}^n and any 1 \leq p, q \leq \infty. This equivalence follows from the compactness of the unit sphere in one norm and the continuity of another norm with respect to it, ensuring they generate the same topology. Consequently, the unit ball \{ \mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_p \leq 1 \} is convex for p \geq 1, as it is the unit ball of a norm, which satisfies absolute homogeneity and the triangle inequality. As p increases from 1 to \infty, the Minkowski distance places greater emphasis on the largest coordinate differences between points, with the unit ball in two dimensions transitioning from a diamond shape (for p=1) toward a square aligned with the axes (for p=\infty).

For 0 < p < 1, the Minkowski distance fails to satisfy the triangle inequality, so it is not a metric; for example, in \mathbb{R}^2 with p=0.5, the points \mathbf{a} = (0,0), \mathbf{b} = (1,0), and \mathbf{c} = (0,1) yield d_{0.5}(\mathbf{b}, \mathbf{c}) = 2^{1/0.5} = 4 > 2 = d_{0.5}(\mathbf{a}, \mathbf{b}) + d_{0.5}(\mathbf{a}, \mathbf{c}). This violation arises because the function t \mapsto t^p is concave rather than convex for p < 1, preventing the necessary subadditivity.
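
A small numerical check (illustrative only; the random sampling setup and helper function are assumptions for the example) that the triangle inequality holds for p \geq 1 and fails at p = 0.5 with the counterexample above:

```python
import random

def minkowski(x, y, p):
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return max(diffs) if p == float("inf") else sum(d ** p for d in diffs) ** (1 / p)

random.seed(0)
def rand_pt():
    return tuple(random.uniform(-1, 1) for _ in range(3))

for p in (1, 1.5, 2, 3, float("inf")):
    holds = all(
        minkowski(a, c, p) <= minkowski(a, b, p) + minkowski(b, c, p) + 1e-12
        for a, b, c in ((rand_pt(), rand_pt(), rand_pt()) for _ in range(1000))
    )
    print(f"p={p}: triangle inequality holds on sample: {holds}")

# The p = 0.5 counterexample from the text: 4.0 > 1.0 + 1.0.
a, b, c = (0, 0), (1, 0), (0, 1)
print(minkowski(b, c, 0.5), ">", minkowski(a, b, 0.5) + minkowski(a, c, 0.5))
```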

Special Cases

Manhattan Distance (p=1)

The Manhattan distance arises as the specific instance of the Minkowski distance when p = 1, defined for two points \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n) in \mathbb{R}^n by the formula d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n |x_i - y_i|. This metric is equivalently known as the \ell_1-norm (or simply L1 norm) of the difference vector \mathbf{x} - \mathbf{y}, and it is commonly referred to as the taxicab distance due to its interpretation in grid-based navigation.

Geometrically, the Manhattan distance measures the length of the shortest path between two points that follows only axis-aligned directions, analogous to traveling along city streets on a rectangular grid. In two dimensions, this path consists of horizontal and vertical segments, unlike the straight-line path of the Euclidean distance. The unit ball under this metric—the set of points at distance at most 1 from the origin—is a diamond shape (a square rotated 45 degrees with vertices at (\pm 1, 0) and (0, \pm 1)), which is the two-dimensional cross-polytope. In higher dimensions, it generalizes to the n-dimensional cross-polytope.

To illustrate in two dimensions, consider the points (0,0) and (3,4), which together with the corner point (3,0) form a right triangle. The Manhattan distance from (0,0) to (3,4) is |3-0| + |4-0| = 7, representing the path along the axes (length 3 horizontally and 4 vertically). In contrast, the straight-line Euclidean distance is \sqrt{3^2 + 4^2} = 5, highlighting how the Manhattan path is longer due to its restriction to grid directions. This difference becomes visually apparent when plotting the axis-aligned route versus the hypotenuse, underscoring the metric's focus on coordinate-wise deviations.

A key characteristic of the Manhattan distance is its additive separability across coordinates: the total distance is simply the sum of absolute differences in each dimension, with no interaction between features. This property makes it particularly suitable for scenarios where dimensions are independent, such as in high-dimensional data analysis where coordinate-wise computations simplify processing. Additionally, in optimization contexts, the L1 norm underlying the Manhattan distance promotes sparsity by driving many coefficients to exactly zero, a phenomenon exploited in methods like the lasso for variable selection in linear models. Although the general framework of L_p-norms was introduced by Hermann Minkowski in the early 20th century, the taxicab interpretation of the p=1 case was popularized by Karl Menger in 1952 through an educational exhibit, predating the widespread use of the term "Manhattan distance" in computing. It fits naturally within the L_p family as the boundary case p=1, the smallest exponent for which the distance remains a metric.
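
The grid interpretation can be made concrete with a short sketch (illustrative, not from the source): any monotone axis-aligned unit-step path between the two points has length equal to the Manhattan distance, while the Euclidean hypotenuse is shorter.

```python
def axis_aligned_path(start, goal):
    """Unit steps that go first along the x-axis, then along the y-axis."""
    (x0, y0), (x1, y1) = start, goal
    steps = []
    dx = 1 if x1 >= x0 else -1
    steps += [(dx, 0)] * abs(x1 - x0)
    dy = 1 if y1 >= y0 else -1
    steps += [(0, dy)] * abs(y1 - y0)
    return steps

path = axis_aligned_path((0, 0), (3, 4))
print(len(path))                  # 7, the Manhattan distance |3| + |4|
print((3 ** 2 + 4 ** 2) ** 0.5)   # 5.0, the shorter Euclidean hypotenuse
```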

Euclidean Distance (p=2)

The Euclidean distance, a special case of the Minkowski distance with parameter p=2, is defined as the \ell_2-norm (or L2 norm) of the difference vector between two points \mathbf{x} = (x_1, \dots, x_n) and \mathbf{y} = (y_1, \dots, y_n) in \mathbb{R}^n: d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}. This formula arises from generalizing the Pythagorean theorem to higher dimensions, measuring the length of the straight-line segment connecting the points.

Geometrically, the Euclidean distance gives the shortest path between points in a flat, isotropic space, akin to classical Euclidean geometry where all directions are equivalent. The set of points at distance exactly 1 from the origin forms the unit sphere (a hypersphere in higher dimensions), defined by the equation \sum_{i=1}^n x_i^2 = 1, and the unit ball is the solid region it bounds. This spherical symmetry distinguishes it from other norms, enabling rotationally invariant measurements.

Unique to p=2, the Euclidean distance induces an inner product structure on the space, given by \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i y_i, from which the norm derives as \|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}. This supports key properties like orthogonality: two vectors \mathbf{u} and \mathbf{v} are orthogonal if \langle \mathbf{u}, \mathbf{v} \rangle = 0, leading to the Pythagorean theorem \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 for orthogonal pairs. In infinite-dimensional Hilbert spaces, which generalize finite-dimensional Euclidean spaces, Parseval's identity holds: for an orthonormal basis \{e_i\}, \|\mathbf{f}\|^2 = \sum_i |\langle \mathbf{f}, e_i \rangle|^2, preserving energy or norm across expansions. These traits make Euclidean spaces complete inner product spaces, foundational for linear algebra and functional analysis.

In physics, the Euclidean distance corresponds to the physical displacement in isotropic media, where space lacks preferred directions, as in Newtonian mechanics for particle trajectories. It underpins least squares optimization, minimizing the sum of squared residuals—equivalent to the squared L2 distance from data points to a model line or surface—to find best-fit parameters under Gaussian noise assumptions. For example, in 3D coordinate geometry, the distance between points (0,0,0) and (1,2,3) is d_2 = \sqrt{(1-0)^2 + (2-0)^2 + (3-0)^2} = \sqrt{14}, representing the straight-line separation between the two points.
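
The inner-product structure behind p = 2 can be illustrated with a brief sketch (the vectors chosen are arbitrary examples, not from the source): the norm derives from the dot product, and orthogonal vectors satisfy the Pythagorean identity.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm2(u):
    return math.sqrt(dot(u, u))

u, v = (1, 2, 3), (3, 0, -1)   # dot(u, v) = 3 + 0 - 3 = 0, so u and v are orthogonal
w = tuple(a + b for a, b in zip(u, v))
print(dot(u, v))                                      # 0
print(norm2(w) ** 2, norm2(u) ** 2 + norm2(v) ** 2)   # both ~24.0 (Pythagorean identity)

# Distance between (0, 0, 0) and (1, 2, 3) from the example in the text:
print(norm2((1, 2, 3)))                               # sqrt(14) ≈ 3.742
```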

Chebyshev Distance (p=∞)

The Chebyshev distance represents the limiting case of the Minkowski distance as the parameter p approaches infinity, also known as the L^\infty norm of the difference between two points. For vectors \mathbf{x} = (x_1, x_2, \dots, x_n) and \mathbf{y} = (y_1, y_2, \dots, y_n) in \mathbb{R}^n, it is defined by the formula D_\infty(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_\infty = \max_{i=1,\dots,n} |x_i - y_i|. This metric emphasizes the largest absolute difference across coordinates, rendering it insensitive to variations in all but the dominant dimension. Geometrically, the Chebyshev distance corresponds to the chessboard distance, equivalent to the minimum number of moves a king requires to travel between two positions on an empty chessboard, as the king can move horizontally, vertically, or diagonally. The unit ball under this norm forms a hypercube aligned with the coordinate axes, exhibiting uniform scaling in all directions while highlighting extreme deviations. The convergence of the L^p distance to the L^\infty distance as p \to \infty arises because the term with the maximum absolute difference dominates the sum when raised to high powers, a result provable through limits of the p-th root of powered sums. For example, in two dimensions, the Chebyshev distance between (0,0) and (3,4) is \max(|3-0|, |4-0|) = 4, focusing solely on the larger coordinate gap.
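
A short numerical sketch (illustrative, not from the source) of this convergence for the same pair of points:

```python
diffs = (3, 4)   # coordinate gaps between (0, 0) and (3, 4)
for p in (1, 2, 4, 8, 16, 64):
    print(p, sum(d ** p for d in diffs) ** (1 / p))
# The printed values fall from 7.0 (p=1) and 5.0 (p=2) toward the
# Chebyshev value max(3, 4) = 4 as p grows.
```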

Generalizations

Cases for 0 < p < 1

The Minkowski distance for 0 < p < 1 retains the general form d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}, where \mathbf{x}, \mathbf{y} \in \mathbb{R}^n, but this parameterization no longer defines a true metric on the space. Unlike the cases where p \geq 1, the triangle inequality fails, rendering it unsuitable for applications requiring strict metric properties. A concrete counterexample illustrates this violation: consider points \mathbf{a} = (0,0), \mathbf{b} = (1,0), and \mathbf{c} = (1,1) in \mathbb{R}^2. Then d_p(\mathbf{a}, \mathbf{b}) = 1, d_p(\mathbf{b}, \mathbf{c}) = 1, but d_p(\mathbf{a}, \mathbf{c}) = 2^{1/p}. For 0 < p < 1, 1/p > 1, so 2^{1/p} > 2, yielding d_p(\mathbf{a}, \mathbf{c}) > d_p(\mathbf{a}, \mathbf{b}) + d_p(\mathbf{b}, \mathbf{c}), which breaches the triangle inequality.

Despite this, the L_p structure induces a quasi-norm, satisfying a relaxed triangle inequality of the form \| \mathbf{x} + \mathbf{y} \|_p \leq 2^{1/p - 1} (\| \mathbf{x} \|_p + \| \mathbf{y} \|_p). The associated unit ball \{ \mathbf{x} \in \mathbb{R}^n : \sum_{i=1}^n |x_i|^p \leq 1 \} is non-convex; for instance, the points (1,0) and (0,1) lie on its boundary, but their midpoint (0.5, 0.5) satisfies \sum_i |0.5|^p = 2 \cdot (0.5)^p = 2^{1-p} > 1, placing it outside the ball. Additionally, the triangle inequality does hold in a powered form: \| \mathbf{x} + \mathbf{y} \|_p^p \leq \| \mathbf{x} \|_p^p + \| \mathbf{y} \|_p^p. In practice, the fractional Minkowski distance (for 0 < p < 1) finds utility as a dissimilarity measure in scenarios where metric axioms are not essential, such as hierarchical clustering of high-dimensional or heavy-tailed data, where it can be more robust to noise than the Euclidean distance.
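
Both failures can be checked directly with a few lines (an illustrative sketch; the helper name d_frac is an assumption):

```python
p = 0.5

def d_frac(x, y):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Triangle inequality violation from the counterexample above: 4.0 > 1.0 + 1.0.
a, b, c = (0, 0), (1, 0), (1, 1)
print(d_frac(a, c), "vs", d_frac(a, b) + d_frac(b, c))

# Non-convex unit ball: the midpoint of (1, 0) and (0, 1) lies outside it.
mid = (0.5, 0.5)
print(sum(abs(t) ** p for t in mid))   # 2 ** (1 - p) ≈ 1.414 > 1
```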

Variations and Extensions

One prominent variation of the Minkowski distance incorporates weights to account for differing importance across dimensions. The weighted Minkowski distance between two points x = (x_1, \dots, x_n) and y = (y_1, \dots, y_n) in \mathbb{R}^n is defined as d_p(x, y; w) = \left( \sum_{i=1}^n w_i |x_i - y_i|^p \right)^{1/p}, where w = (w_1, \dots, w_n) is a vector of positive weights w_i > 0. This formulation allows the distance to emphasize or de-emphasize specific coordinates, making it useful when features have unequal relevance, while preserving the metric properties for p \geq 1.

The Minkowski distance extends beyond Euclidean spaces to general normed vector spaces, where it arises as the norm-induced metric \|x - y\|_p. In broader metric spaces, it can be realized through embeddings into Banach spaces equipped with an L_p-norm, enabling its application to abstract structures not directly coordinatized by \mathbb{R}^n. Analogous constructions appear in non-Archimedean settings, such as p-adic analysis, where the p-adic valuation replaces the absolute value to define the supremum norm \max_i |x_i - y_i|_p on \mathbb{Q}_p^n, yielding an ultrametric analogous to the Chebyshev distance (the p=\infty case) of the Minkowski family.

The Minkowski distance bears a direct relation to power means, specifically as a scaled version of the p-th power mean applied to the absolute component-wise differences. For points x and y, the distance d_p(x, y) equals n^{1/p} times the p-th power mean of the values |x_i - y_i| for i = 1, \dots, n, where the power mean of order p is \left( \frac{1}{n} \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}. This connection highlights how varying p interpolates between means like the arithmetic mean (p=1) and the quadratic mean (p=2), providing a unified perspective on the distance's sensitivity to outliers.

Further extensions include the Mahalanobis distance, which generalizes the Minkowski distance for p=2 by incorporating a positive definite covariance matrix \Sigma, yielding d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}. This accounts for correlations and scales among variables, extending the Euclidean case to ellipsoidal geometries in multivariate settings. Fractional values of p, such as p=0.25, have been explored in dissimilarity measures for data in non-integer-dimensional contexts like fractals, where the metric adapts to irregular structures while relaxing strict metric compliance.

In functional analysis, the Minkowski distance generalizes to L_p spaces over arbitrary measure spaces (\Omega, \mu), not limited to finite-dimensional \mathbb{R}^n. Here, for measurable functions f and g, the distance is the L_p-norm \|f - g\|_p = \left( \int_\Omega |f(\omega) - g(\omega)|^p \, d\mu(\omega) \right)^{1/p} for 1 \leq p < \infty, inducing a metric on the space of equivalence classes of functions. This extension supports applications in infinite-dimensional settings, such as probability densities or signals, where the integral replaces the finite sum.
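
A brief sketch of the weighted variant (the weights and the helper name weighted_minkowski are illustrative choices, not from the source); uniform weights recover the ordinary distance, while uneven weights shift the emphasis between coordinates:

```python
def weighted_minkowski(x, y, w, p):
    return sum(wi * abs(a - b) ** p for wi, a, b in zip(w, x, y)) ** (1 / p)

x, y = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)
print(weighted_minkowski(x, y, (1.0, 1.0, 1.0), 2))    # sqrt(14) ≈ 3.742 (uniform weights)
print(weighted_minkowski(x, y, (4.0, 1.0, 0.25), 2))   # sqrt(4 + 4 + 2.25) ≈ 3.202
```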

Applications

In Machine Learning and Data Analysis

In machine learning, the Minkowski distance serves as a tunable distance metric in the k-nearest neighbors (KNN) algorithm, where the parameter p adjusts the metric's sensitivity to feature differences across data points. For instance, p = 1 (Manhattan distance) is particularly effective for sparse datasets, as it treats all dimensions equally without amplifying small differences through squaring, making it suitable for high-dimensional sparse representations like text features. In contrast, the default p = 2 (Euclidean distance) is widely used for dense, balanced data due to its computational efficiency and alignment with standard geometric intuitions in lower dimensions.

Clustering algorithms frequently incorporate the Minkowski distance to define data proximity. K-means typically employs the Euclidean variant (p = 2) as its default distance measure, optimizing cluster centroids based on squared differences to minimize intra-cluster variance in continuous data. DBSCAN, a density-based method, supports variable p values in its Minkowski metric, enabling flexible neighborhood definitions that enhance outlier detection by identifying low-density regions as outliers in noisy or irregularly shaped datasets.

In time series analysis, the Minkowski distance is often integrated with dynamic time warping (DTW) to measure similarity between sequences that may vary in timing or speed. DTW aligns series by finding an optimal warping path, using Minkowski distances (e.g., with p = 1 or p = 2) as the local cost measure to compute accumulated distances along the path, which improves alignment accuracy for applications like motion recognition or financial forecasting.

Recent applications from 2020 to 2025 highlight the Minkowski distance's role in AI-driven similarity search within high-dimensional data, such as embeddings in natural language processing (NLP). For example, angular variants of the Minkowski distance have been used to compare token frequencies in NLP embeddings, addressing challenges like distance concentration in high dimensions that degrade traditional similarity measures. In imbalanced datasets, customizable Minkowski metrics, often with tuned p values, improve classification performance by balancing minority class representation in granular or ensemble models.

In optimization tasks, the L1 norm corresponding to p = 1 in the Minkowski distance underpins lasso regression, where it acts as a regularization penalty to promote sparsity by driving irrelevant feature coefficients to exactly zero, enhancing model interpretability and reducing overfitting. The choice of p influences overall model robustness; lower p values like 1 provide greater resistance to outliers compared to p = 2, as the linear penalty avoids exaggerating extreme deviations.
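
A minimal sketch of tuning p in KNN (assuming scikit-learn is available; the iris dataset, split, and hyperparameters are illustrative only, not prescribed by the source):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# p=1 is the Manhattan case, p=2 the Euclidean default; larger p stresses big gaps.
for p in (1, 2, 3):
    knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=p)
    knn.fit(X_tr, y_tr)
    print(p, knn.score(X_te, y_te))
```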

In Geometry and Other Fields

In convex geometry and functional analysis, the Minkowski distance arises via the Minkowski functional associated with the unit ball of a norm, which defines the norm in Banach spaces and characterizes the geometry of convex bodies. The unit ball, defined as the set of points where the Minkowski distance to the origin is at most 1, serves as a fundamental object whose shape determines properties like strict convexity and smoothness of the norm. For instance, in finite-dimensional normed spaces, known as Minkowski spaces, the unit ball's boundary influences separation theorems and representations of convex hulls.

In computer graphics and computational geometry, the Minkowski distance with p = \infty, equivalent to the Chebyshev distance, is employed for efficient collision detection using axis-aligned bounding boxes (AABBs), where it measures the maximum coordinate difference to quickly reject non-intersecting objects. The Minkowski difference of two convex shapes transforms collision queries into origin-inclusion tests, accelerating real-time simulations. For ray tracing, the case p = 2 (Euclidean distance) computes the shortest path from rays to surfaces, essential for intersection tests and shading calculations in rendering pipelines.

In materials science and crystallography, the Minkowski distance quantifies separations in lattice structures, aiding analysis of crystal packing and defect distributions through L_q-type metrics on lattice points. Beyond these areas, the Minkowski distance with p = 1 (Manhattan distance) models linear costs in economic cost functions, representing additive preferences over bundles in spatial representations of consumer theory. In neuroscience, it facilitates morphological distance measures for shape analysis, such as quantifying neuronal cell complexity via Minkowski valuations that capture branching and orientation patterns. A 2023 study applied variable p in Minkowski distances to compare protein structures, enhancing accuracy by adapting the metric to structural variability in alignments.
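
As a minimal sketch of the Chebyshev-style rejection test (an illustrative simplification assuming equal-sized axis-aligned cubes, not a general AABB routine from the source): two such cubes overlap exactly when the Chebyshev distance between their centers is at most twice the half-width.

```python
def cubes_overlap(center_a, center_b, half_width):
    """Axis-aligned cubes of equal half-width overlap iff the Chebyshev
    distance between their centers is at most twice the half-width."""
    chebyshev = max(abs(a - b) for a, b in zip(center_a, center_b))
    return chebyshev <= 2 * half_width

print(cubes_overlap((0, 0, 0), (1.5, 0.5, -1.0), 1.0))   # True  (max gap 1.5 <= 2.0)
print(cubes_overlap((0, 0, 0), (3.0, 0.0, 0.0), 1.0))    # False (max gap 3.0 > 2.0)
```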

History

Origins in Normed Spaces

The development of concepts akin to the Minkowski distance traces back to late 19th-century analysis, where precursors to L^p norms emerged through inequalities that generalized classical bounds. A key example is Hölder's inequality, introduced in 1889, which established a fundamental relation between integrals or sums raised to conjugate exponents, laying essential groundwork for the structure of p-norms in finite dimensions. This work highlighted the role of the parameter p in controlling convexity and the duality between conjugate exponents, influencing subsequent generalizations in normed spaces.

Hermann Minkowski advanced these ideas significantly in his 1901 papers on convex bodies, where he examined the geometric properties of sets in higher dimensions, including volume relations and support functions that implicitly relied on norm-like measurements. These contributions built on earlier explorations of quadratic forms, with Minkowski's 1896 convex body theorem providing a pivotal step by demonstrating how symmetric convex sets in \mathbb{R}^n could enclose lattice points, thereby motivating the use of general p-norms to quantify such enclosures. In this context, norms served to bound the solutions within lattices, transforming Diophantine problems into geometric estimates of minimal volumes.

Minkowski formalized the general p-norm framework in his 1910 monograph Geometrie der Zahlen, applying it systematically to the geometry of numbers for analyzing point distributions in convex domains. A crucial step was his proof of the triangle inequality for p \geq 1, known as Minkowski's inequality, which confirmed the subadditivity required for these functions to define valid norms in finite-dimensional spaces. The Minkowski distance arises directly from these p-norms as a metric on \mathbb{R}^n.

Development and Naming

Following Hermann Minkowski's foundational work on convex bodies and the inequality bearing his name, which established key properties for p-norms in finite dimensions, the concept of what is now known as the L^p metric—the distance induced by these norms—was extended to infinite-dimensional spaces by Frigyes Riesz in 1910. Riesz's investigations into systems of integrable functions generalized the framework, enabling its application in functional analysis and paving the way for broader normed space theories.

The term "Minkowski distance" honors Hermann Minkowski (1864–1909) in recognition of his contributions to norm theory, though the specific nomenclature gained prominence in mid-20th-century mathematical literature on metric spaces. By the 1930s, the metric was integrated into the developing field of metric geometry, notably through Stefan Banach's 1932 treatise on linear operations, which formalized Banach spaces and highlighted the role of L^p norms in complete normed spaces. In the 1950s, it entered applied contexts for computing similarity measures in early numerical taxonomy and pattern recognition methods.

The Minkowski distance holds a central position in L^p space theory, where these spaces serve as prototypical examples of complete normed vector spaces essential for modern analysis. Notably, the name "Minkowski" can lead to confusion with the Minkowski metric of spacetime, introduced in his work on space and time, but the distance discussed here pertains to the positive definite p-norm structure, distinct from the indefinite metric of special relativity.