Mahalanobis distance
The Mahalanobis distance is a multivariate statistical measure that quantifies the distance between a point and the center of a distribution, accounting for the correlations and variances within the data via the covariance matrix, and was introduced by Indian statistician Prasanta Chandra Mahalanobis in 1936 to generalize distance concepts in correlated multivariate settings.[1] Named after its originator, the distance addresses limitations of simpler metrics like Euclidean distance by scaling variables according to their standard deviations and incorporating inter-variable dependencies, rendering it invariant under nonsingular affine transformations of the data and particularly suited for normally distributed data. The mathematical formulation for the squared Mahalanobis distance D^2 between a vector \mathbf{x} and the mean \boldsymbol{\mu} of a p-dimensional distribution with covariance matrix \boldsymbol{\Sigma} is given by D^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}),
where \boldsymbol{\Sigma}^{-1} is the inverse covariance matrix that "whitens" the data by decorrelating and standardizing the variables.[2] Originally motivated by anthropological studies on anthropometric data, such as measuring physical differences between populations while adjusting for correlations (e.g., between height and arm length), Mahalanobis developed this metric during his work at the Indian Statistical Institute, where it facilitated discriminant analysis and population classification under correlated measurements.[3] Over time, its applications expanded to fields like pattern recognition, where it underpins algorithms for outlier detection by identifying points far from the data centroid in scaled space; machine learning, including support vector machines and k-nearest neighbors for robust classification; and ecology, for niche modeling and assessing environmental suitability by incorporating multivariate habitat covariances.[4][5] A key property is its equivalence to the Euclidean distance in a transformed space where the data are projected onto principal components and standardized, highlighting its geometric interpretation as the length of a vector in this uncorrelated coordinate system.[2] Despite assumptions of known or estimated covariance (which can be sensitive to small samples), the Mahalanobis distance remains a foundational tool in multivariate statistics, influencing modern techniques in data science and high-dimensional analysis.[6]
Fundamentals
Definition
The Mahalanobis distance measures the separation between a point \mathbf{x} in p-dimensional Euclidean space and the center (mean vector \boldsymbol{\mu}) of a multivariate probability distribution defined by its covariance matrix \boldsymbol{\Sigma}.[7] The precise formulation is D_M(\mathbf{x}) = \sqrt{ (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) }, where \mathbf{x} and \boldsymbol{\mu} are p \times 1 column vectors, ^T denotes the transpose operation (yielding a row vector when applied to \mathbf{x} - \boldsymbol{\mu}), and \boldsymbol{\Sigma}^{-1} is the inverse of the p \times p positive definite covariance matrix \boldsymbol{\Sigma}.[7][8] To illustrate in the bivariate case (p=2), let \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} and \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, with covariance matrix \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} (where \sigma_{12} = \sigma_{21}). Inverting this 2 \times 2 matrix explicitly gives D_M^2(\mathbf{x}) = \frac{\sigma_{22}(x_1 - \mu_1)^2 - 2\sigma_{12}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_{11}(x_2 - \mu_2)^2}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}.[9] In contrast to the Euclidean distance, which depends on the arbitrary units of measurement, the Mahalanobis distance is scale-invariant because the inverse covariance matrix normalizes for the variances along each dimension.[10]
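As an illustration, the definition translates directly into a few lines of NumPy; the mean, covariance matrix, and query point below are arbitrary illustrative values, and in practice the explicit matrix inverse would usually be replaced by a linear solve (see Numerical Methods).

```python
import numpy as np

# Illustrative bivariate example: mean, covariance matrix, and a query point.
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])            # positive definite, correlated components
x = np.array([2.0, 1.0])

d = x - mu
D2 = d @ np.linalg.inv(Sigma) @ d         # squared Mahalanobis distance
D_M = np.sqrt(D2)                         # D_M(x) = 1.0 for these values
print(D2, D_M)
```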
Geometric and Intuitive Interpretation
The Mahalanobis distance can be intuitively understood as a multivariate extension of the standardized Euclidean distance in one dimension. In a single variable, standardization involves subtracting the mean and dividing by the standard deviation, which scales the distance to account for the data's spread; similarly, in multiple dimensions, the Mahalanobis distance adjusts for both the mean vector and the covariance structure, effectively measuring distances in a transformed space where variables are uncorrelated and have unit variance. Geometrically, this distance is measured along the directions defined by the eigenvectors of the covariance matrix \boldsymbol{\Sigma}, with the eigenvalues determining the scaling in those directions, resulting in ellipsoidal contours of constant distance rather than spherical ones. These ellipsoids align with the principal axes of the data's variability, elongating along directions of high variance and compressing along those of low variance, providing a natural depiction of the data cloud's shape.
Consider a two-dimensional example with correlated variables, such as human height and weight, where taller individuals tend to weigh more, inducing a positive covariance. The Euclidean distance would form circular contours centered at the mean, treating deviations equally in all directions and potentially misclassifying points along the correlation axis as outliers. In contrast, the Mahalanobis distance yields tilted elliptical contours that stretch along the line of correlation (e.g., the height-weight trend) and narrow perpendicular to it, accurately reflecting the data's elongated distribution and identifying true outliers as points far from this elliptical "cloud." This adjustment makes the Mahalanobis distance a measure of "outlierness" relative to the overall shape and orientation of the data distribution, rather than absolute spatial separation, allowing it to detect anomalies that respect the inherent structure of multivariate data.
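The whitening interpretation can be checked numerically: multiplying the centered vector by \boldsymbol{\Sigma}^{-1/2} (built here from the eigendecomposition of \boldsymbol{\Sigma}) turns the Mahalanobis distance into an ordinary Euclidean norm. The sketch below uses arbitrary illustrative parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
x = rng.multivariate_normal(mu, Sigma)

# Symmetric inverse square root from the eigendecomposition Sigma = V diag(w) V^T.
w, V = np.linalg.eigh(Sigma)
W = V @ np.diag(w ** -0.5) @ V.T

d = x - mu
D_mahalanobis = np.sqrt(d @ np.linalg.inv(Sigma) @ d)
D_euclid_whitened = np.linalg.norm(W @ d)      # plain Euclidean norm after whitening
assert np.isclose(D_mahalanobis, D_euclid_whitened)
```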
Statistical Foundations
In Multivariate Normal Distributions
In the context of a p-dimensional multivariate normal distribution \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), the squared Mahalanobis distance D_M^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) for a random vector \mathbf{x} follows a chi-squared distribution with p degrees of freedom.[11] This distributional property arises because the transformation \boldsymbol{\Sigma}^{-1/2} (\mathbf{x} - \boldsymbol{\mu}) standardizes the normally distributed \mathbf{x} to a vector of independent standard normal variables, whose squared norm yields the chi-squared form.[11]
This chi-squared relationship provides a probabilistic foundation for interpreting Mahalanobis distances, particularly in outlier detection. Specifically, the probability that the squared distance exceeds a threshold c is P(D_M^2 > c) = P(\chi^2_p > c), where \chi^2_p denotes the chi-squared distribution with p degrees of freedom; thresholds are thus selected from chi-squared quantiles to control the false positive rate for identifying deviations from the distribution center.[11] For instance, in anomaly detection tasks assuming multivariate normality, a point with D_M^2 surpassing the 95th percentile of \chi^2_p is flagged as an outlier with 5% significance.[11]
The level sets of the Mahalanobis distance further connect to the geometry of multivariate normal distributions through confidence ellipsoids, where contours of constant D_M^2 = c align with equiprobability regions bounded by the density function.[12] These ellipsoids, scaled by chi-squared quantiles, enclose a specified probability mass around \boldsymbol{\mu}, with the shape dictated by \boldsymbol{\Sigma} to reflect correlated variability.[12] As a concrete example in two dimensions (p=2), the threshold for a 95% confidence ellipsoid is c \approx 5.991, meaning approximately 95% of observations from \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) lie within this boundary.[13]
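The chi-squared calibration is easy to verify by simulation; the following sketch (with arbitrary illustrative parameters) computes the 95% threshold for p = 2 and checks the empirical coverage of the corresponding ellipsoid.

```python
import numpy as np
from scipy import stats

p = 2
c95 = stats.chi2.ppf(0.95, df=p)            # ~5.991 for p = 2

# Simulate from N(mu, Sigma) and check coverage of the 95% ellipsoid.
rng = np.random.default_rng(1)
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

Sinv = np.linalg.inv(Sigma)
D2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)   # squared distance per row
print(c95, (D2 <= c95).mean())              # empirical coverage close to 0.95
```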
Relationship to Other Measures
The Mahalanobis distance generalizes the Euclidean distance by accounting for correlations and differing variances among variables, effectively applying inverse-variance weighting through the inverse of the covariance matrix.[14] When the covariance matrix is the identity, the Mahalanobis distance reduces to the Euclidean distance, but in correlated data, it scales distances along principal axes of the data cloud, providing a more appropriate measure for ellipsoidal distributions.[14]
In multivariate hypothesis testing, the Mahalanobis distance forms the basis for Hotelling's T^2 statistic, which tests differences between sample means and a hypothesized mean vector.[15] Specifically, for a sample of size n, Hotelling's T^2 is given by T^2 = n D_M^2(\bar{\mathbf{x}}, \boldsymbol{\mu}), where D_M^2 is the squared Mahalanobis distance using the sample covariance matrix, and under multivariate normality, T^2 follows a scaled F-distribution.[15]
In linear regression diagnostics, the Mahalanobis distance in the space of predictor variables measures an observation's leverage, quantifying its potential influence on fitted coefficients due to extremity relative to the design matrix.[16] The leverage for the i-th observation is the diagonal element h_{ii} = \mathbf{x}_i^T (X^T X)^{-1} \mathbf{x}_i of the hat matrix; for a model with an intercept, it satisfies h_{ii} = 1/n + D_i^2/(n-1), where D_i^2 is the squared Mahalanobis distance of the i-th predictor vector from the mean of the predictors, computed with their sample covariance.[17]
For non-normal data, the asymptotic chi-squared distribution of the squared Mahalanobis distance under the null hypothesis of centrality can be approximated more accurately using bootstrap resampling to estimate the empirical distribution, avoiding reliance on normality assumptions.[18]
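The leverage relation can be verified directly; the sketch below uses synthetic predictors (all names are illustrative) and checks the identity h_{ii} = 1/n + D_i^2/(n-1) stated above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
Z = rng.normal(size=(n, p))                  # predictor values (no intercept column)

# Hat-matrix diagonal for the design matrix with an intercept column.
X = np.column_stack([np.ones(n), Z])
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Squared Mahalanobis distances of the predictors from their mean,
# using the sample covariance (denominator n - 1).
d = Z - Z.mean(axis=0)
D2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(np.cov(Z, rowvar=False)), d)

# Identity for a regression with intercept: h_ii = 1/n + D_i^2 / (n - 1).
assert np.allclose(h, 1.0 / n + D2 / (n - 1))
```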
Extensions and Variants
General Forms of Location and Scatter
The Mahalanobis distance admits a general form that replaces the mean vector and covariance matrix with arbitrary location parameter \mathbf{m} \in \mathbb{R}^p and positive definite scatter matrix \mathbf{S} \in \mathbb{R}^{p \times p}, respectively, yielding D(\mathbf{x}) = \sqrt{ (\mathbf{x} - \mathbf{m})^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}) } for any observation \mathbf{x} \in \mathbb{R}^p. This parameterization extends the distance measure to settings where traditional moment-based estimators may be inappropriate, such as contaminated or non-Gaussian data, by allowing flexible choices for \mathbf{m} and \mathbf{S} that capture central tendency and dispersion in a tailored manner.[19]
In its original introduction, P. C. Mahalanobis defined the distance using the sample mean as the location and the sample covariance matrix as the scatter, applied to anthropometric data for classification purposes.[20] Subsequent generalizations have employed alternative location estimators, such as the componentwise median for non-symmetric distributions, which provides robustness to skewness by minimizing the sum of absolute deviations in each dimension. For scatter, robust alternatives like the minimum covariance determinant (MCD) matrix, introduced by Rousseeuw in 1984, select the subset of observations yielding the smallest determinant of the covariance, achieving a high breakdown point against outliers while maintaining affine equivariance.[21]
This general form proves particularly useful in elliptical distributions, where the probability density function is constant on ellipsoidal level sets centered at \mathbf{m} and shaped by \mathbf{S}; accordingly, the contours of constant Mahalanobis distance D(\mathbf{x}) = c align precisely with these density level sets, facilitating probabilistic interpretations and outlier detection across the family of elliptical models.[22]
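A minimal sketch of this general form, with the location and scatter supplied as free parameters; the function name and data below are illustrative rather than drawn from any library.

```python
import numpy as np

def general_mahalanobis(x, m, S):
    """Generalized distance D(x) = sqrt((x - m)^T S^{-1} (x - m)) for an
    arbitrary location m and positive definite scatter matrix S."""
    d = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=200)
x0 = np.array([4.0, -3.0])

# Classical plug-in: sample mean and sample covariance.
print(general_mahalanobis(x0, X.mean(axis=0), np.cov(X, rowvar=False)))

# Alternative location: componentwise median with the same scatter.
print(general_mahalanobis(x0, np.median(X, axis=0), np.cov(X, rowvar=False)))
```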
Robust and Kernel Variants
The robust Mahalanobis distance addresses the sensitivity of the classical version to outliers by replacing the sample covariance matrix with a robust estimator of scatter, such as the minimum covariance determinant (MCD). The MCD estimator selects a subset of h observations (where n/2 < h \leq n, with n the sample size and typically h \approx (n + p + 1)/2 for p dimensions) that minimizes the determinant of their covariance matrix, thereby reducing the influence of contaminants. The resulting robust distance for an observation \mathbf{x} is given by d_{\text{MCD}}(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^\top \mathbf{S}_{\text{MCD}}^{-1} (\mathbf{x} - \boldsymbol{\mu})}, where \boldsymbol{\mu} and \mathbf{S}_{\text{MCD}} are the MCD location and scatter estimates, respectively. This approach achieves a high breakdown point of up to approximately 50%, meaning it can tolerate up to nearly half the data points being arbitrary outliers without breakdown.[21]
Computing the exact MCD is combinatorial and computationally intensive for large datasets, as it requires evaluating \binom{n}{h} subsets. To address this, the fast-MCD algorithm provides an efficient approximation: it draws many random elemental subsets and iteratively refines each with concentration steps (C-steps), which select the h observations with the smallest current distances and provably never increase the covariance determinant, often recovering the exact solution for small to moderate n. This method balances accuracy and speed, making robust Mahalanobis distances practical for high-dimensional data.[23]
Kernel variants extend the Mahalanobis distance to non-linear manifolds by applying the kernel trick, mapping data into a reproducing kernel Hilbert space where distances are computed implicitly. In the kernel Mahalanobis distance (KMD), the distance is defined in the feature space as d^2 = (\phi(\mathbf{x}) - \boldsymbol{\mu}_\phi)^\top \boldsymbol{\Sigma}_\phi^{-1} (\phi(\mathbf{x}) - \boldsymbol{\mu}_\phi), where \boldsymbol{\mu}_\phi is the mean and \boldsymbol{\Sigma}_\phi the covariance of the mapped data (in practice a regularized or pseudo-inverse, since the feature-space covariance is rank-deficient). This is computed using centered kernel matrices without explicit feature maps, for a kernel function k (e.g., RBF). This formulation enables handling curved data structures and is often integrated into support vector machine frameworks for enhanced discrimination in non-linear settings.[24]
Post-2010 developments have applied KMD in image recognition tasks involving high-dimensional, non-linear data spaces, such as hyperspectral imaging, where the kernel captures complex spectral manifolds for improved classification accuracy over linear methods. For instance, a Mahalanobis kernel adapted for support vector classifiers on hyperspectral datasets demonstrates superior performance in distinguishing subtle class boundaries on curved feature spaces.[25]
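As an illustration of the robust variant, the sketch below contrasts classical and MCD-based squared distances on a synthetic contaminated sample, using scikit-learn's MinCovDet (an implementation of fast-MCD); the data and contamination level are arbitrary.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

# Contaminated two-dimensional sample: a bulk of inliers plus a cluster of outliers.
rng = np.random.default_rng(4)
inliers = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=180)
outliers = rng.multivariate_normal([6, 6], 0.5 * np.eye(2), size=20)
X = np.vstack([inliers, outliers])

# Squared Mahalanobis distances under the classical and the robust (fast-MCD) fits.
d2_classical = EmpiricalCovariance().fit(X).mahalanobis(X)
d2_robust = MinCovDet(random_state=0).fit(X).mahalanobis(X)

# The robust fit leaves the outliers with much larger distances, because the
# MCD location and scatter are not inflated by the contaminating cluster.
print(np.median(d2_classical[-20:]), np.median(d2_robust[-20:]))
```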
Recent Developments
Recent extensions address challenges in high-dimensional and non-Euclidean settings. For instance, regularized estimators of the inverse covariance matrix, using modified Cholesky decomposition, prevent singularity and improve performance when the dimension p exceeds the sample size n, as proposed in 2022.[26] Additionally, variants like Mahalanobis++ (2025) normalize features to enhance out-of-distribution detection in machine learning, reducing false positives while maintaining detection rates. Infinite-dimensional formulations for functional data analysis have also emerged, extending the metric to Banach and Hilbert spaces for outlier detection in infinite-dimensional data as of 2024.[27][28]
Applications
In Statistics and Quality Control
In statistics, the Mahalanobis distance serves as a key tool for outlier detection in multivariate datasets, particularly under the assumption of multivariate normality. The squared Mahalanobis distance, D_M^2, measures the extent to which an observation deviates from the multivariate mean, accounting for correlations via the inverse covariance matrix. Under multivariate normality, D_M^2 follows a chi-squared distribution with p degrees of freedom, where p is the number of variables. Outliers are identified by comparing D_M^2 to a critical threshold from the chi-squared distribution, such as the 1-\alpha quantile \chi^2_{p,1-\alpha}; observations exceeding this threshold are flagged as potential outliers with significance level \alpha.[29][30] This approach enables effective screening of multivariate data for anomalies that univariate methods might overlook, enhancing data quality in statistical analyses.[31]
In quality control, the Mahalanobis distance underpins multivariate process monitoring, notably through Hotelling's T^2 charts, which extend Shewhart's univariate control principles to multiple correlated variables. Developed in the 1940s, these charts detect shifts in the process mean or covariance structure by computing a statistic equivalent to a scaled Mahalanobis distance between sample means and the target process mean.[32][33] For instance, in phase II monitoring, upper and lower control limits are derived from the F-distribution related to the chi-squared properties of D_M^2, signaling out-of-control conditions when the T^2 value exceeds these limits. This method is widely adopted for simultaneous surveillance of multiple quality characteristics, improving detection sensitivity over individual univariate charts.[34]
The Mahalanobis distance originated from applications in anthropometry, where P. C. Mahalanobis introduced it in 1936 to classify populations based on multivariate measurements. In his seminal work, Mahalanobis applied the generalized distance to analyze cranial and other anthropometric data from Indian populations, such as Anglo-Indians in Calcutta, to quantify racial affinities and divergences while accounting for variable correlations.[35][36] This approach, detailed in his paper "On the Generalized Distance in Statistics," provided a robust metric for discriminant analysis in anthropometric studies, influencing subsequent population genetics research.[37]
A practical example of its use in manufacturing quality control involves detecting defects in assembled products, such as circuit boards, by measuring multiple dimensions (e.g., length, width, thickness) relative to the process mean. Observations with a large Mahalanobis distance from the in-control mean indicate potential defects due to correlated shifts, allowing operators to isolate faulty items before shipment and maintain process stability.[38][39]
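To illustrate the T^2 connection used by these charts, the following sketch computes a one-sample Hotelling statistic through the Mahalanobis distance of the sample mean from a hypothesized mean; the data, shift, and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 30, 3
mu0 = np.zeros(p)                                          # hypothesized process mean
X = rng.multivariate_normal(mu0 + 0.4, np.eye(p), size=n)  # sample with a small shift

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
D2 = (xbar - mu0) @ np.linalg.solve(S, xbar - mu0)   # squared Mahalanobis distance
T2 = n * D2                                          # Hotelling's T^2

# Under multivariate normality, (n - p) / (p * (n - 1)) * T^2 ~ F(p, n - p).
F = (n - p) / (p * (n - 1)) * T2
p_value = stats.f.sf(F, p, n - p)
print(T2, p_value)
```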
In Machine Learning and Pattern Recognition
In machine learning, the Mahalanobis distance serves as a robust metric for anomaly detection in one-class classification tasks, where it quantifies deviations from a normal data profile by accounting for feature correlations and covariances.[40] For instance, in network intrusion detection, it measures the distance of observed traffic patterns from the centroid of legitimate traffic, enabling the identification of outliers as potential attacks without requiring labeled anomalous examples.[41] This approach enhances detection accuracy in high-dimensional spaces, such as cybersecurity datasets, by normalizing for varying feature scales and dependencies.[42]
The distance is also employed as a dissimilarity measure in clustering algorithms like k-means, particularly when features exhibit strong correlations, as it adjusts for covariance to produce more meaningful partitions than Euclidean distance.[43] In bioinformatics, for gene expression analysis, Mahalanobis distance integrates into linear discriminant analysis (LDA) for classification, where it computes class separability by incorporating the pooled covariance matrix, improving discrimination in correlated high-dimensional microarray data.[44] This handles multicollinearity effectively, leading to better feature weighting and reduced misclassification in tasks like tumor subtype identification.[45]
In feature extraction pipelines, Mahalanobis distance informs principal component analysis (PCA)-related methods by evaluating outlier influence during dimensionality reduction, ensuring that transformations preserve the data's ellipsoidal structure.[46] It quantifies how far projections lie from the principal subspace, aiding in the selection of components that minimize distortion from correlated variables.[47]
Post-2020 applications in AI include its use in credit card fraud detection, where a 2024 method combines Mahalanobis distance with hybrid sampling (e.g., SMOTE-ENN) and random forest algorithms to address class imbalance and improve detection accuracy.[48] Recent developments as of 2025 also feature its application in medical diagnostics, such as feature selection for Parkinson's disease classification, and hardware implementations using ferroelectric FinFETs for efficient outlier detection in high-dimensional data.[49][50]
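A minimal one-class detector in the spirit of the approach described above; the class name, parameters, and data are illustrative and not taken from any particular library.

```python
import numpy as np
from scipy import stats

class MahalanobisDetector:
    """Minimal one-class anomaly scorer: fit on normal data, then flag points
    whose squared Mahalanobis distance exceeds a chi-squared quantile."""

    def fit(self, X, alpha=0.01):
        self.mu_ = X.mean(axis=0)
        self.Sinv_ = np.linalg.inv(np.cov(X, rowvar=False))
        self.threshold_ = stats.chi2.ppf(1 - alpha, df=X.shape[1])
        return self

    def score(self, X):
        d = X - self.mu_
        return np.einsum('ij,jk,ik->i', d, self.Sinv_, d)

    def predict(self, X):
        return self.score(X) > self.threshold_      # True marks an anomaly

# Train on "normal" traffic-like features, then score a batch containing an outlier.
rng = np.random.default_rng(6)
X_train = rng.multivariate_normal([0, 0, 0], np.diag([1.0, 2.0, 0.5]), size=1000)
X_test = np.vstack([X_train[:5], [[8.0, 8.0, 8.0]]])
print(MahalanobisDetector().fit(X_train).predict(X_test))
```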
Computation and Implementation
Numerical Methods
The computation of the Mahalanobis distance requires evaluating the quadratic form (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), where direct inversion of the covariance matrix \boldsymbol{\Sigma} can introduce numerical instability due to rounding errors in floating-point arithmetic. To mitigate this, the Cholesky decomposition \boldsymbol{\Sigma} = \mathbf{L} \mathbf{L}^T (with \mathbf{L} lower triangular) is employed: solving the triangular system \mathbf{L}\mathbf{z} = \mathbf{x} - \boldsymbol{\mu} by forward substitution yields the squared distance as \| \mathbf{z} \|^2 = \| \mathbf{L}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \|^2, which is more stable and efficient for positive definite \boldsymbol{\Sigma}.[26] This approach avoids explicit inversion and leverages the decomposition's O(p^3) cost once, followed by O(p^2) per distance evaluation for p-dimensional data.[26]
When \boldsymbol{\Sigma} is singular or near-singular (common in high-dimensional settings where the sample size n < p), the inverse does not exist, rendering the distance undefined. A regularized inverse (\boldsymbol{\Sigma} + \lambda \mathbf{I})^{-1} is then used, with the shrinkage parameter \lambda chosen via methods like the Ledoit-Wolf estimator, which optimally blends the sample covariance with a structured target (e.g., the identity matrix) to ensure positive definiteness and reduce estimation error.[51] The Ledoit-Wolf approach computes the shrinkage intensity \hat{\delta}^* analytically as the ratio of bias and variance terms, yielding a full-rank estimator even for n < p.[51]
For scalability in high dimensions (p \gg n), exact computation becomes prohibitive due to O(p^3) decomposition costs, prompting approximate methods such as random projections via Johnson-Lindenstrauss (JL) sketches. These reduce dimensionality by projecting data onto a lower-dimensional space using a random matrix \Pi \in \mathbb{R}^{m \times p} with m = O(\log n / \epsilon^2), preserving distances up to a (1 \pm \epsilon) factor while approximating the Mahalanobis metric in O(mp^2) preprocessing time.[52] Incremental updates are also viable, supporting dynamic changes to \boldsymbol{\Sigma} or data points in sub-quadratic time via adaptive data structures that maintain sketched representations.[52]
Numerical errors in Mahalanobis distance computations arise primarily from the conditioning of \boldsymbol{\Sigma}, quantified by its condition number \kappa(\boldsymbol{\Sigma}) = \lambda_{\max}/\lambda_{\min}, where large \kappa amplifies perturbations in small samples. In ill-posed cases with n \approx p, the smallest sample eigenvalues approach zero and \kappa can exceed 10^{12}, causing distance estimates to vary by orders of magnitude under minor noise; regularization or dimensionality reduction is essential to bound errors below 1% relative deviation.[53]
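A sketch of the Cholesky and shrinkage routes just described, assuming NumPy/SciPy and scikit-learn's LedoitWolf estimator are available; the data are synthetic.

```python
import numpy as np
from scipy.linalg import solve_triangular
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(7)
X = rng.multivariate_normal(np.zeros(5), np.eye(5), size=40)
mu = X.mean(axis=0)
x = X[0]

# Cholesky route: solve L z = (x - mu) instead of forming Sigma^{-1} explicitly.
Sigma = np.cov(X, rowvar=False)
L = np.linalg.cholesky(Sigma)                   # Sigma = L L^T, L lower triangular
z = solve_triangular(L, x - mu, lower=True)     # forward substitution
D2_chol = z @ z                                 # squared Mahalanobis distance

# Shrinkage route for ill-conditioned covariances: Ledoit-Wolf blends the
# sample covariance with a scaled-identity target before inverting.
lw = LedoitWolf().fit(X)
D2_lw = lw.mahalanobis(x.reshape(1, -1))[0]
print(D2_chol, D2_lw)
```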
Software Libraries
The Mahalanobis distance is implemented in several popular software libraries across programming languages, facilitating its use in statistical analysis, machine learning, and data science workflows. These implementations typically provide functions to compute the distance between points or datasets, often incorporating options for covariance matrix estimation. In Python, the scikit-learn library includes the Mahalanobis distance as part of its metrics.DistanceMetric class, allowing pairwise computations between samples with a user-specified covariance matrix.[54] For robust variants, scikit-learn's covariance module offers the MinCovDet (minimum covariance determinant) estimator, which computes a high-breakdown-point covariance matrix suitable for outlier-resistant Mahalanobis distances, as demonstrated in examples for anomaly detection. Additionally, SciPy's spatial.distance submodule provides a dedicated mahalanobis function that calculates the distance using the inverse covariance matrix VI.[55]
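A short usage sketch of the SciPy and scikit-learn interfaces mentioned above (assuming a recent scikit-learn where DistanceMetric lives under sklearn.metrics); the data and covariance estimate are illustrative.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.metrics import DistanceMetric

X = np.array([[1.0, 2.0], [2.0, 5.0], [3.0, 3.0], [4.0, 7.0]])
VI = np.linalg.inv(np.cov(X, rowvar=False))      # inverse covariance matrix

# SciPy: Mahalanobis distance between two individual vectors.
print(mahalanobis(X[0], X[1], VI))

# scikit-learn: full pairwise distance matrix under the same metric.
metric = DistanceMetric.get_metric('mahalanobis', VI=VI)
print(metric.pairwise(X))
```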
In R, the base stats package includes the mahalanobis function, which returns squared Mahalanobis distances for rows of a data matrix relative to a center vector and covariance matrix, enabling efficient multivariate outlier detection. For robust extensions, the robustbase package implements the Minimum Covariance Determinant (MCD) via covMcd, which generates robust location and scatter estimates; these can then be used to compute Mahalanobis distances, with visualization tools like plot.mcd for comparing classical and robust distances.[56]
MATLAB's Statistics and Machine Learning Toolbox features the built-in mahal function, which computes squared Mahalanobis distances from observations in Y to reference samples in X, automatically handling covariance estimation from the data.[57] Variants are available in classification and Gaussian mixture model objects, such as ClassificationDiscriminant.mahal for distances to class means and gmdistribution.mahal for component-specific distances.[58][59]
In Julia, the Distances.jl package supports Mahalanobis distances through specialized methods like Mahalanobis and SqMahalanobis, optimized for performance with precomputed covariance inverses and compatible with pairwise distance evaluations.[60] For recent advancements in deep learning contexts, TensorFlow integrations for kernel variants of the Mahalanobis distance have emerged, particularly in outlier detection libraries like Alibi Detect (with updates through 2024), which uses TensorFlow backends for scalable computations in high-dimensional spaces.[61]