How to calculate Mahalanobis distance in Python?
By Shivang Yadav Last updated : November 21, 2023
Mahalanobis Distance
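Mahalanobis distance is a statistical measure of how far a point lies from the center of a multivariate distribution, taking the distribution's covariance into account. The formula to calculate Mahalanobis Distance is,
$$D(x) = \sqrt{(x - \mu)^{T} S^{-1} (x - \mu)}$$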
Here,
- D(x) is the Mahalanobis Distance between the point x and the distribution.
- x is the data point for which the Mahalanobis distance is calculated.
- μ is the mean vector for the distribution.
- S⁻¹ is the inverse of the covariance matrix S of the distribution.
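The quantities in this list can be put together in a few lines of NumPy. The sketch below is illustrative only: the sample values and the function name mahalanobis_single are made up for this example, and S is assumed to be the sample covariance estimated with np.cov.

```python
import numpy as np

# Hypothetical 2-D sample: rows are observations, columns are variables.
samples = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])


def mahalanobis_single(x, samples):
    """Distance of one point x from the distribution estimated from samples."""
    mu = samples.mean(axis=0)            # mean vector, one entry per variable
    S = np.cov(samples, rowvar=False)    # sample covariance matrix
    diff = x - mu
    d2 = diff @ np.linalg.inv(S) @ diff  # (x - mu)^T S^-1 (x - mu)
    return float(np.sqrt(d2))


distance = mahalanobis_single(np.array([2.0, 0.0]), samples)
print(distance)
```

For this sample the mean is the origin and the covariance is (8/3) times the identity, so the squared distance of the point (2, 0) works out to 4 × 3/8 = 1.5.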
Calculation of Mahalanobis distance
The classical way to calculate the Mahalanobis distance in Python is to apply the formula above directly. The following program does this for every row of a DataFrame. Note that it returns the squared distance, without the final square root, which is the quantity used later for the p-value.
Example: Python program to calculate Mahalanobis Distance
# Python program to calculate Mahalanobis Distance
import numpy as np
import pandas as pd


def calculateMahalanobis(y=None, data=None, cov=None):
    # Center each row on the per-column means. data.mean() returns one mean
    # per column; np.mean(data) can return a single scalar in pandas 2.x.
    y_mu = (y - data.mean()).to_numpy()
    # Estimate the covariance matrix from the data unless one is supplied.
    if cov is None:
        cov = np.cov(data.values.T)
    inv_covmat = np.linalg.inv(cov)
    left = np.dot(y_mu, inv_covmat)
    mahal = np.dot(left, y_mu.T)
    # The diagonal holds the squared distance of each row from the mean.
    return mahal.diagonal()


data = {
    "Hindi": [91, 93, 72, 87, 86, 73, 68, 87, 78, 99],
    "Science": [16, 76, 43, 61, 92, 23, 42, 15, 92, 55],
    "English": [70, 88, 80, 83, 88, 84, 78, 94, 90, 93],
    "Maths": [76, 89, 89, 57, 79, 84, 78, 99, 97, 99],
}
df = pd.DataFrame(data, columns=["Hindi", "Science", "English", "Maths"])
df["calculateMahalanobis"] = calculateMahalanobis(
    y=df, data=df[["Hindi", "Science", "English", "Maths"]]
)
print(df)
Output
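The output published with this program is:
   Hindi  Science  English  Maths  calculateMahalanobis
0     91       16       70     76              7.761124
1     93       76       88     89              4.514153
2     72       43       80     89              2.894182
3     87       61       83     57             10.804791
4     86       92       88     79              3.700610
5     73       23       84     84              8.402898
6     68       42       78     78              3.169864
7     87       15       94     99             17.316577
8     78       92       90     97              4.062911
9     99       55       93     99             10.893471
Several of these values are larger than 8.1. That is the largest squared distance any row can have, (n − 1)²/n with n = 10 rows, when the rows are centered on the per-column means and the sample covariance from np.cov is used. Values this large are consistent with np.mean(data) returning one grand mean for the whole DataFrame, as it does in pandas 2.0 and later, instead of one mean per column, so they should not be read as true Mahalanobis distances.
When the Mahalanobis distances vary widely, with some values much higher than others, we need to calculate their p-values to judge whether any of them is large enough to be meaningful.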
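The same row-wise calculation can be sketched without building the full n × n product, by contracting each centered row with the inverse covariance directly. This is an alternative formulation rather than the program above. The names marks and squared_mahalanobis_rows are chosen for illustration, and the array repeats the marks table from the example.

```python
import numpy as np

# Marks from the example: columns are Hindi, Science, English, Maths.
marks = np.array([
    [91, 16, 70, 76],
    [93, 76, 88, 89],
    [72, 43, 80, 89],
    [87, 61, 83, 57],
    [86, 92, 88, 79],
    [73, 23, 84, 84],
    [68, 42, 78, 78],
    [87, 15, 94, 99],
    [78, 92, 90, 97],
    [99, 55, 93, 99],
], dtype=float)


def squared_mahalanobis_rows(X):
    """Squared Mahalanobis distance of every row of X from the column means."""
    diff = X - X.mean(axis=0)                         # center each column
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance
    # For each row i, sum over j and k of diff[i, j] * inv_cov[j, k] * diff[i, k].
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


d2 = squared_mahalanobis_rows(marks)
print(d2)
print(d2.sum())
```

When both the mean and the covariance are estimated from the same sample, the squared distances sum to (n − 1) × k, here 9 × 4 = 36 up to floating-point error, which gives a quick sanity check on any implementation.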
P-value of Mahalanobis distance
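The p-value of a Mahalanobis distance is the upper-tail probability of its squared value under a chi-square distribution. For data that are approximately multivariate normal, the squared distance approximately follows a chi-square distribution with k degrees of freedom, where k is the number of variables.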
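A minimal sketch of this step is given below, assuming SciPy is installed. The function name p_values_from_squared_distances is hypothetical, the input is expected to be squared distances, and the degrees of freedom are passed in explicitly rather than fixed by the function.

```python
import numpy as np
from scipy.stats import chi2


def p_values_from_squared_distances(squared_distances, df):
    """Upper-tail chi-square probability for each squared distance."""
    d2 = np.asarray(squared_distances, dtype=float)
    # chi2.sf is the survival function, 1 - CDF, evaluated element-wise.
    return chi2.sf(d2, df=df)


# Illustrative values only, not taken from the marks table.
p_values = p_values_from_squared_distances([0.0, 2.0, 15.0], df=2)
print(p_values)
```

A small p-value means the point is unusually far from the center of the distribution. With 2 degrees of freedom the survival function has the closed form exp(−x/2), which makes the sketch easy to check by hand.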