How to calculate Mahalanobis distance in Python?

By Shivang Yadav Last updated: November 21, 2023

Mahalanobis Distance

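Mahalanobis distance is a statistical measure of how far a point is from the center of a multivariate distribution. Unlike the Euclidean distance, it takes into account the variance of each variable and the correlations between variables. The formula to calculate the Mahalanobis distance is: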

$$D(x) = \sqrt{(x - \mu)^{T} S^{-1} (x - \mu)}$$

Here,

  • D(x) is the Mahalanobis distance between the point x and the distribution.
  • x is the data point whose distance from the distribution is being measured.
  • μ is the mean vector of the distribution.
  • S⁻¹ is the inverse of the covariance matrix S of the distribution.
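To make the formula concrete, here is a small sketch for a single point. It assumes a made-up two-variable distribution with mean μ = (0, 0) and covariance S = diag(4, 1); the names mu, S and x are illustrative only. It evaluates the formula directly with NumPy and cross-checks it against scipy.spatial.distance.mahalanobis, which takes the inverse covariance matrix S⁻¹ (not S) as its third argument.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical 2-variable distribution: mean vector mu and covariance matrix S
mu = np.array([0.0, 0.0])
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
x = np.array([2.0, 1.0])  # the point whose distance we want

# D(x) = sqrt((x - mu)^T S^-1 (x - mu)), written out directly
S_inv = np.linalg.inv(S)
diff = x - mu
d_manual = np.sqrt(diff @ S_inv @ diff)

# SciPy's helper expects the *inverse* covariance matrix as its third argument
d_scipy = distance.mahalanobis(x, mu, S_inv)

# Plain Euclidean distance, for comparison
d_euclid = np.linalg.norm(diff)

print(d_manual, d_scipy, d_euclid)
```

Both Mahalanobis calculations give √2 ≈ 1.414, while the Euclidean distance from x to μ is √5 ≈ 2.236: the first coordinate counts for less because that variable has the larger variance.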

Calculation of Mahalanobis distance

The straightforward way to calculate the Mahalanobis distance in Python is to apply the formula above directly with NumPy. The following program computes the distance of each student's marks (one row of a DataFrame) from the distribution of all the students' marks.

Example: Python program to calculate Mahalanobis Distance

# Python program to calculate Mahalanobis Distance

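import numpy as np
import pandas as pd

def calculateMahalanobis(y=None, data=None, cov=None):
    # Subtract the mean vector mu from every row of y;
    # data.mean(axis=0) returns one mean per column
    y_mu = (y - data.mean(axis=0)).to_numpy()
    # Fall back to the sample covariance of the data if no matrix is given
    if cov is None:
        cov = np.cov(data.values.T)
    inv_covmat = np.linalg.inv(cov)
    left = np.dot(y_mu, inv_covmat)
    mahal = np.dot(left, y_mu.T)
    # The diagonal holds (x - mu)^T S^-1 (x - mu) for each row of y,
    # i.e. the squared Mahalanobis distance D(x)^2
    return mahal.diagonal()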

data = {
    "Hindi": [91, 93, 72, 87, 86, 73, 68, 87, 78, 99],
    "Science": [16, 76, 43, 61, 92, 23, 42, 15, 92, 55],
    "English": [70, 88, 80, 83, 88, 84, 78, 94, 90, 93],
    "Maths": [76, 89, 89, 57, 79, 84, 78, 99, 97, 99],
}

df = pd.DataFrame(data, columns=["Hindi", "Science", "English", "Maths"])

df["calculateMahalanobis"] = calculateMahalanobis(
    y=df, data=df[["Hindi", "Science", "English", "Maths"]]
)

print(df)

Output

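The output of the above program is shown below. The last column holds the squared distance D(x)², because the function returns the quadratic form without taking its square root.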

   Hindi  Science  English  Maths  calculateMahalanobis
0     91       16       70     76              7.761124
1     93       76       88     89              4.514153
2     72       43       80     89              2.894182
3     87       61       83     57             10.804791
4     86       92       88     79              3.700610
5     73       23       84     84              8.402898
6     68       42       78     78              3.169864
7     87       15       94     99             17.316577
8     78       92       90     97              4.062911
9     99       55       93     99             10.893471
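A quick way to sanity-check a column of squared distances: when μ is the column-wise mean and S is the sample covariance (denominator n − 1), the squared distances of the n rows always sum to (n − 1)·k, where k is the number of variables, and no single row can exceed (n − 1)²/n. For these 10 students and 4 subjects, that means a total of 36 and a maximum of 8.1. The sketch below, a variant under those same assumptions, recomputes the squared distances in two independent ways: np.einsum evaluates one quadratic form per row without building the n × n matrix whose diagonal calculateMahalanobis keeps, and scipy.spatial.distance.mahalanobis checks each row separately.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

data = pd.DataFrame({
    "Hindi": [91, 93, 72, 87, 86, 73, 68, 87, 78, 99],
    "Science": [16, 76, 43, 61, 92, 23, 42, 15, 92, 55],
    "English": [70, 88, 80, 83, 88, 84, 78, 94, 90, 93],
    "Maths": [76, 89, 89, 57, 79, 84, 78, 99, 97, 99],
})

X = data.to_numpy(dtype=float)
mu = X.mean(axis=0)                              # column-wise mean vector
S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse sample covariance (n - 1 denominator)

# Vectorized squared distances: one quadratic form per row, no n x n matrix
diff = X - mu
d2_vectorized = np.einsum("ij,jk,ik->i", diff, S_inv, diff)

# Row-by-row cross-check with SciPy (it returns D, so square it)
d2_scipy = np.array([distance.mahalanobis(row, mu, S_inv) ** 2 for row in X])

n, k = X.shape
print(np.round(d2_vectorized, 6))
print("sum of D^2:", d2_vectorized.sum(), "expected:", (n - 1) * k)
print("largest D^2:", d2_vectorized.max(), "bound:", (n - 1) ** 2 / n)
```

For large data sets the einsum form is the better choice, since the matrix product in calculateMahalanobis grows with the square of the number of rows.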

The distances vary a lot: some values are much higher than others. To decide whether any of them is large enough to be meaningful, for example to flag a student as an outlier, we can calculate a p-value for each distance.

P-value of Mahalanobis distance

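The p-value of a point is the probability that a chi-square random variable with k degrees of freedom, where k is the number of variables, is at least as large as the squared distance D(x)². Under multivariate normality, D(x)² follows this chi-square distribution, so a small p-value means the point is unusually far from the mean. When μ and S are estimated from a small sample, as here, the chi-square p-value is only an approximation.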
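A minimal sketch of the p-value step, assuming SciPy's scipy.stats.chi2.sf (the survival function, 1 − CDF) and k = 4 degrees of freedom for the four subjects. It recomputes the squared distances with the column means and the sample covariance; the column names mahalanobis_sq, p and outlier, and the 0.001 cut-off, are illustrative choices, not part of the original program.

```python
import numpy as np
import pandas as pd
from scipy import stats

data = pd.DataFrame({
    "Hindi": [91, 93, 72, 87, 86, 73, 68, 87, 78, 99],
    "Science": [16, 76, 43, 61, 92, 23, 42, 15, 92, 55],
    "English": [70, 88, 80, 83, 88, 84, 78, 94, 90, 93],
    "Maths": [76, 89, 89, 57, 79, 84, 78, 99, 97, 99],
})

X = data.to_numpy(dtype=float)
diff = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)  # squared distances D(x)^2

k = X.shape[1]  # number of variables = degrees of freedom
# Upper-tail probability P(chi2_k >= D^2); sf avoids the rounding error of 1 - cdf
p_values = stats.chi2.sf(d2, df=k)

result = data.assign(mahalanobis_sq=d2, p=p_values)
# Hypothetical cut-off: flag rows with p < 0.001 as potential outliers
result["outlier"] = result["p"] < 0.001
print(result)
```

A threshold of p < 0.001 is often used to flag outliers. With only 10 rows, no squared distance can exceed 8.1, so no student reaches that threshold here.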

Copyright © 2024 www.includehelp.com. All rights reserved.