How to calculate partial correlation in Python?
Python partial correlation calculation: In this tutorial, we will learn what partial correlation is and how to calculate it in Python.
By Shivang Yadav Last updated : September 03, 2023
What is partial correlation?
Partial correlation is a statistical measure that quantifies the relationship between two variables while controlling for the influence of one or more other variables. In other words, it assesses the degree of association or correlation between two variables while accounting for the effects of additional variables that may be confounding the relationship.
The partial correlation coefficient, often denoted as "r," indicates how much two variables are correlated after removing the shared variance explained by the other variables in the analysis. It helps researchers and analysts to isolate the specific relationship between the variables of interest while holding constant the potential impact of other factors.
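To make the definition concrete, here is a minimal sketch for the simplest case of a single control variable. It assumes the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)), and uses only the standard library. The helper names pearson() and first_order_partial_corr() are illustrative and not part of any package; the numbers are the same ones used in the example program later in this tutorial.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def first_order_partial_corr(x, y, z):
    """Correlation between x and y after controlling for one variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Same values as the example dataset used later in this tutorial
hours = [4, 3, 6, 5, 4, 5, 8, 7, 4, 6]
exam_score = [88, 85, 76, 70, 92, 94, 89, 85, 90, 93]
current_grade = [82, 88, 75, 74, 93, 97, 83, 90, 90, 80]

r_partial = first_order_partial_corr(hours, exam_score, current_grade)
print(round(r_partial, 3))  # 0.191
```

This sketch handles exactly one control variable; with several covariates a regression-based or matrix-based approach is typically used instead.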
Calculation of partial correlation in Python
In Python, the partial correlation can be calculated with the partial_corr() function from the pingouin package, an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. The function is not part of the Python standard library, so pingouin must be installed first. It returns a pandas DataFrame that contains the correlation coefficient together with related statistics such as the confidence interval and the p-value.
Syntax:
partial_corr(data, x, y, covar)
Where,
- data is the pandas DataFrame for which the partial correlation is to be found.
- x and y are the names of the two columns whose correlation is measured.
- covar is the name of the covariate column (or a list of column names) to control for.
Let us understand with the help of an example,
Python program to calculate the partial correlation
import numpy as np
import pandas as pd
import pingouin as pg
data = {
"currentGrade": [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
"hours": [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
"examScore": [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
}
# Build a DataFrame from the dictionary
dataframe = pd.DataFrame(data, columns=["currentGrade", "hours", "examScore"])
print(f"The dataset is {dataframe}")
# Correlation between hours and examScore, controlling for currentGrade
partCorrCoeff = pg.partial_corr(data=dataframe, x="hours", y="examScore", covar="currentGrade")
print(f"The partial correlation is {partCorrCoeff}")
Output
The dataset is    currentGrade  hours  examScore
0            82      4         88
1            88      3         85
2            75      6         76
3            74      5         70
4            93      4         92
5            97      5         94
6            83      8         89
7            90      7         85
8            90      4         90
9            80      6         93
The partial correlation is           n      r         CI95%     r2  adj_r2  p-val   BF10  power
pearson  10  0.191  [-0.5, 0.73]  0.036  -0.238  0.598  0.438  0.082
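The coefficient r = 0.191 in the output can be cross-checked without pingouin. The sketch below uses the residual approach, which is a standard way to describe partial correlation: regress each variable of interest on the covariate, then correlate the two sets of residuals. The helper name residuals() is illustrative, and this is not a description of how pingouin computes the value internally.

```python
import numpy as np

# Same values as in the program above
current_grade = np.array([82, 88, 75, 74, 93, 97, 83, 90, 90, 80], dtype=float)
hours = np.array([4, 3, 6, 5, 4, 5, 8, 7, 4, 6], dtype=float)
exam_score = np.array([88, 85, 76, 70, 92, 94, 89, 85, 90, 93], dtype=float)


def residuals(target, covariate):
    """Part of target not explained by a straight-line fit on covariate."""
    slope, intercept = np.polyfit(covariate, target, deg=1)
    return target - (slope * covariate + intercept)


# Remove the linear effect of currentGrade from both variables
res_hours = residuals(hours, current_grade)
res_exam = residuals(exam_score, current_grade)

# The correlation of the residuals is the partial correlation
r_check = float(np.corrcoef(res_hours, res_exam)[0, 1])
print(round(r_check, 3))  # 0.191
```

The weak positive value, together with the p-value of 0.598 reported above, suggests that hours studied and exam score are not significantly associated once the current grade is held constant.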