How to Center Data in Python (With Examples)
By Shivang Yadav Last updated : November 22, 2023
Centering data
Centering data involves subtracting a constant value from each data point in a dataset. This constant value is typically the mean (average) of the dataset. Centering data can be useful for various reasons, including simplifying interpretation and analysis, removing bias or constant terms, and preparing data for certain statistical techniques.
Steps/Algorithm
Centering a dataset takes two steps: calculate the mean (μ) of the dataset, then subtract that mean from each data point.
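In code, centering comes down to computing the mean and subtracting it from every value. The sketch below uses only the standard library; the function name `center_values` is a hypothetical helper chosen for illustration, not something defined elsewhere in this article.

```python
from statistics import fmean


def center_values(values):
    """Return a new list with the mean of `values` subtracted from each item.

    `center_values` is a hypothetical helper name used only for this sketch.
    """
    mean_value = fmean(values)  # step 1: compute the mean
    return [v - mean_value for v in values]  # step 2: subtract it from each value


centered = center_values([4, 8, 12])
print(centered)  # [-4.0, 0.0, 4.0]
```

The function returns a new list and leaves the input untouched, which keeps the original values available for later comparison.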
Example
Consider the following example:
Data Set : [10, 15, 20, 25, 30]
Mean (μ) = (10 + 15 + 20 + 25 + 30) / 5 = 20
Centered Data Point 1 = 10 - 20 = -10
Centered Data Point 2 = 15 - 20 = -5
Centered Data Point 3 = 20 - 20 = 0
Centered Data Point 4 = 25 - 20 = 5
Centered Data Point 5 = 30 - 20 = 10
The resulting centered data values are: [-10, -5, 0, 5, 10].
By centering the data in this way, you make the mean of the centered data equal to zero, and you remove the constant term, which can be helpful in various statistical analyses and interpretations.
Now that the basic logic of centering data is clear, let's create a program using the NumPy library to perform this calculation.
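When the mean is not a whole number, floating-point rounding can leave the mean of the centered values a tiny distance away from exactly zero, so it is safer to compare with a tolerance than with `==`. A minimal sketch, assuming an illustrative list of values and a tolerance of `1e-9` picked only for this example:

```python
import math
from statistics import fmean

# Illustrative values whose mean (242 / 6) is not a whole number
values = [10, 20, 23, 43, 56, 90]

mean_value = fmean(values)
centered = [v - mean_value for v in values]

# Compare against zero with an absolute tolerance instead of ==
centered_mean = fmean(centered)
print(math.isclose(centered_mean, 0.0, abs_tol=1e-9))  # True
```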
Python program to center the values of NumPy array
import numpy as np

# Create a NumPy array and print the data
dataSet = np.array([10, 15, 20, 25, 30])
print(f"The values of the data set are \n{dataSet}")

# Find the mean value of the data
meanVal = dataSet.mean()

# Function that returns each value's distance from the mean
center_function = lambda x: x - meanVal

# Center the data by subtracting the mean from every value
centerData = center_function(dataSet)
print(f"The array of centered data values is \n{centerData}")
Output
The values of the data set are
[10 15 20 25 30]
The array of centered data values is
[-10.  -5.   0.   5.  10.]
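For a two-dimensional array in which each column is a separate variable, the mean can be taken along `axis=0`, and NumPy broadcasting then subtracts each column's mean from that column. The sketch below is one way to do it; the array contents and the names `matrix`, `column_means`, and `centered_matrix` are illustrative assumptions.

```python
import numpy as np

# Illustrative 2-D data: rows are observations, columns are variables
matrix = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# Mean of each column (shape (2,)), broadcast across the rows on subtraction
column_means = matrix.mean(axis=0)
centered_matrix = matrix - column_means

print(centered_matrix)
# Each column of the result should now average to (approximately) zero
print(np.allclose(centered_matrix.mean(axis=0), 0.0))  # True
```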
The same approach can be used to center the variables stored in the columns of a Pandas DataFrame.
Python program to center the columns of a Pandas DataFrame
import pandas as pd

# Create the DataFrame
dataFr = pd.DataFrame(
    {
        "x": [10, 20, 23, 43, 56, 90],
        "y": [17, 45, 60, 77, 89, 100],
        "z": [3, 13, 13, 16, 18, 29],
    }
)
print(f"The values of the data set are \n{dataFr}")

# Subtract each column's mean from that column's values
centerData = dataFr.apply(lambda x: x - x.mean())

# View the centered DataFrame
print(f"The centered values of the data set are \n{centerData}")
Output
The values of the data set are
    x    y   z
0  10   17   3
1  20   45  13
2  23   60  13
3  43   77  16
4  56   89  18
5  90  100  29
The centered values of the data set are
           x          y          z
0 -30.333333 -47.666667 -12.333333
1 -20.333333 -19.666667  -2.333333
2 -17.333333  -4.666667  -2.333333
3   2.666667  12.333333   0.666667
4  15.666667  24.333333   2.666667
5  49.666667  35.333333  13.666667
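Because pandas aligns a Series of column means with the DataFrame's column labels, the same kind of result can usually be obtained without `apply` by subtracting `df.mean()` directly. The sketch below, which uses a small made-up DataFrame, also shows how to center only one column; the variable names `df`, `centered_all`, and `centered_some` are illustrative.

```python
import pandas as pd

# Small illustrative DataFrame (values are made up for this sketch)
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 60]})

# df.mean() returns one mean per column; subtraction aligns on column labels
centered_all = df - df.mean()
print(centered_all)

# Center only column "x" and leave "y" untouched (work on a float copy)
centered_some = df.astype(float)
centered_some["x"] = df["x"] - df["x"].mean()
print(centered_some)
```

Subtracting the means directly is typically shorter to read than the `apply` version, while `apply` remains handy when each column needs custom logic.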