How to calculate trimmed mean in Python?
By Shivang Yadav Last updated : November 22, 2023
Trimmed Mean
A trimmed mean is a statistical measure of central tendency. It is calculated by removing a specified percentage of the smallest and largest values from a dataset and then computing the mean (average) of the remaining values. Trimming reduces the impact of outliers on the result, which makes the trimmed mean more robust to extreme data points than the ordinary mean.
Steps/Algorithm
The steps to calculate a trimmed mean are:
- Sort the data in ascending order.
- Decide what percentage of the data to trim from each end. This percentage is typically denoted by p. For instance, if 10% of the data is to be trimmed from each end, p is 10.
- Calculate the number of data points to trim from each end by multiplying p by the total number of data points and dividing by 100. Call this number n. If the result is not a whole number, round it down.
- Remove the first n data points and the last n data points from the sorted dataset.
- Calculate the mean (average) of the remaining data points.
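The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function name trimmed_mean and the sample data are hypothetical, and rounding n down is an assumption chosen to match the steps listed here.

```python
def trimmed_mean(values, p):
    """Mean of `values` after dropping p percent of the points from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p < 50:
        raise ValueError("p must satisfy 0 <= p < 50")
    ordered = sorted(values)              # sort in ascending order
    n = int(len(ordered) * p / 100)       # points to drop per end, rounded down
    kept = ordered[n:len(ordered) - n]    # drop the n smallest and n largest
    return sum(kept) / len(kept)          # mean of the remaining points


# Hypothetical sample with one obvious outlier (100).
data = [4, 7, 5, 6, 100, 5, 6, 4, 5, 8]

print(sum(data) / len(data))   # ordinary mean: 15.0
print(trimmed_mean(data, 10))  # 10% trimmed from each end: 5.75
```

With p = 10 and 10 data points, n is 1, so only the smallest value (4) and the largest value (100) are dropped before averaging.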
Trimmed means are useful for datasets that contain outliers or extreme values that would skew the ordinary mean. The choice of the trimming percentage p depends on the characteristics of your data and on how strongly you want to reduce the influence of outliers. Common choices are 5%, 10%, and 20%, but p should be selected based on the context of your analysis.
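To see how the choice of p changes the result, the following sketch computes the trimmed mean of one dataset for several values of p. The readings list and the helper trim are hypothetical and exist only for this illustration; the number of trimmed points is rounded down.

```python
from statistics import mean

# Hypothetical sample: 18 typical readings plus two extreme values (95, 120).
readings = [10, 11, 12, 9, 10, 11, 10, 12, 9, 10,
            11, 10, 9, 12, 11, 10, 10, 11, 95, 120]


def trim(values, p):
    """Return the sorted values with p percent dropped from each end."""
    ordered = sorted(values)
    n = int(len(ordered) * p / 100)  # rounded down
    return ordered[n:len(ordered) - n]


# p = 0 is the ordinary mean; larger p removes more of the extremes.
results = {p: mean(trim(readings, p)) for p in (0, 5, 10, 20)}
for p, value in results.items():
    print(f"p = {p:>2}%  ->  trimmed mean = {value:.3f}")
```

In this sample, p = 5 removes only one of the two extreme values, so the result is still pulled upward. From p = 10 onward both are removed and the result settles near the typical readings.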
Calculating Trimmed Mean
To calculate a trimmed mean, use the trim_mean() function from the scipy.stats module of the SciPy library.
Syntax
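Below is the syntax of the trim_mean() function:
trim_mean(data, fractionTrim)
Here,
- data is the input whose trimmed mean is to be computed. It can be a single array or list, or data with multiple columns such as a pandas DataFrame, in which case one trimmed mean is returned per column by default.
- fractionTrim is the fraction of values to cut from each end of the sorted data, given as a number rather than a percentage (for example, 0.1 for 10%). In the SciPy documentation these two parameters are named a and proportiontocut.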
Example 1: Calculate trimmed mean of an array
# Python program to calculate the trimmed mean
# of an array
from scipy import stats
meanArray = [2, 15, 9, 10, 14, 18, 3, 13, 17, 11, 1, 8]
print(f"The values of the array are \n{meanArray}")
trimMean = stats.trim_mean(meanArray, 0.25)
print(f"The trimmed mean is \n{trimMean}")
Output
The values of the array are
[2, 15, 9, 10, 14, 18, 3, 13, 17, 11, 1, 8]
The trimmed mean is
10.833333333333334
The same function can be applied to data with multiple columns, such as a pandas DataFrame. In that case it returns one trimmed mean per column.
Example 2: Calculate trimmed mean of multiple arrays
# Python program to perform trimmed mean operation
# on multiple arrays
from scipy import stats
import pandas as pd
boundaries = pd.DataFrame(
    {"fours": [5, 2, 3, 1, 9, 3, 1, 6], "sixes": [2, 1, 0, 0, 5, 1, 4, 2]}
)
print(f"The values of the array are \n{boundaries}")
# Note: with 8 rows, 0.05 * 8 = 0.4 rounds down to 0, so no values are trimmed here
trimMean = stats.trim_mean(boundaries[["fours", "sixes"]], 0.05)
print(f"The trimmed mean is \n{trimMean}")
Output
The values of the array are
   fours  sixes
0      5      2
1      2      1
2      3      0
3      1      0
4      9      5
5      3      1
6      1      4
7      6      2
The trimmed mean is
[3.75 1.875]
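The result of Example 2 can be checked by hand. The sketch below is a plain-Python check, not SciPy itself: the helpers cut_count and column_trimmed_mean are hypothetical, and rounding the number of trimmed values down is assumed to match SciPy's behavior. It also tries a larger fraction, 0.125, which is not part of the original example.

```python
fours = [5, 2, 3, 1, 9, 3, 1, 6]
sixes = [2, 1, 0, 0, 5, 1, 4, 2]


def cut_count(fraction, size):
    """Values dropped from each end, rounded down (assumed to match SciPy)."""
    return int(fraction * size)


def column_trimmed_mean(column, fraction):
    """Trimmed mean of one column for a given fraction per end."""
    k = cut_count(fraction, len(column))
    kept = sorted(column)[k:len(column) - k]
    return sum(kept) / len(kept)


# Fraction used in Example 2: 0.05 * 8 = 0.4, which rounds down to 0.
print(cut_count(0.05, 8))
print([column_trimmed_mean(c, 0.05) for c in (fours, sixes)])

# A hypothetical larger fraction: 0.125 * 8 = 1 value dropped per end.
print(cut_count(0.125, 8))
print([column_trimmed_mean(c, 0.125) for c in (fours, sixes)])
```

The first pair of results, 3.75 and 1.875, equals the ordinary column means, which is why Example 2 shows no effect of trimming. With 8 values per column, the fraction must be at least 0.125 before one value is removed from each end.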