Python Pandas - Sort by group aggregate and column
In this article, we will learn how to sort a pandas DataFrame by a group aggregate and a column: the rows are ordered first by the aggregate sum of one column within each group, and then by the values of another column.
By Pranit Sharma Last updated : September 26, 2023
Pandas is a Python library for manipulating and analyzing data efficiently. In pandas, we mostly work with datasets in the form of DataFrames, which are 2-dimensional labeled data structures made up of rows, columns, and data.
What is sorting?
Sorting refers to rearranging a series or a sequence in a particular order (ascending, descending, or any other specific pattern). Sorting a pandas DataFrame often makes the data easier to inspect and analyze.
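As a minimal sketch of ascending and descending sorts, the snippet below uses DataFrame.sort_values() on a small table. The DataFrame prices and its column names are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
prices = pd.DataFrame({
    'item': ['book', 'pen', 'bag'],
    'price': [250, 10, 900],
})

# Ascending order is the default
ascending = prices.sort_values('price')

# Pass ascending=False for descending order
descending = prices.sort_values('price', ascending=False)

print(ascending)
print(descending)
```

Note that sort_values() returns a new DataFrame and leaves the original one unchanged unless the result is assigned back.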
Problem statement
Suppose we have a column named "A" that contains string values. We want to sort the rows by the aggregate sum of another column named "B", which contains numerical values, computed for each group in "A". Rows whose groups have equal sums are then ordered by a third column named "C", which contains Boolean values.
Sorting by group aggregate and column
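To sort by a group aggregate and a column, we first group the DataFrame on column "A" with df.groupby(), select column "B" from the grouped result, and call transform() with the sum aggregation so that every row receives the sum of "B" for its group. We store these sums in a temporary column, sort the DataFrame by that column and then by column "C" using sort_values(), and finally drop the temporary column.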
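The transform step is what makes the group sum usable as a sort key. The rough sketch below contrasts it with a plain aggregation. The DataFrame sales and its values are hypothetical, chosen so that each group in "A" has more than one row.

```python
import pandas as pd

# Hypothetical data in which each group in 'A' has more than one row
sales = pd.DataFrame({
    'A': ['x', 'y', 'x', 'y'],
    'B': [1.0, 10.0, 2.0, 20.0],
})

# Plain aggregation: one value per group, indexed by the group label
per_group = sales.groupby('A')['B'].sum()

# transform: one value per original row, each row receives its group's total
per_row = sales.groupby('A')['B'].transform('sum')

print(per_group)
print(per_row.tolist())
```

Because the result of transform() has the same index as the original DataFrame, it can be assigned directly as a new column and passed to sort_values().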
Let us understand this with the help of a complete example.
Python program to sort by group aggregate and column
# Import the pandas package
import pandas as pd
# Create a dictionary of sample data
d = {
    'A': ['Oranges', 'Bananas', 'Guavas', 'Mangoes', 'Apples'],
    'B': [212.212, 3312.3121, 1256.3452, 2565.565, 748.237],
    'C': [False, True, True, False, False]
}
# Create the DataFrame
df = pd.DataFrame(d)
# Display the original DataFrame
print("Created DataFrame:\n", df, "\n")
# Add a helper column holding the sum of B for each group in A
# (the string 'sum' avoids the FutureWarning that newer pandas versions raise for the built-in sum)
df['result'] = df.groupby('A')['B'].transform('sum')
# Sort by the group sum (ascending), then by C (True first), and drop the helper column
df = df.sort_values(['result', 'C'], ascending=[True, False]).drop('result', axis=1)
# Display the modified DataFrame
print("Modified DataFrame:\n", df)
Output
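The program prints the original DataFrame followed by the modified one. In the modified DataFrame, the rows appear in ascending order of the sum of "B": Oranges, Apples, Guavas, Mangoes, Bananas. Every value in "A" is unique in this data, so each group sum equals the row's own "B" value and column "C" never has to break a tie.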
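To see both sort keys take effect, the sketch below applies the same technique to hypothetical data in which each fruit appears twice. The values are made up for illustration, and assign() is used here only as one possible way to add the helper column without modifying the original DataFrame.

```python
import pandas as pd

# Hypothetical data: each fruit appears twice, so group totals differ from row values
df = pd.DataFrame({
    'A': ['Guavas', 'Apples', 'Guavas', 'Apples'],
    'B': [5.0, 1.0, 4.0, 2.0],
    'C': [False, False, True, True],
})

ordered = (
    # Helper column with the per-group sum of B (Guavas: 9.0, Apples: 3.0)
    df.assign(result=df.groupby('A')['B'].transform('sum'))
      # Smaller group totals first; within a group, rows with C == True first
      .sort_values(['result', 'C'], ascending=[True, False])
      # Remove the helper column from the final result
      .drop(columns='result')
)

print(ordered)
```

Here both Apples rows come before both Guavas rows because the Apples total is smaller, and within each fruit the row with C equal to True is listed first.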