How to perform data binning in Python?
By Shivang Yadav Last updated : November 21, 2023
Data Binning
Data binning, also called discretization or bucketing, is a data preprocessing technique in which continuous data is divided into discrete bins or intervals. It reduces the impact of small fluctuations in the data and can make the data easier to analyze and visualize.
Data Binning in Python
Python is widely used in machine learning and AI, and it offers many libraries with methods that perform such tasks efficiently. To perform data binning in Python, use the qcut() method of the pandas library. The qcut() method discretizes a variable into equal-sized buckets based on rank or on sample quantiles.
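As a minimal sketch of what "equal-sized buckets" means here, the snippet below splits eight values into four quantile-based bins and counts how many values land in each bin. The sample values and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Split the values into 4 quantile-based bins (quartiles)
bins = pd.qcut(values, q=4)

# Count the observations per bin, keeping the bins in their natural order
counts = bins.value_counts(sort=False)
print(counts)
```

With this input every bin should hold the same number of observations, because qcut() places the bin edges at the sample quantiles rather than at evenly spaced values.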
Syntax
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Parameters
- x: The 1-D array or Series to bin.
- q: The number of quantiles (for example, 4 for quartiles), or an array of quantile boundaries such as [0, 0.25, 0.5, 0.75, 1].
- labels: optional parameter; an array used as the labels for the resulting bins, or False to return only integer bin indicators. An array must contain one label per bin.
- retbins: optional boolean parameter that states whether the function also returns the computed bin edges along with the binned data. Can be useful when q is given as a single number.
- precision: optional parameter that states the precision at which to store and display the bin labels.
- duplicates: optional parameter that states what to do when the bin edges are not unique: 'raise' raises a ValueError, while 'drop' drops the non-unique edges.
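The retbins and duplicates parameters can be sketched as follows. This is only an illustration under stated assumptions: the runs values are reused from the program in this article, while the skewed series and all variable names are hypothetical.

```python
import pandas as pd

runs = pd.Series([4, 7, 12, 8, 50, 13, 100])

# retbins=True returns a tuple: the binned data and the computed bin edges
binned, edges = pd.qcut(runs, q=3, retbins=True)
print(edges)

# Hypothetical data with many repeated values, so several quantiles coincide
skewed = pd.Series([0, 0, 0, 0, 1, 2, 3])

# With the default duplicates="raise", non-unique bin edges cause a ValueError
try:
    pd.qcut(skewed, q=4)
except ValueError as err:
    print("qcut failed:", err)

# duplicates="drop" removes the repeated edges, so fewer bins are produced
binned_skewed = pd.qcut(skewed, q=4, duplicates="drop")
print(binned_skewed.cat.categories)
```

Note that with duplicates='drop' the result can have fewer bins than requested, so a fixed-length labels list may no longer match the number of bins.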
Python program to perform data binning
In this program, we create a DataFrame and bin its runs column into three quantile-based bins using the pandas.qcut() method.
import pandas as pd
# creating a DataFrame
matchData = pd.DataFrame(
{"runs": [4, 7, 12, 8, 50, 13, 100], "dots": [2, 15, 4, 7, 21, 18, 51]}
)
# perform data binning on the runs column (3 quantile-based bins)
matchData["points_bin"] = pd.qcut(matchData["runs"], q=3)
print("Binned Data is\n", matchData)
Output
The output of the above program is:
Binned Data is
runs dots points_bin
0 4 2 (3.999, 8.0]
1 7 15 (3.999, 8.0]
2 12 4 (8.0, 13.0]
3 8 7 (3.999, 8.0]
4 50 21 (13.0, 100.0]
5 13 18 (8.0, 13.0]
6 100 51 (13.0, 100.0]
Data binning in Python using labels
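We can also pass explicit quantile boundaries and give each resulting bin a label. For this, pass a list of quantiles as the q parameter and a list of names as the labels parameter. The number of labels must be one less than the number of quantile boundaries.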
Python program to perform data binning with labels
import pandas as pd
# creating a DataFrame
matchData = pd.DataFrame(
{"runs": [4, 7, 12, 8, 50, 13, 100], "dots": [2, 15, 4, 7, 21, 18, 51]}
)
# perform data binning on the runs column with explicit quantiles and labels
matchData["points_bin"] = pd.qcut(
matchData["runs"], q=[0, 0.2, 0.4, 0.6, 0.8, 1], labels=["A", "B", "C", "D", "E"]
)
print("Binned Data is\n", matchData)
Output
The output of the above program is:
Binned Data is
runs dots points_bin
0 4 2 A
1 7 15 A
2 12 4 C
3 8 7 B
4 50 21 E
5 13 18 D
6 100 51 E
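The labels parameter also accepts False, in which case qcut() returns integer bin indicators instead of intervals or names. The sketch below reuses the runs values from the programs above; the variable name codes is chosen only for illustration.

```python
import pandas as pd

runs = pd.Series([4, 7, 12, 8, 50, 13, 100])

# labels=False returns the zero-based index of the bin for each value
codes = pd.qcut(runs, q=3, labels=False)
print(codes)
```

Here 0 stands for the lowest bin and 2 for the highest, which can be convenient when the bins feed into further numeric processing.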