Python Pandas: Convert a column of list to dummies
Given a pandas Series (or a DataFrame column) in which each value is a list, we need to create a set of dummy columns, one for each distinct item found in the lists.
Submitted by Pranit Sharma, on August 10, 2022
Pandas is a Python library that allows us to manipulate data effectively and efficiently. In pandas, we mostly deal with a dataset in the form of a DataFrame, a 2-dimensional data structure that consists of rows, columns, and data.
What are columns and dummy columns?
Columns are the named fields of a DataFrame, each holding the values for one attribute. We can perform operations on both row and column values.
Dummy columns (also called indicator columns) represent categorical data as 0/1 values: there is one column for each distinct category, and a row has 1 in a column when it belongs to that category and 0 otherwise. They are commonly used in data analysis to turn categorical features into numeric ones.
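As a minimal sketch of what dummy columns look like, the snippet below encodes a small Series of colors. The data and the variable names (colors, dummies) are made up for illustration; dtype=int is passed so that the indicators appear as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical sample data: a single categorical Series
colors = pd.Series(["red", "green", "red", "blue"])

# One indicator column per distinct category;
# dtype=int yields 0/1 values instead of True/False
dummies = pd.get_dummies(colors, dtype=int)

print(dummies)
```

Each row has exactly one 1, in the column that matches its original value.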
Problem statement
Given a pandas Series (or a DataFrame column) in which each value is a list, we need to create a set of dummy columns, one for each distinct item found in the lists.
Convert a column of list to dummies
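To get dummy columns, we use pandas.get_dummies(). This function returns the dummy (indicator) columns for the Series or DataFrame passed to it. Because it cannot encode list values directly, we first expand each list so that every item sits in its own cell, encode the items, and then add the indicators back together for each original row.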
pandas.get_dummies() Method
This function converts categorical variables into dummy/indicator variables.
Syntax:
pandas.get_dummies(
    data,
    prefix=None,
    prefix_sep='_',
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None
)
Parameters:
- data: The data to be encoded (an array-like object, a Series, or a DataFrame).
- prefix: A string to put before each dummy column name. When calling get_dummies on a DataFrame, we can pass a list with a length equal to the number of columns being encoded. The default value is None.
- prefix_sep: The separator used between the prefix and the value. The default value is '_'.
- columns: Column names in the DataFrame that need to be encoded. The default value is None.
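The parameters above can be sketched on a small DataFrame as follows. The DataFrame, its column names, and the prefixes "sz" and "col" are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with two categorical columns and one numeric column
df = pd.DataFrame({
    "size": ["S", "M", "S"],
    "color": ["red", "blue", "red"],
    "price": [10, 12, 11],
})

# Encode only the listed columns; give each its own prefix
# and join prefix and value with "-" instead of the default "_"
encoded = pd.get_dummies(
    df,
    columns=["size", "color"],
    prefix=["sz", "col"],
    prefix_sep="-",
    dtype=int,
)

print(encoded.columns.tolist())
```

The price column is left untouched, and the new columns are named from the prefix, the separator, and the category value, for example sz-M and col-red.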
Let us understand with the help of an example.
Python program to convert a column of list to dummies
# Importing pandas package
import pandas as pd
# Creating a series
s = pd.Series({0: ['One', 'Two', 'Three'], 1:['Three'],
2: ['Two', 'Three', 'Five'],
3: ['One', 'Three'],
4: ['Two', 'Five']})
# Display original Series
print("Original Series :\n",s,"\n")
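# Expand each list into its own cells, stack them into one long Series,
# encode the items, then sum the indicators for each original row.
# Note: sum(level=0) was removed in pandas 2.0, so groupby(level=0).sum() is used instead
result = pd.get_dummies(s.apply(pd.Series).stack(), dtype=int).groupby(level=0).sum()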
# display result
print("Result:\n",result)
Output
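The output of the above program is a DataFrame with one row per Series entry and one 0/1 column for each distinct list value (Five, One, Three, Two). For example, row 0 has 1 under One, Three, and Two, and 0 under Five.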
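As an alternative sketch, not the approach used in the program above, Series.explode() (available since pandas 0.25) can replace the apply(pd.Series).stack() step. It gives every list item its own row while repeating the original index label, which makes the grouping step straightforward. The variable name exploded is introduced here for illustration.

```python
import pandas as pd

# Same Series of lists as in the program above
s = pd.Series({0: ['One', 'Two', 'Three'], 1: ['Three'],
               2: ['Two', 'Three', 'Five'],
               3: ['One', 'Three'],
               4: ['Two', 'Five']})

# explode() puts each list item on its own row and repeats the index label
exploded = s.explode()

# Encode each item, then sum the indicator rows that share an index label
result = pd.get_dummies(exploded, dtype=int).groupby(level=0).sum()

print(result)
```

This avoids building an intermediate DataFrame with one column per list position, which may be preferable when the lists are long or vary widely in length.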