Label encoding across multiple columns in scikit-learn

Learn how to perform label encoding across multiple columns of a pandas DataFrame with scikit-learn. By Pranit Sharma. Last updated: September 20, 2023

Label encoding is the process of converting categorical labels into integers so that machine learning algorithms, which operate on numbers, can use them. Each distinct label in a column is assigned its own integer code.
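As a minimal sketch of the idea, the snippet below encodes a short list of labels with scikit-learn's LabelEncoder. The list `genders` and the variable names are made up for illustration; the encoder assigns codes in sorted order of the distinct labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels, used only for illustration
genders = ["Male", "Female", "Male", "Female"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# classes_ holds the distinct labels in sorted order;
# the position of a label in classes_ is its integer code
print(encoder.classes_.tolist())  # ['Female', 'Male']
print(codes.tolist())             # [1, 0, 1, 0]
```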

Here, we will use scikit-learn (imported as sklearn), a machine learning library that provides preprocessing utilities along with the algorithms used for training and testing models.

Let us now see, with the help of an example, how to do label encoding across multiple columns in sklearn.

Problem statement

Given a Pandas DataFrame, we have to perform label encoding across multiple columns using scikit-learn.

Label encoding across multiple columns in scikit-learn

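For this purpose, we will import the preprocessing module from sklearn and use its LabelEncoder class. Its fit_transform() method learns the distinct labels of a column and returns the corresponding integer codes. Passing this method to DataFrame.apply() runs it on every column, which encodes the whole DataFrame.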
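Because apply() runs the encoder on every column, numeric columns get re-coded as well. The sketch below shows one possible way to encode only chosen columns. The DataFrame `sample` and the list `categorical_cols` are hypothetical and not part of the original program.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical small DataFrame, used only for illustration
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Title": ["Mr", "Ms", "Mr"],
    "Age": [25, 26, 30],
})

# Assumed choice of columns to encode; Age is left untouched
categorical_cols = ["Gender", "Title"]

encoded = sample.copy()
# fit_transform is called once per selected column
encoded[categorical_cols] = sample[categorical_cols].apply(
    LabelEncoder().fit_transform
)

print(encoded)
```

Note that scikit-learn documents LabelEncoder as intended for target values; OrdinalEncoder is its counterpart for feature columns.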


Let us understand with the help of an example.

Python program for label encoding across multiple columns in scikit-learn

# Importing pandas package
import pandas as pd

# Importing the preprocessing module from sklearn
from sklearn import preprocessing

# Creating a dictionary
d={
    "Name":['Hari','Mohan','Neeti','Shaily','Ram','Umesh','Shirish','Rashmi','Pradeep','Neelam','Jitendra','Manoj','Rishi'],
    "Age":[25,36,26,21,30,33,35,40,39,45,42,39,48],
    "Gender":['Male','Male','Female','Female','Male','Male','Male','Female','Male','Female','Male','Male','Male'],
    "Profession":['Doctor','Teacher','Singer','Student','Engineer','CA','Cricketer','Teacher','Teacher','Politician',
                 'Doctor','Manager','Clerk'],
    "Title":['Mr','Mr','Ms','Ms','Mr','Mr','Mr','Ms','Mr','Ms','Mr','Mr','Mr'],
    "Salary":[200000,50000,500000,0,100000,75000,10000000,50000,50000,200000,200000,150000,15000],
    "Location":['Amritsar','Indore','Mumbai','Bhopal','Gurugram','Pune','Banglore','Ranchi','Surat','Chennai','Shimla','Kolkata','Raipur'],
    "Marriage Status":[0,1,1,0,1,0,0,1,1,1,0,1,0]
}

# Now we will create DataFrame
df=pd.DataFrame(d)

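# Encoding all the columns; apply() returns a new DataFrame,
# so the result must be assigned to a variable
encoded_df = df.apply(preprocessing.LabelEncoder().fit_transform)

# Viewing the encoded DataFrame
print("Encoded DataFrame:\n")
print(encoded_df, "\n\n")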

Output

The output of the above program is:

[Figure: Example: Label encoding across multiple columns]
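Passing a single LabelEncoder's fit_transform to apply() refits the same encoder on each column, so only the mapping of the last column remains available afterwards. If the original labels need to be recovered later, one option is to keep a separate encoder per column. The sketch below assumes a small hypothetical DataFrame `sample` and a dictionary `encoders`; neither is part of the original program.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical small DataFrame, used only for illustration
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Profession": ["Doctor", "Singer", "Teacher"],
})

encoders = {}  # column name -> fitted LabelEncoder
encoded = pd.DataFrame(index=sample.index)

for col in sample.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(sample[col])

# Each stored encoder can map its codes back to the original labels
restored = encoders["Profession"].inverse_transform(encoded["Profession"])
print(restored.tolist())  # ['Doctor', 'Singer', 'Teacher']
```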





