How to remove illegal characters so a dataframe can write to Excel?
Given a pandas dataframe, we have to remove illegal characters so a dataframe can write to Excel.
Submitted by Pranit Sharma, on November 23, 2022
Pandas is a Python library for manipulating data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.
An xlsx file is a spreadsheet file that can be created with certain software, especially with the popularly known MS Excel.
Problem statement
Suppose we are given a DataFrame whose columns are of string type, and some of the strings contain characters that the Excel writer does not accept. We need a way to go through the DataFrame and clean up these characters so that it can be written to an Excel sheet.
Removing illegal characters so a dataframe can write to Excel
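For this purpose, we will use the df.applymap() method with a lambda function. The lambda is applied to every cell of the DataFrame: if the value is a string, it is first encoded with the 'unicode_escape' codec and then decoded as UTF-8, which turns non-printable characters into visible escape sequences; any other value is returned unchanged. Note that in pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which is used the same way here.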
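As a rough illustration of the encode/decode step on a single value, the sketch below wraps it in a small helper. The name escape_value is hypothetical and not part of pandas. Note that the 'unicode_escape' codec escapes rather than deletes: control characters, backslashes, and non-ASCII characters all come back as visible escape sequences.

```python
def escape_value(x):
    """Escape strings with the 'unicode_escape' codec; pass other values through."""
    if isinstance(x, str):
        # encode() yields bytes such as b'bad\\x0bcell'; decode() turns them back into str
        return x.encode('unicode_escape').decode('utf-8')
    return x

escaped = escape_value('bad\x0bcell')   # the vertical tab becomes the four characters \x0b
unchanged = escape_value(3.14)          # non-string values are returned as-is
print(repr(escaped), unchanged)
```

A helper like this could be passed to df.applymap() in place of the inline lambda, which may be easier to read and test.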
Let us understand with the help of an example,
Python program to remove illegal characters so a dataframe can write to Excel
# Importing pandas package
import pandas as pd
# Creating dataframe
df = pd.DataFrame({'A':['/','?','*','[',']']})
# Display original DataFrame
print("Original DataFrame:\n",df,"\n")
# Encoding strings with 'unicode_escape' and decoding them back;
# non-string values are left unchanged
res = df.applymap(lambda x: x.encode('unicode_escape').decode('utf-8')
                  if isinstance(x, str) else x)
# Display result
print("Result:\n",res)
# Writing the cleaned result (not the original df) to an excel file
res.to_excel('illegal.xlsx')
print("Excel file created")
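The approach above escapes problematic characters instead of deleting them. If the goal is to remove them outright, one possible alternative is a regular expression, sketched below. This is an assumption-laden sketch: it treats ASCII control characters other than tab, newline, and carriage return as the illegal ones, which is understood to match what the openpyxl writer rejects; the names ILLEGAL_CHARS and strip_illegal are hypothetical.

```python
import re

# Assumption: these control characters are the ones the Excel writer refuses.
# Tab (\x09), newline (\x0a) and carriage return (\x0d) are deliberately kept.
ILLEGAL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def strip_illegal(value):
    """Delete control characters from strings; return other values unchanged."""
    if isinstance(value, str):
        return ILLEGAL_CHARS.sub('', value)
    return value

samples = ['clean text', 'form\x0cfeed', 'tab\tkept', 42]
cleaned = [strip_illegal(v) for v in samples]
print(cleaned)
```

With a DataFrame, the same function could be passed to df.applymap() (or df.map() in pandas 2.1 and later) before calling to_excel().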