Removing newlines from messy strings in pandas dataframe cells
Given a pandas DataFrame, we have to remove newlines from the messy strings in its cells.
Submitted by Pranit Sharma, on November 15, 2022
Overview
Pandas is a Python library for manipulating and analyzing data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional data structure that consists of rows, columns, and data.
A string is a sequence of characters, which may include lowercase letters, uppercase letters, digits, and special characters. A string is a data type, and the number of characters in a string is known as the length of the string.
What is a messy string?
By a messy string, we mean a string that contains newline characters (\n) embedded in its text. When such a string is stored in a DataFrame cell, these line breaks make the value difficult to read and analyze.
We need to remove each newline character and replace it with a white space so that the string becomes readable.
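As a minimal sketch of the idea on a plain Python string (the variable names messy and clean are only illustrative), the built-in str.replace() method can swap each newline for a space:

```python
# A hypothetical messy string with embedded newline characters
messy = "This\nis\na\ntutorial"

# repr() shows the raw \n escapes hidden inside the value
print(repr(messy))

# str.replace() swaps every newline character for a single space
clean = messy.replace("\n", " ")
print(clean)
```

The same replacement is what we want to apply to every string cell of a DataFrame.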
Problem statement
Suppose we are given a DataFrame with a string-type column, and this column contains a messy string.
Removing newlines from messy strings in pandas dataframe cells
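To remove newlines from messy strings, you can use the pandas.DataFrame.replace() method, passing the newline character ('\n') as the value to find and a white space (' ') as the replacement, along with regex=True. This method finds a specified value in a DataFrame and replaces it with another value across all rows and columns. With regex=True, the pattern is also matched inside longer strings; without it, only cells whose entire value equals '\n' are replaced.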
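The effect of the regex argument can be seen in a small sketch. The DataFrame below, with its columns A and B, is made up for illustration and is not part of the original example:

```python
import pandas as pd

# Hypothetical data: column A has newlines inside the text,
# column B holds a cell that is nothing but a newline
df = pd.DataFrame({"A": ["This\nis\na\ntutorial"], "B": ["\n"]})

# Without regex=True, only cells whose entire value equals "\n" are replaced
whole_cell = df.replace("\n", " ")

# With regex=True, every newline inside every string cell is replaced
substring = df.replace("\n", " ", regex=True)

print(repr(whole_cell.loc[0, "A"]))
print(repr(substring.loc[0, "A"]))
```

In the first result, column A keeps its newlines because the cell as a whole is not equal to "\n"; in the second, the newlines inside the text are replaced as well.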
Let us understand with the help of an example.
Python program to remove newlines from messy strings in DataFrame cells
# Importing pandas package
import pandas as pd
# Creating a dictionary
d = {'A':['This\nis\na\ntutorial\nof\nincludehelp']}
# Creating DataFrame
df = pd.DataFrame(d)
# Display dataframe
print('Original DataFrame:\n',df,'\n')
# Replacing every newline character with a space
df = df.replace('\n',' ', regex=True)
# Display new DataFrame
print("New DF:\n",df)
Output
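The output of the above program shows the original DataFrame, whose cell still contains the \n characters, followed by the new DataFrame, whose cell reads: This is a tutorial of includehelp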
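As a possible variation, not part of the program above, the cleanup can be limited to a single column with Series.str.replace(). The sketch below assumes the text may also contain Windows-style line endings (\r\n) or several newlines in a row, and collapses each run into one space. The column names A and n are hypothetical:

```python
import pandas as pd

# Hypothetical data: mixed line endings and a doubled newline in column A
df = pd.DataFrame({"A": ["This\r\nis\n\na\ntutorial"], "n": [1]})

# Collapse each run of carriage returns / newlines into a single space,
# touching only column A and leaving the numeric column alone
df["A"] = df["A"].str.replace(r"[\r\n]+", " ", regex=True)

print(df)
```

Using a character class with + avoids leaving double spaces where two line breaks were adjacent.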