Concatenate strings from several rows using pandas groupby

Given a Pandas DataFrame, we have to concatenate strings from several rows using pandas groupby. By Pranit Sharma Last updated : September 17, 2023

Pandas is a special tool which allows us to perform complex manipulations of data effectively and efficiently. Inside pandas, we mostly deal with a dataset in the form of DataFrame. DataFrames are 2-dimensional data structure in pandas. DataFrames consists of rows, columns and the data.

A string in pandas can also be converted into pandas DataFrame with the help StringIO method.

Problem statement

Given a Pandas DataFrame, we have to concatenate strings from several rows using pandas groupby. Suppose we have a product name ‘Frooti’ in our DataFrame and its sales in number. We may have this product multiple times in the DataFrame.

Concatenating strings from several rows using pandas groupby

For this purpose, we will use the df.groupby() method by passing the set of the columns whose value we have to concatenate and a new column name. Also, we will apply the lambda expression with the .join() method. Consider the following code statement to achieve this task:

df.groupby(['Name','Age'])['Messege'].apply(lambda x: ','.join(x)).reset_index()
Note

To work with pandas, we need to import pandas package first, below is the syntax:

import pandas as pd

To work with StringIO, we need to import io package from StringIO first, below is the syntax:

from io import StringIO

Let us understand with the help of an example,

Python program to concatenate strings from several rows using pandas groupby

# Importing pandas package
import pandas as pd

# Importing StringIO module from io module
from io import StringIO

# Creating a string
string= StringIO("""
Name;Age;Messege
Harry;20;OHH!!
Tom;23;JUST DO
Harry;20;YEAH
Nancy;20;WOW!
Tom;23;IT
""")

# Reading String in form of csv file
df=pd.read_csv(string, sep=";")

# Printing the DataFrame
print("String into DataFrame:\n\n",df,"\n\n")

# Applying function to concatenate strings
result = df.groupby(['Name','Age'])['Messege'].apply(lambda x: ','.join(x)).reset_index()

# Display result
print("Concatenated strings where colum values are same:\n\n",result)

Output

The output of the above program is:

Example: Concatenate strings from several rows

Python Pandas Programs »

Comments and Discussions!

Load comments ↻





Copyright © 2024 www.includehelp.com. All rights reserved.