Pandas dataframe select rows where a list-column contains any of a list of strings

Given a pandas dataframe, we have to select rows where a list-column contains any of a list of strings.
By Pranit Sharma. Last updated: September 03, 2023

Pandas is a Python library for manipulating and analyzing data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional data structure that consists of rows, columns, and data.

A list is a mutable collection that can hold heterogeneous elements. A tuple is another built-in Python data type used to store heterogeneous elements, but it is immutable.

A string is a sequence of characters, which may include lowercase letters, uppercase letters, digits, and special characters. String is a data type, and the number of characters in a string is known as its length.

Problem statement

We are given a DataFrame with multiple columns, where one column holds a list of strings in every row. We need to extract a DataFrame containing only those rows whose list includes at least one of the strings in a given list.

Selecting rows where a list-column contains any of a list of strings

  • Step 1: Create a dictionary, and convert it into a DataFrame.
  • Step 2: Define the specific strings to look for in a list.
  • Step 3: To select rows where the list column contains any of those strings, convert the list column into a helper DataFrame with tolist(), test every element with the isin() method by passing the list of specific strings, and call any(axis=1) to keep the rows where at least one element matches.
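The steps above can be sketched with each intermediate result held in its own variable, which makes it easier to see what the one-line selection does. The names expanded, hits, and mask are illustrative only and are not part of the original program.

```python
import pandas as pd

# Step 1: build the DataFrame from a dictionary
df = pd.DataFrame({
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']],
})

# Step 2: the strings to look for
specific = ['rose', 'lily']

# Step 3a: spread each list over the columns of a helper frame;
# shorter lists are padded with missing values
expanded = pd.DataFrame(df['flowers'].tolist())

# Step 3b: element-wise membership test (padding never matches)
hits = expanded.isin(specific)

# Step 3c: True for rows where at least one element matched
mask = hits.any(axis=1)

# .values turns the mask into a plain array, so it is applied by position
res = df[mask.values]
print(res)
```

Because the helper frame gets a fresh 0..n-1 index, passing mask.values keeps the selection positional, which matters when df has a non-default index.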

Let us understand with the help of an example.

Python program to select rows where a list-column contains any of a list of strings

# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
    'code':[1,2,3,4],
    'flowers':[['lily','rose'],['lotus','rose','sunflower'],['lily'],['orchid']]
}

# Creating DataFrame
df = pd.DataFrame(d)

# Display dataframe
print('Original DataFrame:\n',df,'\n')

# Defining some specific strings
specific = ['rose','lily']

# Selecting rows whose list contains any of the specific strings
# (axis must be passed by keyword in recent pandas versions)
res = df[pd.DataFrame(df.flowers.tolist()).isin(specific).any(axis=1).values]

# Display result
print("Result:\n",res)

Output

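The result of the above program contains the rows with code 1, 2, and 3, because each of their lists includes 'rose' or 'lily'. The row with code 4, whose list is ['orchid'], is excluded.

[Figure: Example: Select rows where a list-column contains any of a list of strings]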
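As a possible alternative to the isin() approach used above, the same selection can be sketched with Series.apply() and a set lookup. This is a different technique from the one in the program, shown only for comparison; it assumes every cell in the column is a list (no missing values), and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']],
})

# A set makes the membership check independent of list order
specific = {'rose', 'lily'}

# isdisjoint() is True when the two collections share no element,
# so negating it marks rows that contain at least one wanted string
mask = df['flowers'].apply(lambda items: not specific.isdisjoint(items))

# mask keeps df's index, so it can be used directly for selection
res = df[mask]
print(res)
```

This version avoids building a helper DataFrame, though apply() runs a Python function per row and may be slower on large frames.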

Copyright © 2024 www.includehelp.com. All rights reserved.