Pandas dataframe select rows where a list-column contains any of a list of strings
Given a pandas dataframe, we have to select rows where a list-column contains any of a list of strings.
By Pranit Sharma Last updated : September 03, 2023
Pandas is a Python library for manipulating and analyzing data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a two-dimensional data structure made up of rows, columns, and the data they hold.
A list is a mutable collection of possibly heterogeneous elements. Tuples are another built-in Python data type for storing heterogeneous elements, but unlike lists they are immutable.
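A small illustration of the difference, using made-up values: a list can be changed in place, while assigning to an item of a tuple raises a `TypeError`.

```python
# A list may hold mixed types and can be modified in place
mixed = [1, 'rose', 2.5]
flowers = ['lily', 'rose']
flowers.append('lotus')
print(flowers)  # ['lily', 'rose', 'lotus']

# A tuple is immutable: item assignment raises TypeError
codes = (1, 2, 3)
try:
    codes[0] = 10
except TypeError as exc:
    print('TypeError:', exc)
```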
A string is a sequence of characters, which may include lowercase and uppercase letters, digits, and special characters. The number of characters in a string is called its length.
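For example, Python's built-in `len()` returns the length of a string (the sample word below is arbitrary):

```python
word = "Sunflower!"
# 9 letters plus the special character '!'
print(len(word))  # 10
print(word.lower(), word.upper())
```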
Problem statement
We are given a DataFrame with multiple columns, one of which contains a list of strings in each row. We need to extract only those rows whose list contains at least one of a given set of strings.
Selecting rows where a list-column contains any of a list of strings
- Step 1: Create a dictionary and convert it into a DataFrame.
- Step 2: Define the specific strings to look for in a list.
- Step 3: Expand the list column into a temporary DataFrame with one column per list position, call isin() on it with the list of specific strings, and use any(axis=1) to mark each row that has at least one match. Use the resulting boolean mask to select rows from the original DataFrame.
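The parts of Step 3 can be inspected one at a time. This sketch uses the same sample data as the program below; the intermediate names `expanded`, `membership` and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']],
})
specific = ['rose', 'lily']

# Spread each list across columns; shorter lists are padded with None
expanded = pd.DataFrame(df['flowers'].tolist())
# Element-wise membership test (None cells are never a match)
membership = expanded.isin(specific)
# A row matches if any of its cells matched
mask = membership.any(axis=1)

print(expanded)
print(mask.tolist())  # [True, True, True, False]

# .values drops the index of `expanded`, so the mask is applied by position
result = df[mask.values]
print(result)
```

Applying the mask by position matters when `df` has a non-default index, because `expanded` always gets a fresh 0..n-1 index.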
Let us understand with the help of an example:
Python program to select rows where a list-column contains any of a list of strings
# Importing pandas package
import pandas as pd
# Creating a dictionary
d = {
'code':[1,2,3,4],
'flowers':[['lily','rose'],['lotus','rose','sunflower'],['lily'],['orchid']]
}
# Creating DataFrame
df = pd.DataFrame(d)
# Display dataframe
print('Original DataFrame:\n',df,'\n')
# Defining some specific strings
specific = ['rose','lily']
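# Expand the list column into one column per list position, test each
# cell for membership in `specific`, and keep rows with at least one match
res = df[pd.DataFrame(df.flowers.tolist()).isin(specific).any(axis=1).values]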
# Display result
print("Result:\n",res)
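Output
Running the program prints the original DataFrame followed by the result. The result keeps the rows with codes 1, 2 and 3, because each of their lists contains 'rose' or 'lily'; the row with code 4 (['orchid']) is dropped.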
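Another common approach, again not the article's own, uses `Series.explode()`, which produces one row per list element while repeating the original index label. Grouping on that label collapses the per-element matches back to one boolean per row.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']],
})
specific = ['rose', 'lily']

# One row per list element; index labels 0, 0, 1, 1, 1, 2, 3
exploded = df['flowers'].explode()
# Test each element, then reduce to one boolean per original row label
mask = exploded.isin(specific).groupby(level=0).any()
res = df[mask]
print(res['code'].tolist())  # [1, 2, 3]
```

An empty list becomes a single NaN after `explode()`, which `isin()` reports as no match, so such rows are excluded.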