Pandas dataframe select rows where a list-column contains any of a list of strings

Given a pandas dataframe, we have to select rows where a list-column contains any of a list of strings.
By Pranit Sharma Last updated : September 03, 2023

Pandas is a Python library for manipulating and analyzing data. In pandas, we mostly work with data in the form of a DataFrame, a 2-dimensional labeled data structure made up of rows, columns, and data.

A list is a mutable collection that can hold heterogeneous elements. A tuple is another built-in Python data type for storing heterogeneous elements, but unlike a list it is immutable.

A string is a sequence of characters, which may include lowercase letters, uppercase letters, digits, and special characters. String is a data type, and the number of characters in a string is known as its length.

Problem statement

We are given a DataFrame with multiple columns, where one column contains lists of strings. We need to extract only those rows whose list contains at least one string from a given list of specific strings.

Selecting rows where a list-column contains any of a list of strings

  • Step 1: Create a dictionary, and convert it into the DataFrame.
  • Step 2: Define some specific strings in a list.
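  • Step 3: To select rows where the list column contains any of the given strings, expand the list column into a temporary DataFrame with pd.DataFrame(df.flowers.tolist()), call isin() on it with the list of specific strings, and reduce the result row-wise with any(axis=1). The resulting boolean values are then used to index the original DataFrame.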
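The intermediate objects behind Step 3 can be sketched as follows. This is a minimal illustration on a small toy DataFrame; the variable names expanded, matches, and mask are assumptions introduced here for readability and are not part of the original program.

```python
import pandas as pd

# Toy data assumed for illustration only
df = pd.DataFrame({
    'code': [1, 2, 3],
    'flowers': [['lily', 'rose'], ['lotus'], ['orchid', 'lily']],
})
specific = ['rose', 'lily']

# Spread each list across columns; shorter lists are padded with missing values
expanded = pd.DataFrame(df['flowers'].tolist())

# True wherever a cell equals one of the wanted strings
matches = expanded.isin(specific)

# One boolean per row: True if any cell in that row matched
mask = matches.any(axis=1).values

# Boolean indexing keeps only the matching rows of the original DataFrame
res = df[mask]
print(res)
```

Taking .values drops the index of the temporary DataFrame, so the mask is applied by position. This matters when the original DataFrame does not use the default 0, 1, 2, ... index.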

Let us understand this with the help of an example.

Python program to select rows where a list-column contains any of a list of strings

# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']]
}

# Creating DataFrame
df = pd.DataFrame(d)

# Display dataframe
print('Original DataFrame:\n', df, '\n')

# Defining some specific strings
specific = ['rose', 'lily']

# Selecting rows whose list contains any of the specific strings
# (axis must be passed by keyword; any(1) fails on pandas 2.0 and later)
res = df[pd.DataFrame(df.flowers.tolist()).isin(specific).any(axis=1).values]

# Display result
print("Result:\n", res)

Output

The output of the above program is:

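[Figure: Example: Select rows where a list-column contains any of a list of strings. The result keeps the rows with code 1, 2, and 3, whose lists contain 'rose' or 'lily', and drops the row with code 4, whose list contains only 'orchid'.]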
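For comparison, the same selection can be written without building a temporary DataFrame. The sketch below is not the method used in the program above; it shows two alternatives, one based on Series.apply() with a set test and one based on Series.explode(). The names wanted, mask_apply, and mask_explode are assumptions introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4],
    'flowers': [['lily', 'rose'], ['lotus', 'rose', 'sunflower'], ['lily'], ['orchid']],
})
specific = ['rose', 'lily']
wanted = set(specific)

# Alternative A: test each list against the set of wanted strings
mask_apply = df['flowers'].apply(lambda items: not wanted.isdisjoint(items))

# Alternative B: one string per row, test membership,
# then combine the results that share an original row label
mask_explode = df['flowers'].explode().isin(specific).groupby(level=0).any()

res = df[mask_apply]
print(res)
```

Both masks keep the index of the original DataFrame, so they align by label instead of by position. Which variant is faster depends on the data, so it is worth timing them on a realistic sample.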




Copyright © 2024 www.includehelp.com. All rights reserved.