Random Sample of a subset of a dataframe in Pandas

Learn how to create a random sample of a subset of a DataFrame in Python pandas. By Pranit Sharma Last updated : October 03, 2023

Pandas is a Python library that lets us manipulate data effectively and efficiently. In pandas, we mostly work with data in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.

Problem statement

Suppose we are given a DataFrame with a large number of entries and we need to draw a random sample of rows from it, or from a subset of it.

Random Sample of a subset of a dataframe

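For this purpose, we will use the pandas.DataFrame.sample() method. It returns a random sample of items along an axis of the object (rows by default). To sample from a subset only, first select the subset (for example, with a boolean filter) and then call sample() on the result.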

Syntax:

DataFrame.sample(
    n=None, 
    frac=None, 
    replace=False, 
    weights=None, 
    random_state=None, 
    axis=None, 
    ignore_index=False
    )

Parameter(s):

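  • n: number of items from the axis to return. It cannot be used together with frac; if both are None, one item is returned.
  • frac: fraction of the axis items to return. It cannot be used together with n.
  • replace: bool value. If True, sampling is done with replacement, so the same row can be selected more than once. The default is False.
  • weights: optional sampling weights. If None, every item has an equal probability of being selected.
  • random_state: seed (or random generator) that makes the sample reproducible.
  • axis: axis to sample from. By default, rows are sampled.
  • ignore_index: bool value. If True, the result is relabeled 0, 1, ..., n - 1 instead of keeping the original index.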
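
The parameters above can be tried out on a small DataFrame. The following is only a minimal sketch: the column names, the sizes, and the seed 42 are arbitrary choices made for illustration.

```python
import pandas as pd

# Illustrative 10-row DataFrame (the values are made up for this sketch)
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})

# n: return a fixed number of rows
by_count = df.sample(n=3, random_state=42)

# frac: return a fraction of the rows (0.5 of 10 rows gives 5 rows)
by_frac = df.sample(frac=0.5, random_state=42)

# replace=True: rows may repeat, so n may exceed the number of rows
with_repl = df.sample(n=15, replace=True, random_state=42)

print(len(by_count), len(by_frac), len(with_repl))
```

Passing the same random_state again reproduces the same sample, which is useful when a result has to be repeatable.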


Let us understand with the help of an example.

Python program to create random sample of a subset of a dataframe

# Importing pandas package
import pandas as pd

# Creating a list of rows
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Creating a DataFrame
df = pd.DataFrame(data, columns=['A', 'B'])

# Display original DataFrame
print("Original Dataframe:\n", df, "\n")

# Getting a random sample of 2 rows
res = df.sample(2)

# Display this random sample
print("Sample of subset:\n", res, "\n")

Output

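The program prints the original four-row DataFrame followed by two randomly chosen rows. Because no random_state is set, the rows selected can differ from run to run.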

Example: Random Sample of a subset of a dataframe
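
To draw the sample from a subset rather than from the whole DataFrame, one possible approach is to filter first and then call sample() on the filtered rows. The sketch below reuses the same four rows; the condition A > 2 and the seed 0 are assumptions chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

# Select the subset first (here: rows where column A is greater than 2)
subset = df[df["A"] > 2]

# Draw the random sample from that subset only
res = subset.sample(n=2, random_state=0)

# Rows of the subset that were not sampled
remaining = subset.drop(res.index)

print("Sample of subset:\n", res, "\n")
print("Remaining rows of subset:\n", remaining, "\n")
```

Since sample() keeps the original index labels, drop(res.index) splits the subset into the sampled rows and the rest.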






Copyright © 2024 www.includehelp.com. All rights reserved.