Stratified Sampling in Pandas
Python Pandas | Stratified Sampling: Learn how to generate stratified samples of size n from a dataset.
By Pranit Sharma Last updated : September 17, 2023
Pandas is a Python library that allows us to perform complex manipulations of data effectively and efficiently. In pandas, we mostly work with data in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.
Python Pandas | Stratified Sampling
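Stratified random sampling is a method of sampling that divides a population into smaller subgroups, known as strata, and then draws a random sample from each stratum. To draw up to n rows from every stratum, the sample size passed to sample() must be capped with min() at the size of the group, because sample() raises an error when asked for more rows than a group contains (unless sampling with replacement).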
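The need for min() can be checked on a single small stratum. The following is a minimal sketch, assuming a hypothetical one-row DataFrame named stratum (not part of the original program); it only illustrates the behaviour of sample() when the requested size exceeds the number of rows.

```python
import pandas as pd

# Hypothetical stratum that contains only one row
stratum = pd.DataFrame({"A": [3], "B": [7]})

# Asking for 2 rows without replacement fails: the stratum is too small
error_message = None
try:
    stratum.sample(2)
except ValueError as err:
    error_message = str(err)
print("Uncapped request failed:", error_message)

# Capping the request at the stratum size always succeeds
capped = stratum.sample(min(len(stratum), 2))
print(capped)
```

With the cap in place, small strata simply contribute all of their rows instead of stopping the whole sampling step.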
We can use the groupby() method to split the DataFrame by the column that defines the strata, and apply a lambda function to each group to draw its sample.
Let us understand this with the help of an example.
Python program to demonstrate the example of stratified sampling in pandas
# Importing pandas package
import pandas as pd
# Creating a dictionary
d1 = {'A': [1, 1, 1, 2, 2, 2, 2, 3, 4, 4], 'B': list(range(10))}
# Creating DataFrame
df = pd.DataFrame(d1)
# Display the DataFrame
print("Original DataFrame:\n",df,"\n\n")
# Drawing a stratified sample: at most 2 rows from each group of column 'A'
res = df.groupby('A', group_keys=False).apply(lambda x: x.sample(min(len(x), 2)))
# Display result
print("Result:\n",res)
Output
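Because sample() picks rows at random, the exact rows in the result differ from run to run. The result always contains 7 rows: two each from groups 1, 2 and 4 of column A, and the single row of group 3.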
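As a variation on the program above, the same idea can be wrapped in a small helper that also makes the draw reproducible. This is only a sketch under stated assumptions: the function name stratified_sample and its parameters by and n are hypothetical, introduced here for illustration, while random_state is the existing seed parameter of DataFrame.sample(). The helper loops over the groups and concatenates the per-group samples instead of using apply().

```python
import pandas as pd


def stratified_sample(df, by, n, random_state=None):
    """Return up to n random rows from each group of column `by` (hypothetical helper)."""
    parts = [
        # Cap the request at the group size so small groups do not raise an error
        group.sample(min(len(group), n), random_state=random_state)
        for _, group in df.groupby(by)
    ]
    return pd.concat(parts)


# Same data as in the example program
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2, 2, 3, 4, 4], "B": list(range(10))})

# A fixed seed makes repeated runs return the same rows
res = stratified_sample(df, by="A", n=2, random_state=0)

# Number of sampled rows per stratum
print(res["A"].value_counts().sort_index())
```

Passing random_state=None (the default in this sketch) restores the run-to-run randomness of the original program.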