What is the difference between join and merge in Pandas?

Learn about the main differences between join and merge in Python Pandas. By Pranit Sharma. Last updated: September 20, 2023

Pandas is a Python library for performing complex data manipulations effectively and efficiently. In pandas, we mostly work with datasets in the form of a DataFrame, a two-dimensional data structure made up of rows, columns, and the data itself.

Pandas provides numerous ways to combine two Series or DataFrames for data analysis. Sometimes the data we need is not present in a single DataFrame, and in that case we need to combine two or more DataFrames.

Difference between join() and merge() methods in Pandas

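Pandas merge() and join() are both methods for combining two DataFrames. The key difference is that, by default, join() combines the DataFrames on the basis of their index (the row labels), whereas merge() combines them on the basis of one or more columns (if none are specified, the columns the two DataFrames have in common) instead of index values. Both defaults can be changed: join() accepts an on parameter that matches a column of the calling DataFrame against the index of the other, and merge() accepts left_index and right_index to merge on the indexes.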
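To make the index-versus-column distinction concrete, here is a minimal sketch. The DataFrames left and right and their index labels are hypothetical, chosen only for illustration. It shows that join() aligns rows on the index without extra arguments, while merge() produces the same result only when it is explicitly told to use both indexes.

```python
import pandas as pd

# Hypothetical example data: both frames use string index labels
left = pd.DataFrame({'Name': ['Amit', 'Bhairav', 'Chirag']}, index=['a', 'b', 'c'])
right = pd.DataFrame({'Age': [20, 23]}, index=['a', 'c'])

# join(): aligns rows on the index by default (how='left')
by_join = left.join(right)

# merge(): matches on columns by default, so the indexes must be requested explicitly
by_merge = left.merge(right, left_index=True, right_index=True, how='left')

print(by_join)
print(by_join.equals(by_merge))
```

Row 'b' has no matching label in right, so its Age is NaN in both results.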


Example to demonstrate the difference between join() and merge() methods

For better understanding, let us first create two DataFrames:

# Importing pandas package
import pandas as pd

# Creating two dictionaries
dict1 = {
    'Name':['Amit Sharma','Bhairav Pandey','Chirag Bharadwaj','Divyansh Chaturvedi','Esha Dubey'],
    'Age':[20,20,23,19,18]
}

dict2 = {
    'Name':['Jatin Prajapati','Rahul Shakya','Gaurav Dixit','Pooja Sharma','Mukesh Jha'],
    'Age':[21,20,21,19,23]
}

# Creating DataFrames from the dictionaries
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

# Display the DataFrames
print("DataFrame1:\n",df1,"\n")
print("DataFrame2:\n",df2,"\n")

Output:

[Output image: Example 1, difference between join and merge in Pandas]

Now we will apply merge() and join() separately on these DataFrames to understand the functional difference.


join() Method

# Using join method
df_join = df1.join(df2, lsuffix='_')

# Display result
print(df_join)

Output:

[Output image: Example 2, difference between join and merge in Pandas]

Here, the join() method combines the two DataFrames on the basis of their indexes. As the output shows, the columns of the second DataFrame are placed alongside those of the first, and rows with the same index label are aligned. Since both DataFrames have the same column names, we assigned a left suffix (lsuffix='_') to the columns of the first DataFrame so that the two sets of columns can be told apart; without a suffix, join() raises a ValueError because the column names overlap.
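The role of the suffix can be checked directly. This minimal sketch uses a two-row subset of the article's data (the subset is an assumption made for brevity) and shows the error raised when no suffix is given, the column names produced by lsuffix alone, and the names produced when rsuffix is set as well.

```python
import pandas as pd

# A two-row subset of the article's df1 and df2 (same column names in both)
df1 = pd.DataFrame({'Name': ['Amit Sharma', 'Bhairav Pandey'], 'Age': [20, 20]})
df2 = pd.DataFrame({'Name': ['Jatin Prajapati', 'Rahul Shakya'], 'Age': [21, 20]})

# Without any suffix, the overlapping column names make join() fail
try:
    df1.join(df2)
    overlap_error = False
except ValueError as err:
    overlap_error = True
    print("join() without a suffix failed:", err)

# lsuffix renames the caller's overlapping columns; rsuffix renames the other frame's
df_join = df1.join(df2, lsuffix='_')
df_both = df1.join(df2, lsuffix='_1', rsuffix='_2')

print(list(df_join.columns))
print(list(df_both.columns))
```

Setting both suffixes is usually easier to read than a lone trailing underscore, since every column then states which DataFrame it came from.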


merge() Method

# Using merge method
df_merged = df1.merge(df2, on='Age', how='outer')

# Display result
print(df_merged)

Output:

[Output image: Example 3, difference between join and merge in Pandas]
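In the merge() call above, rows are matched on the values of the Age column rather than on the index, and how='outer' keeps ages that appear in only one DataFrame, filling the missing side with NaN. Because both DataFrames also have a Name column, merge() applies its default suffixes _x and _y to keep them apart. The minimal sketch below rebuilds the article's DataFrames and adds indicator=True (not used in the original example) so that each row reports which DataFrame it came from; an inner merge is included for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Amit Sharma', 'Bhairav Pandey', 'Chirag Bharadwaj', 'Divyansh Chaturvedi', 'Esha Dubey'],
    'Age': [20, 20, 23, 19, 18],
})
df2 = pd.DataFrame({
    'Name': ['Jatin Prajapati', 'Rahul Shakya', 'Gaurav Dixit', 'Pooja Sharma', 'Mukesh Jha'],
    'Age': [21, 20, 21, 19, 23],
})

# Outer merge on Age; indicator=True adds a '_merge' column
# whose values are 'both', 'left_only' or 'right_only'
df_merged = df1.merge(df2, on='Age', how='outer', indicator=True)
print(df_merged)
print(df_merged['_merge'].value_counts())

# The default how='inner' keeps only ages present in both DataFrames
inner = df1.merge(df2, on='Age')
print(inner)
```

Age 20 appears twice in df1 and once in df2, so it produces two matched rows; Age 18 exists only in df1 and Age 21 only in df2, which is why those rows carry NaN in Name_y or Name_x respectively.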

Copyright © 2024 www.includehelp.com. All rights reserved.