Pandas DataFrame in Python (With Examples)
Python | Pandas DataFrame: In this tutorial, we are going to learn about the Pandas DataFrame, with its syntax and examples of creating a DataFrame, indexing, selection, etc.
By Sapna Deraje Radhakrishna, on December 24, 2019
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
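A minimal sketch of the last two sentences above, using small made-up Series (the labels 'a' to 'c' and the column names 'one' and 'two' are arbitrary): a dict of Series becomes a DataFrame whose columns are the dict keys, and adding two DataFrames matches values by label rather than by position.

```python
import pandas as pd

# Each dict key becomes a column label; the row index is the union of the Series indexes
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0], index=['b', 'c'])
df = pd.DataFrame({'one': s1, 'two': s2})
print(df)  # row 'a' has no value in 'two', so it is filled with NaN

# Arithmetic aligns on both row and column labels;
# a label missing from either operand gives NaN in the result
other = pd.DataFrame({'one': [100.0, 200.0]}, index=['b', 'c'])
result = df + other
print(result)
```

Because alignment is label-based, row 'b' of df is added to row 'b' of other even though they sit at different positions.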
Syntax to Create a DataFrame
The DataFrame constructor has the following signature:
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
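As a rough illustration of the main constructor parameters, the sketch below passes data, index, columns and dtype by keyword. The labels and values are made up for the example; a NumPy array or a dict could be passed as data instead of a nested list.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # 2-D data: one inner list per row
    index=['r1', 'r2'],      # row labels
    columns=['a', 'b'],      # column labels
    dtype=float,             # cast every value to float64
)
print(df)
print(df.dtypes)
```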
Example 1: Create a Pandas DataFrame
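import numpy as np
import pandas as pd
from numpy.random import randn

# fix the seed so the random numbers (and the output below) are reproducible
np.random.seed(101)
# 5x4 array of standard-normal values, rows labelled A-E, columns W-Z
df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
print(df)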
Output:
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
In the above example, each column is a Series, and all the columns share the same row index labels (A to E).
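One way to check this claim is to iterate over the columns with DataFrame.items(), which yields (column label, Series) pairs. The sketch below rebuilds the DataFrame from Example 1 so it runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

for label, column in df.items():
    # every column is a Series carrying the DataFrame's row index
    print(label, type(column).__name__, column.index.equals(df.index))
# W Series True
# X Series True
# Y Series True
# Z Series True
```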
Example 2: Indexing and Selection in a DataFrame
To select a single column, pass its label in square brackets:
print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''
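This shows that a DataFrame is a collection of Series that share common index labels. A column can also be retrieved with attribute (dot) access, similar to table.column in SQL, but this way is less preferred: it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.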
print(df.W)
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
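A small sketch of where dot access breaks down. The column names 'shape' and 'my col' are hypothetical, chosen only because one clashes with a DataFrame attribute and the other is not a valid Python identifier.

```python
import pandas as pd

# Hypothetical columns: 'shape' clashes with a DataFrame attribute,
# 'my col' is not a valid Python identifier
df = pd.DataFrame({'W': [1, 2], 'shape': [3, 4], 'my col': [5, 6]})

print(df.W)          # works: 'W' is a valid identifier with no clash
print(df.shape)      # (2, 3): the DataFrame attribute, not the column
print(df['shape'])   # bracket access always returns the column
print(df['my col'])  # a name with a space needs brackets
```

Bracket access works for every column name, which is why it is the safer default.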
Example 3: Getting Multiple Columns from a DataFrame
print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
print(df[list('WX')])  # list('WX') gives ['W', 'X']
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
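The number of brackets decides the return type: a single label gives a Series, while a list of labels (even a one-element list) gives a DataFrame, with columns in the order listed. A short sketch, rebuilding the Example 1 DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

single = df['W']    # a label -> Series
double = df[['W']]  # a list with one label -> one-column DataFrame
print(type(single).__name__, single.shape)  # Series (5,)
print(type(double).__name__, double.shape)  # DataFrame (5, 1)

# the list order sets the column order of the result
print(df[['X', 'W']].columns.tolist())      # ['X', 'W']
```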
Example 4: Create a New Column in the DataFrame
df['new'] = df['X']+df['Y']
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
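As an alternative to bracket assignment (not used in the original tutorial), DataFrame.assign() adds a column to a copy and leaves the original untouched; assigning a scalar fills every row with that value. A sketch, where the column name 'flag' is purely illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# assign() returns a new DataFrame with the extra column; df is unchanged
df2 = df.assign(new=df['X'] + df['Y'])
print('new' in df.columns, 'new' in df2.columns)  # False True

# assigning a scalar broadcasts it to every row
df2['flag'] = 1
print(df2['flag'].tolist())  # [1, 1, 1, 1, 1]
```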
Example 5: Remove a Column from the DataFrame
# drop() returns a new DataFrame; df itself is not modified
df.drop('W', axis=1)
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''
# inplace=True modifies df directly (drop then returns None)
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''
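drop() also accepts a columns= keyword, which reads more explicitly than axis=1 and takes a list to remove several columns at once. A sketch comparing the two forms on the Example 1 DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# columns= is equivalent to passing the labels with axis=1
a = df.drop(['W', 'X'], axis=1)
b = df.drop(columns=['W', 'X'])
print(a.equals(b))         # True
print(b.columns.tolist())  # ['Y', 'Z']
```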
Example 6: Remove a Row from the DataFrame
df.drop('E', axis=0, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
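The row counterpart is the index= keyword. By default, dropping a label that does not exist raises KeyError; errors='ignore' skips missing labels instead. The sketch uses a small stand-in DataFrame with the same row labels, and 'Q' is a deliberately missing label.

```python
import pandas as pd

# Small stand-in DataFrame with the tutorial's row labels
df = pd.DataFrame({'Y': [1, 2, 3, 4, 5]}, index=['A', 'B', 'C', 'D', 'E'])

# index= is equivalent to passing the labels with axis=0
r = df.drop(index=['D', 'E'])
print(r.index.tolist())  # ['A', 'B', 'C']

# a label that does not exist raises KeyError by default
try:
    df.drop(index='Q')
except KeyError:
    print('KeyError: Q not found')

# errors='ignore' silently skips missing labels
s = df.drop(index=['E', 'Q'], errors='ignore')
print(s.index.tolist())  # ['A', 'B', 'C', 'D']
```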
Example 7: Shape of the DataFrame
To understand why axis=0 refers to rows and axis=1 refers to columns, look at the shape of the DataFrame:
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
print(df.shape)
'''
Output:
(4, 3)
'''
The shape attribute is a tuple. In the above example, element 0 of the tuple (4) is the number of rows and element 1 (3) is the number of columns. This is why axis=0 is used to delete a row and axis=1 to delete a column.
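A small sketch of the same convention on made-up data: dropping along an axis shrinks the matching element of shape, and aggregations such as sum() use the same axis numbers (axis=0 collapses the rows, leaving one result per column).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(df.shape)                     # (2, 3)

# dropping along axis 0 removes a row label, so shape[0] shrinks
print(df.drop('r1', axis=0).shape)  # (1, 3)
# dropping along axis 1 removes a column label, so shape[1] shrinks
print(df.drop('a', axis=1).shape)   # (2, 2)

# the same axis numbers apply to aggregations
print(df.sum(axis=0).tolist())      # [5, 7, 9], one total per column
print(df.sum(axis=1).tolist())      # [6, 15], one total per row
```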
Example 8: Select Rows from a DataFrame
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects a row by its index label
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
# iloc selects a row by its integer position (0-based), so 1 is the second row, 'B'
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
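One practical difference between the two accessors shows up with slices: a label slice with loc includes its end label, while a position slice with iloc excludes its end position, like a Python list slice. A sketch on a small stand-in DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Y': [10, 20, 30, 40]}, index=['A', 'B', 'C', 'D'])

# label slices with loc include the end label
print(df.loc['A':'C'].index.tolist())  # ['A', 'B', 'C']
# position slices with iloc exclude the end position
print(df.iloc[0:2].index.tolist())     # ['A', 'B']
# a negative position counts from the end
print(df.iloc[-1].name)                # D
```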
Example 9: Select Subsets of Rows and Columns from a DataFrame
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# df.loc[row_label, column_label] returns a single value
print(df.loc['C', 'Y'])
'''
Output:
0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
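The same subset can be selected by position with iloc, and .at / .iat are the scalar-only counterparts of .loc / .iloc. The sketch first reproduces the tutorial's DataFrame state (the 'new' column added, columns W and X and row E dropped) so it runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])
# reproduce the tutorial's state: add 'new', drop W, X and row E
df['new'] = df['X'] + df['Y']
df = df.drop(columns=['W', 'X'], index='E')

by_label = df.loc[['B', 'C'], ['Y', 'Z']]
by_position = df.iloc[1:3, 0:2]      # rows 1-2, columns 0-1
print(by_label.equals(by_position))  # True

# scalar access by label and by position
print(df.at['C', 'Y'] == df.iat[2, 0])  # True
```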