Python Pandas – Missing Data
Python Pandas: In this tutorial, we will learn how pandas represents missing data and how to work with it.
Submitted by Sapna Deraje Radhakrishna, on January 09, 2020
When a data point is missing, pandas represents it with a marker value, most commonly NaN (np.nan). A Python None placed in a numeric column is converted to NaN as well.
Let's first define a DataFrame using NumPy and pandas.
import numpy as np
import pandas as pd
d = {'A':[1,2,np.nan],'B':[3,np.nan,np.nan],'C':[4,5,6]}
df = pd.DataFrame(d)
print(df)
Output
     A    B  C
0  1.0  3.0  4
1  2.0  NaN  5
2  NaN  NaN  6
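Before dropping or filling anything, it often helps to see where the gaps are. The following is a minimal sketch, not part of the original example: it rebuilds the same data with one np.nan swapped for None (an assumption made here to show the conversion) and counts the missing values per column with isna().

```python
import numpy as np
import pandas as pd

# Same data as above, with one np.nan replaced by None to show the conversion.
d = {'A': [1, 2, None], 'B': [3, np.nan, np.nan], 'C': [4, 5, 6]}
df = pd.DataFrame(d)

# Boolean mask: True wherever a value is missing.
mask = df.isna()

# Number of missing values in each column.
missing_per_column = mask.sum()
print(missing_per_column)
```

Column A is still stored as floating point, because the None was converted to NaN when the DataFrame was built.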
pandas provides the following options for working with missing data:
Drop NaN values
# drop rows that contain at least one NaN value
print(df.dropna())
'''
     A    B  C
0  1.0  3.0  4
'''
# drop columns that contain at least one NaN value
print(df.dropna(axis=1))
'''
   C
0  4
1  5
2  6
'''
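By default, dropna() removes a row or column as soon as it contains a single NaN. The sketch below, which goes beyond the original example, shows two other dropna() parameters, how and subset, applied to the same data. The variable names kept_all and kept_subset are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, np.nan, np.nan], 'C': [4, 5, 6]})

# how='all' drops a row only when every value in it is missing.
# No row here is entirely NaN, so all three rows remain.
kept_all = df.dropna(how='all')
print(kept_all)

# subset restricts the check to the listed columns.
# Only row 2 has NaN in column A, so rows 0 and 1 remain.
kept_subset = df.dropna(subset=['A'])
print(kept_subset)
```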
Specify a threshold (thresh) to keep only the rows that have at least that many non-NA values.
# Keeps rows 0 and 1, which each have at least 2 non-NaN values.
# Row 2 is dropped because it has only 1 non-NaN value (in column C).
print(df.dropna(thresh=2))
'''
     A    B  C
0  1.0  3.0  4
1  2.0  NaN  5
'''
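To see why thresh=2 keeps rows 0 and 1, it can help to count the non-NA values in each row directly. This is a small sketch on the same data. The names non_na_per_row and manual are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, np.nan, np.nan], 'C': [4, 5, 6]})

# Count the non-NA values in each row.
non_na_per_row = df.notna().sum(axis=1)
print(non_na_per_row)

# thresh=2 keeps rows with at least 2 non-NA values.
kept = df.dropna(thresh=2)

# The same selection written as an explicit boolean filter.
manual = df[non_na_per_row >= 2]
print(kept.equals(manual))
```

Rows 0, 1 and 2 hold 3, 2 and 1 non-NA values respectively, so only row 2 falls below the threshold.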
Fill missing values
print(df.fillna('empty'))
'''
       A      B  C
0      1      3  4
1      2  empty  5
2  empty  empty  6
'''
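Filling with a string such as 'empty' turns the affected columns into mixed-type columns, which may not be what you want for numeric data. As a hedged alternative that is not part of the original example, the sketch below fills with numeric values instead: a constant, the column means, and a per-column dictionary. The fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, np.nan, np.nan], 'C': [4, 5, 6]})

# Fill every missing value with the same constant.
filled_zero = df.fillna(0)

# Fill each column with its own mean (NaN values are skipped when computing it).
filled_mean = df.fillna(df.mean())

# Fill with a different value per column by passing a dict.
filled_dict = df.fillna({'A': 0, 'B': -1})

print(filled_mean)
```

With numeric fill values, columns A and B stay floating point, so later arithmetic on them keeps working.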