MultiIndex / Multi-level / Advanced Indexing DataFrame | Pandas DataFrame
In this tutorial, we are going to learn about MultiIndex (multi-level, or hierarchical) indexing of a pandas DataFrame in Python.
Submitted by Sapna Deraje Radhakrishna, on January 06, 2020
MultiIndex DataFrame
import numpy as np
import pandas as pd
from numpy.random import randn
# create multi index
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
# zip pairs each outer label with an inner label; list() turns the pairs into a list of tuples
hier_index = list(zip(outside, inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
print(hier_index)
# create a dataframe with multiindex
df = pd.DataFrame(randn(6, 2), index=hier_index, columns=['A', 'B'])
print(df)
Output
MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
             A         B
G1 1 -1.253186 -0.118868
   2 -0.179920  0.963057
   3 -0.002503  1.253380
G2 1 -1.764094 -0.008036
   2 -0.370147  0.982670
   3  0.366474  1.063459
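Because randn draws new random numbers on every run, the values printed in this tutorial change from one output to the next. As a minimal sketch, assuming reproducible values are wanted, the same 6x2 DataFrame can be built with a seeded generator. Here numpy.random.default_rng and MultiIndex.from_product are swapped in for randn and from_tuples; the seed value 0 is an arbitrary choice, not something from the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same values come out on every run (seed 0 is arbitrary)
rng = np.random.default_rng(0)

# from_product builds every (outer, inner) pair, which here matches
# the list of tuples produced by zip(outside, inside)
hier_index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

df = pd.DataFrame(rng.standard_normal((6, 2)), index=hier_index, columns=['A', 'B'])
print(df)
```

from_product is only a shortcut when every outer label is paired with every inner label; for irregular combinations, from_tuples remains the more direct choice.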
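Call data from a multi-level DataFrame
Because the DataFrame is filled with random numbers, the values shown in the outputs below differ from run to run.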
print(df.loc['G1'])
'''
Output:
          A         B
1  0.086479 -1.367359
2 -0.044115 -1.299178
3  1.232692 -0.000859
'''
print(df.loc['G1'].loc[1])
'''
Output:
A   -0.318528
B    0.324337
Name: 1, dtype: float64
'''
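The chained call df.loc['G1'].loc[1] performs two separate lookups. As a sketch of an alternative, a tuple key selects the same row in one step. The DataFrame below uses np.arange values instead of random numbers, purely so the result is predictable; that data is an assumption of this example.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])

# Two lookups: first the outer level, then the inner level
chained = df.loc['G1'].loc[1]

# One lookup: a tuple names both levels; the trailing ':' selects all columns
direct = df.loc[('G1', 1), :]

print(direct)
```

Both results hold the same values; the only difference is the Series name, which is 1 for the chained version and ('G1', 1) for the tuple version.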
Name the index
By default, the index levels are unnamed.
print(df)
'''
Output:
             A         B
G1 1 -0.647964  0.441856
   2 -0.972269  0.564140
   3 -0.014831 -1.052456
G2 1 -0.906410 -0.050427
   2  0.093160  2.356613
   3  0.744535  0.737698
'''
print(df.index.names) # returns no names
'''
Output:
[None, None]
'''
To name the index levels, assign a list of names to df.index.names:
df.index.names=['Groups','Num']
print(df)
'''
Output:
                   A         B
Groups Num
G1     1   -0.711317 -0.750429
       2    0.595646 -0.581978
       3   -0.108562  0.473462
G2     1    0.092306 -1.446577
       2   -1.524394  0.951849
       3    1.177215 -0.680476
'''
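Assigning to df.index.names is one way to name the levels. As a minimal sketch of two alternatives, the names can be passed when the MultiIndex is created, or changed later with rename_axis, which returns a new DataFrame rather than modifying the original. The replacement names 'Group' and 'Number' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Name the levels at construction time with the names argument
index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])
print(list(df.index.names))

# rename_axis returns a copy with new level names; df itself is unchanged
renamed = df.rename_axis(index=['Group', 'Number'])
print(list(renamed.index.names))
```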
Grabbing information from multi-level dataframe
print(df)
'''
Output:
                   A         B
Groups Num
G1     1   -0.711317 -0.750429
       2    0.595646 -0.581978
       3   -0.108562  0.473462
G2     1    0.092306 -1.446577
       2   -1.524394  0.951849
       3    1.177215 -0.680476
'''
print(df.loc['G2'].loc[2]['B'])
# Output: the single value at Groups 'G2', Num 2, column 'B'
# (0.951849, shown rounded, for the DataFrame printed above)
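The three chained lookups above work for reading a value, but a single .loc call with a row tuple and a column label does the same job and can also be used for assignment. The sketch below assumes a DataFrame filled with np.arange values (not the random data from the tutorial) so the result is predictable.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12.0).reshape(6, 2), index=index, columns=['A', 'B'])

# Chained: outer level, then inner level, then column
chained_value = df.loc['G2'].loc[2]['B']

# Single call: (outer, inner) tuple for the row, label for the column
direct_value = df.loc[('G2', 2), 'B']
print(chained_value == direct_value)

# The single-call form can also set the value in place
df.loc[('G2', 2), 'B'] = 0.0
```

Assigning through a chained expression may end up writing to a temporary copy, which is why the single .loc call is generally preferred when setting values.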
Cross section function
The xs() method returns a cross section of rows or columns from a Series or DataFrame and is most useful with a multi-level index. Unlike plain .loc lookups, it can select on an inner level directly, skipping the outer levels.
Syntax:
DataFrame.xs(key, axis=0, level=None, drop_level=True)
print(df.xs('G1'))
'''
Output:
            A         B
Num
1    0.083378 -1.039373
2   -0.996403 -0.431392
3    1.403288 -1.020174
'''
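In the output above the Groups level has disappeared, because drop_level defaults to True. As a small sketch, passing drop_level=False keeps the selected level in the result. The np.arange data is an assumption made here for predictable values.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])

# Default: the matched level (Groups) is dropped, leaving only Num
dropped = df.xs('G1')
print(dropped.index.names)

# drop_level=False: the result keeps both Groups and Num
kept = df.xs('G1', drop_level=False)
print(kept.index.names)
```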
Suppose we want all rows where the Num level equals 1, regardless of group. This is awkward with .loc, but with .xs() it only requires the level argument, as in the example below:
print(df)
'''
Output:
                   A         B
Groups Num
G1     1   -0.423602 -0.171019
       2   -0.373488  0.666210
       3    2.019311 -0.621289
G2     1    0.844339 -1.068294
       2   -0.778810 -0.885449
       3    1.972061 -0.344479
'''
print(df.xs(1, level='Num'))
'''
Output:
               A         B
Groups
G1     -0.423602 -0.171019
G2      0.844339 -1.068294
'''
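For comparison, the same selection can be written with .loc by slicing over the outer level with pd.IndexSlice. This is a sketch under the assumption that the index is sorted, as it is here; the np.arange data is again chosen only to make the values predictable.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])

# xs selects Num == 1 and drops the Num level from the result
via_xs = df.xs(1, level='Num')

# .loc needs an explicit "all groups" slice for the outer level;
# the result keeps both index levels
idx = pd.IndexSlice
via_loc = df.loc[idx[:, 1], :]

print(via_xs)
print(via_loc)
```

The two results contain the same rows; dropping the Num level from the .loc result with droplevel('Num') makes them identical.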