MultiIndex / Multi-level / Advanced Indexing DataFrame | Pandas DataFrame

Here, we are going to learn about MultiIndex (multi-level, or advanced) indexing of a Pandas DataFrame in Python.
Submitted by Sapna Deraje Radhakrishna, on January 06, 2020

MultiIndex DataFrame

import numpy as np
import pandas as pd
from numpy.random import randn

# create multi index
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]

# returns list of tuples
hier_index = list(zip(outside, inside)) 
hier_index = pd.MultiIndex.from_tuples(hier_index)

print(hier_index)

# create a dataframe with multiindex
df = pd.DataFrame(randn(6,2), index=hier_index, columns=['A','B'])
print(df)

Output

MultiIndex([('G1', 1),
            ('G1', 2),
            ('G1', 3),
            ('G2', 1),
            ('G2', 2),
            ('G2', 3)],
           )
             A         B
G1 1 -1.253186 -0.118868
   2 -0.179920  0.963057
   3 -0.002503  1.253380
G2 1 -1.764094 -0.008036
   2 -0.370147  0.982670
   3  0.366474  1.063459
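
Because randn draws fresh random numbers on every run, the values printed in the outputs on this page change from one snippet to the next. The following is a minimal sketch of a reproducible variant. It assumes a seeded generator from np.random.default_rng (the seed 0 is an arbitrary choice, not part of the original example) and uses pd.MultiIndex.from_product as a shorter way to build the same index.

```python
import numpy as np
import pandas as pd

# from_product builds every (outer, inner) combination in order,
# which gives the same six tuples as the zip-based list above
hier_index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

# a seeded generator makes the random values repeatable (seed 0 is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 2)), index=hier_index, columns=['A', 'B'])
print(df)
```

from_product is convenient when every outer label is paired with every inner label; from_tuples remains the more general choice when the combinations are irregular.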

Select data from a multi-level DataFrame

print(df.loc['G1'])

'''
Output:
          A         B
1  0.086479 -1.367359
2 -0.044115 -1.299178
3  1.232692 -0.000859
'''
print(df.loc['G1'].loc[1])

'''
Output:
A   -0.318528
B    0.324337
Name: 1, dtype: float64
'''
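
The chained call above performs two separate lookups. As a sketch of an alternative, .loc also accepts a tuple that addresses both levels in one step. The variable names chained and direct are only illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]
)
df = pd.DataFrame(np.random.randn(6, 2), index=index, columns=['A', 'B'])

# chained lookup: first the outer level, then the inner level
chained = df.loc['G1'].loc[1]

# single lookup: a tuple addresses both levels at once
direct = df.loc[('G1', 1)]

# both hold the same A and B values for row ('G1', 1)
print(np.allclose(chained.to_numpy(), direct.to_numpy()))
```

The tuple form is usually preferable when assigning values, since chained indexing may operate on a temporary copy.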

Name the index

By default, the index levels are unnamed.

print(df)
'''
Output:
             A         B
G1 1 -0.647964  0.441856
   2 -0.972269  0.564140
   3 -0.014831 -1.052456
G2 1 -0.906410 -0.050427
   2  0.093160  2.356613
   3  0.744535  0.737698
'''

print(df.index.names) # returns no names
'''
Output:
[None, None]
'''

To name the index levels, assign a list of names to df.index.names:

df.index.names=['Groups','Num']
print(df)

'''
Output:
                   A         B
Groups Num
G1     1   -0.711317 -0.750429
       2    0.595646 -0.581978
       3   -0.108562  0.473462
G2     1    0.092306 -1.446577
       2   -1.524394  0.951849
       3    1.177215 -0.680476
'''
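
The level names can also be supplied when the index is created, or changed later without modifying the original frame. A brief sketch, assuming the same six index tuples as before; the alternative names 'Group' and 'Number' are made up for illustration.

```python
import numpy as np
import pandas as pd

tuples = [('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]

# names= labels the levels at construction time
named_index = pd.MultiIndex.from_tuples(tuples, names=['Groups', 'Num'])
df = pd.DataFrame(np.random.randn(6, 2), index=named_index, columns=['A', 'B'])
print(df.index.names)

# rename_axis returns a new DataFrame with relabelled levels;
# df itself keeps the names 'Groups' and 'Num'
renamed = df.rename_axis(index=['Group', 'Number'])
print(renamed.index.names)
```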

Grabbing information from a multi-level DataFrame

print(df)

'''
Output:
                   A         B
Groups Num
G1     1   -0.711317 -0.750429
       2    0.595646 -0.581978
       3   -0.108562  0.473462
G2     1    0.092306 -1.446577
       2   -1.524394  0.951849
       3    1.177215 -0.680476
'''

print(df.loc['G2'].loc[2]['B'])
# Output: -0.5843171213312147
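
A single value can also be fetched in one call rather than three chained ones. The sketch below uses placeholder integer data (0 to 11) instead of random numbers so that the result is predictable; that substitution is an assumption made only for this illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
# placeholder values 0..11 so the result is predictable
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['A', 'B'])

chained = df.loc['G2'].loc[2]['B']   # three lookups in a row
direct = df.loc[('G2', 2), 'B']      # row tuple and column label in one call
scalar = df.at[('G2', 2), 'B']       # .at is intended for single-value access

print(chained, direct, scalar)
```

All three expressions return the value stored in column B of row ('G2', 2).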

Cross section function

The xs() method returns a cross section of rows or columns from a Series or DataFrame and is mainly useful with a multi-level index. It can select on an inner level directly, skipping the outer levels of the index.

Syntax:

    DataFrame.xs(key, axis=0, level=None, drop_level=True)

print(df.xs('G1'))

'''
Output:
            A         B
Num
1    0.083378 -1.039373
2   -0.996403 -0.431392
3    1.403288 -1.020174
'''
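
By default, xs() drops the level that was used for the selection, which is why only Num remains in the output above. A small sketch of the drop_level parameter from the syntax line, with the level passed explicitly; the variable names dropped and kept are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.random.randn(6, 2), index=index, columns=['A', 'B'])

# default: the selected level 'Groups' is removed from the result
dropped = df.xs('G1', level='Groups')
print(dropped.index.names)

# drop_level=False keeps both levels in the result
kept = df.xs('G1', level='Groups', drop_level=False)
print(kept.index.names)
```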

Suppose we want all rows whose Num index equals 1, regardless of group. This is awkward with .loc because Num is the inner level, but .xs() handles it directly through the level parameter, as in the example below:

print(df)
'''
Output:
                   A         B
Groups Num
G1     1   -0.423602 -0.171019
       2   -0.373488  0.666210
       3    2.019311 -0.621289
G2     1    0.844339 -1.068294
       2   -0.778810 -0.885449
       3    1.972061 -0.344479
'''

print(df.xs(1, level='Num'))
'''
Output:
               A         B
Groups
G1     -0.423602 -0.171019
G2      0.844339 -1.068294
'''
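
For comparison, the same rows can be reached with .loc by slicing over the outer level. This is a sketch that assumes the index is sorted, as it is in this example; pd.IndexSlice is a pandas helper that makes the slice easier to read.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Groups', 'Num'])
df = pd.DataFrame(np.random.randn(6, 2), index=index, columns=['A', 'B'])

# slice(None) means "every label" on the Groups level; 1 selects on Num
via_slice = df.loc[(slice(None), 1), :]

# the same selection written with pd.IndexSlice
idx = pd.IndexSlice
via_indexslice = df.loc[idx[:, 1], :]

# unlike xs(), .loc keeps both index levels in the result
print(via_slice.index.tolist())
print(df.xs(1, level='Num').index.tolist())
```

The .loc form retains the Num level, while xs() drops it unless drop_level=False is passed.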

