Python for data analysis – Pandas
Python | Data analysis using Pandas: In this tutorial, we are going to learn about data analysis using Pandas, an open-source library built on top of NumPy.
By Sapna Deraje Radhakrishna Last updated : December 21, 2023
Pandas Overview
- Pandas is an open-source library built on top of NumPy
- It allows for fast analysis and data cleaning and preparation
- It excels in performance and productivity
- It also has built-in visualization features
- It can work with data from a wide variety of sources
How to install Pandas?
You can install the pandas library with pip by running the pip install pandas command.
Below is an example of running this command (here pandas was already installed in the virtual environment):
(venv) -bash-4.2$ pip install pandas
Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
(venv) -bash-4.2$
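A quick way to confirm the installation is to import the library and print its version. This is only a minimal check; the version printed depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string varies by environment
print(pd.__version__)
```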
Pandas Series
A Series is a one-dimensional labeled array, built on top of the NumPy ndarray, that can hold data of any type, including time series. The axis labels are collectively known as the index. A Series is very similar to a NumPy array; the difference is that a Series can also be indexed by labels.
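The difference between positional and label-based access can be sketched as follows. The values and the labels 'x', 'y', 'z' are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])                 # plain NumPy array: positions only
ser = pd.Series(arr, index=['x', 'y', 'z'])  # same values, plus axis labels

print(arr[1])           # access by position -> 20
print(ser['y'])         # access by label    -> 20
print(list(ser.index))  # the axis labels    -> ['x', 'y', 'z']
print(ser.values)       # the underlying NumPy array -> [10 20 30]
```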
Syntax
Below is the signature of the pandas.Series class constructor:
class pandas.Series(
data=None,
index=None,
dtype=None,
name=None,
copy=False,
fastpath=False
)
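As a rough illustration of the most commonly used parameters, the sketch below passes data, index, dtype, and name explicitly. The sample values and the name 'temperature' are hypothetical, and the internal fastpath parameter is left out.

```python
import pandas as pd

temps = pd.Series(
    data=[21, 23, 19],            # the values
    index=['Mon', 'Tue', 'Wed'],  # the axis labels
    dtype='float64',              # force a float dtype instead of the inferred integer one
    name='temperature',           # a name attached to the Series
)

print(temps)
print(temps.name)   # -> temperature
print(temps.dtype)  # -> float64
```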
Creating Pandas Series
A pandas Series is created by calling the pandas.Series() constructor.
Example
The snippet below shows several ways of creating a Series:
import numpy as np
import pandas as pd
labels = ['a','e','i','o'] #python list
data = [1,2,3,4] #python list
arr = np.array(data) #NumPy array
d = {'a':1,'b':2,'c':3} #python dict
# creating a series object with default index
print(pd.Series(data = data))
# creating a series object with labels as index
print(pd.Series(data = data, index = labels))
# creating a series with NumPy array
print(pd.Series(arr,index = labels))
# creating a series with dictionary,
# here the key becomes the index
print(pd.Series(d))
# a Series can also hold built-in functions
print(pd.Series(data = [sum, print, len]))
Output
0 1
1 2
2 3
3 4
dtype: int64
a 1
e 2
i 3
o 4
dtype: int64
a 1
e 2
i 3
o 4
dtype: int64
a 1
b 2
c 3
dtype: int64
0 <built-in function sum>
1 <built-in function print>
2 <built-in function len>
dtype: object
Operations on Pandas Series
1. Create two Series objects
import pandas as pd
ser1 = pd.Series([1,2,3,4],['Delhi','Bangalore','Mysore', 'Pune'])
print(ser1)
ser2 = pd.Series([1,2,5,4],['Delhi','Bangalore','Vizag','Pune'])
print(ser2)
Output
Delhi 1
Bangalore 2
Mysore 3
Pune 4
dtype: int64
Delhi 1
Bangalore 2
Vizag 5
Pune 4
dtype: int64
2. Retrieve the information from the series
Retrieving information from a Series is similar to retrieving it from a Python dictionary: pass the index label of the value you want. In the above example, the index labels are strings.
print(ser1['Delhi'])
# Output: 1
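Beyond a single bracket lookup, a few other label-based accessors behave in a dictionary-like way. The sketch below reuses the ser1 data from above; 'Vizag' is deliberately a label that ser1 does not contain.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])

print(ser1.loc['Delhi'])        # explicit label-based lookup -> 1
print(ser1[['Delhi', 'Pune']])  # a list of labels returns a smaller Series
print('Vizag' in ser1)          # membership test checks the index labels -> False
print(ser1.get('Vizag', 0))     # like dict.get: default when the label is missing -> 0
```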
3. Adding Two Pandas Series
Now let's try adding the two Series:
print(ser1+ser2)
'''
Output:
Bangalore 4.0
Delhi 2.0
Mysore NaN
Pune 8.0
Vizag NaN
dtype: float64
'''
pandas adds the values that share an index label. Where a label appears in only one of the Series, the result is NaN (a null value). Because NaN is a floating-point value, the integer results are converted to float, as the float64 dtype in the output shows.
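If a missing label should count as zero rather than produce NaN, the Series.add method accepts a fill_value argument. The sketch below reuses the two Series from above and also shows that adding Series whose labels all match keeps an integer dtype, since no NaN has to be stored.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Treat a label missing from one side as 0 instead of producing NaN
total = ser1.add(ser2, fill_value=0)
print(total)  # Mysore keeps 3 and Vizag keeps 5

# When every label matches, no NaN is introduced and the dtype stays integer
same = ser1 + ser1
print(same.dtype)
```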