Python Pandas – Data Input and Output

In this tutorial, we are going to learn about the Data Input and Output in Python Pandas – Read a CSV file, write to CSV file, reading from HTML file, etc.
Submitted by Sapna Deraje Radhakrishna, on February 03, 2020

Pandas as a library can read and write data to a wide variety of sources. In this article, we would concentrate on the following,

CSV
Excel
HTML
SQL

In order to work with HTML files and SQL database, along with pandas, we would need to install the below library as well,

pip install sqlalchemy
pip install lxml
pip install html5lib
pip install beautifulsoup4
pip install openpyxl

Note: In the below example, the reference files are located in the same folder as the python files are.

Read a CSV file

import numpy as np
import pandas as pd

# example is the csv file name
df = pd.read_csv('example')
print(df)

'''
Output:
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
'''

Pandas has built in function to read from numerous type of file format, such as csv , clipboard, html, json etc.

Write to CSV file

In the above example, we have the csv content assigned to a dataFrame variable called df.

# here index = False coz I don't want 
# to save the index as coulmn
df.to_csv('test_csv', index=False) 
pd.read_csv('test_csv')

'''
Output:
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
'''

Reading from Excel file

While the pandas can read the data from the Excel file, it is important to note that 'pandas' can not read or import the formulas or images or macros. Trying to read them can cause pandas to crash.

Also, there can be an error such as requesting us to install the xlrd library in case it is not available. [pip install xlrd].

pd.read_excel('Excel_Sample.xlsx')

'''
Output:
   Unnamed: 	0   	a   b   c   d
0           	0   	0   1   2   3
1           	1   	4   5   6   7
2           	2   	8   9  10  11
3           	3  	12  13  14  15
'''

df.to_excel('Excel_Sample_2.xlsx',sheet_name='Sheet1')

HTML input

Pandas read_html() function will read tables off of a webpage and return a list of DataFrame objects,

df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
print(df[0] )


    Bank Name City ST CERT Acquiring Institution Closing Date Updated Date Loss Share Type Agreement Terminated Termination Date 
    0 First CornerStone Bank King of Prussia PA 35312 First-Citizens Bank & Trust Company May 6, 2016 July 12, 2016 none NaN NaN
    1 Trust Company Bank Memphis TN 9956 The Bank of Fayette County April 29, 2016 August 4, 2016 none NaN NaN
    2 North Milwaukee State Bank Milwaukee WI 20364 First-Citizens Bank & Trust Company March 11, 2016 June 16, 2016 none NaN NaN
    3 Hometown National Bank Longview WA 35156 Twin City Bank October 2, 2015 April 13, 2016 none NaN NaN
    4 The Bank of Georgia Peachtree City GA 35259 Fidelity Bank October 2, 2015 April 13, 2016 none NaN NaN
    5 Premier Bank Denver CO 34112 United Fidelity Bank, fsb July 10, 2015 July 12, 2016 none NaN NaN
    6 Edgebrook Bank Chicago IL 57772 Republic Bank of Chicago May 8, 2015 July 12, 2016 none NaN NaN
    7 Doral BankEn Espanol San Juan PR 32102 Banco Popular de Puerto Rico February 27, 2015 May 13, 2015 none NaN NaN
    8 Capitol City Bank & Trust Company Atlanta GA 33938 First-Citizens Bank & Trust Company February 13, 2015 April 21, 2015 none NaN NaN
    9 Highland Community Bank Chicago IL 20290 United Fidelity Bank, fsb January 23, 2015 April 21, 2015 none NaN NaN

SQL

The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. Also, a driver library is required for the database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite, this is included in Python's standard library by default.

If SQLAlchemy is not installed, a fallback is only provided for SQLite (and for MySQL for backward compatibility, but this is deprecated and will be removed in a future version). This mode requires a Python database adapter that respects the Python DB-API.

See also some cookbook examples for some advanced strategies.

The key functions are,

read_sql_table(table_name, con[, schema, ...]) : Read SQL database table into a DataFrame.
read_sql_query(sql, con[, index_col, ...]) : Read SQL query into a DataFrame.
read_sql(sql, con[, index_col, ...]) : Read SQL query or database table into a DataFrame.
DataFrame.to_sql(name, con[, flavor, ...]) : Write records stored in a DataFrame to a SQL database.

from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('data', engine)
sql_df = pd.read_sql('data',con=engine)
print(sql_df)

'''
Output:
   index         0   a   b   c   d
0      0           0   0   1   2   3
1      1           1   4   5   6   7
2      2           2   8   9  10  11
3      3           3  12  13  14  15
'''

Comments and Discussions!

Load comments ↻

Advertisement
Advertisement
Advertisement

Top MCQs

Top Programs/Examples

About

Student's Section

Tech Learning @ Home