Home »
Python
Python Pandas – Data Input and Output
In this tutorial, we are going to learn about the Data Input and Output in Python Pandas – Read a CSV file, write to CSV file, reading from HTML file, etc.
Submitted by Sapna Deraje Radhakrishna, on February 03, 2020
Pandas as a library can read and write data to a wide variety of sources. In this article, we would concentrate on the following,
In order to work with HTML files and SQL database, along with pandas, we would need to install the below library as well,
- pip install sqlalchemy
- pip install lxml
- pip install html5lib
- pip install beautifulsoup4
- pip install openpyxl
Note: In the below example, the reference files are located in the same folder as the python files are.
Read a CSV file
import numpy as np
import pandas as pd
# example is the csv file name
df = pd.read_csv('example')
print(df)
'''
Output:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
'''
Pandas has built in function to read from numerous type of file format, such as csv , clipboard, html, json etc.
Write to CSV file
In the above example, we have the csv content assigned to a dataFrame variable called df.
# here index = False coz I don't want
# to save the index as coulmn
df.to_csv('test_csv', index=False)
pd.read_csv('test_csv')
'''
Output:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
'''
Reading from Excel file
While the pandas can read the data from the Excel file, it is important to note that 'pandas' can not read or import the formulas or images or macros. Trying to read them can cause pandas to crash.
Also, there can be an error such as requesting us to install the xlrd library in case it is not available. [pip install xlrd].
pd.read_excel('Excel_Sample.xlsx')
'''
Output:
Unnamed: 0 a b c d
0 0 0 1 2 3
1 1 4 5 6 7
2 2 8 9 10 11
3 3 12 13 14 15
'''
df.to_excel('Excel_Sample_2.xlsx',sheet_name='Sheet1')
HTML input
Pandas read_html() function will read tables off of a webpage and return a list of DataFrame objects,
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
print(df[0] )
Bank Name | City | ST | CERT | Acquiring Institution | Closing Date | Updated Date | Loss Share Type | Agreement Terminated | Termination Date | |
0 | First CornerStone Bank | King of Prussia | PA | 35312 | First-Citizens Bank & Trust Company | May 6, 2016 | July 12, 2016 | none | NaN | NaN |
1 | Trust Company Bank | Memphis | TN | 9956 | The Bank of Fayette County | April 29, 2016 | August 4, 2016 | none | NaN | NaN |
2 | North Milwaukee State Bank | Milwaukee | WI | 20364 | First-Citizens Bank & Trust Company | March 11, 2016 | June 16, 2016 | none | NaN | NaN |
3 | Hometown National Bank | Longview | WA | 35156 | Twin City Bank | October 2, 2015 | April 13, 2016 | none | NaN | NaN |
4 | The Bank of Georgia | Peachtree City | GA | 35259 | Fidelity Bank | October 2, 2015 | April 13, 2016 | none | NaN | NaN |
5 | Premier Bank | Denver | CO | 34112 | United Fidelity Bank, fsb | July 10, 2015 | July 12, 2016 | none | NaN | NaN |
6 | Edgebrook Bank | Chicago | IL | 57772 | Republic Bank of Chicago | May 8, 2015 | July 12, 2016 | none | NaN | NaN |
7 | Doral BankEn Espanol | San Juan | PR | 32102 | Banco Popular de Puerto Rico | February 27, 2015 | May 13, 2015 | none | NaN | NaN |
8 | Capitol City Bank & Trust Company | Atlanta | GA | 33938 | First-Citizens Bank & Trust Company | February 13, 2015 | April 21, 2015 | none | NaN | NaN |
9 | Highland Community Bank | Chicago | IL | 20290 | United Fidelity Bank, fsb | January 23, 2015 | April 21, 2015 | none | NaN | NaN |
SQL
The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. Also, a driver library is required for the database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite, this is included in Python's standard library by default.
If SQLAlchemy is not installed, a fallback is only provided for SQLite (and for MySQL for backward compatibility, but this is deprecated and will be removed in a future version). This mode requires a Python database adapter that respects the Python DB-API.
See also some cookbook examples for some advanced strategies.
The key functions are,
- read_sql_table(table_name, con[, schema, ...]) : Read SQL database table into a DataFrame.
- read_sql_query(sql, con[, index_col, ...]) : Read SQL query into a DataFrame.
- read_sql(sql, con[, index_col, ...]) : Read SQL query or database table into a DataFrame.
- DataFrame.to_sql(name, con[, flavor, ...]) : Write records stored in a DataFrame to a SQL database.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('data', engine)
sql_df = pd.read_sql('data',con=engine)
print(sql_df)
'''
Output:
index 0 a b c d
0 0 0 0 1 2 3
1 1 1 4 5 6 7
2 2 2 8 9 10 11
3 3 3 12 13 14 15
'''