Data Frames in the R Programming Language
In this tutorial, we learn about data frames in the R language: their characteristics, how to create a data frame, how to extract data from it, and how to modify it.
Submitted by Bhavya Sri Khandrika, on December 07, 2020
Data Frames
A data frame is a two-dimensional, table-like structure. Each column holds the values of one variable, and each row holds one set of values, one for every column. Internally, R treats a data frame as a special kind of list in which every component is a vector of the same length. Unlike a matrix, whose elements must all share a single type, the columns of a data frame may differ in type. Data frames are widely used in R to store tabular data.
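The list nature of a data frame can be checked directly. The snippet below is a minimal sketch; the data frame df and its columns are made up for illustration and are not part of the examples in this tutorial.

```r
# A small illustrative data frame with two columns of equal length.
df <- data.frame(
  id    = 1:3,
  score = c(75.7, 90.03, 67),
  stringsAsFactors = FALSE
)

is.list(df)        # a data frame is a list under the hood
is.data.frame(df)  # and it also carries the data.frame class
lengths(df)        # length of each component (column)
dim(df)            # number of rows and number of columns
```

Every component reported by lengths() has the same length, which is what allows the list to be printed as a table.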
Characteristics of Data Frames
Data frames have the following characteristics:
- The column names should not be empty.
- The row names must be unique.
- The data stored in a column can be numeric, factor, or character type.
- Every column must contain the same number of data items.
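Two of these rules can be observed by deliberately breaking them. The following is a small sketch with made-up column names; tryCatch() is used only so that the script keeps running after the expected errors.

```r
# Columns whose lengths do not fit together are rejected by data.frame().
length_check <- tryCatch(
  {
    data.frame(a = 1:3, b = c("x", "y"))
    "created"
  },
  error = function(e) "error"
)
print(length_check)

# Duplicate row names are rejected as well.
rowname_check <- tryCatch(
  {
    data.frame(a = 1:2, row.names = c("r1", "r1"))
    "created"
  },
  error = function(e) "error"
)
print(rowname_check)

# Note: a shorter vector is recycled when its length divides the longest one.
recycled <- data.frame(a = 1:4, b = c("x", "y"))
nrow(recycled)
```

The last call shows that data.frame() recycles a shorter vector when it fits a whole number of times, so the equal-length rule applies to the finished data frame rather than strictly to the inputs.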
How to create a Data frame?
The data.frame() function is used to create a data frame in R. It takes one vector per column, passed as name = value arguments, and the columns can hold different types of data, such as numeric values, integers, or characters.
Example 1:
# Data frame for student data
student.data <- data.frame(
roll_no. = c(1011:1017),
student_name = c("Shiva", "Arpita", "Rishitha", "Gunjan", "Suman", "Ramya", "Divya"),
percentage = c(75.7, 90.03, 67, 54.98, 87.2, 89.99, 92.04),
stringsAsFactors = FALSE
)
# Printing the data frame.
print(student.data)
Output:
roll_no. student_name percentage
1 1011 Shiva 75.70
2 1012 Arpita 90.03
3 1013 Rishitha 67.00
4 1014 Gunjan 54.98
5 1015 Suman 87.20
6 1016 Ramya 89.99
7 1017 Divya 92.04
Example 2:
Here, we take data from a survey of the animals in a zoo. Our task is to create a data frame whose columns hold the ID of each animal, its name, its age, and the date it entered the zoo. For this, we combine a character vector with integer, numeric, and date vectors.
# Creating the data frame for the following
# data using the data.frame() function.
ani.data<- data.frame(
animal_id = c (1:5),
animal_name = c("zebra","elephant","giraffe","tiger","ostrich"),
age = c(5,4,6,8,7),
entered_date = as.Date(c("2007-05-03", "2010-08-02", "2008-11-25", "2014-03-07", "2006-02-16")),
stringsAsFactors = FALSE )
# Printing the data frame considered above.
print(ani.data)
Output:
animal_id animal_name age entered_date
1 1 zebra 5 2007-05-03
2 2 elephant 4 2010-08-02
3 3 giraffe 6 2008-11-25
4 4 tiger 8 2014-03-07
5 5 ostrich 7 2006-02-16
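To confirm which type each column received, the structure of the data frame can be inspected. This is a sketch that rebuilds the zoo data frame from Example 2; the variable col_classes is a name introduced here for illustration.

```r
# Rebuild the zoo data frame from Example 2.
ani.data <- data.frame(
  animal_id = 1:5,
  animal_name = c("zebra", "elephant", "giraffe", "tiger", "ostrich"),
  age = c(5, 4, 6, 8, 7),
  entered_date = as.Date(c("2007-05-03", "2010-08-02", "2008-11-25",
                           "2014-03-07", "2006-02-16")),
  stringsAsFactors = FALSE
)

# str() prints a compact description of every column.
str(ani.data)

# sapply() with class() collects the class of each column by name.
col_classes <- sapply(ani.data, class)
print(col_classes)
```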
Extracting the Data From the Data Frame
Data is central to any R program, so it is important to be able to pull exactly the values you need out of a data frame. Extraction is usually the first step before manipulating the data.
There are three ways in which the data can be extracted. They are:
- Extracting columns by column name.
- Extracting rows by row index or row name.
- Extracting particular rows together with particular columns.
The following examples illustrate how to extract data from data frames in R.
Extracting particular columns from a data frame
# Creating the data frame using the data.frame() function in R.
emp.data<- data.frame(
employee_id = c (1:7),
employee_name = c("Shiva","Arpita","Rishitha","Gunjan","Suman","Ramya","Divya"),
sal = c(683.6,817.2,671.9,925.6,783.65,782.67,927.54),
starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25", "2015-02-27", "2013-02-19", "2012-05-12" )), stringsAsFactors = FALSE )
# Extracting the particular columns from the data frame considered
final <- data.frame(emp.data$employee_id,emp.data$sal)
print(final)
Output:
emp.data.employee_id emp.data.sal
1 1 683.60
2 2 817.20
3 3 671.90
4 4 925.60
5 5 783.65
6 6 782.67
7 7 927.54
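Columns can also be selected with bracket indexing, which keeps the original column names instead of prefixing them with the name of the data frame. The sketch below uses a shortened three-row version of the employee data; by_name and one_col are names chosen for this illustration.

```r
# A shortened version of the employee data frame.
emp.data <- data.frame(
  employee_id = 1:3,
  employee_name = c("Shiva", "Arpita", "Rishitha"),
  sal = c(683.6, 817.2, 671.9),
  stringsAsFactors = FALSE
)

# Select columns by name; the result is a data frame with the names kept.
by_name <- emp.data[, c("employee_id", "sal")]
print(by_name)

# Double brackets (or $) return a single column as a plain vector.
one_col <- emp.data[["sal"]]
print(one_col)
```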
Extracting rows and columns from a data frame as required
# Creating the data frame using the data.frame() function in R
emp.data<- data.frame(
employee_id = c (1:7),
employee_name = c("Shiva","Arpita","Rishitha","Gunjan","Suman","Ramya","Divya"),
sal = c(683.6,817.2,671.9,925.6,783.65,782.67,927.54),
starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25",
"2015-02-27","2013-02-19","2012-05-12")),
stringsAsFactors = FALSE
)
# Extracting the third row from the considered data frame
final <- emp.data[3,]
print(final)
# Extracting rows 2 to 5 from the data frame
final <- emp.data[2:5,]
print(final)
# Extracting the 2nd and 5th rows of the 3rd and 4th columns
final <- emp.data[c(2,5),c(3,4)]
print(final)
print(final)
Output:
employee_id employee_name sal starting_date
3 3 Rishitha 671.9 2014-06-11
employee_id employee_name sal starting_date
2 2 Arpita 817.20 2013-08-20
3 3 Rishitha 671.90 2014-06-11
4 4 Gunjan 925.60 2014-09-25
5 5 Suman 783.65 2015-02-27
sal starting_date
2 817.20 2013-08-20
5 783.65 2015-02-27
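Rows can be selected by row name or by a logical condition as well as by position. The following sketch assumes hypothetical row labels e1 to e4, which are assigned here only for the illustration, and an arbitrary salary threshold of 800.

```r
# A shortened version of the employee data frame.
emp.data <- data.frame(
  employee_id = 1:4,
  employee_name = c("Shiva", "Arpita", "Rishitha", "Gunjan"),
  sal = c(683.6, 817.2, 671.9, 925.6),
  stringsAsFactors = FALSE
)

# Assign row names (hypothetical labels) and select a row by its name.
rownames(emp.data) <- c("e1", "e2", "e3", "e4")
by_label <- emp.data["e3", ]
print(by_label)

# Select the rows that satisfy a condition, keeping two columns.
high_paid <- emp.data[emp.data$sal > 800, c("employee_name", "sal")]
print(high_paid)
```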
Modifications on the Data Frames
R allows programmers to modify data frames. As with matrices, the data items in a data frame can be changed by reassigning them. Rows and columns can also be added to or deleted from an existing data frame:
- Columns can be added to an existing data frame using the cbind() function, which binds a new column vector to the data frame.
- Rows can be added to the data frame using the rbind() function.
- To delete a column, assign NULL to it. To delete a row, index the data frame with a negative row number.
Example:
Let's work through the rbind() and cbind() functions with a sample example:
# Creating the data frame using the data.frame() function in R
emp.data<- data.frame(
employee_id = c (1:7),
employee_name = c("Shivam","Arya","Rishi","Arvind","Arjun","Ram","Dheeraj"),
favcolor = c("pink","yellow","green","blue","orange","purple","red"),
starting_date = as.Date(c("2003-04-06", "2007-08-20", "2004-06-11", "2012-09-25",
"2011-02-27","2014-02-19","2010-05-12")),
stringsAsFactors = FALSE
)
# Adding a row to the data frame
x <- list(8,"Vardhan","black","2013-02-06")
rbind(emp.data,x)
# Adding a column to the data frame
y <- c("Hyderabad","Lucknow","paris","Dhargha","Meerut","Bangalore","Chennai")
cbind(emp.data,Address=y)
Output:
employee_id employee_name favcolor starting_date
1 1 Shivam pink 2003-04-06
2 2 Arya yellow 2007-08-20
3 3 Rishi green 2004-06-11
4 4 Arvind blue 2012-09-25
5 5 Arjun orange 2011-02-27
6 6 Ram purple 2014-02-19
7 7 Dheeraj red 2010-05-12
8 8 Vardhan black 2013-02-06
employee_id employee_name favcolor starting_date Address
1 1 Shivam pink 2003-04-06 Hyderabad
2 2 Arya yellow 2007-08-20 Lucknow
3 3 Rishi green 2004-06-11 paris
4 4 Arvind blue 2012-09-25 Dhargha
5 5 Arjun orange 2011-02-27 Meerut
6 6 Ram purple 2014-02-19 Bangalore
7 7 Dheeraj red 2010-05-12 Chennai
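Note that rbind() and cbind() return a new data frame and leave the original untouched, which is why the second output above still has seven rows. To keep a change, the result has to be assigned. The sketch below uses a two-row data frame; rows_before and the city column are names introduced only for this illustration.

```r
# A shortened employee data frame.
emp.data <- data.frame(
  employee_id = 1:2,
  employee_name = c("Shivam", "Arya"),
  stringsAsFactors = FALSE
)

# Without assignment the result is only printed; emp.data is unchanged.
rbind(emp.data, list(3L, "Rishi"))
rows_before <- nrow(emp.data)
print(rows_before)

# Assign the result to keep the new row.
emp.data <- rbind(emp.data, list(3L, "Rishi"))

# A column can also be added by assigning to a new name with $.
emp.data$city <- c("Hyderabad", "Lucknow", "Paris")
print(emp.data)
```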
Binding two data frames using rbind()
Consider the below code:
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,515.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
print(emp.data)
# Creating another data frame using the data.frame() function.
emp.newdata<- data.frame(
employee_id = c (1:7),
employee_name = c("Shiva","Aryan","Rishin","Arvinda","Arnav","Ramesh","Dhana"),
sal = c(683.6,817.2,671.9,925.6,783.65,782.67,927.54),
starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25",
"2015-02-27","2013-02-19","2012-05-12")),
stringsAsFactors = FALSE
)
print(emp.newdata)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
Output:
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 515.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
employee_id employee_name sal starting_date
1 1 Shiva 683.60 2012-04-06
2 2 Aryan 817.20 2013-08-20
3 3 Rishin 671.90 2014-06-11
4 4 Arvinda 925.60 2014-09-25
5 5 Arnav 783.65 2015-02-27
6 6 Ramesh 782.67 2013-02-19
7 7 Dhana 927.54 2012-05-12
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 515.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
6 1 Shiva 683.60 2012-04-06
7 2 Aryan 817.20 2013-08-20
8 3 Rishin 671.90 2014-06-11
9 4 Arvinda 925.60 2014-09-25
10 5 Arnav 783.65 2015-02-27
11 6 Ramesh 782.67 2013-02-19
12 7 Dhana 927.54 2012-05-12
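Binding works above because both data frames have the same column names. When the names differ, rbind() stops with an error. The sketch below shows this with two small made-up data frames, a and b, and then repairs the mismatch by renaming the column.

```r
# Two small data frames whose second column is named differently.
a <- data.frame(id = 1:2, sal = c(10, 20))
b <- data.frame(id = 3:4, salary = c(30, 40))

# rbind() refuses to bind them because the column names do not match.
bind_check <- tryCatch(
  {
    rbind(a, b)
    "bound"
  },
  error = function(e) "error"
)
print(bind_check)

# Rename the column so that the names match, then bind.
names(b)[names(b) == "salary"] <- "sal"
combined <- rbind(a, b)
print(combined)
```

As in the output above, the rows of the combined data frame are renumbered consecutively.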
The summary() Function
In certain cases, the programmer needs a statistical summary of the data, or wants to know the nature of the data stored in a particular data frame.
R provides the summary() function for this purpose. It takes the data frame as its argument and returns, for each column, a summary suited to the type of that column, such as the minimum, quartiles, mean, and maximum for a numeric column.
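A minimal sketch of calling summary() on the five-row employee data frame used earlier is given below; emp.summary is a name introduced here for illustration.

```r
# The five-row employee data frame from the rbind() example.
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# summary() takes the whole data frame and summarises it column by column.
emp.summary <- summary(emp.data)
print(emp.summary)

# It can also be applied to a single column.
summary(emp.data$sal)
```

For the character column, the summary reports the length and type rather than numeric statistics.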
Deleting Rows and Columns
The earlier examples appended rows and columns to an existing data frame. The next example shows how to delete rows and columns as required.
To understand this concept, take a look at the code below:
# Creating the data frame using the data.frame() function
# in the R Programming Language.
emp.data<- data.frame(
employee_id = c (1:7),
employee_name = c("Shivam","Arya","Rishi","Arvind","Arjun","Ram","Dheeraj"),
favcolor = c("pink","yellow","green","blue","orange","purple","red"),
starting_date = as.Date(c("2003-04-06", "2007-08-20", "2004-06-11", "2012-09-25",
"2011-02-27","2014-02-19","2010-05-12")),
stringsAsFactors = FALSE
)
print(emp.data)
# Deleting the existing rows from the available data frame
# here the third row will be deleted in the output
emp.data<-emp.data[-3,]
print(emp.data)
# Deleting the existing columns from the available data frame
# this code will delete the column corresponding to the employee_id
emp.data$employee_id<-NULL
print(emp.data)
Output:
employee_id employee_name favcolor starting_date
1 1 Shivam pink 2003-04-06
2 2 Arya yellow 2007-08-20
3 3 Rishi green 2004-06-11
4 4 Arvind blue 2012-09-25
5 5 Arjun orange 2011-02-27
6 6 Ram purple 2014-02-19
7 7 Dheeraj red 2010-05-12
employee_id employee_name favcolor starting_date
1 1 Shivam pink 2003-04-06
2 2 Arya yellow 2007-08-20
4 4 Arvind blue 2012-09-25
5 5 Arjun orange 2011-02-27
6 6 Ram purple 2014-02-19
7 7 Dheeraj red 2010-05-12
employee_name favcolor starting_date
1 Shivam pink 2003-04-06
2 Arya yellow 2007-08-20
4 Arvind blue 2012-09-25
5 Arjun orange 2011-02-27
6 Ram purple 2014-02-19
7 Dheeraj red 2010-05-12
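The same ideas extend to deleting several rows or columns at once. The following sketch uses a five-row version of the data frame; trimmed is a name chosen for this illustration. It also resets the row names, which otherwise keep their original numbers after a deletion, as seen in the output above.

```r
# A shortened version of the employee data frame.
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shivam", "Arya", "Rishi", "Arvind", "Arjun"),
  favcolor = c("pink", "yellow", "green", "blue", "orange"),
  stringsAsFactors = FALSE
)

# Delete the 2nd and 4th rows with a negative index vector.
trimmed <- emp.data[-c(2, 4), ]

# Delete several columns by name; drop = FALSE keeps the result a data frame
# even when only one column remains.
trimmed <- trimmed[, !(names(trimmed) %in% c("employee_id", "favcolor")), drop = FALSE]

# Reset the row names so that they run from 1 again.
rownames(trimmed) <- NULL
print(trimmed)
```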