Attribute Relation File Format (ARFF)
In this tutorial, we will learn about the Attribute Relation File Format (ARFF) and how to use it for machine learning in Java.
By Raunak Goswami Last updated : April 17, 2023
What is Attribute Relation File Format (ARFF)?
The Attribute Relation File Format (ARFF) is an ASCII text file format. An ARFF file describes a list of instances sharing a set of attributes. The format was developed by the computer science department of the University of Waikato. As the name suggests, the file describes a relation (the data set) through a list of attributes, one of which is typically treated as the class attribute.
Sections (portions) of an ARFF File
An ARFF (Attribute Relation File Format) file is broadly divided into two sections (portions): the header field and the data field.
1. Header field
The header field describes the name of the relation, along with the names and datatypes of the attributes present in the data file. The main difference between .CSV and .arff files is that in .CSV files the values of the attributes appear directly below their names, whereas in .arff files the attribute names are declared separately, followed by the data in a separate data field. The basic syntax for declaring an attribute in the header portion is as follows:
@attribute <attribute-name> <datatype>
The image below shows an example of the .arff file format:
The example is a data set that contains the head-brain relation of various users. From the picture above, one can easily identify the number of attributes along with the type of data they contain. In our example, the data in all four attributes is numeric. Apart from numeric, the datatype can also be nominal, string, or date.
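To make the declaration syntax concrete, here is a minimal sketch in plain Java that builds header lines for the four datatypes. The class ArffHeaderSketch, its methods, and the attribute names (age, gender, comment, recorded) are hypothetical and chosen only for illustration; they are not part of Weka and are not taken from the head-brain data set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: builds ARFF header text by hand.
final class ArffHeaderSketch {

    // Formats one declaration line: @attribute <attribute-name> <datatype>
    static String attribute(String name, String datatype) {
        return "@attribute " + name + " " + datatype;
    }

    // A nominal datatype is written as the list of allowed values in braces.
    static String nominal(List<String> values) {
        return "{" + String.join(",", values) + "}";
    }

    // Joins the relation name and the attribute declarations into a header.
    static String header(String relation, List<String> attributeLines) {
        StringBuilder sb = new StringBuilder("@relation " + relation + "\n\n");
        for (String line : attributeLines) {
            sb.append(line).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative attribute names only; assumed, not from the article's data.
        String text = header("example", Arrays.asList(
            attribute("age", "numeric"),
            attribute("gender", nominal(Arrays.asList("male", "female"))),
            attribute("comment", "string"),
            attribute("recorded", "date \"yyyy-MM-dd\"")));
        System.out.print(text);
    }
}
```

The sketch assumes attribute names without spaces; a name that contains spaces would need to be quoted in a real ARFF file.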
2. Data field
This field contains the data values of the attributes declared in the header field. These are the values our model will use to perform prediction and to determine the accuracy of its results. The values are separated by commas and appear under the @data heading. As mentioned above, the data can be of the following types:
- Numerical
- Nominal
- String
- Date-time format
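The relationship between the two fields can be sketched without any library. The following plain-Java example is a simplified stand-in (not Weka's converter) that turns CSV lines into ARFF text. It assumes that the first CSV line holds the column names, that every column is numeric, and that no name or value contains a comma or a space. The class name CsvToArffSketch and the sample columns are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone helper; it does not use Weka.
final class CsvToArffSketch {

    // Converts CSV lines (first line = column names) to ARFF text.
    // Assumption: every column is numeric and no value contains a comma.
    static String toArff(String relation, List<String> csvLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // Header field: one @attribute line per CSV column name.
        for (String name : csvLines.get(0).split(",")) {
            sb.append("@attribute ").append(name.trim()).append(" numeric\n");
        }
        // Data field: the remaining rows, comma separated, under @data.
        sb.append("\n@data\n");
        for (String row : csvLines.subList(1, csvLines.size())) {
            sb.append(row.trim()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample rows, used only to show the layout.
        List<String> csv = Arrays.asList("height,weight", "170,65", "182,80");
        System.out.print(toArff("example", csv));
    }
}
```

Note how the column names move from the first CSV row into separate @attribute declarations, while the rows themselves are copied unchanged under @data.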
The .CSV file that I have used can be downloaded from here: headbrain7.csv
Java code to convert a .CSV file format to .ARFF file format
Below is the code, written in Java in the Eclipse IDE, for converting the .CSV file into the .arff file format. Make sure you have set the path to the weka.jar file. If you haven't, have a look at my previous article: Weka Tutorial.
import java.io.File;
import java.io.IOException;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class Main {
    public static void main(String[] args) throws IOException {
        // Load the CSV file
        CSVLoader load = new CSVLoader();
        load.setSource(new File("headbrain.csv"));
        Instances data = load.getDataSet(); // get the Instances object

        // Save the loaded instances in ARFF format
        ArffSaver save = new ArffSaver();
        save.setInstances(data); // set the dataset we want to convert
        save.setFile(new File("C:\\Users\\Logan\\Desktop\\ML\\headbrain.arff"));
        save.writeBatch();

        // Printing an Instances object shows its ARFF representation
        System.out.println("The .arff file format is as follows");
        System.out.println(data);
    }
}
Output