To work with WEKA Explorer, the data must first be loaded into the application. The data may be in different formats such as CSV, text, JSON and so on.
WEKA supports a wide range of file formats to load the data.
The following is the complete list of file formats
☛ Arff data files(*.arff)
☛ Arff data files(*.arff.gz)
☛ C4.5 data files(*.names)
☛ C4.5 data files(*.data)
☛ CSV data files(*.csv)
☛ JSON instance files(*.json)
☛ JSON instance files(*.json.gz)
☛ libsvm data files(*.libsvm)
☛ Matlab ASCII files(*.m)
☛ svm light data files(*.dat)
☛ Binary Serialized instances(*.bsi)
☛ XRFF data files(*.xrff)
☛ XRFF data files(*.xrff.gz)
All supported file formats are displayed in the drop-down list at the bottom of the file-open window.
As we can see, WEKA supports various data formats. Among them, the most commonly used are Arff data files(*.arff) and CSV data files(*.csv).
✍NOTE : The default data format of WEKA is Arff data files (*.arff).
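The extension list above can be turned into a small lookup. The following is a minimal Python sketch, not part of WEKA itself: the table `WEKA_FORMATS` and the helper `guess_weka_format` are hypothetical names introduced only for illustration, and the mapping simply restates the list of formats given above.

```python
# Hypothetical lookup table built from the list of formats above.
WEKA_FORMATS = {
    ".arff": "Arff data files",
    ".arff.gz": "Arff data files",
    ".names": "C4.5 data files",
    ".data": "C4.5 data files",
    ".csv": "CSV data files",
    ".json": "JSON instance files",
    ".json.gz": "JSON instance files",
    ".libsvm": "libsvm data files",
    ".m": "Matlab ASCII files",
    ".dat": "svm light data files",
    ".bsi": "Binary Serialized instances",
    ".xrff": "XRFF data files",
    ".xrff.gz": "XRFF data files",
}


def guess_weka_format(filename):
    """Return the format label for a file name, or None if it is not in the list."""
    name = filename.lower()
    # Check longer extensions first so that "x.arff.gz" matches ".arff.gz".
    for ext in sorted(WEKA_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return WEKA_FORMATS[ext]
    return None


print(guess_weka_format("student.arff"))
print(guess_weka_format("notes.txt"))
```

This only inspects the file name. It does not check that the file contents are valid for the reported format.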
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
An ARFF file has two main sections:
• Header section
• Data section
The Header section of the ARFF file contains the name of the relation, a list of the attributes and their types.
The relation name is defined as the first declaration in the ARFF file (only comment lines may precede it).
@RELATION <relation-name>
- where <relation-name> is a string. The relation name must be quoted if the name includes spaces.
Each attribute declaration specifies the name of the attribute along with its type.
@ATTRIBUTE <attribute-name> <datatype>
- where the <attribute-name> must start with an alphabetic character. The attribute name must be quoted if the name includes spaces.
Numeric attributes can be real or integer numbers.
Nominal attributes are defined by listing the possible values: {nominal-value1, nominal-value2, nominal-value3,…}
String attributes allow us to define attributes holding textual values.
A date attribute is defined as follows:
@ATTRIBUTE <name> date [<date-format>]
- where <name> is the name for the attribute and <date-format> is an optional string. The default date-format string is yyyy-MM-dd'T'HH:mm:ss.
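To see what a value in the default date format looks like, the pattern can be mirrored in Python. This is only a sketch: the mapping from yyyy-MM-dd'T'HH:mm:ss to the `strptime` pattern below is an assumed equivalent, and `parse_arff_date` is a hypothetical helper, not a WEKA function.

```python
from datetime import datetime

# Assumed Python equivalent of the default ARFF pattern yyyy-MM-dd'T'HH:mm:ss
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"


def parse_arff_date(value, fmt=ARFF_DEFAULT_DATE):
    """Parse one date value; raises ValueError if it does not match the format."""
    return datetime.strptime(value, fmt)


stamp = parse_arff_date("2023-10-05T14:30:00")
print(stamp.year, stamp.month, stamp.hour)  # prints: 2023 10 14
```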
% Title: Student Database
%
% Sources:
% (a) Creator: Mr.T.M
% (b) Date: Oct, 2023
%
@RELATION student
@ATTRIBUTE sid NUMERIC
@ATTRIBUTE age NUMERIC
@ATTRIBUTE gender {male, female}
In the above example,
- The lines which start with % are treated as comments.
- @RELATION specifies the name of the relation.
- @ATTRIBUTE specifies the name of the attribute along with its type or, for a nominal attribute, its possible values.
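The way these three kinds of lines are interpreted can be sketched in a few lines of Python. This is a simplified illustration rather than WEKA's own reader: `parse_arff_header` is a hypothetical function, and it assumes that relation and attribute names contain no spaces, so quoted names are not handled.

```python
def parse_arff_header(text):
    """Return (relation_name, [(attribute_name, type_or_values), ...])."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # declarations are case insensitive
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest
        elif keyword == "@attribute":
            name, _, spec = rest.partition(" ")
            spec = spec.strip()
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: keep the list of possible values.
                spec = [value.strip() for value in spec[1:-1].split(",")]
            else:
                spec = spec.upper()
            attributes.append((name, spec))
        elif keyword == "@data":
            break  # the header ends where the data section starts
    return relation, attributes


header = """% Title: Student Database
@RELATION student
@ATTRIBUTE sid NUMERIC
@ATTRIBUTE age NUMERIC
@ATTRIBUTE gender {male, female}
"""
relation, attributes = parse_arff_header(header)
print(relation)
print(attributes)
```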
The Data section of the ARFF file contains the list of data values (instance data), one instance per line, with the values separated by commas.
@DATA
101,20,male
102,19,female
103,?,male
The above data section contains 3 instances with numeric and nominal values. The symbol ? indicates a missing value.
✍NOTE : The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive, i.e., @RELATION and @relation are treated as the same in the ARFF file format.
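Reading such rows, including the ? marker, can be sketched as follows. The function `parse_arff_data` and its `numeric_columns` parameter are hypothetical and exist only for this illustration. The sketch assumes simple unquoted values with no commas inside them, and it represents a missing value as `None`.

```python
def parse_arff_data(text, numeric_columns):
    """Return the instances after @DATA as lists, with None for missing values."""
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        if line.lower() == "@data":
            in_data = True
            continue
        if not in_data:
            continue  # still inside the header section
        row = []
        for index, cell in enumerate(line.split(",")):
            cell = cell.strip()
            if cell == "?":
                row.append(None)  # missing value
            elif index in numeric_columns:
                row.append(float(cell))
            else:
                row.append(cell)
        rows.append(row)
    return rows


data = """@DATA
101,20,male
102,19,female
103,?,male
"""
rows = parse_arff_data(data, numeric_columns={0, 1})
missing = sum(cell is None for row in rows for cell in row)
print(len(rows), missing)  # prints: 3 1
```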
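The following is the complete ARFF file
% Title: Student Database
%
% Sources:
% (a) Creator: Mr.T.M
% (b) Date: Oct, 2023
%
@RELATION student
@ATTRIBUTE sid NUMERIC
@ATTRIBUTE age NUMERIC
@ATTRIBUTE gender {male, female}
@DATA
101,20,male
102,19,female
103,?,male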
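A file with both sections can also be generated from a program. The sketch below builds the text of an ARFF file in Python. `to_arff` is a hypothetical helper, it assumes names and values without spaces (so no quoting is applied), and it writes `None` as the ? missing-value marker.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from a relation name, attribute specs and data rows."""
    lines = ["@RELATION " + relation]
    for name, spec in attributes:
        if isinstance(spec, (list, tuple)):
            # Nominal attribute: write the possible values inside braces.
            spec = "{" + ", ".join(spec) + "}"
        lines.append("@ATTRIBUTE {} {}".format(name, spec))
    lines.append("@DATA")
    for row in rows:
        lines.append(",".join("?" if value is None else str(value) for value in row))
    return "\n".join(lines) + "\n"


student_arff = to_arff(
    "student",
    [("sid", "NUMERIC"), ("age", "NUMERIC"), ("gender", ["male", "female"])],
    [[101, 20, "male"], [102, 19, "female"], [103, None, "male"]],
)
print(student_arff)
```

The resulting string can be saved with a .arff extension, for example student.arff, and then opened in WEKA Explorer.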