DATA MINING - (LAB PROGRAMS)


Aim:

 To list the various file formats supported by WEKA and to study the ARFF file format.

Solution :

Various File Formats supported by WEKA :

To work with the WEKA Explorer, the data must first be loaded into the application. The data may be in different formats such as CSV, text, JSON and so on.

WEKA supports a wide range of file formats to load the data.


The following is the complete list of file formats:

Arff data files(*.arff)
Arff data files(*.arff.gz)
C4.5 data files(*.names)
C4.5 data files(*.data)
CSV data files(*.csv)
JSON instance files(*.json)
JSON instance files(*.json.gz)
libsvm data files(*.libsvm)
Matlab ASCII files(*.m)
svm light data files(*.dat)
Binary Serialized instances(*.bsi)
XRFF data files(*.xrff)
XRFF data files(*.xrff.gz)
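
As a quick illustration of the list above, the following Python sketch maps a file name to the format name under which WEKA lists it. The table `WEKA_FORMATS` and the helper `weka_format_for` are hypothetical names introduced here for illustration; they are not part of WEKA.

```python
# Hypothetical lookup table copied from the list above; not a WEKA API.
WEKA_FORMATS = {
    ".arff": "Arff data files",
    ".arff.gz": "Arff data files",
    ".names": "C4.5 data files",
    ".data": "C4.5 data files",
    ".csv": "CSV data files",
    ".json": "JSON instance files",
    ".json.gz": "JSON instance files",
    ".libsvm": "libsvm data files",
    ".m": "Matlab ASCII files",
    ".dat": "svm light data files",
    ".bsi": "Binary Serialized instances",
    ".xrff": "XRFF data files",
    ".xrff.gz": "XRFF data files",
}


def weka_format_for(filename):
    """Return the format name for a file name, or None if its extension is not listed."""
    name = filename.lower()
    # Check longer extensions first so "x.arff.gz" matches ".arff.gz" as a whole.
    for ext in sorted(WEKA_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return WEKA_FORMATS[ext]
    return None


print(weka_format_for("student.arff"))
print(weka_format_for("weather.csv"))
print(weka_format_for("notes.txt"))
```

A file such as notes.txt has no entry in the table, so the helper returns None for it.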


The file-selection dialog of the WEKA Explorer displays all supported file formats in a drop-down list at the bottom of the window.


 

As the list shows, WEKA can load data in many formats. The most commonly used are Arff data files (*.arff) and CSV data files (*.csv).

NOTE : The default data format of WEKA is Arff data files (*.arff).


Study the ARFF file format:


An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.

The ARFF file format has two main sections:

Header section

Data section

Header section:

The Header section of the ARFF file contains the name of the relation, a list of the attributes and their types.

@RELATION Declaration

The relation name is defined as the first line in the ARFF file.

format:

  @RELATION <relation-name>

- where <relation-name> is a string. The relation name must be quoted if the name includes spaces.

@ATTRIBUTE Declaration

The @ATTRIBUTE declaration specifies the name of an attribute along with its type.

format:

   @ATTRIBUTE <attribute-name> <datatype>

- where the <attribute-name> must start with an alphabetic character. The attribute name must be quoted if the name includes spaces.
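
The two declaration formats and their quoting rule can be sketched in Python as follows. The helper names `quote_if_needed`, `relation_line` and `attribute_line` are hypothetical, and the use of single quotes is an assumption of this sketch. Names that themselves contain quote characters are not handled.

```python
def quote_if_needed(name):
    """Quote a name that contains a space (single quotes are an assumption here)."""
    return f"'{name}'" if " " in name else name


def relation_line(name):
    """Build an @RELATION declaration for the given relation name."""
    return f"@RELATION {quote_if_needed(name)}"


def attribute_line(name, datatype):
    """Build an @ATTRIBUTE declaration; the name must start with a letter."""
    if not name[:1].isalpha():
        raise ValueError(f"attribute name must start with a letter: {name!r}")
    return f"@ATTRIBUTE {quote_if_needed(name)} {datatype}"


print(relation_line("student"))
print(relation_line("student database"))
print(attribute_line("age", "NUMERIC"))
print(attribute_line("gender", "{male, female}"))
```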

Weka supports the following four datatypes:

1. Numeric attributes:

Numeric attributes can be real or integer numbers.

2. Nominal attributes:

Nominal values are defined by providing the possible values: { nominal-value1, nominal-value2, nominal-value3,… }

3. String attributes:

String attributes allow us to define attributes holding textual values.

4. Date attributes:

A date attribute is declared as follows:

  @ATTRIBUTE <name> date [<date-format>]

- where <name> is the name for the attribute and <date-format> is an optional string. The default date-format string is yyyy-MM-dd'T'HH:mm:ss.
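
To see what a value in the default date format looks like, the following Python sketch parses one. The ARFF pattern is written in Java's SimpleDateFormat notation; the translation to Python's strptime codes and the helper name `parse_arff_date` are assumptions made for this illustration.

```python
from datetime import datetime

# strptime equivalent of the ARFF default pattern yyyy-MM-dd'T'HH:mm:ss
# (the translation is an assumption of this sketch, not something WEKA provides).
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"


def parse_arff_date(value):
    """Parse a date value written in the default ARFF date format."""
    return datetime.strptime(value, ARFF_DEFAULT_DATE)


stamp = parse_arff_date("2023-10-05T14:30:00")
print(stamp.isoformat())
```

A value that does not match the pattern, such as 2023/10/05, raises a ValueError in this sketch.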


Example of Header Section:

   % Title: Student Database
   % 
   % Sources:
   %      (a) Creator: Mr.T.M
   %      (b) Date: Oct, 2023
   % 

   @RELATION student

   @ATTRIBUTE sid NUMERIC
   @ATTRIBUTE age NUMERIC
   @ATTRIBUTE gender {male, female}

In the above example,

- The lines which start with % are treated as comments.

- @RELATION specifies the name of the relation.

- @ATTRIBUTE specifies the name of an attribute along with its type or, for a nominal attribute, its possible values.
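
The three points above can be sketched as a small Python header reader. This is only a minimal illustration, not WEKA's own parser: `parse_header` is a hypothetical helper, and it assumes unquoted names without spaces.

```python
HEADER = """\
% Title: Student Database
%
@RELATION student

@ATTRIBUTE sid NUMERIC
@ATTRIBUTE age NUMERIC
@ATTRIBUTE gender {male, female}
"""


def parse_header(text):
    """Return (relation name, list of (attribute name, type)) from an ARFF header."""
    relation = None
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines (starting with %) carry no declarations.
        if not line or line.startswith("%"):
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # declarations are case insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            # Assumes an unquoted name: the first token is the name, the rest is the type.
            name, _, datatype = rest.strip().partition(" ")
            attributes.append((name, datatype.strip()))
    return relation, attributes


relation, attributes = parse_header(HEADER)
print(relation)
for name, datatype in attributes:
    print(name, "->", datatype)
```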

Data section:

The Data section of the ARFF file begins with the @DATA declaration and contains the instance data, one instance per line, with the values separated by commas.


Example of Data Section:

   @DATA
   101,20,male
   102,19,female
   103,?,male

The above example contains 3 instances with numeric and nominal values. The symbol ? indicates a missing value.

NOTE : The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive, i.e. @RELATION and @relation are treated the same in the ARFF file format.
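
The handling of the data section, the ? symbol and the case-insensitive @DATA keyword can be sketched in Python as follows. `parse_data` is a hypothetical helper written for this illustration. It keeps every value as a string and assumes no quoted values containing commas.

```python
DATA = """\
@data
101,20,male
102,19,female
103,?,male
"""


def parse_data(text):
    """Return the instances after @DATA as lists of strings, with None for missing values."""
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        # The keyword is case insensitive, so @DATA and @data both start the section.
        if line.lower() == "@data":
            in_data = True
            continue
        if in_data:
            values = [value.strip() for value in line.split(",")]
            rows.append([None if value == "?" else value for value in values])
    return rows


for row in parse_data(DATA):
    print(row)
```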

The complete ARFF file combines the header section and the data section shown above.


Filename: student.arff
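
Assuming the complete file is simply the header and data examples above joined together, the following Python sketch writes it to disk as student.arff. The helper `write_student_arff` is a hypothetical name, and a temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

# Header and data examples from above, joined into one file (an assumption of this sketch).
STUDENT_ARFF = """\
% Title: Student Database
%
% Sources:
%      (a) Creator: Mr.T.M
%      (b) Date: Oct, 2023
%

@RELATION student

@ATTRIBUTE sid NUMERIC
@ATTRIBUTE age NUMERIC
@ATTRIBUTE gender {male, female}

@DATA
101,20,male
102,19,female
103,?,male
"""


def write_student_arff(directory):
    """Write the student dataset into the given directory and return the file path."""
    path = Path(directory) / "student.arff"
    # ARFF is an ASCII text format, so the file is written with ASCII encoding.
    path.write_text(STUDENT_ARFF, encoding="ascii")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_student_arff(tmp)
    print(written.name)
    print(written.read_text(encoding="ascii"))
```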


 
