

Types of Data




What Kinds of Data Can Be Mined

As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application.

The following are the most basic forms of data for mining.

Basic forms of data for mining

  • Database data (relational database)
  • Data warehouse data
  • Transactional data

Other forms of data for mining

  • Multimedia database
  • Spatial database
  • World Wide Web
  • Text data (flat file)
  • Time series database

Fig: Different types of data in data mining

Database data (relational database)

A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.

A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Example:

Fig: A relational database
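
The table, attribute, tuple, and key terminology can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns, and the sample rows are hypothetical and are not taken from the figure.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One table (relation) with three attributes; cust_id is the unique key.
conn.execute(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT
    )
    """
)

# Each inserted row is one tuple (record) described by its attribute values.
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds"), (3, "Chen", "Pune")],
)
conn.commit()

# Retrieve a single tuple through its key.
row = conn.execute(
    "SELECT name, city FROM customer WHERE cust_id = ?", (2,)
).fetchone()

# A simple aggregate query over one attribute.
pune_count = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE city = ?", ("Pune",)
).fetchone()[0]

print(row, pune_count)
```

Queries like the last one are the usual starting point for mining relational data, since they select and summarize the tuples of interest.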

Data warehouse data

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

Fig: Data warehouse

Because a data warehouse holds data that have already been cleaned and integrated from multiple sources, it can later be mined to support decision making.

A data warehouse is usually modelled by a multidimensional data structure, called a data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum. A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data.

Example:

Fig: Data cube in a data warehouse
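
As a rough sketch of the idea, a data cube can be thought of as a mapping from dimension values to a precomputed aggregate. The example below assumes three hypothetical dimensions (item, city, quarter) and a sum measure; the sales rows are invented for illustration, and "*" marks a dimension that has been aggregated away.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (item, city, quarter, units_sold)
sales = [
    ("Bread", "Pune", "Q1", 10),
    ("Bread", "Leeds", "Q1", 4),
    ("Milk", "Pune", "Q1", 7),
    ("Milk", "Pune", "Q2", 3),
]
DIMENSIONS = ("item", "city", "quarter")


def build_cube(rows):
    """Precompute the sum of the measure for every combination of dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *dims, measure in rows:
        # Visit every subset of dimensions to keep; the rest become "*".
        for size in range(n + 1):
            for kept in combinations(range(n), size):
                cell = tuple(dims[i] if i in kept else "*" for i in range(n))
                cube[cell] += measure
    return dict(cube)


cube = build_cube(sales)

# Summaries are now simple lookups instead of scans over the fact rows.
print(cube[("*", "*", "*")])      # total over all dimensions
print(cube[("Milk", "*", "*")])   # total for one item
print(cube[("*", "Pune", "Q1")])  # total for one city in one quarter
```

Real warehouses usually materialize only part of the cube, because the number of cells grows quickly with the number of dimensions.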

Transactional data

A transactional database is a collection of records, typically organized by time stamp or date, in which each record captures a transaction such as a customer’s purchase, a flight booking, or a user’s clicks on a web page.

A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction, such as the items purchased in the transaction.

Databases that record transactions usually also support rolling back (undoing) an operation when a transaction fails before it is committed, in line with the ACID properties (atomicity, consistency, isolation, durability) of a DBMS.
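
Rollback behaviour can be sketched with Python's built-in sqlite3 module. The purchases table and the simulated failure are hypothetical; the point is only that uncommitted changes disappear after a rollback while committed ones persist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (trans_id TEXT, item TEXT)")
conn.commit()

# A transaction that fails part-way through is undone as a whole.
try:
    conn.execute("INSERT INTO purchases VALUES ('T1', 'Bread')")
    conn.execute("INSERT INTO purchases VALUES ('T1', 'Coke')")
    raise RuntimeError("simulated failure before commit")
except RuntimeError:
    conn.rollback()

rows_after_rollback = conn.execute(
    "SELECT COUNT(*) FROM purchases"
).fetchone()[0]

# A transaction that completes is made permanent by the commit.
conn.execute("INSERT INTO purchases VALUES ('T2', 'Popcorn')")
conn.execute("INSERT INTO purchases VALUES ('T2', 'Bread')")
conn.commit()

rows_after_commit = conn.execute(
    "SELECT COUNT(*) FROM purchases"
).fetchone()[0]

print(rows_after_rollback, rows_after_commit)
```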

Example:

TID     Items
T1      Bread, Coke, Milk
T2      Popcorn, Bread
T3      Popcorn, Coke, Egg, Milk
T4      Popcorn, Bread, Egg, Milk
T5      Coke, Egg, Milk
Fig: Transactional data
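
One common way to start mining such data is to count how often items, and pairs of items, occur across transactions. The sketch below uses the five transactions from the table; the choice of Python sets and Counter is an assumption made for illustration, not a prescribed method.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the table, keyed by transaction ID.
transactions = {
    "T1": {"Bread", "Coke", "Milk"},
    "T2": {"Popcorn", "Bread"},
    "T3": {"Popcorn", "Coke", "Egg", "Milk"},
    "T4": {"Popcorn", "Bread", "Egg", "Milk"},
    "T5": {"Coke", "Egg", "Milk"},
}

# Number of transactions that contain each item.
item_counts = Counter(
    item for items in transactions.values() for item in items
)

# Number of transactions that contain each pair of items.
# Sorting makes ("Coke", "Milk") and ("Milk", "Coke") the same key.
pair_counts = Counter(
    pair
    for items in transactions.values()
    for pair in combinations(sorted(items), 2)
)

# Support = fraction of all transactions that contain the item.
milk_support = item_counts["Milk"] / len(transactions)

print(item_counts["Milk"], pair_counts[("Coke", "Milk")], milk_support)
```

Counts like these are the raw input to frequent-itemset and association-rule mining.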

Multimedia database

Multimedia databases store multimedia data such as images, animation, audio, and video along with text. These data are stored in multiple file types, such as .txt (text), .jpg (images), .swf (animation), .mp4 (video), and .mp3 (audio).

Fig: Multimedia data in a data warehouse

Spatial database

A spatial database is a database that is enhanced to store and access spatial data, that is, data that define a geometric space. These data are often associated with geographic locations and natural features, or with constructed features such as cities. Data in spatial databases are stored as coordinates, points, lines, polygons, and topology.

Fig: Spatial database in a data warehouse
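
As a minimal sketch of how points, lines, and polygons reduce to coordinates, the example below represents each shape as plain Python tuples and lists and computes two simple geometric properties. The shapes and helper functions are hypothetical; a real spatial database would provide its own geometry types and functions.

```python
from math import hypot

# Hypothetical geometries expressed as (x, y) coordinates.
point = (2.0, 1.0)
line = [(0.0, 0.0), (3.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # a 4 x 3 rectangle


def line_length(coords):
    """Sum the lengths of the segments between consecutive coordinates."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


print(line_length(line), polygon_area(polygon))
```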

World Wide Web

The World Wide Web is a collection of documents and resources such as audio, video, and text. Each resource is identified by a URL, and pages are connected to one another by hyperlinks in HTML documents, which users view through a web browser. Online shopping, job hunting, and research are some of its uses.

It is the most heterogeneous repository because it collects data from many different sources. It is also dynamic in nature, because the volume of data is continuously increasing and changing.

Fig: Different types of WWW data in data mining

Text data (Flat File)

Flat files are a type of structured data stored in a plain text format. They are called “flat” because they have no hierarchical structure and no built-in links between records or files, unlike the related tables of a relational database. Flat files typically consist of rows and columns of data, with each row representing a single record and each column representing a field or attribute within that record. They can be stored in various formats such as comma-separated values (CSV), tab-separated values (TSV), and fixed-width format.

  • Flat files are data files in text or binary form with a structure that can be easily extracted by data mining algorithms.
  • Data stored in flat files have no relationships or paths among themselves. For example, if a relational database is exported to flat files, the relations between its tables are not preserved.

Example:

Fig: Text data
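
A minimal sketch of reading a flat file with Python's built-in csv module follows. The file contents and column names are hypothetical, and io.StringIO stands in for a file on disk so the example is self-contained.

```python
import csv
import io

# Hypothetical CSV flat file: one header row, then one record per line.
flat_file = io.StringIO(
    "trans_id,item,quantity\n"
    "T1,Bread,2\n"
    "T1,Milk,1\n"
    "T2,Popcorn,3\n"
)

# DictReader maps each row to the field names given in the header row.
records = [dict(row) for row in csv.DictReader(flat_file)]

# Every value is read as text, so numeric fields must be converted explicitly.
total_quantity = sum(int(record["quantity"]) for record in records)

print(records[0], total_quantity)
```

Note that nothing in the file itself says that trans_id refers to anything else; any such relationship has to be supplied by the program that reads it.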

Time series database

Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time. Time-series data can track changes over milliseconds, days, or even years.

A time series database (TSDB) is a database optimized for time-stamped or time series data. Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data.

Example:

Fig: Time series data
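
Downsampling and aggregation, mentioned above, can be sketched in a few lines of standard-library Python. The sensor readings are invented, and averaging per minute is just one possible choice of bucket size and aggregate.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sensor readings: (ISO 8601 timestamp, value)
readings = [
    ("2024-01-01T00:00:05", 20.0),
    ("2024-01-01T00:00:35", 22.0),
    ("2024-01-01T00:01:10", 30.0),
    ("2024-01-01T00:01:50", 34.0),
    ("2024-01-01T00:02:20", 25.0),
]


def downsample_per_minute(points):
    """Group readings into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


per_minute = downsample_per_minute(readings)

for minute, average in per_minute.items():
    print(minute.isoformat(), average)
```

A dedicated TSDB typically performs this kind of bucketing internally and at much larger scale.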

What is Data Mart?

A data mart is focused on a single functional area of an organization and contains a subset of the data stored in a data warehouse. It is a condensed version of a data warehouse, designed for use by a specific department, unit, or set of users in an organization, such as marketing, sales, HR, or finance. It is often controlled by a single department in the organization.

