As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application.
The following are the most basic forms of data for mining.
A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
Example:
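As a minimal sketch of the ideas above, the snippet below builds one relational table with Python's built-in sqlite3 module. The table name customer, its columns, and the sample rows are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
import sqlite3

# In-memory database; table name, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT,"                    # attribute (column)
    " city TEXT)"                    # attribute (column)
)
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds"), (3, "Chen", "Pune")],
)

# Each returned row is one tuple, described by its attribute values.
rows = conn.execute(
    "SELECT cust_id, name FROM customer WHERE city = ? ORDER BY cust_id",
    ("Pune",),
).fetchall()
print(rows)
```

Here cust_id plays the role of the unique key, while name and city are ordinary attributes of each tuple.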
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
Because the data are integrated from multiple sources into one place, they can later be mined to support decision making.
A data warehouse is usually modelled by a multidimensional data structure, called a data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum. A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data.
Example:
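The data cube idea can be sketched in a few lines of plain Python. The snippet assumes three hypothetical dimensions (item, city, quarter) and a sum measure over made-up sales records; a dimension that has been rolled up over all of its values is marked with "*". It is only an illustration of precomputing aggregates, not how a real warehouse engine stores a cube.

```python
from collections import defaultdict

# Hypothetical fact records: (item, city, quarter, units_sold).
sales = [
    ("Milk", "Pune", "Q1", 10),
    ("Milk", "Leeds", "Q1", 4),
    ("Bread", "Pune", "Q1", 7),
    ("Milk", "Pune", "Q2", 6),
]

DIMENSIONS = ("item", "city", "quarter")


def build_cube(records):
    """Precompute the sum of units for every combination of dimensions.

    A dimension rolled up to "all values" is marked with "*".
    """
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *dims, units in records:
        # Each bitmask selects which dimensions stay specific in the cell key.
        for mask in range(2 ** n):
            key = tuple(dims[i] if mask & (1 << i) else "*" for i in range(n))
            cube[key] += units
    return dict(cube)


cube = build_cube(sales)
print(cube[("Milk", "Pune", "*")])  # Milk sold in Pune over all quarters
print(cube[("*", "*", "*")])        # grand total over every dimension
```

Once the cube is built, any summarized view is a single dictionary lookup, which is the "fast access of summarized data" that precomputation buys.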
A transactional database is a collection of data organized by time stamps, dates, and similar attributes to represent transactions. In general, each record in a transactional database captures a transaction, such as a customer’s purchase, a flight booking, or a user’s clicks on a web page.
A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction, such as the items purchased in the transaction.
This type of database can roll back (undo) an operation when a transaction is not completed or committed, and it follows the ACID properties of a DBMS: atomicity, consistency, isolation, and durability.
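The rollback behaviour can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table and the transfer helper are hypothetical names introduced only for this example. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo everything if src would go negative."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok_first = transfer(conn, "A", "B", 30)    # succeeds and is committed
ok_second = transfer(conn, "A", "B", 500)  # fails; the debit is rolled back
print(ok_first, ok_second, balances(conn))
```

The second transfer debits account A before the check fails, yet the stored balances are unchanged afterwards, which is the atomicity part of ACID.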
Example:
TID   Items
T1    Bread, Coke, Milk
T2    Popcorn, Bread
T3    Popcorn, Coke, Egg, Milk
T4    Popcorn, Bread, Egg, Milk
T5    Coke, Egg, Milk

Fig: Transactional data
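A typical first mining step on such data is counting how often items occur together. The sketch below encodes the five transactions shown above and counts, for every pair of items, the number of transactions that contain both. The function name pair_support is an illustrative choice, and this brute-force count is only a starting point, not a full frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the figure, keyed by transaction ID.
transactions = {
    "T1": {"Bread", "Coke", "Milk"},
    "T2": {"Popcorn", "Bread"},
    "T3": {"Popcorn", "Coke", "Egg", "Milk"},
    "T4": {"Popcorn", "Bread", "Egg", "Milk"},
    "T5": {"Coke", "Egg", "Milk"},
}


def pair_support(transactions):
    """Count the transactions containing each unordered pair of items."""
    counts = Counter()
    for items in transactions.values():
        # Sorting gives every pair one canonical (alphabetical) key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


support = pair_support(transactions)
for pair, count in sorted(support.items()):
    print(pair, count)
```

Pairs with a high count, such as Coke and Milk, are candidates for association rules.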
Multimedia databases are used to store multimedia data such as images, animation, audio, and video, along with text. These data are stored in multiple file types, such as .txt (text), .jpg (images), .swf (animation), .mp4 (video), and .mp3 (audio).
A spatial database is a database that is enhanced to store and access spatial data, that is, data that define a geometric space. These data are often associated with geographic locations and natural features, or with constructed features such as cities. Data in spatial databases are stored as coordinates, points, lines, polygons, and topology.
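To make the idea of querying stored geometry concrete, here is a minimal sketch of a point-in-polygon test using the ray-casting technique. The coordinates and the names city_boundary and point_in_polygon are hypothetical. A real spatial database would use its own geometry types and spatial indexes instead of a hand-written loop.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# A hypothetical boundary stored as a list of (x, y) vertices.
city_boundary = [(0, 0), (4, 0), (4, 4), (0, 4)]

print(point_in_polygon((2, 2), city_boundary))
print(point_in_polygon((5, 2), city_boundary))
```

An odd number of crossings means the point lies inside the polygon. Points that fall exactly on an edge are a corner case this sketch does not treat specially.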
The World Wide Web is a collection of documents and resources such as audio, video, and text. Each resource is identified by a URL, and resources are linked to one another through hyperlinks in HTML pages that users view in web browsers. Online shopping, job hunting, and research are some of its uses.
The Web is the most heterogeneous repository, as it collects data from many different sources. It is also dynamic in nature, as the volume of data is continuously increasing and changing.
Flat files are a type of structured data stored in a plain text format. They are called “flat” because they have no internal hierarchy and no relationships between records, unlike the linked tables of a relational database. Flat files typically consist of rows and columns of data, with each row representing a single record and each column representing a field or attribute within that record. They can be stored in various formats such as comma-separated values (CSV), tab-separated values (TSV), and fixed-width format.
Example:
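A small CSV flat file and the way it maps onto records can be sketched with Python's built-in csv module. The file contents below are made up for illustration and are held in memory via io.StringIO so the snippet runs without touching the disk.

```python
import csv
import io

# A hypothetical CSV flat file: the first row names the fields,
# and every following row is one record.
flat_file = io.StringIO(
    "cust_id,name,city\n"
    "1,Asha,Pune\n"
    "2,Ben,Leeds\n"
)

# DictReader maps each row to a {field: value} record.
records = list(csv.DictReader(flat_file))
for record in records:
    print(record["cust_id"], record["name"], record["city"])
```

Every value is read as a string, because a flat file carries no type information of its own.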
Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time. Time-series data can track changes over milliseconds, days, or even years.
A time series database (TSDB) is a database optimized for time-stamped or time series data. Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data.
Example:
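Downsampling, one of the operations mentioned above, can be sketched as follows. The timestamps and values are invented sensor readings, and downsample_by_minute is an illustrative helper name. It groups readings into one-minute buckets and keeps the mean of each bucket.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sensor readings: (timestamp, value).
readings = [
    (datetime(2024, 1, 1, 9, 0, 10), 20.0),
    (datetime(2024, 1, 1, 9, 0, 40), 22.0),
    (datetime(2024, 1, 1, 9, 1, 5), 30.0),
    (datetime(2024, 1, 1, 9, 1, 55), 34.0),
]


def downsample_by_minute(points):
    """Average all values that fall within the same minute."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Truncate the timestamp to the start of its minute.
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return {
        minute: sum(values) / len(values)
        for minute, values in sorted(buckets.items())
    }


per_minute = downsample_by_minute(readings)
for minute, mean in per_minute.items():
    print(minute.isoformat(), mean)
```

A time series database performs this kind of aggregation internally and at much larger scale. The sketch only shows the principle.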
A data mart focuses on a single functional area of an organization and contains a subset of the data stored in a data warehouse. A data mart is an abbreviated version of a data warehouse and is designed for use by a specific department, unit, or set of users in an organization, e.g., marketing, sales, HR, or finance. It is often controlled by a single department in the organization.