Data mining is the process of extracting potentially useful and previously unknown information from large volumes of data. A data mining system involves several components.
The major components of any data mining system are the data source, the database or data warehouse server, the data mining engine, the pattern evaluation module, the graphical user interface and the knowledge base.
Databases, data warehouses, the World Wide Web (WWW), text files and other documents are the actual sources of data. Data mining needs large volumes of historical data to be successful.
Organizations usually store data in databases or data warehouses. A data warehouse may combine one or more databases, text files, spreadsheets, or other kinds of information repositories. Sometimes the data resides only in plain text files or spreadsheets. The World Wide Web, or the Internet more broadly, is another big source of data.
The data needs to be cleaned, integrated, and selected before it is passed to the database or data warehouse server. Because the data comes from different sources and in different formats, it may be incomplete or unreliable, so it cannot be used directly for the data mining process. Cleaning and integration therefore come first.
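The cleaning and integration step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the field names (customer_id, amount), the two sample sources, and the function name clean_and_integrate are hypothetical and do not come from any particular system.

```python
def clean_and_integrate(*sources):
    """Merge records from several sources, dropping incomplete and duplicate rows."""
    required = ("customer_id", "amount")  # hypothetical schema, assumed for this sketch
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            # Integration: normalize field names that differ between formats.
            row = {str(k).strip().lower(): v for k, v in record.items()}
            # Cleaning: skip rows with missing required values.
            if any(row.get(field) in (None, "") for field in required):
                continue
            # Cleaning: skip rows whose amount is not numeric.
            try:
                amount = float(row["amount"])
            except (TypeError, ValueError):
                continue
            key = (str(row["customer_id"]).strip(), amount)
            # Integration: the same record may arrive from more than one source.
            if key in seen:
                continue
            seen.add(key)
            merged.append({"customer_id": key[0], "amount": key[1]})
    return merged


# Made-up sample data: one source looks like database rows, the other like a CSV export.
db_rows = [{"customer_id": 1, "amount": 20.0}, {"customer_id": 2, "amount": None}]
csv_rows = [{"Customer_ID ": "1", "Amount": "20.0"}, {"Customer_ID ": "3", "Amount": "7.5"}]
cleaned = clean_and_integrate(db_rows, csv_rows)
print(cleaned)
```

Here the incomplete record for customer 2 is dropped, and the record for customer 1 that appears in both sources is kept only once.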
The database or data warehouse server contains the actual data that is ready to be processed. The server is therefore responsible for retrieving the relevant data based on the data mining request of the user.
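As a rough sketch of what retrieving the relevant data can look like, the example below uses Python's built-in sqlite3 module with an in-memory database standing in for the server. The sales table, its columns, and the fetch_relevant function are assumptions made for illustration only.

```python
import sqlite3


def fetch_relevant(conn, region, min_year):
    """Return only the rows that match the user's mining request."""
    query = (
        "SELECT customer_id, amount FROM sales "
        "WHERE region = ? AND year >= ? ORDER BY customer_id"
    )
    # Parameters are bound by the driver rather than pasted into the SQL string.
    return conn.execute(query, (region, min_year)).fetchall()


# In-memory database with a hypothetical table, standing in for the warehouse server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "north", 2019, 20.0),
        (2, "south", 2021, 35.0),
        (3, "north", 2022, 7.5),
    ],
)
rows = fetch_relevant(conn, "north", 2020)
print(rows)
```

Only the subset of data that the request asks for leaves the server, so the mining engine does not have to scan everything that is stored.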
The data mining engine is the core component of any data mining system. It consists of several modules for performing data mining tasks such as association, classification, characterization, clustering, prediction, and time-series analysis.
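To make one of these tasks concrete, here is a minimal sketch of the counting step behind association analysis: it computes the support of every item pair, meaning the fraction of transactions that contain both items. The transactions are made up and the function name pair_support is hypothetical. A real engine would use a more scalable algorithm.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the support of each item pair across all transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and deduplicate so that each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


# Made-up market-basket data for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]
support = pair_support(transactions)
for pair, value in sorted(support.items()):
    print(pair, value)
```

In this toy data, bread and milk appear together in three of the five transactions, so that pair has the highest support.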
The pattern evaluation module is mainly responsible for measuring how interesting a pattern is, by comparing an interestingness measure against a threshold value. It interacts with the data mining engine to focus the search towards interesting patterns.
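A threshold-based evaluation can be sketched as a simple filter. The example assumes that support and confidence are the interestingness measures. The candidate rules, their scores, the default thresholds, and the function name interesting_patterns are all invented for illustration.

```python
def interesting_patterns(patterns, min_support=0.5, min_confidence=0.7):
    """Keep only the patterns whose measures reach both thresholds."""
    return [
        p
        for p in patterns
        if p["support"] >= min_support and p["confidence"] >= min_confidence
    ]


# Hypothetical candidate rules, as a mining engine might hand them over.
candidates = [
    {"rule": "bread -> milk", "support": 0.6, "confidence": 0.75},
    {"rule": "butter -> milk", "support": 0.4, "confidence": 0.67},
    {"rule": "milk -> bread", "support": 0.6, "confidence": 0.6},
]
kept = interesting_patterns(candidates)
print([p["rule"] for p in kept])
```

Raising or lowering the thresholds changes how many patterns survive, which is how the module can narrow the search to the interesting ones.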
The graphical user interface module provides the communication between the user and the data mining system. This module helps the user use the system easily and efficiently without knowing the real complexity behind the process. When the user specifies a query or a task, this module interacts with the data mining system and displays the result in an easily understandable manner.
The knowledge base is helpful in the whole data mining process. It might be useful for guiding the search or evaluating the interestingness of the result patterns. The knowledge base might even contain user beliefs and data from user experiences that can be useful in the process of data mining.
The data mining engine might get inputs from the knowledge base to make the result more accurate and reliable. The pattern evaluation module interacts with the knowledge base on a regular basis to get inputs and also to update it.
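The two-way interaction between pattern evaluation and the knowledge base can be sketched as follows. This is an assumption-heavy illustration, not a description of any real system: the KnowledgeBase class, its min_support field, and the rule that already-known patterns are not reported again are all hypothetical.

```python
class KnowledgeBase:
    """Holds a user-supplied threshold and the patterns already known."""

    def __init__(self, min_support):
        self.min_support = min_support  # stands in for domain knowledge or a user belief
        self.known_patterns = set()


def evaluate(patterns, kb):
    """Return patterns that pass the threshold and are not yet in the knowledge base."""
    novel = []
    for name, support in patterns.items():
        # Input from the knowledge base: the threshold and what is already known.
        if support >= kb.min_support and name not in kb.known_patterns:
            novel.append(name)
    # Update the knowledge base so later runs skip these patterns.
    kb.known_patterns.update(novel)
    return sorted(novel)


kb = KnowledgeBase(min_support=0.5)
first = evaluate({"bread -> milk": 0.6, "butter -> milk": 0.4}, kb)
second = evaluate({"bread -> milk": 0.6, "tea -> sugar": 0.55}, kb)
print(first)
print(second)
```

On the second run, the bread and milk rule is no longer reported because the first run already recorded it in the knowledge base.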