Knowledge Discovery from Data (KDD)




The purpose of data mining is to extract useful information from large datasets and use it to make predictions or support better decision-making. Today, data mining is used almost everywhere that large amounts of data are stored and processed.

Examples include the banking sector, market basket analysis, and network intrusion detection.

Data mining is also known as Knowledge Discovery from Data, or KDD.

Knowledge Discovery from Data (KDD) Process

KDD is a process that involves the extraction of useful, previously unknown, and potentially valuable information from large datasets.

The KDD process is iterative: the steps listed below usually have to be repeated several times to extract accurate knowledge from the data.



The KDD process includes the following steps:

  1. Data Cleaning
  2. Data Integration
  3. Data Selection
  4. Data Transformation
  5. Data Mining
  6. Pattern Evaluation
  7. Knowledge Representation

Data Cleaning

Data cleaning is the removal of noisy, irrelevant, or inconsistent data from the data collection. It covers:

  • Handling missing values.
  • Cleaning noisy data, where noise is a random error or variance in a measured value.

In this step, noise and inconsistent data are removed.
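
The two kinds of cleaning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `fill_missing` and `drop_out_of_range`, the choice of the median as the fill value, the plausible range 0 to 120, and the sample ages are all hypothetical and not part of the KDD definition.

```python
from statistics import median

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def drop_out_of_range(values, low, high):
    """Discard values outside the plausible range [low, high], treating them as noise."""
    return [v for v in values if low <= v <= high]

# Hypothetical ages: one missing value and one obvious entry error (420).
ages = [25, None, 31, 29, 420, 27]
cleaned = drop_out_of_range(fill_missing(ages), low=0, high=120)
print(cleaned)
```

The median is used here rather than the mean because a single extreme value such as 420 would otherwise distort the fill value. Other strategies (dropping incomplete records, smoothing by binning) may suit other datasets better.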


Data Integration

Data integration is the combination of heterogeneous data from multiple sources into a common store, such as a data warehouse.

In other words, multiple data sources may be combined into a single data source.

A popular trend in the information industry is to perform data cleaning and data integration as a data preprocessing step, where the resulting data are stored in a data warehouse.
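
As a rough sketch of integration, the snippet below joins two in-memory sources on a shared key. The function `integrate`, the `customer_id` key, and the sample rows are assumptions made for illustration. A real data warehouse load would also have to reconcile schemas, units, and duplicate entities.

```python
def integrate(customers, orders):
    """Join each order to its customer record on the shared customer_id key."""
    by_id = {c["customer_id"]: c for c in customers}
    merged = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:  # orders without a matching customer are skipped
            merged.append({**customer, **order})
    return merged

# Two hypothetical sources: a CRM export and a sales log.
crm = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
sales = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 3, "amount": 15.0},  # no such customer in the CRM
    {"customer_id": 1, "amount": 5.5},
]
warehouse = integrate(crm, sales)
print(warehouse)
```

Skipping unmatched orders is one design choice (an inner join). Keeping them with empty customer fields would be the outer-join alternative.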


Data Selection

Data selection is the process of identifying the data relevant to the analysis and retrieving it from the data collection.
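
Selection can be thought of as keeping only the relevant rows and the relevant attributes. The sketch below assumes records stored as dictionaries. The helper `select`, the field names, and the "north region" criterion are hypothetical.

```python
def select(records, columns, predicate):
    """Keep rows that satisfy predicate, projected onto the given columns."""
    return [{c: r[c] for c in columns} for r in records if predicate(r)]

# Hypothetical integrated records.
records = [
    {"id": 1, "region": "north", "amount": 120.0, "note": "gift"},
    {"id": 2, "region": "south", "amount": 80.0, "note": ""},
    {"id": 3, "region": "north", "amount": 45.0, "note": "promo"},
]

# Only northern sales, and only the attributes needed for the analysis.
relevant = select(records, ["id", "amount"], lambda r: r["region"] == "north")
print(relevant)
```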


Data Transformation

Data transformation is the process of converting data into the form required by the mining procedure. This step can involve reducing the dimensionality of the data, aggregating it, normalizing it, and discretizing it to prepare it for further analysis.
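
Two of the transformations mentioned above, normalization and discretization, are sketched below. Min-max scaling is just one common way to normalize, and the cut points 30 and 60, the labels, and the sample incomes are assumptions chosen for the example.

```python
from bisect import bisect_right

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, edges, labels):
    """Map each value to a bin label; edges are sorted cut points and
    len(labels) must equal len(edges) + 1. A value equal to an edge
    falls into the upper bin."""
    return [labels[bisect_right(edges, v)] for v in values]

incomes = [20, 50, 80]  # hypothetical incomes, in thousands
scaled = min_max_normalize(incomes)
bands = discretize(incomes, edges=[30, 60], labels=["low", "medium", "high"])
print(scaled, bands)
```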


Data Mining

This is the heart of the KDD process and involves applying various data mining techniques to the transformed data to discover hidden patterns, trends, relationships, and insights. A few of the most common data mining techniques include clustering, classification, association rule mining, and anomaly detection.
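
To make one of these techniques concrete, here is a minimal sketch of the first stage of association rule mining: counting how often pairs of items occur together and keeping the pairs whose support reaches a threshold. The function name `frequent_pairs`, the baskets, and the 0.5 threshold are illustrative assumptions. Full algorithms such as Apriori extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least
    min_support (a fraction) of the transactions."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "jam"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```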


Pattern Evaluation

After data mining, the next step is to evaluate the discovered patterns to determine their usefulness and relevance. This involves assessing the quality of the patterns, evaluating their significance, and selecting the most promising patterns for further analysis.
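
For association rules, pattern quality is commonly assessed with support, confidence, and lift. The sketch below computes all three for a single rule of the form "antecedent implies consequent". The function `rule_metrics` and the baskets are hypothetical, and these measures are only one possible set of evaluation criteria.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    n_a = sum(1 for s in sets if antecedent in s)
    n_c = sum(1 for s in sets if consequent in s)
    n_both = sum(1 for s in sets if antecedent in s and consequent in s)
    if n_a == 0 or n_c == 0:
        raise ValueError("both items must occur in at least one transaction")
    support = n_both / n            # fraction of baskets containing both items
    confidence = n_both / n_a       # fraction of antecedent baskets that also hold the consequent
    lift = confidence / (n_c / n)   # confidence relative to the consequent's overall frequency
    return support, confidence, lift

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "jam"],
]
metrics = rule_metrics(baskets, "butter", "milk")
print(metrics)
```

In this toy data the rule has perfect confidence but a lift of 1.0, because milk appears in every basket anyway. This is why confidence alone can overstate how interesting a pattern is.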


Knowledge Representation

This step involves representing the knowledge extracted from the data in a way humans can easily understand and use. This can be done through visualizations, reports, or other forms of communication that provide meaningful insights into the data.
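
A plain-text report is perhaps the simplest form of knowledge representation. The sketch below turns mined rules into readable sentences. The function `render_report`, the tuple layout (antecedent, consequent, confidence), and the sample rules are assumptions for illustration. Charts or dashboards would serve the same purpose.

```python
def render_report(rules):
    """Format (antecedent, consequent, confidence) tuples as a readable report,
    strongest rule first."""
    lines = ["Discovered rules (sorted by confidence):"]
    for antecedent, consequent, confidence in sorted(
        rules, key=lambda rule: rule[2], reverse=True
    ):
        lines.append(
            f"  {antecedent} -> {consequent}: {confidence:.0%} of baskets "
            f"with {antecedent} also contain {consequent}"
        )
    return "\n".join(lines)

# Hypothetical rules produced by earlier steps.
rules = [("milk", "bread", 0.75), ("butter", "milk", 1.0)]
report = render_report(rules)
print(report)
```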

