Data Mining Task Primitives




A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives. These primitives allow the user to interactively communicate with the data mining system during the mining process to discover interesting patterns.

Here is the list of data mining task primitives:

  • Set of task relevant data to be mined.
  • Kind of knowledge to be mined.
  • Background knowledge to be used in discovery process.
  • Interestingness measures and thresholds for pattern evaluation.
  • Representation for visualizing the discovered patterns.
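As a rough illustration, the five primitives can be thought of as the fields of a single task specification. The sketch below is only one possible way to bundle them; the class name `MiningTask`, its field names, and all sample values are hypothetical and do not correspond to any particular data mining system.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container with one field per task primitive."""
    relevant_data: dict                                        # task-relevant data
    knowledge_kind: str                                        # kind of knowledge to mine
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # measures and thresholds
    presentation: str = "rules"                                # how to display patterns


# Sample values are assumptions chosen to match the Canada example in this page.
task = MiningTask(
    relevant_data={
        "source": "sales",
        "filter": {"country": "Canada"},
        "attributes": ["age", "income", "item"],
    },
    knowledge_kind="association",
    background_knowledge={"concept_hierarchies": ["age"]},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="table",
)
print(task)
```

Each field is described in its own section of this page.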

Set of task relevant data to be mined

This specifies the portions of the database or the set of data in which the user is interested.

This portion includes the following:

  • Database Attributes
  • Data Warehouse dimensions of interest

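For example, suppose that you are a manager of All Electronics in charge of sales in the United States and Canada, and you would like to study the buying trends of customers in Canada. Rather than mining the entire database, you can specify that only the data relating to customer purchases in Canada be retrieved, along with the related customer profile attributes. These are referred to as relevant attributes.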
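A minimal sketch of this selection step is shown below, assuming the sales data is available as a list of records. The field names, the sample rows, and the helper `select_task_relevant` are all hypothetical and exist only to illustrate filtering rows and keeping the relevant attributes.

```python
# Hypothetical sales records; field names and values are illustrative only.
sales = [
    {"customer_id": 1, "country": "Canada", "age": 34, "income": 52000, "item": "laptop"},
    {"customer_id": 2, "country": "United States", "age": 45, "income": 61000, "item": "printer"},
    {"customer_id": 3, "country": "Canada", "age": 27, "income": 38000, "item": "camera"},
]

# Relevant attributes chosen by the user (an assumption for this sketch).
RELEVANT_ATTRIBUTES = ("customer_id", "age", "income", "item")


def select_task_relevant(records, country, attributes):
    """Keep only rows for the given country, projected onto the chosen attributes."""
    return [
        {name: row[name] for name in attributes}
        for row in records
        if row["country"] == country
    ]


canada_data = select_task_relevant(sales, "Canada", RELEVANT_ATTRIBUTES)
for row in canada_data:
    print(row)
```

In a real system this step would more likely be a database query or a data warehouse slice; the list comprehension here only mirrors the idea.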

Kind of knowledge to be mined

This specifies the data mining functions to be performed, such as

  • Characterization & Discrimination
  • Association
  • Classification
  • Clustering
  • Prediction
  • Outlier analysis

For instance, if studying the buying habits of customers in Canada, you may choose to mine associations between customer profiles and the items that these customers like to buy.

Background knowledge to be used in discovery process

Users can specify background knowledge, or knowledge about the domain to be mined. This knowledge is useful for guiding the knowledge discovery process and for evaluating the patterns found. User beliefs about relationships in the data are another form of background knowledge.

There are several kinds of background knowledge. Concept hierarchies are a popular form of background knowledge, which allow data to be mined at multiple levels of abstraction.

Example:

An example of a concept hierarchy for the attribute (or dimension) age is shown in the following figure.

Figure: Concept hierarchy for the attribute age

In this hierarchy, the root node represents the most general abstraction level, denoted as all.
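To make the idea of multiple abstraction levels concrete, here is a small sketch of a concept hierarchy for age. The group names, the age ranges, and the function `generalize_age` are assumptions made for illustration; they are not taken from the figure.

```python
# Hypothetical concept hierarchy for age: all -> named groups -> raw ages.
# The cut-off values below are assumptions, not values from the figure.
AGE_GROUPS = {
    "young": (20, 39),
    "middle_aged": (40, 59),
    "senior": (60, 89),
}


def generalize_age(age, level):
    """Map a raw age to its concept at the requested abstraction level."""
    if level == "all":
        return "all"  # root node: the most general level
    if level == "group":
        for name, (low, high) in AGE_GROUPS.items():
            if low <= age <= high:
                return name
        return "unknown"  # age falls outside every assumed range
    if level == "raw":
        return age  # leaf level: no generalization
    raise ValueError(f"unsupported level: {level}")


for level in ("raw", "group", "all"):
    print(level, generalize_age(34, level))
```

Mining at the "group" level would treat all customers aged 20 to 39 as one value, which is how a hierarchy lets the same data be analyzed at a coarser granularity.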

Interestingness measures and thresholds for pattern evaluation

Interestingness measures are used to separate interesting patterns from uninteresting ones. They may be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures.

For example, interestingness measures for association rules include support and confidence.
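The sketch below computes support and confidence for a rule using their standard definitions: support is the fraction of transactions that contain all items in the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The transactions, the threshold values, and the helper `is_interesting` are hypothetical.

```python
# Hypothetical market-basket transactions for illustration.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "printer"},
    {"laptop"},
    {"printer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), transactions) / base


def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """Apply user-specified thresholds to a single rule."""
    rule_items = set(antecedent) | set(consequent)
    return (
        support(rule_items, transactions) >= min_support
        and confidence(antecedent, consequent, transactions) >= min_confidence
    )


print(support({"laptop", "mouse"}, transactions))
print(confidence({"laptop"}, {"mouse"}, transactions))
print(is_interesting({"laptop"}, {"mouse"}, transactions, 0.4, 0.6))
```

Rules that fall below either threshold would be discarded as uninteresting under this scheme.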

Representation for visualizing the discovered patterns

This refers to the form in which discovered patterns are to be displayed. Users can choose from different forms for knowledge presentation, such as

  • rules, tables, reports, charts, graphs, decision trees, and cubes.
