Association rule mining, also called frequent item set mining, finds interesting associations and correlations among items in large transactional or relational data sets. An association rule describes how often a group of items occurs together in transactions. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It allows retailers to identify relationships between the items that people frequently buy together.
This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.
The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket? This information can lead to increased sales by helping retailers do selective marketing and plan their shelf space.
Understanding these buying patterns can help to increase sales in several ways.
Association rule: If there is a pair of items, X and Y, which are frequently bought together, then the association rule is represented as X ⇒ Y.
For example, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented as Computer ⇒ Antivirus Software.
Association rule analysis is a technique for discovering how items are associated with each other. Three measures are used to assess the interestingness of an association rule:
Support: The support of an item / item set is the number of transactions in which the item / item set appears, divided by the total number of transactions.
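Formula:
Support {A, B} = Frequency(A, B) / N
Where A and B are items, Frequency(A, B) is the number of transactions that contain both A and B, and N is the total number of transactions.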
Table-1: Example Transactions
TID | Items |
---|---|
T1 | Bread, Coke, Milk |
T2 | Popcorn, Bread |
T3 | Bread, Egg, Milk |
T4 | Egg, Bread, Coke, Milk |
T5 | Egg, Apple |
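Example: Support of item Coke (it appears in T1 and T4):
Support {Coke} = 2 / 5 = 0.4
Example: Support of item set {Bread, Milk} (it appears in T1, T3 and T4):
Support {Bread, Milk} = 3 / 5 = 0.6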
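The support calculation above can be sketched in a few lines of Python. This is only an illustration: the `support` helper and the set-based representation of Table-1 are assumptions made for this example, not part of any particular library.

```python
# Transactions from Table-1, each stored as a set of items.
transactions = [
    {"Bread", "Coke", "Milk"},          # T1
    {"Popcorn", "Bread"},               # T2
    {"Bread", "Egg", "Milk"},           # T3
    {"Egg", "Bread", "Coke", "Milk"},   # T4
    {"Egg", "Apple"},                   # T5
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

print(support({"Coke"}, transactions))           # 2 of 5 transactions
print(support({"Bread", "Milk"}, transactions))  # 3 of 5 transactions
```

Storing each transaction as a set makes the "contains every item" check a single subset comparison.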
Confidence: This measures how likely item B is to be purchased when item A is purchased, expressed as {A → B}. The confidence of {A → B} is the number of transactions in which both A and B appear, divided by the number of transactions in which A appears.
Formula:
Confidence {A → B} = Support {A, B} / Support {A}
Example: From Table-1, the confidence of {Bread → Milk} is computed as follows:
Support {Bread, Milk} = 3 / 5 = 0.6
Support {Bread} = 4 / 5 = 0.8
Confidence {Bread → Milk} = 0.6 / 0.8 = 0.75
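As a rough sketch, the confidence of a rule can be computed directly from transaction counts, which is equivalent to dividing the two supports because the total N cancels out. The helper names `count` and `confidence` are hypothetical and chosen only for this example.

```python
# Transactions from Table-1, each stored as a set of items.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Popcorn", "Bread"},
    {"Bread", "Egg", "Milk"},
    {"Egg", "Bread", "Coke", "Milk"},
    {"Egg", "Apple"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction;
    otherwise the division below raises ZeroDivisionError.
    """
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

print(confidence({"Bread"}, {"Milk"}, transactions))  # 3 / 4 = 0.75
```

Note that confidence is not symmetric: on the same data, {Milk → Bread} gives 3 / 3 = 1.0, because every transaction containing Milk also contains Bread.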
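Lift: This measures how much more likely item B is to be purchased when item A is purchased, compared with how often B is purchased overall, expressed for an association rule {A → B}. Unlike confidence, it takes the overall popularity of B into account. Lift is a measure of the performance of an association rule (targeting model).
If the lift value is greater than 1, item B is likely to be bought if item A is bought. If it is less than 1, item B is unlikely to be bought if item A is bought. If it is equal to 1, there is no association between A and B.
Formula:
Lift {A → B} = Support {A, B} / (Support {A} × Support {B})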
Example: From Table-1, the lift of {Bread → Milk} is computed as follows:
Support {Bread, Milk} = 3 / 5 = 0.6
Support {Bread} = 4 / 5 = 0.8
Support {Milk} = 3 / 5 = 0.6
Lift {Bread → Milk} = 0.6 / (0.8 × 0.6) = 1.25
A lift value greater than 1 means that item Milk is likely to be bought if item Bread is bought.
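The lift computation can be sketched the same way. The example below works with raw counts instead of the rounded supports, using the algebraically equivalent form lift = (n_AB × N) / (n_A × n_B), which avoids small floating-point errors. The helper names are assumptions made for illustration.

```python
# Transactions from Table-1, each stored as a set of items.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Popcorn", "Bread"},
    {"Bread", "Egg", "Milk"},
    {"Egg", "Bread", "Coke", "Milk"},
    {"Egg", "Apple"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent, transactions):
    """Lift of antecedent -> consequent.

    Equivalent to support(A, B) / (support(A) * support(B)).
    Assumes both sides occur in at least one transaction.
    """
    n = len(transactions)
    n_a = count(antecedent, transactions)
    n_b = count(consequent, transactions)
    n_ab = count(set(antecedent) | set(consequent), transactions)
    return (n_ab * n) / (n_a * n_b)

value = lift({"Bread"}, {"Milk"}, transactions)
print(value)  # (3 * 5) / (4 * 3) = 1.25
print("positive association" if value > 1 else "no positive association")
```

Unlike confidence, lift is symmetric: swapping the antecedent and the consequent gives the same value.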
Example: Find the Support, Confidence and Lift measures for the following transactional data set.
Table-2: Example Transactions
TID | Items |
---|---|
T1 | Bread, Milk |
T2 | Bread, Diaper, Burger, Eggs |
T3 | Milk, Diaper, Burger, Coke |
T4 | Bread, Milk, Diaper, Burger |
T5 | Bread, Milk, Diaper, Coke |
Number of transactions = 5.
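Support:
Support {Bread} = 4 / 5 = 0.8
Support {Milk} = 4 / 5 = 0.8
Support {Diaper} = 4 / 5 = 0.8
Support {Burger} = 3 / 5 = 0.6
Support {Bread, Milk} = 3 / 5 = 0.6
Support {Milk, Burger} = 2 / 5 = 0.4
Support {Milk, Diaper} = 3 / 5 = 0.6
Support {Bread, Milk, Diaper} = 2 / 5 = 0.4
Support {Milk, Diaper, Burger} = 2 / 5 = 0.4
Confidence:
Confidence {Bread → Milk} = 0.6 / 0.8 = 0.75
Confidence {Milk → Burger} = 0.4 / 0.8 = 0.5
Confidence {Bread, Milk → Diaper} = 0.4 / 0.6 ≈ 0.67
Confidence {Milk, Diaper → Burger} = 0.4 / 0.6 ≈ 0.67
Lift:
Lift {Bread → Milk} = 0.6 / (0.8 × 0.8) ≈ 0.94
A lift value less than 1 means that item ‘Milk’ is unlikely to be bought if item ‘Bread’ is bought.
Lift {Milk → Burger} = 0.4 / (0.8 × 0.6) ≈ 0.83
A lift value less than 1 means that item ‘Burger’ is unlikely to be bought if item ‘Milk’ is bought.
Lift {Bread, Milk → Diaper} = 0.4 / (0.6 × 0.8) ≈ 0.83
A lift value less than 1 means that item ‘Diaper’ is unlikely to be bought if item set ‘Bread, Milk’ is bought.
Lift {Milk, Diaper → Burger} = 0.4 / (0.6 × 0.6) ≈ 1.11
A lift value greater than 1 means that item ‘Burger’ is likely to be bought if item set ‘Milk, Diaper’ is bought.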
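To check the Table-2 rules by machine, the three measures can be combined into one helper. This is a minimal sketch under the assumption that each transaction is stored as a Python set; `rule_metrics` is a hypothetical name, not a library function.

```python
# Transactions from Table-2, each stored as a set of items.
transactions = [
    {"Bread", "Milk"},                      # T1
    {"Bread", "Diaper", "Burger", "Eggs"},  # T2
    {"Milk", "Diaper", "Burger", "Coke"},   # T3
    {"Bread", "Milk", "Diaper", "Burger"},  # T4
    {"Bread", "Milk", "Diaper", "Coke"},    # T5
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of antecedent -> consequent.

    Assumes both sides occur in at least one transaction.
    """
    n = len(transactions)
    n_a = count(antecedent, transactions)
    n_b = count(consequent, transactions)
    n_ab = count(set(antecedent) | set(consequent), transactions)
    return {
        "support": n_ab / n,
        "confidence": n_ab / n_a,
        "lift": (n_ab * n) / (n_a * n_b),
    }

# The four rules interpreted in the text above.
rules = [
    ({"Bread"}, {"Milk"}),
    ({"Milk"}, {"Burger"}),
    ({"Bread", "Milk"}, {"Diaper"}),
    ({"Milk", "Diaper"}, {"Burger"}),
]

for antecedent, consequent in rules:
    m = rule_metrics(antecedent, consequent, transactions)
    verdict = "likely" if m["lift"] > 1 else "unlikely"
    rounded = {name: round(v, 2) for name, v in m.items()}
    print(sorted(antecedent), "->", sorted(consequent), rounded, verdict)
```

Only the last rule, {Milk, Diaper → Burger}, has a lift above 1 on this data set; the other three fall below 1.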