The Apriori algorithm was the first algorithm proposed for frequent itemset mining. It was introduced by R. Agrawal and R. Srikant.
The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties.
The Apriori algorithm uses frequent itemsets to generate association rules. To improve the efficiency of level-wise generation of frequent itemsets, it relies on an important property, called the Apriori property, which reduces the search space: every non-empty subset of a frequent itemset must itself be frequent.
The Apriori algorithm is a sequence of steps followed to find the most frequent itemsets in a given database. A minimum support threshold is either given in the problem or assumed by the user.
The steps followed in the Apriori Algorithm of data mining are:
Join Step: This step generates candidate (K+1)-itemsets by joining the frequent K-itemsets with each other.
Prune Step: This step scans the database to count the support of each candidate itemset. If a candidate does not meet the minimum support, it is regarded as infrequent and removed. This step reduces the size of the candidate itemsets.
The join and prune steps above are repeated iteratively until no further frequent itemsets can be found.
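The join/prune loop described above can be sketched in Python as follows (a minimal illustration, not an optimized implementation; the names `apriori`, `Ck`, and `Lk` are ours):

```python
from collections import Counter
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise frequent-itemset mining: a minimal sketch of the
    join and prune steps described above."""
    # C1: support count of every single item
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    # L1: prune items below min_sup
    Lk = {s for s, n in counts.items() if n >= min_sup}
    frequent = {s: counts[s] for s in Lk}
    k = 2
    while Lk:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # Prune step: count each candidate in the database, drop those below min_sup
        counts = {c: sum(1 for t in transactions if c <= t) for c in Ck}
        Lk = {c for c, n in counts.items() if n >= min_sup}
        frequent.update({c: counts[c] for c in Lk})
        k += 1
    return frequent
```

Each transaction is represented as a set of items, and the function returns every frequent itemset together with its support count.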
Consider the following dataset: find the frequent itemsets and generate association rules for them. Assume a minimum support threshold (s = 50%) and a minimum confidence threshold (c = 80%).
Transaction | List of items |
---|---|
T1 | I1, I2, I3 |
T2 | I2, I3, I4 |
T3 | I4, I5 |
T4 | I1, I2, I4 |
T5 | I1, I2, I3, I5 |
T6 | I1, I2, I3, I4 |
Finding frequent itemsets:
Support threshold = 50% ⇒ 0.5 × 6 = 3 ⇒ min_sup = 3
Step-1:
(i) Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).
Item | Count |
---|---|
I1 | 4 |
I2 | 5 |
I3 | 4 |
I4 | 4 |
I5 | 2 |
(ii) Prune Step: Compare each candidate item's support count with the minimum support count. The table above shows that item I5 does not meet min_sup = 3, so it is removed; only I1, I2, I3, and I4 meet the min_sup count.
This gives us the following itemset L1.
Item | Count |
---|---|
I1 | 4 |
I2 | 5 |
I3 | 4 |
I4 | 4 |
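This first counting-and-pruning pass can be reproduced in a few lines of Python (a sketch; `C1` and `L1` follow the naming used above):

```python
from collections import Counter

# Transactions T1..T6 from the table above
transactions = [
    {"I1", "I2", "I3"}, {"I2", "I3", "I4"}, {"I4", "I5"},
    {"I1", "I2", "I4"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3", "I4"},
]
min_sup = 3

# C1: support count of every single item in the dataset
C1 = Counter(item for t in transactions for item in t)
# Prune step: keep only items meeting min_sup -> L1
L1 = {item: n for item, n in C1.items() if n >= min_sup}
```

Running this reproduces the two tables above: I5 has a count of 2 and is dropped from L1.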
Step-2:
(i) Join step: Generate candidate set C2 (2-itemsets) using L1, and find the occurrences of each 2-itemset in the given dataset.
Item | Count |
---|---|
I1, I2 | 4 |
I1, I3 | 3 |
I1, I4 | 2 |
I2, I3 | 4 |
I2, I4 | 3 |
I3, I4 | 2 |
(ii) Prune Step: Compare each candidate itemset's support count with the minimum support count. The table above shows that itemsets {I1, I4} and {I3, I4} do not meet min_sup = 3, so they are removed.
This gives us the following itemset L2.
Item | Count |
---|---|
I1, I2 | 4 |
I1, I3 | 3 |
I2, I3 | 4 |
I2, I4 | 3 |
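The join step for 2-itemsets is simply "every pair of frequent items". A short sketch of how C2 and L2 above could be computed (variable names are ours):

```python
from itertools import combinations

# Transactions T1..T6 from the worked example
transactions = [
    {"I1", "I2", "I3"}, {"I2", "I3", "I4"}, {"I4", "I5"},
    {"I1", "I2", "I4"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3", "I4"},
]
min_sup = 3
L1_items = ["I1", "I2", "I3", "I4"]  # frequent 1-itemsets from the prune step

# Join step: C2 = all 2-item combinations of L1, counted against the dataset
C2 = {pair: sum(1 for t in transactions if set(pair) <= t)
      for pair in combinations(L1_items, 2)}
# Prune step: drop pairs below min_sup -> L2
L2 = {pair: n for pair, n in C2.items() if n >= min_sup}
```

As in the tables above, {I1, I4} and {I3, I4} each appear only twice and are pruned.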
Step-3:
(i) Join step: Generate candidate set C3 (3-itemsets) using L2, and find the occurrences of each 3-itemset in the given dataset.
Item | Count |
---|---|
I1, I2, I3 | 3 |
I1, I2, I4 | 2 |
I1, I3, I4 | 1 |
I2, I3, I4 | 2 |
(ii) Prune Step: Compare each candidate itemset's support count with the minimum support count. The table above shows that itemsets {I1, I2, I4}, {I1, I3, I4}, and {I2, I3, I4} do not meet min_sup = 3, so they are removed. Only the itemset {I1, I2, I3} meets the min_sup count.
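It is worth noting that the Apriori property lets us discard most of these candidates without counting them at all: joining L2 with itself never produces {I1, I3, I4}, and {I1, I2, I4} and {I2, I3, I4} each contain an infrequent 2-subset ({I1, I4} and {I3, I4} respectively). A sketch of this subset-based pruning (variable names are ours):

```python
from itertools import combinations

# Frequent 2-itemsets L2 from the prune step above
L2 = [{"I1", "I2"}, {"I1", "I3"}, {"I2", "I3"}, {"I2", "I4"}]

# Join step: union pairs of L2 itemsets that produce a 3-itemset
C3 = {frozenset(a | b) for a in L2 for b in L2 if len(a | b) == 3}
# Apriori-property prune: every 2-subset of a candidate must itself be in L2
C3_pruned = {c for c in C3
             if all(set(s) in L2 for s in combinations(sorted(c), 2))}
```

Only {I1, I2, I3} survives, so only one candidate needs its support counted against the database.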
Thus, we have discovered all the frequent itemsets. Now we need to generate strong association rules (those satisfying the minimum confidence threshold) from the frequent itemsets. For that we need to calculate the confidence of each rule: confidence(A ⇒ B) = support(A ∪ B) / support(A).
The given Confidence threshold is 80%.
All possible association rules from the frequent itemset {I1, I2, I3} are:
{I1, I2} ⇒ {I3}
= (3/ 4)* 100 = 75% (Rejected)
{I1, I3} ⇒ {I2}
= (3/ 3)* 100 = 100% (Selected)
{I2, I3} ⇒ {I1}
= (3/ 4)* 100 = 75% (Rejected)
{I1} ⇒ {I2, I3}
= (3/ 4)* 100 = 75% (Rejected)
{I2} ⇒ {I1, I3}
= (3/ 5)* 100 = 60% (Rejected)
{I3} ⇒ {I1, I2}
= (3/ 4)* 100 = 75% (Rejected)
This shows that the association rule {I1, I3} ⇒ {I2} is strong when the minimum confidence threshold is 80%.
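The rule enumeration above can be automated by taking every non-empty proper subset of the frequent itemset as an antecedent (a sketch; `support` and the variable names are ours):

```python
from itertools import combinations

# Transactions T1..T6 from the worked example
transactions = [
    {"I1", "I2", "I3"}, {"I2", "I3", "I4"}, {"I4", "I5"},
    {"I1", "I2", "I4"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3", "I4"},
]

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

freq = frozenset({"I1", "I2", "I3"})
min_conf = 0.80
# Every non-empty proper subset A of freq yields a rule A => (freq - A)
# with confidence support(freq) / support(A)
for r in range(1, len(freq)):
    for A in map(frozenset, combinations(sorted(freq), r)):
        conf = support(freq) / support(A)
        verdict = "Selected" if conf >= min_conf else "Rejected"
        print(sorted(A), "=>", sorted(freq - A), f"{conf:.0%} ({verdict})")
```

Running this reproduces the six confidence values above, with only {I1, I3} ⇒ {I2} selected.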
TID | Items |
---|---|
T1 | I1, I2, I5 |
T2 | I2, I4 |
T3 | I2, I3 |
T4 | I1, I2, I4 |
T5 | I1, I3 |
T6 | I2, I3 |
T7 | I1, I3 |
T8 | I1, I2, I3, I5 |
T9 | I1, I2, I3 |
Consider the above dataset: find the frequent itemsets and generate association rules for them. Assume a minimum support count of 2 and a minimum confidence threshold (c = 50%).
TID | Items |
---|---|
T1 | {milk, bread} |
T2 | {bread, sugar} |
T3 | {bread, butter} |
T4 | {milk, bread, sugar} |
T5 | {milk, bread, butter} |
T6 | {milk, bread, butter} |
T7 | {milk, sugar} |
T8 | {milk, sugar} |
T9 | {sugar, butter} |
T10 | {milk, sugar, butter} |
T11 | {milk, bread, butter} |
Consider the above dataset: find the frequent itemsets and generate association rules for them. Assume a minimum support count of 3 and a minimum confidence threshold (c = 60%).