Tuesday, June 18, 2013

A common technique for finding patterns in data mining is association rule mining, which consists of finding frequent itemsets and then generating strong association rules from them. Associations can be analyzed further to uncover correlation rules, which convey statistical correlations between itemsets.
Frequent pattern mining can be categorized by the completeness of the patterns to be mined, the levels and dimensions of the data, the types of values handled, and the kinds of rules and patterns produced. It can also be classified into frequent itemset mining, sequential pattern mining, structured pattern mining, etc. Algorithms for frequent itemset mining fall into three types:
1) Apriori-like algorithms: The Apriori algorithm mines frequent itemsets for Boolean association rules. It relies on the property that all non-empty subsets of a frequent itemset must also be frequent: at the kth iteration, it forms frequent k-itemset candidates from the frequent (k-1)-itemsets.
2) Frequent-pattern-growth algorithms: FP-growth does not generate any candidates; it constructs a highly compact data structure (the FP-tree) and mines it by pattern fragment growth.
3) Vertical data format algorithms: These transform a data set of transactions from the horizontal data format of TID-itemset into the vertical data format of item-TID_set, then mine it using the Apriori property and additional optimization techniques such as diffset.
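The level-wise iteration described in item 1 can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the function name `apriori`, the absolute `min_support` count and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Count single items to find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data, chosen only to exercise the sketch.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(transactions, min_support=3)
```

With this data, {bread, milk, diapers} is generated as a candidate because all of its 2-subsets are frequent, but it is then discarded because only two transactions contain it.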
The same methods can be extended to mine closed frequent itemsets, from which the complete set of frequent itemsets can easily be derived. Mining closed itemsets relies on additional techniques such as item merging, sub-itemset pruning and item skipping.
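To see why the closed itemsets are enough, here is a small Python sketch. It applies the definition directly (an itemset is closed if no proper superset has the same support) rather than the item merging or pruning optimizations mentioned above; the function names and the support counts are hypothetical.

```python
def closed_itemsets(frequent):
    """Keep only the itemsets that have no proper superset with equal support."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }

def support_from_closed(itemset, closed):
    """Recover a support count as the largest support among closed supersets.

    Returns None when no closed superset exists, i.e. the itemset is not frequent.
    """
    supports = [c for s, c in closed.items() if itemset <= s]
    return max(supports) if supports else None

# Hypothetical supports for transactions {a,b,c}, {a,b,c}, {a,b}, {a}.
frequent = {
    frozenset("a"): 4, frozenset("b"): 3, frozenset("c"): 2,
    frozenset("ab"): 3, frozenset("ac"): 2, frozenset("bc"): 2,
    frozenset("abc"): 2,
}
closed = closed_itemsets(frequent)
```

Here seven frequent itemsets collapse to three closed ones ({a}, {a, b} and {a, b, c}), and the support of any dropped itemset such as {b} can still be read back with `support_from_closed`.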
These techniques can be extended to multilevel association rules and multidimensional association rules.
Techniques for mining multidimensional association rules can be categorized according to their treatment of quantitative attributes. For example, quantitative attributes can be discretized statically based on predefined concept hierarchies. Alternatively, quantitative association rules can be mined by discretizing the attributes dynamically through binning or clustering.
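As an illustration of dynamic discretization, the sketch below uses equal-width binning, one of several possible binning strategies, to turn a numeric attribute into interval items that an itemset miner can handle. The helper names, the `age` attribute and its values are assumptions made for the example.

```python
def equal_width_bins(values, n_bins):
    """Return (bin index per value, bin edges) for n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values), edges
    # The maximum value is clamped into the last bin.
    indices = [min(int((v - lo) / width), n_bins - 1) for v in values]
    return indices, edges

def to_items(name, indices, edges):
    """Turn bin indices into interval items such as 'age=[20,30)'."""
    last = len(edges) - 2
    items = []
    for i in indices:
        close = "]" if i == last else ")"  # only the last bin is closed on the right
        items.append(f"{name}=[{edges[i]:g},{edges[i + 1]:g}{close}")
    return items

# Hypothetical attribute values.
ages = [20, 25, 38, 41, 59, 60]
bins, edges = equal_width_bins(ages, n_bins=4)
items = to_items("age", bins, edges)
```

Each resulting item, for example `age=[20,30)`, can then be treated like any other Boolean item in a transaction. The bin edges come from the data itself, which is what distinguishes this from a static, predefined concept hierarchy.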
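Association rules should be augmented with a correlation measure such as lift, all_confidence or cosine, because support and confidence alone can present a rule as strong even when its items are negatively correlated.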
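The three measures can be computed from support counts alone. The following is a minimal sketch using their usual definitions; the function names and the counts (10,000 transactions, 6,000 containing A, 7,500 containing B, 4,000 containing both) are illustrative assumptions.

```python
from math import sqrt

def lift(n, sup_a, sup_b, sup_ab):
    """P(A and B) / (P(A) * P(B)); 1 means independent, below 1 negatively correlated."""
    return (sup_ab / n) / ((sup_a / n) * (sup_b / n))

def all_confidence(sup_a, sup_b, sup_ab):
    """sup(A and B) / max(sup(A), sup(B))."""
    return sup_ab / max(sup_a, sup_b)

def cosine(sup_a, sup_b, sup_ab):
    """sup(A and B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

# Hypothetical counts for two items A and B.
n, sup_a, sup_b, sup_ab = 10_000, 6_000, 7_500, 4_000

confidence = sup_ab / sup_a  # confidence of the rule A => B
l = lift(n, sup_a, sup_b, sup_ab)
ac = all_confidence(sup_a, sup_b, sup_ab)
cos = cosine(sup_a, sup_b, sup_ab)
```

With these counts the rule A => B has a confidence of about 67%, yet B occurs in 75% of all transactions, so knowing A actually lowers the chance of B. The lift of about 0.89 (below 1) makes that negative correlation visible.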
Constraint-based rule mining refines the search for rules by providing meta rules and constraints, which can be antimonotonic, monotonic, succinct, convertible or inconvertible. Association rules should not be used for prediction without training.
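An antimonotonic constraint is the easiest kind to exploit: once an itemset violates it, every superset violates it too, so the search can stop extending that itemset. The sketch below shows this for the constraint "total price is at most a budget", which is antimonotonic as long as prices are non-negative. It enumerates itemsets without counting support, and the function name, prices and budget are hypothetical.

```python
def itemsets_within_budget(prices, budget):
    """Enumerate, level by level, all itemsets whose total price is <= budget.

    Assumes non-negative prices, which makes the constraint antimonotonic.
    """
    items = sorted(prices)
    # Level 1: single items that satisfy the constraint.
    level = [(i,) for i in items if prices[i] <= budget]
    result = list(level)
    while level:
        next_level = []
        for itemset in level:
            # Extend only with items after the last one to avoid duplicates.
            start = items.index(itemset[-1]) + 1
            for item in items[start:]:
                candidate = itemset + (item,)
                # Violating candidates are dropped and never extended again.
                if sum(prices[x] for x in candidate) <= budget:
                    next_level.append(candidate)
        result.extend(next_level)
        level = next_level
    return result

# Hypothetical item prices.
prices = {"pen": 2, "book": 10, "lamp": 25, "bag": 40}
allowed = itemsets_within_budget(prices, budget=30)
```

Because "bag" alone exceeds the budget, no itemset containing it is ever generated. In a full miner the same check would sit next to the Apriori support test, pruning candidates before their support is counted.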
 
