Sunday, June 23, 2013

A closer look at decision tree induction
A decision tree can be built from training data using an algorithm of the kind shown below. Each non-leaf node denotes a test on an attribute, and each leaf node denotes a class label. To classify a tuple, its attribute values are tested along a path from the root until a leaf assigns it a class label. Decision trees can be applied to high-dimensional data because many attributes can be incorporated into the tree.
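To make the classification step concrete, here is a minimal sketch of walking a tuple down a tree: internal nodes test an attribute, leaves hold a class label. The dictionary node layout and the "outlook"/"humidity" attributes are illustrative assumptions, not from the post.

```python
def classify(node, tuple_):
    """Follow the branches matching the tuple's attribute values to a leaf."""
    while not node["is_leaf"]:
        value = tuple_[node["attribute"]]   # test the attribute at this node
        node = node["branches"][value]      # follow the matching branch
    return node["label"]

# A toy tree: the root tests "outlook"; the "sunny" branch tests "humidity".
tree = {
    "is_leaf": False, "attribute": "outlook",
    "branches": {
        "overcast": {"is_leaf": True, "label": "play"},
        "rain": {"is_leaf": True, "label": "no play"},
        "sunny": {
            "is_leaf": False, "attribute": "humidity",
            "branches": {
                "high": {"is_leaf": True, "label": "no play"},
                "normal": {"is_leaf": True, "label": "play"},
            },
        },
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # play
```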
The tree is built from class-labeled tuples in the training data. At each node, an attribute selection measure chooses the attribute that best partitions the tuples into individual classes. The inputs to the algorithm are the data partition D, the list of candidate attributes, and the attribute selection method.
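The post does not name a specific attribute selection measure; information gain, the measure used by ID3, is one common choice and is sketched here as an assumption. It scores an attribute by how much partitioning on it reduces the entropy of the class labels.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(tuples, attribute, label_key="class"):
    """Reduction in class-label entropy from partitioning on one attribute."""
    labels = [t[label_key] for t in tuples]
    partitions = {}
    for t in tuples:
        partitions.setdefault(t[attribute], []).append(t[label_key])
    weighted = sum(len(p) / len(tuples) * entropy(p)
                   for p in partitions.values())
    return entropy(labels) - weighted

# Toy data: "outlook" separates the classes perfectly, so gain = 1.0.
D = [
    {"outlook": "sunny", "class": "no"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "overcast", "class": "yes"},
    {"outlook": "rain", "class": "yes"},
]
print(information_gain(D, "outlook"))  # 1.0
```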
Generate_Decision_Tree(D, attribute_list):
First we create a node N.
If all the tuples in D are of the same class C, then return N as a leaf node labeled with class C.
If attribute_list is empty, then return N as a leaf node labeled with the majority class in D (majority voting).
Apply attribute_selection_method(D, attribute_list) to find the best splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the best way to partition the tuples, and which branches to grow from node N for the outcomes of the chosen test, so that the resulting partitions are as pure as possible. A partition is pure if all of the tuples in it belong to the same class.
Label node N with the splitting criterion.
A branch is grown from node N for each outcome of the splitting criterion, and the tuples in D are partitioned accordingly.
If the splitting attribute is discrete-valued and multiway splits are allowed, then remove the splitting attribute from attribute_list, since it cannot usefully be tested again further down the tree.
foreach outcome j of the splitting criterion
  let Dj be the set of tuples in D satisfying outcome j
  if Dj is empty, then attach a leaf labeled with the majority class in D to node N;
  else attach the node returned by Generate_Decision_Tree(Dj, attribute_list) to node N;
end for
return N
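The pseudocode above can be turned into a short runnable Python sketch, assuming discrete-valued attributes and representing nodes as dictionaries; the attribute selection measure is left as a plug-in parameter rather than fixed, since the post does not commit to one. One simplification: the loop iterates over attribute values actually present in D, so the empty-partition case never arises here.

```python
from collections import Counter

def majority_class(D, label_key="class"):
    """Most frequent class label among the tuples in D."""
    return Counter(t[label_key] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select, label_key="class"):
    labels = {t[label_key] for t in D}
    if len(labels) == 1:                  # all tuples share one class: leaf
        return {"label": labels.pop()}
    if not attribute_list:                # no attributes left: majority vote
        return {"label": majority_class(D, label_key)}
    attr = select(D, attribute_list)      # best splitting criterion
    node = {"attribute": attr, "branches": {}}
    # Discrete-valued multiway split: remove the splitting attribute.
    remaining = [a for a in attribute_list if a != attr]
    for value in {t[attr] for t in D}:    # one branch per observed outcome
        Dj = [t for t in D if t[attr] == value]
        node["branches"][value] = generate_decision_tree(
            Dj, remaining, select, label_key)
    return node

# Usage with a trivial selector (always the first candidate attribute);
# a real selector would maximize information gain, gain ratio, or the like.
D = [
    {"outlook": "sunny", "windy": "false", "class": "no"},
    {"outlook": "sunny", "windy": "true", "class": "no"},
    {"outlook": "overcast", "windy": "false", "class": "yes"},
    {"outlook": "rain", "windy": "false", "class": "yes"},
]
tree = generate_decision_tree(D, ["outlook", "windy"],
                              lambda D, attrs: attrs[0])
print(tree["attribute"])                      # outlook
print(tree["branches"]["sunny"]["label"])     # no
```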
 
