The experimental results for MCMM show cluster coherency. The experiments used the Reuters document collection, from which two small subsets of documents were drawn. The first consisted of 983 documents with 9 labels and 372 dimensions, and the second consisted of 240 articles with 3 labels and 360 dimensions (words). This collection was chosen so that MCMM could first be run on pre-labeled data before being run in an unsupervised manner. Each data set was split 70-30% for training and testing, and a naive Bayes classifier was run for comparison. With MCMM, the cluster activities produce a ranking for category assignment rather than a binary decision, which makes it possible to examine the tradeoff between precision and recall: raising the activity threshold increases precision but lowers recall. Precision and recall were high for all but three labels, which seemed more the exception than the norm; the documents from the first subset belonging to those three labels had multiple labels assigned to them. However, the naive Bayes classifier was found to perform better, and its performance was also compared against a much larger Reuters collection.
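
As a rough illustration of this thresholding, the sketch below converts per-document cluster activities into category assignments at several cutoffs and reports precision and recall at each. The activity scores, category names, and helper function here are hypothetical stand-ins, not part of the MCMM implementation itself.

# Minimal sketch: threshold continuous cluster activities into category
# assignments and measure the precision/recall tradeoff. All values are toy data.
def precision_recall(activities, true_labels, threshold):
    tp = fp = fn = 0
    for scores, labels in zip(activities, true_labels):
        # A category is assigned whenever its cluster activity exceeds the threshold.
        predicted = {cat for cat, score in scores.items() if score >= threshold}
        tp += len(predicted & labels)
        fp += len(predicted - labels)
        fn += len(labels - predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Two toy documents with per-category activity scores and their true labels.
activities = [{"earn": 0.9, "grain": 0.2}, {"earn": 0.4, "grain": 0.7}]
true_labels = [{"earn"}, {"grain"}]
for t in (0.3, 0.5, 0.8):  # raising the threshold trades recall for precision
    print(t, precision_recall(activities, true_labels, t))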
MCMM was expected to have better performance in the unsupervised mode.