In today's post we continue our discussion of these algorithms and the issues encountered when applying them. We list some of the issues here.
One-shot versus hierarchical tree classifier: In the former, the distinction between all the classes is made in a single stage, which is especially helpful when the number of features is large. In the latter, the classification structure is a binary tree in which the most obvious discriminations are made first and the subtler ones later. When the number of features used at each node is smaller than the total number of features, the tree classifier is much faster.
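As a rough illustration, here is a minimal sketch of a two-node tree classifier in plain Python/NumPy. The thresholds, feature indices, and class names are invented for the example; the point is only that each node makes one discrimination using a small subset of the features.

```python
import numpy as np

def tree_classify(x):
    """Toy hierarchical (binary-tree) classifier over three classes.

    Node 1 uses only feature 0 to make the 'obvious' split
    (class 'A' vs. the rest); node 2 uses only feature 3 to
    separate the remaining, more similar classes 'B' and 'C'.
    Thresholds and feature indices are illustrative only.
    """
    if x[0] > 5.0:        # node 1: obvious discrimination, one feature
        return "A"
    elif x[3] > 2.0:      # node 2: subtler discrimination, one feature
        return "B"
    else:
        return "C"

x = np.array([1.2, 0.7, 3.3, 2.5, 0.1])   # 5 features, but each node reads only one
print(tree_classify(x))                    # -> "B"
```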
Parametric versus non-parametric classification: Parametric techniques assume that the forms of the class-conditional densities are known, so that the decision rules can be written down; for example, if the class-conditional densities are Gaussian, the decision rules can be optimal rules or plug-in rules with estimated parameters. Non-parametric rules, which are not based on assumed pattern class distributions, can also be very helpful.
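To make the contrast concrete, here is a hedged NumPy sketch of a parametric plug-in Gaussian rule (means and covariances estimated from training data, equal priors assumed) next to a non-parametric 1-nearest-neighbour rule that makes no distributional assumption. The synthetic data and class labels are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))   # class 0 training samples
X1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))   # class 1 training samples

def gaussian_plugin(x):
    """Parametric: plug estimated means/covariances into the Gaussian rule."""
    scores = []
    for Xc in (X0, X1):
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        diff = x - mu
        # log Gaussian density up to a constant (equal priors assumed)
        scores.append(-0.5 * diff @ np.linalg.inv(cov) @ diff
                      - 0.5 * np.log(np.linalg.det(cov)))
    return int(np.argmax(scores))

def nearest_neighbour(x):
    """Non-parametric: label of the closest training sample, no density model."""
    X = np.vstack([X0, X1])
    y = np.array([0] * len(X0) + [1] * len(X1))
    return int(y[np.argmin(np.linalg.norm(X - x, axis=1))])

x = np.array([2.5, 2.0])
print(gaussian_plugin(x), nearest_neighbour(x))
```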
Dimensionality and sample size relationship: When the number of features per pattern vector is very large, the classifier designer has to choose a smaller feature set, because with a limited number of training samples the classifier's performance can actually degrade as features are added. This is often referred to as the curse of dimensionality, and trimming the feature set results in some loss of fidelity. It is kept in check by choosing the number of training samples per class to be at least five to ten times the number of features.
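As a back-of-the-envelope check of that rule of thumb (the five-to-ten factor is from the text; the feature count below is an arbitrary example):

```python
d = 20                      # number of features (arbitrary example)
low, high = 5 * d, 10 * d   # 5-10 training samples per class per feature
print(f"with {d} features, aim for {low}-{high} training samples per class")
```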
Feature Selection: Even among the set of possible features, the ones that are selected depend on computational ease and cost considerations. Moreover, determining the optimal subset of m features from the d available ones requires an exhaustive search over all possible subsets. Some intelligent procedures to alleviate this have been proposed, and I've mentioned a few in my previous posts.
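A minimal sketch of why the exhaustive search blows up: enumerate all size-m subsets of d features and score each one. The `criterion` function here is a placeholder I am inventing for illustration; in practice it would be a class-separability measure or cross-validated accuracy.

```python
from itertools import combinations
from math import comb

def criterion(subset):
    """Placeholder goodness score for a feature subset.
    A real criterion might be a separability measure or
    cross-validated classification accuracy."""
    return -len(subset)   # dummy score so the example runs

d, m = 20, 5
print(comb(d, m), "candidate subsets to evaluate")   # 15504

best = max(combinations(range(d), m), key=criterion)
print("best subset under the dummy criterion:", best)
```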
Error Estimation: To know how well samples are being classified, we estimate the classification error. We do this by splitting the available samples into a training set and a test set: the classifier is designed using the training set and then evaluated on the samples from the test set. Precision and recall are two metrics with which the classifier's performance can be reported. Common splitting strategies are the hold-out method and the leave-one-out method, and how best to divide the samples between training and test sets has itself been investigated.
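Here is a hedged NumPy sketch of the two splitting strategies just mentioned, using a trivial 1-nearest-neighbour rule as the classifier. The data is synthetic and the split sizes are arbitrary; the point is only the mechanics of hold-out versus leave-one-out error estimation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

def predict_1nn(X_train, y_train, x):
    """Label of the closest training sample."""
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

# Hold-out: design on a random subset, evaluate on the remainder.
idx = rng.permutation(len(X))
train, test = idx[:40], idx[40:]
holdout_err = np.mean([predict_1nn(X[train], y[train], X[i]) != y[i]
                       for i in test])

# Leave-one-out: each sample in turn is the test set; the rest train.
loo_err = np.mean([predict_1nn(np.delete(X, i, axis=0),
                               np.delete(y, i), X[i]) != y[i]
                   for i in range(len(X))])

print(f"hold-out error: {holdout_err:.2f}, leave-one-out error: {loo_err:.2f}")
```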
So far we have discussed geometrical, or statistical, pattern recognition algorithms.
The second paradigm of pattern recognition algorithms is based on the structural or syntactic approach. When the number of features required to establish a decision boundary is very large, it is helpful to view such patterns as being composed of simple sub-patterns. A sub-pattern could itself be built from simpler parts, described with grammatical techniques.
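As a toy illustration of the syntactic view, here is a sketch that treats a pattern as a string of primitive sub-patterns and checks it against a tiny regular grammar. The primitives ('u', 'd', 'f') and the "peak" rule are invented for this example; a regular expression stands in for the grammar.

```python
import re

# Primitives: 'u' = up-stroke, 'd' = down-stroke, 'f' = flat segment.
# Hypothetical rule: a "peak" is one or more up-strokes, optionally a
# flat top, then one or more down-strokes.
PEAK = re.compile(r"^u+f*d+$")

def is_peak(primitives):
    """Classify a pattern from its sequence of primitive sub-patterns."""
    return bool(PEAK.match("".join(primitives)))

print(is_peak(["u", "u", "f", "d"]))   # True  - matches the peak grammar
print(is_peak(["d", "u"]))             # False - violates the grammar
```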