Monday, June 10, 2013


Machine Learning

Unsupervised learning is clustering: the class labels of the training data are unknown, and given a set of measurements, the data is grouped into clusters.
Supervised learning is classification: the training data has both measurements and labels indicating the class of each observation, and new data is classified based on the training data.
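The contrast can be shown in a few lines. Below is a minimal sketch, assuming scikit-learn is available; the toy measurements and class names are invented for illustration.

# Unsupervised vs supervised learning on invented toy data.
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Four measurements (e.g. [height, weight]); no labels are given.
X = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]

# Unsupervised: the algorithm groups the unlabelled points into clusters.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("cluster assignments:", clusters)

# Supervised: each training point comes with a class label.
y = ["small", "small", "large", "large"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# New, unseen data is classified based on the labelled training data.
print("predicted class:", clf.predict([[4.8, 5.0]]))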
Classification constructs a model. A model is a set of rules of the form: if this condition holds, then assign this label. The classification model is used to predict categorical class labels, and its accuracy can then be estimated.
Prediction, by contrast, computes a value, such as a formula over the attributes of the data, and models continuous-valued functions.
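As a rough illustration of the difference, here is a toy rule-based classification model and a toy prediction formula in plain Python; the attribute names, thresholds, and coefficients are all made up.

def classify(record):
    """Classification model: a set of if-condition-then-label rules
    producing a categorical class label."""
    if record["income"] > 50000 and record["age"] > 30:
        return "good_credit"
    return "bad_credit"

def predict(record):
    """Prediction model: a formula over the attributes that
    models a continuous-valued function."""
    return 0.4 * record["income"] + 200.0 * record["age"]

applicant = {"income": 60000, "age": 35}
print(classify(applicant))   # categorical label
print(predict(applicant))    # continuous value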
Classification involves two tasks: induction from the training set and deduction on the test set. A learning algorithm is used to build the model from the training set; once the model is trained, it is used to classify the test set.
The speed of a classifier covers both the time it takes to construct the model and the time it takes to apply the model for classification.
Classification models are evaluated based on accuracy, speed, robustness, scalability, interpretability, and other such measures.
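Putting the last few points together, the following sketch (assuming scikit-learn and its bundled iris dataset) induces a model from a training set, deduces labels for a test set, and reports accuracy along with the two speed measures mentioned above.

import time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Induction: the learning algorithm builds a model from the training set.
start = time.perf_counter()
model = DecisionTreeClassifier().fit(X_train, y_train)
build_time = time.perf_counter() - start

# Deduction: the trained model classifies the held-out test set.
start = time.perf_counter()
predictions = model.predict(X_test)
classify_time = time.perf_counter() - start

# Accuracy is one of the evaluation measures listed above.
print("accuracy:", accuracy_score(y_test, predictions))
print("construction time:", build_time, "classification time:", classify_time)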
Classification algorithms are called supervised learning because a supervisor prepares a set of examples of a target concept. The learning task is to find a hypothesis that explains the target concept, and performance is measured by how accurately the hypothesis explains the examples.
If we define the problem domain as a set X, then classification is a function c that maps X to a result set D, which is what we analyze.
If we consider a set of <x, d> pairs, with x belonging to X and d belonging to D, then this set, called the experience E, is explained by a hypothesis h, and the set of all such hypotheses is H. The goodness of a hypothesis is the percentage of examples that it explains correctly.
Given the examples in E, supervised learning algorithms search the hypothesis space H for the hypothesis that best explains the examples in E. The type of hypothesis required influences the search algorithm: the more complex the representation, the more complex the search. The search can go from general to specific or from specific to general. It is easiest to work with single attributes and a boolean d, i.e. concepts, as in the sketch below.
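A toy version of this search over a boolean concept might look as follows; the attributes, the experience E, and the tiny hypothesis space H are all invented for illustration, and goodness is computed as the percentage of examples explained correctly.

from itertools import product

# E: each x is a pair of boolean attributes, each d is the boolean concept value.
E = [((True, True), True), ((True, False), True),
     ((False, True), False), ((False, False), False)]

# H: very simple hypotheses of the form "concept is attribute i" or its negation,
# plus the two constant hypotheses.
def make_h(i, negate):
    return lambda x: (not x[i]) if negate else x[i]

H = [("attr%d%s" % (i, " negated" if neg else ""), make_h(i, neg))
     for i, neg in product(range(2), [False, True])]
H += [("always True", lambda x: True), ("always False", lambda x: False)]

def goodness(h):
    """Percentage of examples in E that h explains correctly."""
    return sum(h(x) == d for x, d in E) / len(E)

# Search H for the hypothesis that best explains the experience E.
best_name, best_h = max(H, key=lambda pair: goodness(pair[1]))
print("best hypothesis:", best_name, "goodness:", goodness(best_h))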
Inductive bias is the set of assumptions that, together with the training data, deductively justify the classifications assigned by the learner to future instances. There can be a number of hypotheses consistent with the training data, and each learning algorithm has an inductive bias that affects which hypothesis it selects. Inductive bias can be based on the representation language (syntax), on heuristics (semantics), on a preference ranking over hypotheses, or on a restriction of the search space.
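A small illustration: in the sketch below, two invented hypotheses are both consistent with two training examples, yet they disagree on a future instance, so only the learner's inductive bias determines the answer.

train = [((1, 1), True), ((0, 0), False)]          # two labelled examples
future = (1, 0)                                     # unseen instance

h_specific = lambda x: x == (1, 1)                  # a very specific hypothesis
h_general  = lambda x: x[0] == 1                    # a more general hypothesis

# Both hypotheses explain the training data perfectly...
assert all(h_specific(x) == d for x, d in train)
assert all(h_general(x) == d for x, d in train)

# ...yet they disagree on the future instance; the bias picks the answer.
print("specific hypothesis says:", h_specific(future))   # False
print("general hypothesis says:", h_general(future))     # True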
[Courtesy: Prof. Pier Luca Lanzi's lecture notes]
