Monday, November 23, 2015

In Naïve Bayes, the classic algorithm is as follows:
Estimate the probability of a training vector given that a condition exists (aka the likelihood).
Estimate the probability of a training vector given that the condition does not exist.
We also calculate the probability that the condition exists (aka the prior) and the probability that it does not. These act as weights on the likelihoods above.
This is applied to the entire data set.
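
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: features are treated as categorical values, probabilities are plain relative frequencies with no smoothing, and the names `train` and `posterior` are hypothetical rather than taken from any library.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count labels, and (feature index, value) pairs under each label."""
    label_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # label -> Counter of (index, value)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][(i, value)] += 1
    return label_counts, feature_counts


def posterior(row, label_counts, feature_counts):
    """Return P(label | row) for every label, normalised to sum to 1."""
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior: the weight for this label
        for i, value in enumerate(row):
            # likelihood factor for one feature (assumption: no smoothing)
            score *= feature_counts[label][(i, value)] / n
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        return scores  # every label scored zero; nothing to normalise
    return {label: s / norm for label, s in scores.items()}
```

Calling `train` on the whole data set and then `posterior` on a new vector mirrors the batch procedure: the counts give the likelihoods, and the label frequencies give the weights.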

Now we modify the previous algorithm for incremental data as follows: 

For new data: 
Calculate as above.

For old data:
Add the previously calculated target directly to the new target.

This works because the denominator and the weights remain the same.
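
One way to read the incremental step is that the quantities behind the estimates are kept as counts, so the counts from a new batch are simply added to the stored ones. The sketch below assumes that reading and assumes categorical features; the class `IncrementalCounts` and its methods are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict


class IncrementalCounts:
    """Keeps Naive Bayes counts so that new batches can be folded in by addition."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (index, value) counts

    def update(self, rows, labels):
        # New data is counted exactly as in the batch case,
        # then summed onto whatever was stored from old data.
        for row, label in zip(rows, labels):
            self.label_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[label][(i, value)] += 1

    def prior(self, label):
        return self.label_counts[label] / sum(self.label_counts.values())

    def likelihood(self, index, value, label):
        return self.feature_counts[label][(index, value)] / self.label_counts[label]
```

Feeding the data in two batches leaves the same counts as feeding it all at once, which is what makes the update incremental.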
We now discuss discriminant analysis (DA), which uses concepts similar to Bayes. In Bayes, we expressed the conditional probabilities. In DA, we use the conditional probability and the prior for each target variable. This method assumes normally distributed data, so we calculate the mean and the standard deviation.
To compute the mean incrementally as data becomes available, we find the mean of the new data and take the average of the previous and new means, weighted by the number of points behind each. For the standard deviation, we likewise take the weighted mean of the variance of the previous data and the variance of the new data, then take the square root of the result. Strictly, the weighted mean of the two variances is exact only when the old and new data share the same mean; when the means differ, the weighted squared deviation of each mean from the combined mean must be added before taking the square root. Equivalently, we can keep the running sum of the values and the running sum of their squares, since both are in summation form, and derive the mean and standard deviation from those sums at any time. We argued earlier that summation form lends itself to parallelization as well as incremental updates.
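
The summation form mentioned above can be sketched as a small accumulator that stores only the count, the sum of the values and the sum of their squares. This is a minimal sketch rather than a definitive implementation: `RunningStats` is a hypothetical name, and the variance is assumed to be the population variance (dividing by n).

```python
import math


class RunningStats:
    """Tracks count, sum and sum of squares so mean and std can be updated per batch."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, batch):
        # Each new value only adds to the three running totals.
        for x in batch:
            self.n += 1
            self.sum_x += x
            self.sum_x2 += x * x

    @property
    def mean(self):
        return self.sum_x / self.n

    @property
    def variance(self):
        # Population variance: mean of squares minus square of mean.
        return self.sum_x2 / self.n - self.mean ** 2

    @property
    def std(self):
        # Clamp at zero to guard against tiny negative values from rounding.
        return math.sqrt(max(self.variance, 0.0))
```

Because only sums are stored, two accumulators built on separate machines could also be merged by adding their totals. For data with a large mean and a small spread, subtracting two large sums can lose precision, so a more numerically careful update may be preferable in practice.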
Let us now find out how to build discriminant analysis incrementally. Thus far we have said that the mean, the standard deviation and the probability can each be calculated incrementally. DA can be expressed in a form that has two components: the first is based on the squared difference from the mean as the numerator and the variance (the square of the standard deviation) as the denominator, and the second is based on the natural logarithm of the probability. Both components rely on parameters that can be found incrementally. Consequently, the DA can be recalculated at every stage.
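
As a rough illustration of the two components, the sketch below scores a single numeric feature under a normal distribution for each class. It rests on assumptions not stated in the post: one feature only, a hypothetical `params` mapping from label to (mean, standard deviation, prior), and an extra `-log(std)` term that comes from the normal density. That extra term is the same for every class when the classes share a standard deviation, in which case the score reduces to the two components described above.

```python
import math


def discriminant_score(x, mean, std, prior):
    """Log of (normal density times prior), dropping terms common to all classes."""
    # Component 1: squared difference from the mean over twice the variance.
    squared_term = -((x - mean) ** 2) / (2.0 * std ** 2)
    # Component 2: natural logarithm of the class probability.
    log_prior = math.log(prior)
    # Assumption: keep -log(std) so classes with different spreads compare fairly.
    return squared_term - math.log(std) + log_prior


def classify(x, params):
    """Pick the label whose score is highest; params maps label -> (mean, std, prior)."""
    return max(params, key=lambda label: discriminant_score(x, *params[label]))
```

Since the mean, standard deviation and prior can each be refreshed as new data arrives, `params` can be rebuilt after every batch and the same scoring function reused unchanged.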






