The streaming algorithms are helpful so long as the incoming stream is viewed as a sequence of word vectors. However, word vectorization itself must happen prior to clustering. Some view this as a drawback of this method of algorithms because word vectorization is done with the help of neural nets and softmax classifier over the entire document and there are ways to use different layers of the neural net to form regions of interest. Till date there has been no application of a hybrid mix of detecting regions of interest in a neural net layer with the help of stream-based clustering.  There is, however, a way to separate the stages of word vector formation from the stream-based clustering if all the words have previously well-known word vectors that may be looked up from something similar to a dictionary.  
The kind of algorithms that work on stream of data points are not restricted to the above algorithms. It could involve a cost function. For example, stochastic gradient descent also hoped to avoid the batch processing and repeated iterations of the batch that came to refine the gradient.  The overall cost function is written with a term that measures how the hypothesis is holding on the current datapoint. To fit the next data point, the parameter is modified. As it scans the data points, it may wander off but eventually reach near the optimum. The iteration over the scanning is repeated so that the dependence of the sequence of data points is eliminated. For this reason, the datapoints are initially shuffled which goes along the lines of a bag of word vectors rather than the sequence of words in a narrative. Generally, the overall iterations of all data points are restricted to a small number, say 10, and the algorithm makes improvements to gradients from one datapoint to another without having to wait for all of the batch.  
The steps to take in order to utilize this method for text mining, we first perform feature extraction of the text. Then we use the stochastic gradient descent method as a classifier to learn the distinct features and Finally we can measure the performance using the F1 score. Again, vector, classifier and evaluator become the three-stage processing in this case. 
No comments:
Post a Comment