In the multiple cause mixture model, we mentioned that there can be more than one mapping function between the cluster activities m and the weights c. One example was the soft disjunction function, under which the likelihood that any given word appears in a document only increases as more of the topic nodes capturing the document's topical content become active. Here the direction of flow is from activity in the cluster nodes to predictions about the words. The inverse flow can also be described. Any prediction vector r can be compared with a binary vector d representing the words present in the document (d_j is 1 if word j is present and 0 otherwise) through the objective function $\sum_j \log[d_j r_j + (1 - d_j)(1 - r_j)]$. Higher values mean the prediction agrees better with the document, so maximizing this function gives a way to find the vector of cluster activities m that best accounts for the document. Now, given a corpus of documents indexed by i, the global objective is the sum of the individual objective functions. Maximizing the global objective yields the set of weights c, which reflect clusters of words that co-occur in documents.
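To make the two directions concrete, here is a minimal sketch in plain Python. The post does not write out the soft disjunction, so the noisy-OR form r_j = 1 - prod_k (1 - m_k c_kj) used below is an assumption, as are the function names `soft_disjunction` and `log_objective` and the small `eps` guard that keeps the logarithm finite.

```python
import math


def soft_disjunction(m, c):
    """Predict word probabilities r from cluster activities m and weights c.

    Assumed form (noisy-OR): r_j = 1 - prod_k (1 - m_k * c[k][j]).
    m[k] is the activity of cluster k, c[k][j] links cluster k to word j,
    and all values are expected to lie in [0, 1].
    """
    r = []
    for j in range(len(c[0])):
        miss = 1.0  # probability that no active cluster produces word j
        for k in range(len(m)):
            miss *= 1.0 - m[k] * c[k][j]
        r.append(1.0 - miss)
    return r


def log_objective(d, r, eps=1e-12):
    """Score a prediction r against binary word vector d.

    Computes sum_j log(d_j * r_j + (1 - d_j) * (1 - r_j)).
    eps is only an implementation guard against log(0).
    """
    total = 0.0
    for dj, rj in zip(d, r):
        total += math.log(dj * rj + (1 - dj) * (1 - rj) + eps)
    return total


# Hypothetical toy values: two clusters, two words.
weights = [[0.9, 0.1], [0.2, 0.8]]
prediction = soft_disjunction([1.0, 0.0], weights)
score = log_objective([1, 0], prediction)
```

With only the first cluster active, the prediction favors the first word, so a document containing just that word scores higher than one containing just the second word.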
Training begins by initializing a single cluster centroid at a random point. For this initial centroid, and at later stages when there are more centroids, the maximization proceeds in two steps. First, the cluster centroids are held fixed and a local maximum is found over the cluster activation values for every data point. In the second step, the cluster activation values are held fixed at the values found in the first step, and gradient ascent is performed over the cluster centers. Both steps are repeated until the objective cannot be increased further.
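The two-step loop can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original authors' implementation: it assumes the noisy-OR form of the soft disjunction, keeps the number of clusters K fixed rather than adding centroids over time, and uses a step-halving rule so that each accepted update never lowers the objective. The names `train`, `grads`, `EPS` and all default parameter values are hypothetical.

```python
import math
import random

EPS = 1e-9  # keeps log() finite; an implementation guard, not from the text


def predict(m, c):
    # Assumed soft disjunction: r_j = 1 - prod_k (1 - m_k * c[k][j])
    r = []
    for j in range(len(c[0])):
        miss = 1.0
        for k in range(len(m)):
            miss *= 1.0 - m[k] * c[k][j]
        r.append(1.0 - miss)
    return r


def doc_objective(d, m, c):
    # sum_j log(d_j r_j + (1 - d_j)(1 - r_j)) for one document
    return sum(math.log(dj * rj + (1 - dj) * (1 - rj) + EPS)
               for dj, rj in zip(d, predict(m, c)))


def total_objective(docs, M, c):
    return sum(doc_objective(d, m, c) for d, m in zip(docs, M))


def grads(d, m, c):
    # Analytic gradient of one document's objective w.r.t. m and c
    K, J = len(m), len(d)
    r = predict(m, c)
    gm = [0.0] * K
    gc = [[0.0] * J for _ in range(K)]
    for j in range(J):
        t = d[j] * r[j] + (1 - d[j]) * (1 - r[j]) + EPS
        sign = 2 * d[j] - 1  # +1 if word j is present, -1 otherwise
        for k in range(K):
            rest = 1.0  # product over the other clusters l != k
            for l in range(K):
                if l != k:
                    rest *= 1.0 - m[l] * c[l][j]
            gm[k] += sign * c[k][j] * rest / t
            gc[k][j] = sign * m[k] * rest / t
    return gm, gc


def clip(x):
    return min(1.0, max(0.0, x))


def train(docs, K=1, lr=0.05, inner=20, outer=30, tol=1e-6, seed=0):
    rng = random.Random(seed)
    J = len(docs[0])
    c = [[rng.uniform(0.1, 0.9) for _ in range(J)] for _ in range(K)]  # random centroids
    M = [[0.5] * K for _ in docs]  # one activity vector per document
    history = [total_objective(docs, M, c)]
    for _round in range(outer):
        # Step 1: centroids fixed, ascend each document's activities.
        for i, d in enumerate(docs):
            step, best = lr, doc_objective(d, M[i], c)
            for _try in range(inner):
                gm, _unused = grads(d, M[i], c)
                cand = [clip(mk + step * g) for mk, g in zip(M[i], gm)]
                val = doc_objective(d, cand, c)
                if val > best:
                    M[i], best = cand, val
                else:
                    step *= 0.5  # overshot: shrink the step and retry
        # Step 2: activities fixed, ascend the centroids on the global objective.
        step, best = lr, total_objective(docs, M, c)
        for _try in range(inner):
            G = [[0.0] * J for _ in range(K)]
            for d, m in zip(docs, M):
                _unused, gc = grads(d, m, c)
                for k in range(K):
                    for j in range(J):
                        G[k][j] += gc[k][j]
            cand = [[clip(c[k][j] + step * G[k][j]) for j in range(J)]
                    for k in range(K)]
            val = total_objective(docs, M, cand)
            if val > best:
                c, best = cand, val
            else:
                step *= 0.5
        history.append(best)
        if history[-1] - history[-2] < tol:  # no further improvement
            break
    return M, c, history
```

Calling `train(docs, K=2)` on a small list of binary word vectors returns the per-document activities, the learned weights, and the objective after each round. Because updates are accepted only when they raise the objective, the recorded history never decreases, which mirrors the "repeat until it cannot be maximized further" stopping rule.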