We will continue the discussion on the Multiple Cause Mixture Model (MCMM). We looked at the global objective function as the aggregation of all the local objective functions pertaining to individual documents. We saw that the maximization of the objective function is performed in two steps:
1) First, the cluster centroids c_{j,k} are fixed and the activity values m_{i,k} are found for a local maximum.
2) Second, the cluster activation values m_{i,k} are fixed and the centroids c_{j,k} are found for a local maximum.
These steps are repeated until the objective function cannot be maximized further.
When the centers stabilize, one of the cluster centers is split into two and the steps are repeated.
This increase in the number of cluster centers continues until the addition of a cluster center does not increase the objective function.
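To make the two-step procedure and the splitting schedule concrete, here is a minimal control-flow sketch in Python. The helpers objective, maximize_activities, and maximize_centroids are hypothetical placeholders for the local optimizations described above (one possible objective is sketched after the diagram paragraph below), and the random choice of which center to split is an assumption, not something the model prescribes.

```python
import numpy as np

def fit_mcmm(D, objective, maximize_activities, maximize_centroids,
             tol=1e-4, seed=0):
    # D: (N documents x J terms) binary matrix.
    rng = np.random.default_rng(seed)
    N, J = D.shape
    M = rng.uniform(0.25, 0.75, (N, 1))   # activities m_{i,k}, start with K = 1
    C = rng.uniform(0.25, 0.75, (1, J))   # centroids c_{j,k}

    def converge(M, C):
        # Inner loop: alternate the two local maximizations until the
        # objective cannot be increased further.
        prev = -np.inf
        while True:
            M = maximize_activities(D, M, C)   # step 1: centroids fixed
            C = maximize_centroids(D, M, C)    # step 2: activities fixed
            obj = objective(D, M, C)
            if obj - prev <= tol:
                return M, C, obj
            prev = obj

    M, C, best = converge(M, C)
    while True:
        # Outer loop: split one centroid into two perturbed copies,
        # re-optimize, and keep the extra center only if it helps.
        k = int(rng.integers(C.shape[0]))      # heuristic: split a random center
        C2 = np.vstack([C, C[k] + 0.01 * rng.standard_normal(J)])
        M2 = np.hstack([M, M[:, [k]]])
        M2, C2, obj = converge(M2, C2)
        if obj <= best + tol:                  # extra center did not increase
            return M, C                        # the objective: stop here
        M, C, best = M2, C2, obj
```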
Note that the first step, finding the activity vectors m_i, can be skipped in the supervised setting, where a teacher provides these activity vectors and the algorithm only needs to find the cluster centers.
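Under the same hypothetical helpers as the sketch above, the supervised variant simply holds the teacher-provided activities fixed and iterates only the centroid update:

```python
import numpy as np

def fit_supervised(D, M_teacher, objective, maximize_centroids, tol=1e-4):
    # Step 1 is skipped: the teacher-provided activities stay fixed and
    # only the centroid update from step 2 is iterated to convergence.
    K = M_teacher.shape[1]
    C = np.full((K, D.shape[1]), 0.5)   # uninformative starting centroids
    prev = -np.inf
    while True:
        C = maximize_centroids(D, M_teacher, C)
        obj = objective(D, M_teacher, C)
        if obj - prev <= tol:
            return C
        prev = obj
```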
We can represent this solution as a diagram with the cluster centers in a separate layer above the layer representing words. Activity m_{i,k} in the topic nodes of the cluster layer flows top-down to cause activity in the prediction nodes r_{i,j}, which represent how likely each word is to appear in the document. Measurements from the observed values d_{i,j} thus flow up while the predictions flow down during the iterations.
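The combination rule typically used in the MCMM to produce these top-down predictions is a soft disjunction (noisy-OR): a word is predicted absent only if every topic fails to cause it. The sketch below assumes that rule, together with a Bernoulli log-likelihood as one plausible concrete form of the local objective; both are labeled assumptions rather than the definitive formulation from the lecture.

```python
import numpy as np

def predict(M, C):
    # Soft disjunction (noisy-OR): r[i, j] = 1 - prod_k (1 - M[i, k] * C[k, j]).
    # Each active topic k independently "votes" for word j with strength
    # m_{i,k} * c_{j,k}; the word is predicted absent only if all votes fail.
    return 1.0 - np.prod(1.0 - M[:, :, None] * C[None, :, :], axis=1)

def objective(D, M, C, eps=1e-9):
    # One common choice of objective (an assumption here): the Bernoulli
    # log-likelihood of the observed bits d_{i,j} under the predictions r_{i,j}.
    R = np.clip(predict(M, C), eps, 1.0 - eps)
    return float(np.sum(D * np.log(R) + (1.0 - D) * np.log(1.0 - R)))
```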
The assumptions made by the model are as follows:
First, all input data is binary. For text categorization this means each document is represented by a vector of 0/1 entries indicating whether each term occurs in it.
Second, terms are selected based on Zipf's law, which states that a word's frequency is inversely proportional to its rank when words are ordered by frequency.
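These two assumptions suggest a simple preprocessing sketch. The document-frequency thresholds lo and hi are purely illustrative assumptions: because of Zipf's law, a handful of words dominate the counts while most words are rare, so keeping mid-frequency terms drops both uninformative stopwords and one-off words.

```python
import numpy as np
from collections import Counter

def select_terms(docs, lo=0.01, hi=0.5):
    # Keep terms whose document frequency falls in [lo, hi]
    # (thresholds illustrative, not from the model).
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc.split()))
    return sorted(w for w, cnt in df.items() if lo <= cnt / n <= hi)

def binarize(docs, vocab):
    # Binary document-term matrix: D[i, j] = 1 iff term j occurs in document i.
    index = {w: j for j, w in enumerate(vocab)}
    D = np.zeros((len(docs), len(vocab)), dtype=np.int8)
    for i, doc in enumerate(docs):
        for w in set(doc.split()):
            if w in index:
                D[i, index[w]] = 1
    return D
```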