Friday, October 25, 2013


We will continue solving our objective function for overlapping clusters so that documents can have fuzzy labels. The fuzzy SKWIC objective function has the same two components as the one we discussed earlier, i.e., the first component is the error of the documents to the clusters and the second component is the feature weights. In addition, the first component now aggregates the fuzzy membership degrees over the clusters and terms. The objective function is subject to the constraints that each feature weight lies between 0 and 1 and that the feature weights sum to 1. The second component has a tau parameter that lets us tune the function so that both components have the same order of magnitude; as mentioned before, this is required to keep both components equally relevant. Each cluster has its own set of feature weights V_i = (v_i1, ..., v_in).
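As a sketch, the objective just described can be written as follows (the notation here is assumed, not taken verbatim from the lecture: u_ij is the fuzzy membership of document j in cluster i, m > 1 is the fuzzifier, and D_ijk is the k-th component of the cosine-based distance from document j to the centre of cluster i):

```latex
J = \sum_{i=1}^{C}\sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{n} v_{ik}\,D_{ijk}
    \;+\; \sum_{i=1}^{C} \tau_i \sum_{k=1}^{n} v_{ik}^{2},
\qquad v_{ik}\in[0,1],\quad \sum_{k=1}^{n} v_{ik}=1 .
```

The first sum is the membership-weighted clustering error; the second is the feature-weight regularizer balanced by tau.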
Now we use the same Lagrange multiplier technique to optimize the objective function with respect to V. Since the rows of V are independent of each other, we can reduce the objective function to C independent per-cluster problems.
Then we set the partial derivatives of the Lagrangian to zero, with respect to both the multiplier and the feature weights, to obtain the stationarity conditions.
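Under the same assumed notation (u_ij memberships, D_ijk distance components, multiplier λ_i), the per-cluster Lagrangian and its stationarity conditions can be sketched as:

```latex
J_i = \sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{n} v_{ik}\,D_{ijk}
      + \tau_i \sum_{k=1}^{n} v_{ik}^{2}
      - \lambda_i\Big(\sum_{k=1}^{n} v_{ik} - 1\Big),
```

```latex
\frac{\partial J_i}{\partial \lambda_i} = 0
  \;\Rightarrow\; \sum_{k=1}^{n} v_{ik} = 1,
\qquad
\frac{\partial J_i}{\partial v_{ik}}
  = \sum_{j=1}^{N} u_{ij}^{m} D_{ijk} + 2\tau_i v_{ik} - \lambda_i = 0 .
```

Solving the second condition for v_ik and substituting the sum-to-one constraint eliminates λ_i.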
Solving these equations for the feature weight v_ik, we get two terms: the first is the default value, again 1/n in this case, and the second is the bias term, which now includes the fuzzy membership degrees in the aggregated dot product.
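A minimal NumPy sketch of this feature-weight update, assuming the array layout and names chosen here (D holds the per-term distance components, U the memberships; neither name comes from the lecture):

```python
import numpy as np

def update_feature_weights(D, U, tau, m=2.0):
    """One fuzzy SKWIC feature-weight update (a sketch, names assumed).

    D   : (C, N, n) array, D[i, j, k] = k-th component of the cosine-based
          distance between document j and the centre of cluster i
    U   : (C, N) array of fuzzy memberships u_ij
    tau : (C,) array of per-cluster balancing parameters
    m   : fuzzifier (> 1)
    Returns V : (C, n) array of per-cluster feature weights summing to 1.
    """
    C, N, n = D.shape
    Um = U ** m                                      # u_ij^m
    # first term: the default value 1/n for every weight
    V = np.full((C, n), 1.0 / n)
    # bias term: membership-weighted difference between the average
    # distance component and the k-th component
    mean_D = D.mean(axis=2, keepdims=True)           # (C, N, 1)
    bias = np.einsum('ij,ijk->ik', Um, mean_D - D)   # (C, n)
    V += bias / (2.0 * tau[:, None])
    return V
```

Note that the bias term sums to zero over k, so each row of V automatically satisfies the sum-to-one constraint.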
The choice of tau is very important since it controls the importance of the second term relative to the first. Choosing a very small tau results in only one keyword per cluster with feature weight 1, whereas choosing a very large value makes all the words in a cluster relevant and assigns them equal weights.
Since the second component of the objective function does not depend on the fuzzy memberships, the membership update equation is the same as in Fuzzy C-Means: the inverse of the sum over clusters of the ratios of the aggregated cosine-based distances, raised to the power 1/(m-1).
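This FCM-style membership update can be sketched as follows, reusing the array layout assumed above (D for distance components, V for feature weights; all names are illustrative):

```python
import numpy as np

def update_memberships(D, V, m=2.0):
    """Fuzzy-membership update, FCM-style (a sketch, names assumed).

    D : (C, N, n) per-term cosine-based distance components (positive)
    V : (C, n) per-cluster feature weights
    m : fuzzifier (> 1)
    Returns U : (C, N) memberships; each column sums to 1.
    """
    # aggregated weighted distance of document j to cluster i
    Dw = np.einsum('ik,ijk->ij', V, D)               # (C, N)
    # ratio[i, l, j] = (Dw[i, j] / Dw[l, j]) ** (1/(m-1))
    ratio = (Dw[:, None, :] / Dw[None, :, :]) ** (1.0 / (m - 1.0))
    # u_ij = 1 / sum over clusters l of the ratio
    U = 1.0 / ratio.sum(axis=1)
    return U
```

A quick check: for m = 2 the memberships reduce to the familiar inverse-distance weighting, with each document's memberships over the C clusters summing to 1.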


