Today we will look at the distance measures used with Fuzzy SKWIC, specifically the aggregated sum of cosine-based distances along the individual dimensions. To implement it in a program, we will review what we need to save from our computations for later use so we can be more efficient.
The cosine-based distance along the kth dimension, we said, is computed as $D^{k}_{wc_{ij}} = \frac{1}{n} - x_{jk} c_{ik}$, where i ranges from 1 to C, j ranges from 1 to N and k ranges from 1 to n. Here n is the total number of terms in a collection of N documents. For this we maintain a C x N matrix for each dimension k. Once we have computed the individual cosine-based distances, we compute their weighted aggregated sum $D'_{wc_{ij}}$, which also needs to be saved. The feature weights are initialized, so we can compute this weighted aggregated sum for the first iteration. Note that $D^{k}_{wc_{ij}}$ is also used when updating the feature weights. $D^{k}_{wc_{ij}}$ could change when the cluster center changes, but for the duration of an iteration (our iterations are to update the cluster centers) we can use the same $D^{k}_{wc_{ij}}$.
Looking at the equation for updating the feature weights, $v_{ik} = \frac{1}{n} + \frac{1}{2\tau} \sum_{j=1}^{N} (u_{ij})^{m} \left[ \frac{D'_{wc_{ij}}}{n} - D^{k}_{wc_{ij}} \right]$, we note that it uses both the weighted aggregated sum from the previous iteration and the $D^{k}_{wc_{ij}}$ we computed. With a suitable value for tau, and with the fuzzy memberships and m initialized, we are ready to compute the feature weights. Tau changes in each iteration and serves to keep the terms of the objective function we discussed at the same order of magnitude, so the quantities in the equation that updates tau are seeded with what we have in the first iteration. Since tau affects the feature weights, the update can produce negative feature relevance values, which we will need to adjust. So far, all the variables except the cosine-based distance along the individual dimension are subject to change in each iteration.
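To make the bookkeeping concrete, here is a minimal sketch in plain Python of the three quantities discussed above: the per-dimension distances, their weighted aggregated sum, and the feature weight update. It follows the equations as stated in this post. The function names, the nested-list layout (x as N documents by n terms, c as C centers by n terms, u and v indexed by cluster first) and the use of one tau value per cluster are assumptions made for illustration. It does not cover the tau update or the adjustment of negative feature weights.

```python
def per_dimension_distances(x, c):
    """d_k[i][j][k] = 1/n - x[j][k] * c[i][k] for cluster i, document j, term k."""
    n = len(x[0])
    return [[[1.0 / n - xj[k] * ci[k] for k in range(n)] for xj in x] for ci in c]


def weighted_aggregate(d_k, v):
    """d_agg[i][j] = sum over k of v[i][k] * d_k[i][j][k], a C x N matrix."""
    return [[sum(vk * dk for vk, dk in zip(v[i], d_ij)) for d_ij in d_k[i]]
            for i in range(len(d_k))]


def update_feature_weights(d_k, d_agg, u, tau, m=2.0):
    """v[i][k] = 1/n + (1 / (2 * tau[i])) * sum_j u[i][j]**m * (d_agg[i][j]/n - d_k[i][j][k]).

    tau is assumed here to be a list with one value per cluster.
    """
    n = len(d_k[0][0])
    new_v = []
    for i, d_i in enumerate(d_k):
        row = []
        for k in range(n):
            s = sum((u[i][j] ** m) * (d_agg[i][j] / n - d_i[j][k])
                    for j in range(len(d_i)))
            row.append(1.0 / n + s / (2.0 * tau[i]))
        new_v.append(row)
    return new_v


# Toy data (made up): two documents, two terms, one cluster.
x = [[1.0, 0.0], [0.0, 1.0]]   # documents
c = [[1.0, 0.0]]               # cluster center
v = [[0.5, 0.5]]               # initial feature weights, 1/n each
u = [[1.0, 0.0]]               # fuzzy memberships of each document in the cluster

d_k = per_dimension_distances(x, c)      # computed once per center update
d_agg = weighted_aggregate(d_k, v)       # saved for the weight update
new_v = update_feature_weights(d_k, d_agg, u, tau=[1.0])
```

Keeping d_k and d_agg as separate saved structures mirrors the point above: the per-dimension distances are reused as-is within an iteration, while the aggregate depends on the current feature weights.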