We now summarize the Fuzzy SKWIC (Simultaneous KeyWord Identification and Clustering) algorithm below.
Fix the number of clusters C;
Fix the fuzzifier m, which can range from 1 to infinity; m is the exponent applied to the fuzzy memberships, so a suitable value can be chosen to control how strongly they are amplified.
Initialize the centers by randomly selecting C documents.
Initialize the fuzzy partition matrix U, a C x N matrix [uij], where uij is the fuzzy membership of document j in cluster i, subject to the following constraints: a) each uij lies between 0 and 1; b) for each cluster i, the sum of uij over j = 1 to N lies strictly between 0 and N; and c) for each document j, the sum of uij over clusters i = 1 to C must equal 1.
REPEAT
1) Find the cosine distance along each individual dimension k as 1/n - xjk * cik, where cik is the kth component of the ith cluster center vector. Do this for i ranging from 1 to C, j from 1 to N, and k from 1 to n (n is the total number of terms in the collection of N documents).
2) Before computing the aggregated cosine distance, update the relevance weights vik by using the closed-form solution obtained from setting the partial derivatives of the objective function to zero: vik equals the default value 1/n plus a bias term that depends on the amplified fuzzy memberships and on the difference between the (averaged) aggregated cosine distance and the cosine distance along dimension k, scaled by the tau from the previous iteration.
3) Now compute the aggregated cosine-based distance Dwcij for each cluster i from 1 to C and each document j from 1 to N by weighting the cosine distances along the individual dimensions with the relevance weights vik and summing over the dimensions.
4) If any aggregated cosine-based distance turns out to be negative, adjust it so that it does not flip the sign of the fuzzy memberships: add the magnitude of the most negative aggregated cosine-based distance, so that all the distances become non-negative.
5) Update the partition matrix U: for each document j, uij is the inverse of the sum over clusters k of the ratio of the aggregated cosine-based distances (Dwcij / Dwckj) raised to the power 1/(m-1).
6) Update the centers taking the fuzzy memberships into account. Specifically, set cik to 0 when the relevance weight vik is zero; otherwise set it to the fuzzy-membership-weighted average of xjk. Each membership is amplified by raising it to the power m, and the sum runs over documents j = 1 to N in both the numerator and the denominator, normalizing the weighted document frequency vector.
7) Update tau for the ith cluster: take the relevance-weighted cosine distances along the individual dimensions summed over all n terms, weight by the amplified fuzzy memberships and sum over the documents, divide by the sum of the squares of the relevance weights, and multiply by a constant K. The value of tau is computed using the fuzzy memberships, relevance weights, and cluster centers cik from the previous iteration.
(Continue the iterations) UNTIL centers stabilize.
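As a compact reference, the updates in the steps above can be written as follows. This is my reconstruction of the standard Fuzzy SKWIC update equations from the descriptions given here; verify the exact forms against the original derivation:

```latex
\begin{align*}
D^{k}_{wcij} &= \tfrac{1}{n} - x_{jk}\,c_{ik},
\qquad
D_{wcij} = \sum_{k=1}^{n} v_{ik}\, D^{k}_{wcij} \\
v_{ik} &= \frac{1}{n} + \frac{1}{2\tau_i} \sum_{j=1}^{N} u_{ij}^{m}
          \left[ \frac{D_{wcij}}{n} - D^{k}_{wcij} \right] \\
u_{ij} &= \frac{1}{\sum_{k=1}^{C} \left( D_{wcij} / D_{wckj} \right)^{1/(m-1)}} \\
c_{ik} &= \frac{\sum_{j=1}^{N} u_{ij}^{m}\, x_{jk}}{\sum_{j=1}^{N} u_{ij}^{m}}
          \quad (c_{ik} = 0 \text{ when } v_{ik} = 0) \\
\tau_i &= K \cdot \frac{\sum_{j=1}^{N} u_{ij}^{m} \sum_{k=1}^{n} v_{ik}\, D^{k}_{wcij}}
               {\sum_{k=1}^{n} v_{ik}^{2}}
\end{align*}
```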
Note that the fuzzy memberships are used in the initialization and in steps 4, 5, 6, and 7. Thus Fuzzy SKWIC refines the partition matrix, the center updates, and the choice of tau in each iteration.
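As a concrete, unofficial sketch, the full loop might look like this in NumPy. All names, the epsilon shift in step 4, and the clamp on tau are my own additions for numerical safety; this follows my reading of the steps above, not the authors' reference implementation:

```python
import numpy as np

def fuzzy_skwic(X, C, m=2.0, K=1.0, max_iter=20, tol=1e-6, seed=0):
    """Sketch of Fuzzy SKWIC. X: (N, n) row-normalized document
    vectors; returns (centers, U, v)."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    # Initialize centers by randomly selecting C documents.
    centers = X[rng.choice(N, size=C, replace=False)].copy()
    # Initialize U (C x N) so each column sums to 1 (constraint c).
    U = rng.random((C, N))
    U /= U.sum(axis=0, keepdims=True)
    v = np.full((C, n), 1.0 / n)   # relevance weights, default 1/n
    tau = np.ones(C)               # discrimination parameter per cluster
    for _ in range(max_iter):
        Um = U ** m                # amplified fuzzy memberships
        # Step 1: cosine distance along each dimension: 1/n - x_jk * c_ik.
        D = 1.0 / n - X[None, :, :] * centers[:, None, :]   # (C, N, n)
        # Step 2: relevance weights; bias uses v and tau from before this update.
        Dw_prev = np.einsum('ik,ijk->ij', v, D)             # (C, N)
        bias = np.einsum('ij,ijk->ik', Um, Dw_prev[:, :, None] / n - D)
        v = 1.0 / n + bias / (2.0 * tau[:, None])
        # Step 3: aggregated cosine-based distance with the updated weights.
        Dw = np.einsum('ik,ijk->ij', v, D)
        # Step 4: shift negative distances per document; tiny epsilon keeps
        # the ratios in step 5 finite (my addition).
        shift = np.minimum(Dw.min(axis=0), 0.0)
        Dw = Dw - shift[None, :] + 1e-12
        # Step 5: u_ij = 1 / sum_k (Dw_ij / Dw_kj)^(1/(m-1)).
        ratio = (Dw[:, None, :] / Dw[None, :, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)
        # Step 6: membership-weighted centers; zero where v_ik is zero.
        Um = U ** m
        new_centers = (Um @ X) / (Um.sum(axis=1, keepdims=True) + 1e-12)
        new_centers = np.where(np.isclose(v, 0.0), 0.0, new_centers)
        # Step 7: tau for the next iteration from the current u, v, and
        # distances; the positive clamp is my addition.
        num = np.einsum('ij,ik,ijk->i', Um, v, D)
        tau = np.maximum(K * num / (v ** 2).sum(axis=1), 1e-12)
        # UNTIL centers stabilize.
        if np.max(np.abs(new_centers - centers)) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, U, v

# Tiny demo on 6 synthetic unit-norm documents over 4 terms.
rng = np.random.default_rng(1)
X = rng.random((6, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
centers, U, v = fuzzy_skwic(X, C=2)
```

One useful sanity check: the step-5 update guarantees that each column of U sums to 1, matching constraint c) of the initialization.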