We saw that the objective function is expressed in terms of cluster center errors and feature weights. Each cluster i is allowed to have its own set of feature weights Vi = [vi1, vi2, ..., vin]. Each feature weight is at most 1, and after clustering the weights of a cluster sum to one: vi1 + vi2 + ... + vin = 1. If we denote the objective function by J, then setting the gradient of J to zero gives the partial derivative equations we want to solve: the first is the partial derivative with respect to the Lagrange multiplier, the second is the partial derivative with respect to the feature weight.
The objective function has reached its final state when it no longer changes with respect to changes in the multiplier or the feature weight. That is why we set the gradient to zero and then solve for the feature weight vik.
We solve this equation mathematically to write a closed-form expression for vik; this is done only once. The resulting equation for vik has two parts. The first part is a constant, 1/n, which is the default value when all attributes/keywords are treated equally and no discrimination is performed. The second part is a bias that can be either positive or negative: it aggregates the difference between the average and the actual cosine-based distances along the individual dimensions. The bias is positive for compact attributes, i.e. those whose distance along that dimension is, on average, less than the total distance over all dimensions. If an attribute is very compact, it is much more relevant to the cluster and could even be the center. Since an individual cosine-based distance along a single term-wise dimension can be negative, we allow dissimilar terms, and they show up as negative values; the resulting bias only increases, emphasizing that dimension further. If the total aggregate dissimilarity becomes negative, the point no longer belongs to the cluster, since we partition the data based on minimum distance. Thus we have obtained the attribute weight vik and have dealt with the second component of the objective function.
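The weight update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the exact implementation: the array `d` of per-dimension cosine-based distances and the use of a single scalar `tau` per cluster are assumptions made for the sketch.

```python
import numpy as np

def update_feature_weights(d, tau):
    """Sketch of the feature-weight update: v_k = 1/n + bias_k.

    d   : (n_points, n_dims) array of per-dimension cosine-based
          distances from each point in the cluster to its center;
          individual entries may be negative (dissimilar terms).
    tau : discrimination parameter for this cluster (assumed scalar).
    """
    n_dims = d.shape[1]
    total = d.sum(axis=1, keepdims=True)  # total distance over all dims
    # Bias aggregates (average per-dim distance - actual distance along k):
    # positive for compact dimensions whose distance is below average.
    bias = (total / n_dims - d).sum(axis=0) / (2.0 * tau)
    return 1.0 / n_dims + bias
```

Note that the biases cancel when summed over all dimensions, so the returned weights sum to one, matching the constraint on vi1 + ... + vin.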
The choice of the tuning parameter tau in the objective function is important because it controls the importance of the second term relative to the first. If tau for the ith cluster is very small, only one keyword in that cluster will be relevant; the rest will receive zero weights. If it is large, all the terms in that cluster will be relevant and may receive roughly equal weights. Tau should therefore be chosen so that both terms of the objective function have the same order of magnitude.
We compute the tuning parameter tau numerically, iteration by iteration. In iteration t we take the aggregated cluster center errors from iteration t-1 and divide them by the sum of the squared feature weights, also from iteration t-1. The resulting fraction is multiplied by a constant to adjust the magnitude of tau.
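The tau update is a simple ratio, which might be sketched as follows. The function name and the constant `K` are illustrative assumptions; the inputs are quantities carried over from iteration t-1, as described above.

```python
import numpy as np

def update_tau(center_errors, weights, K=1.0):
    """Sketch of the iterative tau update.

    center_errors : per-point cluster-center errors from iteration t-1.
    weights       : feature weights v_ik from iteration t-1.
    K             : magnitude-adjusting constant (assumed; chosen so the
                    two terms of the objective stay comparable).
    """
    return K * np.sum(center_errors) / np.sum(np.asarray(weights) ** 2)
```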
Note that during the iterations a feature relevance value could grow to a value in excess of 1.
Across the iterations, the quantities typically change a lot at first, with little or no change in the later iterations.
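That early-large, late-small pattern of change is the usual behavior of a fixed-point iteration, and it suggests a simple stopping rule: iterate until the update falls below a tolerance. The loop below is a generic sketch with a hypothetical `step` function standing in for one full update pass (weights, centers, tau).

```python
import numpy as np

def run_until_stable(step, v0, tol=1e-6, max_iter=100):
    """Generic fixed-point loop: stop when successive iterates barely
    change. `step` is a hypothetical update function v -> v_new."""
    v = v0
    for t in range(max_iter):
        v_new = step(v)
        if np.max(np.abs(v_new - v)) < tol:  # little or no change: stop
            return v_new, t + 1
        v = v_new
    return v, max_iter
```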