Saturday, June 7, 2014

I've been looking at some objective functions for graph-based similarity between words and I came across "Graph based Unsupervised Learning of Word Similarities using Heterogeneous Feature Types" by Saluja and Navratil. It offers a simple way to express the similarity. The graph they use has two kinds of nodes, words and features: edges can connect two words, two features, or a word and a feature.
To learn the graph, the following objective function is minimized:
Omega(Wv, Wf, Z) = a0 Sum over fp, fq ( Wfp,fq - Wfp,fq-initial )^2
                 + a1 Sum over vi Sum over fp ( Zvi,fp - Zvi,fp-initial )^2
                 + a2 Sum over vi, vj Sum over fp, fq  Zvi,fp Zvj,fq ( Wvi,vj - Wfp,fq )^2
                 + a3 Sum over vi, vj Sum over fp, fq  Wvi,vj Wfp,fq ( Zvi,fp - Zvj,fq )^2
where
Wfp,fq is the current similarity between feature fp and fq
Wvi vj is the current similarity between word vi and vj
Zvi, fp is the current importance weight of feature fp for word vi
a0 to a3 are constants that add up to 1 for the relative importance of each term
The first two terms are described as minimizing the l2-norm between the initial and the current values of Wfp,fq and Zvi,fp, anchoring the learned graphs to their initial estimates.
The third term pulls the word similarity Wvi,vj toward the feature similarity Wfp,fq, weighted by the importance of feature fp to word vi and of fq to vj.
The fourth term pulls the importance weights Zvi,fp and Zvj,fq toward each other, weighted by the word and feature similarities.
As you can see, the terms are progressive and enable smoothing over the two graphs.
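To make the four terms concrete, here is a minimal pure-Python sketch that evaluates the objective on small dense matrices. The function name `omega` and the list-of-lists matrix representation are my own choices for illustration; this is a brute-force evaluation, not the paper's optimization procedure.

```python
def omega(Wv, Wf, Z, Wf0, Z0, a):
    # Wv[i][j]: word-word similarity, Wf[p][q]: feature-feature
    # similarity, Z[i][p]: importance of feature p for word i.
    # Wf0 and Z0 hold the initial values; a = (a0, a1, a2, a3).
    n, m = len(Z), len(Z[0])
    a0, a1, a2, a3 = a
    total = 0.0
    # term 1: keep feature similarities near their initial values
    total += a0 * sum((Wf[p][q] - Wf0[p][q]) ** 2
                      for p in range(m) for q in range(m))
    # term 2: keep importance weights near their initial values
    total += a1 * sum((Z[i][p] - Z0[i][p]) ** 2
                      for i in range(n) for p in range(m))
    for i in range(n):
        for j in range(n):
            for p in range(m):
                for q in range(m):
                    # term 3: word similarity should agree with feature
                    # similarity where both importance weights are high
                    total += a2 * Z[i][p] * Z[j][q] * (Wv[i][j] - Wf[p][q]) ** 2
                    # term 4: importance weights should agree where both
                    # similarities are high
                    total += a3 * Wv[i][j] * Wf[p][q] * (Z[i][p] - Z[j][q]) ** 2
    return total
```

When the current values equal the initial values and the two similarity graphs already agree, every term vanishes and the objective is zero, which is a quick sanity check on the sketch.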
But can we express this with a different formulation, one that uses matrices and enables Laplacian smoothing?
The Laplacian matrix of an undirected, unweighted graph G, sometimes called the admittance matrix or Kirchhoff matrix, is an n x n symmetric matrix with one row and column for each node, defined by L = D - A, where D is the degree matrix (the diagonal matrix formed from the vertex degrees) and A is the adjacency matrix.
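The definition L = D - A translates directly into code. A small sketch, using a 0/1 adjacency matrix as a list of lists (the function name is mine):

```python
def laplacian(adj):
    # L = D - A for an undirected, unweighted graph given as a
    # symmetric 0/1 adjacency matrix.
    n = len(adj)
    deg = [sum(row) for row in adj]  # D: vertex degrees on the diagonal
    return [[(deg[i] if i == j else 0) - adj[i][j]
             for j in range(n)] for i in range(n)]
```

For the path graph 0-1-2, for example, this yields rows [1, -1, 0], [-1, 2, -1], [0, -1, 1].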
The normalized Laplacian matrix of a graph is defined entry-wise by
L(G)i,j = 1 if i == j and di != 0
        = -1 / sqrt(di * dj) if i and j are adjacent
        = 0 otherwise.
The Laplacian matrix measures to what extent a function on the graph differs at one vertex from its values at nearby vertices.
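Both ideas can be sketched in a few lines. The entry-wise rule above gives the normalized Laplacian, and the quadratic form x^T L x of the unnormalized Laplacian expands to the sum of (x_i - x_j)^2 over edges, which is the smoothness measure just described. Function names are mine:

```python
import math

def normalized_laplacian(adj):
    # Entry-wise: 1 on the diagonal when the degree is nonzero,
    # -1/sqrt(di*dj) for adjacent i and j, and 0 otherwise.
    n = len(adj)
    deg = [sum(row) for row in adj]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] != 0:
                L[i][j] = 1.0
            elif adj[i][j]:
                L[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return L

def smoothness(adj, x):
    # x^T L x for the unnormalized Laplacian: the sum of
    # (x[i] - x[j])^2 over the edges of the graph, i.e. how much
    # the values of x vary across adjacent vertices.
    n = len(adj)
    return sum(adj[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(i + 1, n))
```

On the path graph 0-1-2 with values x = (0, 1, 3), the smoothness is (0-1)^2 + (1-3)^2 = 5; a constant x gives 0, the smoothest possible assignment.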
