In today's post I will discuss applying centrality to keyword extraction. In this article, I try to explain how to extract keywords
using eigenvector centrality. This
method gives us a way to measure the importance of the keywords. First, the
words are represented as nodes in a graph. Then words that are connected to
each other are given an edge in the graph. The edges are undirected because we
only want the adjacency information. The connection between words is given by
mutual information. Mutual information is
based on pairwise grouping of words and this fits right in with our
connections. The mutual information is calculated as the probability of the
co-occurrence times the logarithm of the ratio of the probability of
co-occurrence to the probabilities of the independent occurrences of the pair
of words. With a threshold for the mutual information as a percentage of the
range that it falls in, we can select the edges for the graph. This gives us a
set of nodes and edges which we represent with an adjacency matrix.
Using the adjacency matrix for a given graph with vertex V
and edges E, we find the eigenvector centrality
score of the vertex v as 1/
lambda Sum (Xt) which can be written in
vector notation as Ax = Lambda X. Lambda is the eigenvalue for which many
different eigenvector exist. The highest one gives the maximum importance of
the node in the network and is said to be found by power iteration.
Power iteration can
work for very large matrix and finds the highest eigenvector for Ax = Lambdax
By iteration we mean
we start with an approximate eigenvector b0
In each iteration, we
multiply the eigenvector with the adjacency matrix and normalize it. For each
iteration, the following is suggested:
Initialize vector b, deviation and n
While ( deviation is greater than tolerance)
Temporary_vector = A * b
Deviation = abs(norm(b)) – n
n = norm(x)
b = Temporary_vector / n
return vector b
No comments:
Post a Comment