Before we discuss word similarity measures and the OOV term normalization that we wanted to continue from the previous post, I want to take a minute to mention Centrality of the graph which is another useful concept that can help with the word similarity. Centrality is best discussed in Sociology and I refer from the notes of the course in UCLA by Prof. McFarland and Motoki Watabe.
There are three different centrality measures:
1) degree of centrality
2) closeness centrality
3) betweenness centrality
Given a graph with n nodes, the centrality of a node, refers to the number of edges attached to the node. To normalize the score for each node, we divide the number of edges attached to each node by n-1.
First, the degree of centrality:
If we have a graph A with four vertices that contains only one central node to which all other nodes attach, then that central node has a centrality of 3 and the other have a centrality of 1.
The degree of centrality is calculated as [ (3-3) + (3 -1) + (3 -1) + (3 -1) ] / max-possible-which-is-(n-1)(n-2) in this case = 6/6 = 1
For a graph B with all the four vertices where each node is connected to every other as in a square with diagonals, each of the vertices now have a centrality of score of 3 and the degree of centrality for the overall graph is [ (3-3) + (3 -3) + (3 -3) + (3 -3) ] / (n-1)(n-2) = 0
The numerator is calculated as the ratio of the individual sum of the differences between the max centrality score and that of individual scores
It may appear that the higher the normalized score the higher a node's likely relevance in the graph. However this is not always true. Cook et al 1983 demonstrated that this is not the case. We will look into the applications of each measure as we define it.
To calculate the closeness centrality, we find the farness of each node and take its inverse. The farness of a node s is defined as the sum of its distance to all other nodes. The more central a node is the lower its total distance to all other nodes. Closeness can be taken as a measure of how fast the information can spread from a node to all other nodes sequentially.
For network A, the centrality score for node 1 is 3 / 3 and for the other nodes is 3 / 5
Since the maximum score in the above is 1,
the closeness centrality for network A is [( 1 - 1 ) + ( 1 - 3/5 ) + (1 - 3 / 5) + (1 - 3 / 5 )] / [n(n-1)/2 / 5] = 1
and for network B, the centrality score for each node is uniformly 3 / 3 = 1
and closeness centrality for the overall network B is (0 + 0 + 0 + 0) / (6/5) = 0
To calculate the betweenness centrality, we find the number of times a vertex acts as a bridge in the shortest path between two other nodes. It was introduced as a measure for the controllability of a node in the dissemination of a network.
For network A, the centrality score for the most connected node is 3 / 3 = 1 and for all the other nodes is 0
The maximum score in the above is 3 / 3 = 1
The betweenness score for network A is [(1-1) + (1-0) + (1-0) + (1-0)]/ [(n - 1)(n-2)/2] = 3 /3 = 1
and
for network B the centrality score is for each node is the same which is 0 since the node can be bypassed.
The maximum score is 0
and the overall score for network B is 0 / 3 = 0
Thus we see that in all the three cases, network A is more centralized than network B.
For information flow, network A has high controllability and high efficiency in that there is no redundant edges. At the same time, network A is riskier.
For network B, there is high flexibility and better tolerance to failures. However the cost from redundant edges is higher.
We saw different scores for vertices. To determine the admittance matrix for the vertices, we have to calculate a different kind of score.
There are three different centrality measures:
1) degree of centrality
2) closeness centrality
3) betweenness centrality
Given a graph with n nodes, the centrality of a node, refers to the number of edges attached to the node. To normalize the score for each node, we divide the number of edges attached to each node by n-1.
First, the degree of centrality:
If we have a graph A with four vertices that contains only one central node to which all other nodes attach, then that central node has a centrality of 3 and the other have a centrality of 1.
The degree of centrality is calculated as [ (3-3) + (3 -1) + (3 -1) + (3 -1) ] / max-possible-which-is-(n-1)(n-2) in this case = 6/6 = 1
For a graph B with all the four vertices where each node is connected to every other as in a square with diagonals, each of the vertices now have a centrality of score of 3 and the degree of centrality for the overall graph is [ (3-3) + (3 -3) + (3 -3) + (3 -3) ] / (n-1)(n-2) = 0
The numerator is calculated as the ratio of the individual sum of the differences between the max centrality score and that of individual scores
It may appear that the higher the normalized score the higher a node's likely relevance in the graph. However this is not always true. Cook et al 1983 demonstrated that this is not the case. We will look into the applications of each measure as we define it.
To calculate the closeness centrality, we find the farness of each node and take its inverse. The farness of a node s is defined as the sum of its distance to all other nodes. The more central a node is the lower its total distance to all other nodes. Closeness can be taken as a measure of how fast the information can spread from a node to all other nodes sequentially.
For network A, the centrality score for node 1 is 3 / 3 and for the other nodes is 3 / 5
Since the maximum score in the above is 1,
the closeness centrality for network A is [( 1 - 1 ) + ( 1 - 3/5 ) + (1 - 3 / 5) + (1 - 3 / 5 )] / [n(n-1)/2 / 5] = 1
and for network B, the centrality score for each node is uniformly 3 / 3 = 1
and closeness centrality for the overall network B is (0 + 0 + 0 + 0) / (6/5) = 0
To calculate the betweenness centrality, we find the number of times a vertex acts as a bridge in the shortest path between two other nodes. It was introduced as a measure for the controllability of a node in the dissemination of a network.
For network A, the centrality score for the most connected node is 3 / 3 = 1 and for all the other nodes is 0
The maximum score in the above is 3 / 3 = 1
The betweenness score for network A is [(1-1) + (1-0) + (1-0) + (1-0)]/ [(n - 1)(n-2)/2] = 3 /3 = 1
and
for network B the centrality score is for each node is the same which is 0 since the node can be bypassed.
The maximum score is 0
and the overall score for network B is 0 / 3 = 0
Thus we see that in all the three cases, network A is more centralized than network B.
For information flow, network A has high controllability and high efficiency in that there is no redundant edges. At the same time, network A is riskier.
For network B, there is high flexibility and better tolerance to failures. However the cost from redundant edges is higher.
We saw different scores for vertices. To determine the admittance matrix for the vertices, we have to calculate a different kind of score.
No comments:
Post a Comment