Thursday, May 29, 2014

In their paper on discovering interesting messages spread across Twitter by using link analysis, Yang, Lee and Rim leverage the retweets of messages as implicit relationships between Twitter users.
However, they look at more than just the sheer number of retweets: they apply link analysis to the retweet structure.
The retweet count has been used by Hong et al. as a measure of popularity and as the basis for classifiers that predict whether and how often new tweets will be retweeted in the future. Alonso et al. used the presence of a URL link as a single, highly effective feature for distinguishing interesting tweets with more than eighty percent accuracy. This paper attempts to go beyond that by modeling Twitter as a graph of user and tweet nodes that are implicitly connected by retweet links, which arise when one user retweets what another user posted or retweeted. The authors use a variation of the HITS algorithm that exploits the retweet link structure as an indicator of how interesting an individual tweet is. What is particularly interesting is that the paper does not treat all retweet links as equal; some carry more importance than others. The authors demonstrate their approach on real Twitter data and score the interesting tweets with a ranking.
They model the Twitter structure as a directed graph G with nodes N and directed edges E.
The graph G has two subgraphs, one based only on the user nodes and another based only on the tweet nodes. Instead of running HITS on the tweet subgraph, they run it on the user subgraph and let tweets inherit the scores of their publishers.
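As a rough illustration of the user subgraph, here is a minimal Python sketch. The record layout (retweeter, original author, tweet id) and the function names build_user_graph and inherit_scores are my own assumptions for illustration, not something the paper specifies.

```python
from collections import defaultdict


def build_user_graph(retweets):
    """Collapse retweet records into directed user-to-user edges.

    retweets: iterable of (retweeter, original_author, tweet_id) tuples
    (a hypothetical record layout). Returns a dict that maps
    (retweeter, original_author) to the number of retweets on that edge.
    """
    edge_counts = defaultdict(int)
    for retweeter, author, _tweet_id in retweets:
        if retweeter != author:  # ignore self-retweets
            edge_counts[(retweeter, author)] += 1
    return dict(edge_counts)


def inherit_scores(tweet_authors, user_scores):
    """Give each tweet the score of the user who published it."""
    return {
        tweet: user_scores.get(author, 0.0)
        for tweet, author in tweet_authors.items()
    }
```

Each edge points from the retweeting user to the user being retweeted, so a user with many incoming edges is one whose tweets others found worth passing on.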
For each user, they first take the sum of the weighted hub scores of the users who retweeted that user, with the weights based on user counts, and treat this sum as the user's authority score. They then update each user's hub score in the same way, using the user's own weight and the authority scores of the users that user retweeted. These two steps are repeated iteratively, and the scores tend to converge as the iterations proceed. After each iteration, the scores are normalized to lie between 0 and 1 by dividing each of them by the square root of the sum of squares of all authority (or hub) values. At the end, every user has an authority score and a hub score. The weight of a user is the ratio of the number of distinct other users that the user retweeted to the total number of retweet outlinks from that user. This mechanism dampens the influence of users who devote most of their retweet activity to very few other users and increases the weights of users who retweet many different users.
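The iteration described above can be sketched in Python as follows. This is my reading of the description, not the authors' code: the function names, the fixed iteration count, and the choice to let each distinct user-to-user edge contribute once per update (with retweet counts used only in the weights) are all assumptions.

```python
import math
from collections import defaultdict


def normalize(scores):
    """Divide every score by the square root of the sum of squares."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {user: v / norm for user, v in scores.items()}


def weighted_hits(edge_counts, iterations=50):
    """Weighted HITS on a user retweet graph (illustrative sketch).

    edge_counts maps (retweeter, retweeted_user) to a retweet count.
    """
    users = {user for edge in edge_counts for user in edge}

    # Weight = distinct users retweeted / total retweet outlinks.
    distinct = defaultdict(int)
    total = defaultdict(int)
    for (src, _dst), count in edge_counts.items():
        distinct[src] += 1
        total[src] += count
    weight = {user: distinct[user] / total[user] for user in total}

    hub = {user: 1.0 for user in users}
    auth = {user: 1.0 for user in users}
    for _ in range(iterations):
        # Authority: weighted hub scores of the users who retweeted you.
        new_auth = {user: 0.0 for user in users}
        for src, dst in edge_counts:
            new_auth[dst] += weight[src] * hub[src]
        # Hub: your own weight times the authority of users you retweeted.
        new_hub = {user: 0.0 for user in users}
        for src, dst in edge_counts:
            new_hub[src] += weight[src] * new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub
```

For example, with edge_counts = {("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 4}, user a has weight 1 (two distinct users over two retweets) while user b has weight 0.25 (one distinct user over four retweets), so b's repeated retweets of c count for less than a's single one.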
