Thursday, June 20, 2013

Graph Mining
This involves finding frequent sub-graph patterns  and performing some of the mining operations discussed earlier such as categorization, classification, and cluster analysis over large graph data sets. These approaches are of two types - Apriori based and pattern growth based approaches. The Apriori based approach generates candidates by extracting the nodes at each level using a breadth first approach. The pattern growth approach is more flexible with respect to the search method.  Graph mining uses efficient graph index structures and can be used to search for similarity based on the structure and not the values. Using structure has more benefits than comparing values. Classification and cluster analysis of graph data can now be integrated with pattern mining process.
An example for a graph for mining is the social network which is a heterogenous and multirelational data set represented by a graph. They can be arbitrarily large. They tend to follow the densification power law, which states that the network become increasingly dense over time. They also exhibit a property where the effective diameter decreases as the network grows. Nodes that are facing out and those facing in referred to as out-degrees and in-degrees are distributed such that there are a large number of outliers. The growth of these networks are often modeled and referred to with the analogy of a forest fire.
Link mining is an interdisciplinary pursuit between social networks, link analysis, hypertext and web mining, graph mining, relational learning, and inductive logic programming. The tasks here involve classifying objects based on links, prediction based on object types, link types and link existence, estimating link cardinality, predicting whether two objects are the same and detecting cluster objects, other tasks include. Some other tasks may involve finding sub-graphs within networks and finding metadata from unstructured data.
Link prediction uses measures for the proximity of network nodes and ranks links based on these measures. Some examples include shortest path, and the number of neighbors that two nodes share.  This degree of connections is exploited in viral marketing where more money is spent on the individual who can advertise by positive word of mouth to more number of individuals. Similarly in newsgroups, people respond to posts more when they disagree than when they agree. So graph partitioning can be used to mine newsgroups by grouping authors into opposite camps. Most community mining methods assume that there is only one kind of relation in the network and in addition assume that the results are independent of the users information needs, however there may be multiple relations and relations that matter more depending on the users information needs. Furthermore, a combination of such hidden relations might reveal a hidden community within this network.
 

No comments:

Post a Comment