Tuesday, August 16, 2016

Skip Graphs 
Conventional Skip graphs are based on skip-lists that provide the additional benefits of using a balanced tree. Thus they improve upon searching peer-to-peer networks and by providing the ability to perform queries based on key-ordering as opposed to the hash table based functionality.   
Conventional Skip graphs also introduce new algorithms for searching, constructing and inserting into the new graph. However, reducing the graph to fewer nodes helps us provide approximate answers faster while retaining the original algorithms. Moreover they can be used in conjunction with the original graph because the simpler representation is logically overlayed on top of the original graph.  
With fewer nodes and aggregated statistics per node, it is easier to answer questions about how many nodes satisfy a given criteria especially when we want to return a good short ranked list and not require the correctness of the entire result set. This is often the case for similarity based measures. 
simpler graph that has fewer nodes than the original will also fit in memory. This makes processing faster. We can select those nodes that have higher centrality. Then we can add links that represent an aggregated link between these higher order nodes. A query involving a simple where clause as in a SQL statement will yield results that are expectedly more important nodes than all the others. This is useful in case where we want to find connectivity. Similarity measures can rely on connectivity distances between two nodes. The higher centrality nodes may have to partition the graphs based on centrality to aid with the connection information.  A simple graph with fewer nodes is  a new graph of existing nodes and has no overlap with the existing original graph. It answers some of the queries directly with the nodes and edges in the graph using traditional graph algorithms. 
A skip list is a tower of increasingly sparse linked lists sorted by key.  The lists at the higher level act as express lanes that allow the sequence of nodes to be traversed quickly.  A skip graph, as per James Aspnes and Gauri Shah, is distinguished from a skip list in that there may be many lists at a particular level and every node participates in one of these lists until the nodes are splintered into singletons. A skip graph is equivalent to a collection of upto n skip lists that happen to share some of their lower levels. In distributed computing, the graph has to tolerate arbitrary node failures and it should be usable even when a huge fraction of the nodes fail. Therefore each node that appears in higher level skip lists occur based on some membership vector generally using prefix memberships to a word.  This form of membership makes a skip graph a trie of skip lists. A membership vector is a random variable. In other words, there is an edge between x and y whenever x and y are adjacent in some list.  
We can also have a tower of centrality levels and where the lowest level represents the original graph that may be very large in size and the topmost consisting of far less number of nodes. The nodes of higher levels recur in lower levels. 
The search operation in a conventional skip graph starts at the top most level of the node seeking a key and proceeds along the same level without overshooting the key, continuing at a lower level if required until it reaches the lowest level where the key will be found. The same procedure holds for the proposed skip graph because the edges appear at a current level if there was connectivity at underlying levels and we proceed at lower level when we overshoot the preceding level.  
Skip graphs can also support range queries because now we have to find only one of the nodes in the interval and then broadcast throughout the m nodes in the interval by flooding. This is a technique used in conventional skip graphs 
The range query algorithm is somewhat like this 
Void findOne(Node n, int searchOp, node startNode, int searchKey, int level) 
{ 
If (n.key == searchKey) then 
    Inform n to start_node 
If n.key < searchKey then 
    While level >= 0 do 
      If (nRlevel).key < searchKey then 
          Recursively call at nRlevel 
           Break; 
      Else level = level –1 
Else 
      While level >= 0 do 
       If (nLlevel).key  > searchKey then 
           Recursively call at nLlevel 
           Break; 
       Else level = level –1 
If (level < 0) then 
   Inform n to startNode  
} 
Void findallInrange(ref List<Node> result, int min, int max) 
{ 
// all the nodes at the lowest level between min and max are the candidates 
FindOne say at min 
Then binary chop between min and max at lowest level. 
} 
The nodes in a conventional skip graph occur at least once in one of the skip lists at every level. The nodes in the proposed graph occur selectively at different levels but maintain connectivity information as aggregation of underlying lower level connections. This is ideal for graphs that don't change much over time or where changes are incremental, possibly in a time series database or data warehouse. 

In addition, we can also have a full set of nodes at every level although the nodes appear in different skip lists at each level. This gives the advantage that the connectivity need not be aggregated to higher levels but the memberships are based on increasing path lengths. This adds more information than the proposed graph. Redundancy is not a problem as storage is cheap more so than ever.  
Skip graphs are effective on time series data such as in warehouses. The data gets accumulated over time in an incremental manner and there is little or no change to already accumulated data. 

#codingexercise

Get the maximum difference between a node and its ancestor in a binary tree:

int getMaxDiff(node root, ref int diff)
{
If (root == null)
Return INT_MAX;
Int left = GetMaxDiff(root.left, ref diff);
Int right = GetMaxDiff(root.right, ref diff);
Int val = min(left, right);
Diff = max( diff, root.data – val);
Return min(val, root.data);
}

Get the lowest common ancestor of nodes having values n1 and n2 in a binary tree:

Node getLCA( node root, int n1, int n2)
{
If (root == null) return null;
If (root.data == n1 || root.data == n2) return root;
Int left = getLCA(root.left, n1, n2);
Int right = getLCA(root.right, n1, n2);
If (left && right) return root;
If (left) return left;
If (right)return right;
Return null;

}

No comments:

Post a Comment