Cluster computing

Tuesday, March 8, 2016

Today we continue reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs that don't fit on a single server. The paper introduces Horton a graph query execution engine. We were reviewing its architecture.
It has four components - the graph client library, the graph coordinator, graph partitions and the graph manager. The graph client library sends queries to the graph coordinator which prepares an execution plan for the query. The graph partition manages a set of graph nodes and edges. Horton is able to scale out mainly because of graph partitions. The graph manager provides an administrative interface to manage the graph with chores like loading and adding and removing servers.
We review these in detail.
The graph coordinator receives the query from the client library, validates it and parses it, optimizes a query plan and translates it into a finite state machine which is sent to the partitions for parallel execution. The validation involves checks against the nodes and edge predicates. The optimization involves taking a query in a parsed regular expression form, enumerating the different execution plans and choosing the one with the lowest cost. It uses a dynamic programming framework to enumerate the execution plans and to choose the one with the lowest cost. One example of optimization is to order the predicates. A good choice of predicate order can save costs that are orders of magnitude different from each other. If n is the number of predicates in the query, this method finds a suitable choice in O(n^3). After the query is optimized, the query plan is translated into a finite state machine The state machine not only expresses the execution efficiently but also packages it in a way that can be easily sent to local graph partitions for execution by their local engines. The communication to send state machine and get results is done in an asynchronous manner which allows for the results to be directly streamed from a graph partition machine to the client without involving the graph co-ordinator.
#codingquestion
In the previous examples we used NChooseK as a recursive function to find K from N. If we were to partition the data, we could rewrite this

void NChooseKPartitioned(int N, int K, ref List<int> array, int start, int count)
{
m = N/2;
n = K/2;
return NChooseK(m, n, ref List<int> array, start, count) + NChooseK(N-m, K-n, ref List<int> array, start, count);
}

To compare if two ominous are the same by change of orientation or position
Int compareTo (int [,] matrix1, int [,] matrix2)
{
For (int i=0; I < 360; I += 90)
{
var rotated = rotate (matrix1, i);
If (compare (matrix2, rotated) == 0)
Return 0;
}
Return count-of-1 (matrix1) - count-of-1 (matrix2);
}

Cluster computing

Tuesday, March 8, 2016

No comments:

Post a Comment