Cluster computing

Sunday, March 6, 2016

Today we continue reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs. These graphs don't fit on a single server. Map-Reduce and Pregel are not good enough for processing interactive adhoc queries against large graphs.Horton is a distributed interactive query execution engine for large graphs. We were reviewing that large graphs pose a different challenge from a number of small graphs. The latter has examples in bioinfomatics and chemoinformatics. Moreover, the conventional solution for querying large graphs based on offline batch processing is different from the corresponding online interactive query requirements. To manage large graphs online while supporting querying with small latency, a new system was required to be built. Horton provides a graph query execution engine and a graph query optimizer that is able to efficiently process online queries on large graphs. Further it uses a design decision to partition the graph among several servers in order to facilitate online processing. Horton stores the graph partitions in main memory to offer fast query response time.
We now review Horton data model and query language. A graph database stores not only the nodes or entities but also the relationships or connections and these connections can have rich attributes.
Horton too supports rich graph data. Nodes represent entities and have types as well as a set of key value pairs representing the data associated with this entity. Edges have type and data as well. There is no restriction on the kind or amount of data associated with nodes and edges. Horton manages both directed and undirected graph.If the graph is directed, each node stores both inbound and outbound edges
Queries are in the form of regular language expressions. A query is a sequence of node predicates and edge predicates. Each predicates can contain conjunctions and disjunctions on node and edge attributes as well as closures. For example, a query to retrieve a list of all the photos in which John is tagged is expressed as Photos-tagged-John. In this case all paths in the graph that satisfy the query expression are returned. To get a specific part of the path such as a node, a SELECT statement is used.
Select photo from Photo-Tagged-John. Here the node photo is the only one but there could have been other nodes and edges in the query
The syntax is helpfully closer to SQL unlike Cypher. This was also a design decision In general, graph databases perform better than relational in highly interconnected data where a nearly online data warehouse is required.
#codejam continued
Fill a linear array of size n with k elements of value 1

void NChooseK(int N, int K, ref List<int> array, int start, int count)
{
if (count ==k) { // print array and return; }
if (start == N) return;
for (int i=0; i < 2; i++)
{
if (i == 0) NChooseK(N,K, ref array, start+1, count);
else{
array[i] = 1;
NChooseK(N,K, ref array, start+1, count+1);
array[i] = 0;
}
}
}

Cluster computing

Sunday, March 6, 2016

No comments:

Post a Comment