Cluster computing

Saturday, March 5, 2016

Today we start reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs. These graphs don't fit on a single server. Map-Reduce and Pregel are not good enough for processing interactive adhoc queries against large graphs.Horton is a distributed interactive query execution engine for large graphs. It includes a query execution engine and a query optimizer that allows interactive execution of queries on large distributed graphs in parallel. In this paper the graph for study came from a social networking application called Codebook which uses a graph to model data on software components, developers, development artifacts and their interactions in large software projects. This kind of information about software engineers, software components, and their interactions in large software projects such as the Windows or Linux Operating Systems results in large graphs. Online querying of large graphs is very different from conventional database querying. Conventional databases store highly structured data and there is never a join performed without a join table that holds the foreign keys of both participating tables. Moreover in relational databases, references to other rows are indicated by referring to their primary key attributes via foreign columns. This is enforceable with constraints but only when the reference is never optional. Joins are computed at query time that is both compute and memory intensive. Graph databases are an improvement where the relationships are the first class citizens of the graph data model. Nodes and relationships are assembled into connected structures. Each node in the graph database model directly and physically contains a list of relationship-records that represents its relationship to other nodes.Although a graph database stores connections as first class citizens readily available thereby avoiding relationship computations at query times, it explores the larger neighborhood around the initial starting point leaving billions outside the search perimeter. Neo4j implements such a NoSQL graph database. The Map-Reduce framework can help with such queries but they are meant for batch processing not online queries. Even though their throughput can be increased to be at par with a low latency online query, they are fundamentally not the same. The paper describing Pregel states that graph algorithms often exhibit poor locality of memory access, very little work per access, and a changing degree of parallelism over the course of execution. Distribution over many machines and their faults makes the locality issue even worse. Arguably such data can be stored as distributed database clusters but even then there is no easy way to state reachability queries in the relational world. Besides large graphs pose a different challenge than hundreds of smaller graphs. Horton partitions the graph in among several servers and stores the graph partitions in main memory to offer fast query response time. The partitioning is horizontal to scale with the machines in the data centers.

#codejam problems
We have seen what ominoes are in the exercise described earlier. K-ominoes are some arrangement of K unit squares so that the unit squares remain joined on one or more edges. Determine how many different ominoes there are for a size k.
int GetOminoes (int k)
{
Int count = 0;
Var matrix = new int [k,k];
var ominoes = new List<int[,]>();
GetCombinations ( k, ref matrix, 0, ref count, ref ominoes);
Return count;
}

Void GetCombinations ( int k, int [,] matrix, int level, int ref count, List<int[,]> ominoes){
if (k==0) {
if (ominoes.Any(x => x.compareTo(matrix) == 0) == False)
{
count += 1;
ominoes.Add(matrix);
}
return;
}
for (int base = k; base > 0; base--)
{
if ( level == 0) { // contiguous occupancy k choose base }
else{ k choose base }
if (any-islands) continue;
GetCombinations(k - base, matrix, level + 1, count);
// reset occupancy
}
}

One way would be to
// fill the matrix with k 1s
// Eliminate those that are disjoint
// Ignore duplicates from rotations and translations or comparing two distributions rowcounts and columncounts sequences with all of four orientations

Another way would be to sequentially reduce the base from k to 1.
At each chosen base m increase levels with recursion for the remaining by choosing base from k to 1. For each chosen base and level, find different positions or arrangements linearly

Cluster computing

Saturday, March 5, 2016

No comments:

Post a Comment