Cluster computing

Monday, October 10, 2016

Considerations for large graph processing

Facebook engineers who worked on large scale graphs with trillions of edges mentioned the following issues they faced:

1) They needed to extend the basic pregel model of graph processing. For example, to perform k-means clustering, they had to enable generalizations for the iterations involved such as to make the framework more reusable and the tasks are abstracted per vertex. In this case, first, the vertex compute method calculates the distance to all the centroids while becoming the member of the nearest cluster. Then the centroids are recalculated and at the next outer iteration, the new centroids are available to every vertex.

2) Pregel model also requires computations that are characterized as "thinking like a vertex" where as some computations need to be executed in a centralized fashion.

3) Pregel only allows one message type and one message combiner, therefore computations at a vertex needed to fit all in. To allow computations to be distinctive, cleaner and more reusable, they added composable computation.

4) When messages are distinct, commutative and associative, they can be combined and thus making the message more "aggregatable". For those that cannot be aggregated, they introduced a technique called "superstep splitting" where a fragment of a message is sent and a partial computation updates the state of the vertex value. This is all in memory.

5) To improve performance and scalability, they had to introduce additional parallelization in the runtime and the shared contexts which they called "WorkerContext". Bottlenecks and overheads such as checkpointing were addressed by scheduling. This was finer level than what the infrastructure provided.

6) They even optimized the memory utilization of the graph infrastructure because it allowed arbitrary vertex id, vertex value, edge and message classes. They did this with 1) serializing edges with byte array and 2) serializing messages on the server.

7) They also improved parallelization with sharded aggregators that provided efficient shared state across workers. With this approach, each aggregator gets assigned to a randomly picked worker which then gathers the values, performs the aggregation and distributes the final values to master and other workers. This distributes the load that was otherwise entirely on the master.

Since the graphs are large, it is partitioned as n parts on m machines. Computations are distributed across workers and aggregators.
#codingexercise

Void ToBinary(int num, ref string binary)

{

    If (num == 0) return;

    binary = num %2 + binary;

    ToBinary(num/2, ref binary);

}

Void ToInt(string binary, ref int num, int k = 0)

{

    If (String.IsNullOrEmpty(binary)) return;

    Char c = binary.RemoveLast();

    Num += math.pow(2, k) × c.toInt();

    ToInt(binary, ref num, k + 1);

}

Int ResetKthBit(int n, int k)

{

String binary;

ToBinary(n, ref binary);

Assert(binary.count > n);

If (binary[k] == 0)

     binary[k] = 1;

Int num;

ToInt(binary, ref num, 0);

Return num;

}

Cluster computing

Monday, October 10, 2016

No comments:

Post a Comment