Wednesday, May 16, 2018

We now proceed to the discussion on parallel Breadth-First-Search.
Graph-based computations are pervasive in modern day applications where important entities need to be identified and ranked, anomalous patterns need to be detected,  sudden changes in networks need to be detected or tightly interconnected cluster of entities need to be found. In addition, they may  be highly data intensive. For example, an undirected graph may have upwards of 4 billion vertices  and 40 billion edges. A breadth first traversal that can traverse billions of edges / second is then required. This is possible only if we break away from the time blocking sequential access. 
There are two highly tuned parallel approaches for BFS on large parallel systems:  
1) a level synchronous strategy that relies on a simple vertex based graph partitioning and
2) a two-dimensional sparse matrix partitioning-based approach that mitigates parallel communication overhead.
These approaches are also used to benchmark supercomputers. For example, Graph-500 is used to rank them on their performance with BFS.  Distributed memory systems are a hallmark of supercomputers.
BFS on a distributed memory systems architecture involves communications between the processors and a distribution of graph partitions among them. These graphs are generally sparse and the number of edges is usually to the scale of a constant factor times the number of vertices. The average path length in these graphs is a small value bounded by log of the number of vertices.
The approach 1 is straightforward where the iterations over each vertex and its adjacencies are parallelized.
The distance update and the enqueing are atomic. There is a barrier synchronization step for each level. The optimizations involved include:
1) ensuring that the edge visit steps are load-balanced.
2) the synchronization costs are reduced
3) the locality of memory references is improved.

Full discussion : https://1drv.ms/w/s!Ashlm-Nw-wnWtkxOVeU-mbfydKxs 

No comments:

Post a Comment