Saturday, May 26, 2018


In my experience building web applications and services for many companies, I have found that one of the easiest and most appealing engagements is connecting customers to their resources over the web. Most organizations have a data source and a large number of customers who want to look up their data in one or more ways.
The simplicity of listing resources in a table and letting customers query their own data however they like is not only universally appealing but also very convenient. A hallmark of a mature solution in this space, and one that is ubiquitous across organizations, is the availability of rich query operations over simple storage.
A table and the standard query operators serve this purpose very well. When the same query operations are available on the web as methods that can be invoked with a web address, many devices and applications can consume the data in a uniform way. Conversely, the absence of such query operations points to a limitation in the service.
Query operations are invoked over the web with the help of web application programming interfaces, or web APIs for short. These APIs typically conform to one or more of the following patterns (a minimal server-side sketch follows the list):
GET /resources?offset=0&limit=100 retrieves the first hundred records.
GET /resources?field=field1,field2&ordering=field1 retrieves just those fields, ordered by field1
GET /resources?sort=-created_at retrieves the most recent resources added pertaining to the caller
GET /resources?state=closed retrieves those resources that are no longer available
GET /resources?latitude=&longitude=&radius=
GET /resources?id={"lt": 100, "gt": 30}
GET /resources?ql=location within 500 of 40.042016, -86.900749
GET /resources?ql=select * where name = 'Fred'&limit=50 (Courtesy: usergrid.apache.org)
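To make the pattern concrete, here is a minimal, hypothetical sketch of how a few of these query parameters might be handled server-side in JavaScript. The resource shape, field names, and helper are illustrative assumptions, not any specific product's API; real services usually lean on a framework or a query parser rather than hand-rolling this.

var resources = [
  { id: 1, name: "alpha", state: "open",   created_at: "2018-05-01" },
  { id: 2, name: "beta",  state: "closed", created_at: "2018-05-20" },
  { id: 3, name: "gamma", state: "open",   created_at: "2018-05-25" },
];

function queryResources(url) {
    var params = new URL(url, "http://localhost").searchParams;
    var result = resources.slice();

    // equality filter, e.g. ?state=closed
    if (params.has("state")) {
        result = result.filter(r => r.state === params.get("state"));
    }

    // sorting, e.g. ?sort=-created_at (a leading '-' means descending)
    if (params.has("sort")) {
        var raw = params.get("sort");
        var key = raw.replace(/^-/, "");
        var sign = raw.startsWith("-") ? -1 : 1;
        result.sort((a, b) => (a[key] < b[key] ? -sign : a[key] > b[key] ? sign : 0));
    }

    // pagination, e.g. ?offset=0&limit=100
    var offset = Number(params.get("offset") || 0);
    var limit = Number(params.get("limit") || result.length);
    return result.slice(offset, offset + limit);
}

document.write(JSON.stringify(queryResources("/resources?state=open&sort=-created_at&limit=1")));
// [{"id":3,"name":"gamma","state":"open","created_at":"2018-05-25"}]

The same dispatch-on-parameters idea extends to field selection, range filters, and geo queries like the ones listed above.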

Friday, May 25, 2018

We were discussing reduction in vectors and graphs. We now proceed to the hybrid approach to parallel Breadth First Search. The hybrid version uses intra-node multi-threading. It is called hybrid because it combines distributed-memory graph partitioning with shared-memory traversal parallelism, which can enable scalability to several thousands of cores.
The partitioning may simply be 1D before the level synchronization happens. The distance array is distributed among the processes. Every process maintains the status of the vertices it owns; it can use multi-threading to enumerate the adjacencies, but only the owner process of a vertex determines whether it has already been visited. Therefore all the adjacencies of the vertices in the current frontier need to be sent to their corresponding owners, which happens in the all-to-all communication step. Most of the activity is data parallel, and the only synchronization happens at the barriers. The only addition over the serial level-synchronous algorithm is the work arising from the distributed graph, namely preparing the messages and the all-to-all communication.
Let FS be the stack to store the vertices at the current level which represent the frontier
and NS be the stack to store the newly visited vertices one hop away from the current level.


The serial algorithm is:
After initialization and pushing the root onto FS,
while FS not empty:
      for each u in FS:
            for each neighbor v of u:
                  if v is not visited:
                        push v onto NS
                        set level of v
      set FS to NS, empty NS, increment level
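For reference, the serial level-synchronous traversal above can be written as a small runnable JavaScript sketch; the adjacency-list shape and the names here are illustrative assumptions.

function bfsLevels(adj, root) {
    var level = {};              // distance of each visited vertex from the root
    level[root] = 0;
    var FS = [root];             // frontier: vertices at the current level
    var currentLevel = 0;
    while (FS.length > 0) {
        var NS = [];             // newly visited vertices one hop away
        FS.forEach(u => {
            (adj[u] || []).forEach(v => {
                if (!(v in level)) {              // v is not visited
                    level[v] = currentLevel + 1;
                    NS.push(v);
                }
            });
        });
        FS = NS;                 // set FS to NS, empty NS, increment level
        currentLevel += 1;
    }
    return level;
}

var adj = { a: ["b", "c"], b: ["a", "d"], c: ["a"], d: ["b"] };
document.write(JSON.stringify(bfsLevels(adj, "a")));   // {"a":0,"b":1,"c":1,"d":2}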

The hybrid algorithm is:
After initialization, push the root onto FS at its owner process,
and for each of the p processors, initialize send and receive buffers and a thread-local stack tbuf_ij for thread i on processor j.
while FS not empty:
     for each u in FS in parallel:
           for each neighbor v of u do
                 pv = find_owner(v)
                 push v onto the thread-local stack tbuf_i,pv
     Thread barrier
     for each of the processors j
           merge the thread-local tbuf_ij in parallel to form send_buf_j
     Thread barrier
     Perform the all-to-all collective step with the master:
           send data in send_buf and aggregate newly visited vertices into recv_buf
     Thread barrier
     for each u in recv_buf in parallel do
           if d[u] is infinity then
                 set level of u
                 push u onto NS_i
     Thread barrier
     Perform a parallel merge of the NS_i to form FS
     Thread barrier
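The find_owner call above depends on how the graph is partitioned. A minimal sketch, assuming a simple 1D block partitioning of n vertices across p processes, might look like this (the name and layout are assumptions, not the exact scheme of any particular implementation):

function findOwner(v, n, p) {
    var blockSize = Math.ceil(n / p);   // contiguous vertices per process
    return Math.floor(v / blockSize);   // rank of the process that owns vertex v
}

document.write(findOwner(7, 10, 4));    // with n = 10 and p = 4, vertex 7 is owned by process 2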
#detailed discussion in https://1drv.ms/w/s!Ashlm-Nw-wnWtkxOVeU-mbfydKxs

Thursday, May 24, 2018

We were discussing dimensionality reduction using both linear and non-linear transformations.
This technique eliminates noise by choosing the dimensions that are salient features, and it lowers cost without significant loss of information. Given a data set X with n data points that needs to be reduced to d dimensions, a linear transformation proceeds by selecting a projection matrix V with d columns and multiplying its transpose with each of the points in X to get Y. Since V has d columns, the resulting transformed points also have d dimensions. Non-linear dimensionality reduction techniques may even learn an internal model within the data, as in the case of manifold learning. In this case, a high-dimensional data set is projected onto a smaller dimension while trying to preserve the structure of inter-point distances from the high-dimensional space in the lower-dimensional projection. It is called non-linear because the mapping cannot be represented as a linear combination of the original variables.
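As a small illustration of the linear step just described, here is a sketch in JavaScript that multiplies each point of X by a d-column projection matrix V. The matrices are toy values chosen by hand; in practice V would come from something like PCA.

function project(X, V) {
    var d = V[0].length;                 // number of reduced dimensions
    return X.map(x =>
        Array.from({ length: d }, (_, j) =>
            x.reduce((sum, xi, i) => sum + xi * V[i][j], 0)   // dot product of x with column j of V
        )
    );
}

var X = [[1, 2, 3], [4, 5, 6]];          // two points in 3 dimensions
var V = [[1, 0], [0, 1], [0, 0]];        // keeps only the first two coordinates
document.write(JSON.stringify(project(X, V)));   // [[1,2],[4,5]]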
Different sets of dimensions for the reduced space result in different perspectives, which yield different visualizations.
Interestingly, graphs can also be reduced. This is called hypergraph coarsening, which produces an approximation of the original structure of the graph. Coarsening can be iterative: a succession of smaller hypergraphs makes incremental progress towards a coarser graph with less overall loss of information. There are several methods. Pairs of vertices that are similar can be merged. Skip edges may be overlaid on the same graph to represent the coarser graph. Centrality may be adjusted based on the weights of edges attached to vertices that are removed, without losing paths or connectivity among the remaining vertices.
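To make the merge-similar-vertices idea concrete, here is a hypothetical single coarsening step in JavaScript that merges the pair of vertices whose neighbor sets are most similar by Jaccard similarity. The similarity measure and merge rule are illustrative choices, not a specific published scheme.

function coarsenOnce(adj) {
    var verts = Object.keys(adj);
    var jaccard = (a, b) => {
        var A = new Set(adj[a]), B = new Set(adj[b]);
        var inter = [...A].filter(x => B.has(x)).length;
        var union = new Set([...adj[a], ...adj[b]]).size;
        return union === 0 ? 0 : inter / union;
    };
    // pick the most similar pair of vertices
    var best = null, bestScore = -1;
    for (var i = 0; i < verts.length; i++) {
        for (var j = i + 1; j < verts.length; j++) {
            var s = jaccard(verts[i], verts[j]);
            if (s > bestScore) { bestScore = s; best = [verts[i], verts[j]]; }
        }
    }
    if (!best) return adj;
    var keep = best[0], drop = best[1];
    // merge 'drop' into 'keep': take the union of neighbors, redirect edges, remove self-loops
    var merged = {};
    verts.filter(v => v !== drop).forEach(v => {
        var source = v === keep ? adj[keep].concat(adj[drop]) : adj[v];
        var nbrs = new Set(source.map(n => (n === drop ? keep : n)));
        nbrs.delete(v);
        merged[v] = [...nbrs];
    });
    return merged;
}

// vertices a and d have identical neighbor sets, so they are merged first
var g = { a: ["b", "c"], b: ["a", "d"], c: ["a", "d"], d: ["b", "c"] };
document.write(JSON.stringify(coarsenOnce(g)));   // {"a":["b","c"],"b":["a"],"c":["a"]}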
#book summary : https://1drv.ms/b/s!Ashlm-Nw-wnWtnZC3FVGlkh-m47E

Wednesday, May 23, 2018

Graph analysis is often compared with vector analysis. In the latter case,
dimensionality reduction is often used. Adding dimensions may bring only marginal benefits; on the other hand, the cost savings from using fewer dimensions are quite significant.
Dimensionality reduction can involve both linear and non-linear transformations.
This technique also eliminates noise by choosing the dimensions that are salient features. Given a data set X with n data points that needs to be reduced to d dimensions, a linear transformation proceeds by selecting a projection matrix V with d columns and multiplying its transpose with each of the points in X to get Y. Since V has d columns, the resulting transformed points also have d dimensions. Non-linear dimensionality reduction techniques may even learn an internal model within the data, as in the case of manifold learning. In this case, a high-dimensional data set is projected onto a smaller dimension while trying to preserve the structure of inter-point distances from the high-dimensional space in the lower-dimensional projection. It is called non-linear because the mapping cannot be represented as a linear combination of the original variables.
Different sets of dimensions for the reduced space result in different perspectives, which yield different visualizations.
Vectors have the advantage that they can participate in a change of point of reference which again can be used to improve visualization.
Eigenvalues and eigenvectors can also be computed, which gives a totally different visualization that is often simpler to view.
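As a sketch of that idea, the dominant eigenvalue and eigenvector of a square matrix can be estimated with power iteration. This assumes the matrix has a dominant eigenvalue and uses toy values purely for illustration.

function powerIteration(A, steps = 100) {
    var v = A.map(() => 1);                        // start from an all-ones vector
    var matVec = x => A.map(row => row.reduce((acc, aij, j) => acc + aij * x[j], 0));
    for (var s = 0; s < steps; s++) {
        var w = matVec(v);                         // w = A v
        var norm = Math.sqrt(w.reduce((acc, x) => acc + x * x, 0));
        v = w.map(x => x / norm);                  // normalize to keep the iterate bounded
    }
    var Av = matVec(v);
    var lambda = v.reduce((acc, vi, i) => acc + vi * Av[i], 0);   // Rayleigh quotient
    return { value: lambda, vector: v };
}

document.write(JSON.stringify(powerIteration([[2, 0], [0, 1]])));   // value ≈ 2, vector ≈ [1, 0]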
Dimensionality can also be reduced at multiple levels.
Interestingly, graphs can also be reduced. This is called hypergraph coarsening, which produces an approximation of the original structure of the graph. Coarsening can be iterative: a succession of smaller hypergraphs makes incremental progress towards a coarser graph with less overall loss of information. There are several methods. Pairs of vertices that are similar can be merged. Skip edges may be overlaid on the same graph to represent the coarser graph. Centrality may be adjusted based on the weights of edges attached to vertices that are removed, without losing paths or connectivity among the remaining vertices.
Thus graph analysis gives a similar form of visualization while dimensionality reduction and vector analysis can enable myriad forms of visualization.
Courtesy Saad from umn.edu

Tuesday, May 22, 2018

Incremental graph algorithms are different from parallel graph algorithms. The two are generally not combined because incremental algorithms follow a gather-apply-scatter approach at each vertex, while parallel algorithms rely on partitioning and managing communication overhead, sometimes including hybrid variations of the two.
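A toy sketch of the gather-apply-scatter pattern mentioned above, applied here to label propagation for connected components: every vertex repeatedly adopts the smallest label among its neighbors, and only vertices whose inputs changed stay active. The vertex program and data layout are illustrative assumptions, not a particular framework's API.

function gasConnectedComponents(adj) {
    var labels = {};
    Object.keys(adj).forEach((v, i) => labels[v] = i);   // each vertex starts as its own component
    var active = new Set(Object.keys(adj));
    while (active.size > 0) {
        var next = new Set();
        active.forEach(u => {
            // gather: the smallest label among u's own label and its neighbors' labels
            var gathered = Math.min(labels[u], ...adj[u].map(v => labels[v]));
            // apply: adopt the smaller label if it improves on the current one
            if (gathered < labels[u]) {
                labels[u] = gathered;
                // scatter: reactivate neighbors, since their gather input just changed
                adj[u].forEach(v => next.add(v));
            }
        });
        active = next;
    }
    return labels;
}

document.write(JSON.stringify(gasConnectedComponents({ a: ["b"], b: ["a"], c: [] })));   // {"a":0,"b":0,"c":2}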
Graph analysis is often compared with vector analysis. In the latter case,
dimensionality reduction is not necessarily a problem that must be solved. Adding dimensions may bring only marginal benefits; on the other hand, the cost savings from using fewer dimensions are quite significant.
This technique also eliminates noise by choosing the dimensions that are salient features, which in turn improves visualization.
Vectors have the advantage that they can participate in a change of point of reference which again can be used to improve visualization.
Eigenvalues and eigenvectors can also be computed, which gives a totally different visualization that is often simpler to view.
Dimensionality can also be reduced at multiple levels.
Furthermore, the vectors can participate directly in data mining algorithms, solving much of the categorization need that text analysis depends on when neural nets and other AI techniques are not involved, or when we rely on document vectors instead of word vectors.
In this regard, graphs may offer only marginal benefits. However, there is no question that the bipartite approach to semantic embedding is better visualized and represented by graphs. When graphs are visualized, they can be depicted with nodes having higher centrality at the core and those having lower centrality at the periphery.
Interestingly, graphs can also be reduced. This is called hypergraph coarsening, which produces an approximation of the original structure of the graph. Coarsening can be iterative: a succession of smaller hypergraphs makes incremental progress towards a coarser graph with less overall loss of information. There are several methods. Pairs of vertices that are similar can be merged. Skip edges may be overlaid on the same graph to represent the coarser graph. Centrality may be adjusted based on the weights of edges attached to vertices that are removed, without losing paths or connectivity among the remaining vertices.
Thus graph analysis gives a similar form of visualization while dimensionality reduction and vector analysis can enable myriad forms of visualization.


Monday, May 21, 2018

Today I wanted to take a brief break to discuss the relevancy of incremental and hybrid graph algorithms. BFS was just one instance. We don't use BFS in NLP processing of text. However, we do use graph algorithms. 
Many of the documents we process for text analysis are actually quite small compared to the corpus on which our model is trained. The graph algorithms we use in our models therefore operate on a large number of keywords from the corpus, while the keywords in any one document are far fewer. However, consider the case where we are no longer limited to a fixed number of dimensions for a word vector. In this case we can use all the edges for every keyword, since we use persisted relationships. By making these same NLP graph algorithms incremental, we offer the ability to perform operations as updates occur. The store-and-static-compute model worked because updates were batched and graph processing was applied to static snapshots from different points in time; it worked so long as graph modifications were less frequent than the static processing. Moreover, NLP graph algorithms are known to be computationally expensive. By making these special-case algorithms incremental and parallel, we expand the horizon of commercial practicality for many theoretical graph algorithms. This is a significant win not only for existing algorithms but also for new ones we can come up with.
Graph algorithms in NLP have a long history. Semantic word embeddings are popular in NLP graph algorithms. Bipartite graphs are yet another well-known graph usage in NLP. Graphs are directly applicable to syntactic and semantic relations. WordNet is a great example: it is a semantic network that is heavily used for word sense disambiguation, semantic similarity, question answering, and more. There have been efforts to expand or improve this ontology by connecting different domains, such as in linkeddata.org, which uses the web to connect related data that wasn't previously linked; it connects data on the Semantic Web using URIs and RDF. Graphs have also moved beyond the traditional nodes and edges with the introduction of heterogeneous graphs, graphs with multilayered edges, and so on.
#codingexercise:
Canonicalize an object against a list of expected keys:
var items = { "1" : null, "2": 0, "3": "", "4":0, "5": 55, "6": 0, "7":0, "8" : 0, "9": 0  }
var keys = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10" ];
var present = [];
Object.entries(items).forEach(([key, value]) => {
    present+= [key];
    if (value === "" || value === null ){
        items[key] = 0;
    }
});
var missing = keys.filter( x => !present.includes(x));
missing.forEach (x  => items[x] = 0);
document.write(JSON.stringify(items));

{"1":0,"2":0,"3":0,"4":0,"5":55,"6":0,"7":0,"8":0,"9":0,"10":0}




Sunday, May 20, 2018

We were discussing the parallel Breadth-First-Search using the 2D partitioning approach.
There were two approaches discussed:
The first approach was about partitioning 
The second approach was about mitigating communication overhead
The second approach made use of
1) 2D decomposition and
2) parent update
In the second approach, the idea is that each BFS iteration is equivalent to a sparse matrix - sparse vector multiplication (SpMSV). The matrix is the adjacency matrix, which is decomposed equally across a number of processors. The vector is the frontier, with integer variables. The algorithm does not store the previous frontiers explicitly as multiple sparse vectors. Instead, it computes the breadth-first spanning tree by returning a dense parents array.
The parent update is peculiar to this algorithm. The parents array is combined element-wise with the result of the SpMSV.
The parent is always chosen to be a vertex with the highest label. 
The inner loop performs a single level traversal 
All vectors and the input matrix are distributed in a 2D fashion. For example, f, the frontier, is an initially sparse vector; each n/p-sized piece of the vector corresponds to a progressive section of the adjacency matrix.
The 2D decomposition of the adjacency matrix means that the matrix is viewed as a Pr × Pc grid of blocks, with block (i, j) assigned to processor P(i, j).
The vector operations are also parallelized using loop-level parallelism.
In the 2D processor assignment, the subvectors are accumulated at all the processors in the jth column, and each processor, after performing its SpMSV, scatters the corresponding piece of its intermediate vector to its owner along the ith processor row.
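Putting the pieces together on a single node, one BFS iteration in this formulation can be sketched as an SpMSV followed by an element-wise update against the parents array. The data layout below (plain arrays, numeric vertex ids) is an illustrative assumption, not the distributed implementation.

// adj[u] lists the out-neighbors of vertex u; parents[v] is -1 while v is undiscovered
function bfsStep(adj, frontier, parents) {
    // SpMSV: for every v adjacent to the frontier, pick a candidate parent;
    // ties are broken by keeping the highest label, as described above
    var candidate = new Array(parents.length).fill(-1);
    frontier.forEach(u => adj[u].forEach(v => {
        if (u > candidate[v]) candidate[v] = u;
    }));
    // element-wise update with the parents array: only undiscovered vertices survive
    var next = [];
    for (var v = 0; v < candidate.length; v++) {
        if (candidate[v] >= 0 && parents[v] === -1) {
            parents[v] = candidate[v];
            next.push(v);
        }
    }
    return next;   // the new frontier
}

var adj = [[1, 2], [0, 3], [0], [1]];
var parents = [0, -1, -1, -1];                      // vertex 0 is the root and its own parent
var frontier = [0];
while (frontier.length > 0) frontier = bfsStep(adj, frontier, parents);
document.write(JSON.stringify(parents));            // [0,0,0,1]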
#codingexercise
Write a function to set the keys with invalid values to a default value in a javascript object:
var items = { "1" : null, "2": 0, "3": "", "4":0, "5": 55, "6": 0, "7":0, "8" : 0, "9": 0  }
Object.entries(items).forEach(([key, value]) => {
    if (value === "" || value === null ){
        items[key] = 0;
    }

});
document.write(JSON.stringify(items));
 { "1" : 0, "2": 0, "3": 0, "4":0, "5": 55, "6": 0, "7":0, "8" : 0, "9": 0  }