Cluster computing

Monday, May 21, 2018

Today I wanted to take a brief break to discuss the relevancy of incremental and hybrid graph algorithms. BFS was just one instance. We don't use BFS in NLP processing of text. However, we do use graph algorithms.
Many of the documents we process for text analysis are actually very small content when compared to the corpus on which our model is trained. The graph algorithms we use in our models therefore operate on a large number of keywords from the corpus. The keywords in the document are going to be much smaller. However consider the case we are now no longer limited to the number of fixed dimensions for a word vector. In this case we can use all the edges for every keyword since we use persisted relationships. By making these same nlp graph algorithms as incremental, we now offer the ability to perform operations as updates occur. The store-and-static-compute model worked because updates were batched and then graph processing applied on static snapshots from different points in time. It worked so long as the graph modifications were less frequent than static processing. Moreover, nlp graph algorithms are known to be computationally expensive. By making these special case algorithms incremental and parallel, we expand the horizon of commerical practicality for many theoretical graph algorithms. This is a significant win not only for existing algorithms but also new ones we can come up with.
Graph algorithms in nlp has a tremendous history. Semantic word embeddings are popular in nlp graph algorithms. Bipartite graphs are yet another well-known graph usage in nlp. Graphs are directly applicable in syntactic and semantic relations. WordNet is a great example. It's a semantic network that is heavily used for word sense disambiguation, semantic similarity, question answering, and others. There have been efforts to expand or improve this ontology by connecting different domains such as in linkeddata.org. This uses the web to connect related data that wasn't previously linked. It connects data on the Semantic Web using URIs and RDF. Graphs have also come to be different from the traditional nodes and vertices with the introduction of heterogeneous graphs, graphs with multilayered edges etc.
#codingexercise:
Canonicalize arrays:
var items = { "1" : null, "2": 0, "3": "", "4":0, "5": 55, "6": 0, "7":0, "8" : 0, "9": 0 }
var keys = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10" ];
var present = [];
Object.entries(items).forEach(([key, value]) => {
present+= [key];
if (value === "" || value === null ){
items[key] = 0;
}
});
var missing = keys.filter( x => !present.includes(x));
missing.forEach (x => items[x] = 0);
document.write(JSON.stringify(items));

{"1":0,"2":0,"3":0,"4":0,"5":55,"6":0,"7":0,"8":0,"9":0,"10":0}

Cluster computing

Monday, May 21, 2018

No comments:

Post a Comment