Thursday, May 10, 2018

We were discussing incremental graph transformations.
One way to do this is to use tables for the transformation rules. A graph transformation rule is a triple (LHS, RHS, NAC), where

LHS: the left-hand side graph

RHS: the right-hand side graph

NAC: a negative application condition

A rule is applied by finding a match for the LHS in the host graph and rewriting it to the RHS, while the NAC prohibits the presence of certain objects and links at the match site. By defining the locations in the graph where the rules apply, the tables themselves go through very little change. However, storing transformation rules with the help of metamodels and instance models requires a lot of preprocessing.
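
To make this concrete, here is a minimal sketch of a rule kept as small LHS/RHS/NAC tables and applied at a given match site. All names (Person, Account, apply_rule, the id scheme) are illustrative assumptions, not taken from any particular tool:

```python
# Rule: every Person gets an Account, unless it already owns one (NAC).
# The rule tables stay fixed; only the instance graph changes.
lhs = {"nodes": [("p", "Person")], "edges": []}
rhs = {"nodes": [("p", "Person"), ("a", "Account")],
       "edges": [("p", "a", "owns")]}
nac = {"edges": [("p", "a", "owns")]}        # forbid: p already owns an Account

def apply_rule(graph, match):
    """graph: {'nodes': {id: type}, 'edges': set of (src, dst, label)};
    match binds the LHS variable 'p' to a host node id."""
    p = match["p"]
    # NAC check: block the rule if p already has an outgoing 'owns' edge
    if any(src == p and label == "owns" for (src, _dst, label) in graph["edges"]):
        return False
    new_id = f"account_of_{p}"               # hypothetical id scheme
    graph["nodes"][new_id] = "Account"
    graph["edges"].add((p, new_id, "owns"))
    return True

g = {"nodes": {"n1": "Person"}, "edges": set()}
apply_rule(g, {"p": "n1"})    # adds the Account node and the 'owns' edge
apply_rule(g, {"p": "n1"})    # returns False: the NAC now blocks the rule
```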

Using a database to store the updates to the graph is a very powerful technique. The rule-table strategy above is involved: it requires a lot of pre-processing in the hope that subsequent processing will be incremental and cheap, which is indeed the case for existing large graphs. However, reconstructing a graph from its updates is equally feasible, just as we rebuild even large master databases such as product catalogs.

Consider a table of updates listing each node and edge to add or delete as a tuple <type, operation, src, destination, value, status, created, modified>, where type is node or edge, operation is add or remove, src and destination are node ids for an edge and null otherwise, and value is the node data for node additions. The status tracks the progressive states of an entry, such as initialized -> in progress -> completed/error, and the timestamps mark where the entry falls on the timeline as it arrives and as it is acted upon.
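
As a sketch, such a journal entry could be modeled as below. The field names follow the tuple above; the enum values and the dict payload for value are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class EntryType(Enum):
    NODE = "node"
    EDGE = "edge"

class Operation(Enum):
    ADD = "add"
    REMOVE = "remove"

class Status(Enum):
    INITIALIZED = "initialized"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    ERROR = "error"

@dataclass
class UpdateEntry:
    type: EntryType               # node or edge
    operation: Operation          # add or remove
    src: Optional[str]            # node ids for an edge, None for a node
    destination: Optional[str]
    value: Optional[dict]         # node data for node additions
    status: Status                # initialized -> in_progress -> completed/error
    created: datetime             # when the entry arrived
    modified: datetime            # when it was last acted upon
```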

Each tuple corresponds to a single node or edge addition or removal and is atomic and independent of the others. The sequence of tuples is the order in which the graph was updated. When the operations are journaled, they give a replay log for reconstructing the graph, and a subset of this table gives the incremental changes between two states.
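
A sketch of replay and of taking a diff over such a journal could look like the following, reusing the UpdateEntry types above. It assumes the node id travels inside the value payload and that only completed entries took effect:

```python
def apply_entry(nodes, edges, e):
    """Apply one journal entry to in-memory node and edge collections."""
    if e.type is EntryType.NODE:
        node_id = e.value["id"]              # assumption: value = {"id": ..., "data": ...}
        if e.operation is Operation.ADD:
            nodes[node_id] = e.value.get("data")
        else:
            nodes.pop(node_id, None)
    else:
        edge = (e.src, e.destination)
        if e.operation is Operation.ADD:
            edges.add(edge)
        else:
            edges.discard(edge)

def replay(entries):
    """Rebuild the graph by folding the journal in arrival order; each entry
    is atomic and independent, so replay is a simple left-to-right pass."""
    nodes, edges = {}, set()
    for e in sorted(entries, key=lambda e: e.created):
        if e.status is Status.COMPLETED:     # skip entries that never took effect
            apply_entry(nodes, edges, e)
    return nodes, edges

def delta(entries, t_from, t_to):
    """The slice of the journal between two timestamps is exactly the
    incremental change between the two corresponding graph states."""
    return [e for e in entries if t_from < e.created <= t_to]
```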

Large social graphs see updates on the order of 86,400 operations per second. Even if we keep the journal in a cloud database with no practical limit on storage, fetching entries and updating state on the graph may take a long while. In that case, we can take snapshots of the graph and pin them against timestamps for consistency. We can then run snapshotting, replay, and consistency checks as offline activities.
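
A sketch of reconstructing a consistent state from a pinned snapshot plus the journal tail, reusing delta and apply_entry from the sketch above. The snapshot list and its (timestamp, (nodes, edges)) layout are assumptions:

```python
def state_at(snapshots, entries, t):
    """Reconstruct the graph as of time t: start from the latest snapshot
    pinned at or before t, then replay only the journal tail after it.
    Assumes at least one snapshot exists at or before t."""
    base_time, (snap_nodes, snap_edges) = max(
        (s for s in snapshots if s[0] <= t), key=lambda s: s[0])
    nodes, edges = dict(snap_nodes), set(snap_edges)   # keep the snapshot immutable
    for e in sorted(delta(entries, base_time, t), key=lambda e: e.created):
        if e.status is Status.COMPLETED:
            apply_entry(nodes, edges, e)               # same fold as full replay
    return nodes, edges
```

Because the snapshot bounds how much of the journal must be replayed, the expensive work of snapshotting and consistency checking can run offline while online readers pay only for the short tail.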
