Cluster computing

Monday, May 7, 2018

We were discussing the differences between full-text search and text analysis in modern-day cloud databases.

One application of this is in the use of a code visualization tool.

Software engineering produces a high-value asset called source code, which usually runs to millions of lines of instructions that computers can understand. While software engineers put a lot of effort into organizing and making sense of this code, the interpretation usually lives as tribal knowledge specific to the teams associated with particular chunks of code. As engineers increase in number and rotate, this knowledge is quick to disappear and takes a lot of effort to rebuild during onboarding. Capturing this knowledge is surprisingly hard, because it often results in a large, complicated picture that is hard to comprehend and remember. Documentation tools such as function call graphs, dependency graphs and documentation generators have tried different angles on this problem space, but their effectiveness for engineers falls well short of the simple architecture diagrams or data and control flow diagrams drawn by architects.

With the help of an index and a graph, we have all the necessary data structures to write a code visualization tool.
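As a rough illustration of those two data structures, here is a minimal sketch that builds both from a single Python module using the standard ast module. The function name build_index_and_graph and the sample source are assumptions made up for this example; a real tool would cover many files and languages and would persist the results rather than keep them in memory.

```python
import ast
from collections import defaultdict

def build_index_and_graph(source):
    """Return (index, graph) for the source text of one Python module."""
    tree = ast.parse(source)
    index = defaultdict(set)  # identifier -> line numbers where it appears
    graph = defaultdict(set)  # caller function -> names of functions it calls
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        index[func.name].add(func.lineno)
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                index[node.id].add(node.lineno)
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                # Only plain calls like f(x); method calls are skipped here.
                graph[func.name].add(node.func.id)
    return index, graph

# Hypothetical sample module, used only to exercise the sketch.
SAMPLE = '''
def load(path):
    return open(path).read()

def summarize(path):
    text = load(path)
    return text[:80]
'''

index, graph = build_index_and_graph(SAMPLE)
print(sorted(graph["summarize"]))
print(sorted(index["load"]))
```

The index answers "where does this name appear" and the graph answers "what does this function call", which are the two lookups a visualization query would combine.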

The key challenge in a code visualization tool is keeping the inference in sync with the code. While periodic evaluation and global updates to the inference may be feasible, we do better by persisting all relationships in a graph database. Moreover, if we apply incremental, consistent updates based on code change triggers, the number of writes is smaller than with global updates. This also brings us to the scope of visualization. Tools like CodeMap allow us to visualize based on the current context in the code. The end result is a single picture. However, we do not want the display options to be limited to the current context or to a fixed set of rendering queries. Instead we allow dynamic visualization depending on the queries and overlays involved. For large graphs, determining incremental updates is an interesting problem space, with examples in other domains such as software-defined networking. If we take the approach that all graph transformations can be represented by metadata in the form of rules, we can keep track of these rules and ensure that they are applied incrementally.
By keeping the rules in a table, we can make sure the updates are incremental. The table keeps track of all the matching rules, which makes pattern matching fast. Updates to the table require very few changes because most modifications are local to the graph and the rules mention the locations they affect.
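
One way to picture such a rule table is the sketch below. It is only an illustration under assumptions: the RuleTable class, the replace_calls rule and the shape of the change record are all hypothetical names invented here, and the graph is a plain dictionary standing in for a graph database. The point it tries to show is that a code change trigger looks up only the rules registered for the changed location, so the rest of the graph is never rewritten.

```python
from collections import defaultdict

class RuleTable:
    """Maps a graph location to the rules that mention it (hypothetical design)."""

    def __init__(self):
        self.rules_by_location = defaultdict(list)

    def add_rule(self, location, rule):
        self.rules_by_location[location].append(rule)

    def apply_change(self, graph, location, change):
        """Run only the rules registered for the changed location.

        Returns the number of edge writes that were performed.
        """
        writes = 0
        for rule in self.rules_by_location.get(location, []):
            writes += rule(graph, location, change)
        return writes

def replace_calls(graph, location, change):
    """Rule: replace the outgoing call edges of one function with its new callees."""
    new_callees = set(change["calls"])
    old_callees = graph.get(location, set())
    if new_callees == old_callees:
        return 0
    graph[location] = new_callees
    return len(new_callees ^ old_callees)  # edges added or removed

# In-memory stand-in for the persisted call graph.
graph = {"summarize": {"load"}, "load": {"open"}}
table = RuleTable()
table.add_rule("summarize", replace_calls)

# A code change trigger reports that summarize now also calls truncate.
writes = table.apply_change(graph, "summarize", {"calls": ["load", "truncate"]})
print(writes, sorted(graph["summarize"]), sorted(graph["load"]))
```

Because rules are keyed by location, the lookup is a single dictionary access, and a change to one function leaves the edges of every other function untouched.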
#codingexercise https://ideone.com/yqnBiR

Sample application: http://52.191.138.87:8668/upload/ for direct access to text summarization
