Cluster computing

Monday, November 9, 2020

This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

A reporting stack is usually a pull-and-transform operation on a database and is generally independent of the data manipulation done by online transactions. Therefore, if a service can simplify its design by offloading the reporting stack to, say, a time-series database with Grafana and a charting stack, it can focus on business-driven design.

The above is not necessarily true for analysis stacks, which often produce a large number of artifacts during computation and are therefore heavily engaged in reads and writes on the same storage stack.

Sometimes performance needs drive the creation of other storage products. Social networking applications use storage products that are not typical of enterprise or cloud storage. This does not mean that such applications cannot be built on cloud services, nor that on-premise storage products must conform to organizational hardware or virtualization requirements.

To improve performance and scalability, Facebook had to introduce additional parallelization in the runtime and in the shared contexts, which they called "WorkerContext". Bottlenecks and overheads such as checkpointing were addressed through scheduling, at a finer granularity than what the infrastructure provided.

Facebook also optimized the memory utilization of the graph infrastructure, which was necessary because it allowed arbitrary vertex id, vertex value, edge, and message classes. They did this by 1) serializing edges into a byte array and 2) serializing messages on the server.
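
As a rough illustration of the first idea, here is a minimal sketch of packing a vertex's edges into a single byte array instead of keeping one object per edge. It is not Facebook's implementation; the fixed record layout (a 64-bit target id plus a 64-bit float weight) and the names `pack_edges` and `iter_edges` are assumptions made for this example.

```python
import struct

# Assumed layout for illustration: little-endian int64 target id + float64 weight.
_EDGE = struct.Struct("<qd")

def pack_edges(edges):
    """Pack a list of (target_id, weight) pairs into one bytes object."""
    buf = bytearray(_EDGE.size * len(edges))
    for i, (target, weight) in enumerate(edges):
        _EDGE.pack_into(buf, i * _EDGE.size, target, weight)
    return bytes(buf)

def iter_edges(blob):
    """Decode edges lazily, one (target_id, weight) pair at a time."""
    for target, weight in _EDGE.iter_unpack(blob):
        yield target, weight
```

With this layout every edge costs a fixed 16 bytes, and iteration decodes edges on demand rather than holding an object for each one.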

Facebook improved parallelization with sharded aggregators, which provide efficient shared state across workers. With this approach, each aggregator is assigned to a randomly picked worker, which gathers the values, performs the aggregation, and distributes the final value to the master and the other workers. This distributes load that would otherwise fall entirely on the master.
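
The flow described above can be sketched in a single process as follows. This is only a simulation under stated assumptions: workers are modeled as list indices, the "network" is a dictionary of inboxes, and the names `assign_owners` and `run_superstep` are hypothetical, not part of any real framework.

```python
import random
from collections import defaultdict
from functools import reduce

def assign_owners(aggregator_names, num_workers, seed=0):
    """Assign each aggregator to a randomly picked worker index."""
    rng = random.Random(seed)
    return {name: rng.randrange(num_workers) for name in aggregator_names}

def run_superstep(partials, owners, reducers):
    """partials[w] maps aggregator name -> worker w's partial value."""
    # Step 1: each worker sends its partial value to the aggregator's owner.
    inbox = defaultdict(lambda: defaultdict(list))
    for worker_partials in partials:
        for name, value in worker_partials.items():
            inbox[owners[name]][name].append(value)
    # Step 2: each owner reduces only the aggregators assigned to it.
    finals = {}
    for by_name in inbox.values():
        for name, values in by_name.items():
            finals[name] = reduce(reducers[name], values)
    # Step 3: the final values would be broadcast to the master and all
    # workers; here that is modeled by returning the combined dictionary.
    return finals
```

For example, with three workers reporting partial sums and maxima, `run_superstep` returns the global sum and maximum, but the reduction work for each aggregator is done by whichever worker owns it rather than by a single master.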

Many companies treat graphs as an abstraction layered over the underlying database rather than as its implementation. There are two reasons for this:
First, key-value stores suffice to capture the same information as a graph and can provide flexibility and speed for operations that can be translated into queries on these stores. These stores can then be specialized for the top graph features that an application needs.
Second, different organizations within the company require different stacks built on the same logical data for reasons such as business impact, modular design, and ease of maintenance.
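
To make the first reason concrete, here is a minimal sketch of a graph kept as an abstraction over a key-value store. A plain Python dict stands in for the store, and the class name `KeyValueGraph`, the `out:` key prefix, and the `friends_of_friends` query are assumptions chosen for this example.

```python
class KeyValueGraph:
    """Graph operations expressed as gets and puts on a key-value store."""

    def __init__(self, store=None):
        # Any mapping with get and item assignment works; a dict is the stand-in.
        self.store = {} if store is None else store

    def add_edge(self, src, dst):
        key = f"out:{src}"
        neighbors = self.store.get(key, ())
        if dst not in neighbors:
            self.store[key] = neighbors + (dst,)

    def neighbors(self, vertex):
        # A neighbor lookup is a single key read.
        return self.store.get(f"out:{vertex}", ())

    def friends_of_friends(self, vertex):
        # A two-hop query becomes one read per direct neighbor.
        direct = set(self.neighbors(vertex))
        result = set()
        for n in direct:
            result.update(self.neighbors(n))
        return result - direct - {vertex}
```

The application only sees graph-shaped calls, while the store underneath can be swapped or tuned for whichever of these queries matters most.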
