Thursday, March 5, 2020

Storage products have become popular both as a sink and as a staging area in the data pipeline. The platform behind a storage product facilitates all aspects of data manageability, at rest as well as in transit. Yet there is no tagging or labeling of data that lets the storage product hand out data corresponding to a particular user of the overall system built around the product.
The only way to associate data with a particular user is through the upstream system that recognizes that user. Since Kubernetes recognizes the user and the actions behind the create, update, or delete of resources, it is well positioned to handle this segregation of data for packing and unpacking purposes.
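As a minimal sketch of what that segregation could look like, assuming the Kubernetes Python client and assuming the upstream system has already captured the requesting user (for example, from an admission review), the claim backing a workload's data could be labeled with that user so later packing and unpacking can filter on it. The label key and the user-propagation mechanism are illustrative assumptions, not a prescribed API.

```python
# Sketch: label the PersistentVolumeClaim that backs a workload with the
# requesting user so downstream tooling can segregate data per user.
# Assumes kubeconfig access; the label key is a made-up convention.
from kubernetes import client, config

def tag_claim_with_user(namespace: str, claim_name: str, username: str) -> None:
    config.load_kube_config()            # or load_incluster_config() inside a pod
    core_v1 = client.CoreV1Api()
    patch = {"metadata": {"labels": {"example.com/owner": username}}}
    core_v1.patch_namespaced_persistent_volume_claim(
        name=claim_name, namespace=namespace, body=patch)

# Example: tag_claim_with_user("analytics", "events-pvc", "alice")
```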
Data does not always come from users. It can be exchanged between upstream and downstream systems. 
Object storage is limitless in terms of capacity. Stream storage is limitless in terms of continuous append. The two can send data to each other in a mutually beneficial manner: stream processing helps with analytics, while object storage brings storage engineering best practices for durable retention. Each has its own advantages, and they can transmit data between themselves for their respective benefits.
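One direction of that exchange can be sketched as draining a window of stream events into object storage. The sketch below assumes an S3-compatible endpoint reachable through boto3 and assumes the events have already been read from the stream store; the bucket, key layout, and window notion are illustrative choices rather than any product's convention.

```python
# Sketch: archive one window of stream events as a single object.
import json
import boto3

s3 = boto3.client("s3")

def archive_window(stream_name: str, events: list, bucket: str, window_id: str) -> str:
    # One object per window keeps the object count bounded and the keys predictable.
    key = f"{stream_name}/window-{window_id}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(events).encode("utf-8"))
    return key
```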
A cache can help improve the performance of data in transit. The technique the cache uses to serve stream segments does not matter to the client, so the cache can use any algorithm. The cache also alleviates load on the stream store without imposing any additional constraints. In fact, the cache uses the same stream reader as the client; the only difference is that there are fewer stream readers hitting the stream store than before.
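A minimal read-through cache along these lines might look like the following sketch, where `fetch_segment` stands in for the same stream reader the client would otherwise use directly and only cache misses reach the stream store. The eviction policy here is LRU purely for illustration; as noted above, any algorithm would do.

```python
# Sketch: a small read-through LRU cache in front of the stream store.
from collections import OrderedDict

class SegmentCache:
    def __init__(self, fetch_segment, capacity: int = 128):
        self._fetch = fetch_segment          # callable: segment_id -> bytes
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, segment_id: str) -> bytes:
        if segment_id in self._entries:
            self._entries.move_to_end(segment_id)   # mark as recently used
            return self._entries[segment_id]
        data = self._fetch(segment_id)               # miss: go to the stream store
        self._entries[segment_id] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)        # evict least recently used
        return data
```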
We have not compared this cache layer with a message queue server, but there are interesting problems common to both. For example, the periodic reads from the stream storage follow a single-producer, multiple-consumer pattern. A message queue server, or broker, enables this kind of publisher-subscriber pattern with retries and a dead-letter queue. In addition, it journals the messages for later review. This leaves the interaction between the caches and the storage to be handled elegantly by a well-known messaging framework.

The message broker inherently comes with a scheduler to perform repeated tasks across publishers, so it can easily act as an orchestrator between the cache and the storage, leaving the cache to focus exclusively on the caching strategy suitable to the workloads. Journaling of messages also helps with diagnosis and replay, and there is probably no better store for those messages than the object storage itself. Since the broker operates in cluster mode, it can scale to as many caches as are available. Moreover, journaling is not necessarily available with all messaging protocols, which counts as one of the advantages of using a message broker.

Aside from the queues dedicated to backing up objects from cache to storage, the message broker is uniquely positioned to give differentiated treatment to the queues. This introduction of quality-of-service levels expands the ability of the solution to meet varying and extreme workloads. The message queue server is not merely a nice-to-have; it is both a necessity and a convenience when a distributed cache works with the stream storage.
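To make the broker-as-orchestrator idea concrete, the sketch below shows a consumer that drains backup messages published by the caches and writes the referenced entries to object storage. It assumes RabbitMQ reached through the pika client; the queue name, the dead-letter exchange, and the `persist_to_object_store` hook are illustrative assumptions, not part of any particular product's API.

```python
# Sketch: a backup worker fed by the broker, with failures dead-lettered for replay.
import json
import pika

def persist_to_object_store(payload: dict) -> None:
    raise NotImplementedError("hypothetical hook into the object storage client")

def run_backup_worker(host: str = "localhost") -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    # Failed backups are routed to a dead-letter exchange for later review and replay.
    channel.queue_declare(queue="cache-backup",
                          durable=True,
                          arguments={"x-dead-letter-exchange": "cache-backup-dlx"})

    def on_message(ch, method, properties, body):
        try:
            persist_to_object_store(json.loads(body))
            ch.basic_ack(delivery_tag=method.delivery_tag)
        except Exception:
            # Reject without requeue so the broker dead-letters the message.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

    channel.basic_consume(queue="cache-backup", on_message_callback=on_message)
    channel.start_consuming()
```

Because the broker handles retries, acknowledgement, and dead-lettering, the cache side only has to publish a message per backup and can stay focused on its caching strategy.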
