Saturday, September 29, 2018


This article continues a previous post on the design of message queues backed by object storage. Most message queues scale with the number of nodes in a cluster-based deployment, and the object storage is accessible to each of these nodes over S3 APIs. Namespaces and buckets are organized according to the queues so that messages can be looked up directly by the object storage's naming conventions. Since the storage takes care of all ingestion-related concerns, the nodes merely have to use the S3 APIs to get and put messages. In addition, we brought up the availability of native queues to act as background processors when the data does need to be pushed deep into the object storage. This has at least two advantages. First, each queue is free to decide what it needs to do with an object. Second, the scheduled saving of all messages into object storage works well for this workload, because it is a continuous feed with very little read access.
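As a sketch, the bucket-per-namespace and key-per-message convention might look like the following. The `mq-` prefix and the timestamped key format are assumptions for illustration, not a prescribed layout; a node would then issue S3 `put_object`/`get_object` calls against these names.

```python
import time
import uuid

def bucket_for(namespace):
    # S3 bucket names must be lowercase, so normalize the namespace.
    # The "mq-" prefix is a hypothetical convention for this stack.
    return "mq-" + namespace.lower()

def key_for(queue, message_id=None):
    # Zero-padded epoch milliseconds keep lexicographic key order equal
    # to arrival order, so a plain S3 LIST walks the queue front-to-back.
    message_id = message_id or uuid.uuid4().hex
    return "{}/{:013d}-{}".format(queue, int(time.time() * 1000), message_id)
```

With a convention like this, no separate index is needed: listing a queue's key prefix enumerates its messages in order.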
This prompted us to separate this solution into its own layer, which we call the cache layer, so that the queues may work with the cache or with the object storage as required. The propagation of objects from cache to storage proceeds in the background. The queues belonging to the cache are not required to serve user workloads; they are entirely internal and specific to the system, so their schedule and operation can be set through the system configuration.
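One way to sketch such a background propagator, assuming an in-process cache queue and a pluggable `store` callable standing in for the actual S3 put call:

```python
import queue
import threading

class CacheDrain:
    """Background worker that drains an in-process cache queue into a
    storage backend on its own schedule. `store` is any callable taking
    (key, payload), e.g. a thin wrapper around an S3 put_object call
    (hypothetical; substitute the real client)."""

    def __init__(self, store):
        self.cache = queue.Queue()
        self.store = store
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def enqueue(self, key, payload):
        # Producers only touch the cache; storage writes happen later.
        self.cache.put((key, payload))

    def _run(self):
        # Keep draining until asked to stop AND the cache is empty,
        # so no accepted message is lost on shutdown.
        while not self._stop.is_set() or not self.cache.empty():
            try:
                key, payload = self.cache.get(timeout=0.1)
            except queue.Empty:
                continue
            self.store(key, payload)

    def stop(self):
        self._stop.set()
        self._worker.join()
```

Because the feed is append-only with little read access, the drain interval and batching policy can be tuned freely in system configuration without affecting user-facing latency.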
The queues, on the other hand, have to implement one of the standard protocols such as AMQP or STOMP. Also, customers are likely to use the queues in one of the following ways, each of which implies a different layout for the same instance and cluster size:
  1. The queues may be mirrored across multiple nodes – This means we can use a cluster
  2. The queues may be chained where one feeds into the other – This means we can use federation
  3. The queues may be arbitrary, depending on application needs – This means we build our own, i.e., the shovel work
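For the third layout, a minimal in-process sketch of the shovel pattern (drain one queue, optionally transform each message, republish to another) might look like this; a real shovel would run continuously against broker connections rather than `queue.Queue` objects:

```python
import queue

def shovel(source, destination, transform=lambda m: m):
    """Move every message currently on `source` to `destination`,
    applying `transform` along the way. Returns the number moved."""
    moved = 0
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return moved
        destination.put(transform(message))
        moved += 1
```

The `transform` hook is what makes the arbitrary layout worth building: it is where application-specific routing or re-encoding lives, which mirrored and federated layouts do not offer.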
Consequently, the queue layer can be designed independently of the cache and the object storage. While queue services are available in the cloud, as are one-stop-shop cloud databases, this kind of stack holds a lot of promise in the on-premises market.
While the implementation of the queue layer is open, we can call out what it should not be. The queues should not be implemented as microservices: that defeats the purpose of the message broker as a shared platform meant to alleviate the dependencies among the microservices in the first place. Nor should the queues be collapsed into the database or the object storage, unless there is a runtime to process the messages and the programmability to store and execute logic. Between these two extremes, the queue layer can be fashioned as an API gateway, a switching fabric, or anything that can handle retries, poison queues, dead letters, and journaling. Transactional semantics are not a concern here, since we rely on versioning. Finally, the queues can use existing products such as ZeroMQ or RabbitMQ if those allow customization for on-premises deployment of this stack.
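A minimal sketch of the retry and dead-letter handling mentioned above, with `MAX_RETRIES` as an assumed policy value; a real broker would track the attempt count in message headers rather than alongside the payload:

```python
import queue

MAX_RETRIES = 3  # hypothetical retry policy

def deliver(work, dead_letters, handler):
    """Drain the work queue, retrying each message up to MAX_RETRIES
    times before parking it on the dead-letter queue. Messages are
    tracked as (payload, attempts) pairs."""
    while True:
        try:
            payload, attempts = work.get_nowait()
        except queue.Empty:
            return
        try:
            handler(payload)
        except Exception:
            if attempts + 1 >= MAX_RETRIES:
                dead_letters.put(payload)  # poison message, park it
            else:
                work.put((payload, attempts + 1))  # requeue for retry
```

Parking poison messages rather than retrying forever is what keeps one bad payload from stalling the whole queue, and the dead-letter queue doubles as a journal for later inspection.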


