Tuesday, February 11, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.

A message broker can eventually roll all of its data over to persistence, so the choice of storage does not hamper its core functionality.

A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in a queue is exactly what makes the stream store all the more appealing to these message brokers.

Though message brokers differ in their designs, the stream store provides guarantees for durability, performance, and consistency that work with all of them.

There are several examples of a stream store being helpful in persisting logs, metrics, and events. Message brokers find many use cases for these types of data, and their integration with a stream store helps considerably here.
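As an illustration, here is a minimal sketch of that append-only usage in Python; the StreamStoreClient class below is a hypothetical in-memory stand-in for a real stream store client, not an actual API.

```python
# A hedged sketch of the append-only pattern a stream store offers for logs,
# metrics, and events; StreamStoreClient is a hypothetical stand-in, not a
# real client library.
import json
import time

class StreamStoreClient:
    """Toy in-memory stand-in: real stream stores persist to Tier 2 storage."""
    def __init__(self):
        self._events = []          # append-only, unbounded sequence

    def append(self, stream: str, event: dict) -> None:
        self._events.append((stream, json.dumps(event)))

client = StreamStoreClient()

# logs, metrics, and events all reduce to time-stamped appends
client.append("app-logs",    {"ts": time.time(), "level": "INFO", "msg": "started"})
client.append("app-metrics", {"ts": time.time(), "cpu": 0.42})
client.append("app-events",  {"ts": time.time(), "type": "user.login"})
```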

Both message brokers and stream stores are increasingly becoming cloud technologies rather than on-premise ones. This makes their integration much more natural, as the limits on storage, networking, and compute are somewhat relaxed.

This also helps with increasingly maintenance-free deployments of their components. Cloud technologies are well suited for load balancing, scaling out, ingress control, pod health, and proxies that not only adjust to demand but also come with all the benefits of infrastructure best practices.

Monday, February 10, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.

As compute, network, and storage overlap to expand the possibilities on each frontier at cloud scale, message passing has become a ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues are finding universal recognition in storage systems. Messages are also time-stamped and can be treated as events. A stream store is best suited for sequential event storage.
Since event storage is overlaid on Tier 2 storage, on top of blocks, files, streams, and blobs, it is already transferring data to those dedicated stores. Backing the message broker's storage tier with a stream storage system only brings in storage engineering best practice.
The programmability of streams has a special appeal for message processors. Runtimes like Apache Flink already support user jobs, which have rich APIs to work with unbounded and bounded data sets.
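As a sketch of what such a user job looks like, the following uses the PyFlink DataStream API; the in-memory collection source and the word-count-style transformation are illustrative choices, not part of any broker integration.

```python
# A minimal sketch of a Flink user job with the PyFlink DataStream API,
# assuming the apache-flink package is installed. The bounded in-memory
# collection stands in for an unbounded message feed from a broker.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Messages would normally arrive through a connector; from_collection is
# used here only to keep the sketch self-contained.
messages = env.from_collection(["event-a", "event-b", "event-a"])

# A simple transformation over the stream: count occurrences per message key.
messages.map(lambda m: (m, 1)) \
        .key_by(lambda kv: kv[0]) \
        .reduce(lambda a, b: (a[0], a[1] + b[1])) \
        .print()

env.execute("message-count-sketch")
```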
The message broker now has the ability to host a runtime that is even more performant and expressive than any of the others it has supported.
The design of message brokers has certain features that stream storage can handle very well. For example, MSMQ has a design which includes support for dead-letter and poison-letter queues, support for retries by the queue processor, a non-invasive approach which lets the clients send and forget, the ability to create an audit log, support for transactional as well as non-transactional messages, support for distributed transactions, support for clusters instead of standalone single-machine deployments, and the ability to improve concurrency control. Stream storage provides excellent support for queues, logs, audit, and events, along with transactional semantics.
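A hedged sketch of the retry and dead-letter behavior described above might look like the following in Python; the queue object and its get/ack/nack methods are hypothetical stand-ins rather than the MSMQ API.

```python
# A sketch of dead-letter and retry handling over a generic queue interface.
# Queue, get, ack, nack, and put are hypothetical, not a real broker API.
MAX_RETRIES = 3  # after this many failures a message is treated as poison

def process_with_dead_letter(queue, dead_letter_queue, handler):
    """Pop messages, retry failures, and divert poison messages."""
    while True:
        message = queue.get()          # blocking receive (hypothetical API)
        if message is None:
            break                      # queue drained
        try:
            handler(message.body)
            queue.ack(message)         # success: remove from the queue
        except Exception:
            message.retries = getattr(message, "retries", 0) + 1
            if message.retries >= MAX_RETRIES:
                dead_letter_queue.put(message)   # poison message
                queue.ack(message)               # do not redeliver
            else:
                queue.nack(message)              # redeliver for retry
```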
This design of MSMQ is somewhat different from that of System.Messaging. The System.Messaging library transparently exposes the underlying Windows Message Queuing APIs. For example, it provides the GetPublicQueues method that enumerates the public message queues. It takes the message queue criteria as a parameter. These criteria can be specified with parameters such as category and label. They can also take machine name or cluster name, and created and modified times, as filter parameters. The GetPublicQueuesEnumerator is available to provide an enumerator to iterate over the results.
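To illustrate the criteria-based enumeration pattern in Python, here is a hypothetical analogue; QueueCriteria and list_public_queues below are stand-ins for the .NET types, not the System.Messaging API itself.

```python
# A language-neutral sketch of criteria-based queue enumeration; every field
# left as None acts as a wildcard, mirroring how filter criteria narrow the
# result set. All names here are hypothetical stand-ins.
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Optional

@dataclass
class QueueCriteria:
    category: Optional[str] = None
    label: Optional[str] = None
    machine_name: Optional[str] = None
    created_after: Optional[datetime] = None

def list_public_queues(queues: List[dict], criteria: QueueCriteria) -> Iterator[dict]:
    """Yield queues matching every criterion that is set."""
    for q in queues:
        if criteria.category and q.get("category") != criteria.category:
            continue
        if criteria.label and q.get("label") != criteria.label:
            continue
        if criteria.machine_name and q.get("machine") != criteria.machine_name:
            continue
        if criteria.created_after:
            created = q.get("created")
            if created is None or created <= criteria.created_after:
                continue
        yield q
```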
Though these designs are different, the stream store provides guarantees for durability, performance, and consistency that work with both.

Sunday, February 9, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.
A message broker can eventually roll all of its data over to persistence, so the choice of storage does not hamper its core functionality.
A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in a queue is exactly what makes the stream store all the more appealing to these message brokers.
As compute, network, and storage overlap to expand the possibilities on each frontier at cloud scale, message passing has become a ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues are finding universal recognition in storage systems. Messages are also time-stamped and can be treated as events.
A stream store is best suited for sequential event storage.
Since event storage is overlaid on Tier 2 storage, on top of blocks, files, streams, and blobs, it is already transferring data to those dedicated stores. Backing the message broker's storage tier with a stream storage system only brings in storage engineering best practice.
The programmability of streams has a special appeal for message processors. Runtimes like Apache Flink already support user jobs, which have rich APIs to work with unbounded and bounded data sets.

Saturday, February 8, 2020

The use case for a stream storage in a Message Broker:
A message broker is software that finds universal appeal in relaying messages from one or more publishers to subscribers. Some examples of commercially well-known message brokers are RabbitMQ, ZeroMQ, SaltStack, and others. These message brokers can be visualized as a set of logical queues supported by a distributed cluster that can scale out.
Traditionally, storage for message brokers has been local file systems. Lately, object storage, or any form of cloud-native queue storage, has gained popularity. This article advocates the use of stream storage for message brokers.
Queue services have usually maintained ordered delivery of messages, retries, and dead-letter handling, along with journaling. Incoming messages and caches have mostly been write-throughs that reach all the way to the disk. This made periodic asynchronous batching of writes possible, with flush-to-disk being replaced by object storage, which was better suited for this kind of schedule. Object storage was also known to support ZooKeeper-like paths even with its three-level namespace, bucket, and object hierarchy. S3 worked well even for the processors of the queue as intermediate web-accessible storage.
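A minimal sketch of that batching pattern, assuming boto3 and an existing bucket; the bucket name and key layout are hypothetical, chosen only to show how S3 keys can act like ZooKeeper-style paths.

```python
# Flush one batched journal segment to S3; the object key doubles as a
# hierarchical path. Bucket and key names are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

def flush_segment(segment_bytes: bytes, queue_name: str, sequence: int) -> None:
    """Upload one asynchronously batched journal segment."""
    key = f"broker/{queue_name}/journal/{sequence:012d}.seg"
    s3.put_object(Bucket="example-broker-journal", Key=key, Body=segment_bytes)
```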
The queue service needs a cluster-based storage layer or a database server, and these have traditionally been long-standing products in the marketplace. These products brought storage engineering best practice along with their features for geographical replication of objects, content distribution networks, and message-passing algorithms such as peer-to-peer networking. As long as this queuing layer establishes sync between, say, a distributed or cluster file system and object storage with duplicity-like logic, it can eventually roll all data over to persistence, so the choice of storage does not hamper the core functionality of the message broker.
A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in a queue is exactly what makes the stream store all the more appealing to these message brokers.
Overall, there is a separation of concerns in storage and networking in each of these cluster-based products, and although message brokers tend to position themselves as peer-to-peer networking over storage in local nodes, it is the exact opposite for the data stores. Since they are independent, the message broker together with a stream store can bring in alternate layers of networking and storage in an integrated and more powerful way, one that lets the message processors take advantage of stream-based programmability while preserving the exactly-once and consistency models for the data.

Friday, February 7, 2020

We continue with our discussion on backup of Kubernetes resources

The chart described earlier provides the convenience of using Helm with Kubernetes custom resources, with either one resource encompassing all other required K8s resources at user-namespace scope or several discrete resources at the chart level, while allowing transaction-like behavior with create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.

Backup and restore help with data protection, disaster recovery, and data migration across clusters. These are also routine activities performed on storage systems. In our discussion, the storage system comprises hybrid components that may or may not have a backup and restore technique, and if they do, their tools might vary.
The commands to do backup and restore have varied across storage systems. For example, file systems are backed up with rsync or duplicity. Databases have their own backup and restore commands. In some cases, backup and restore may not even be needed for some of the data.
A virtualizer for translating a global backup-and-restore command will be helpful, since it would know which data to back up and how. It provides a common entry point for triggering the individual backups. There are benefits to a common invocation point, such as policy evaluation, monitoring, and enforcement.
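A minimal sketch of such a virtualizer, assuming a static registry of components; the commands and the registry shape are illustrative, not a specific product's interface.

```python
# One entry point that dispatches to per-component backup commands.
import subprocess

# each component declares how (or whether) it is backed up
BACKUP_COMMANDS = {
    "filesystem": ["rsync", "-a", "/data/", "/backup/data/"],
    "database":   ["pg_dump", "-f", "/backup/db.sql", "appdb"],
    "cache":      None,  # ephemeral data: no backup needed
}

def backup_all() -> None:
    """Common invocation point: policy checks and monitoring hook in here."""
    for component, command in BACKUP_COMMANDS.items():
        if command is None:
            continue                   # some data needs no backup at all
        subprocess.run(command, check=True)
```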
A virtualizer also helps to work with the Kubernetes controller or external software.
Backup and restore work exclusively with persisted data. It is also possible to sync between replicas.
Most tools for backup now work with S3 storage. This allows web access from any source and to any destination. Since object storage is considered limitless storage with durability and availability, it suits backup schedules very well. Backups to this web-accessible storage can also be taken on a regular basis.
When the backups are on a regular basis, the web-accessible storage can take incremental backups.
Tools that take backups from a cluster do not necessarily perform incremental backups. In such cases it is perfectly alright to take an incremental backup on the local file system using rsync or duplicity and then upload it to the final cloud storage destination.
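A sketch of that incremental-then-upload pattern, assuming rsync, boto3, and local staging space; the paths, bucket, and single-archive packaging are assumptions.

```python
# rsync keeps a local incremental copy; the result is pushed to object storage.
import subprocess
import tarfile
import boto3

def incremental_backup_to_s3(source: str, staging: str, bucket: str, key: str):
    # rsync transfers only changed files into the local staging copy
    subprocess.run(["rsync", "-a", "--delete", source, staging], check=True)

    # pack the staging copy so it can be uploaded as one object
    archive = "/tmp/backup.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(staging, arcname="backup")

    # push the archive to the web-accessible destination
    boto3.client("s3").upload_file(archive, bucket, key)
```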

Thursday, February 6, 2020

We continue with our discussion on backup of Kubernetes resources

The chart described earlier provides the convenience of using Helm with Kubernetes custom resources, with either one resource encompassing all other required K8s resources at user-namespace scope or several discrete resources at the chart level, while allowing transaction-like behavior with create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.

T-shirt-size deployment does not need to be a matter of scale alone. It can be hybrid as well, selectively including components that do not need to be present in all categories. The StatefulSet can describe the replicas and the components to include.
Even when the pods are the same between the charts, they can be made to behave differently by authoring rules and policies. Even the same chart can be used to conditionally deploy different sizes and containers. The predetermined configuration helps with the proper match for workload requirements.
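One way to sketch this sizing logic, assuming a hypothetical size table rather than any Helm convention:

```python
# A size label picks replica counts and which optional components to include;
# the table below is a hypothetical configuration for illustration only.
T_SHIRT_SIZES = {
    "small":  {"replicas": 1, "components": ["broker"]},
    "medium": {"replicas": 3, "components": ["broker", "metrics"]},
    "large":  {"replicas": 5, "components": ["broker", "metrics", "audit"]},
}

def render_deployment(size: str) -> dict:
    """Resolve a size name to the replica count and component list."""
    profile = T_SHIRT_SIZES[size]
    return {
        "statefulSetReplicas": profile["replicas"],
        # hybrid sizing: some components are simply absent in small sizes
        "include": profile["components"],
    }

print(render_deployment("medium"))
```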
Backup and restore help with data protection, disaster recovery, and data migration across clusters. These are also routine activities performed on storage systems. In our discussion, the storage system comprises hybrid components that may or may not have a backup and restore technique, and if they do, their tools might vary.
The commands to do backup and restore have varied across storage systems. For example, file systems are backed up with rsync or duplicity. Databases have their own backup and restore commands. In some cases, backup and restore may not even be needed for some of the data.
A virtualizer for translating a global backup-and-restore command will be helpful, since it would know which data to back up and how. It provides a common entry point for triggering the individual backups. There are benefits to a common invocation point, such as policy evaluation, monitoring, and enforcement.

Wednesday, February 5, 2020

We continue with our discussion on backup of Kubernetes resources

The chart described earlier provides the convenience of using Helm with Kubernetes custom resources, with either one resource encompassing all other required K8s resources at user-namespace scope or several discrete resources at the chart level, while allowing transaction-like behavior with create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.
The groupings for the resources can be based on selectors. This lets the chart combine resources dynamically. Since the resources carry labels, selectors based on label matching can be used to group resources efficiently. The charts make it convenient to create and delete these groups of resources all at once.
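For example, a minimal sketch using the official Kubernetes Python client, with a hypothetical owner=user1 label as the grouping selector:

```python
# List every pod in a user's namespace that matches the group's selector;
# the namespace and label are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pods = core.list_namespaced_pod("user1", label_selector="owner=user1")
for pod in pods.items:
    print(pod.metadata.name)
```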
The custom resource is not a dynamic selection of resources. It is a resource in itself. A custom resource may have a definition that includes other resources, which makes it easy to create and delete them with the help of a single resource.
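A sketch of creating such a grouping resource with the Kubernetes Python client; the group, version, plural, and spec layout are hypothetical examples, not an existing CRD.

```python
# Create a user-scoped custom resource whose definition names the other
# K8s resources it includes. All names below are hypothetical.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

backup_group = {
    "apiVersion": "example.com/v1",
    "kind": "ResourceGroup",
    "metadata": {"name": "user1-resources", "namespace": "user1"},
    "spec": {
        # the custom resource's definition lists the resources it includes
        "includes": ["deployment/web", "service/web", "pvc/web-data"],
    },
}

api.create_namespaced_custom_object(
    group="example.com", version="v1", namespace="user1",
    plural="resourcegroups", body=backup_group,
)
```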
The use of charts helps us define t-shirt-size deployments as well. This comes from scaling out the pods to different capacities so that they can carry the corresponding load. This is an effective plan for handling workloads that vary across deployments.
T-shirt-size deployment does not need to be a matter of scale alone. It can be hybrid as well, selectively including components that do not need to be present in all categories. The StatefulSet can describe the replicas and the components to include.
Even when the pods are the same between the charts, they can be made to behave differently by authoring rules and policies. Even the same chart can be used to conditionally deploy different sizes and containers. The predetermined configuration helps with the proper match for workload requirements.