Sunday, February 9, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue that discussion today.
A message broker can eventually roll all data over to persistence, so the choice of storage does not hamper the core functionality of the message broker.
A stream store can be hosted on any Tier 2 storage, whether files or blobs, and that choice does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in the queue is exactly what makes the stream store appealing to these message brokers.
As compute, network and storage overlap to expand the possibilities of each frontier at cloud scale, message passing has become ubiquitous. While libraries like protocol buffers and solutions like RabbitMQ grow popular, flows and their queues are finding universal recognition in storage systems. Messages are also time-stamped and can be treated as events.
A stream store is best suited for sequential event storage.
Since event storage overlays on Tier 2 storage, on top of blocks, files, streams and blobs, it is already transferring data to those dedicated stores. Backing the message broker's storage tier with a stream storage system only brings in storage engineering best practices.
The programmability of streams has a special appeal for message processors. Runtimes like Apache Flink already support user jobs with rich APIs for working with unbounded and bounded data sets.

Saturday, February 8, 2020

The use case for a stream storage in a Message Broker:
A message broker is software with universal appeal for relaying messages from one or more publishers to subscribers. Some well-known commercial examples are RabbitMQ, ZeroMQ, and SaltStack. These message brokers can be visualized as a set of logical queues supported by a distributed cluster that can scale out.
Traditionally, storage for message brokers has been local file systems. Lately, object storage and other forms of cloud-native queue storage have gained popularity. This article advocates the use of stream storage for message brokers.
Queue services have usually maintained ordered delivery of messages, retries, and dead-letter handling, along with journaling. Incoming messages and caches have mostly been write-through, reaching all the way to disk. This made periodic asynchronous batching of writes possible, with flush-to-disk replaced by object storage, which is better suited to such a schedule. Object storage is also known to support Zookeeper-like paths even with its tri-level Namespace, Bucket and Object hierarchy. S3 worked well even for the processors of the queue as intermediate web-accessible storage.
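For illustration, and assuming the AWS CLI with a made-up bucket and key layout, the journal segments of a queue could be laid out under hierarchical keys and fetched by the processors over the web:

# illustrative bucket and key names only
aws s3 cp ./journal/segment-000001.log \
    s3://broker-journal/queues/orders/segments/segment-000001.log
aws s3 ls s3://broker-journal/queues/orders/segments/
# stream a segment to stdout for a downstream processor
aws s3 cp s3://broker-journal/queues/orders/segments/segment-000001.log -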
The queue service needs a cluster-based storage layer or a database server, and these have traditionally been long-standing products in the marketplace. Such products brought storage engineering best practices along with features for geographical replication of objects, content distribution networks, and message-passing algorithms such as peer-to-peer networking. As long as this queuing layer establishes a sync between, say, a distributed or cluster file system and object storage with duplicity-like logic, it can eventually roll all data over to persistence, so the choice of storage does not hamper the core functionality of the message broker.
A stream store can be hosted on any Tier 2 storage, whether files or blobs, and that choice does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in the queue is exactly what makes the stream store appealing to these message brokers.
Overall, there is a separation of concerns between storage and networking in each of these cluster-based products. Although message brokers tend to position themselves as peer-to-peer networking over storage on local nodes, it is the exact opposite for the data stores. Since the two concerns are independent, a message broker together with a stream store can bring in alternate layers of networking and storage in an integrated and more powerful way, letting the message processors take advantage of stream-based programmability while preserving the exactly-once and consistency models for the data.

Friday, February 7, 2020

We continue with our discussion on backup of Kubernetes resources

The chart described earlier provides the convenience of using helm with Kubernetes custom resources, either with one resource encompassing all other required K8s resources at the user namespace scope or with several discrete resources at the chart level, while allowing transaction-like behavior for create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.

Backup and restore help with data protection, disaster recovery and data migration across clusters. These are also routine activities performed on storage systems. In our discussion, the storage system comprises hybrid components that may or may not have a backup and restore technique, and where they do, their tools might vary.
The commands for backup and restore vary across storage systems. For example, filesystems are backed up with rsync or duplicity, while databases have their own backup and restore commands. In some cases, backup and restore may not even be needed for some data.
A virtualizer that translates a global backup and restore command would be helpful, since it would know which data to back up and how. It provides a common entry point for triggering the individual backups, and a common invocation point brings benefits such as policy evaluation, monitoring and enforcement.
A virtualizer also makes it easier to work with the Kubernetes controller or external software.
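A minimal sketch of such a virtualizer as a shell entry point, with hypothetical component names and paths, could dispatch the right tool per component:

#!/bin/bash
# backup.sh - hypothetical common entry point; it knows which data to
# back up for each component and which tool to use. Paths are illustrative.
set -e
TARGET=${1:-all}

backup_filesystem() {
    # incremental copy of the data directory to a local staging area
    rsync -a --delete /var/lib/store/data/ /backup/staging/data/
}

backup_database() {
    # database-specific dump command
    pg_dump -U admin appdb > /backup/staging/appdb.sql
}

case "$TARGET" in
    filesystem) backup_filesystem ;;
    database)   backup_database ;;
    all)        backup_filesystem; backup_database ;;
esac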
Backup and restore work exclusively with persisted data. It is also possible to sync between replicas.
Most backup tools now work with S3 storage. This allows web access from any source and to any destination. Since object storage is considered limitless storage with durability and availability, it suits backup schedules very well. Backups to this web-accessible storage can also be taken on a regular basis.
When backups are taken on a regular basis, the web-accessible storage can hold incremental backups.
Tools that take backups from a cluster do not necessarily perform incremental backups. In such cases it is perfectly alright to take an incremental backup on the local file system using rsync or duplicity and then upload it to the final cloud storage destination.
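For example, assuming illustrative paths and bucket names, a duplicity backup (full on the first run, incremental thereafter) can be taken against a local target and then synced to the bucket:

# full backup on the first run, incremental on subsequent runs
mkdir -p /backup/local/data
duplicity --no-encryption /var/lib/store/data file:///backup/local/data

# upload only the changed backup volumes to the cloud destination
aws s3 sync /backup/local/data s3://my-backup-bucket/data/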

Thursday, February 6, 2020

We continue with our discussion on backup of Kubernetes resources



The chart described earlier provides the convenience of using helm with Kubernetes custom resources, either with one resource encompassing all other required K8s resources at the user namespace scope or with several discrete resources at the chart level, while allowing transaction-like behavior for create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.

T-shirt-size deployment does not need to be only a matter of scale. It can be hybrid as well, selectively including components that do not need to be present in all categories. The statefulset can describe the replicas and the components to include.
Even when the pods are the same between charts, they can be made to behave differently by authoring rules and policies. The same chart can even be used to conditionally deploy different sizes and containers. The predetermined configuration helps with the proper match for workload requirements.
Backup and restore help with data protection, disaster recovery and data migration across clusters. These are also routine activities performed on storage systems. In our discussion, the storage system comprises hybrid components that may or may not have a backup and restore technique, and where they do, their tools might vary.
The commands for backup and restore vary across storage systems. For example, filesystems are backed up with rsync or duplicity, while databases have their own backup and restore commands. In some cases, backup and restore may not even be needed for some data.
A virtualizer that translates a global backup and restore command would be helpful, since it would know which data to back up and how. It provides a common entry point for triggering the individual backups, and a common invocation point brings benefits such as policy evaluation, monitoring and enforcement.

Wednesday, February 5, 2020

We continue with our discussion on backup of Kubernetes resources

The chart described earlier provides the convenience of using helm with Kubernetes custom resources, either with one resource encompassing all other required K8s resources at the user namespace scope or with several discrete resources at the chart level, while allowing transaction-like behavior for create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.
The groupings for the resources can be based on selectors, which lets the chart combine resources dynamically. Since the resources carry labels and annotations, selectors based on label matches can be used to group resources efficiently. The charts make it convenient to create and delete these groups of resources all at once.
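For instance, if the chart labels everything it creates with an illustrative label such as group=user1, the whole group can be listed or removed in one command:

# list the labeled resources in the user's namespace
kubectl get all -l group=user1 -n user1-ns

# delete the whole group at once
kubectl delete all -l group=user1 -n user1-ns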
The custom resource is not a dynamic selection of resources; it is a resource in itself. A custom resource may have a definition that includes other resources, which makes it easy to create and delete them with the help of a single resource.
The use of charts also helps us define t-shirt-size deployments. This comes from scaling out the pods to different capacities so that they can accommodate the load. This is an effective plan for handling workloads that vary across deployments.
T-shirt-size deployment does not need to be only a matter of scale. It can be hybrid as well, selectively including components that do not need to be present in all categories. The statefulset can describe the replicas and the components to include.
Even when the pods are the same between charts, they can be made to behave differently by authoring rules and policies. The same chart can even be used to conditionally deploy different sizes and containers. The predetermined configuration helps with the proper match for workload requirements.
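As a sketch, assuming hypothetical values files that set the replica counts and the components to include, the same chart can be installed at different t-shirt sizes:

# small deployment: fewer replicas, optional components turned off
helm install store-small ./store-chart -f values-small.yaml -n team-a

# large deployment: more replicas, all components enabled
helm install store-large ./store-chart -f values-large.yaml -n team-b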

Tuesday, February 4, 2020

We continue with our discussion on backup of Kubernetes resources

There is also a difference between the results of the Velero tool and the custom configuration generated using the scripts above. For example, no knowledge of the product, or of the logic pertaining to the reconciliation of the operator's states, is built into the output of the tool. The custom configuration, on the other hand, leverages product-specific knowledge to make the export and import of user resources all the more efficient, streamlined and conformant with the product.

The above chart now provides the convenience of using helm with Kubernetes custom resources, either with one resource encompassing all other required K8s resources at the user namespace scope or with several discrete resources at the chart level, while allowing transaction-like behavior for create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.
The groupings for the resources can be based on selectors, which lets the chart combine resources dynamically. Since the resources carry labels and annotations, selectors based on label matches can be used to group resources efficiently. The charts make it convenient to create and delete these groups of resources all at once.
The custom resource is not a dynamic selection of resources; it is a resource in itself. A custom resource may have a definition that includes other resources, which makes it easy to create and delete them with the help of a single resource.

Monday, February 3, 2020

The Velero tool is designed to take backups from a cluster. It requires S3 storage, which comes with a cloud provider such as AWS.
The install command is
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velerobucket \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-east-2 \
    --secret-file /root/aws-iam-creds-csv-local \
    --log_dir /tmp/velero
The server part can also be created with helm charts. However, the backups were yet to be created, so I do not have that handy.
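Once the server side is running, backups would typically be created and restored with commands along these lines, where the backup name and namespace are placeholders:

velero backup create ns-backup-1 --include-namespaces my-namespace
velero backup get
velero restore create --from-backup ns-backup-1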

On the other hand, I have created scripts and charts to make it easy to create and delete K8s resources.
This chart now provides the convenience of using helm with Kubernetes custom resources, either with one resource encompassing all other required K8s resources at the user namespace scope or with several discrete resources at the chart level, while allowing transaction-like behavior for create and delete at the overall chart level. The benefit is that resources are now grouped by user and automated for creation and deletion.
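As an illustration, with a hypothetical chart named user-resources, a user's whole group of resources can be created and torn down as one unit:

# create all of the user's resources in one transaction-like step
helm install user1 ./user-resources -n user1-ns

# remove everything the release created
helm uninstall user1 -n user1-ns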