Sunday, February 23, 2020

Serviceability of multi-repository product pipeline: 
Software products are built from a snapshot of source code that continuously accrues in a branch usually called 'master'. Most features are baked completely before they are merged into master for a release. This practice of constantly vetting master for updates and keeping it releasable at every increment is called continuous integration and continuous deployment.
This used to work well when all the source code was compiled together in the master and released as a single executable. However, contemporary software engineering practice runs code independently on lightweight hosts called containers, with Docker images being a popular form of deployment. These images are built from the source code in their own repositories. The 'master' now becomes a repository that merely packages the build from each repository contributing to the overall product.
This works well for the v1.0 release of a product because each repository has its own master. When a version is released, it usually forks from master as a release branch. Since code in each repository has contributed to the release, they all fork their own release branches at the time the packaging repository is forked. Creating a branch at the time of release allows that branch to accrue hotfixes independent of the updates made in master for the next release.
There is a way to avoid the proliferation of release branches across repositories. It has to do with leveraging source control to go back in time. Since version control continuously revisions each update to a repository, it can bring back a snapshot of the code at any given version. Therefore, if there are no fixes required in a repository for a released version, a release branch for that repository need not be created.
It is hard to predict which repositories will or will not require release branches, so organizations either create them all at once or play it by ear on a case-by-case basis. When the number of repositories is a handful, either option makes no difference.
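The lazy-branching policy described above can be sketched as a small decision function. The repository names and the `release/<version>` branch convention here are hypothetical, and the actual branch creation (e.g. via git) is left out:

```python
# Hypothetical sketch: create release branches lazily, only for
# repositories that actually need a hotfix. All other repositories
# rely on the release tag to recover the released snapshot on demand.

def branches_to_create(all_repos, repos_with_fixes, version):
    """Return the release branches that must exist for `version`.

    A branch is only needed where a hotfix will land; every other
    repository can be checked out at the release tag when required.
    """
    needed = set(repos_with_fixes)
    return {
        repo: f"release/{version}"
        for repo in all_repos
        if repo in needed
    }

plan = branches_to_create(
    all_repos=["packaging", "auth", "billing", "ui"],
    repos_with_fixes=["auth"],
    version="1.0",
)
print(plan)  # only 'auth' forks a release branch
```

Under this policy, the number of release branches tracks the number of hotfixed repositories rather than the total number of repositories.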
The number of repositories is also hard to predict between successive versions of a software product. With the use of open source and modular organization of code, the packaging and storing of code in repositories has multiplied. Some products can even number in the hundreds of repositories or have several levels of branches representing organizational hierarchy before the code reaches master. The minimum requirement for any released product is that all of the source code pertaining to the release must be snapshotted.
Complicating this practice, a fix may now require updates to multiple repositories. Such a fix takes more effort to port between master and release branches across the respective repositories. Instead, it might be easier to flatten all repositories into folders under the same root in a new repository. This technique allows individual components to be built and released from sub-folders rather than their own repositories, and gives a holistic view of the entire released product.
This transformation is done only once per release, and it centralizes sustained engineering management by transforming the code layout rather than proliferating branches.
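The flattening step can be sketched as a mapping from contributing repositories to sub-folders under a single release root. The URLs and root name below are illustrative, and the actual import of history (e.g. via git subtree) is omitted:

```python
# Hypothetical sketch of the flattening step: map each contributing
# repository to a subfolder under a single release root. The sub-folder
# name is derived from the repository URL by dropping the '.git' suffix.

from urllib.parse import urlparse
from pathlib import PurePosixPath

def flatten_layout(repo_urls, root="release-1.0"):
    """Return {repo_url: subfolder} placing every repo under one root."""
    layout = {}
    for url in repo_urls:
        name = PurePosixPath(urlparse(url).path).stem  # drops '.git'
        layout[url] = f"{root}/{name}"
    return layout

layout = flatten_layout([
    "https://example.com/org/auth.git",
    "https://example.com/org/billing.git",
])
```

A fix spanning multiple components then becomes a single commit under one root instead of coordinated commits across release branches.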

Saturday, February 22, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.



A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.



A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.



Together, the message broker and the storage engine provide addressing and persistence of queues, facilitating access from a geographically close location. However, we are suggesting adding native message broker functionality to stream storage in a way that yields a distributed message broker enhanced by storage engineering best practices. Since we have streams for queues, we don't need to give queues any additional journaling.



When gateways solve problems where data does not have to move, they are very appealing to many usages across companies that use cloud providers. Several vendors have raced to find this niche. In our case, the AMQP references to use streams are a way to do just that. With queue storage requiring no maintenance or administration and providing the ability to store as much content as necessary, this gateway service becomes useful for chaining, linking, or networking message brokers.

Friday, February 21, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.

A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.

A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.

Together, the message broker and the storage engine provide addressing and persistence of queues, facilitating access from a geographically close location. However, we are suggesting adding native message broker functionality to stream storage in a way that yields a distributed message broker enhanced by storage engineering best practices. Since we have streams for queues, we don't need to give queues any additional journaling.

The stream store has an existing concept of events with their readers and writers. The queues continuously read and write, but there does not need to be a dual definition of stream and queue. Their purpose was to define a logical boundary between service and store. These queues and streams can be local or global: they used to be hosted on independent clusters and can now be hosted from the same cluster. A local queue can be store- or stream-specific. Global queues can be reached from anywhere by any client, and the store takes care of regional availability. The integration of clusters is central to making this work. The stream and the queue can have improved addressing that enables local or regional lookup. The persistence strategy for queue processing can take advantage of scopes and streams to group them together. Here too, we don't have to leverage external technologies such as a gateway for address translation; it can be part of the address that the store recognizes.
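The store-recognized addressing above can be sketched as an address carrying scope, stream, and an optional home region, so the store itself can decide whether a lookup is local or must go regional. All names and the scope/stream layout here are hypothetical:

```python
# Illustrative sketch: a queue address the store can interpret directly,
# so no external gateway is needed for local-vs-regional lookup.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class QueueAddress:
    scope: str                    # e.g. an application or organization scope
    stream: str                   # the backing stream for the queue
    region: Optional[str] = None  # None means globally reachable

    def is_global(self):
        return self.region is None

def resolve(addr, local_region):
    """Serve from the local cluster when possible, else from the home region."""
    if addr.is_global() or addr.region == local_region:
        return "local"
    return addr.region

addr = QueueAddress(scope="orders", stream="invoices", region="us-east")
```

A global address (region of `None`) resolves locally everywhere, leaving regional availability to the store.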

Thursday, February 20, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.



A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.



A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.



While message brokers look up queues by address, a gateway service can help translate those addresses. In this way, a gateway can significantly expand the distribution of queues.

First, the address mapping is not at the site level. It is at the queue level.



Second, the addresses of the queue, both universal and site-specific, are maintained along with the queue as part of its location information.



Third, instead of internalizing a table of rules from an external gateway, a lookup service can translate a universal queue address to the address of the nearest queue. This service is part of the message broker as a read-only query. Since queue name and address are already existing functionality, we only add the ability to translate a universal address to a site-specific address at the queue level.
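The read-only lookup described above can be sketched as a registry keyed by universal address. The registry contents and address formats are illustrative, not an actual broker API:

```python
# Hypothetical lookup: translate a universal queue address into the
# nearest site-specific address, falling back to any registered site.

registry = {
    "universal://orders": {
        "us-east": "us-east.broker.internal/orders",
        "eu-west": "eu-west.broker.internal/orders",
    },
}

def translate(universal, client_site):
    """Return the site-local address, or any available site as fallback."""
    sites = registry[universal]
    if client_site in sites:
        return sites[client_site]
    return next(iter(sites.values()))  # fallback: first registered site
```

Because the query is read-only, it can be served by any broker replica without coordination.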



Fourth, the gateway functionality exists as a microservice. It can do more than a static lookup of the physical location of a queue given a universal address instead of the site-specific address. It can generate tiny URLs for the queues based on hashing. This adds aliases to the address as opposed to the conventional domain-based address. The hashing is at the queue level, and since we can store billions of queues in queue storage, a URL-shortening feature is a significant offering from the gateway service within queue storage. It has the potential to morph into other services beyond a mere translator of queue addresses. The design of a URL-hashing service was covered earlier.
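The hashing idea can be sketched minimally: hash the full queue address and keep a fixed-length base62 prefix as the alias. Collision handling, which a real service would need, is omitted here:

```python
# Minimal sketch of queue-address shortening via hashing. The alias
# length and alphabet are arbitrary choices for illustration.

import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def tiny_alias(queue_address, length=7):
    """Derive a short, deterministic base62 alias from a queue address."""
    digest = int.from_bytes(
        hashlib.sha256(queue_address.encode()).digest(), "big"
    )
    chars = []
    for _ in range(length):
        digest, rem = divmod(digest, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)
```

A 7-character base62 alias gives 62^7 (about 3.5 trillion) possible values, which is why the scheme scales to billions of queues.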



Fifth, the conventional gateway functionality of load balancing can also be handled with an elastic scale-out of just the gateway service within the message broker.  



Sixth, this gateway can also improve access to a queue by making more copies of it elsewhere and adding the additional mappings for the duration of the traffic. It need not even interpret the originating IP addresses to determine the volume, as long as it keeps track of the number of read requests against the existing address of the same queue.
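The traffic-driven copying above reduces to counting read requests per queue address and acting once a threshold is crossed. The threshold value and the trigger semantics here are hypothetical:

```python
# Sketch: detect a "hot" queue purely from read counts against its
# address, without inspecting originating IP addresses. The caller
# would make the extra copy and add the temporary mapping when
# record_read returns True.

from collections import Counter

class HotQueueDetector:
    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.reads = Counter()

    def record_read(self, queue_address):
        """Return True exactly when the queue first crosses the threshold."""
        self.reads[queue_address] += 1
        return self.reads[queue_address] == self.threshold

det = HotQueueDetector(threshold=3)
events = [det.record_read("q1") for _ in range(4)]
```

Returning True only at the crossing point ensures each hot queue triggers one copy rather than one per subsequent read.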



These advantages can improve the usability of message brokers and their queues by providing as many as needed, along with a scalable service that translates incoming universal queue addresses to site-specific location information.

Wednesday, February 19, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.

A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.

A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.

A message broker is supposed to distribute the messages it receives. If it sends them all to the same single point of contention, it is not very useful. When message brokers are used with a cache, performance generally improves over going to the backend to store and retrieve. That is why different processors behind a message broker could maintain their own caches. A dedicated cache service like AppFabric may also be sufficient to distribute incoming messages. In this case, we are consolidating processor caches into a dedicated cache. This does not necessarily mean a single point of contention. A shared cache may offer a service level agreement on par with an individual cache for messages. Since a message broker will not see performance degradation when sending to a processor or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage, such as a stream store, is therefore also a practical option.
Some experts argue that message brokers are practical only for small and medium businesses with small-scale requirements. This means brokers are stretched on large-scale deployments, whereas stream storage is not necessarily restricted in size; message brokers can, however, also be backed by true cloud storage. These experts point out that stream storage is useful for continuous workloads. Tools like duplicity use the S3 API to persist to object storage, and in this vein, message brokers can also perform backup and archiving for their journaling. These workflows do not require modification of data objects, which makes stream storage perfect for them. The same experts argue that the problem with a message broker is that it adds complexity and limits performance. It is not used with online transaction processing workloads, which are more read- and write-intensive and do not tolerate latency.


Tuesday, February 18, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.



A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.



A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.

A message broker is supposed to distribute the messages it receives. If it sends them all to the same single point of contention, it is not very useful. When message brokers are used with a cache, performance generally improves over going to the backend to store and retrieve. That is why different processors behind a message broker could maintain their own caches. A dedicated cache service like AppFabric may also be sufficient to distribute incoming messages. In this case, we are consolidating processor caches into a dedicated cache. This does not necessarily mean a single point of contention. A shared cache may offer a service level agreement on par with an individual cache for messages. Since a message broker will not see performance degradation when sending to a processor or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage, such as a stream store, is therefore also a practical option.

Monday, February 17, 2020

We were discussing the use case of stream storage with message broker. We continue the discussion today.

A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.

A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of messages in a queue is exactly what makes the stream store appealing to these message brokers.

Chained message brokers
This answers the question: if the message broker is a separate instance, can message brokers be chained across object storage? Along the lines of the previous question, if the current message broker cannot resolve the queue for a message among its own queues, is it possible to forward the query to another message broker? These kinds of questions imply that the resolver merely needs to forward the queries it cannot answer to a default, pre-registered outbound destination. In a chained message broker, the queries can make sense simply from the naming convention, which tells whether a request belongs to a broker or not. If it does not, the broker simply forwards it to another message broker. This is somewhat different from the original notion that the address is opaque to the user and has no interpretable part that can determine the site to which the object belongs. The linked message broker does not even need to take time to verify that the local instance is indeed the correct recipient. It can merely translate the address, with the help of a registry, to know whether the address belongs to it. This shallow lookup means a request can be forwarded quickly to another linked message broker and ultimately to where it is guaranteed to be found. The linked message broker imposes no criteria for the brokers to be similar: as long as the forwarding logic is enabled, any implementation can exist in each message broker for translation, lookup, and return. This could be avoided entirely if the opaque addresses were hashes and the destination message broker were determined from a hash table. Whether we use routing tables or a static hash table, the networking over the chained message brokers can be its own layer facilitating routing of messages to the correct message broker.
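The static hash table variant above can be sketched in a few lines: the opaque address hashes to exactly one broker, so no broker needs to interpret the address at all. The broker names are illustrative:

```python
# Sketch of hash-based routing over chained message brokers: a stable
# hash of the opaque address selects the home broker from a static
# table, avoiding any per-broker interpretation of the address.

import hashlib

BROKERS = ["broker-a", "broker-b", "broker-c"]

def home_broker(opaque_address):
    """Pick the destination broker from a stable hash of the address."""
    h = int.from_bytes(
        hashlib.sha256(opaque_address.encode()).digest(), "big"
    )
    return BROKERS[h % len(BROKERS)]
```

A SHA-256 digest is used rather than Python's built-in `hash()` so that every broker computes the same routing decision across processes. Modulo placement would reshuffle most addresses when the broker list changes; a consistent-hashing ring would limit that churn.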