Friday, February 14, 2020

Public cloud technologies can improve our products and their operations even when those products are deployed on-premise. Public clouds have become popular for a programmability that endears them to developers, while providing benefits such as no maintenance, lower total cost of ownership, global reachability and availability for all on-premise instances, unified and seamless operations monitoring via system center tools, and improved performance via almost limitless scalability.
The public cloud acts like a global sponge, collecting data across on-premise deployments that yields insights into the operations of those deployments. Take the use case of collecting metrics such as total size and object count from any customer deployment. Metrics like these are lightweight and have traditionally been written to special-purpose on-premise storage stacks, which again pose the challenge that most on-premise deployments are dark for remote monitoring. The proposal here is that if these choice metrics were to fly up to the public cloud like pixie dust from each of the customer deployments, they could improve visibility into the operations of all customer deployments while expanding the horizon of monitoring and call-home services. A service hosted on the public cloud is well suited to assume the role of a network operations center that receives call-home alerts, messages, events and metrics. This is made possible by the rich development framework available in the public cloud, which allows rapid development of capabilities in operations monitoring, product support and remote assistance to customers for all deployments of software products and appliances. It also lowers costs for these activities. Metrics are just one example of this initiative to move such development work to the public cloud. All services pertaining to the operations of customer deployments of our products are candidates for being built and operated as tenants of the public cloud.
The proposals of this document include the following: 1) improving visibility of on-premise appliance deployments to services in the public cloud, 2) using public cloud features to improve manageability of all deployed instances, 3) using a consolidated reporting framework, via services built in the cloud, that maps all the deployed instances for drill-down analytics, and 4) using the public cloud as a traffic aggregator for all capabilities added to the on-premise storage now and in the future.
Specifically, these points were explained in the context of knowing the size and count of objects in each deployed instance. Previously, there was no way of pulling this information from customer deployments by default. Instead, the suggestion was to have the on-premise instance publish these metrics to a service hosted in the public cloud. Such a service appeared to be available already in both major public clouds. The presentation demonstrated Application Programming Interface (API) calls made against each public cloud to put and get the total size and count of objects.
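As a rough sketch of what such a put call could look like from an on-premise instance, the following builds an HTTP PUT carrying the two metrics. The endpoint URL and JSON field names are illustrative assumptions, not any particular cloud's metrics API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: publish per-deployment metrics (total size and object count) to a
// hypothetical cloud-hosted collector endpoint. URL and fields are assumptions.
public class MetricsPublisher {
    static String toJson(String deploymentId, long totalSize, long objectCount) {
        return String.format(
            "{\"deploymentId\":\"%s\",\"totalSize\":%d,\"objectCount\":%d}",
            deploymentId, totalSize, objectCount);
    }

    static HttpRequest buildPut(String endpoint, String json) {
        return HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    public static void main(String[] args) {
        String json = toJson("site-042", 1_234_567L, 890L);
        HttpRequest req = buildPut("https://metrics.example.com/v1/deployments", json);
        System.out.println(req.method() + " " + req.uri());
        // Sending is one line once a collector service is reachable:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

The get side is symmetric: a GET against the same resource returns the last published values for a deployment.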
These API calls were then shown as an introduction to several powerful features available from the public cloud that add value to the manageability of deployed instances. The latter part of the presentation focused on involving the public cloud in the monitoring and operations of geographically distributed on-premise customer deployments.
As an example of the new capabilities made possible by integrating on-premise deployments with public cloud manageability, it was shown that once this information is collected via the API calls, it becomes easy to report, with drill-downs on groups of deployments or aggregations of individual deployments.
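A minimal sketch of that drill-down idea: once per-deployment metrics sit in the cloud, grouping and summing them yields the aggregate views. The Deployment record and region names here are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of cloud-side aggregation over collected metrics: group the
// deployments (here, by an assumed region attribute) and sum a metric.
public class DrillDown {
    record Deployment(String region, long totalSize, long objectCount) {}

    static Map<String, Long> objectCountByRegion(List<Deployment> all) {
        return all.stream().collect(Collectors.groupingBy(
            Deployment::region,
            Collectors.summingLong(Deployment::objectCount)));
    }

    public static void main(String[] args) {
        List<Deployment> all = List.of(
            new Deployment("us-east", 100, 5),
            new Deployment("us-east", 200, 7),
            new Deployment("eu-west", 300, 9));
        System.out.println(objectCountByRegion(all)); // e.g. {eu-west=9, us-east=12}
    }
}
```

The same grouping key could be a customer, a product version or any other attribute published with the metrics.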
Finally, the public cloud was proposed as a traffic aggregator, aka sponge, for subscribing to any lightweight data, not just the size and count described in this presentation, and thus becomes useful in offloading publish-subscribe semantics from on-premise deployments. Together, these proposals are a few of the possibilities for integrating on-premise deployments with capabilities hosted in the public cloud.
#codingexercise
Determine the max size of a resource from all resources


int getMaxResourceLength(List<String> input) {
    // getSize is assumed to return the size of the named resource;
    // max() yields an OptionalInt, so default to 0 when the list is empty.
    return input.stream()
                .mapToInt(x -> getSize(x))
                .max()
                .orElse(0);
}

Thursday, February 13, 2020

Both the message broker and the stream store are increasingly becoming cloud technologies rather than on-premise ones. This makes their integration much more natural, as the limits on storage, networking and compute are somewhat relaxed.

This also helps with increasingly maintenance-free deployments of their components. Cloud technologies are suited for load balancing, scaling out, ingress control, pod health and proxies that not only adjust to demand but also come with all the benefits of infrastructure best practices.

The networking stacks in host computers have maintained host-centric send and receive functionality for over three decades. Along with services for file transfers, remote invocations, peer-to-peer networking and packet capture, the networking in the host has supported sweeping changes such as web programming, enterprise computing, IoT, social media applications and cloud computing. The standard established for this networking across vendors was the Open Systems Interconnection model. The seven layers of this model strove to encompass all but the application logic. However, the networking needs of these emerging trends never fully fed back into the host networking stack. This article presents the case that a message broker inherently belongs to a layer in the networking stack. Message brokers have also been deployed as standalone applications on the host supporting the Advanced Message Queuing Protocol. Some message brokers have demonstrated extreme performance that has carried the traffic of social media.

The message broker supports an observer pattern that allows interested observers to listen from outside the host. When the data unit is no longer packets but messages, data transfers happen in units that are more readable. Packet capture used proxies as a man in the middle, which causes little or no disruption to ongoing traffic while facilitating the storage and analysis of packets for the future. Message subscription makes collection and verification across hosts easier.
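The observer pattern described above can be sketched as a topic that relays whole messages to registered listeners. The names here are illustrative, not any specific broker's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of message subscription via the observer pattern: listeners
// register a callback and receive whole messages rather than raw packets.
public class MessageTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    void publish(String message) {
        // every registered observer sees every message, in readable units
        for (Consumer<String> s : subscribers) s.accept(message);
    }

    public static void main(String[] args) {
        MessageTopic topic = new MessageTopic();
        topic.subscribe(m -> System.out.println("capture: " + m));
        topic.publish("GET /index.html");
    }
}
```

A capture tool becomes just one more subscriber, which is why it disrupts ongoing traffic as little as the proxy did.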

The notion of data capture now changes to the following model:
Network data subscriber
Message broker
Stream storage
Tier 2 storage

This facilitates rich analytics directly over the time-series database. Separating the analysis stacks implies that charts and graphs can now be produced with any product and on any host.
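The four-stage model above can be sketched as chained consumers, with lists standing in for the stream store and Tier 2 storage; the broker stage illustrates the packet-to-message translation. All stage names are simplified stand-ins, not real components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the capture model: subscriber -> broker -> stream -> tier 2.
public class CapturePipeline {
    final List<String> stream = new ArrayList<>(); // append-only stream store
    final List<String> tier2  = new ArrayList<>(); // durable rollover target

    Consumer<String> build() {
        Consumer<String> streamStore = m -> { stream.add(m); tier2.add(m); };
        // the broker turns raw packet data into a readable message unit
        Consumer<String> broker = pkt -> streamStore.accept("msg:" + pkt);
        return broker; // the network data subscriber feeds this entry point
    }

    public static void main(String[] args) {
        CapturePipeline p = new CapturePipeline();
        Consumer<String> subscriber = p.build();
        subscriber.accept("pkt-0001");
        System.out.println("stream=" + p.stream + " tier2=" + p.tier2);
    }
}
```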

This way a message broker becomes a networking citizen in the cloud.

Wednesday, February 12, 2020


Tuesday, February 11, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.

A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.

A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in the queue is exactly what makes the stream store more appealing to these message brokers.
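A minimal sketch of that append-only contract over Tier 2 storage, using a local file as a stand-in for files or blobs:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Sketch: the stream store only ever appends, so any tier 2 medium that
// supports appending (here, a local file) can back it.
public class AppendOnlyStream {
    private final Path backing;

    AppendOnlyStream(Path backing) {
        this.backing = backing;
    }

    void append(String message) throws IOException {
        // append, never overwrite: the unbounded message sequence only grows
        Files.writeString(backing, message + "\n", StandardCharsets.UTF_8,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    List<String> readAll() throws IOException {
        return Files.readAllLines(backing);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("stream", ".log");
        AppendOnlyStream s = new AppendOnlyStream(p);
        s.append("msg-1");
        s.append("msg-2");
        System.out.println(s.readAll()); // [msg-1, msg-2]
    }
}
```

Swapping the Path for a blob writer changes nothing above this layer, which is the point of the tier 2 indirection.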

Though the designs of message brokers differ, the stream store provides guarantees for durability, performance and strong consistency models that work with both.

There are several examples of a stream store helping to persist logs, metrics and events. The message broker finds many use cases for these types of data, and integration with a stream store helps considerably.

Both the message broker and the stream store are increasingly becoming cloud technologies rather than on-premise ones. This makes their integration much more natural, as the limits on storage, networking and compute are somewhat relaxed.

This also helps with increasingly maintenance-free deployments of their components. Cloud technologies are suited for load balancing, scaling out, ingress control, pod health and proxies that not only adjust to demand but also come with all the benefits of infrastructure best practices.

Monday, February 10, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.

As compute, network and storage overlap to expand the possibilities in each frontier at cloud scale, message passing has become ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues are finding universal recognition in storage systems. Messages are also time-stamped and can be treated as events. A stream store is best suited for sequential event storage.
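The message-as-event idea above can be sketched as attaching a timestamp at ingestion; the Event record is an illustrative assumption.

```java
import java.time.Instant;

// Sketch: a message becomes an event once it carries a timestamp, which is
// what lets a stream store treat the queue as sequential event storage.
public class TimestampedMessage {
    record Event(Instant at, String payload) {}

    static Event toEvent(String message) {
        // stamp at ingestion time; the stream preserves arrival order
        return new Event(Instant.now(), message);
    }

    public static void main(String[] args) {
        Event e = toEvent("order-created");
        System.out.println(e.at() + " " + e.payload());
    }
}
```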
Since event storage overlays on Tier 2 storage, on top of blocks, files, streams and blobs, it already transfers data to those dedicated stores. Backing the message broker with a stream storage system only brings in storage engineering best practices.
The programmability of streams has a special appeal for message processors. Runtimes like Apache Flink already support user jobs with rich APIs for working with unbounded and bounded data sets.
The message broker now has the ability to host a runtime that is even more performant and expressive than any of the others it has supported.
The design of message brokers has certain features that stream storage can handle very well. For example, MSMQ has a design that includes support for dead letter and poison letter queues, support for retries by the queue processor, a non-invasive approach that lets clients send and forget, the ability to create an audit log, support for transactional as well as non-transactional messages, support for distributed transactions, support for clusters instead of standalone single-machine deployments, and the ability to improve concurrency control. Stream storage provides excellent support for queues, logs, audits and events, along with transactional semantics.
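Two of the MSMQ-style features listed above, bounded retries by the queue processor and a dead letter queue for messages that keep failing, can be sketched as follows. The retry limit and the handler are illustrative assumptions, not MSMQ's actual behavior.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Sketch: retry each message a bounded number of times, then route
// persistent failures to a dead letter queue instead of losing them.
public class RetryingProcessor {
    static final int MAX_RETRIES = 3;
    static final Queue<String> deadLetters = new ArrayDeque<>();

    static void drain(Queue<String> queue, Predicate<String> handler) {
        while (!queue.isEmpty()) {
            String msg = queue.poll();
            boolean done = false;
            for (int attempt = 0; attempt < MAX_RETRIES && !done; attempt++) {
                done = handler.test(msg); // true means handled successfully
            }
            if (!done) deadLetters.add(msg); // give up: dead letter the message
        }
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        q.add("ok");
        q.add("poison");
        drain(q, m -> !m.equals("poison"));
        System.out.println("dead letters: " + deadLetters); // dead letters: [poison]
    }
}
```

A stream store handles both sides of this naturally: the main queue and the dead letter queue are just two append-only streams.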
This design of MSMQ is somewhat different from that of System.Messaging. The System.Messaging library transparently exposes the underlying Windows Message Queuing APIs. For example, it provides a GetPublicQueues method that enumerates the public message queues. It takes message queue criteria as a parameter. These criteria can be specified with parameters such as category and label, and can also take the machine name or cluster name, and created and modified times, as filter parameters. The GetPublicQueuesEnumerator is available to provide an enumerator for iterating over the results.
Though these designs are different, the stream store provides guarantees for durability, performance and strong consistency models that work with both.

Sunday, February 9, 2020

Yesterday we were discussing the use case of stream storage with a message broker. We continue the discussion today.
A message broker can roll over all data eventually to persistence, so the choice of storage does not hamper the core functionality of the message broker.
A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in the queue is exactly what makes the stream store more appealing to these message brokers.
As compute, network and storage overlap to expand the possibilities in each frontier at cloud scale, message passing has become ubiquitous functionality. While libraries like protocol buffers and solutions like RabbitMQ are becoming popular, flows and their queues are finding universal recognition in storage systems. Messages are also time-stamped and can be treated as events.
A stream store is best suited for sequential event storage.
Since event storage overlays on Tier 2 storage, on top of blocks, files, streams and blobs, it already transfers data to those dedicated stores. Backing the message broker with a stream storage system only brings in storage engineering best practices.
The programmability of streams has a special appeal for message processors. Runtimes like Apache Flink already support user jobs with rich APIs for working with unbounded and bounded data sets.

Saturday, February 8, 2020

The use case for a stream storage in a Message Broker:
A message broker is software with universal appeal for relaying messages from one or more publishers to subscribers. Some commercially well-known examples of message brokers are RabbitMQ, ZeroMQ and SaltStack. These message brokers can be visualized as a set of logical queues supported by a distributed cluster that can scale out.
Traditionally, storage for message brokers has been local file systems. Lately, object storage, or some form of cloud-native queue storage, has gained popularity. This article advocates the use of stream storage for message brokers.
Queue services have usually maintained ordered delivery of messages, retries and dead letter handling, along with journaling. Incoming messages and caches have mostly been write-through, reaching all the way to the disk. This made periodic asynchronous batching of writes possible, with flush-to-disk replaced by object storage, which was better suited to this kind of schedule. Object storage was also known to support Zookeeper-like paths even with its tri-level Namespace, Bucket and Object hierarchy. S3 worked well even for the processors of the queue as intermediate web-accessible storage.
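The write-through versus batched-flush behavior described above can be sketched with in-memory stands-ins for the disk and the object store; the batch size and backing lists are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: write-through persists every message immediately, while the
// batched path accumulates writes and flushes them periodically to a
// (simulated) object store.
public class WritePolicies {
    final List<String> disk = new ArrayList<>();        // write-through target
    final List<String> objectStore = new ArrayList<>(); // batched flush target
    final List<String> batch = new ArrayList<>();
    final int batchSize = 3;

    void writeThrough(String msg) {
        disk.add(msg); // reaches persistence on every call
    }

    void writeBatched(String msg) {
        batch.add(msg);
        if (batch.size() >= batchSize) flush(); // asynchronous in practice
    }

    void flush() {
        objectStore.addAll(batch);
        batch.clear();
    }

    public static void main(String[] args) {
        WritePolicies w = new WritePolicies();
        w.writeThrough("m1");
        w.writeBatched("m2");
        w.writeBatched("m3");
        w.writeBatched("m4");
        System.out.println("disk=" + w.disk + " objectStore=" + w.objectStore);
    }
}
```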
A queue service needs a cluster-based storage layer or a database server, and these have traditionally been long-standing products in the marketplace. These products bring storage engineering best practices along with features for geographical replication of objects, content distribution networks and message passing algorithms such as peer-to-peer networking. As long as this queuing layer establishes sync between, say, a distributed or cluster file system and object storage with duplicity-like logic, it can eventually roll over all data to persistence, so the choice of storage does not hamper the core functionality of the message broker.
A stream store can be hosted on any Tier 2 storage, whether files or blobs. The choice of Tier 2 storage does not hamper the functionality of the stream store. In fact, the append-only, unbounded nature of the messages in the queue is exactly what makes the stream store more appealing to these message brokers.
Overall, there is a separation of concerns between storage and networking in each of these cluster-based products, and although message brokers tend to position themselves as peer-to-peer networking over storage in local nodes, it is the exact opposite for the data stores. Since they are independent, the message broker together with a stream store can combine alternate layers of networking and storage in an integrated and more powerful way, letting message processors take advantage of stream-based programmability while preserving the exactly-once and consistency models for the data.