Thursday, September 17, 2020

Pipelines

The stream store has the opportunity to streamline access to metadata independently of applications, so that applications can focus on their business needs.

#pipelines

Build automation jobs and message queues have a lot in common with streams and events. A pipeline pattern forms where events from one stream feed into another. The operators that extract, transform and load events from one stream to another can form a continuous pipeline that transfers events as they arrive, as the sketch below illustrates.
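The following is a minimal sketch of one such hop, assuming hypothetical StreamReader and StreamWriter abstractions in place of the stream store's actual client API:

import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical reader/writer abstractions standing in for the stream store's client API.
interface StreamReader<T> { T readNext(); }          // blocks until the next event arrives
interface StreamWriter<T> { void write(T event); }   // appends an event to the target stream

// A relay that continuously extracts events from a source stream, transforms them,
// and loads them into a destination stream, forming one hop of a pipeline.
class Relay<S, D> implements Runnable {
    private final StreamReader<S> source;
    private final StreamWriter<D> destination;
    private final Predicate<S> filter;
    private final Function<S, D> transform;

    Relay(StreamReader<S> source, StreamWriter<D> destination,
          Predicate<S> filter, Function<S, D> transform) {
        this.source = source;
        this.destination = destination;
        this.filter = filter;
        this.transform = transform;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            S event = source.readNext();                    // extract
            if (filter.test(event)) {                       // filter
                destination.write(transform.apply(event));  // transform and load
            }
        }
    }
}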

Setting up these pipelines requires readers and writers to be commissioned between streams. This relay can easily scale as the load increases, given the parallelization supported by the stream store. But it would help to have an api that sets up a relay and takes simple filters, say as predicate parameters. This automatic provisioning of relays is sufficient to set up multi-hop pipelines; a sketch of such an api follows.
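The provisioner below is a sketch of what that higher-level api might look like, not an existing interface. It reuses the Relay, StreamReader and StreamWriter types from the previous sketch, and the createRelay method, its parameters and the openReader/openWriter placeholders are all assumptions:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical provisioning api: given source and destination stream names, a filter
// predicate and a transform, it commissions the readers and writers for one hop.
// Calling it repeatedly chains hops into a multi-hop pipeline.
class PipelineProvisioner {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final List<String> hops = new ArrayList<>();

    // parallelism controls how many reader/writer pairs are commissioned for the hop,
    // leaning on the parallel reads supported by the stream store.
    <S, D> void createRelay(String sourceStream, String destinationStream,
                            Predicate<S> filter, Function<S, D> transform,
                            int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            StreamReader<S> reader = openReader(sourceStream);
            StreamWriter<D> writer = openWriter(destinationStream);
            pool.submit(new Relay<>(reader, writer, filter, transform));
        }
        hops.add(sourceStream + " -> " + destinationStream);
    }

    List<String> describePipeline() { return hops; }

    // Placeholders for the stream store's client factory calls.
    private <S> StreamReader<S> openReader(String stream) { throw new UnsupportedOperationException(); }
    private <D> StreamWriter<D> openWriter(String stream) { throw new UnsupportedOperationException(); }
}

With such an api, a two-hop pipeline is just two calls, for example createRelay("raw", "cleaned", ...) followed by createRelay("cleaned", "aggregated", ...).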

This higher-level api for setting up a pipeline of relays can also support management features associated with the pipeline, such as collecting metrics and statistics as described earlier for streams, or reporting status at the various points of transition. The ability to view the end-to-end operation of a pipeline is helpful both for business needs and for manageability.
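A minimal sketch of what such per-hop bookkeeping could look like is given below; the HopMetrics and PipelineStatus classes are assumptions, and a real deployment would export these counters to whatever metrics facility the stream store provides:

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Counters for one hop of a pipeline, incremented by the relay at each transition point.
class HopMetrics {
    final String hopName;
    final AtomicLong eventsRead = new AtomicLong();
    final AtomicLong eventsWritten = new AtomicLong();
    final AtomicLong eventsFiltered = new AtomicLong();

    HopMetrics(String hopName) { this.hopName = hopName; }

    // A simple status line for this hop.
    String report() {
        return String.format("%s: read=%d written=%d filtered=%d",
                hopName, eventsRead.get(), eventsWritten.get(), eventsFiltered.get());
    }
}

// An end-to-end view of the pipeline: one status line per hop, in pipeline order.
class PipelineStatus {
    private final List<HopMetrics> hops;

    PipelineStatus(List<HopMetrics> hops) { this.hops = hops; }

    String report() {
        return hops.stream().map(HopMetrics::report).collect(Collectors.joining("\n"));
    }
}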

There are many other patterns of inter-stream activity that can also be supported via a set of operators such as appenders, collectors and forwarders. These make the pipeline, along with its metadata, a veritable storage container in its own right.
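As a rough illustration of what those operators might offer, here is one possible set of interfaces; the names and method signatures are assumptions, not an existing api:

import java.util.List;
import java.util.function.Function;

// An appender adds events to an existing stream.
interface Appender<T> {
    void append(String stream, T event);
}

// A collector fans in events from several source streams into one destination stream.
interface Collector<T> {
    void collect(List<String> sourceStreams, String destinationStream);
}

// A forwarder fans an event out to one or more downstream streams chosen by a selector.
interface Forwarder<T> {
    void forward(String sourceStream, Function<T, List<String>> selector);
}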

The participation of the stream store in a data pipeline is thus transformed into the participation of streams in a pipeline with managed features from the stream store. Some of the onus on applications is removed by including these automations within the stream store.

 

