Cluster computing: Stream operations continued

Wednesday, July 1, 2020

Stream operations continued

Scopes can also be copied by iterating over all of their streams and copying each one. This is generally a long-running operation because each stream can be massive by itself. Copying a scope is similar to copying a folder containing streams that were written to files.
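As a rough illustration of the iteration described above, here is a minimal in-memory sketch. It assumes a scope can be modeled as a dictionary of stream names to ordered event lists; the names `Scope`, `copy_stream` and `copy_scope` are hypothetical and not part of any stream store API.

```python
from typing import Dict, List

# Hypothetical in-memory model: a scope maps stream names to ordered event lists.
Scope = Dict[str, List[bytes]]


def copy_stream(source: List[bytes]) -> List[bytes]:
    """Copy one stream event by event, preserving order."""
    target: List[bytes] = []
    for event in source:
        target.append(event)
    return target


def copy_scope(source: Scope) -> Scope:
    """Copy a scope by iterating over its streams and copying each one."""
    return {name: copy_stream(events) for name, events in source.items()}


source_scope: Scope = {"orders": [b"o1", b"o2"], "clicks": [b"c1"]}
copied_scope = copy_scope(source_scope)
```

The cost grows with the total number of events across all streams, which is why a real scope copy would be a long-running operation.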

The use cases of scope copy, segment range copy, and archival only add to the utility of stream copy. In-place editing of streams is avoided by sealing segments once they are written. There is no overwrite; all write operations are append-only.
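The append-only and sealing behavior can be sketched with a small in-memory class. This is an illustrative model only: `Segment`, `SegmentSealedError` and the `read` range parameters are hypothetical names, not the store's actual interface.

```python
from typing import List, Optional


class SegmentSealedError(Exception):
    """Raised when a write targets a sealed segment."""


class Segment:
    """Hypothetical append-only segment: no overwrite, no writes after sealing."""

    def __init__(self) -> None:
        self._events: List[bytes] = []
        self.sealed = False

    def append(self, event: bytes) -> int:
        """Append an event and return its offset within the segment."""
        if self.sealed:
            raise SegmentSealedError("segment is sealed")
        self._events.append(event)
        return len(self._events) - 1

    def seal(self) -> None:
        """Mark the segment as read-only from now on."""
        self.sealed = True

    def read(self, start: int = 0, end: Optional[int] = None) -> List[bytes]:
        """Return a copy of the events in [start, end), i.e. a segment range copy."""
        return list(self._events[start:end])


segment = Segment()
segment.append(b"a")
segment.append(b"b")
segment.append(b"c")
segment.seal()
range_copy = segment.read(1, 3)
```

Because a sealed segment can no longer change, a range copy taken from it stays consistent without any locking.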

The streamManager could also move a stream from one scope to another by using the copy operation, although a metadata-only operation would be significantly more efficient in this case.
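To show why a metadata-only move is cheaper than copy-then-delete, the sketch below keeps stream data separate from a catalog that maps (scope, stream) names to that data. The `StreamCatalog` class and its methods are assumptions made for illustration; a real store would keep this mapping in its own metadata layer.

```python
from typing import Dict, List, Tuple


class StreamCatalog:
    """Hypothetical catalog: names live in an index, events live elsewhere."""

    def __init__(self) -> None:
        self._data: Dict[int, List[bytes]] = {}        # storage id -> events
        self._index: Dict[Tuple[str, str], int] = {}   # (scope, stream) -> storage id
        self._next_id = 0

    def create(self, scope: str, stream: str, events: List[bytes]) -> None:
        """Register a stream under a scope and store its events."""
        self._data[self._next_id] = list(events)
        self._index[(scope, stream)] = self._next_id
        self._next_id += 1

    def move(self, stream: str, src_scope: str, dst_scope: str) -> None:
        """Re-key the index entry only; no event is read or rewritten."""
        if (dst_scope, stream) in self._index:
            raise ValueError("stream already exists in destination scope")
        self._index[(dst_scope, stream)] = self._index.pop((src_scope, stream))

    def read(self, scope: str, stream: str) -> List[bytes]:
        """Return the events stored for the named stream."""
        return self._data[self._index[(scope, stream)]]


catalog = StreamCatalog()
catalog.create("staging", "orders", [b"o1", b"o2"])
before = catalog.read("staging", "orders")
catalog.move("orders", "staging", "production")
after = catalog.read("production", "orders")
```

The move touches one index entry regardless of stream size, whereas a copy-based move is proportional to the number of events.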

The streamManager can also automate returning the size and count of historical events. This is particularly helpful for aggregating across streams and scopes, and the stream store is best placed to provide that information. The information does not have to come via the streamManager; it can be queried through a metering API.

The purpose of using an API is that it becomes a pull model, where the data is pulled from the store rather than having the store publish it. Any API can do this, including the streamManager. The stream store is best able to serve this information because it can do so most efficiently. Callers can also choose to call as and when required rather than having to subscribe to the stream store for a predefined period. This gives the consumers of the API a read-only model that is well suited to dashboards that display charts and graphs. Most dashboards query a table or a store to render their data, and the query can be as simple as making an API call.

Unless the metering information is periodically flushed to disk, there is no need for data to be pushed from the stream store. Even in cases where the data needs to be pushed, an intermediary agent acting for the receiver can pull from the store and push to the receiver. This is typically the case for all stores that gather metrics from other heterogeneous systems. Since the size of the table or store persisting the data and the type of query can both be very involved, the stream store is best able to determine the API and the result to be returned. When the API is properly designed, the caller will be able to meet its display requirements from the parameters of the API an
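A pull-style metering query of the kind described above might look like the following sketch. It assumes the store can be modeled as nested dictionaries of scopes, streams and events; `MeteringApi`, `Usage` and the `scope` and `stream` filter parameters are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Usage:
    event_count: int
    size_bytes: int


class MeteringApi:
    """Hypothetical read-only view over a store shaped as {scope: {stream: [events]}}."""

    def __init__(self, store: Dict[str, Dict[str, List[bytes]]]) -> None:
        self._store = store

    def usage(self, scope: Optional[str] = None, stream: Optional[str] = None) -> Usage:
        """Aggregate count and size on demand; None means 'all scopes' or 'all streams'."""
        count = 0
        size = 0
        for scope_name, streams in self._store.items():
            if scope is not None and scope_name != scope:
                continue
            for stream_name, events in streams.items():
                if stream is not None and stream_name != stream:
                    continue
                count += len(events)
                size += sum(len(event) for event in events)
        return Usage(event_count=count, size_bytes=size)


store = {
    "retail": {"orders": [b"o1", b"o22"], "clicks": [b"c"]},
    "iot": {"temps": [b"t1"]},
}
api = MeteringApi(store)
total = api.usage()
retail = api.usage(scope="retail")
orders = api.usage(scope="retail", stream="orders")
```

A dashboard would call `usage` whenever it refreshes, so the store never has to track subscribers or push updates.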
