Cluster computing

Tuesday, July 7, 2020

Stream store notification continued

The notifications are purely a client-side concept because a periodic polling agent that watches for the number of segments and the number of readers can raise this notification. The stream store does not have to persist any information. Yet the state associated with a stream can also be synchronized by persistence instead of retaining it purely as an in-memory data structure. This alleviates the need for the users to keep track of the notifications themselves although the last notification is typically the most actionable one and the historical trend only gives an indication for the frequency of adjustments.

The notifications can also be improved to come from the writer side because the writers are aware of the current and active segments being written to and when they are sealed since the previous segment will not be the same as the current for the event written after sealing. These writer-side notifications can give indication to the distribution of events written to segments across key-space.

Notifications could also be improved to come from segments, streams and scopes for all operations associated with them. They need not just be made available on the reader group. They can be made available via a subscriber interface available globally.

There may be questions on why such information on segments, streams and scopes need to come from notifications as opposed to metrics which are suited for charts and graphs via existing time-series database and reporting stack. The answer is quite simple. The stream store is a time-series store and should promote its own stack with the beautiful stream processing language and store that it supports. All the notifications are events and those events are also as continuous as the data that generates them. They can be persisted in the stream store itself.

When the events are persisted in the stream itself, the publisher-subscriber interface then becomes similar to writer-reader that the store already supports. The stack that analyzes and reports the data can read directly from the stream. Should this stream be internal, a publisher-subscriber interface would be helpful. The ability to keep the notification stream internal enables the store to cleanup as necessary. Persistence of events also helps with offline validation and introspection for assistance with product support.

Notifications on the reader group are interesting when readers come and go. Changes in both number of segments and number of readers could renew notifications to the reader group.

Monday, July 6, 2020

Stream Store notifications continued

Stream Store Notifications:

A stream store is elastic, unbounded and continuous data storage. Events arrive in a stream to the stream store and are serialized by time. A partitioned collection of events that span a start time t0 and end time t1 is called a segment. There can be more than one segment between t0 and t1 to allow parallel read and write. The multiple segments are spread out over a key space determined by the use of routing keys used to write the events. Segments do not overlap in both horizontal and vertical directions and need not align on the same time boundaries. Each segment can be started and sealed independently so that no more events are appended to it. The number of segments for a stream depends on the scaling policy specified at the time of stream creation. A scaling policy can be fixed, data-based or event-based to accommodate varying size and count of segments according to anticipated load.

This logical representation of a stream is hard for a user to visualize when the data is being read while it is arriving by the order of millions from a device. The user has the guarantee that the events will be written in order of their arrival and none of the events will be lost during write and subsequent read but the user does not know the progress made in terms of reading the segments or if all the data has been read. If the user knew the progress, the readers could be scaled accordingly to keep up or complete reading at a faster rate.

The stream store publishes notifications for segments that come to the aid of the user in this regard. This article describes these notifications and their usage with potential improvements in the future.

The notifications are usually based on their type. For example, a well-known stream store publishes a SegmentNotification and an EndOfDataNotification when all the streams managed by a reader group have been read. The notifications are posted to the reader group so that the user can take action associated with the entire reader group

The SegmentNotification is published only when the number of segment changes. With its payload of the current number of segments and the current number of readers, a reader group can determine if the number of readers needs to be scaled up or scaled down. The trend in changes to the number of segments gives an indication of the rate at which the data is arriving. A reader is bound to a segment until all the events in that segment have been read. Each segment is read exactly by one reader in any reader group configured with that stream. Even if the readers are scaled up or down, the stream store maintains the assignments of segments to readers in a way that balances the active readers to the segments uniformly. There are at most as many active readers in a reader group as there are segments in the stream. Ideally, there is a one-to-one mapping between a segment and a reader.

Sunday, July 5, 2020

Stream Store Notifications:

Saturday, July 4, 2020

Stream Store client discussion continued

The S3 storage differs from stream storage in that the S3 does not permit append operation. Data from the objects can be combined into a new object but the object storage leaves it to the user to do so. The streams are continuous and unbounded data and the events can be appended into the stream. The type of read and write access determines one or the other storage. The S3 storage was always web accessible while the stream storage supported gRPC protocol. The participation in data pipeline is not only about access but also how the data is held and forwarded. In the former case, batch oriented pertains to creation of individual objects while in the latter, the events flow from queues to the stream store in a continuous manner with window-based processing.

This comparison leads to the definition of APIs on the data path to be on a per event basis. This is useful for iteration and works well to forward to an unbounded storage. The stream serves this purpose very well and the readers and writers operate on a per-event basis. So, it might seem counterintuitive to implement a copyStream but the operation is valid for any collection of events including the container itself. The stream gets translated to containers on the tier 2 which already support this operation. The provisioning of this operator at the streamManager API level only enhances the devOps requirements for data transfer and programmability improvements to extract-transform-load operators.

The above arguments in favor of a copyStream operation also strengthen the case for data tiering via hot-warm-cold transitions where sealed streams become ready for retention on cheaper and hybrid storage. This transition to age the data for backup or archival is only possible on segments or collections of events.

The archival on tertiary storage is usually from the tier2 and not directly from the streams. The streams can be shortened based on retention period and the portions of the data of the unretained segmentRange then goes into cold storage. The storage for this data then is in the form of a bounded segmentRange which is more suitable for archival. The archival policy between heterogeneous system does not make a copy of the stream as it writes the data read from the source stream and then truncates the stream.

The archival implementation requires administrative intervention. On the other hand, applications for archival can be written and authorized on a stream by stream basis. These applications would need a programmatic way to copy segmentRange. The readers and writers can do this on an event by event basis while the historical reader can read a segmentRange between head and tail streamcuts. If a backup of the entire stream is required, then a copyStream comes helpful because those streams can then be handed over to an administrator or automation for export to tertiary storage.

For thousands of streams, the above method may not scale since we are archiving on a stream by stream basis. A copyScope method can only target all the streams in a scope. The ability to select such streams across scopes for custom archival belongs to an application. This application can package the stream with metadata for export offline by an administrator. The metadata would be in the same form as any other event in the stream. This addition improves the packaging and visibility into the export so they can be restored if necessary. StreamStore can handle data export and import to its tier2 but the export and import across storage belongs to an application. The application does not have to be a client of one stream store. It can be a client of two stream stores where one store acts as the source while the other stream acts as the destination. StreamStore can also act as a source or a destination if an alternate type of store is used as the destination or source respectively.

Since long running applications such as these involving archival and copyStream operations are subject to failures, pause and resume, the applications could publish notifications. These notifications could be granular and raised for every segment in the stream. The segment boundaries at which the notifications are sent could correspond to the segments in the source stream.

Friday, July 3, 2020

Stream store client discussion

There are more runtime execution statistics and metrics available but size and count are included in top level queries.

Thursday, July 2, 2020

Stream store discussion continued

Wednesday, July 1, 2020

Stream operations continued

Scopes can also be copied by iterating over all the streams and making copies of each stream. This is generally a long running operation as each stream could be massive by itself. The operation to perform scope copy is the similar to the copy of a folder containing streams that were written to files

The utility of stream copy is only enhanced by the use cases of scope copy, segment range copy and archival. In place editing of streams is avoided by performing sealing of segments written. There is no overwrite and all write operations are append only

The streamManager could also move a stream from one scope to another by using copy operation although metadata only operation would be significantly more efficient in this case.

The streamManager can also automate returning the size and count of historical events. This is particularly helpful for aggregating across streams and scopes and the stream store is best able to provide that information. The information does not have to come via streamManager. It can be queried through a metering API. The purpose of using an API is that it becomes a pull model where the data is pulled from the store rather than having the store to publish it. Any API can do this including the streamManager. The stream store is best able to serve this information because it can do it most efficiently and with high performance. Also, the callers can choose to call as and when required rather than having to subscribe to the stream store for any pre-defined period. This follows a read-only model for the consumers of the API and it is very much suited for dashboards which display charts and graphs. Most dashboards involve a query against a table or a store to render the data and the query can be as simple as making an API call. Unless the metering information is periodically flushed to disk, there is no need for data to be pushed from the stream store. Even in cases where the data needs to be pushed, a man-in-the-middle agent for the receiver can pull from the store and push to the receiver. This is typically the case for all store that gather metrics from other heterogenous system. Since the size of the table or store persisting the data and the type of query can be very involved, the stream store is best able to determine the API and the result to be returned. When the API is properly designed, the caller will be able to meet its display requirements from the parameters of the API an