Stream Store Notifications:
A stream store is elastic, unbounded and continuous data storage. Events arrive in a stream to the stream store and are serialized by time. A partitioned collection of events that span a start time t0 and end time t1 is called a segment. There can be more than one segment between t0 and t1 to allow parallel read and write. The multiple segments are spread out over a key space determined by the use of routing keys used to write the events. Segments do not overlap in both horizontal and vertical directions and need not align on the same time boundaries. Each segment can be started and sealed independently so that no more events are appended to it. The number of segments for a stream depends on the scaling policy specified at the time of stream creation. A scaling policy can be fixed, data-based or event-based to accommodate varying size and count of segments according to anticipated load.
This logical representation of a stream is hard for a user to visualize when the data is being read while it is arriving by the order of millions from a device. The user has the guarantee that the events will be written in order of their arrival and none of the events will be lost during write and subsequent read but the user does not know the progress made in terms of reading the segments or if all the data has been read. If the user knew the progress, the readers could be scaled accordingly to keep up or complete reading at a faster rate.
The stream store publishes notifications for segments that come to the aid of the user in this regard. This article describes these notifications and their usage with potential improvements in the future.
The notifications are usually based on their type. For example, a well-known stream store publishes a SegmentNotification and an EndOfDataNotification when all the streams managed by a reader group have been read. The notifications are posted to the reader group so that the user can take action associated with the entire reader group
The SegmentNotification is published only when the number of segment changes. With its payload of the current number of segments and the current number of readers, a reader group can determine if the number of readers needs to be scaled up or scaled down. The trend in changes to the number of segments gives an indication of the rate at which the data is arriving. A reader is bound to a segment until all the events in that segment have been read. Each segment is read exactly by one reader in any reader group configured with that stream. Even if the readers are scaled up or down, the stream store maintains the assignments of segments to readers in a way that balances the active readers to the segments uniformly. There are at most as many active readers in a reader group as there are segments in the stream. Ideally, there is a one-to-one mapping between a segment and a reader.
The notifications are purely a client-side concept because a periodic polling agent that watches for the number of segments and the number of readers can raise this notification. The stream store does not have to persist any information. Yet the state associated with a stream can also be synchronized by persistence instead of retaining it purely as an in-memory data structure. This alleviates the need for the users to keep track of the notifications themselves although the last notification is typically the most actionable one and the historical trend only gives an indication for the frequency of adjustments.
The notifications can also be improved to come from the writer side because the writers are aware of the current and active segments being written to and when they are sealed since the previous segment will not be the same as the current for the event written after sealing. These writer-side notifications can give indication to the distribution of events written to segments across key-space.
No comments:
Post a Comment