The benefits of async stream readers in a cache for data pipelines to a stream store
The cache essentially projects batches of segmentRanges so that the entire stream does not have to be scanned from beginning to end repeatedly by every reader. This mode of operation differs from pairing every reader with a writer to the destination.
The cache may use a queue to hold several segmentRanges from a stream.
The choice of queue is critical for peak throughput. A lock-free data structure, as opposed to an ArrayBlockingQueue, eliminates lock contention. Either kind of queue works satisfactorily until it fills up. Garbage-free async stream readers have the best response time.
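As a minimal sketch, the two choices might look like this in Java; the SegmentRange record here is a hypothetical stand-in for whatever batch type the cache actually holds:

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SegmentRangeQueues {
    // Hypothetical placeholder for a batch of contiguous segments.
    record SegmentRange(long beginOffset, long endOffset) {}

    public static void main(String[] args) {
        // Bounded and lock-based: producers and consumers contend on the same
        // lock, but the capacity bound gives backpressure for free.
        ArrayBlockingQueue<SegmentRange> bounded = new ArrayBlockingQueue<>(1024);

        // Lock-free (CAS-based): no lock contention between readers and the
        // writer, but unbounded, so the cache must enforce its own window limit.
        Queue<SegmentRange> lockFree = new ConcurrentLinkedQueue<>();

        bounded.offer(new SegmentRange(0, 4096));
        lockFree.offer(new SegmentRange(4096, 8192));
    }
}
```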
Asynchronous reads together with parallelism boost performance. Each segment in the cache can be significant in size, and a writer that transfers a segment over the network may take on the order of hundreds of milliseconds. Instead of paying that cost on demand, a continuous background import over an adjustable window of stream segments dramatically reduces the load on the stream store while still responding to seek requests on segments. The adjustable window also handles throttling and backpressure well.
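A sketch of such a background import, assuming a hypothetical SegmentRange type and using a semaphore plus a bounded queue as the adjustable window (the window size is the throttling knob):

```java
import java.util.concurrent.*;

public class SegmentPrefetcher {
    record SegmentRange(long begin, long end) {}

    private final ExecutorService readers = Executors.newFixedThreadPool(4);
    private final BlockingQueue<SegmentRange> window;
    private final Semaphore inFlight;

    SegmentPrefetcher(int windowSize) {
        // The window size bounds both memory use and the number of
        // outstanding reads, giving a simple throttling/backpressure knob.
        this.window = new ArrayBlockingQueue<>(windowSize);
        this.inFlight = new Semaphore(windowSize);
    }

    void prefetch(long begin, long end) throws InterruptedException {
        inFlight.acquire();                 // block when too many reads are outstanding
        readers.submit(() -> {
            try {
                SegmentRange range = readFromStreamStore(begin, end);
                window.put(range);          // hand the segment to the cache
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.release();
            }
        });
    }

    // A seek request is served from the window when the segment is already cached.
    SegmentRange next() throws InterruptedException { return window.take(); }

    // Stand-in for the actual stream-store read, which may take hundreds of ms.
    private SegmentRange readFromStreamStore(long begin, long end) {
        return new SegmentRange(begin, end);
    }
}
```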
Another approach is to have a continuous reader scan the stream from beginning to end in a loop, and let the cache or the writer decide which segments are prioritized for writing to the destination stream store when conflating different streams. This approach requires the cache to discard segments that are not needed by the store, such as duplicates. The cache or the writer can even drain the blocking queue to signal that more reader parallelism is needed to replenish the window of segments.
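One way such a conflating cache could be sketched, again with a hypothetical SegmentRange identifier and a seen-set for discarding duplicates:

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class ConflatingCache {
    record SegmentRange(String streamName, long begin, long end) {}

    private final BlockingQueue<SegmentRange> pending = new LinkedBlockingQueue<>();
    // Identifiers of segments already accepted; duplicates are dropped, not queued.
    private final Set<SegmentRange> seen = ConcurrentHashMap.newKeySet();

    // Called by the continuous reader for every segment it scans.
    void offer(SegmentRange range) {
        if (seen.add(range)) {       // first sighting: keep it for the writer
            pending.offer(range);
        }                            // duplicate: discard, it is not needed downstream
    }

    // Called by the writer; draining the queue signals the readers to replenish.
    SegmentRange nextToWrite() throws InterruptedException {
        return pending.take();
    }

    int backlog() { return pending.size(); }  // low backlog => add reader parallelism
}
```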
With the design above, the stitching of segment ranges can be performed in the cache instead of the stream store. Even a message queue broker or web-accessible storage can benefit from this conflation of streams.
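The stitching itself can be as simple as merging contiguous or overlapping ranges before they are handed to the writer; a sketch, once more with a hypothetical SegmentRange record:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SegmentStitcher {
    record SegmentRange(long begin, long end) {}

    // Merge contiguous or overlapping ranges so the destination receives
    // fewer, larger writes; the stream store never sees the fragments.
    static List<SegmentRange> stitch(List<SegmentRange> ranges) {
        List<SegmentRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(SegmentRange::begin));
        List<SegmentRange> out = new ArrayList<>();
        for (SegmentRange r : sorted) {
            if (!out.isEmpty() && out.get(out.size() - 1).end() >= r.begin()) {
                SegmentRange last = out.remove(out.size() - 1);
                out.add(new SegmentRange(last.begin(), Math.max(last.end(), r.end())));
            } else {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stitch(List.of(
                new SegmentRange(0, 100), new SegmentRange(100, 250), new SegmentRange(400, 500))));
        // => [SegmentRange[begin=0, end=250], SegmentRange[begin=400, end=500]]
    }
}
```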
As long as the historical readers follow the same order as the segmentRanges in the source stream, the events are guaranteed to be written in the same order to the destination stream. This calls for the historical readers to send their payloads to a blocking queue, with the writer performing its writes in the order in which they appear in the queue. The blocking queue backed by the cache becomes a serializer of the different segmentRanges to the destination stream, and the writes can be very fast when all the events for a segmentRange are available in memory.
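A sketch of the queue-as-serializer, assuming hypothetical Event and SegmentRange types; a single writer thread draining the queue FIFO preserves the source order:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedWriter {
    record Event(byte[] payload) {}
    record SegmentRange(long begin, long end, List<Event> events) {}

    private final BlockingQueue<SegmentRange> serializer = new LinkedBlockingQueue<>();

    // Historical readers must enqueue segmentRanges in source-stream order.
    void onSegmentRead(SegmentRange range) throws InterruptedException {
        serializer.put(range);
    }

    // A single writer thread drains FIFO, so destination order == source order.
    void writerLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            SegmentRange range = serializer.take();
            for (Event e : range.events()) {
                writeToDestination(e);   // events already in memory: no store round-trip
            }
        }
    }

    private void writeToDestination(Event e) { /* destination stream-store append */ }
}
```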