Using cache for stream access:
The cache for streams is a write through cache because the writes are at the end of the stream. These writes are hardly a concern for cache and their performance is mitigated by Bookkeeper. Batched writes from Bookkeeper are not different from a periodic backup schedule.
Read access patterns benefit from some organization of segments
When the segments in the stream have skip level access by say 2,4,8 adjacent nodes, the cache can access the segments via skip levels and prefetch those that will be read. Skip level access on streams means that we are able to perform as fast as random access over sequential streams.
The cache may use indexes on locations to augment the deficiency of storing record locations in the stream store. This index is merely a translation of the sequential segment number from the stream store in terms of the leaps of the contiguous segments we need to make. And the best way to do that for that particular segment. Given a segment it’s sequential number from the start may be internal to the stream store. However, if that number is available from the stream store to be mapped with the segment whenever it is cached, then the translation of the location to the segment in terms of skip-level access is straightforward for example the number 63 from start, will require as many multiples of 8 less than target, same with multiples of 4 starting from the position left with the previous step, then multiples of 2 such that they are maximized in that order so that the overall count is least. This computation benefits in bringing ranges based on numbers alone rather than range indexes based on say BTree
Without the location available from the stream store some persistence is needed for the lookup of the segment number for the corresponding segment and usually involves an iteration of all the segments from the store. A hash of the segment may be sufficient for these lookups.
The hierarchical representation of stream segments may be facilitated with other data structures but they tend to centralize all operations. The purpose of skip level access is faster access on the same sequential access so that no other data structures are necessary
The cache for streams is a write through cache because the writes are at the end of the stream. These writes are hardly a concern for cache and their performance is mitigated by Bookkeeper. Batched writes from Bookkeeper are not different from a periodic backup schedule.
Read access patterns benefit from some organization of segments
When the segments in the stream have skip level access by say 2,4,8 adjacent nodes, the cache can access the segments via skip levels and prefetch those that will be read. Skip level access on streams means that we are able to perform as fast as random access over sequential streams.
The cache may use indexes on locations to augment the deficiency of storing record locations in the stream store. This index is merely a translation of the sequential segment number from the stream store in terms of the leaps of the contiguous segments we need to make. And the best way to do that for that particular segment. Given a segment it’s sequential number from the start may be internal to the stream store. However, if that number is available from the stream store to be mapped with the segment whenever it is cached, then the translation of the location to the segment in terms of skip-level access is straightforward for example the number 63 from start, will require as many multiples of 8 less than target, same with multiples of 4 starting from the position left with the previous step, then multiples of 2 such that they are maximized in that order so that the overall count is least. This computation benefits in bringing ranges based on numbers alone rather than range indexes based on say BTree
Without the location available from the stream store some persistence is needed for the lookup of the segment number for the corresponding segment and usually involves an iteration of all the segments from the store. A hash of the segment may be sufficient for these lookups.
The hierarchical representation of stream segments may be facilitated with other data structures but they tend to centralize all operations. The purpose of skip level access is faster access on the same sequential access so that no other data structures are necessary
 
No comments:
Post a Comment