Cluster computing

Saturday, October 12, 2019

We were discussing the cache usage for stream access:

When the segments in the stream have skip-level links that leap over, say, 2, 4, or 8 adjacent nodes, the cache can reach segments via those skip levels and prefetch the ones that will be read. Skip-level access means that reads over a sequential stream can approach the speed of random access.
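As a rough illustration of the prefetch idea, the sketch below lists the segments that are one leap away from the current position. The function name `prefetch_candidates`, the `stream_length` parameter, and the leap sizes are assumptions made for this example and are not part of any stream store API.

```python
def prefetch_candidates(current, stream_length, levels=(2, 4, 8)):
    """Return the segment numbers reachable in one skip-level leap.

    Hypothetical helper: `levels` holds the assumed leap sizes and
    `stream_length` is the assumed number of segments in the stream.
    """
    # Keep only the leaps that land inside the stream.
    return [current + step for step in levels if current + step < stream_length]


# From the start of a 100-segment stream, segments 2, 4 and 8 are one leap away.
print(prefetch_candidates(0, 100))
```

Which of these candidates the cache actually prefetches would depend on what it expects the readers to ask for next.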

The cache may use an index on locations to make up for the stream store not storing record locations. This index is merely a translation of a segment's sequential number in the stream store into the leaps over contiguous segments needed to reach it, chosen in the best way for that particular segment. A segment's sequential number from the start may be internal to the stream store. However, if the stream store makes that number available so that it can be mapped to the segment whenever it is cached, then translating the location into skip-level access is straightforward. For example, reaching segment 63 from the start takes as many leaps of 8 as fit below the target, then as many leaps of 4 as fit from the position reached in the previous step, then leaps of 2, maximizing each level in that order so that the overall count of leaps is the least. That is, 7 leaps of 8 reach 56, one leap of 4 reaches 60, one leap of 2 reaches 62, and a single sequential step reaches 63. This computation lets the cache bring in ranges based on numbers alone rather than on range indexes built on, say, a B-tree.
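The translation described above can be sketched as a greedy decomposition. This is a minimal sketch, assuming leap sizes of 8, 4, and 2 plus a single sequential step of 1 for any remainder; the name `decompose` and the `levels` parameter are made up for illustration.

```python
def decompose(target, levels=(8, 4, 2, 1)):
    """Translate a segment number into a count of leaps per skip level.

    Assumption: `levels` is sorted from largest to smallest and ends in 1,
    so any remainder is covered by plain sequential steps.
    """
    if target < 0:
        raise ValueError("segment number must be non-negative")
    remaining = target
    leaps = {}
    for size in levels:
        # Take as many leaps of this size as fit, then pass on the remainder.
        leaps[size], remaining = divmod(remaining, size)
    return leaps


plan = decompose(63)
print(plan)                # {8: 7, 4: 1, 2: 1, 1: 1}
print(sum(plan.values()))  # 10 leaps in total
```

Because each leap size here divides the next larger one, taking the largest leaps first also gives the smallest total number of leaps.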

Without the location available from the stream store, some persistence is needed to look up the segment number for a given segment, and building it usually involves iterating over all the segments in the store. A hash of the segment may be sufficient for these lookups.
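A hash-based lookup of this kind might look like the following sketch. It assumes segments are available as byte strings and uses SHA-256 from the standard library; the function names are hypothetical, and a real cache would persist the index rather than keep it in a dictionary.

```python
import hashlib


def build_segment_index(segments):
    """Iterate over all segments once and map each segment's hash to its number."""
    index = {}
    for number, payload in enumerate(segments):
        digest = hashlib.sha256(payload).hexdigest()
        # Assumption: if two segments have identical content, keep the first.
        index.setdefault(digest, number)
    return index


def lookup_segment_number(index, payload):
    """Return the sequential number for a segment, or None if it is unknown."""
    return index.get(hashlib.sha256(payload).hexdigest())


segments = [b"alpha", b"beta", b"gamma"]
index = build_segment_index(segments)
print(lookup_segment_number(index, b"gamma"))  # 2
```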

The hierarchical representation of stream segments may be facilitated with other data structures, but they tend to centralize all operations. The purpose of skip-level access is faster access over the same sequential layout, so that no other data structures are necessary.

Another approach is to maintain the segment numbers on both sides. For example, the cache may have clients that read from the start of the stream up to a targeted segment. The stream store may perform repeated scans as it serializes the clients' accesses to the stream. The cache has the opportunity to bypass the stream store and alleviate its workload by serving the segments that are most popular across accesses. As each client presents a target segment number and the stream store presents segment numbers from the same or a different stream, the cache can choose which segments to hold based on relevance, via skip-level access, and on priority, via its eviction policy. The cache therefore becomes an intelligent agent that does away with redundant scans of streams by the store.
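To make the saving concrete, here is a minimal sketch of a cache keyed by segment number that counts how often it has to go back to the store. The eviction policy is not specified in the discussion above, so least-recently-used is assumed here; `SegmentCache`, `fetch_from_store`, and `store_reads` are illustrative names only.

```python
from collections import OrderedDict


class SegmentCache:
    """Cache keyed by segment number, with an assumed LRU eviction policy."""

    def __init__(self, fetch_from_store, capacity=4):
        self._fetch = fetch_from_store   # hypothetical callable: number -> segment
        self._capacity = capacity
        self._entries = OrderedDict()
        self.store_reads = 0             # how many times the store was consulted

    def get(self, segment_number):
        if segment_number in self._entries:
            # Hit: mark as most recently used and skip the store entirely.
            self._entries.move_to_end(segment_number)
            return self._entries[segment_number]
        # Miss: read from the store once and remember the segment.
        data = self._fetch(segment_number)
        self.store_reads += 1
        self._entries[segment_number] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return data


cache = SegmentCache(lambda n: f"segment-{n}", capacity=2)
for number in (63, 63, 63):
    cache.get(number)
print(cache.store_reads)  # 1: the repeated reads never reached the store
```

A popularity-based policy could replace the LRU ordering without changing the interface.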

The overlapping interval range between segment numbers is decided by the cache, which uses it with skip-level access to fetch the segments. Such a range of segments is easy to represent efficiently with the same logic as demonstrated for a targeted segment number. In this case, the same algorithm is repeated for the beginning and the end of the targeted range, and the interval is represented in terms of the skip-level access between the start and the end.
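One way to read this is sketched below: the same greedy translation is applied to the beginning and the end of the range, and the interval itself is expressed as the leaps needed to cover the distance between them. The leap sizes (8, 4, 2, and a single step of 1), the names `leaps_for` and `range_plan`, and the assumption that a leap can start from any segment are all choices made for this example.

```python
def leaps_for(offset, levels=(8, 4, 2, 1)):
    """Greedy count of leaps per skip level needed to cover `offset` segments."""
    remaining = offset
    leaps = {}
    for size in levels:
        leaps[size], remaining = divmod(remaining, size)
    return leaps


def range_plan(begin, end, levels=(8, 4, 2, 1)):
    """Describe the range [begin, end] purely in terms of skip-level leaps."""
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    return {
        "begin": leaps_for(begin, levels),       # leaps from stream start to range start
        "end": leaps_for(end, levels),           # leaps from stream start to range end
        "span": leaps_for(end - begin, levels),  # leaps from range start to range end
    }


plan = range_plan(10, 63)
print(plan["begin"])  # {8: 1, 4: 0, 2: 1, 1: 0}
print(plan["span"])   # {8: 6, 4: 1, 2: 0, 1: 1}
```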
