Thursday, July 16, 2020

Stream manager discussion continued


The following section describes the stream store implementation of copyStream.

Approach 1. A batch client reads the events between two stream cuts that point to the head and tail of the stream. Both stream cuts are determined at the start of the copy operation, so events written to the tail of the stream after the copy begins are not included. A segment iterator created for each segment range copies the events one by one to the destination stream. When the last event has been written, the last segment is sealed and the destination stream is made available to the caller. The operation is re-entrant and idempotent: if a portion of the stream has already been written, a repeated call continues from that point instead of writing duplicates.
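The copy loop in Approach 1 can be sketched roughly as follows. This is a minimal in-memory model and not the Pravega batch client: the class EventCopySketch, the Destination type, the integer head and tail positions standing in for stream cuts, and the maxEvents parameter (used only to simulate an interrupted run) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of Approach 1; not the Pravega client API.
final class EventCopySketch {

    /** Stand-in for the destination stream, which is sealed once the copy completes. */
    static final class Destination {
        final List<String> events = new ArrayList<>();
        boolean sealed = false;
    }

    /**
     * Copies source events in [head, tail) to the destination, one at a time.
     * The bounds are fixed by the caller before the copy starts, so events
     * appended to the source afterwards are ignored. Calling this again after
     * a partial copy resumes from the number of events already written.
     *
     * @param maxEvents upper bound on events copied in this call (simulates an interruption)
     * @return the number of events copied by this call
     */
    static int copyRange(List<String> source, int head, int tail, Destination dest, int maxEvents) {
        if (dest.sealed) {
            return 0; // copy already completed: repeating the call changes nothing
        }
        int copied = 0;
        int next = head + dest.events.size(); // resume after what was already written
        while (next < tail && copied < maxEvents) {
            dest.events.add(source.get(next));
            next++;
            copied++;
        }
        if (next >= tail) {
            dest.sealed = true; // last event written: seal and hand over to the caller
        }
        return copied;
    }
}
```

In this sketch the resume point is derived from how many events the destination already holds, which is what makes a repeated call safe. A real implementation would need an equivalent durable marker of progress.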

Approach 2. The stream store copies segments rather than individual events, because segments are identified by id, epoch and offset, which are invariant between the copies. The segments are stored on tier 2 as files or blobs, and they are copied under a new name. The resulting set of segment ranges is then placed in the container for the new stream, which is the equivalent of a folder or bucket on tier 2, and the container is given the proper name for the stream. The entire operation can be scoped to a transaction so that either all of the segments are copied or none are.
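A rough sketch of Approach 2 follows, under the assumption that tier 2 can be modeled as a map from segment name to bytes and that a stream's segments share a name prefix. The class SegmentCopySketch, the prefix convention, and the stage-then-commit step standing in for a transaction are hypothetical and do not reflect Pravega's actual segment naming or storage API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of Approach 2: tier 2 is a map from segment name to bytes.
final class SegmentCopySketch {

    /**
     * Copies every segment stored under sourceStream + "/" to the same segment
     * name under targetStream + "/". All copies are staged first and committed
     * together, so a failure leaves tier 2 untouched (all or none).
     *
     * @return the number of segments copied
     */
    static int copyStream(Map<String, byte[]> tier2, String sourceStream, String targetStream) {
        String srcPrefix = sourceStream + "/";
        String dstPrefix = targetStream + "/";
        Map<String, byte[]> staged = new LinkedHashMap<>();

        for (Map.Entry<String, byte[]> entry : tier2.entrySet()) {
            String name = entry.getKey();
            if (!name.startsWith(srcPrefix)) {
                continue; // segment belongs to another stream
            }
            // Same segment suffix, new container name.
            String renamed = dstPrefix + name.substring(srcPrefix.length());
            if (tier2.containsKey(renamed)) {
                // Abort before anything is committed.
                throw new IllegalStateException("Target segment already exists: " + renamed);
            }
            staged.put(renamed, entry.getValue().clone());
        }

        tier2.putAll(staged); // commit step: only reached if every segment was staged
        return staged.size();
    }
}
```

Because nothing is written to the map until every segment has been staged, a conflict on any one segment leaves the target container empty, which is the all-or-none behavior described above.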

Of the two approaches, the latter is closer to the segment store and probably more efficient. Stream metadata such as watermarks and reader group state is not carried over; only the data part of the stream is copied. The copied stream is then registered with the stream store.

The Pravega stream store already has an implementation, ByteArraySegment, that allows segmenting a byte array and operating only on that segment. This utility can help view collections of events in memory and write them out to another stream. It extends AbstractBufferView and implements the ArrayView interface. These interfaces already support copyTo methods that can take a set of events represented as a ByteBuffer and copy it to a destination. If copying needs to be optimized, the BufferView interface provides a reader that can copy into another instance of BufferView. The ArrayView interface provides an index-addressable collection of ByteBuffers with methods for slice and copy. ByteArraySegment provides a writer that can be used to write contents to the instance and a reader that can copy into another instance. It also has methods for creating a smaller ByteArraySegment and for copying the instance.
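The idea of viewing part of a byte array and copying only that part can be illustrated with the JDK's java.nio.ByteBuffer alone. The sketch below is an analogy and does not use Pravega's ByteArraySegment; the class ArraySegmentSketch and its method names segment, copyTo and getCopy are chosen here for illustration and should not be read as the signatures of the Pravega classes.

```java
import java.nio.ByteBuffer;

// JDK-only analogy for a segment view over a byte array; not Pravega's ByteArraySegment.
final class ArraySegmentSketch {

    /** Returns a view over array[offset, offset + length) that shares the backing array. */
    static ByteBuffer segment(byte[] array, int offset, int length) {
        return ByteBuffer.wrap(array, offset, length).slice();
    }

    /**
     * Copies as many bytes of the view as fit into the target and returns the count.
     * The view's own position is left unchanged because a duplicate is read instead.
     */
    static int copyTo(ByteBuffer view, ByteBuffer target) {
        ByteBuffer src = view.duplicate(); // independent position, same bytes
        int n = Math.min(src.remaining(), target.remaining());
        src.limit(src.position() + n);
        target.put(src);
        return n;
    }

    /** Returns a standalone copy of the bytes covered by the view. */
    static byte[] getCopy(ByteBuffer view) {
        byte[] out = new byte[view.remaining()];
        view.duplicate().get(out);
        return out;
    }
}
```

The view returned by segment shares storage with the original array, so changes to the array are visible through it, while getCopy produces bytes that are independent of the source.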
