Friday, July 17, 2020

Stream store discussion continued

The Pravega stream store already has an implementation of ByteArraySegment that allows segmenting a byte array and operating only on that segment. This utility helps view a collection of events in memory and write it out to another stream. It extends AbstractBufferView and implements the ArrayView interface. These interfaces already support copyTo methods that can take a set of events represented as a ByteBuffer and copy them to a destination. If copying needs to be optimized, the BufferView interface provides a reader that can copy into another instance of BufferView. The ArrayView interface provides index-addressable access to the underlying data along with methods to slice and copy it. The ByteArraySegment provides a writer that can be used to write contents to the instance and a reader that can copy into another instance, and it has methods for creating a smaller ByteArraySegment and for copying the instance.
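As a rough sketch of how those pieces fit together, the snippet below wraps a slice of a byte array and copies the viewed bytes out. The constructor, slice, getCopy and copyTo signatures are assumed from io.pravega.common.util and may differ across Pravega versions.

```java
import io.pravega.common.util.BufferView;
import io.pravega.common.util.ByteArraySegment;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SegmentViewExample {
    public static void main(String[] args) {
        byte[] events = "event-1|event-2|event-3".getBytes(StandardCharsets.UTF_8);

        // Wrap only the middle event; the view itself copies no bytes.
        ByteArraySegment view = new ByteArraySegment(events, 8, 7);

        // An even smaller window over the same underlying array (assumed slice signature).
        BufferView prefix = view.slice(0, 5);

        // Copy the viewed bytes out, e.g. to stage them for writing to another stream.
        byte[] copy = view.getCopy();
        ByteBuffer target = ByteBuffer.allocate(view.getLength());
        view.copyTo(target);

        System.out.println(new String(copy, StandardCharsets.UTF_8)); // event-2
        System.out.println(prefix.getLength());                       // 5
    }
}
```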
The segment store takes wire commands, such as appending an event to a stream, as part of its wire service protocol. These commands operate on one event, or at best one segment, at a time, but together they cover the full set of create, update, read and delete operations. Some entries are based on table segments and require special handling because they are essentially metadata: key-value stores dedicated to stream, transaction and segment metadata.

The existing wire commands serve as an example for extensions to segment ranges or even whole streams. They may already be sufficient to handle segment ranges merely by referring to them through their metadata. For example, we can create metadata from existing metadata and make an altogether new stream; this operation simply edits the new stream's metadata to give it a different identity while retaining the data references as much as possible.

Streams are kept as isolated from one another as possible, which requires data to be copied. Copying data is not necessarily bad or time-consuming. In fact, storage engineering has lowered the cost of storage as well as of the activities involved, which works in favor of users and applications that want to combine many small streams into a big one or to copy a stream so that activities can be isolated. Data pipeline activities such as these generally have a good tolerance for stream processing operations. The stream store favors efficiency and data deduplication, but it also participates in data pipeline activities. It advocates combining stream processing and batch processing over the same stream and enables writing each event exactly once regardless of failures in the sender, receiver or network. This core tenet remains the same as applications continue to demand their own streams; where they cannot share an existing stream, they will look for copyStream logic.
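Such copyStream logic can also be sketched above the segment store using the public Pravega client API, by creating a new stream and re-writing every event from the source into it. This is a minimal sketch, not an existing Pravega operation: the scope, stream names and controller URI are placeholders, and it copies the data outright rather than sharing references the way a metadata-only copy would.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.*;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class CopyStreamSketch {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build(); // placeholder
        String scope = "demo", source = "source", target = "source-copy";

        try (StreamManager streams = StreamManager.create(config)) {
            streams.createScope(scope);
            // The copy is a brand-new stream: same configuration, new identity.
            streams.createStream(scope, target,
                    StreamConfiguration.builder().scalingPolicy(ScalingPolicy.fixed(1)).build());
        }

        try (ReaderGroupManager rgm = ReaderGroupManager.withScope(scope, config)) {
            rgm.createReaderGroup("copier",
                    ReaderGroupConfig.builder().stream(Stream.of(scope, source)).build());
        }

        try (EventStreamClientFactory factory = EventStreamClientFactory.withScope(scope, config);
             EventStreamReader<String> reader = factory.createReader("r1", "copier",
                     new UTF8StringSerializer(), ReaderConfig.builder().build());
             EventStreamWriter<String> writer = factory.createEventWriter(target,
                     new UTF8StringSerializer(), EventWriterConfig.builder().build())) {
            // Drain the source and re-write each event into the copy.
            EventRead<String> event;
            while ((event = reader.readNextEvent(2000)).getEvent() != null) {
                writer.writeEvent(event.getEvent());
            }
        }
    }
}
```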
Coming back to other examples of commands and activities involving streams and segments, the integration tests provide one as well. The StreamProducerOperationDataSource operates on a variety of producer operation types, including creating, merging and aborting transactions. A transaction is its own writer of data because it commits all of its events or none of them. These operations are performed independently, and the merge transaction, which comes closest to combining writes, serves as a good example for the copy stream operation.
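The all-or-none behavior of a transaction shows up directly in the client's transactional writer. Below is a minimal sketch, assuming a running controller at the placeholder URI and an existing stream named source; committing merges the transaction's segment into the stream, while aborting discards everything written to it.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.TxnFailedException;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class TxnSketch {
    public static void main(String[] args) throws TxnFailedException {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build(); // placeholder
        try (EventStreamClientFactory factory = EventStreamClientFactory.withScope("demo", config);
             TransactionalEventStreamWriter<String> writer = factory.createTransactionalEventWriter(
                     "txn-writer", "source", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            Transaction<String> txn = writer.beginTxn();
            txn.writeEvent("event-a");
            txn.writeEvent("event-b");
            // Either both events become visible in the stream, or neither does;
            // txn.abort() instead of commit() would discard them both.
            txn.commit();
        }
    }
}
```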
