Monday, July 20, 2020

Repackaging and Refactoring


Software applications take on many dependencies that are revised independently of the applications themselves. Because these dependencies are external, an application can defer upgrading one by pinning a version, but hopefully not for long; otherwise it incurs technical debt.
The application itself may be composed of many components located anywhere in its architecture and hierarchy. When these components are rewritten or moved between layers, the changes can impact overall functionality even when the logic and purpose remain the same.
One way to eliminate concerns about unwanted regressions in application behavior is to position the components where they can change without impacting the rest of the application.
For example, an application may depend on one or more storage providers. These providers are usually loaded via configuration when the application is launched. Since any of the storage providers could be used, the application ships all of them in a Java jar file.
In this case, the components could be positioned such that the application separates the storage providers behind a contract, which can be fulfilled by in-house dependencies or by externally written ones imported as a standard Java library dependency.
The contract itself could live in its own jar file so that the application is aware only of the contract and not of the storage providers, which gives the application more flexibility to focus on its business logic rather than runtime / host / operational considerations.
The storage providers could now be segregated into their own library, which can then be established as a compile-time or, preferably, runtime dependency of the application. This provides the opportunity to test those storage providers more thoroughly and independently of changes to the application.
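To make the separation concrete, here is a minimal sketch of what the contract and its discovery could look like, assuming a hypothetical StorageProvider interface in the contract jar and the standard java.util.ServiceLoader mechanism for locating implementations shipped in provider jars:

// contract jar: the only type the application compiles against.
public interface StorageProvider {
    String name();                        // e.g. "s3" or "nfs", matched against configuration
    void write(String key, byte[] data);  // minimal operations, for illustration only
    byte[] read(String key);
}

// application side: each provider jar on the runtime classpath declares its
// implementation in META-INF/services/com.example.contract.StorageProvider.
import java.util.ServiceLoader;

public class StorageProviderLocator {
    public static StorageProvider locate(String configuredName) {
        for (StorageProvider provider : ServiceLoader.load(StorageProvider.class)) {
            if (provider.name().equals(configuredName)) {
                return provider;
            }
        }
        throw new IllegalStateException("No storage provider named " + configuredName);
    }
}

With this arrangement, swapping or upgrading a provider becomes a packaging change rather than a code change in the application.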
The cost of this transformation is typically the repackaging, the refactoring, and the reworking of tests for the original libraries. The last is the hidden cost: it must be expressed not only as changes to the existing tests but also as the introduction of new, more specific tests for the individual storage providers that were not available earlier.
The tests also vary quite a bit because they are not restricted to targeting just the components. They could be testing integration that spans many layers and, consequently, even more dependencies than the code with which the application runs.
As the repackaging is drawn out, interfaces and dependencies that were previously taken for granted will be exposed. These provide an opportunity to be declared via tests that can ensure compatibility as the system changes.
With the popularity of Docker images, the application is more likely to be required to pull in the libraries for the configurable storage providers even before the container starts. This provides an opportunity for the application to try out various deployment modes with the new packaging and code.

Sunday, July 19, 2020

Stream store discussion continued

The notification system is complementary to the health reporting stack but not necessarily a substitute. This document positions the notification system component of the product as a must-have component which should work well with existing subscriber plugins. It also exists side by side with metrics publishers and subscribers.
A publisher-subscriber pattern based on observable events in a stream suits the notification system very well. Persistence of notifications also enables web-based access to notifications, which can be programmed to be delivered in more than one form to a customer, as appropriate for the device the customer owns. A stream store proves to be well suited to the messaging service, so as a storage solution it can provide this service out of the box.
Classes:
SensorManager:
- Refreshes the list of active sensors by getting the entire set from SensorScanner.
- Creates a sensor task for each active sensor and cancels those for inactive ones.
SensorTask:
- Retrieves sensors from the persistence layer.
- Persists sensors to the persistence layer.
- Determines the sensor-alert association.
SensorScanner:
- Maintains the sensors as per the storage container hierarchy in the storage system.
- Periodically updates the sensor manager.
SensorClient:
- Creates/gets/updates/deletes a sensor.
AlertClient:
- Creates/gets/updates/deletes an alert.
AlertAction:
- Changes the state of an alert by triggering the associated operation.
NotificationList:
- Creates a list of notification objects and implements the set interface.
- Applies a function parameter on the data from the set of notification objects (see the sketch after this list).
- Applies predefined standard query operators on the data from the set of notification objects.
Notification:
- Data structure holding the value corresponding to a point in time.
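
A rough sketch of the NotificationList above might look as follows, assuming a hypothetical Notification type with a getValue() accessor; the product's actual API may differ:

import java.util.HashSet;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// A set of notifications that can apply a caller-supplied function
// to each notification it holds.
public class NotificationList extends HashSet<Notification> {
    public <R> List<R> apply(Function<Notification, R> fn) {
        return this.stream().map(fn).collect(Collectors.toList());
    }
}

// Example usage, extracting the value recorded at each point of time:
// List<Double> values = notifications.apply(Notification::getValue);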

Unit-Test:
Evaluator evaluator = sensor.getEvaluatorRetriever().getEvaluator("alertpolicy1");
SensorStateInformation stateInformation = evaluator.evaluate(sensor, threshold);

Saturday, July 18, 2020

Stream store discussion continued

Design of an alerts and notifications system for any storage engineering product.

Storage engineering products publish alerts and notifications that interested parties can subscribe to and act on under specific conditions. This lets them remain hands-off as the product continues to serve the entire organization.

The notifications are purely a client-side concept because a periodic polling agent that watches for a number of events and conditions can raise them; the store does not have to persist any information. Yet the state associated with a storage container can also be synchronized by persistence instead of being retained purely as an in-memory data structure. This alleviates the need for users to keep track of the notifications themselves. Even though the last notification is typically the most actionable one, the historical trend from persisted notifications gives an indication of the frequency of adjustments.

The notifications can also be improved to come from various control-plane activities, as these are the most authoritative on when certain conditions occur within the storage system. These notifications can then be run against the rules specified by the user or administrator so that only a filtered subset is brought to the user's attention.

Notifications could also be improved to come from the entire hierarchy of containers within a storage product for all operations associated with them. They need not be made available only on user interactions. They can be made available via a subscriber interface that can be accessed globally.

There may be questions on why information on the storage engineering product needs to come from notifications as opposed to metrics, which are suited for charts and graphs via the existing time-series database and reporting stack. The answer is quite simple. The storage engineering product is a veritable storage and time-series database in its own right and should be capable of storing both metrics and notifications. All notifications are events, and those events are as continuous as the data that generates them. They can be persisted in the store itself. Data does not become redundant when stored in both formats. Instead, one system caters to the in-store evaluation of rules that trigger only the alerts necessary for humans, while the other is more continuous machine data that can be offloaded for persistence and analysis to external, dedicated metrics stacks.

When the events are persisted in the store itself, the publisher-subscriber interface becomes similar to the writer-reader interface that the store already supports. The stack that analyzes and reports the data can read directly from the store. Should this stream container for internal events be hidden from the public, a publisher-subscriber interface would be helpful. Keeping the notification container internal enables the store to clean up as necessary. Persistence of events also helps with offline validation and introspection to assist product support.
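To give a sense of how close the publisher-subscriber interface is to the writer-reader model, the sketch below uses Pravega's client API to write a notification event to a stream and read it back. The scope, stream, and reader group names are illustrative, and the scope/stream are assumed to already exist:

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class NotificationPubSub {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("notifications", config)) {
            // Publisher: a writer appends notification events to the stream.
            try (EventStreamWriter<String> writer = factory.createEventWriter(
                    "events", new UTF8StringSerializer(), EventWriterConfig.builder().build())) {
                writer.writeEvent("disk-usage-above-threshold").join();
            }
            // Subscriber: a reader in a reader group tails the same stream.
            try (ReaderGroupManager rgm = ReaderGroupManager.withScope("notifications", config)) {
                rgm.createReaderGroup("subscribers", ReaderGroupConfig.builder()
                        .stream(Stream.of("notifications", "events")).build());
            }
            try (EventStreamReader<String> reader = factory.createReader(
                    "reader-1", "subscribers", new UTF8StringSerializer(),
                    ReaderConfig.builder().build())) {
                System.out.println("Received: " + reader.readNextEvent(2000).getEvent());
            }
        }
    }
}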

The notification system is complementary to the health reporting stack but not necessarily a substitute. This document positions the notification system component of the product as a must-have component which should work well with existing subscriber plugins. It also exists side by side with metrics publishers and subscribers.

Friday, July 17, 2020

Stream store discussion continued

The Pravega stream store already has an implementation of ByteArraySegment that allows segmenting a byte array and operating only on that segment. This utility can help view collections of events in memory and write them out to another stream. It extends AbstractBufferView and implements the ArrayView interface. These interfaces already support copyTo methods that can take a set of events represented as a ByteBuffer and copy it to a destination. If copying needs to be optimized, the BufferView interface provides a reader that can copy into another instance of BufferView. The ArrayView interface provides an index-addressable collection with methods for slicing and copying. The ByteArraySegment provides a writer that can be used to write contents to the instance and a reader that can copy into another instance. It has methods for creating a smaller ByteArraySegment and for copying the instance.
The segment store takes wire commands, such as one to append an event to a stream, as part of its wire service protocol. These commands operate on an event, or at best a segment, at a time, but the operations they provide cover the full set of create, update, read and delete. Some of the entries are based on table segments and require special handling because they are essentially metadata in the form of key-value stores dedicated to stream, transaction and segment metadata. The existing wire commands serve as an example for extensions to segment ranges or even streams. The existing wire commands may even be sufficient to handle segment ranges merely by referring to them by their metadata. For example, we can create metadata from existing metadata and make an altogether new stream. This operation simply edits the metadata for the new stream so that it differs in identity but retains the data references as much as possible.

Streams are kept as isolated from one another as possible, which requires data to be copied. Copying data is not necessarily bad or time-consuming. In fact, storage engineering has lowered the cost of storage as well as that of the activities involved, which speaks in favor of users and applications that want to combine lots of small streams into a big one or to make a copy of one stream for isolation of activities. Data pipeline activities such as these have a pretty good tolerance for stream processing operations in general. The stream store favors efficiency and data deduplication, but it participates in data pipeline activities. It advocates combining stream processing and batch processing on the same stream and enables writing each event exactly once regardless of failures in the sender, receiver or network. This core tenet remains the same as applications continue to demand their own streams, and where they cannot share an existing stream, they will look for copyStream logic.
Coming back to other examples of commands and activities involving streams and segments, we include an example from the integration tests as well. The StreamProducerOperationDataSource operates on a variety of producer operation types, some of which include creating, merging and aborting transactions. A transaction is its own writer of data because it commits all or nothing. These operations are performed independently, and the merge transaction comes closest to combining the writes, serving as a good example for the copy stream operation.
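
As a reference point for the all-or-nothing write behavior mentioned above, the sketch below uses Pravega's transactional writer; the scope and stream names are illustrative and the setup of the scope and stream is omitted:

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.TxnFailedException;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class TxnWriteExample {
    public static void main(String[] args) throws TxnFailedException {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             TransactionalEventStreamWriter<String> writer = factory.createTransactionalEventWriter(
                     "writer-1", "mystream", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            Transaction<String> txn = writer.beginTxn();
            txn.writeEvent("routing-key", "event-1");
            txn.writeEvent("routing-key", "event-2");
            // Either both events become visible or neither does.
            txn.commit(); // txn.abort() would discard both
        }
    }
}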

Thursday, July 16, 2020

Stream manager discussion continued


The following section describes the stream store implementation of copyStream.

Approach 1. The events are read by a batch client from the stream cuts pointing to the head and tail of the stream. The head and tail are determined at the start of the copy operation and do not accommodate any new events that may be written to the tail of the stream after the copy is initiated. The segment iterator initiated for the ranges copies the events one by one to the destination stream. When the last event has been written, the last segment is sealed and the destination stream is made available to the caller. The entire operation is re-entrant and idempotent if a portion of the stream has already been written.
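
A minimal sketch of Approach 1 using Pravega's batch client follows. It assumes the source and destination streams already exist; StreamCut.UNBOUNDED stands in for the head and tail cuts captured at the start, and the seal step and idempotence bookkeeping are left out:

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;
import java.util.Iterator;

public class CopyStreamSketch {
    public static void copy(String scope, String source, String destination) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        UTF8StringSerializer serializer = new UTF8StringSerializer();
        try (BatchClientFactory batch = BatchClientFactory.withScope(scope, config);
             EventStreamClientFactory factory = EventStreamClientFactory.withScope(scope, config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     destination, serializer, EventWriterConfig.builder().build())) {
            // Enumerate the source segment ranges between the chosen stream cuts.
            Iterator<SegmentRange> ranges = batch.getSegments(
                    Stream.of(scope, source), StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                    .getIterator();
            while (ranges.hasNext()) {
                try (SegmentIterator<String> events = batch.readSegment(ranges.next(), serializer)) {
                    while (events.hasNext()) {
                        writer.writeEvent(events.next()); // copy one event at a time
                    }
                }
            }
            writer.flush();
        }
    }
}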

Approach 2. The stream store does not copy events per se but segments, since the segments point to ids, epochs and offsets, which are invariants between the copies. The segments are stored on tier 2 as files or blobs, and they are copied with a change of name. The resulting set of segment ranges is then placed in the container for a new stream, which is the equivalent of the folder or bucket on tier 2, and given a proper name for the stream. The entire operation can be scoped to a transaction so that all or none are copied.
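
To make Approach 2 concrete, the sketch below copies segment files between tier-2 folders with java.nio; the one-folder-per-stream layout and the metadata registration step are hypothetical stand-ins for the store's actual handling:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class Tier2CopySketch {
    // Copies every segment file under the source stream's folder into the
    // destination stream's folder, renaming the path but not the contents.
    public static void copyStream(Path tier2Root, String sourceStream, String destStream)
            throws IOException {
        Path source = tier2Root.resolve(sourceStream);
        Path dest = tier2Root.resolve(destStream);
        Files.createDirectories(dest);
        try (Stream<Path> segments = Files.list(source)) {
            segments.forEach(segment -> {
                try {
                    Files.copy(segment, dest.resolve(segment.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        // A real implementation would register the new stream's metadata here,
        // scoped to a transaction so that all segments are copied or none are.
    }
}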

Between the two approaches, the latter is closer to the segment store and probably more efficient. The metadata for the streams, such as the watermarks and reader group states, is removed; only the data part of the streams is copied. The copied streams are then registered with the stream store.

The Pravega stream store already has an implementation of ByteArraySegment that allows segmenting a byte array and operating only on that segment. This utility can help view collections of events in memory and write them out to another stream. It extends AbstractBufferView and implements the ArrayView interface. These interfaces already support copyTo methods that can take a set of events represented as a ByteBuffer and copy it to a destination. If copying needs to be optimized, the BufferView interface provides a reader that can copy into another instance of BufferView. The ArrayView interface provides an index-addressable collection with methods for slicing and copying. The ByteArraySegment provides a writer that can be used to write contents to the instance and a reader that can copy into another instance. It has methods for creating a smaller ByteArraySegment and for copying the instance.

Wednesday, July 15, 2020

Reader Group Notifications continued

The following section describes the set of leads that can be drawn for investigations into notifications. First, the proper methods for determining the data from which the notifications are made need to be called correctly. The number of segments is a valid data point. It can come from many sources. The streamManager is authoritative, but the code that works with the segments might be at too low a level to make use of the streamManager. The appropriate methods to open the stream and find the number of segments may be sufficient. If that is a costly operation, then the segments can be retrieved from, say, the reader group, but those secondary data structures must also be kept in sync with the source of truth.

Second, the methods need to be called as frequently as the notifications are generated; otherwise the changes will not be detected. The code that generates the notifications is usually offline compared to the online usage of the stream. This requires the data to be fetched as required for generating these notifications.

Lastly, the notification system should not introduce unnecessary delay in sending the notifications. If there is a delay or a timeout, it becomes the same as not receiving the notification.

The ReaderGroupState is initialized at the time of the creation of the reader group. It could additionally be initialized or updated at the time of reader creation. Both the reader group and the reader require the state synchronizer to be initialized, so the synchronizer can be initialized with the new group state. This is not implemented in the Pravega source code yet.
During the initialization of the ReaderGroupState, the segments are fetched from the controller and correspond to the segments at the current time. The controller also provides the ability to get all segments between the start and end stream cuts. However, those segments come back as a range, not as the desired map.
The number of readers comes from the reader group. These data will remain the same across cross-thread calls and invocations from different levels of the stack. The reader group state is expected to be accurate for the purpose of sending notifications, but it can still be inaccurate if the state is initialized once and only updated subsequently without refreshing it via initialization each time.
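
For comparison, Pravega's client already exposes a notifier on the reader group that surfaces segment count changes; a minimal usage sketch follows, with the scope and group names as placeholders:

import io.pravega.client.ClientConfig;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.ReaderGroup;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class SegmentNotifierExample {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        ReaderGroupManager rgm = ReaderGroupManager.withScope("examples", config);
        ReaderGroup group = rgm.getReaderGroup("subscribers");
        // The listener fires when the number of segments changes, reporting
        // current segment and reader counts so readers can be rebalanced.
        group.getSegmentNotifier(executor).registerListener(notification ->
                System.out.println("Segments: " + notification.getNumOfSegments()
                        + ", readers: " + notification.getNumOfReaders()));
    }
}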

Tuesday, July 14, 2020

Reader Group notifications continued.

One of the best ways to detect all reader group segment changes is to check the number of segments at the time of admission of readers into the reader group.  

This would involve: 

Map<SegmentWithRange, Long> segments = ReaderGroupImpl.getSegmentsForStreams(controller, config); 

synchronizer.initialize(new ReaderGroupState.ReaderGroupStateInit(config, segments, getEndSegmentsForStreams(config))); 

Here the segments are read from the ReaderGroup, which knows the segment map and the distribution of readers, rather than from the state.

The following section describes the set of leads that can be drawn for investigations into notifications. First, the proper methods for determining the data from which the notifications are made need to be called correctly. The number of segments is a valid data point. It can come from many sources. The streamManager is authoritative, but the code that works with the segments might be at too low a level to make use of the streamManager. The appropriate methods to open the stream and find the number of segments may be sufficient. If that is a costly operation, then the segments can be retrieved from, say, the reader group, but those secondary data structures must also be kept in sync with the source of truth.

Second, the methods need to be called as frequently as the notifications are generated; otherwise the changes will not be detected. The code that generates the notifications is usually offline compared to the online usage of the stream. This requires the data to be fetched as required for generating these notifications.

Lastly, the notification system should not introduce unnecessary delay in sending the notifications. If there is a delay or a timeout, it becomes the same as not receiving the notification.