Friday, September 4, 2020

Distributed stream stores

 Federated and Chained Stream stores:

Introduction: A stream store is continuous, effectively unbounded storage for data ingestion traffic from devices. It scales to large loads while meeting their throughput and latency requirements, and it is convenient for an administrator to set up and manage. But deployments rarely tend towards a single global solution. Instead, different departments want to own, run and maintain their own instances. Sometimes a swarm of stores helps split the traffic by some naming or routing convention. These instances must behave in a coordinated manner. This document proposes two approaches.

1. Federated deployments: With federated stores, we can have a matrix deployment model wherein each local store is translated to a canonical participation convention and exported, either as an open public view or as a closed federated view. In the latter case, the view is written and maintained by administrators and is global in nature, but it can still be made visible externally to specific user groups.

A federated schema is usually built by consensus, and data interchange standards complement it. As an example, XML can serve as a syntax for both the federated schema and the data in it, even if the stores sit in a flat organization under the root element. This category has the support of major IT vendors.

Query processing also requires some features specific to distributed stores. Data transfer strategies now need to consider communication cost besides CPU cost and I/O cost. Query decomposition and allocation between stores must also be determined. Consider the case where the data is at site 1 and site 2 and the query is issued at site 3. We could send both sets of stream events to site 3, or send the site 1 events to site 2 and then the result to site 3, or send the site 2 events to site 1 and then the result to site 3.
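The three options above can be compared on communication cost alone. The following is a minimal sketch, assuming cost is simply the number of units shipped between sites and that the size of the combined result is known or estimated up front. The function names and the sample sizes are hypothetical and only serve the illustration.

```python
def transfer_plans(events_at_site1, events_at_site2, result_size):
    """Return the communication cost (units shipped) of each plan.

    Assumption: CPU and I/O cost are ignored; only shipped volume counts.
    """
    return {
        "both to site 3": events_at_site1 + events_at_site2,
        "site 1 to site 2, result to site 3": events_at_site1 + result_size,
        "site 2 to site 1, result to site 3": events_at_site2 + result_size,
    }


def cheapest_plan(events_at_site1, events_at_site2, result_size):
    """Pick the plan with the lowest communication cost."""
    plans = transfer_plans(events_at_site1, events_at_site2, result_size)
    return min(plans.items(), key=lambda item: item[1])


# Hypothetical sizes: a large stream at site 1, a small one at site 2.
plan, cost = cheapest_plan(events_at_site1=1000, events_at_site2=200, result_size=50)
print(plan, cost)
```

With a small result and one small input, moving the small input to the site that holds the large one tends to win. A real planner would weigh CPU and I/O cost as well.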

2. Chained deployments: In this case, the deployments are chained so that if processing at one store, say, times out, the request can proceed to another. This model is favored for distributed queries because the stores are linked even if they don't hold redundant data. Most interactions between stores take the form of requests and responses. These may have to traverse multiple layers before they are authoritatively handled by the store and stream. Relays help translate requests and responses between layers. They are necessary for making the request processing logic modular and chained.

If the current stream store cannot resolve the stream for an event among its own streams, is it possible to distribute the query to another stream store? It is: the resolver merely needs to forward the queries that it cannot answer to a default, pre-registered outbound destination. In a chained stream store, a store can tell from the naming convention alone whether a request belongs to it. If it does not, the store simply forwards it to another stream store. This is somewhat different from a plain naming convention, in which the name may or may not have an interpretable part that identifies the site to which the object belongs. The linked stream store does not even need to spend time verifying that the local instance is indeed the correct recipient. It merely translates the address, with the help of a registry, to know whether the request belongs to it. This shallow lookup means a request can be forwarded faster to another linked stream store and ultimately to the one where the stream is guaranteed to be found. The linked stream stores do not have to be similar: as long as the forwarding logic is enabled, each stream store can use any implementation for translation, lookup and return.

Unlike the matrix approach, which might require hashing to find the store for each access, the client can be blissfully ignorant of where the data resides or which store answers the call. Whether we use routing tables or a static hash table, the networking over the chained stream stores can be its own layer that routes events to the correct stream store.
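The forwarding behavior described above can be sketched as follows. This is a minimal in-memory illustration, assuming each store registers the name prefixes it owns and holds a single pre-registered outbound destination. The class `StreamStore`, its methods, the prefix-based registry and the hop limit are all hypothetical and not taken from any particular product.

```python
class StreamStore:
    """A stream store that answers for its own prefixes and forwards the rest."""

    def __init__(self, name, prefixes, next_store=None):
        self.name = name
        self.prefixes = tuple(prefixes)   # registry: name prefixes owned here
        self.next_store = next_store      # pre-registered outbound destination
        self.streams = {}                 # stream name -> list of events

    def owns(self, stream_name):
        # Shallow lookup: only the name is inspected, never the stored data.
        return stream_name.startswith(self.prefixes)

    def resolve(self, stream_name, max_hops=8):
        # Walk the chain until some store claims the name or the chain ends.
        store, hops = self, 0
        while store is not None and hops <= max_hops:
            if store.owns(stream_name):
                return store
            store, hops = store.next_store, hops + 1
        raise LookupError(f"no store in the chain owns {stream_name!r}")

    def append(self, stream_name, event):
        owner = self.resolve(stream_name)
        owner.streams.setdefault(stream_name, []).append(event)
        return owner.name

    def read(self, stream_name):
        owner = self.resolve(stream_name)
        return owner.name, list(owner.streams.get(stream_name, []))


# The client only ever talks to the first store in the chain.
archive = StreamStore("archive", ["archive/"])
sales = StreamStore("sales", ["sales/"], next_store=archive)
edge = StreamStore("edge", ["edge/"], next_store=sales)

edge.append("archive/audit", {"id": 1})
owner, events = edge.read("archive/audit")
print(owner, events)
```

The client calls `edge` for everything and never learns which store answered, which is the point of the chain. A production relay would also need timeouts and loop detection; the hop limit here is only a stand-in for that.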

Conclusion: Federated and chained deployments are patterns found in many organizational structures. What is novel is their application to storage products, and to stream stores in particular.

