Cluster computing: Distributed stream stores continued

Friday, September 11, 2020

Distributed stream stores continued

A federation does not mandate a P2P network but it does share principles of top heavy architecture because it brings consistency across all members.

Chaining algorithms:

Although chaining refers to something in continuity like in the chain of trust for certificates or the chain of custody as in the case of a pipeline of devOps activities, this article introduces the notion of chaining in the form of linked stream stores. A set of linked stream stores helps with distributed queries where the stream may be resolved not in the immediate store where the query is received but in a store that can be reached over links and is the true owner of that stream. In that sense, chaining helps with relay of control and data traffic as well as analytical workloads. Certain stores have the ability to include geo replication and provide a content distribution network which allows them to internalize all load-balancing, assignment and fulfilment activities but in a distributed setting, we can take some of those concerns out of stream stores and across an entire collection.

Using simple techniques of stream resolutions by name or classification, participating stores can forward requests until a specific store is found. When the naming convention utilizes the ordering of stream stores, the access becomes logarithmically more efficient than next hops. For example, let us say that stream stores are arranged in contiguous ranges of key space uniformly spread between start and finish. Any new stream store can be inserted into a range and an old one can be removed from this key range without affecting the assignments of other stores.

Given a stream name, let us say a classifier has identified the store to belong to a key range. Now the lookup of that store in the overall key space can follow the same convention as the lookup of a node in a skip-list. The common techniques in such lookups involve the following criteria:

1. Starting at the highest level of skipping, we access a store in the desired range or the one immediately before it.

2. Then dropping down a level, we repeat step 1 until we reach the intended store.

The algorithm above works in all cases because it has an invariant and an incremental progress towards a known termination. Therefore, its correctness is determined.

The arrangement for a skip-list is governed by the following criteria:

1. Each skip list node has a height and a set of neighboring skip list nodes, precisely as many neighboring nodes as its height, some of which may be null.

2. Each skip list node has some piece of data associated with it.

3. Each skip list node has a key to look it up

Skiplist is initialized with the max height of any node. Head and tail are initialized and attached

Insert operation involves the following:

Find where the node goes by comparing keys

If it is a duplicate, don't insert, otherwise perform insert

Use a random number to generate a height

Remove operation involves the following:

Find the node to be deleted, traverse using available max height to lowest till we reach the node

for each of the levels on the current node, update head and tail forward references and delete the node.

With these accesses outlined, the stream stores can be chained in the same way as the nodes in a skip-list for efficient traversal.

Cluster computing

Friday, September 11, 2020

Distributed stream stores continued

No comments:

Post a Comment