Tuesday, July 4, 2017

Today we continue to review the NoSQL design. These involve Key Value store, run on a large number of commodity machines with data partitioned and replicated between machines. We discussed the infrastructure which consists of physical and virtual nodes Data is partitioned based on consistent hashing. Cachepoints are arranged along the circle depicting the key range and cache objects corresponding to the range. With multiple copies of data, we need consistency models. The choice of consistency model determines how the client request is dispatched to a replica and how the replicas propagate and apply the updates. The topology that helps with this can be classified as a single master model, a multi-master or no-master model or a Quorum based 2-PC model that provides strict consistency.
The single master model is helpful for scaling out the reads such as with replication where the master applies the updates and it is asynchronously propagated to the slaves and reads can be serviced by slaves.The updates propapagate by a state transfer model that overwrites the state or a operation transfer model that reexecutes the operations Generally the full object is not sent over the network, only a part of it as determined by hash tree, is sent as the delta of the changes. 
The multimaster model is also called as the No master model where clients can issue update to any server and each replica will eventually get to the same state. This is very helpful when we want to scale out the writes such as with such as when there are hot spots between the key range circle in consistent consistent hashing.
The syncing of states is achieved with techniques such as the traditional 2PC. This provides strict consistency when there are N replicas for data. It is done via two phases. During the prepare phase, the co-ordinator asks every replica whether it is ready to perform the update and the replicas flush to disk and respond with success.Next during the commit phase, each replica writes another log entry to confirm the update. This technique suffers from poor availability as disk I/O completes. Therefore a new technique named Quorum based 2PC is preferred. In this model, the coordinator asks all the  replicas to write the data but waits only for the responses from a select number of them. This improves probabilties in favor of availability. When reading the data, a minimum set of replicas are read and the one returned with the latest timestamp. For strict consistency, the important condition is to make the readset and the writeset overlap where the minimum number of replicas participating in the read and those participating in the writes exceed the total number of replicas.
If we don't require strict consistency, we can go with a Gossip protocol to propagate the updates via message exchanges. A vector clock is used to help with the messaging.It is a timestamp mechanism that helps each replica to order its operations based on a notion of time.Each replica keeps its own vector clock and advances it whenever there is an operation such as when an internal operation happens or when a message is sent to another replica or when it receives a message from another replica. The timestamp is also included in the message. Each local vector clock merges the timestamp by picking out the maximum value. A partial ordering among vector clocks helps with causal relationships. The state always flows unidirectionally from the replica to the client and the vector clock is represented by one entry for each clock. The client keeps a vector clock for the last replica and submits it in the requests it makes. The replicas ensure that this clock is exceeded in its operations. The gossip itself is the model for state and operation transfer. Each replica maintains a vector clock as well as a state version tree that contains all the conflicting updates. For queries, the client sends its vector clock and the replicas send back subsets of state tree preceding the clients vector clock and the client merges them. Similarly the replicas update only if the client's vector clock is advanced than its own.Replicas merge their sets using the gossip protocol by maintaining a queue for the update requests and merging the clocks.
Courtesy:nosql patterns from horicky
#codingexercise
# robot in a circle problem and solution:  http://ideone.com/vHFOJ9
# get similarities based on permutations: http://ideone.com/nunvC1

No comments:

Post a Comment