Friday, July 7, 2017

Today we get back to our discussion of NoSQL design. These systems are key-value stores that run on a large number of commodity machines, with data partitioned and replicated between the machines. We discussed the infrastructure, which consists of physical and virtual nodes. Data is partitioned based on consistent hashing: cachepoints are arranged along a circle depicting the key range, and each one caches the objects that fall in its range. With multiple copies of the data, we need consistency models. The choice of consistency model determines how a client request is dispatched to a replica and how the replicas propagate and apply updates. The topologies that support this can be classified as a single-master model, a multi-master or no-master model, or a quorum-based 2PC model that provides strict consistency.
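Here is a minimal sketch of consistent hashing, assuming a simple MD5-based placement of virtual-node cachepoints on the circle; the class and method names are illustrative, not from any particular store.

import hashlib
from bisect import bisect, insort

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=4):
        self.ring = []            # sorted list of (hash, node) cachepoints on the circle
        for node in nodes:
            for i in range(vnodes):
                insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        # a key is served by the first cachepoint clockwise from its hash
        h = self._hash(key)
        idx = bisect(self.ring, (h,)) % len(self.ring)  # wrap around the circle
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))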
The syncing of states is achieved with techniques such as traditional two-phase commit (2PC). To improve availability, quorum-based 2PC was introduced. In this model, the coordinator asks all the replicas to write the data but waits only for the responses from a select number of them (the write quorum W). When reading the data, a minimum set of replicas (the read quorum R) is read and the version with the latest timestamp is returned. As long as W + R exceeds the number of replicas N, the quorums overlap and this technique provides strict consistency.
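A sketch of quorum-based reads and writes is below, assuming a hypothetical replica interface with write(key, value, ts) and read(key) returning a (value, timestamp) pair; the coordinator class and its names are illustrative.

import time

class QuorumCoordinator:
    def __init__(self, replicas, w, r):
        assert w + r > len(replicas)   # quorums must overlap for strict consistency
        self.replicas, self.w, self.r = replicas, w, r

    def put(self, key, value):
        ts = time.time()
        acks = 0
        for rep in self.replicas:      # ask all replicas to write ...
            if rep.write(key, value, ts):
                acks += 1
            if acks >= self.w:         # ... but wait only for W acknowledgements
                return True
        return False

    def get(self, key):
        versions = [rep.read(key) for rep in self.replicas[:self.r]]
        return max(versions, key=lambda v: v[1])[0]   # latest timestamp wins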
If we don't require strict consistency, we can go with a gossip protocol to propagate the updates via message exchanges. A vector clock is used to help with the messaging. Both the client and the replicas maintain vector clocks, and each local vector clock merges another by picking the maximum timestamp per entry. The replicas exchange their vector clocks with one another via gossip, while the client shares its vector clock only along with its requests. When the client sends its vector clock, the replica checks whether the client's state precedes any of its current versions and discards the stale state accordingly. When a replica receives updates from other replicas via gossip, it merges the version trees.
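A minimal vector-clock sketch, with illustrative names: merge() takes the per-node maximum, and precedes() checks whether one clock is causally behind another, which is how a replica decides whether the client's state is stale.

def merge(vc_a, vc_b):
    # entry-wise maximum over the union of node ids
    return {n: max(vc_a.get(n, 0), vc_b.get(n, 0)) for n in vc_a.keys() | vc_b.keys()}

def precedes(vc_a, vc_b):
    # vc_a precedes vc_b if every entry in vc_a is <= the matching entry in vc_b
    return all(v <= vc_b.get(n, 0) for n, v in vc_a.items()) and vc_a != vc_b

client = {"r1": 2, "r2": 1}
replica = {"r1": 3, "r2": 1}
print(precedes(client, replica))   # True: the client's state is stale
print(merge(client, replica))      # merged clock: r1 -> 3, r2 -> 1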
Replicas may also exchange their logs and check whether a given operation can be applied because its dependencies have been applied. Since the operations are sorted by vector clock in causal order, simultaneous operations can also be serialized: the replica performing the first update picks a sequence number higher than all others, and everyone follows this sequence number in their vector clocks. Since operations are in different stages of processing on different replicas, a replica will not discard an operation it has completed until the vector clocks it sees from all other replicas show that they have applied it too.
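The dependency check on exchanged logs might look like the following sketch; the helper name apply_ready and the op dictionary layout ("origin", "vc") are assumptions for illustration. An operation from a gossiped log is applied only once every dependency recorded in its vector clock has been applied locally.

def apply_ready(pending_log, local_clock, apply_fn):
    progress = True
    while progress:
        progress = False
        for op in list(pending_log):
            # dependencies = the op's clock with the originating replica's entry decremented
            deps = {n: v - (1 if n == op["origin"] else 0) for n, v in op["vc"].items()}
            if all(local_clock.get(n, 0) >= v for n, v in deps.items()):
                apply_fn(op)                              # dependencies satisfied: apply it
                local_clock[op["origin"]] = op["vc"][op["origin"]]
                pending_log.remove(op)
                progress = True                           # may unblock later operations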
Deletes are somewhat different because when an object is deleted, its timestamp may also be lost. Therefore, to handle this case, the associated metadata and timestamp information of a deleted object is kept around long enough, as a tombstone, until the delete is confirmed by all other replicas.
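A tombstone sketch, with illustrative names: the delete keeps the key's timestamp and an acknowledgement set until every replica has confirmed it, so a late, older update cannot resurrect the value.

class Store:
    def __init__(self, replica_ids):
        self.data, self.tombstones = {}, {}
        self.replica_ids = set(replica_ids)

    def delete(self, key, ts):
        self.data.pop(key, None)
        self.tombstones[key] = {"ts": ts, "acked": set()}  # keep metadata and timestamp

    def ack_delete(self, key, replica_id):
        t = self.tombstones.get(key)
        if t:
            t["acked"].add(replica_id)
            if t["acked"] == self.replica_ids:   # confirmed by all replicas
                del self.tombstones[key]         # now safe to drop the metadata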
Storage for map-reduce may also be in the form of distributed files or even databases. The advantage of using databases in such cases is that they provide not only copy-on-write but also multi-version concurrency control. A Bloom filter or an index can be used to determine whether a key exists in a set. All operations are logged, and all data and logs are flushed to disk so the node can recover after a failure.
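A minimal Bloom filter sketch for such membership checks: it never reports a false negative, has a tunable false-positive rate, and avoids touching disk when the key is absent. The sizes and hashing scheme here are arbitrary choices for illustration.

import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, key):
        # derive several bit positions by salting the key
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("row:17")
print(bf.might_contain("row:17"))  # True
print(bf.might_contain("row:99"))  # False (or rarely a false positive)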
The API model for a large key-value store can include the following:
get(key) which extracts the value given a key
put(key, value) which creates or updates the value given its key
delete(key) which removes the key and its associated value
execute(key, operation, parameters) which invokes an operation on the value
mapreduce(keylist, mapFunc, reduceFunc) which invokes map-reduce on a key range
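A sketch of what this API surface might look like in practice; the class below is an illustrative in-memory stand-in, not any specific product's interface.

class KeyValueStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value          # creates or updates

    def delete(self, key):
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        # invoke an operation on the value stored at key
        self._data[key] = operation(self._data[key], *parameters)

    def mapreduce(self, keylist, map_func, reduce_func):
        mapped = [map_func(k, self._data[k]) for k in keylist]
        return reduce_func(mapped)

store = KeyValueStore()
store.put("counter", 1)
store.execute("counter", lambda v, n: v + n, 5)
print(store.get("counter"))              # 6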
#codingchallenge http://ideone.com/SjlLWd
deserialization http://ideone.com/CoZYi0
#palindromes in a string: http://ideone.com/2WA4tn
