Saturday, August 19, 2017

We continue discussing ZooKeeper. It is a coordination service with elements of group messaging, shared registers, and distributed lock services. It provides an interface that guarantees wait-free operations and FIFO execution of the requests from each client. Write requests across all clients are also linearizable.
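As a minimal sketch of that interface using the standard ZooKeeper Java client (the server address and znode paths here are made up for the example):

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class ZkApiSketch {
        public static void main(String[] args) throws Exception {
            // Connect to an ensemble; this watcher just ignores session events.
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 15000, event -> {});

            // Create a persistent znode hierarchy holding a small payload.
            zk.create("/app1", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create("/app1/config", "v1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read it back; passing true registers a one-shot watch that
            // fires when the znode changes.
            Stat stat = new Stat();
            byte[] data = zk.getData("/app1/config", true, stat);
            System.out.println(new String(data));

            // Conditional update: succeeds only if the version still matches,
            // which lets clients detect concurrent writers.
            zk.setData("/app1/config", "v2".getBytes(), stat.getVersion());

            zk.close();
        }
    }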
We saw a few examples of primitives that are possible with ZooKeeper. These primitives exist only on the client side. They include implementing dynamic configuration in a distributed application, specifying group membership, and implementing locks and double barriers. These have been put to use in applications such as the Yahoo! web crawler, which uses ZooKeeper for its page-fetching service. Katta, a distributed indexer, uses it for coordination. Even the high-traffic Yahoo! Message Broker, a distributed publish-subscribe system, uses it for managing configuration, failure detection, and group membership. We discussed these usages in detail.
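For instance, the paper's lock recipe (each contender creates an ephemeral sequential znode and watches only its immediate predecessor, avoiding the herd effect) can be sketched with the standard Java client roughly as follows; the lock directory path is illustrative and assumed to already exist, and error handling is omitted:

    import org.apache.zookeeper.*;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    public class ZkLockSketch {
        private final ZooKeeper zk;
        private final String lockDir;   // e.g. "/locks/resource1"
        private String myNode;

        public ZkLockSketch(ZooKeeper zk, String lockDir) {
            this.zk = zk;
            this.lockDir = lockDir;
        }

        public void lock() throws Exception {
            // 1. n = create(l + "/lock-", EPHEMERAL|SEQUENTIAL)
            myNode = zk.create(lockDir + "/lock-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                // 2. C = getChildren(l, false) -- no watch on the directory
                List<String> children = zk.getChildren(lockDir, false);
                Collections.sort(children);
                String mine = myNode.substring(lockDir.length() + 1);
                int idx = children.indexOf(mine);
                // 3. if n is the lowest znode in C, the lock is held
                if (idx == 0) return;
                // 4. p = the znode ordered just before n
                String prev = lockDir + "/" + children.get(idx - 1);
                // 5. if exists(p, watch) then wait for the watch event, retry
                CountDownLatch gone = new CountDownLatch(1);
                if (zk.exists(prev, event -> gone.countDown()) != null) {
                    gone.await();
                }
            }
        }

        public void unlock() throws Exception {
            // Releasing the lock is just deleting the znode (any version).
            zk.delete(myNode, -1);
        }
    }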
We next review the components in the implementation of ZooKeeper. It provides high availability by replicating the ZooKeeper data on each server. The components include the following: 1) the request processor, 2) atomic broadcast, and 3) the replicated database.
The request processor prepares a request for processing, which for write requests also includes coordination among the servers. If request processing involves coordination, it is handled by an agreement protocol, which is an implementation of atomic broadcast. Committed changes are applied to the ZooKeeper database that is replicated across all servers. This database is an in-memory database containing the entire data tree; each znode stores up to 1MB of data by default, and this limit is configurable. For recoverability, updates are forced to disk before they are applied to the in-memory database: the servers keep a replay log (a write-ahead log) of committed operations and take periodic snapshots of the in-memory database.
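A hypothetical sketch of that commit discipline (the class and method names below are invented for illustration and are not ZooKeeper's actual internals):

    import java.io.FileOutputStream;
    import java.io.IOException;

    // Illustrative only: force each committed transaction to the write-ahead
    // log before applying it to the in-memory data tree, so recovery can
    // replay the log on top of the most recent snapshot.
    class WalSketch {
        private final FileOutputStream log;
        private final InMemoryDataTree tree;   // hypothetical in-memory database

        WalSketch(FileOutputStream log, InMemoryDataTree tree) {
            this.log = log;
            this.tree = tree;
        }

        void commit(byte[] serializedTxn) throws IOException {
            log.write(serializedTxn);
            log.getFD().sync();        // durable on disk media first...
            tree.apply(serializedTxn); // ...then applied in memory
        }
    }

    interface InMemoryDataTree {
        void apply(byte[] serializedTxn);
    }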
Clients connect to exactly one server each. Read operations are serviced directly from that server's local database, avoiding the agreement protocol. Write requests are forwarded to a leader, which notifies the other servers with messages consisting of the agreed-upon state changes. Since each write request becomes a transaction and the messaging layer is atomic, the servers' local replicas never diverge. A request may not be idempotent, but a transaction is. The leader converts a request into a transaction capturing the state of the system after the request is applied. With the help of transactions, all requests are translated into either success transactions or error transactions, and the database is kept consistent. The agreement protocol is initiated by the leader, which executes the request and broadcasts the change with Zab using a simple majority quorum. The request-processing pipeline is kept full to achieve high throughput. Zab delivers the messages in order and exactly once, and even if a message is redelivered during recovery, all transactions are idempotent, so reapplying one is harmless.
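The paper's example of this conversion is a conditional setData: if the version in the request matches, the leader generates a setDataTXN carrying the new data and the new absolute version number; otherwise it generates an errorTXN. A rough sketch of that idea (all types below are invented for the example):

    // Illustrative sketch of turning a (non-idempotent) conditional setData
    // request into an idempotent transaction that records the resulting state.
    record SetDataRequest(String path, byte[] data, int expectedVersion) {}
    record SetDataTxn(String path, byte[] data, int newVersion) {}
    record ErrorTxn(String path, int errorCode) {}

    class LeaderSketch {
        Object toTxn(SetDataRequest req, int currentVersion) {
            if (req.expectedVersion() != -1
                    && req.expectedVersion() != currentVersion) {
                // A failed precondition becomes an error transaction.
                return new ErrorTxn(req.path(), -103 /* BADVERSION */);
            }
            // Success: the txn captures the new absolute state, so replaying
            // it during recovery produces the same result (idempotent).
            return new SetDataTxn(req.path(), req.data(), currentVersion + 1);
        }
    }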
