A brief survey of distributed consensus algorithms in cloud
services.
Many cloud services and computing infrastructure are
concerned with providing consistency levels and redundancy for the changes made
to the data. For example, Azure Cosmos
DB provides five consistency levels of Eventual, Consistent-Prefix, Session,
Bounded Staleness and strong consistency in the increasing order. Each level provides availability and
performance tradeoffs. Each level provides availability and performance
tradeoffs and the relaxation of consistency levels from stronger to weaker
increases the availability and throughput and reduces latency. These are proven
to be true for any distributed system by the PACELC theorem that states that in
case of network partitioning (P) in a distributed computer system, one has to
choose between availability (A) and consistency (C) (as per the CAP theorem),
but else (E), even when the system is running normally in the absence of
partitions, one has to choose between latency (L) and consistency (C).
Another example of such a cloud service is Service Fabric. It
is often compared to Kubernetes which is also a container orchestration
framework that hosts applications. Kubernetes extends this idea of
app+container all the way where the host can be nodes of a cluster. Cluster
membership is maintained by requiring kubelets to maintain a connection with
the API server to send heartbeats every period. Service Fabric avoids this with
a decentralized ring and failure detection. It is motivated by the adage that
distributed consensus is at the heart of numerous co-ordination problems, but
it has a trivially simple solution if there is a reliable failure detection
service.
Consensus algorithms that help participating processes that
need co-ordination to agree on some data value that is needed during
computation are critical to providing these consistency levels, availability,
and performance. Algorithms like Paxos, Raft and Bayou have demonstrated their
use in a variety of production and mission critical systems. This article
surveys these approaches.
Paxos and Raft, both take a similar approach differing only
in their approach for leader election. Raft only allows servers with up-to-date
logs to become leaders, whereas Paxos allows any server to be leader if it then
updates its logs to ensure that it remains up-to-date. Raft is more efficient
since it does not involve log entries to be exchanged during leader elections.
Raft algorithm was proposed to address the long-standing
issues with understandability of the widely studied Paxos algorithm. It has a
clear abstraction and presentation and can be a simplified version of the
original Paxos algorithm. Specifically, Paxos divides terms between servers
whereas Raft allows a follower to become a candidate in any term, but followers
will vote for only one candidate per term. Paxos followers will vote for any
candidate, whereas Raft followers will only vote for a candidate if the
candidate’s log is at least as up to date. If a leader has uncommitted log
entries from a previous term, Paxos will replicate them in the current term
whereas Raft will replicate them in their original term. Raft’s leader election
is therefore lightweight when compared to Paxos.
No comments:
Post a Comment