The preceding post discussed
the Bayou writes and the write stability and commitment. This article discusses
a conflict resolution system with an inspiration from the Bayou model.
The service we look under the hood is Cosmos DB
whose update propagation, conflict resolution and causality tracking have a
frame of reference in Bayou system. Some transformations were applied because
Cosmos DB is massively scalable and comes with resource governance, multiple
consistency levels, and stringent and comprehensive SLAs.
Cosmos DB makes use of a partition set that is
distributed across multiple regions and allows multi-region writes. A
replication protocol replicates data among the physical partitions comprising a
given partition-set. Each physical partition accepts writes and serves reads
from clients typically co-located in the same region. Writes accepted by a
physical partition within a region are durably committed and made highly
available before they are acknowledged to the client. The set of writes is
exchanged with other partitions within the partition-set and resolved based on
the virtual time. Clients can request either tentative or committed writes by
passing a request header. This exchange is dynamic and based on the topology of
the partition-set, regional proximity of the physical partitions and the
configured consistency levels. The commit scheme used by Cosmos DB is the
primary commit. Knowledge of which writes have committed, and in which order
they were committed then propagates to the other partitions. The primary
behaves like any other partition in all other aspects. Different replicated
data collections can have a different partition designated as primary. The
primary is dynamic and integral part of the reconfiguration of the partition-set
based on the topology of the overlay. The committed writes including multi-row
or batched updates are guaranteed to be ordered by virtue of the vector clock.
Vector clocks are used to encode virtual time and
region ID during each exchange for consensus with the replica set and partition
set respectively. This helps with causality tracking and version vectors help
resolve the update conflict because the partition with the timestamp that is
lowest among all the available clocks can execute a stable write. With the help
of version vectors, the update conflicts are resolved while the topology and
peer selection algorithm are designed to ensure fixed and minimal storage and
minimal network overhead of version vectors. This algorithm guarantees the
strict convergence property.
The Azure Cosmos DB databases configured with
multiple write regions allows interchangeable conflict resolution policies for
the clients and these include ‘Last-Write-Wins’ and application defined ‘Custom’
conflict resolution policies. The
Last-Write-Wins allows customization of any numerical property to override the
system-defined timestamp property-based clock protocol. The Custom conflict
resolution policy allows application defined merge procedure as part of the commitment
protocol with the guarantee that these procedures are executed exactly once per
write-write conflict. All these procedures take the same input parameters of
‘incomingItem’ for the write that generates the conflicts, the
’existingitem’for the currently persisted item, ‘isTombstone’ to indicate if
the incomingItem conflicts with a previously deleted item and ‘conflictingItems’
that includes the list of persisted versions of all items that are conflicting
with the incomingItem. A simple merge
conflict procedure could simply let deleted item to win, existing item to win, existing
conflicting item to win or incoming item to win in that order.
Reference: https://medium.com/paxos-algorithm/paxos-algorithm-71b76aaeb6b
No comments:
Post a Comment