Another mechanism to keep the state in sync between local and remote copies is
the publisher-subscriber model. This model assumes that there is a master copy
of the data, maintained by the publisher, and that updates can be bidirectional,
allowing the publisher to update the data for the subscribers and vice versa.
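A minimal in-memory sketch of the bidirectional flow described above may help. The class names `Publisher` and `Subscriber` and their methods are hypothetical and not taken from any particular product. The publisher owns the master copy, pushes each change to every subscriber, and accepts changes sent upstream by a subscriber, which it then relays to the others. Networking, persistence, and conflicts are deliberately left out.

```python
from typing import Any, Dict, List, Optional


class Subscriber:
    """Holds a local replica that the publisher keeps in sync (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.replica: Dict[str, Any] = {}
        self.publisher: Optional["Publisher"] = None

    def receive(self, key: str, value: Any) -> None:
        # Change pushed down from the publisher's master copy.
        self.replica[key] = value

    def update(self, key: str, value: Any) -> None:
        # Local change: apply it locally, then send it upstream to the master copy.
        self.replica[key] = value
        if self.publisher is not None:
            self.publisher.write(key, value, origin=self)


class Publisher:
    """Owns the master copy and fans every change out to the subscribers."""

    def __init__(self) -> None:
        self.master: Dict[str, Any] = {}
        self.subscribers: List[Subscriber] = []

    def subscribe(self, sub: Subscriber) -> None:
        sub.publisher = self
        self.subscribers.append(sub)
        for key, value in self.master.items():  # seed with the current state
            sub.receive(key, value)

    def write(self, key: str, value: Any, origin: Optional[Subscriber] = None) -> None:
        self.master[key] = value  # update the master copy, then fan out
        for sub in self.subscribers:
            if sub is not origin:  # the originator already has the change
                sub.receive(key, value)


pub = Publisher()
a, b = Subscriber("a"), Subscriber("b")
pub.subscribe(a)
pub.subscribe(b)
pub.write("price", 10)  # publisher -> subscribers
b.update("price", 12)   # subscriber -> publisher -> other subscribers
print(pub.master, a.replica, b.replica)
```

Routing every subscriber update through the publisher is what keeps a single master copy authoritative. The cost is that the publisher becomes the bottleneck for all writes.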
The publisher is responsible for determining which datasets are exposed
externally and when they are made available; these datasets are called
publications. Different scopes of data can be published to different
subscribers, and the scope can be specified at runtime with parameters. In such
cases, subscribers map to different partitions of the data. If the partitions
overlap, the subscribers see a shared state. Versioning makes near real-time
sharing of publications across subscribers on overlapping data possible, and
conflicting versions of an update can be resolved with a simple latest-first
(last-writer-wins) strategy.
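The following is a small sketch of parameterized publications over versioned rows, with latest-first conflict resolution. `Row`, `Publication`, and `apply_latest_wins` are hypothetical names. The sketch assumes each row carries an integer version that increases on every update. Real systems often use timestamps plus a tie-breaker instead. Note that last-writer-wins silently discards the losing update.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Row:
    key: str
    region: str
    value: int
    version: int  # increases with every update; higher means newer


class Publication:
    """A named, parameterized slice of the publisher's data (hypothetical)."""

    def __init__(self, name: str, predicate: Callable[[Row, Dict[str, Any]], bool]) -> None:
        self.name = name
        self.predicate = predicate

    def select(self, rows: Dict[str, Row], params: Dict[str, Any]) -> List[Row]:
        # The runtime parameters decide which partition a subscriber receives.
        return [row for row in rows.values() if self.predicate(row, params)]


def apply_latest_wins(store: Dict[str, Row], incoming: Row) -> bool:
    """Keep whichever version of a row is newer; return True if `incoming` won."""
    current = store.get(incoming.key)
    if current is None or incoming.version > current.version:
        store[incoming.key] = incoming
        return True
    return False  # stale or equal version: the stored row is kept


master = {
    "k1": Row("k1", "east", 1, version=1),
    "k2": Row("k2", "west", 2, version=1),
    "k3": Row("k3", "east", 3, version=1),
}
by_region = Publication("by_region", lambda row, p: row.region in p["regions"])

east = by_region.select(master, {"regions": {"east"}})                # k1, k3
everywhere = by_region.select(master, {"regions": {"east", "west"}})  # overlaps on k1, k3

# Two subscribers edit the shared row k1; the higher version wins
# regardless of the order in which the updates arrive.
apply_latest_wins(master, Row("k1", "east", 30, version=3))
apply_latest_wins(master, Row("k1", "east", 20, version=2))  # rejected as stale
print(master["k1"].value)
```

Because the outcome depends only on the version numbers, every subscriber that applies the same set of updates converges on the same state, whatever the arrival order.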
Common synchronization configurations, that is, the arrangements of publisher
and subscriber data, also vary widely. The publisher-subscriber model allows
both peer-to-peer and hierarchical configurations. Two hierarchical
configurations are especially popular: the network (also known as tree)
topology and the hub-and-spoke topology. Both work well with many subscribers.
Unlike a hierarchical configuration, a peer-to-peer configuration has no single
authoritative data store, and a given update does not necessarily reach every
subscriber. Peer-to-peer configurations are best suited to a small number of
subscribers. Their challenges include maintaining data integrity, implementing
conflict detection and resolution, and programming the synchronization logic.
These are generally handled with consensus algorithms such as Paxos, combined
with some form of message sequencing or vector clocks, and with gossip
protocols for spreading updates.
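To illustrate the conflict-detection part, here is a minimal vector clock sketch. It does not implement Paxos or a gossip protocol. It only shows how peers without a single authoritative store can tell whether one update happened before another or whether the two are concurrent and therefore conflict. The function names and the dictionary representation (node id to counter) are illustrative assumptions.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of updates seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    # A node bumps its own counter each time it makes a local update.
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    # When a peer's state is received, take the element-wise maximum.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' (a conflict)."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = increment({}, "p1")   # p1 creates the record
v1 = increment(base, "p1")   # p1 edits it again
v2 = increment(base, "p2")   # p2 edits the same base independently
print(compare(base, v1))     # before
print(compare(v1, v2))       # concurrent: a conflict to resolve
resolved = increment(merge(v1, v2), "p2")  # p2 records its merge of both edits
print(compare(v1, resolved), compare(v2, resolved))  # before before
```

Vector clocks only detect the conflict; deciding which value survives (for example, last-writer-wins or an application-specific merge) is a separate policy.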
Efficiency in data synchronization in these configurations and architectures
comes from determining what data changed, how to scope it, and how to reduce
the traffic needed to propagate the change. It is customary to have a
synchronization layer on the client, synchronization middleware on the server,
and a network connection that supports bidirectional updates during the
synchronization process. The basic synchronization process involves initiating
synchronization, either on demand or periodically; preparing the data and
transmitting it to the server with authentication; executing the
synchronization logic on the server to determine the updates and
transformations; persisting the changed data through a data adapter to one or
more data stores; detecting and resolving conflicts; and finally relaying the
results of the synchronization back to the client application.
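The steps above can be sketched end to end in a single process. All names here (`SyncClient`, `SyncServer`, `DataAdapter`, `InMemoryStore`, the token set) are hypothetical. A shared-secret token stands in for real authentication, and a method call stands in for the network. The client layer sends only the delta since its last sync. The server middleware authenticates, detects conflicts with per-key version numbers, persists through one or more adapters, and returns the result. Transformations and conflict resolution are left out, so conflicting keys simply stay pending on the client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Set, Tuple


@dataclass
class Change:
    key: str
    value: str
    base_version: int  # server version the client last synced for this key


@dataclass
class SyncResult:
    applied: Dict[str, int] = field(default_factory=dict)  # key -> new version
    conflicts: List[str] = field(default_factory=list)


class DataAdapter(Protocol):
    def persist(self, key: str, value: str, version: int) -> None: ...


class InMemoryStore:
    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, int]] = {}

    def persist(self, key: str, value: str, version: int) -> None:
        self.rows[key] = (value, version)


class SyncServer:
    """Server-side synchronization middleware (hypothetical)."""

    def __init__(self, adapters: List[DataAdapter], tokens: Set[str]) -> None:
        self.adapters = adapters
        self.tokens = tokens
        self.versions: Dict[str, int] = {}

    def sync(self, token: str, changes: List[Change]) -> SyncResult:
        if token not in self.tokens:  # authenticate the sync session
            raise PermissionError("invalid sync token")
        result = SyncResult()
        for ch in changes:
            current = self.versions.get(ch.key, 0)
            if ch.base_version != current:  # someone else changed it first
                result.conflicts.append(ch.key)
                continue
            new_version = current + 1
            for adapter in self.adapters:  # persist to one or more data stores
                adapter.persist(ch.key, ch.value, new_version)
            self.versions[ch.key] = new_version
            result.applied[ch.key] = new_version
        return result  # relayed back to the client


class SyncClient:
    """Client-side synchronization layer that sends only what changed."""

    def __init__(self, token: str) -> None:
        self.token = token
        self.data: Dict[str, str] = {}
        self.synced: Dict[str, str] = {}  # values as of the last successful sync
        self.versions: Dict[str, int] = {}

    def pending(self) -> List[Change]:
        # Delta: only keys whose value differs from the last synced value.
        return [Change(k, v, self.versions.get(k, 0))
                for k, v in self.data.items() if self.synced.get(k) != v]

    def sync(self, server: SyncServer) -> SyncResult:  # on demand or on a timer
        result = server.sync(self.token, self.pending())
        for key, version in result.applied.items():
            self.synced[key] = self.data[key]
            self.versions[key] = version
        return result  # conflicting keys stay pending until resolved


store = InMemoryStore()
server = SyncServer([store], tokens={"secret"})
alice, bob = SyncClient("secret"), SyncClient("secret")
alice.data["doc"] = "draft"
first = alice.sync(server)  # applied: doc -> version 1
bob.data["doc"] = "other"
clash = bob.sync(server)    # conflict: bob's base version 0 is out of date
```

Sending only the delta keeps traffic proportional to what changed rather than to the size of the dataset, which is the efficiency point made above.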