Cluster computing: Synchronization of state with remote (continued...)

Another mechanism for keeping state in sync between local and remote copies is the publisher-subscriber model. This model assumes that the publisher maintains a master copy of the data. Updates can be bidirectional: the publisher can update the data for its subscribers, and subscribers can send their updates back to the publisher.

The publisher determines which datasets are exposed for external access and when they become available. These exposed datasets are called publications. Different scopes of a dataset can be published to different subscribers, and the scope can be specified at runtime with parameters. In such cases, subscribers map to different partitions of the data. If the partitions overlap, the subscribers see shared state on the overlapping portion. Versioning makes it possible to share publications across subscribers in near real time, even on overlapping data. Conflicting versions of an update are commonly resolved with a latest-first strategy, in which the most recent version wins.
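
As a minimal sketch of the ideas above, the following Python models a publisher that scopes publications with a predicate and resolves conflicts by keeping the latest version. The names Publisher, publish, snapshot, and apply_update are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Publisher:
    # Master copy of the data: key -> (version, value).
    master: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    # Publication name -> predicate that decides which keys are in scope.
    publications: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def publish(self, name: str, in_scope: Callable[[str], bool]) -> None:
        """Register a publication whose scope is given at runtime."""
        self.publications[name] = in_scope

    def snapshot(self, name: str) -> Dict[str, Tuple[int, str]]:
        """Return the partition of the master copy visible to a publication."""
        in_scope = self.publications[name]
        return {k: v for k, v in self.master.items() if in_scope(k)}

    def apply_update(self, key: str, version: int, value: str) -> bool:
        """Latest-first resolution: accept only a strictly newer version."""
        current = self.master.get(key)
        if current is None or version > current[0]:
            self.master[key] = (version, value)
            return True
        return False  # stale update, the newer version already won


pub = Publisher()
pub.publish("east", lambda k: k.startswith(("east/", "shared/")))
pub.publish("west", lambda k: k.startswith(("west/", "shared/")))

pub.apply_update("shared/config", 1, "a")
pub.apply_update("shared/config", 3, "c")
stale = pub.apply_update("shared/config", 2, "b")  # arrives late, rejected
pub.apply_update("east/orders", 1, "e1")
```

Both publications include the keys under shared/, so the two subscribers see the same state for the overlapping portion, while east/orders is visible only through the east publication.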

Synchronization configurations vary widely. A configuration refers to the arrangement of publisher and subscriber data. The publisher-subscriber model allows both peer-to-peer and hierarchical configurations. Two hierarchical configurations are popular: the network (or tree) topology and the hub-and-spoke topology. Both scale well to many subscribers. Unlike a hierarchical configuration, a peer-to-peer configuration does not have a single authoritative data store, and data updates do not necessarily reach all the subscribers. Peer-to-peer configurations are best suited to a small number of subscribers. Their challenges include maintaining data integrity, implementing conflict detection and resolution, and programming the synchronization logic. These are generally handled with consensus algorithms such as Paxos, together with some form of message sequencing or vector clocks and a gossip protocol to spread updates.
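
To illustrate how vector clocks support conflict detection between peers, here is a small sketch. It is not tied to any library; the function names increment, merge, and compare and the peer names p1 and p2 are assumptions made for the example. Two updates conflict when neither clock is less than or equal to the other.

```python
from typing import Dict

Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, used when a peer receives another peer's state."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: Clock, b: Clock) -> str:
    """Classify a relative to b: equal, before, after, or concurrent."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither happened before the other: a conflict


first = increment({}, "p1")          # p1 writes
on_p2 = increment(first, "p2")       # p2 saw p1's write, then wrote
on_p1 = increment(first, "p1")       # p1 wrote again without seeing p2
relation = compare(on_p2, on_p1)
reconciled = merge(on_p2, on_p1)
```

Here on_p2 and on_p1 are concurrent, so the peers know a conflict exists and must apply some resolution rule before adopting the merged clock.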

Efficiency in data synchronization in these configurations and architectures comes from determining what data has changed, how to scope it, and how to reduce the traffic associated with propagating the change. It is customary to have a synchronization layer on the client, synchronization middleware on the server, and a network connection that supports bidirectional updates during the synchronization process. The basic process has six steps. First, synchronization is initiated, either on demand or periodically. Second, the client prepares the data and transmits it to the server with authentication. Third, the server executes the synchronization logic to determine the updates and transformations. Fourth, the changed data is persisted through a data adapter to one or more data stores. Fifth, conflicts are detected and resolved. Finally, the results of the synchronization are relayed back to the client application.
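
The process described above can be sketched from the server's side roughly as follows. This is an illustrative outline under stated assumptions, not a definitive implementation: the Change and SyncServer types are hypothetical, authentication is reduced to a token lookup, the data store is an in-memory dictionary, and conflicts are resolved by keeping the server's copy and reporting the conflict to the client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Change:
    key: str
    base_version: int  # version the client last saw for this key
    value: str


@dataclass
class SyncServer:
    # In-memory stand-in for the data store: key -> (version, value).
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    # Stand-in for authentication: the set of accepted client tokens.
    tokens: Set[str] = field(default_factory=set)

    def synchronize(self, token: str, changes: List[Change]) -> dict:
        # Authenticate the client before touching any data.
        if token not in self.tokens:
            raise PermissionError("unknown client token")

        applied: List[str] = []
        conflicts: List[str] = []
        for change in changes:
            version, _ = self.store.get(change.key, (0, ""))
            if change.base_version == version:
                # Client was up to date: persist the change as a new version.
                self.store[change.key] = (version + 1, change.value)
                applied.append(change.key)
            else:
                # Someone else changed the key first: keep the server copy.
                conflicts.append(change.key)

        # Relay the outcome and the current server state back to the client.
        state: Dict[str, Optional[Tuple[int, str]]] = {
            c.key: self.store.get(c.key) for c in changes
        }
        return {"applied": applied, "conflicts": conflicts, "state": state}


server = SyncServer(tokens={"client-1"})
first = server.synchronize("client-1", [Change("a", 0, "x")])
second = server.synchronize(
    "client-1", [Change("a", 0, "y"), Change("b", 0, "z")]
)
```

In the second call the change to a is based on a stale version, so it is reported as a conflict while the change to b is applied. Sending only the changed keys with their base versions is one way to keep the propagated traffic small.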

Tuesday, April 27, 2021