Wednesday, September 19, 2018

Introduction:
This article is an addition to the notion of a cache layer for Object Storage, one that caches objects and frequently persists the changes from workloads back into the Object Storage. The notion that data can be allowed to age before making its way into the Object Storage is not a limiting factor. Object Storage, just like file storage, and especially when file-system enabled, allows direct access for persistence anyway. The previous article referenced here merely pointed to the use cases where reads and writes to objects are so frequent that something shallower than an Object Storage benefits them immensely.
Therefore, this article merely looks at the notion of lazy replication. If we use the cache layer and regularly save the objects from the cache into the Object Storage, it is no different from using a local filesystem for persistence and then frequently backing it up into the cloud. We have tools like duplicity that frequently back up a filesystem into object storage. Although they use archives for compaction, this is still just copying data from source to destination, even when the source is a file system and the destination is an object store. The schedule of this copying can be made as frequent as necessary to ensure that all changes propagate within a maximum time limit.
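As a rough illustration of such a scheduled copy, consider the following sketch. It is a minimal, hypothetical model in which the local filesystem and the object store are both represented as dictionaries; a real connector would walk file paths and call an object-store API instead, but the logic is the same: each run propagates only the entries whose content has changed since the previous run.

```python
import hashlib

def sync_changed(source: dict, destination: dict) -> list:
    """Copy entries whose content differs from the destination's copy.

    `source` maps keys to bytes (standing in for local files) and
    `destination` maps keys to (bytes, checksum) pairs (standing in
    for stored objects). Returns the keys propagated on this run.
    """
    copied = []
    for key, data in source.items():
        digest = hashlib.sha256(data).hexdigest()
        dest_entry = destination.get(key)
        if dest_entry is None or dest_entry[1] != digest:
            destination[key] = (data, digest)  # ship the data plus its checksum
            copied.append(key)
    return copied

# Two "scheduled" runs: only new or modified objects are transferred.
local = {"a.txt": b"alpha", "b.txt": b"beta"}
remote = {}
print(sync_changed(local, remote))   # ['a.txt', 'b.txt']
local["a.txt"] = b"alpha-v2"
print(sync_changed(local, remote))   # ['a.txt']
```

Running `sync_changed` on a timer gives the bounded-staleness behavior described above: however many updates land between two runs, the destination lags the source by at most one scheduling interval.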
Let us now look at the replication within the Object Storage. After all, replication is essentially copying objects across the sites within the storage. This copying was intended for the purposes of durability. When we set up multiple sites within a replication group, the object gets copied to these sites so that it remains durable against loss. This copying is almost immediate and is handled within the put method of the S3 API that is used to upload objects into the object storage. Therefore, there is a multi-zone update of the object in a single put command when the replication group spans sites. When the object is uploaded, it may be saved in parts, and all the bookkeeping regarding those parts is also safeguarded for durability. Both the object data and the part-location information are treated as logically representing the object. There are three copies of such a representation so that if one copy is lost, another can be used. In addition, erasure codes may allow the reconstruction of an object, so the copy operation may not necessarily be a straightforward byte-range copy.
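A toy model may make these durability semantics concrete. The sketch below (all names are hypothetical; it is not the actual S3 or replication-group implementation) writes an object's parts and their bookkeeping to every site in a group in one put, so that losing any single site's copy does not lose the object.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """One site in a replication group: key -> {parts, bookkeeping}."""
    name: str
    store: dict = field(default_factory=dict)

def put_object(sites: list, key: str, data: bytes, part_size: int = 4) -> None:
    """Split the object into parts and write both the parts and the
    part-count bookkeeping to every site, mimicking how a single put
    can update all sites when the replication group spans them."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    record = {"parts": parts, "count": len(parts)}
    for site in sites:
        site.store[key] = record

def get_object(sites: list, key: str) -> bytes:
    """Serve the object from the first site that still holds a copy."""
    for site in sites:
        if key in site.store:
            return b"".join(site.store[key]["parts"])
    raise KeyError(key)

group = [Site("zone-a"), Site("zone-b"), Site("zone-c")]
put_object(group, "photo", b"0123456789")
del group[0].store["photo"]          # simulate losing one site's copy
print(get_object(group, "photo"))    # b'0123456789'
```

An erasure-coded store would replace the whole-copy records with coded fragments and reconstruct on read, which is why, as noted above, the internal copy operation need not be a plain byte-range copy.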
Lazy replication allows for copying beyond these durability semantics. It allows for copying on a scheduled basis by letting the data age. There may be many updates to the object between two copy operations, and this is tolerated because there is no semantic difference between the objects as long as they are eventually copied elsewhere. Said another way, this is the equivalent of chaining object stores: the cache layer is an object storage in itself, with direct access for data persistence, and the object storage behind it receives copies of the objects that have been allowed to age. Since the copy operations occur at regular intervals, there is little or no data loss between the primary and the secondary object storages. We just need a connector that transfers some or all objects in a bucket from one namespace to an altogether different bucket in a different namespace, possibly in a different object storage. This is similar to file sync operations between local and remote file systems, which also allow offline work to happen. The difference between a file sync operation and lazy replication is probably just the strategy. Replication as such has several strategies, even in databases, where logs are used to replay the same changes on a destination database. The choice of strategy and frequency is not central to the point that objects can be copied across object storages.
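The connector described above can be sketched in a few lines. This is an illustrative model, not a real product API: buckets are dictionaries mapping keys to (data, last-write-time) pairs, and the aging window decides which objects are mature enough to ship. Intermediate updates between runs collapse into a single copy.

```python
def lazy_replicate(src_bucket: dict, dst_bucket: dict,
                   min_age: float, now: float) -> list:
    """Copy objects that have aged at least `min_age` seconds since
    their last write into the destination bucket. Many updates between
    two runs result in one copy of the latest content. Returns the
    keys copied on this run."""
    copied = []
    for key, (data, mtime) in src_bucket.items():
        aged = (now - mtime) >= min_age
        if aged and dst_bucket.get(key) != data:
            dst_bucket[key] = data
            copied.append(key)
    return copied

# logs/1 was last written at t=100, logs/2 at t=195 (times in seconds).
primary = {"logs/1": (b"v3", 100.0), "logs/2": (b"v1", 195.0)}
secondary = {}
# At t=200 with a 60-second aging window, only logs/1 is old enough.
print(lazy_replicate(primary, secondary, min_age=60, now=200.0))  # ['logs/1']
```

Run on a timer, the maximum staleness of the secondary is the scheduling interval plus the aging window, which matches the "propagation by a maximum time limit" framing earlier in the article.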
When Object Storages are linked this way, it may seem contrary to the notion that a single Object Storage represents limitless storage with zero maintenance, where an object once saved will always be found, avoiding unnecessary copies. However, the performance impact of using an Object Storage directly, as opposed to a local file system, may affect certain workloads, where it may be easier to stage the data prior to saving it in the Object Storage. Therefore, this lazy replication may prove helpful in expanding the use cases of the Object Storage.
