Thursday, September 20, 2018

We were discussing that data can be allowed to age in one layer before making its way to another. Let us now consider how to use this aging of objects. We transfer objects from the cache to the object storage when their updates have accumulated, at one of the scheduled transfer intervals. This interval could be very small or very large depending on the capability of the cache. By buffering the objects in the cache without writing them to object storage, we gain the ability to study the updates on each object. These updates might be byte range updates to the object. If the objects are small, the updates don't matter much because the whole object can simply be overwritten. If the objects are large, keeping track of the byte range updates helps with consolidating and re-organizing those updates before they are persisted.

Why is this useful? Object storage treats the object as its smallest unit of data and has no notion of what content the object holds. It therefore sees the object as a sequence of bytes and records updates to it as byte ranges. By translating overlapping byte range updates into non-overlapping updates early, in the cache, we make it easier for the object storage to persist the object.

That is not all. How the object cache persists to the object storage may be entirely proprietary to the cache-based solution. Instead of relying on the versioning provided by the object storage, the cache-based solution may store incremental updates. Versioning has several drawbacks: it makes a copy of the whole object for every version, even though the byte ranges that did not change need not be copied, and its only real convenience is that a whole version is simple to read back. Moreover, with this repackaging of the objects we now have the chance to deduplicate what we store, and we can still reconstruct the object from what we store because typically not all of the contents change at once.
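To make the consolidation step concrete, here is a minimal sketch in Python. The name consolidate and the representation of an update as an (offset, bytes) pair are illustrative choices, not part of any particular cache implementation. It collapses overlapping byte range updates into non-overlapping segments, with later writes taking precedence, so that the flush to object storage can write each segment exactly once.

from typing import List, Tuple

Write = Tuple[int, bytes]  # (offset, data) byte-range update

def consolidate(writes: List[Write]) -> List[Write]:
    """Collapse overlapping byte-range updates into non-overlapping
    segments, with later writes taking precedence over earlier ones."""
    segments: List[Write] = []  # kept non-overlapping and sorted by offset
    for offset, data in writes:
        start, end = offset, offset + len(data)
        trimmed: List[Write] = []
        for seg_off, seg_data in segments:
            seg_end = seg_off + len(seg_data)
            if seg_end <= start or seg_off >= end:
                trimmed.append((seg_off, seg_data))  # no overlap, keep as-is
                continue
            if seg_off < start:  # keep the part to the left of the new write
                trimmed.append((seg_off, seg_data[:start - seg_off]))
            if seg_end > end:    # keep the part to the right of the new write
                trimmed.append((end, seg_data[end - seg_off:]))
        trimmed.append((start, data))  # the newest write wins the overlapped range
        segments = sorted(trimmed, key=lambda s: s[0])
    return segments

# Three overlapping updates collapse into three non-overlapping segments
# that the object store can persist directly.
updates = [(0, b"aaaaaaaa"), (4, b"BBBB"), (6, b"cc")]
print(consolidate(updates))  # [(0, b'aaaa'), (4, b'BB'), (6, b'cc')]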

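The incremental, deduplicated alternative to whole-object versioning can be sketched along the same lines. The snippet below assumes fixed-size chunking and an in-memory dictionary standing in for a content-addressed chunk store; the names put_version, get_version and CHUNK_SIZE are hypothetical. Each version is recorded as a manifest of chunk hashes, only previously unseen chunks are written, and the full object is reconstructed by concatenating the chunks named in the manifest, so unchanged byte ranges are never copied again.

import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096                      # fixed-size chunking assumed for simplicity
chunk_store: Dict[str, bytes] = {}     # content-addressed store: hash -> chunk bytes

def put_version(obj: bytes) -> List[str]:
    """Store a version as a manifest of chunk hashes; write only unseen chunks."""
    manifest = []
    for i in range(0, len(obj), CHUNK_SIZE):
        chunk = obj[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # deduplication happens here
        manifest.append(digest)
    return manifest

def get_version(manifest: List[str]) -> bytes:
    """Reconstruct the full object from the chunks named in its manifest."""
    return b"".join(chunk_store[d] for d in manifest)

# Two versions that differ in only one chunk share all unchanged chunks.
v1 = put_version(b"a" * 8192 + b"b" * 4096)
v2 = put_version(b"a" * 8192 + b"c" * 4096)
assert get_version(v2) == b"a" * 8192 + b"c" * 4096
print(len(chunk_store))  # 3 distinct chunks stored instead of 6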