Sunday, September 23, 2018

The cache layer can just as well use versioning if the updates are frequent and the objects are small and numerous. All the concerns of persistence reside with the storage layer, including the form and representation of the objects. Whether the objects are saved as repeated full byte-ranges or as incremental updates to a previous version is entirely the storage layer's concern. The cache layer, on the other hand, determines the schedule and the load of the updates. Therefore, it may choose to persist some objects more often than others.

This dynamic schedule is very helpful to customers' workloads because not all workloads can be satisfied by the same schedule. There are really two aspects to it. First, the cache layer determines which objects belong to which groups, based on policies that can be evaluated against the nature of the workload. For example, heavy continuous writes require frequent persistence; otherwise there is a chance that updates might be lost. Lightweight writes with heavy reads do not require frequent persistence; persisting them too often would invalidate the cached object more often than is necessary. Second, the cache layer decides between flush and backup operations. These operations are governed by the pools of treatments to objects that the cache layer maintains.

The cache layer becomes smart merely by associating a group with a pool. While it may allow customizations to how objects are mapped to groups, it reserves the administrative mapping of groups to pools of service levels. The pools have varying schedules for flush and backup operations. The flush to local disk and the backup from local disk to object storage are performed the same way, without any dependence on the object or the schedule. As long as an object falls in one of the queues, it will be serviced. The flush operation is largely unknown to the object storage, since it is a convenience only for the cache layer to prevent data loss.
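The group and pool scheme described above can be sketched roughly as follows. This is only an illustration under assumptions: the group names, the pool names, the thresholds, and the interval values are all hypothetical, and a real cache layer would evaluate richer policies than two counters.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    """Hypothetical workload groups; the post does not name them."""
    WRITE_HEAVY = "write_heavy"
    READ_HEAVY = "read_heavy"
    DEFAULT = "default"


@dataclass(frozen=True)
class Pool:
    """A service level with its own flush and backup schedule."""
    name: str
    flush_interval_s: int   # cache -> local disk
    backup_interval_s: int  # local disk -> object storage


@dataclass(frozen=True)
class WorkloadStats:
    writes_per_min: float
    reads_per_min: float


# Administrative mapping of groups to pools (illustrative values only).
POOLS = {
    Group.WRITE_HEAVY: Pool("gold", flush_interval_s=5, backup_interval_s=60),
    Group.DEFAULT: Pool("silver", flush_interval_s=30, backup_interval_s=300),
    Group.READ_HEAVY: Pool("bronze", flush_interval_s=120, backup_interval_s=1800),
}


def classify(stats: WorkloadStats,
             heavy_writes: float = 100.0,
             read_ratio: float = 10.0) -> Group:
    """Policy step: map an object's workload to a group (assumed thresholds)."""
    if stats.writes_per_min >= heavy_writes:
        # Heavy continuous writes need frequent persistence.
        return Group.WRITE_HEAVY
    if stats.reads_per_min >= read_ratio * max(stats.writes_per_min, 1.0):
        # Light writes with heavy reads: persist rarely to avoid invalidation.
        return Group.READ_HEAVY
    return Group.DEFAULT


def pool_for(stats: WorkloadStats) -> Pool:
    """Group -> pool lookup; the pool alone decides the schedule."""
    return POOLS[classify(stats)]
```

The customizable part is `classify`, while the `POOLS` table stays under administrative control, which mirrors the split between mapping objects to groups and mapping groups to pools.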
The backup operation to object storage is, however, as streamlined as possible so that the data ingestion rate never goes down and the load can be met with an adequate service level agreement. The cache layer provides the ability to fine-tune the behavior at an object level, and whether it stores segments or files before sending the object to the storage layer is its own concern. The two layers have mutually independent concerns but provide synergy in the form of a wider appeal: a durable store for all data generators.

This cache layer could be internalized into the object storage, but there is more benefit if they are separate. Unlike a gateway service, which provides address resolution of an object to a specific site, a cache layer cannot simply be brought into the object storage, because name resolution is different from augmenting the read and write paths. If the object resides in the storage and is merely accessed via address resolution, there are no changes to the writes on the object. The cache layer, on the other hand, is heavy on writes to the object storage and tries to be smart about sending objects to the store. If this were part of the object storage, it would unnecessarily affect the writes of all objects. The object storage incurs a cost to provide the kind of storage it does, and it becomes harder for it to provide quality of service to individual workloads when it is performing distributed operations in a global store. The cache layer provides benefits outside the domain of the object storage, assuming the latter is already the best for its domain. Therefore, the object cache and the object storage have to be separate layers and may be implemented with different designs.
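A minimal sketch of this separation, assuming a hypothetical `ObjectStore` interface with only `put` and `get`: the cache wraps the store and absorbs repeated writes, while the store's own write path is left untouched. The class and method names here are illustrative and not taken from any particular product.

```python
from typing import Dict, Optional, Protocol, Set


class ObjectStore(Protocol):
    """Assumed minimal interface of the storage layer."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Stand-in for the object storage; counts writes it receives."""
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.put_calls = 0

    def put(self, key: str, data: bytes) -> None:
        self.put_calls += 1
        self.objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.objects.get(key)


class WriteBehindCache:
    """Separate layer that augments the read and write paths."""
    def __init__(self, store: ObjectStore) -> None:
        self._store = store
        self._cache: Dict[str, bytes] = {}
        self._dirty: Set[str] = set()

    def write(self, key: str, data: bytes) -> None:
        # Writes land in the cache only; the store is not touched yet.
        self._cache[key] = data
        self._dirty.add(key)

    def read(self, key: str) -> Optional[bytes]:
        if key in self._cache:
            return self._cache[key]
        data = self._store.get(key)  # fall through on a miss
        if data is not None:
            self._cache[key] = data
        return data

    def backup(self) -> int:
        """Send each dirty object to the store once; return how many."""
        sent = 0
        for key in sorted(self._dirty):
            self._store.put(key, self._cache[key])
            sent += 1
        self._dirty.clear()
        return sent
```

With this arrangement, three successive writes to the same key reach the store as a single `put` at the next backup, and objects that never pass through the cache see no change to their writes at all.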


