Wednesday, March 20, 2019

We were discussing the S3 API:

Virtually all storage providers, in the cloud and on-premise alike, support the S3 Application Programming Interface (API). With the help of this API, applications can send and receive data without having to worry about storage best practices. They are also able to switch from one storage appliance to another, one on-premise cluster to another, one cloud provider to another, and so on. The API was introduced by Amazon but has become the industry standard, accepted by many storage providers. Even competitors such as Azure provide an S3 proxy that allows applications to access their storage with this API.
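For example, here is a minimal sketch (the endpoint URL, credentials, and bucket name are placeholders, not real deployments) of how the same boto3 code can target any S3-compatible backend, switching providers by changing nothing but the endpoint:

import boto3

def make_client(endpoint_url, access_key, secret_key):
    # The endpoint is the only provider-specific detail; the calls below
    # are identical for AWS, an on-premise appliance, or another cloud.
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

client = make_client("https://s3.example-appliance.local", "KEY", "SECRET")
client.put_object(Bucket="my-bucket", Key="report.csv", Body=b"a,b,c\n")
print(client.get_object(Bucket="my-bucket", Key="report.csv")["Body"].read())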

S3, like any other cloud-based service, was developed following Representational State Transfer (REST) best practices. However, it does not incorporate all the provisions of HTTP/2 (released) or HTTP/3 (in progress). Nor does it provide a protocol-like abstraction where layers can be added above or below it in a storage stack; a networking stack, by contrast, has a dedicated protocol for each of its layers. A storage stack may comprise, at the very least, an active-workload layer and a backup-workload layer, where the active layer sits higher and can support, say, HTTP, while the backup layer sits lower and supports S3. Perhaps this distinction has been somewhat obfuscated because object storage can expand its capabilities to both layers.
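To make the layering idea concrete, here is a minimal sketch; the class and method names are illustrative and not part of any standard, with an active layer that falls through to the backup layer below it on a miss:

import boto3

class BackupLayer:
    # Lower layer: durable S3 storage.
    def __init__(self, bucket):
        self.s3, self.bucket = boto3.client("s3"), bucket

    def read(self, key):
        return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

class ActiveLayer:
    # Higher layer: serves the active workload, falling through on a miss.
    def __init__(self, lower):
        self.cache, self.lower = {}, lower

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.lower.read(key)
        return self.cache[key]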

The S3 API gives developers no guidance on how object storage can be positioned as an object queue, an object cache, an object query engine, a gateway, a log index store, and many other such capabilities. API best practices also enable automated monitoring, logging, auditing, and more.
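As one illustration, the object-queue positioning can be layered over plain S3 calls. This is only a sketch: the bucket name and prefix are made up, and a production queue would need atomic claim semantics that plain S3 does not provide.

import time
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "my-queue-bucket", "queue/"

def enqueue(payload):
    # Timestamp-prefixed keys sort lexicographically in arrival order.
    key = f"{PREFIX}{time.time_ns():020d}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)

def dequeue():
    # S3 lists keys in lexicographic order, so the first key is the oldest.
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=1)
    if "Contents" not in resp:
        return None
    key = resp["Contents"][0]["Key"]
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    s3.delete_object(Bucket=BUCKET, Key=key)
    return body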

If a new storage class is added and its functionality is not at par with regular S3 storage, it would warrant an entirely new set of APIs, and these would preferably carry a prefix to differentiate them from the rest.
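A purely hypothetical sketch of that prefix convention follows; none of these operation names exist in the real S3 API, and the archive_ prefix only illustrates how a new storage class could carry its own clearly separated set alongside the regular one:

class StorageAPI:
    # Regular S3-style set.
    def put_object(self, bucket, key, body):
        print(f"PUT {bucket}/{key}")

    def get_object(self, bucket, key):
        print(f"GET {bucket}/{key}")

    # New storage class set, distinguished by prefix.
    def archive_put_object(self, bucket, key, body):
        print(f"ARCHIVE PUT {bucket}/{key}")

    def archive_restore_object(self, bucket, key, days):
        print(f"ARCHIVE RESTORE {bucket}/{key} for {days} days")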

If the storage stack is layered with the active, regular S3 storage at the top, the less frequently used storage classes below it, and the glacier (least used) data as the last layer, then aging of data alone is sufficient to migrate objects from the top layer all the way to the bottom without any involvement of the API. That said, the API could give users visibility into the contents of each storage class, along with the additional ability to place objects directly in those classes or evict them. Since the nature of the storage class differentiates its API set, and we decided to use prefix-based API naming conventions to indicate that differentiation, each storage class adds a new set to the existing APIs. On the other hand, policies common to all three storage classes, or functionality that stripes across layers, will be provided either with request attributes targeting that layer and the layers below it or with the help of parameters.
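Amazon's existing lifecycle API already expresses this kind of age-based migration; a minimal sketch with boto3, where the bucket name and day counts are illustrative:

import boto3

s3 = boto3.client("s3")

# Age objects from the regular class to an infrequent-access class and
# finally to the archive tier, with no per-object API calls needed.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }]
    },
)

# Direct placement of an object in a class is also possible per request.
s3.put_object(Bucket="my-bucket", Key="old-log.gz", Body=b"...",
              StorageClass="STANDARD_IA")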

Functionality such as deduplication, rsync-style incremental backups, compression, and management will require new APIs, and these do not have to be limited to objects in any one storage class. APIs that automate a workflow of calling more than one API can also be written as coarse-grained APIs. These wrapped APIs can collate functionality for a single layer or across layers. They can also include automation not specific to the control or data path of the storage stack. Together, the new functionality and the wrapped APIs can become one whole set.
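As a sketch of such a wrapped API (the function name and workflow are illustrative, not part of any standard), one coarse-grained call could chain compression, upload, and verification:

import gzip
import boto3

s3 = boto3.client("s3")

def backup_object(bucket, key, data):
    # One coarse-grained call wrapping three finer-grained steps.
    compressed = gzip.compress(data)
    backup_key = f"backup/{key}.gz"
    s3.put_object(Bucket=bucket, Key=backup_key, Body=compressed)
    # Verify via metadata rather than re-reading the whole body.
    head = s3.head_object(Bucket=bucket, Key=backup_key)
    assert head["ContentLength"] == len(compressed)
    return backup_key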

The S3 API can become a protocol for all storage functionality. The operations can be organized as a flat list of features or as resource-path-qualified functionality, where the resource path may pertain to storage classes. These APIs could also support discoverability, as well as nuances specific to file protocols and content addressability.

https://1drv.ms/w/s!Ashlm-Nw-wnWuT-u1f7DRjBRuvD4
