Sunday, April 16, 2023

 

Amazon Simple Storage Service, or S3 for short, is a durable, virtually limitless cloud storage service for storing, backing up and protecting data in the form of blobs or files. It exposes a well-defined API, and many independent on-premises storage vendors largely conform to this protocol while adding enhancements for their own products. The result is high interoperability among the command-line and graphical tools used to browse the storage. Well-known UI tools for browsing S3 storage, on-premises or in the cloud, include S3 Browser for Windows and Commander One for macOS. Commonly used command-line tools include s3cmd, azcopy and rclone. This article highlights some features of the command-line tools, beginning with what they have in common and proceeding to where they differ.

All of the command-line tools must authenticate with the storage service before making API requests. Each request is signed using credentials, usually a pair consisting of an access key and a secret. The owner of the storage containers issues these credentials, which can carry a variety of permissions over the namespaces, buckets and object hierarchy. The tools also recognize the account-level credentials a cloud issues for its blob service, as an alternative to credentials specific to a storage account or container. Because creating, updating, deleting and retrieving objects and signing requests all follow well-defined protocols, the credentials and their providers can be passed to the tools as parameters. Some version incompatibilities exist, but many object storage providers follow this pattern. Many also recognize shared access signatures (SAS) for accessing a specific container in the storage. While keys and secrets are recorded and saved for reuse, a shared access signature is not stored at the provider and expires after a set duration. A SAS grants access to whoever bears it, and one issued ad hoc can be invalidated only indirectly, by rotating the key that signed it, unless it was tied to a stored access policy that can be changed or deleted. For that reason it is best kept short-lived and narrowly scoped. Shared access signatures nonetheless remain in common use, and their use is often left unrestricted.
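To make the signing step concrete, here is a minimal Python sketch of how a request signature can be derived from the secret. It assumes the provider uses AWS Signature Version 4, which the article does not name, and the function names, placeholder secret and string to sign are hypothetical. Building the canonical request and the real string to sign is left out.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "s3") -> bytes:
    """Derive a Signature Version 4 signing key (assumed scheme).

    date_stamp is YYYYMMDD. The chained HMACs scope the key to one day,
    region and service, so the raw secret never signs a request directly.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that would accompany the request."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder values for illustration only; not real credentials.
    key = derive_signing_key("EXAMPLE-SECRET", "20230416", "us-east-1")
    print(sign(key, "hypothetical-string-to-sign"))
```

Because the derived key depends on the date, region and service, a signature computed for one day or region does not validate for another, which limits the damage if a single signed request leaks.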

The three command-line tools, s3cmd, azcopy and rclone, differ considerably in which storage providers they support. s3cmd is favored for Linux-based storage appliances and azcopy works well with Microsoft's Azure public cloud, but rclone wins hands down, both covering the widest selection of providers and offering more advanced features than the other tools. Operating systems and cloud providers each play to the strengths of their own ecosystems, and the Azure and AWS storage protocols are incompatible with each other. As a result, azcopy can read from either cloud, whereas s3cmd works only with services that conform to the Amazon S3 API. Storage operations with these tools are idempotent and can be retried, which makes them robust against provider failures, rate limits and network disruptions. While s3cmd and azcopy typically invoke the download of files and folders separately from their upload, rclone can download and upload in a single command by naming different remotes as command-line arguments. It does so in a streaming manner, which performs better and leaves a smaller footprint. For example:

user@macbook localFolder % rclone copy --metadata --progress aws:s3-namespace azure:blob-account
Transferred: 215.273 MiB / 4.637 GiB, 5%, 197.226 KiB/s, ETA 6h32m14s

This command streams objects from one remote to the other, leaving no trace in the local filesystem.
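The robustness that idempotent, retriable operations provide can be illustrated with a small Python sketch of retry with exponential backoff. This is not how any of the three tools is implemented; `TransientError`, `retry` and the default values are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def retry(operation: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or the attempts run out.

    Safe only when operation is idempotent, e.g. writing the same object
    under the same key, so that a repeated call cannot corrupt state.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last failure
            # Exponential backoff with jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```

A caller would wrap a single upload or download in `retry`, raising `TransientError` for rate-limit or connectivity failures and letting any other exception propagate immediately.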
