Monday, August 24, 2015

In the world of cloud computing and cloud-based object storage, how should buckets be used? Are they the same as folders? How does an object store differ from a file system?
Buckets cannot be reduced to folders, although objects can be organized with folders. A bucket cannot contain another bucket. A folder can nest another folder. A folder is a file system artifact. A bucket is not. A bucket can hold many different objects. A bucket is an object storage artifact. Let us see what that means.
If we take the example of a folder called logs that holds, say, log1.txt, log2.txt and log3.txt, then they exist in the bucket as three different objects with the prefix “logs/” and the key names “logs/log1.txt”, “logs/log2.txt” and “logs/log3.txt”. The folder we see in our console is merely a logical organizational convenience within the bucket. Unlike a folder in a file system, it has no data structure of its own. When working with an object store directly, think prefixes instead of folders.
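The prefix idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the bucket is modeled as a plain in-memory dictionary, and `list_keys` is a hypothetical helper that mimics how a console might derive a folder view from key names. It is not any vendor's API.

```python
# Hypothetical bucket contents: a flat mapping of key name -> object data.
# There is no folder entry anywhere, only keys that share a prefix.
bucket = {
    "logs/log1.txt": b"first",
    "logs/log2.txt": b"second",
    "logs/log3.txt": b"third",
    "readme.txt": b"top level",
}


def list_keys(bucket, prefix="", delimiter="/"):
    """Return (objects, folders) found directly under the given prefix."""
    objects, folders = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key continues past a delimiter, so present it as a folder.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


top_objects, top_folders = list_keys(bucket)
log_objects, log_folders = list_keys(bucket, prefix="logs/")
```

The "logs/" folder shows up in the top-level view only because three keys happen to start with that prefix. Delete those three objects and the folder disappears with them.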
  
Objects can be moved from one folder to another within a bucket simply by changing the key prefix, but moving them to another bucket means copying the data across and deleting the original. If we create too many buckets, we introduce artificial restrictions beforehand. It’s the same as partitioning a disk multiple times when a single undivided partition would do. Moreover, a single bucket eliminates the need to copy or migrate data from one bucket to another. All else being equal, a single seamless bucket is also better positioned to meet future needs than a dozen of them. No organization is lost by using one bucket rather than many.
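Moving an object between folders of the same bucket amounts to giving it a new key. The sketch below assumes a store with no native rename, so the hypothetical helper `move_within_bucket` writes the data under the new key and then deletes the old one. The bucket is again just an in-memory dictionary.

```python
# Hypothetical bucket: key name -> object data.
bucket = {"incoming/report.csv": b"a,b\n1,2\n"}


def move_within_bucket(bucket, src_key, dst_key):
    """Re-key an object: copy it under the new name, then delete the old name."""
    if src_key == dst_key:
        return  # nothing to do, and deleting would lose the object
    bucket[dst_key] = bucket[src_key]
    del bucket[src_key]


# Only the prefix changes; the object never leaves the bucket.
move_within_bucket(bucket, "incoming/report.csv", "processed/report.csv")
```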
  
Buckets are wide and deep. They sprawl over several thousand storage devices and span datacenters. In the world of OpenStack, they power file systems. For example, users of Ceph may have no idea that the data they put in a filesystem is actually stored in the cloud with the RADOS object store. The filesystem serves the users’ convenience and habits. We organize based on files and folders, and that habit is difficult to change. Moreover, that habit serves us well across paradigms such as this. When using an object store directly, we should be mindful of the rich metadata associated with objects and of metadata-based access to objects.
  
As with metadata, permissions are fine-grained and available at the object level as well as the folder level. Folders can be made public or private. All the objects within a public folder are available for viewing or downloading to anyone on the internet. If you try to browse a folder that has been made public, however, you will get access denied, because the folder is just a naming prefix for an object or group of objects. It doesn’t exist. A folder can be made public but it cannot be reverted to private. Instead, you can individually mark the objects within a public folder private. Buckets have access permissions and a versioning status, and you can specify which region a bucket should reside in. Logging, versioning and event subscriptions can be turned on for a bucket. Cost allocation tags (key-value pairs) can be assigned at the bucket level. You could create many buckets, but you should first exhaust the possibilities with something more granular than a bucket. For example, you could use naming conventions for objects.
Many different objects can exist within a bucket. They can be located with the autocomplete-like jump feature of the bucket browser instead of paging through millions of items. Moreover, metadata attributes are available that can be used to label the objects in different ways.
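Both ideas can be sketched over an in-memory listing: jumping straight to a typed prefix, and selecting objects by metadata labels. The key names, metadata attributes and the helpers `jump_to` and `with_metadata` are all hypothetical. A real object store exposes its own listing and metadata calls.

```python
import bisect

# Hypothetical objects: key name -> user-defined metadata attributes.
objects = {
    "images/cat.png": {"team": "web", "kind": "asset"},
    "logs/2015/08/app.log": {"team": "ops", "kind": "log"},
    "logs/2015/08/db.log": {"team": "data", "kind": "log"},
    "reports/q2.pdf": {"team": "data", "kind": "report"},
}
sorted_keys = sorted(objects)


def jump_to(sorted_keys, typed, limit=10):
    """Return up to `limit` keys starting with `typed`, skipping earlier keys."""
    # Binary search finds the first candidate without paging through the list.
    start = bisect.bisect_left(sorted_keys, typed)
    matches = []
    for key in sorted_keys[start:start + limit]:
        if not key.startswith(typed):
            break
        matches.append(key)
    return matches


def with_metadata(objects, **wanted):
    """Return the keys whose metadata matches every requested attribute."""
    return sorted(
        key for key, meta in objects.items()
        if all(meta.get(name) == value for name, value in wanted.items())
    )


log_keys = jump_to(sorted_keys, "logs/")
data_keys = with_metadata(objects, team="data")
```

The same objects can be sliced by naming convention or by label, and neither approach needs a second bucket.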
  
One of the founding principles of object storage was to reduce the storage admin’s task of provisioning and maintaining storage. Any time we increase those tasks, we are doing something wrong.
Some more resources on object storage include the following: 
The caveat is that even the AWS documentation doesn't explain the usage, and goes so far as to suggest that buckets be created homogeneous when in fact they are designed to hold heterogeneous content. Their blogs, on the other hand, are far more revealing.
  
#codingexercise
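// Returns the leftmost (minimum) node of a binary search tree,
// or null when the tree is empty.
Node GetMin(Node root)
{
    Node cur = root;
    // Keep descending left; the minimum has no left child.
    while (cur != null && cur.left != null)
        cur = cur.left;
    return cur;
}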
