Tuesday, August 7, 2018

We were discussing object storage, file systems, and the cloud-first strategy for newer workloads as well as for migrating older ones. However, not all workloads are suited to a cloud-first strategy. We determine suitability from the performance and cost perspectives. Ideally, this is determined in the production environment. There are tools that can capture a workload's IO and play it back. When this IO pattern is replayed on a new storage system, the suitability of the cloud-first strategy becomes clear.
We noted that workload patterns can change over time. The peak load may occur only in certain seasons of the year. Planning for the day-to-day load as well as the peak load therefore becomes important. Workload profiling can be repeated year-round so that both the average and the maximum are known for effective planning and estimation.
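As a minimal sketch of what repeated profiling yields, the snippet below reduces a year of load samples to the average and the peak. The function name summarize_load and the monthly IOPS figures are hypothetical and only illustrate the idea; real numbers would come from the IO capture tools mentioned earlier.

```python
from statistics import mean


def summarize_load(samples):
    """Return (average, peak) for a sequence of load samples, e.g. IOPS."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples), max(samples)


# Hypothetical monthly IOPS readings; the last month carries the seasonal peak.
monthly_iops = [1200, 1150, 1300, 1250, 1400, 1350,
                1300, 1280, 1500, 1600, 2100, 4800]

average, peak = summarize_load(monthly_iops)

# A high peak-to-average ratio suggests sizing for the peak, not the average.
peak_to_average = peak / average
print(f"average={average:.0f} peak={peak} ratio={peak_to_average:.2f}")
```

Planning for the average alone would leave this hypothetical workload short during its peak season, which is why both figures are worth tracking.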
Storage system planners know their workload profiles. While deployers view applications, services and access control, storage planners see workload profiles and base their recommendations exclusively on IO, cost and performance. In the object storage world, we have the luxury of comparison with file systems. In a file system, several layers each contribute to the overall I/O of data. A bucket, on the other hand, is independent of the file system. As long as it is file-system enabled, users get the convenience of a file system as well as object storage. Moreover, the user account that accesses the bucket can also be set up. Only IT can determine the correct strategy for a workload because only they can profile it.


One factor that escapes attention when migrating workloads is capacity planning. It is true that object storage is more durable than file systems. But where there is one copy of the data in a file system, there may be three copies in object storage.
For replicated data in object storage, the following formula holds:
                          X = 3Y + metadata
where X = object storage capacity in GB,
      Y = data on the native file system in GB, and
      metadata = the usage that can be attributed to metadata.
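The formula above can be sketched as a small helper. The function name, the default of three replicas as a parameter, and the example figures are assumptions made for illustration; actual replication factors and metadata overhead depend on the object storage product in use.

```python
def object_storage_capacity_gb(native_gb, metadata_gb=0.0, replicas=3):
    """Estimate object storage capacity X = replicas * Y + metadata, in GB.

    native_gb   -- Y, data on the native file system
    metadata_gb -- usage attributed to metadata
    replicas    -- number of copies kept (3 in the formula above)
    """
    if native_gb < 0 or metadata_gb < 0:
        raise ValueError("sizes must be non-negative")
    if replicas < 1:
        raise ValueError("at least one copy is required")
    return replicas * native_gb + metadata_gb


# Hypothetical example: 500 GB on the file system, 10 GB of metadata.
required_gb = object_storage_capacity_gb(500, metadata_gb=10.0)
print(required_gb)
```

With these example inputs, the estimate is three times the native data plus the metadata allowance, so a 500 GB file share would need a little over 1.5 TB of object storage.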
In other words, planners would do well to provision sufficient capacity for on-premises use of object storage. Keeping multiple copies of files in sync across traditional storage systems was a challenge for workloads. This is not the case for object storage, where the data may be replicated across multiple nodes.
Similarly, while a file may be stored as an object, its metadata may be enhanced with custom attributes. This increases the size of the object, and since there are now more copies, the total metadata footprint may increase as well. Custom attributes also imply that while the size contribution from the data stays the same, the overall metadata grows with each heavy attribute added.
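To make the effect of heavy attributes concrete, here is a rough sketch. It assumes that custom attributes are stored as key/value strings, that their size is simply the UTF-8 byte length of keys plus values, and that metadata is replicated along with the data. All of these are simplifying assumptions, and the names are hypothetical.

```python
def attribute_bytes(attributes):
    """Approximate metadata size as the UTF-8 length of all keys and values."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in attributes.items())


def replicated_footprint_bytes(data_bytes, attributes, replicas=3):
    """Footprint of one object when data and metadata are both replicated."""
    return replicas * (data_bytes + attribute_bytes(attributes))


# Hypothetical object: 1000 bytes of data and one light attribute.
light = {"owner": "alice"}
before = replicated_footprint_bytes(1000, light)

# Adding one heavy attribute leaves the data size unchanged,
# but the extra metadata is multiplied by the number of copies.
heavy = dict(light, notes="x" * 500)
after = replicated_footprint_bytes(1000, heavy)

print(before, after, after - before)
```

Under these assumptions, the growth in footprint comes entirely from the new attribute, multiplied by the number of copies.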
