Cluster computing

Tuesday, January 28, 2014

I'm going to pick up a book on Splunk next, but I will complete the discussion on AWS and Amazon S3 (Simple Storage Service) first. We talked about usage patterns, performance, durability and availability. We will look at the cost model, scalability and elasticity, interfaces and anti-patterns next. Then we will review Amazon Glacier.
S3 supports a virtually unlimited number of files in a directory. Unlike a disk drive, which limits how much data it can hold before it has to be partitioned, S3 can store an unlimited number of bytes. Objects can all be stored in a single bucket, and S3 will scale and distribute redundant copies of the information.
In terms of interfaces, standard REST and SOAP-based interfaces are provided. These support both management and data operations. Objects are stored in buckets, and each object has a unique key within its bucket. S3 is a web-based object store rather than a file system, but keys can be named to give a file-system-like hierarchy.
SDKs that wrap these interfaces are available for most popular programming languages.
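To make the "flat keys that look like a hierarchy" idea concrete, here is a minimal in-memory sketch. It does not call S3; `list_keys` and the sample keys are hypothetical names made up for illustration, and the grouping only mimics the prefix and delimiter style of listing that S3 offers.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Group flat keys under `prefix` the way a folder listing would."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter behaves like a subfolder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            # No further delimiter: the key sits directly under the prefix.
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys in one bucket; the store itself is flat.
bucket_keys = [
    "index.html",
    "photos/2014/january/beach.jpg",
    "photos/2014/january/snow.jpg",
    "photos/2013/party.jpg",
]

top_objects, top_folders = list_keys(bucket_keys)
jan_objects, jan_folders = list_keys(bucket_keys, prefix="photos/2014/january/")
print(top_objects, top_folders)
print(jan_objects, jan_folders)
```

The "folders" here are nothing more than shared key prefixes, which is why the hierarchy costs nothing to create and never has to be partitioned.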
S3 is not the right choice in the following cases. S3 is not a standalone filesystem.
It cannot be queried to retrieve a specific object unless you know the bucket name and key.
It doesn't suit rapidly changing data, and it is also not meant as backup or archival storage. While it is ideal for websites, it should hold only the static content, with the dynamic content served from EC2.
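The point about needing the bucket name and key can be sketched as follows. This assumes the virtual-hosted style of addressing, where the bucket goes in the host name and the key becomes the path; `object_url`, the bucket name and the key are hypothetical, and a private object would also need a signed request, which is left out here.

```python
from urllib.parse import quote


def object_url(bucket, key):
    # The bucket name and the key are the only handles on an object.
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))


url = object_url("example-bucket", "reports/2014 q1.pdf")
print(url)
```

There is no way to say "find me the object whose contents mention X" in this scheme; any such lookup has to live in a separate index or database.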
Amazon Glacier is an extremely low-cost storage service for backup and archival. Customers can reliably store their data for as little as 1 cent per gigabyte per month. You store data in Amazon Glacier as archives. An individual archive is limited to 40 TB, but there is no limit on the number of archives.
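As a rough back-of-the-envelope sketch of what that rate means, the snippet below multiplies the stored size by the 1 cent per gigabyte figure quoted above. The function name is made up for illustration, the rate is only the "as little as" number from this post, and retrieval and request charges are ignored.

```python
def monthly_storage_cost_dollars(gigabytes, cents_per_gb=1):
    # Assumption: a flat per-gigabyte storage rate, nothing else billed.
    cents = gigabytes * cents_per_gb
    return cents / 100


print(monthly_storage_cost_dollars(500))   # 500 GB of backups
print(monthly_storage_cost_dollars(2048))  # roughly 2 TB of archives
```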
