Thursday, January 12, 2023

 

A motivation to use S3 over a document store:

Cost is one of the main drivers in the choice of cloud technologies, while programmability and functionality tend to be the developer's motivations. For example, a document store like DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It can be the convenient choice for schema-less storage, for its table representation, and for its frequent pairing with an in-memory cache for low latency. But when the operations taken on a stored resource are plain create, read, update, and delete, a web-accessible object store like S3 is sufficient.

When we calculate the cost of a small-sized application, the monthly charges might look something like this:

API Gateway                 0.04 USD

Cognito                    10.00 USD

DynamoDB                   75.02 USD

S3                          2.07 USD

Lambda                      0.00 USD

Web Application Firewall    8.00 USD

In this case, the justification to use S3 is clear: the cost savings apply to low-overhead resources that only need cloud persistence.

It is in this context that application modernization has the potential to drive down costs by moving certain persistence from DynamoDB to S3. The main consideration is the need to use an S3 feature called Amazon S3 Select to realize these cost savings. Bookkeeping operations on the stored objects can be achieved by querying a ledger object that receives progressive, append-only updates without deleting earlier entries.
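The append-only ledger idea can be sketched as follows. The appendLedgerEntry helper below is an illustrative assumption, not part of the AWS SDK; in practice the ledger body would be fetched from S3 with getObject, appended to, and written back with putObject. Each bookkeeping operation becomes a new line of JSON, and earlier entries are never rewritten:

```javascript
// Illustrative sketch (hypothetical helper): append one bookkeeping
// entry to a newline-delimited JSON ledger body.
function appendLedgerEntry(ledgerBody, operation, resourceId) {
  const entry = {
    timestamp: new Date().toISOString(),
    operation,    // e.g. 'create', 'update', or 'delete'
    resourceId,
  };
  // Append only; earlier entries are left untouched.
  return ledgerBody + JSON.stringify(entry) + '\n';
}
```

Because entries are only appended, the current state of any resource can later be recovered by querying the ledger object, for example with S3 Select.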

Using Amazon S3 Select, we can query for a subset of data from an S3 object using simple SQL expressions. The selectObjectContent API in the AWS SDK for JavaScript is used for this purpose.

Let us use a CSV file uploaded to the bucket named my-bucket in the us-west-2 region under the key target-file.csv. This CSV contains entries with username and age attributes. If we were to select users with an age greater than 20, the SQL query would appear as
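For illustration, target-file.csv might contain hypothetical rows such as:

```
username,age
alice,34
bob,19
carol,25
```

Against this sample data, the query would return alice and carol.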

SELECT username FROM S3Object WHERE cast(age as int) > 20

With the JavaScript SDK, we write this as:

const S3 = require('aws-sdk/clients/s3');

const s3 = new S3({ region: 'us-west-2' });

const params = {
  Bucket: 'my-bucket',
  Key: 'target-file.csv',
  Expression: 'SELECT username FROM S3Object WHERE cast(age as int) > 20',
  ExpressionType: 'SQL',
  InputSerialization: {
    // Treat the first CSV line as a header row
    CSV: { FileHeaderInfo: 'USE' },
  },
  OutputSerialization: {
    CSV: {},
  },
};

s3.selectObjectContent(params, (err, data) => {
  if (err) {
    // handle error
    return;
  }

  const eventStream = data.Payload;

  eventStream.on('data', (event) => {
    if (event.Records) {
      // event.Records.Payload is a buffer containing
      // a single record, partial records, or multiple records
      process.stdout.write(event.Records.Payload.toString());
    } else if (event.Stats) {
      console.log(`Processed ${event.Stats.Details.BytesProcessed} bytes`);
    } else if (event.End) {
      console.log('SelectObjectContent completed');
    }
  });

  // Handle errors encountered during the API call
  eventStream.on('error', (err) => {
    switch (err.name) {
      // Check against specific error codes that need custom handling
    }
  });

  eventStream.on('end', () => {
    // Finished receiving events from S3
  });
});

 
