Cluster computing

Friday, January 13, 2023

A few more considerations for using S3 over a document store for basic storage operations.

The previous article introduced cost as the driving factor for using simple storage, aka S3. The document store has many features, but it is priced by read and write capacity units, and all those features may not be necessary for merely creating, updating and deleting an object. Moving such operations to S3 can yield significant savings even for low-end applications, which typically incur monthly charges such as the following:

API Gateway 0.04 USD

Cognito 10.00 USD

DynamoDB 75.02 USD

S3 2.07 USD

Lambda 0.00 USD

Web Application Firewall 8.00 USD
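
To put the figures above in proportion, the following minimal sketch totals the listed charges and computes the share attributable to the document store. The amounts are copied from the list; the variable names are illustrative only.

```javascript
// Monthly charges (USD) copied from the list above.
const monthlyCharges = {
  "API Gateway": 0.04,
  "Cognito": 10.0,
  "DynamoDB": 75.02,
  "S3": 2.07,
  "Lambda": 0.0,
  "Web Application Firewall": 8.0,
};

// Work in integer cents to avoid floating-point drift when summing.
const toCents = (usd) => Math.round(usd * 100);

const totalCents = Object.values(monthlyCharges).reduce(
  (sum, usd) => sum + toCents(usd),
  0
);
const dynamoShare = toCents(monthlyCharges["DynamoDB"]) / totalCents;

console.log(`Total: ${(totalCents / 100).toFixed(2)} USD`); // Total: 95.13 USD
console.log(`DynamoDB share: ${(dynamoShare * 100).toFixed(1)}%`); // DynamoDB share: 78.9%
```

On these numbers the document store accounts for roughly four fifths of the bill, while S3 accounts for about two percent.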

It is in this context that we strive to use S3 APIs for ordinary persistence.

The sample code below illustrates the use of the JavaScript SDK for these operations:

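import { S3Client, ListObjectsCommand } from "@aws-sdk/client-s3";
import { CognitoIdentityClient } from "@aws-sdk/client-cognito-identity";
import { fromCognitoIdentityPool } from "@aws-sdk/credential-provider-cognito-identity";

const REGION = "us-west-2";

// S3 client that obtains temporary credentials from a Cognito identity pool.
const s3 = new S3Client({
  region: REGION,
  credentials: fromCognitoIdentityPool({
    client: new CognitoIdentityClient({ region: REGION }),
    identityPoolId: "us-west-2:de827e1d-f9b6-4402-bd0e-c7bdce52d8c8",
  }),
});

const docsBucketName = "mybucket";

// JavaScript strings have no built-in hashCode(); this Java-style 32-bit hash
// is an assumed stand-in for the helper the application defines elsewhere.
const hashCode = (text) => {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (Math.imul(31, hash) + text.charCodeAt(i)) | 0;
  }
  return hash;
};

// Lists the top-level objects in the bucket and maps each to a document record.
export const getAllDocuments = async () => {
  try {
    const data = await s3.send(
      new ListObjectsCommand({ Delimiter: "/", Bucket: docsBucketName })
    );
    console.log(JSON.stringify(data, null, 4));
    // Contents is absent when the listing returns no objects.
    return (data.Contents ?? []).map((item) => {
      const ownerId = item.Owner?.ID ?? "";
      const identifier = item.Key + item.LastModified + ownerId;
      return {
        FileSize: item.Size,
        Name: item.Key,
        Owner: ownerId,
        DateUploaded: item.LastModified,
        FileName: item.Key,
        SK: "Doc#BVNA",
        PK: hashCode(identifier).toString(),
        Thumbnail: "/images/LoremIpsum.jpg",
      };
    });
  } catch (err) {
    console.log("Error", err);
    return [];
  }
};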

Unlike the document store, which returns a unique identifier for every item stored, here we must make our own identifier. The file contents and the file attributes together can produce this identifier if we apply a basic cryptographic hash function such as MD5. Also, unlike the document store, there is no index. Tags and metadata are available for querying purposes, and it is possible to adjust just the tags for state management, but it is even better to record the operations on an uploaded object in a dedicated metadata object in the database.
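
One way to derive such an identifier is sketched below, assuming a Node.js runtime with the built-in crypto module. The helper name makeObjectId and the sample attribute values are hypothetical and do not come from the application code.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: derives a stable identifier from an object's
// attributes and, optionally, its contents (string or Buffer).
function makeObjectId({ key, lastModified, ownerId }, body = "") {
  const hash = createHash("md5");
  // Join fields with a NUL separator so ("ab", "c") and ("a", "bc") hash differently.
  hash.update([key, lastModified, ownerId].join("\u0000"));
  hash.update("\u0000");
  hash.update(body);
  return hash.digest("hex"); // 32 hexadecimal characters
}

// Sample values for illustration only.
const attributes = {
  key: "report.pdf",
  lastModified: "2023-01-13T00:00:00.000Z",
  ownerId: "owner-123",
};
const id = makeObjectId(attributes, Buffer.from("example file contents"));
console.log(id);
```

The same inputs always produce the same identifier, so it can be recomputed rather than stored. MD5 is adequate for naming but is not collision resistant; passing "sha256" to createHash is a drop-in alternative if that matters.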

Then, it is possible to query just the contents of that specific object with:

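const S3 = require('aws-sdk/clients/s3');

// Placeholder setup: the client and params are not shown in the original,
// so the key and expression below are illustrative values only.
const s3 = new S3({ region: 'us-west-2' });
const params = {
  Bucket: 'mybucket',
  Key: 'metadata.json', // hypothetical key of the metadata object
  ExpressionType: 'SQL',
  Expression: 'SELECT * FROM S3Object',
  InputSerialization: { JSON: { Type: 'LINES' } },
  OutputSerialization: { JSON: {} },
};

s3.selectObjectContent(params, (err, data) => {
  if (err) {
    // handle error
    return;
  }
  const eventStream = data.Payload;
  eventStream.on('data', (event) => {
    if (event.Records) {
      // event.Records.Payload is a buffer containing
      // a single record, partial records, or multiple records
      process.stdout.write(event.Records.Payload.toString());
    } else if (event.Stats) {
      console.log(`Processed ${event.Stats.Details.BytesProcessed} bytes`);
    } else if (event.End) {
      console.log('SelectObjectContent completed');
    }
  });
  // Handle errors encountered during the API call
  eventStream.on('error', (err) => {
    switch (err.name) {
      // Check against specific error codes that need custom handling
    }
  });
  eventStream.on('end', () => {
    // Finished receiving events from S3
  });
});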
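
As a sketch of what the dedicated metadata object mentioned earlier might contain, the operations on uploaded objects could be kept as newline-delimited JSON, a layout that S3 Select can read when the input serialization is JSON with Type set to LINES. The record fields (object_key, operation, performed_at) and both helper names are assumptions made for illustration.

```javascript
// Hypothetical record shape: one entry per operation on an uploaded object.
const operations = [
  { object_key: "report.pdf", operation: "create", performed_at: "2023-01-13T10:00:00Z" },
  { object_key: "report.pdf", operation: "update", performed_at: "2023-01-13T11:30:00Z" },
  { object_key: "photo.jpg", operation: "create", performed_at: "2023-01-13T12:00:00Z" },
];

// Serialize as JSON Lines: one JSON document per line, newline-terminated.
function toJsonLines(records) {
  return records.map((record) => JSON.stringify(record)).join("\n") + "\n";
}

// Build the SQL expression that selects the history of a single object.
// Single quotes are doubled, following the standard SQL escaping convention.
function historyExpression(objectKey) {
  const escaped = objectKey.replace(/'/g, "''");
  return `SELECT * FROM S3Object s WHERE s.object_key = '${escaped}'`;
}

const metadataBody = toJsonLines(operations); // body to upload as the metadata object
const expression = historyExpression("report.pdf");
console.log(metadataBody);
console.log(expression);
```

The serialized body would be uploaded as the metadata object, and the expression passed as the Expression parameter of the select call, so that only the matching records are returned rather than the whole object.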

This mechanism is sufficient for low overhead persistence of objects in the cloud.
