Tuesday, March 10, 2020

Streams and tier-2 storage
Streams are overlaid on tier-2 storage, which includes S3. A stream does not have to map one-to-one with a file or a blob for user data isolation; that mapping is handled entirely by the stream store. Consider the following use case instead.
A stream may hold an analytics jar. Such jars can be published to a Maven repository backed by a blob store.
For example: 
// build.gradle -- requires the maven-publish plugin:
// plugins { id 'maven-publish' }
publishing { 
    publications { 
        mavenJava(MavenPublication) { 
            from components.java 
        } 
    } 
    repositories { 
        maven { 
            url "s3://${repoBucketName}/releases" 
            credentials(AwsCredentials) { 
                accessKey awsCredentials.AWSAccessKeyId 
                secretKey awsCredentials.AWSSecretKey 
            } 
        } 
    } 
} 
Artifactory is already familiar to Maven publishers through repositories such as jcenter(). This approach simply generalizes that to S3 storage, whether on-premises or in the cloud.
Taking this one step further, publishers that package user data into a tarball, or that extract a stream to a blob, are similarly useful generalizations.
When a publisher packages a stream, it can ask the stream store to send all the segments of the stream over HTTP. A user script that makes this curl call can redirect the output to a file; the file is then uploaded as a blob with another S3 call.
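As a sketch of that user script in Python rather than curl, the segment-concatenation step could look like the following. The per-segment HTTP endpoint and its URL layout are assumptions for illustration, not the stream store's actual API:

```python
import shutil
import urllib.request

def dump_stream(segment_urls, out_path):
    """Concatenate every segment of a stream into one local file.

    segment_urls: ordered URLs at which the stream store serves each
    segment over HTTP (the endpoint layout here is hypothetical).
    """
    with open(out_path, "wb") as out:
        for url in segment_urls:
            with urllib.request.urlopen(url) as resp:
                # Stream each segment's bytes straight into the output
                # file, preserving segment order.
                shutil.copyfileobj(resp, out)

# Hypothetical usage -- the host and path are assumptions:
# dump_stream(
#     ["http://stream-store/streams/events/segments/0",
#      "http://stream-store/streams/events/segments/1"],
#     "events.bin")
```

The resulting file can then be handed to an S3 upload call, for example the AWS CLI's `aws s3 cp`, to become a blob.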
Such a publisher can combine more than one stream in an archive, or package metadata along with it. The publisher knows what is relevant to the user when packing her data into an archive. Another publisher could convert user data to a blob directly, without the user ever needing to know about an intermediary file.
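A minimal sketch of such a publisher is below. It assumes the stream dumps are already available as byte strings, and it invents an archive layout (a metadata.json entry plus a streams/ directory); neither is a defined format in the text:

```python
import io
import json
import tarfile

def package_streams(stream_dumps, metadata, archive_path):
    """Bundle several streams and their metadata into one tar.gz archive.

    stream_dumps: mapping of stream name -> bytes of its concatenated
    segments. metadata: dict describing the streams, stored as
    metadata.json alongside them.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # Write the metadata first so consumers can find it easily.
        meta = json.dumps(metadata, indent=2).encode()
        info = tarfile.TarInfo("metadata.json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
        # Then add each stream dump under a streams/ prefix.
        for name, data in stream_dumps.items():
            info = tarfile.TarInfo(f"streams/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

The same logic could write into an in-memory buffer and upload that buffer as a blob directly, which is the "no intermediary file" variant described above.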
