Saturday, August 18, 2018

Web assets as a software update:

Introduction: 
Any application with a web interface requires resources in the form of markup, stylesheets, and scripts. Although these assets may represent code for interaction with the end user, they do not necessarily have to be maintained on the server side or treated the same way as server-side code. This document argues for using an update service for any code that is not maintained on the server side. The update service automatically downloads and installs the latest update to the code on a device or a relay server by a pull mechanism, rather than the conventional pipeline-based push mechanism. Furthermore, the source for the update service may be an object store, preferably fronted by a distributor such as Artifactory or a Content Distribution Network.
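As a rough sketch of this pull mechanism, the client below polls a version manifest in object storage and downloads only the assets whose versions differ from what is installed. The bucket name, manifest key, asset layout, and install path are all hypothetical, and the boto3 S3 client stands in for whatever distributor fronts the store:
import json
import boto3

s3 = boto3.client('s3')
BUCKET = 'asset-updates'        # hypothetical bucket the pipeline publishes to
MANIFEST_KEY = 'manifest.json'  # hypothetical map of asset name -> latest version

def pull_updates(installed):
    # compare the published manifest against installed versions and pull diffs
    body = s3.get_object(Bucket=BUCKET, Key=MANIFEST_KEY)['Body'].read()
    manifest = json.loads(body)
    for asset, version in manifest.items():
        if installed.get(asset) != version:
            # assumes assets are stored one object per version, e.g. 'app.js/1.4.2'
            s3.download_file(BUCKET, f'{asset}/{version}', f'/var/assets/{asset}')
            installed[asset] = version
    return installed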

Description:
Content Distribution Networks are widely used to make web application assets available to a page regardless of whether the application is mobile, desktop, or software-as-a-service. They serve many purposes but primarily function as a set of proxy servers distributed over geographical locations so that the web page may readily find the assets and download them at high speed regardless of when, where, and how the page is displayed. An update service, on the other hand, is generally a feature of a software platform by which tenants can download the latest update from their publisher. The dedicated server is yet another model, where there is a single source of code from a single point of origin, usually gated by a pipeline, and every consuming device or application points to this server via web redirects.
These three software publishing conventions place no restrictions on the size or granularity of individual releases; these are generally determined by what can be achieved within a timeline. Since the most recent update is expected to remain compatible with previous versions of the host or device ecosystem, and updates are mostly forward-progressive, there is very little testing required to ensure that a new release mixes and matches well on a particular host. Moreover, a number of request-responses are already being made to load a web page, so there is no necessity to keep these downloads to a minimum size or count. This brings us to a point where we view assets not as a bundle but as something discrete that can be versioned and made available over the web.
The rules for publishing assets to a set of proxy servers are similar to the rules for releasing code to a virtual server. This works very well for assets that are viewed as files or objects; even archives are candidates for being versioned and uploaded via multi-part upload. Typically, proxy servers have local storage, while object storage unifies the storage and exposes a single endpoint for the object. This would mean replicating the object over multiple geographical zones from the same object storage. Regardless of the topology of the storage where the assets are made available, the update service can rotate through one or more providers for downloading them to the device. Typically a gateway service takes care of accessing the object storage in this case.
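To make the discrete, versioned asset concrete, the sketch below publishes an asset to a version-enabled bucket and lists the versions that accumulate for the same key. The bucket and key names are hypothetical; this assumes the boto3 S3 client against an S3-compatible store:
import boto3

s3 = boto3.client('s3')
# assumes versioning is enabled on the hypothetical 'web-assets' bucket
resp = s3.put_object(Bucket='web-assets', Key='js/app.js', Body=b'console.log("hi");')
print('published version:', resp.get('VersionId'))

# every upload of the same key becomes a discrete, addressable version
listing = s3.list_object_versions(Bucket='web-assets', Prefix='js/app.js')
for v in listing.get('Versions', []):
    print(v['VersionId'], 'latest' if v['IsLatest'] else '')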

Conclusion:
Software may be viewed both in terms of server-side logic and client-updated assets. The granularity of releases for both can be fine-grained and independently verified. The distribution may be finely balanced so that the physical representation of what makes up an application is much more modular, and its updates automatic, for every consumer.

Friday, August 17, 2018

We look at a particular usage of Object Storage as a Content Distribution Network (CDN). The latter is merely a collection of proxy servers. Typically, proxy servers have local storage, while object storage unifies the storage and exposes a single endpoint for the object. This would mean replicating the object over multiple geographical zones from the same object storage. Regardless of the topology of the storage where the assets are made available, any service that needs the content can rotate through one or more CDNs to download it to the device.
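A minimal sketch of that rotation, trying each provider in turn and falling back on failure; the endpoint URLs are hypothetical and only the Python standard library is used:
import urllib.request

# hypothetical provider endpoints fronting the same replicated content
ENDPOINTS = ['https://cdn1.example.com', 'https://cdn2.example.com']

def fetch(path):
    last_error = None
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(f'{base}/{path}', timeout=5) as resp:
                return resp.read()
        except OSError as error:
            last_error = error  # provider failed; rotate to the next one
    raise last_error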
Typically a CDN is enabled over object storage using a gateway. A RADOS gateway, for example, enables content to be served from distributed object storage. In order to read an object, a RADOS gateway will create a cluster handle and then connect to the cluster. Then it opens an I/O context and reads the data from the object, following which it closes the context and the handle. This gateway is implemented in the form of a FastCGI module and can be used with any web server that supports such a module.
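The same handle, context, and read sequence can be exercised directly with the librados Python binding (packaged with Ceph as python3-rados); the pool and object names below are hypothetical:
import rados  # librados Python binding shipped with Ceph

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # create the cluster handle
cluster.connect()                                      # connect to the cluster
ioctx = cluster.open_ioctx('web-assets')               # open an I/O context (hypothetical pool)
try:
    size, _ = ioctx.stat('index.html')                 # read() defaults to 8 KB, so stat first
    data = ioctx.read('index.html', size)              # read the object's bytes
finally:
    ioctx.close()                                      # close the context ...
    cluster.shutdown()                                 # ... and the handle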
The use of a gateway facilitates server load balancing, request routing, and content services. It can be offloaded to hardware. Gateways may perform web switching or content switching. They need not be real web servers and can route traffic to other web servers. They may be augmented to monitor servers in order to change forwarding rules. Rules may be made simpler with good addressing instead of using lookups. Also, a gateway is generally assigned a single virtual IP address.
Moreover, not all requests need to reach the object storage. In some cases web caches may be used. A gateway can forward a request to a web cache just the same way as it forwards a request to a web server. The benefits of using a web cache include saving bandwidth, reducing server load, and improving request-response time. If a dedicated content store is required, the caching and the server are typically encapsulated into a content server. This is quite the opposite of the object storage paradigm, where replicated objects are served directly from the store. The distinction here is that there are two layers of functions: the first is the gateway layer, which solves distribution using techniques such as caching, asset copying, and load balancing; the second is the compute-and-storage bundling in the form of a server or a store, with shifting emphasis between code and storage.
The two layers need to adhere to the end-to-end principle, which is best done with a DiffServ-style paradigm.

Thursday, August 16, 2018

We were discussing the suitability of Object Storage to various workloads. Specifically, we discussed its role in Artifactory, which is used to store binary objects from CI/CD. A large number of binary objects or files gets generated with each run of the build. These files are mere build artifacts, and the most common usage of these files is download. Since the hosted solution is cloud based, Artifactory users demand elasticity, durability, and HTTP access. Object Storage is best suited to meet these demands. The emphasis here is the distinction over a file system exposed over the web for this primary use case. In fact, the entire DevOps process with its CI/CD servers can use a common Object Storage instance so that there is little if any copying of files from one staging directory to another. The Object Storage not only becomes the final single-instance destination but also avoids significant inefficiencies in the DevOps processes. Moreover, builds are repeated through development, testing, and production, so the same solution works very well for repetitions. This is not just a single use case but an indication that there are many cycles within the DevOps process that could benefit from a storage tier such as Object Storage. Static content, like binary images of executables, is generally copy-on-write. Versioning for same-named files is a feature of Object Storage, which can not only serve file exports but also provide automatic versioning of content. It becomes a content library for binary build artifacts, with the features demanded of a file system, such as versioning. Previous versions may be retained for as long as the life-cycle rules allow; these rules can be specified per object. It can also provide time-limited access to content: the URI exposed for the object can be shared with anyone, and the object may be downloaded on any device, anywhere.
It also enables multi-part upload (MPU) of large objects. This is considered a significant improvement for large binary objects since it enables binary transfer in parts. There are three steps: an MPU upload is initiated, the different parts are uploaded, and finally an MPU complete is requested. The object storage constructs the object from the parts, after which it can be accessed just the same as any other object. Each part is identified by a part number included in its upload request, and parts can number in the hundreds. The object storage returns an entity-tag (ETag) header for each part; the ETag and part number must be included in subsequent requests. The parts can be sent to object storage in any order. A complete request or an abort request must be sent to finalize the parts, permitting the object storage to start reconstructing the object and removing the parts. The parts uploaded so far can be listed; if there are more than 1000, a series of such list requests must be sent. Multi-part uploads can be concurrent. If a part is sent again, it overwrites the already uploaded part. All the parts are used for reconstruction of the original object only after the complete request is received.
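The three-step sequence maps directly onto the S3 multi-part upload API; a minimal sketch with boto3, using hypothetical bucket and artifact names:
import boto3

s3 = boto3.client('s3')
bucket, key = 'build-artifacts', 'app-1.2.3.tar.gz'  # hypothetical names

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)   # step 1: initiate
parts, number = [], 1
with open('app-1.2.3.tar.gz', 'rb') as artifact:
    while True:
        chunk = artifact.read(8 * 1024 * 1024)  # every part except the last must be >= 5 MB
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                              PartNumber=number, Body=chunk)      # step 2: upload parts
        parts.append({'PartNumber': number, 'ETag': resp['ETag']})  # keep the ETag per part
        number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})  # step 3: complete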

Wednesday, August 15, 2018

We were discussing the suitability of Object Storage to various workloads and the programmability convenience that enables migration of old and new workloads. We discussed the use of UI as well as SDK for ingesting data.
Let us now consider the usage of object storage for powering web applications. Static resources and files for a web application can be served directly out of object storage. There are many web applications that need to serve a portion of the file system over the web due to a large number of artifacts; these are ideal for Object Storage. Consider Artifactory, a leading hosted solution for all things binary. It is a perfect match for code repositories and aids CI/CD. A large number of binary objects or files gets generated with each run of the build. These files are mere build artifacts, and the most common usage of these files is download. Since the hosted solution is cloud based, Artifactory demands elasticity, durability, and HTTP access, and Object Storage provides a suitable platform for just these. The emphasis here is the suitability of Object Storage over a filesystem for the primary use case. In fact, the entire DevOps process with its CI/CD servers can use a common Object Storage instance so that there is little if any copying of files from one staging directory to another. The Object Storage not only becomes the final single-instance destination but also avoids significant inefficiencies in the DevOps processes. Moreover, builds are repeated through development, testing, and production, so the same solution works very well for repetitions. This is not just a single use case but an indication that there are many cycles within the DevOps process that could benefit from a storage tier such as Object Storage.
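For instance, serving a static resource directly out of object storage is just an upload with the right content type; a minimal sketch assuming the boto3 S3 client and a hypothetical 'web-assets' bucket:
import boto3

s3 = boto3.client('s3')
# hypothetical bucket and key; ContentType lets browsers render the asset directly
with open('site.css', 'rb') as stylesheet:
    s3.put_object(Bucket='web-assets', Key='css/site.css',
                  Body=stylesheet.read(), ContentType='text/css')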
#codingexercise
bool isDivisibleBy35(uint n)
{
    // assumes helpers of the form: bool isDivisibleBy5(uint n) { return n % 5 == 0; }
    return isDivisibleBy5(n) && isDivisibleBy7(n);
}

Tuesday, August 14, 2018

We were discussing the suitability of Object Storage to various workloads and the programmability convenience that enables migration of old and new workloads. In particular, we discussed connectors for various data sources and their bidirectional data transfer. Duplicity is a command-line example of a connector tool, but we were discussing the availability of an SDK with the object storage. Writing a connector for each data source is very much like an input-output model: the data flows either from the external source to object storage or from object storage to the external source. In each of these directions, a connector changes only with the type of external source; the object-storage-facing part of the connector is already implemented in the form of the S3 APIs for read and write. The API varies only on the data-source side. This makes it easy to write a connector as an amalgam of a source-facing API for bidirectional data transfer and the Object-Storage-facing S3 APIs. A read from the external data source is written to Object Storage with the S3 put API, and a write to the external data destination takes data from Object Storage with the S3 get API. Since each connector varies by the type of external data platform, connectors can be written one per data platform so that each is easier to use with its platform. Also, SDKs facilitate development by providing language-based convenience; therefore, the same connector SDK may be offered in more than one language.
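The shape of such a connector might look like the following sketch, where only the source-facing callables change per data platform (read_from_source and write_to_destination are hypothetical stand-ins) and the object-storage side is plain boto3 S3 calls:
import boto3

s3 = boto3.client('s3')

def ingest(read_from_source, bucket, key):
    # source-facing read paired with the S3 put
    s3.put_object(Bucket=bucket, Key=key, Body=read_from_source())

def export(write_to_destination, bucket, key):
    # S3 get paired with the destination-facing write
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    write_to_destination(body)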
SDKs may be offered in any language for the convenience of writing data transfer in any environment. It does not stop there: a UI widens the audience for the same purposes and brings in administrators and systems engineers without the need for writing scripts or code. ETL, for example, is a very popular usage of designer tools, with drag-and-drop logic facilitating the wiring and transfer of data. The SDK may power the UI as well, and both can be adapted to the data source, environment, and tasks.
#codingexercise
bool isDivisibleBy55(uint n)
{
    return isDivisibleBy5(n) && isDivisibleBy11(n);
}
bool isDivisibleBy77(uint n)
{
    return isDivisibleBy7(n) && isDivisibleBy11(n);
}

Monday, August 13, 2018

We were discussing the suitability of Object Storage to various workloads
We said that the connectors for these data sources are not offered out of object storage products, but they could immensely benefit data ingestion. The S3 API deals exclusively with the namespace, buckets, and objects, even when the APIs are made available as part of an SDK, so something more is needed for the connectors.
Writing a connector for each data source is very much like an input-output model: the data flows either from the external source to object storage or from object storage to the external source. In each of these directions, a connector changes only with the type of external source; the object-storage-facing part is already implemented in the form of the S3 APIs for read and write. The API varies only on the data-source side. This makes it easy to write a connector as an amalgam of a source-facing API for bidirectional data transfer and the Object-Storage-facing S3 APIs. A read from the external data source is written to Object Storage with the S3 put API, and a write to the external data destination takes data from Object Storage with the S3 get API. Since each connector varies by the type of external data platform, connectors can be written one per data platform so that each is easier to use with its platform. Also, SDKs facilitate development by providing language-based convenience; therefore, the same connector SDK may be offered in more than one language.
The connectors are just one example of the programmability convenience for data ingestion from different workloads. Specifying metadata for the objects and showing sample queries on object storage as part of the SDK is another convenience for developers using Object Storage. Well-written examples in the SDK and documentation that ease search and analytics associated with Object Storage will tremendously help the advocacy of Object Storage in different software stacks and offerings. Moreover, it will be helpful to log all activities of the SDK for data and queries so that these can make their way to a log store for convenient audit and log analysis. Using the SDK to improve automatic tagging and logging is a powerful technique for improving usability and maintaining history.
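As one way to log SDK activity, boto3 exposes botocore's event system, which can attach a handler before each call; a minimal sketch, with the handler and the choice of event illustrative rather than prescriptive:
import logging
import boto3

logging.basicConfig(level=logging.INFO)
s3 = boto3.client('s3')

def log_put(params, **kwargs):
    # record what the SDK is about to send, for audit and log analysis
    logging.info('S3 PutObject: bucket=%s key=%s',
                 params.get('Bucket'), params.get('Key'))

# botocore's event system fires this before each PutObject call
s3.meta.events.register('provide-client-params.s3.PutObject', log_put)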
#codingexercise
bool isDivisibleBy22(uint n)
{
    return isDivisibleBy2(n) && isDivisibleBy11(n);
}
bool isDivisibleBy33(uint n)
{
    return isDivisibleBy3(n) && isDivisibleBy11(n);
}

Sunday, August 12, 2018

We were discussing the suitability of Object Storage to various workloads after having discussed its advantages and its position as a perfect storage tier:
The data sources can include:
Backup and restore workflows
Data warehouse ETL loads
Log stores and indexes
Multimedia libraries
Other file systems
Relational database connections
NoSQL databases
Graph databases
All upstream storage appliances excluding aging tiers.
Notice that the connectors for these data sources are not offered out of object storage. In reality, the S3 API deals exclusively with the namespace, buckets, and objects, even when the APIs are made available as part of an SDK.
Writing a connector for each data source is very much like an input-output model: the data flows either from the external source to object storage or from object storage to the external source. In each of these directions, a connector changes only with the type of external source; the object-storage-facing part is already implemented in the form of the S3 APIs for read and write. The API varies only on the data-source side. This makes it easy to write a connector as an amalgam of a source-facing API for bidirectional data transfer and the Object-Storage-facing S3 APIs. A read from the external data source is written to Object Storage with the S3 put API, and a write to the external data destination takes data from Object Storage with the S3 get API. Since each connector varies by the type of external data platform, connectors can be written one per data platform so that each is easier to use with its platform. Also, SDKs facilitate development by providing language-based convenience; therefore, the same connector SDK may be offered in more than one language.



#codingexercise
bool isDivisibleBy14(uint n)
{
    return isDivisibleBy2(n) && isDivisibleBy7(n);
}