Wednesday, August 29, 2018

We discussed that a gateway is supposed to distribute the traffic. It works exceptionally well when it routes requests to on-premises or cloud object stores. The on-premises store helps with closer access to data. The same concept may apply to geographical distribution of similar content, where each object storage serves a specific region. In this case replication may need to be set up between the different object storage instances. We could leverage an object storage replication group to do automatic replication. It might be considered a bottleneck if the same object storage is used. This is different from redirecting requests to separate servers/caches. However, shared services may offer a service level agreement on par with an individual service for requests. Since a gateway will not see a performance degradation when sending to a proxy server or a shared dedicated store, it works in both these cases. Replacing a shared dedicated store with a shared dedicated storage such as an Object Storage is therefore also a practical option. Moreover, a cache generally improves performance over what might have been incurred in going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches with a dedicated cache.
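To make the replication idea concrete, here is a minimal sketch of a replication group that stores content once and fans it out to the object storage endpoint of every participating region, so each region can serve reads locally. The ObjectStoreClient interface and the region-keyed map are hypothetical placeholders rather than any particular product's API.
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicationGroup {

    // Hypothetical client abstraction over a regional object storage endpoint.
    public interface ObjectStoreClient {
        void put(String bucket, String objectName, byte[] data);
        byte[] get(String bucket, String objectName);
    }

    private final Map<String, ObjectStoreClient> storesByRegion = new LinkedHashMap<>();

    public void addRegion(String region, ObjectStoreClient store) {
        storesByRegion.put(region, store);
    }

    // Store once; the group replicates the object to every member region.
    public void put(String bucket, String objectName, byte[] data) {
        for (ObjectStoreClient store : storesByRegion.values()) {
            store.put(bucket, objectName, data);
        }
    }

    // Read from the caller's region if it is a member, otherwise fall back
    // to the first region in the group.
    public byte[] get(String region, String bucket, String objectName) {
        ObjectStoreClient store = storesByRegion.getOrDefault(
                region, storesByRegion.values().iterator().next());
        return store.get(bucket, objectName);
    }
}
A gateway in front of such a group would only need to pick the caller's region when serving a read.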
There is a tradeoff when we address gateway logic, replication logic, and storage server logic independently. While it is modular to visualize each layer as a separation of concerns, there is no necessity to house them in different products. Moreover, they can be viewed together as storage server logic, and this can be moved into the storage server. The tradeoff is that when these layers are consolidated, they do not facilitate testing. Moreover, they become more dedicated towards the storage and leave the onus on the owner to make copies of the content as necessary for the geographical regions. However, we argued that the storage and replication are handled well within object storage and what was missing was just the gateway feature. This gateway feature can be made extensible, but it would be sufficient to enable the user to store once, have the same content made available from each geographical region, and have the request routed to the nearest geographical region. Further, the address translation need not be specific to a region; it can be made granular to objects. If we take as an example the url exposed by an object storage for the endpoint of an object over http, it usually has a namespace, bucket and object name as its hierarchy. This is the only input from the user, and this component does not change. However, where the gateway rules previously translated the server address, they can now translate the object naming hierarchy to the nearest site, as sketched below.
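The following is a minimal sketch of that translation, assuming a hypothetical table of site names to endpoints: the namespace/bucket/object portion of the url is preserved and only the host is rewritten to the endpoint of the nearest site.
import java.net.URI;
import java.util.Map;

public class ObjectUrlTranslator {

    // Hypothetical table mapping a site name to its http endpoint.
    private final Map<String, String> endpointBySite;

    public ObjectUrlTranslator(Map<String, String> endpointBySite) {
        this.endpointBySite = endpointBySite;
    }

    // universalUrl: e.g. http://store.example.com/namespace/bucket/object
    public URI toNearestSite(URI universalUrl, String nearestSite) {
        String endpoint = endpointBySite.get(nearestSite);
        // Keep the object naming hierarchy; only the authority changes.
        return URI.create("http://" + endpoint + universalUrl.getPath());
    }
}
For example, a request for http://store.example.com/ns1/bucket1/object1 from a caller nearest to a site named "west" would be rewritten to the "west" endpoint with the same /ns1/bucket1/object1 path; the site names and host names here are illustrative only.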

Tuesday, August 28, 2018

We discussed that a gateway is supposed to distribute the traffic. If it sends it to the same single point of contention, it is not very useful. When requests are served from separate caches, the performance generally improves over what might have been incurred in going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches with a dedicated cache. This does not necessarily mean a single point of contention. Shared services may offer a service level agreement on par with an individual service for requests. Since a gateway will not see a performance degradation when sending to a proxy server or a shared dedicated cache, it works in both these cases. Replacing a shared dedicated cache with a shared dedicated storage such as an Object Storage is therefore also a practical option.
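A minimal cache-aside sketch of that consolidation follows: every proxy consults the shared cache first and only falls back to the backend store on a miss. The Backend interface is a hypothetical placeholder, and a real deployment would use a dedicated cache service such as AppFabric rather than an in-memory map.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConsolidatedCache {

    // Hypothetical backend that represents the object storage behind the gateway.
    public interface Backend {
        byte[] fetch(String key);
    }

    private final Map<String, byte[]> sharedCache = new ConcurrentHashMap<>();
    private final Backend backend;

    public ConsolidatedCache(Backend backend) {
        this.backend = backend;
    }

    public byte[] get(String key) {
        // computeIfAbsent goes to the backend only on a cache miss
        // and stores the result for subsequent proxies to reuse.
        return sharedCache.computeIfAbsent(key, backend::fetch);
    }
}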
While gateways route requests, they could be replaced with a networking layer that enables a P2P network of different object storage instances, which could be on-premises or in the cloud. A distributed hash table in this case determines the store to go to. The location information for the data objects is deterministic, as the peers are chosen with identifiers corresponding to the data object's unique key. Content therefore goes to specified locations, which makes subsequent requests easier. Unstructured P2P is composed of peers joining based on some rules and usually without any knowledge of the topology. In this case the query is broadcast and peers that have matching content return the data to the originating peer. This is useful for highly replicated items. P2P provides a good base for large scale data sharing. Some of the desirable features of P2P networks include selection of peers, redundant storage, efficient location, hierarchical namespaces, authentication as well as anonymity of users. In terms of performance, P2P has desirable properties such as efficient routing, self-organization, massive scalability, robustness in deployments, fault tolerance, load balancing and explicit notions of locality. Perhaps the biggest takeaway is that P2P is an overlay network with no restriction on size, and there are two classes: structured and unstructured. Structured P2P means that the network topology is tightly controlled and the content is placed not on random peers but at specified locations, which makes subsequent requests more efficient.
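A minimal consistent-hashing sketch of the structured approach follows: the peer that stores an object is derived deterministically from the object's key, so any node can compute where content lives without broadcasting a query. The MD5-based hash and the ring layout are illustrative assumptions rather than any particular DHT's scheme.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRing {

    private final SortedMap<Long, String> ring = new TreeMap<>();

    // Place a peer on the ring at a position derived from its identifier.
    public void addPeer(String peer) {
        ring.put(hash(peer), peer);
    }

    // The first peer at or after the key's position on the ring owns the key.
    public String peerFor(String objectKey) {
        long h = hash(objectKey);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    // Fold the first eight bytes of an MD5 digest into a long position.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}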


Monday, August 27, 2018

We were discussing anecdotal quotes from industry experts on gateway for object storage.
They cited gateways for object storage as provided by public cloud providers. This is a convenience for using on-premises and cloud storage together, which shows that there is value in this proposition. In addition, our approach is novel in using it for a Content Distribution Network and by proposing it to be built into the object storage as a service.
Some experts argued that a gateway is practical only for small and medium businesses which are small scale in their requirements. This means that gateways are stretched at large scale, whereas object storage deployments are not necessarily restricted in size. These experts argued that the problem with a gateway is that it adds more complexity and limits performance.
When gateways solve problems where data does not have to move, they are very appealing to many usages across the companies that use cloud providers. Several vendors have raced to find this niche. In our case, the http references to copies of objects versus the same object are a way to do just that. With object storage not requiring any maintenance or administration and providing the ability to store as much content as necessary, this gateway service becomes useful for content distribution network purposes.
Some experts commented that public cloud storage gateways are able to mirror a volume to a cloud but they are still just building blocks in the cloud. They do not scale capacity or share data across multiple locations. This is exactly what we try to do with a gateway from the object storage.
A gateway is supposed to distribute the traffic. If it sends it to the same single point of contention, it is not very useful. When requests are served from separate caches, the performance generally improves over what might have been incurred in going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches with a dedicated cache. This does not necessarily mean a single point of contention. Shared services may offer a service level agreement on par with an individual service for requests. Since a gateway will not see a performance degradation when sending to a proxy server or a shared dedicated cache, it works in both these cases. Replacing a shared dedicated cache with a shared dedicated storage such as an Object Storage is therefore also a practical option.
#codingexercise
print all the combinations of a string in sorted order
void PrintSortedCombinations(String a)
{
  // sort the characters first so that combinations are printed in sorted order
  char[] chars = a.toCharArray();
  java.util.Arrays.sort(chars);
  PrintCombinations(new String(chars));
  // uses the Combine() method implemented earlier
}

Sunday, August 26, 2018

Anecdotal quotes from industry on gateway for object storage.
We know gateways for object storage are provided as a convenience by public cloud providers. Therefore, there is value in that proposition. In addition, we also have a novel approach in using it for a Content Distribution Network and by proposing it to be built into the object storage as a service. Today we use anecdotal quotes from industry in this regard.
They mention that gateways help connect systems that would otherwise require a lot of code to wire the APIs for data flow. This arduous task of rewriting applications to support web interfaces applies to those who want to migrate to different object storage stacks. It does not really apply in our case.
Some experts argued that a gateway is practical only for small and medium businesses which are small scale in their requirements. This means that gateways are stretched at large scale, whereas object storage deployments are not necessarily restricted in size. Object storage is true cloud storage. These experts point out that object storage is best for backup and archiving. Tools like duplicity use S3 apis to persist in object storage, and in this category we include any workflow for backup and archiving. These workflows do not require modifications of data objects, and this makes object storage perfect for them. These experts argued that the problem with a gateway is that it adds more complexity and limits performance. It is not used with primary storage applications, which are more read and write intensive and do not tolerate latency. Some even argued that the gateway is diminished in significance when the object storage itself is considered raw.
On the other hand, other experts argued that gateways give predictable performance between on-premises infrastructure and a public cloud storage provider. They offer easy integration into existing infrastructure and the ability to integrate on a protocol-by-protocol basis. This may be true for cloud gateways in general, but our emphasis was on virtual http endpoints within a single object storage.
When gateways solve problems where data does not have to move, they are very appealing to many usages across the companies that use cloud providers. Several vendors have raced to find this niche. In our case, the http references to copies of objects versus the same object are a way to do just that. With object storage not requiring any maintenance or administration and providing the ability to store as much content as necessary, this gateway service becomes useful for content distribution network purposes.
Some experts commented that public cloud storage gateways are able to mirror a volume to a cloud but they are still just building blocks in the cloud. They do not scale capacity or share data across multiple locations. This is exactly what we try to do with a gateway from the object storage.



Saturday, August 25, 2018

We said we could combine gateway and http proxy services within the object storage for the site specific http addresses of objects. The gateway also acts as an http proxy. Any implementation of the gateway has to maintain a registry of destination addresses. As http-access-enabled objects proliferate with their geo-replications, this registry becomes granular at the object level while enabling rules to determine the site from which they need to be accessed. Finally, it gathers statistics in terms of accesses and metrics, which come in very useful for understanding the http accesses of specific content within the object storage.
Both of the above functionalities can be elaborated, allowing the gateway service to provide immense benefit per deployment.
The advantages of an http proxy include aggregation of usage. In terms of success and failure, there can be detailed counts of calls. Moreover, the proxy could include all the features of a conventional http service like Mashery, such as client-based caller information, destination-based statistics, per-object statistics, categorization by cause and many other features, along with a RESTful api service for the statistics gathered.
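A minimal sketch of such per-object counters follows, assuming the proxy records each call keyed by the object's address; the RESTful api that would expose these numbers and the finer-grained categories are left out.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class AccessStats {

    // Success and failure counts keyed by object address.
    private final Map<String, AtomicLong> successByObject = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> failureByObject = new ConcurrentHashMap<>();

    // Called by the proxy once per completed request.
    public void record(String objectAddress, boolean success) {
        Map<String, AtomicLong> counters = success ? successByObject : failureByObject;
        counters.computeIfAbsent(objectAddress, k -> new AtomicLong()).incrementAndGet();
    }

    public long successes(String objectAddress) {
        return successByObject.getOrDefault(objectAddress, new AtomicLong()).get();
    }

    public long failures(String objectAddress) {
        return failureByObject.getOrDefault(objectAddress, new AtomicLong()).get();
    }
}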

Friday, August 24, 2018

We were saying there are advantages to writing Gateway Service within Object Storage. These included:
First, the address mapping is not at site level. It is at object level.  
Second, the addresses of the object, both universal and site-specific, are maintained along with the object as part of its location information.
Third, instead of internalizing a table of rules from the external gateway, a lookup service can translate the universal object address to the address of the nearest copy of the object. This service is part of the object storage as a read-only query. Since object name and address lookup is already existing functionality, we only add the ability to translate the universal address to the site-specific address at the object level.
Fourth, the gateway functionality exists as a microservice. It can do more than static lookup of the physical location of an object given a universal address instead of the site-specific address. It has the ability to generate tiny urls for the objects based on hashing. This adds aliases to the address as opposed to the conventional domain-based address. The hashing is at the object level, and since we can store billions of objects in the object storage, a url shortening feature is a significant offering from the gateway service within the object storage. It has the potential to morph into other services than a mere translator of object addresses. The design of a url hashing service was covered earlier; a minimal sketch appears after this list.
Fifth, the conventional gateway functionality of load balancing can also be handled with an elastic scale-out of just the gateway service within the object storage.  
Sixth, this gateway can also improve access to the object by making more copies of the object elsewhere and adding the superfluous mapping for the duration of the traffic. It need not even interpret the originating ip addresses to determine the volume as long as it can keep track of the number of read requests against existing address of the same object.  
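As referenced in the fourth point, here is a minimal sketch of the url hashing: the object's universal address is hashed and base-62 encoded to produce a short alias. Collision handling and the alias-to-address registry are omitted, and the 7-character length is an arbitrary assumption.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TinyUrl {

    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static String shorten(String universalAddress) {
        byte[] digest = sha256(universalAddress);
        // Fold the first eight bytes of the digest into a non-negative number.
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (digest[i] & 0xff);
        }
        value &= Long.MAX_VALUE;
        // Base-62 encode and keep 7 characters as the alias.
        StringBuilder alias = new StringBuilder();
        while (alias.length() < 7) {
            alias.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        return alias.toString();
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}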
In addition, this gateway service within the object storage may be written in a form that allows rules to be customized. Moreover, rules need not be written in the form of declarative configuration. They can be dynamic, in the form of a module. As a forwarder, a gateway may leverage rules that are determined by the deployment. Expressions for rules may include features that can be borrowed from IPSec rules. These are well-known rules that govern whether a connection over the Internet may be permitted into a domain.
With the help of a classifier, these rules may even be evaluated dynamically. 
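A minimal sketch of rules as modules follows: each rule is ordinary code, a predicate over the incoming request, and the gateway forwards only when every registered rule permits it. The Request fields loosely mirror what IPSec-style rules examine; all names here are hypothetical.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RuleEngine {

    // A request as seen by the gateway; only the fields rules care about.
    public static class Request {
        public final String sourceAddress;
        public final String destinationSite;

        public Request(String sourceAddress, String destinationSite) {
            this.sourceAddress = sourceAddress;
            this.destinationSite = destinationSite;
        }
    }

    private final List<Predicate<Request>> rules = new ArrayList<>();

    // Rules are ordinary code, so they can be added or swapped at runtime.
    public void addRule(Predicate<Request> rule) {
        rules.add(rule);
    }

    // Forward only when every registered rule permits the request.
    public boolean permit(Request request) {
        return rules.stream().allMatch(rule -> rule.test(request));
    }
}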
The gateway also acts as an http proxy. Any implementation of the gateway has to maintain a registry of destination addresses. As http-access-enabled objects proliferate with their geo-replications, this registry becomes granular at the object level while enabling rules to determine the site from which they need to be accessed. Finally, it gathers statistics in terms of accesses and metrics, which come in very useful for understanding the http accesses of specific content within the object storage.
Both of the above functionalities can be elaborated, allowing the gateway service to provide immense benefit per deployment.

Thursday, August 23, 2018

We were discussing gateway-like functionality from object storage. While a gateway maintains address mapping for several servers, where routes translate to a physical destination based on, say, regex, here we give each object the ability to record its virtual canonical address along with its physical location, so that each object and its geographically replicated copies may be addressed specifically. When an object is accessed by its address, the gateway used to forward the request to the concerned site based on a set of static rules, say at the web server and usually based on regex. Instead, with the gateway functionality now merged into the object storage, there are a few advantages that come our way:
First, the address mapping is not at site level. It is at object level.  
Second, the addresses of the object, both universal and site-specific, are maintained along with the object as part of its location information.
Third, instead of internalizing a table of rules from the external gateway, a lookup service can translate the universal object address to the address of the nearest copy of the object. This service is part of the object storage as a read-only query. Since object name and address lookup is already existing functionality, we only add the ability to translate the universal address to the site-specific address at the object level.
Fourth, the gateway functionality exists as a microservice. It can do more than static lookup of the physical location of an object given a universal address instead of the site-specific address. It has the ability to generate tiny urls for the objects based on hashing. This adds aliases to the address as opposed to the conventional domain-based address. The hashing is at the object level, and since we can store billions of objects in the object storage, a url shortening feature is a significant offering from the gateway service within the object storage. It has the potential to morph into other services than a mere translator of object addresses. The design of a url hashing service was covered earlier.
Fifth, the conventional gateway functionality of load balancing can also be handled with an elastic scale-out of just the gateway service within the object storage.  
Sixth, this gateway can also improve access to the object by making more copies of the object elsewhere and adding the superfluous mapping for the duration of the traffic. It need not even interpret the originating ip addresses to determine the volume as long as it can keep track of the number of read requests against existing address of the same object.  
These advantages can improve the usability of the objects and their copies by providing as many copies as needed, along with a scalable service that can translate the incoming universal address of an object to site-specific location information.
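A minimal sketch of that translation service follows: for every object, the store keeps its universal address together with the site-specific addresses of its replicas, and the read-only lookup resolves the universal address to the copy at the caller's site. Reducing "nearest" to a simple site-name match is a simplifying assumption; real proximity logic is out of scope here.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AddressLookup {

    // universal address -> (site -> site-specific address)
    private final Map<String, Map<String, String>> locations = new ConcurrentHashMap<>();

    // Record a replica's site-specific address under the object's universal address.
    public void register(String universalAddress, String site, String siteSpecificAddress) {
        locations.computeIfAbsent(universalAddress, k -> new ConcurrentHashMap<>())
                 .put(site, siteSpecificAddress);
    }

    // Read-only query: prefer the caller's own site, otherwise return any available copy.
    public String resolve(String universalAddress, String callerSite) {
        Map<String, String> copies = locations.get(universalAddress);
        if (copies == null || copies.isEmpty()) {
            return null; // unknown object
        }
        return copies.getOrDefault(callerSite, copies.values().iterator().next());
    }
}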