Saturday, September 1, 2018

Distributed Gateways
This post answers the question: if the gateway is a service within the object storage, can gateways be chained across object storages? Along the lines of the previous question, if the current object storage cannot resolve the address of an object in its storage pools, can it distribute the query to another object storage? Such questions imply that the resolver merely needs to forward the queries it cannot answer to a default, pre-registered outbound destination. In a distributed gateway, each store can interpret a query from its namespace-bucket-object hierarchy alone and decide whether the request belongs to it. If it does not, the store simply forwards the request to another object storage. This differs from the original notion that the address is opaque to the user and has no interpretable part that determines the site to which the object belongs. The linked object storage does not even need to spend time searching its store to see whether an object exists. It can merely translate the address, with the help of a registry, to know whether the address belongs to it. This shallow lookup means a request can be forwarded faster to another linked object storage and ultimately to the one where the object can be found. Linked storage does not require the object stores to be similar: as long as the forwarding logic is enabled, each store can use any implementation for translation, lookup and return. The need for an interpretable address could have been avoided entirely if the opaque addresses were hashes and the destination object storage were determined from a hash table. Whether we use routing tables or a static hash table, the networking over the object storage can be its own layer that facilitates request resolution across different object storages.
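A minimal sketch of the shallow lookup and forwarding described above, in Java. It assumes a hypothetical `LinkedStore` class whose registry is just the set of namespaces it owns, a single default outbound store in `next`, and a hop limit (also an assumption) so that a ring of linked stores cannot forward a request forever. Only the namespace segment of the address is inspected; the store never searches its pools.

```java
import java.util.Set;

// Hypothetical linked object storage: its registry lists the namespaces it owns,
// and anything else is forwarded to one pre-registered outbound store.
class LinkedStore {
    final String name;
    final Set<String> ownedNamespaces;   // registry entries for this store (assumed)
    LinkedStore next;                    // default outbound destination (assumed)

    LinkedStore(String name, Set<String> ownedNamespaces) {
        this.name = name;
        this.ownedNamespaces = ownedNamespaces;
    }

    // Shallow lookup on "namespace/bucket/object": only the namespace is checked,
    // so the decision to forward never touches the storage pools.
    String resolve(String address, int hopsLeft) {
        String namespace = address.split("/", 2)[0];
        if (ownedNamespaces.contains(namespace)) {
            return name;                 // this store is responsible for the address
        }
        if (next == null || hopsLeft == 0) {
            return null;                 // no route: end of chain or hop limit reached
        }
        return next.resolve(address, hopsLeft - 1);
    }
}
```

For example, with `siteA` owning `ns1` and forwarding to `siteB`, which owns `ns2`, `a.resolve("ns2/bucket/obj", 4)` returns `"siteB"` after one hop, while an address in an unregistered namespace returns `null`.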

Friday, August 31, 2018

The gateway as a classifier.
The rules of a gateway need not be mere regex translations of an incoming address to another site-specific address. We are dealing with objects, and every part of the object endpoint address, such as the hierarchical namespace – bucket – object, may be translated to an altogether different address that points to the same copy of the object. For that matter, hashes of web addresses may be translated so that the caller needs only a tiny URL to access an object, while internally the same copy of the object is served at lightning speed from site-specific buckets. We are not just putting the gateway on steroids; we are also making it smarter by allowing the user to customize the rules. These rules can be authored as expressions and statements, much like a program with many if-then conditions ordered by their execution sequence. The gateway does more than an HTTP proxy or a message queue server. It looks up objects without sacrificing performance and without restricting how objects are organized within a store or across distributed stores. It works much like a router, and although we have referred to the gateway as a networking layer over storage, it provides a query execution service as well. All the queries are similar in nature: they are mostly web addresses of objects. The storage server only knows about the three internal copies of an object that it keeps for durability. These copies share the same address, and different objects have different web addresses. What a storage server considers different objects may even be the same object to the user. How the user organizes objects in namespaces and buckets may follow her own rules, which go beyond site replication. If the gateway can route requests for the same object to different sites, nothing prevents the gateway from letting the user add custom rules that use this address translation for purposes other than geography-based content distribution.
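The ordered if-then rules could be modeled as below. This is a sketch under assumptions: `RuleGateway`, `addRule` and `route` are hypothetical names, a rule is a plain predicate plus a rewrite function over the address string, and the first matching rule wins, mirroring an ordered chain of if-then conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical user-customizable gateway: rules run in the order they were added,
// and the first rule whose condition matches decides the translated address.
class RuleGateway {
    private static final class Rule {
        final Predicate<String> when;             // condition on the incoming address
        final Function<String, String> rewrite;   // translation applied when it matches
        Rule(Predicate<String> when, Function<String, String> rewrite) {
            this.when = when;
            this.rewrite = rewrite;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Appends a rule; returning this lets rules be chained in execution order.
    RuleGateway addRule(Predicate<String> when, Function<String, String> rewrite) {
        rules.add(new Rule(when, rewrite));
        return this;
    }

    // Returns the translated address, or empty when no rule matches.
    Optional<String> route(String address) {
        for (Rule r : rules) {
            if (r.when.test(address)) return Optional.of(r.rewrite.apply(address));
        }
        return Optional.empty();
    }
}
```

Because evaluation stops at the first match, a specific rule such as a tiny-URL prefix should be added before a general rule that covers a whole namespace.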
Fundamentally, a separate address for each object does not help the customer when she wants to hand out one address for content that is served by two or more identical objects. Where those objects are located and how the address translation works may be based on static, site-based routing via regex or on dynamic routing driven by rules and programs. Moreover, the gateway can interpret aliases of addresses, which the object storage cannot.
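A sketch of alias interpretation, assuming a hypothetical `AliasResolver` that maps one user-facing alias to several copies of the same content, each tagged with the site that holds it. Preferring the caller's site and falling back to the first registered copy is an illustrative policy, not a prescribed one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical alias table: one alias handed out to users resolves to one of
// several site-specific copies of the same content.
class AliasResolver {
    // A copy of the content: the site that holds it and its full object address.
    static final class Copy {
        final String site;
        final String address;
        Copy(String site, String address) { this.site = site; this.address = address; }
    }

    private final Map<String, List<Copy>> aliases = new HashMap<>();

    void register(String alias, List<Copy> copies) { aliases.put(alias, copies); }

    // Prefer the copy at the caller's site; otherwise fall back to the first copy.
    String resolve(String alias, String callerSite) {
        List<Copy> copies = aliases.get(alias);
        if (copies == null || copies.isEmpty()) return null;   // unknown alias
        for (Copy c : copies) {
            if (c.site.equals(callerSite)) return c.address;
        }
        return copies.get(0).address;
    }
}
```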

Thursday, August 30, 2018

The case of the Cloud Gateways for storage. 
Some view the cloud gateway as a device that can be placed on the customer’s premises to translate low-level file commands into high-level HTTP requests against cloud storage. Public cloud providers stretch the notion further by saying the gateway is provided from the cloud. Gateways offer easy integration into existing infrastructure because they route requests between storage options. Direct integration can be very expensive, since it requires code that manipulates the APIs for create, update and delete. Gateways, on the other hand, act as adapters and do away with the cost of integration by leveraging existing commands.
Others use gateways to segregate their workloads. Not every store in an organization is used uniformly, and gateways help consolidate the infrastructure behind a common entrypoint. This lets users keep the constructs they already have while allowing planners to separate the storage into high- and low-usage cases.
Cloud gateways can also be used for heterogeneous stores, where data on one storage need not be replicated to another as long as both are accessible from the same common entrypoint.
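A sketch of such a common entrypoint, assuming addresses of the form `namespace/bucket/object` and a hypothetical per-bucket assignment table that planners fill in; the backend URLs are placeholders. Callers keep using the same address, and only the table decides whether a bucket is served by a high-usage on-premise store or a low-usage cloud store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common entrypoint: one address form for callers, with a
// planner-maintained table choosing the backend store for each bucket.
class EntryPoint {
    private final Map<String, String> bucketToBackend = new HashMap<>();
    private final String defaultBackend;

    EntryPoint(String defaultBackend) { this.defaultBackend = defaultBackend; }

    // Assign a bucket to a backend, e.g. an on-premise store for hot data.
    void assign(String bucket, String backendUrl) { bucketToBackend.put(bucket, backendUrl); }

    // Rewrites "namespace/bucket/object" to "<backend>/namespace/bucket/object".
    String route(String path) {
        String[] parts = path.split("/", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected namespace/bucket/object: " + path);
        }
        String backend = bucketToBackend.getOrDefault(parts[1], defaultBackend);
        return backend + "/" + path;
    }
}
```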
Regardless of what a gateway means to someone, its utility has universal appeal. Gateways distribute traffic. They work exceptionally well when they route requests to on-premise or cloud object stores. An on-premise store keeps data closer to where it is accessed. The same concept applies to geographical distribution of similar content, where each object storage serves a specific region. In this case, replication may need to be set up between the different object storages; we could leverage an object storage replication group to replicate automatically. Using the same object storage for every region might be considered a bottleneck. This is different from redirecting requests to separate servers or caches. However, a shared service may offer a service level agreement on par with that of an individual service. Since a gateway will not see a performance degradation when sending to a proxy server or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage such as an Object Storage is therefore also a practical option. Moreover, a cache generally improves performance over what would have been incurred by going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches into a dedicated cache.
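Each proxy behind the gateway keeping its own cache can be sketched with a bounded LRU map in front of the backend. The `CachingProxy` class, its `backend` fetch function and the `backendCalls` counter are hypothetical; the eviction relies on the standard `LinkedHashMap` access-order constructor and its `removeEldestEntry` hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical proxy behind the gateway with its own bounded LRU cache;
// the backend (e.g. the object storage) is consulted only on a miss.
class CachingProxy {
    private final Function<String, byte[]> backend;   // fetch from the backend (assumed)
    private final Map<String, byte[]> cache;
    int backendCalls = 0;                              // number of misses, for illustration

    CachingProxy(Function<String, byte[]> backend, int capacity) {
        this.backend = backend;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] get(String address) {
        byte[] hit = cache.get(address);
        if (hit != null) return hit;          // served from this proxy's cache
        backendCalls++;
        byte[] data = backend.apply(address); // miss: go to the backend
        cache.put(address, data);
        return data;
    }
}
```

With a capacity of 2, requesting `x`, `x`, `y`, `z` and then `x` again calls the backend four times: the second `x` is a hit, and `x` is evicted as least recently used when `z` arrives.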
#codingexercise
Determine if a number is perfect. A perfect number equals the sum of its proper divisors, that is, all of its divisors except itself (for example, 6 = 1 + 2 + 3).
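import java.util.ArrayList;
import java.util.List;

class PerfectNumber
{
    // A number is perfect when it equals the sum of its proper divisors.
    static boolean isPerfect(long n)
    {
        if (n < 2) return false; // 1 has no proper divisors, so it cannot be perfect
        long sum = 0;
        for (long f : getFactors(n)) sum += f;
        return n == sum;
    }

    // Proper divisors of n (n itself excluded), found in pairs (i, n / i) with i <= sqrt(n).
    static List<Long> getFactors(long n)
    {
        List<Long> ret = new ArrayList<>();
        if (n < 2) return ret;
        ret.add(1L);
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                ret.add(i);                     // add low factor
                if (n / i != i) ret.add(n / i); // add high factor, only once for perfect squares
            }
        }
        return ret;
    }
}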

Wednesday, August 29, 2018

We discussed that a gateway is supposed to distribute the traffic. It works exceptionally well when it routes requests to on-premise or cloud object stores. An on-premise store keeps data closer to where it is accessed. The same concept applies to geographical distribution of similar content, where each object storage serves a specific region. In this case, replication may need to be set up between the different object storages; we could leverage an object storage replication group to replicate automatically. Using the same object storage for every region might be considered a bottleneck. This is different from redirecting requests to separate servers or caches. However, a shared service may offer a service level agreement on par with that of an individual service. Since a gateway will not see a performance degradation when sending to a proxy server or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage such as an Object Storage is therefore also a practical option. Moreover, a cache generally improves performance over what would have been incurred by going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches into a dedicated cache.
There is a tradeoff when we address gateway logic, replication logic and storage server logic independently. While it is modular to visualize each layer as a separate concern, there is no need to house them in different products. Moreover, they can all be viewed as storage server logic and moved into the storage server. The tradeoff is that once these layers are consolidated, they are harder to test independently. They also become more dedicated to the storage and leave the onus on the owner to make copies of the content as necessary for each geographical region. However, we argued that storage and replication are already handled well within object storage, and what was missing was just the gateway feature. This gateway feature can be made extensible, but it would be sufficient to let the user store content once, have the same content made available from each geographical region, and have requests routed to the nearest region. Further, the address translation need not be specific to a region; it can be made granular to individual objects. Take, for example, the URL that an object storage exposes as the HTTP endpoint of an object: it usually has a namespace, bucket and object name as its hierarchy. This is the only input from the user, and this component does not change. However, where the gateway rules previously translated only the server address, they can now translate the object naming hierarchy to the nearest site.
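A sketch of translating the object naming hierarchy to the nearest site. The user-supplied namespace, bucket and object name pass through unchanged; only the site in front of them is chosen. The latency table (milliseconds from a caller region to each site holding a replica), the site host names and the `NearestSiteTranslator` class are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical translator: keeps namespace/bucket/object as given by the user
// and picks the replica site with the lowest assumed latency from the caller's region.
class NearestSiteTranslator {
    private final Map<String, Map<String, Integer>> latencyMs;   // region -> (site -> ms)

    NearestSiteTranslator(Map<String, Map<String, Integer>> latencyMs) {
        this.latencyMs = latencyMs;
    }

    String translate(String callerRegion, String namespace, String bucket, String object) {
        Map<String, Integer> toSites = latencyMs.get(callerRegion);
        if (toSites == null || toSites.isEmpty()) {
            throw new IllegalArgumentException("no sites known for region " + callerRegion);
        }
        String nearest = null;
        int best = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : toSites.entrySet()) {
            if (e.getValue() < best) {
                best = e.getValue();
                nearest = e.getKey();
            }
        }
        // Only the site changes; the user's naming hierarchy is preserved.
        return "https://" + nearest + "/" + namespace + "/" + bucket + "/" + object;
    }
}
```

Ties between equally distant sites are broken by map iteration order here, which a real deployment would want to make deterministic.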

Tuesday, August 28, 2018

We discussed that a gateway is supposed to distribute the traffic. If it sends the traffic to the same single point of contention, it is not very useful. When requests are served from separate caches, performance generally improves over what would have been incurred by going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches into a dedicated cache. This does not necessarily mean a single point of contention: a shared service may offer a service level agreement on par with that of an individual service. Since a gateway will not see a performance degradation when sending to a proxy server or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage such as an Object Storage is therefore also a practical option.
While gateways route requests, they could be replaced with a networking layer that forms a P2P network of different object storages, which could be on-premise or in the cloud. A distributed hash table in this case determines which store to go to. The location of each data object is deterministic because peers are chosen with identifiers corresponding to the data object's unique key. Content therefore goes to specified locations, which makes subsequent requests easier. Unstructured P2P is composed of peers that join based on some rules, usually without any knowledge of the topology. In this case the query is broadcast, and peers that have matching content return the data to the originating peer. This is useful for highly replicated items. P2P provides a good base for large-scale data sharing. Desirable features of P2P networks include selection of peers, redundant storage, efficient location, hierarchical namespaces, authentication and anonymity of users. In terms of performance, P2P has desirable properties such as efficient routing, self-organization, massive scalability, robustness in deployments, fault tolerance, load balancing and explicit notions of locality. Perhaps the biggest takeaway is that P2P is an overlay network with no restriction on size, and there are two classes: structured and unstructured. Structured P2P means that the network topology is tightly controlled and content is placed not on random peers but at specified locations, which makes subsequent requests more efficient.
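A structured P2P overlay places each object at the peer whose identifier follows the object key on a hash ring. The sketch below shows that successor rule with consistent hashing; it is a simplified, Chord-like illustration with no finger tables or replication, `DhtRing`, `addPeer` and `lookup` are hypothetical names, and SHA-256 truncated to 64 bits is an assumed choice of hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hashing ring: peers and object keys hash onto the same ring,
// and a key belongs to the first peer at or after its position (wrapping around).
class DhtRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Hash any identifier (peer name or object key) onto a 64-bit ring position.
    static long position(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();   // first 8 bytes of the digest
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    void addPeer(String peer) { ring.put(position(peer), peer); }

    // Deterministic location: the same key always maps to the same peer.
    String lookup(String objectKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no peers");
        Map.Entry<Long, String> e = ring.ceilingEntry(position(objectKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

A useful property follows from the successor rule: when a new peer joins, each key either keeps its old peer or moves to the new one, so only a fraction of the objects relocate.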


Monday, August 27, 2018

We were discussing anecdotal quotes from industry experts on gateways for object storage.
They cited the gateways for object storage that public cloud providers offer as a convenience for using on-premise and cloud storage, which shows that there is value in this proposition. In addition, our approach is novel in using the gateway for a Content Distribution Network and in proposing that it be built into the object storage as a service.
Some experts argued that gateways are practical only for small and medium businesses, whose requirements are small in scale. This means that gateways are stretched at large scale, and object storage deployments are not necessarily restricted in size. These experts argued that the problem with gateways is that they add complexity and limit performance.
When gateways solve problems where data does not have to move, they are very appealing for many uses across the companies that use cloud providers. Several vendors have raced to fill this niche. In our case, handing out HTTP references that resolve to copies of an object rather than to the one object is a way to do just that. With object storage requiring no maintenance or administration and providing the ability to store as much content as necessary, this gateway service becomes useful for content distribution network purposes.
Some experts commented that public cloud storage gateways can mirror a volume to a cloud, but they are still just building blocks in the cloud. They do not scale capacity or share data to multiple locations. This is exactly what we try to do with a gateway from object storage.
A gateway is supposed to distribute the traffic. If it sends the traffic to the same single point of contention, it is not very useful. When requests are served from separate caches, performance generally improves over what would have been incurred by going to the backend. That is why different proxy servers behind a gateway could maintain their own cache. A dedicated cache service like AppFabric may also be sufficient to handle requests. In this case, we are consolidating proxy server caches into a dedicated cache. This does not necessarily mean a single point of contention: a shared service may offer a service level agreement on par with that of an individual service. Since a gateway will not see a performance degradation when sending to a proxy server or to a shared dedicated cache, it works in both cases. Replacing a shared dedicated cache with shared dedicated storage such as an Object Storage is therefore also a practical option.
#codingexercise
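Print all the combinations of a string in sorted order.
void printSortedCombinations(String a)
{
  // A String cannot be sorted in place, so sort a copy of its characters.
  char[] chars = a.toCharArray();
  java.util.Arrays.sort(chars);
  printCombinations(new String(chars));
  // uses the Combine() method implemented earlier
}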
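Since the `Combine()` method from the earlier post is not reproduced here, the following self-contained sketch stands in for it. `SortedCombinations` is a hypothetical name; it treats a combination as a non-empty subset of the characters with their sorted order preserved, and it skips a repeated character at the same depth so that inputs with duplicate letters do not print the same combination twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the earlier Combine(): sorts the characters first,
// then emits every non-empty combination in lexicographic order.
class SortedCombinations {
    static List<String> combinations(String a) {
        char[] chars = a.toCharArray();
        Arrays.sort(chars);
        List<String> out = new ArrayList<>();
        combine(chars, 0, new StringBuilder(), out);
        return out;
    }

    private static void combine(char[] chars, int start, StringBuilder prefix, List<String> out) {
        for (int i = start; i < chars.length; i++) {
            // Skip a repeated character at this depth so "aab" yields no duplicates.
            if (i > start && chars[i] == chars[i - 1]) continue;
            prefix.append(chars[i]);
            out.add(prefix.toString());          // record the combination ending here
            combine(chars, i + 1, prefix, out);  // extend it with later characters
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}
```

For `"cab"`, the characters are sorted to `abc` and the output is `a, ab, abc, ac, b, bc, c`, which is already in lexicographic order because each prefix is emitted before its extensions.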

Sunday, August 26, 2018

Anecdotal quotes from industry on gateways for object storage.
We know that gateways for object storage are provided as a convenience by public cloud providers, so there is value in that proposition. In addition, we take a novel approach by using the gateway for a Content Distribution Network and by proposing that it be built into the object storage as a service. Today we look at anecdotal quotes from the industry in this regard.
They mention that gateways help connect systems that would otherwise require a lot of code to wire the APIs for data flow. This arduous task of rewriting applications to support web interfaces applies to those who want to migrate to different object storage stacks. It does not really apply in our case.
Some experts argued that gateways are practical only for small and medium businesses, whose requirements are small in scale. This means that gateways are stretched at large scale, and object storage deployments are not necessarily restricted in size: they are true cloud storage. These experts point out that object storage is best for backup and archiving. Tools like duplicity use S3 APIs to persist data in object storage, and the same applies to any backup and archiving workflow. These workflows do not require modifying data objects, which makes object storage perfect for them. These experts argued that the problem with gateways is that they add complexity and limit performance. Gateways are not used with primary storage applications, which are more read and write intensive and do not tolerate latency. Some even argued that the gateway diminishes in significance when the object storage itself is considered raw.
On the other hand, other experts argued that gateways give predictable performance between on-premises infrastructure and a public cloud storage provider. They offer easy integration into existing infrastructure and the ability to integrate on a protocol-by-protocol basis. This may be true for cloud gateways in general, but our emphasis was on virtual HTTP endpoints within a single object storage.
When gateways solve problems where data does not have to move, they are very appealing for many uses across the companies that use cloud providers. Several vendors have raced to fill this niche. In our case, handing out HTTP references that resolve to copies of an object rather than to the one object is a way to do just that. With object storage requiring no maintenance or administration and providing the ability to store as much content as necessary, this gateway service becomes useful for content distribution network purposes.
Some experts commented that public cloud storage gateways can mirror a volume to a cloud, but they are still just building blocks in the cloud. They do not scale capacity or share data to multiple locations. This is exactly what we try to do with a gateway from object storage.