Cluster computing

Tuesday, July 3, 2018

We were discussing storage as a network:

Data that used to be stored centrally now has to be stored in a distributed manner. Businesses also want virtualization to automate deployments, ease migrations, and enable fast setup and teardown over existing resources. Any solution that meets these requirements is now also expected to scale elastically, both in capacity and in billing. While storage costs keep falling, the cloud is eagerly absorbing on-premises assets into its networked storage.

In particular, I want to bring up the boundary between storage and networking, and show that moving this boundary further into one domain leads to technologies vastly different from those we get by pushing it into the other. For example, peer-to-peer (P2P) networking provides a good base for large-scale data sharing and application-level multicasting. Desirable features of P2P networks include peer selection, redundant storage, efficient location, hierarchical namespaces, authentication, and user anonymity. In terms of performance, P2P networks offer efficient routing, self-organization, massive scalability, robustness in deployment, fault tolerance, load balancing, and explicit notions of locality. Perhaps the biggest takeaway is that P2P is an overlay network with no restriction on size, and that it comes in two classes: structured and unstructured. In a structured P2P network, the topology is tightly controlled and content is placed not on random peers but at specified locations, which makes subsequent queries more efficient. DHTs fall in this category: the location of each data object is deterministic and its key is unique. Napster was probably the first system to realize the benefit of distributed file sharing, with the insight that popular content need not be served from a central server. P2P file-sharing systems are self-scaling: each new peer adds capacity as well as demand.
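To make "the location of the data objects is deterministic" concrete, here is a minimal sketch of a consistent-hashing ring, the placement idea that DHTs such as Chord build on. It is not the post's design or any library's API: `HashRing`, `AddPeer`, `RemovePeer`, and `Lookup` are hypothetical names, positions are assumed to be the first 8 bytes of a SHA-256 digest, and there is no routing, replication, or virtual-node support.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical consistent-hashing ring: peers and keys share one circular
// 64-bit ID space, and a key belongs to the first peer at or after its position.
public class HashRing
{
    // Sorted peer positions, plus a map from position back to peer name.
    private readonly List<ulong> _positions = new List<ulong>();
    private readonly Dictionary<ulong, string> _peers = new Dictionary<ulong, string>();

    // Assumption: a position is the first 8 bytes of the SHA-256 digest.
    public static ulong Position(string s)
    {
        using var sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(s));
        return BitConverter.ToUInt64(digest, 0);
    }

    public void AddPeer(string peer)
    {
        ulong p = Position(peer);
        if (_peers.ContainsKey(p)) return;      // ignore duplicate peers
        int idx = _positions.BinarySearch(p);   // not found: ~insertion point
        _positions.Insert(~idx, p);             // keep positions sorted
        _peers[p] = peer;
    }

    public void RemovePeer(string peer)
    {
        ulong p = Position(peer);
        if (_peers.Remove(p)) _positions.Remove(p);
    }

    // Deterministic lookup: the successor of the key's position owns the key.
    public string Lookup(string key)
    {
        if (_positions.Count == 0) throw new InvalidOperationException("ring is empty");
        int idx = _positions.BinarySearch(Position(key));
        if (idx < 0) idx = ~idx;                // first position >= key's position
        if (idx == _positions.Count) idx = 0;   // wrap around the ring
        return _peers[_positions[idx]];
    }
}
```

Any node that knows the set of peers computes the same owner for a key without asking a central server. Removing a peer only moves the keys it owned to its successor, which is why this style of placement copes well with churn; a real DHT adds routing state (for example, Chord's finger tables) so that a node does not need to know every peer.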

On the other hand, there are storage systems built around a cluster-based file system, a universal S3 object store, or a streaming store, each with its own benefits. Essentially, users may choose to see these as storage-first or network-first, and a solution can be recommended depending on their purpose.

#codingexercise

In a candy store, there are N different types of candies, each with its own price. Each time we buy a single candy, we can get at most k other types of candies free. What is the minimum amount of money we need to spend to get all N candies?
Solution: sort the prices in ascending order. We pay for the cheapest remaining candy and, for each purchase, take the k most expensive remaining candies free. For example, with prices {1, 2, 3, 4} and k = 2, we pay 1 and take 3 and 4 free, then pay 2, for a total of 3.
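using System.Collections.Generic;
using System.Linq;

static ulong GetMinimum(List<uint> prices, uint k)
{
    // Sort ascending: the cheapest candies are paid for, the most expensive come free.
    List<uint> sorted = prices.OrderBy(p => p).ToList();
    ulong res = 0;
    long n = sorted.Count; // candies at index >= n have already been taken free
    for (int i = 0; i < n; i++)
    {
        res += sorted[i]; // buy the cheapest remaining candy
        n -= k;           // take the k most expensive remaining candies free
    }
    return res;
}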
