Wednesday, July 4, 2018

We were discussing storage as a network. In particular, I want to look at where the line between storage and networking is drawn, and show that pushing that line further into one domain opens up technologies that are vastly different from those we get by pushing it into the other. For example, peer-to-peer (P2P) networking provides a good base for large-scale data sharing and application-level multicasting. Desirable features of P2P networks include peer selection, redundant storage, efficient location of data, hierarchical namespaces, authentication and user anonymity. In terms of performance, P2P networks offer efficient routing, self-organization, massive scalability, robustness in deployment, fault tolerance, load balancing and explicit notions of locality.
Perhaps the biggest takeaway is that P2P is an overlay network with no restriction on size, and it comes in two classes: structured and unstructured. In a structured P2P network, the topology is tightly controlled and content is placed not on random peers but at specified locations, which makes subsequent queries more efficient. Distributed hash tables (DHTs) fall into this category: keys are unique and the location of each data object is deterministic. Napster was probably the first system to demonstrate the benefit of distributed file sharing, on the premise that requests for popular content do not need to be sent to a central server. P2P file-sharing systems are self-scaling, because every peer that joins to fetch content also adds capacity to serve it.
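To make "deterministic location" concrete, here is a minimal consistent-hashing sketch in the spirit of ring-based DHTs such as Chord, but not a faithful implementation of any of them: there is no finger table, replication or churn handling. The peer names, the 32-bit identifier space and the `HashRing` class are illustrative assumptions. Every key is owned by the first peer clockwise from the key's hash, so any node that knows the membership can compute where an object lives without asking a central server.

```python
import bisect
import hashlib


def ring_position(name: str, bits: int = 32) -> int:
    """Map a peer name or a data key onto an identifier ring of size 2**bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class HashRing:
    """A toy structured overlay: each key is owned by its successor peer on the ring."""

    def __init__(self, peers):
        # Sorted (position, peer) pairs so lookups can use binary search.
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the first peer at or after the key's position, wrapping around the ring."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]

    def add_peer(self, peer: str) -> None:
        """Insert a new peer; only keys between it and its predecessor change owner."""
        bisect.insort(self._ring, (ring_position(peer), peer))
        self._positions = [pos for pos, _ in self._ring]


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c", "peer-d"])
    for key in ["song.mp3", "paper.pdf", "photo.jpg"]:
        print(key, "->", ring.owner(key))
```

Adding a peer only moves the keys that fall between the new peer and its predecessor; every other key keeps its owner, which is part of why structured overlays can absorb peers joining and leaving.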
On the other hand, we have storage systems built as a cluster-based file system, a universal S3 object store or a streaming store, each with its own benefits. Users may choose to view these as storage first or as network first, and depending on their purpose, a different solution may be recommended.
To summarize, both storage and networking in their modern forms use some kind of distributed hashing, indexing, logging and coordination services. However, this intelligence, layered over traditional block-level storage and connected networks, ideally should not be duplicated in both domains. In fact, it may belong in the networking layer rather than in storage. The real question is whether we want smarter storage or smarter networking, and which of the two sits on top in the layering. Neither a cluster with a network-on-top design nor a file system that spans clusters is a true all-purpose solution for everyone.
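One way to picture the "smartness in the networking layer" option is a storage layer that keeps no placement logic of its own and simply asks a routing layer where each key lives. This is a hypothetical sketch, not any existing system's API: `Router`, `ModuloRouter` and `BlockStore` are names made up for illustration, and plain modulo hashing stands in for whatever placement policy (for example, a DHT) the network would actually use.

```python
import hashlib
from typing import Dict, List, Protocol


class Router(Protocol):
    """Network-layer contract: given a key, name the node responsible for it."""

    def route(self, key: str) -> str: ...


class ModuloRouter:
    """A deliberately simple placement policy: hash the key and pick a node by modulo."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)

    def route(self, key: str) -> str:
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
        return self.nodes[h % len(self.nodes)]


class BlockStore:
    """Storage layer with no placement logic: it asks the router where each key lives."""

    def __init__(self, router: Router):
        self.router = router
        # node name -> that node's local key/value map
        self.nodes: Dict[str, Dict[str, bytes]] = {}

    def put(self, key: str, value: bytes) -> str:
        node = self.router.route(key)  # placement is decided by the network layer
        self.nodes.setdefault(node, {})[key] = value
        return node

    def get(self, key: str) -> bytes:
        return self.nodes[self.router.route(key)][key]


if __name__ == "__main__":
    store = BlockStore(ModuloRouter(["n1", "n2", "n3"]))
    where = store.put("song.mp3", b"...")
    print("stored on", where, "->", store.get("song.mp3"))
```

Swapping in a different router changes where data lands without touching `BlockStore`. The opposite layering would move the placement decision into the store and leave the network as a plain transport, which is exactly the choice of who sits on top.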
References:
Comparing the performance of distributed hash tables under churn by Li, Stribling et al.
Comparison of peer-to-peer overlay network schemes by Lua, Crowcroft et al.
