Wednesday, June 14, 2017

In continuation of our discussion on data architecture covered in previous posts, let us take an example and study it. We will review Amazon Dynamo as an example of a cloud based data storage and distributed system. Dynamo addresses the need for a reliable, scalable, highly available key-value storage system. It provides the ability to trade-off cost, consistency, durability and performance while maintaining high availability. With cloud based business such as Amazon, the compute and storage is distributed in several datacenters around the world. At this scale, there are many fault domains and and yet the persistent state has to be managed with high reliability and availability. To maitain this standard, Dynamo sacrifices consistency under certain conditions It makes extensive use of object-versioning and application assisted conflict resolution.  It is used to manage the state of services that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost effectiveness and performance. For a store like Amazon, a relational data store is limiting in terms of scale and availability. Dynamo provides a simple primary-key interface to meet this need.
Let us look at the techniques involved to achieve this scale and availability. Data is partitioned and replicated using consistent hashing and consistency is facilitated by object versioning. Replicas are maintained during updates based on a quorum like technique. The replicas are kept in sync with a decentralized synchronization protocol. Dynamo employs a gossip based distributed failure detection and to determine memberships. Storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution. This lets Dynamo handle the peak load traffic to the Amazon store.Usually replication is performed synchronously to provide a consistent data access interface. This trades off the availability under certain failures. It is well known that consistency and availability are hard to achieve together. Availability can be increased by optimistic replication techniques where changes are allowed to propagate to replicas in the background. Disconnectivity is tolerated. Conflicts arising from now needs to be handled by determining when to resolve them and who resolves them. Eventually a consistent state will be achieved. While some systems determine when to resolve during writes, Dynamo is designed to be an always writeable store.
Dynamo has been the underlying storage technology for a number of the core services of Amazon e-commerce platform. This store handles several millions of checkouts a day.  Dynamo is a single highly available system for this entire store.
Courtesy: Amazon paper on Dynamo
#codingexercise
Count number of bridges possible between cities on northern and southern parts of the river such that they don't intersect. The southern cities are random order of the northern cities and bridges are drawn between same cities.
  cities A  B  C  D   E   F  G  H
  pairs  H  A  B  F   E    D G   C
  best    1  1  2  3   3    3  4   3
   A-A
   B-B
   D-D
   G-G

Longest increasing subsequence
int GetBridges(List<int> northernCities, List<int> southernCities)
{
return min (GetBridgesLIS(northernCities), GetBridgesLIS(southernCities));
}
int GetBridgesLIS(List<Char> A)
{
var best = new int[A.Length+1];
for (int i = 0; i < best.Length; i++)
       best[i] = 1;
for (int i = 1; i < A.Length; i++)
    for (int j=0; j < i; j++)
         if (A[i] > A[j])
          {
               best[i] = Math.Max(best[i], best[j]+1);
          }
return best.ToList().max();
}

No comments:

Post a Comment