Cluster computing

Thursday, June 15, 2017

We were reviewing Amazon Dynamo as an example of a cloud based data storage and distributed system. Dynamo is a single highly available system for this entire Amazon store and it is designed to handle all loads including peak loads. Reading from their paper, we saw that Dynamo addresses the need for a reliable, scalable, highly available key-value storage system. It provides the ability to trade-off cost, consistency, durability and performance while maintaining high availability.
Let us look at the techniques involved to achieve this scale and availability. Data is partitioned and replicated using consistent hashing and consistency is facilitated by object versioning. Replicas are maintained during updates based on a quorum like technique. The replicas are kept in sync with a decentralized synchronization protocol. Dynamo employs a gossip based distributed failure detection and to determine memberships. Storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution.Conflicts need to be handled by determining when to resolve them and who resolves them. Eventually a consistent state will be achieved. While some systems determine when to resolve during writes, Dynamo is designed to be an always writeable store.
The writes are important to services because they could involve customer updates which cannot be lost or rejected.Therefore conflict resolution is not handled in writes. Instead, it is done during reads. To determine who performs the conflict resolution, we can choose between applications and datastore. The datastore is not context aware so it may choose to use the last write or such other straightforward mechanisms. The application is however savvy about the user context and can determine the best resolution. Therefore, the conflict resolution is performed by the application.

#codingexercise
Find a number in a rotated sorted array
int GetNumber(List<int> sortedRotated, int num)
{
int n = sortedRotated.Count;
int pivot = find_pivot(sortedRotated, 0, n-1, num); // binary search
if (pivot == -1)
return binary_search(sortedRotated, 0, n-1);
if (sortedRotated[pivot] == num)
return pivot;
if (sortedRotated[pivot] <= num)
return GetNumber(sortedRotated, 0, pivot-1, num);
else
return GetNumber(sortedRotated, pivot+1, n, num);
}
int find_pivot( List<int> sortedRotated, int start, int end)
{
if (start < end)
return -1;
if (start == end) return start;
int mid = (start + end ) / 2;
if ( mid < end && sortedRotated[mid] > sortedRotated[mid+1]) return mid;
if (mid > start && sortedRotated[mid-1] > sortedRotated[mid]) return mid-1;
if (sortedRotated[low] >= sortedRotated[mid])
return find_pivot(sortedRotated, start, mid-1);
return find_pivot(sortedRotated, mid+1, end);
}
#alternative : http://ideone.com/e.js/F7V2aj
Also, we can use another technique that relies on a combined array that concatenates the sortedRotated array with itself which gives a subarray that is already sorted with all of the elements.
eg: 8, 9, 9, 1, 3, 4, 4, 4, 6, 6, 7, 7
8, 9, 9, 1, 3, 4, 4, 4, 6, 6, 7, 7, 8, 9, 9, 1, 3, 4, 4, 4, 6, 6, 7, 7
1, 3, 4, 4, 4, 6, 6, 7, 7, 8, 9, 9,
As an example, since all the elements are sorted, finding the index of the element in this sorted sequence is a binary search. This length can then be added to the index of the pivot point to return the result.

DFS in a matrix with blockers
void DFS(int[,] A, int row, int col, int row, int col, ref int[,] visited, ref int size)
{

var rd = { -1,-1,-1,0,0,1,1, 1};
var cd = { -1,0,1,-1,0,1,-1,0,1 };

visited[row,col] = true;
for (int k = 0; k < 8; k++)
if (isSafe(A, row,col, row + rd[k], col +cd[k])){
size++;
DFS(A, row,col, row+rd[k], col+cd[k], ref visited, ref size);
}

}

Cluster computing

Thursday, June 15, 2017

No comments:

Post a Comment