Sunday, February 26, 2017

We continue our detailed study of the Microsoft Azure stack, as inferred from Microsoft's introduction to Azure. In this post we review some more features of Azure storage.
The IaaS offerings from Azure storage services include disks and files, whereas the PaaS offerings include objects, tables and queues. These offerings are built on a unified distributed storage system with guarantees for durability, encryption at rest, strongly consistent replication, fault tolerance and automatic load balancing.
The redundancy features are handled by storage stamps and a location service. A storage stamp is a cluster of N racks of storage nodes, where each rack is built out as a separate fault domain with redundant networking and power. Each storage stamp is reached through a VIP and is served by a Front-End layer, a partition layer and a stream layer, with the stream layer providing intra-stamp replication. Intra-stamp replication is synchronous and ensures that the data is made durable within the storage stamp. Inter-stamp replication is asynchronous and is focused on replicating data across stamps. We were looking at blocks, extents and streams, the Stream Manager (SM) and the Extent Nodes (ENs).
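
To make the addressing concrete, here is a minimal sketch of how a location service could map a storage account to the VIP of its serving stamp. The type and member names (StorageStamp, LocationService, PrimaryVip) are assumptions introduced for illustration only, not Azure's actual API.

using System.Collections.Generic;

// Illustrative model only: a stamp is addressed by a single VIP and hosts
// the Front-End, partition and stream layers behind it.
class StorageStamp
{
    public string Name;        // e.g. a cluster of N racks in one region
    public string PrimaryVip;  // virtual IP that fronts the stamp
}

class LocationService
{
    // Maps a storage account to the stamp that currently serves it.
    private readonly Dictionary<string, StorageStamp> accountToStamp =
        new Dictionary<string, StorageStamp>();

    public void AssignAccount(string account, StorageStamp stamp)
    {
        accountToStamp[account] = stamp;
    }

    public string ResolveVip(string account)
    {
        return accountToStamp[account].PrimaryVip;
    }
}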

We were discussing the replication flow; let us summarize it. First a stream is created, and the SM allocates three replicas for its first extent: one primary and two secondaries. These are distributed across fault and upgrade domains. The SM decides which replica is the primary, and the client learns which Extent Nodes hold the replicas; this state is maintained in the SM cache. Writes go to the primary and from the primary to the secondaries. The locations of the replicas do not change until the extent is sealed, and no leases are used to represent the primary EN for an extent. When one extent gets sealed, the process repeats for another extent. The client can read the information from any replica, including the last block. The primary decides the offset of the append in the extent, orders the appends for concurrent requests, relays the chosen offset to the secondaries, and returns success to the client only after all three appends have succeeded. Successes are returned in the order of the append offsets. Since appends commit in order on a replica, the last append position is the current commit length of the replica. This helps determine the integrity of the replicas, because the primary EN does not change, it sequences the appends, and it seals the extent.
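
As a rough sketch of that append path, here is an illustrative PrimaryExtentNode with an in-memory ReplicaState; none of these names come from the actual service.

using System.Collections.Generic;

// Illustrative sketch of the append path: the primary chooses the offset,
// relays the append to both secondaries, and acknowledges the client only
// after all three replicas have written the block.
class ReplicaState
{
    public List<byte[]> Blocks = new List<byte[]>();
    public long CommitLength;   // appends commit in order, so this is the last append position
}

class PrimaryExtentNode
{
    private readonly ReplicaState primary = new ReplicaState();
    private readonly ReplicaState[] secondaries = { new ReplicaState(), new ReplicaState() };

    // Returns the offset at which the block was appended.
    public long Append(byte[] block)
    {
        long offset = primary.CommitLength;   // primary decides the offset
        Write(primary, block);
        foreach (var s in secondaries)        // relay the chosen offset to the secondaries
            Write(s, block);
        return offset;                        // success only after all three appends
    }

    private static void Write(ReplicaState r, byte[] block)
    {
        r.Blocks.Add(block);
        r.CommitLength += block.Length;
    }
}
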
When a write fails on any replica, a write failure is returned to the client. The client then contacts the SM, which seals the extent at its current commit length, allocates a new extent on a different set of ENs, and relays that information to the client so that appends can continue on the new extent.
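
A minimal sketch of that failure path follows, with hypothetical SealExtent and AllocateExtent helpers standing in for whatever the SM actually exposes.

using System;

// Illustrative failure handling: the SM seals the failed extent at its
// current commit length and hands the client a fresh extent on other ENs.
class StreamManager
{
    public long SealExtent(string extentId, long commitLength)
    {
        // Freeze the extent; no further appends are accepted past this length.
        return commitLength;
    }

    public string AllocateExtent()
    {
        // Pick a different set of ENs and return the new extent's identifier.
        return Guid.NewGuid().ToString();
    }
}

class StreamClient
{
    private readonly StreamManager sm = new StreamManager();

    public string OnAppendFailure(string extentId, long commitLength)
    {
        sm.SealExtent(extentId, commitLength);   // seal at the current commit length
        return sm.AllocateExtent();              // continue appends on the new extent
    }
}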

#codingexercise
Determine if two arrays represent the same BST without manipulating the arrays.
If the arrays A and B could be manipulated, we could sort copies of them and compare,
A.sort() == B.sort(),
but that only confirms the two arrays contain the same elements, which is necessary but not sufficient for them to represent the same BST.
Or we could build both trees and compare them:
MakeBST(A) == MakeBST(B)
where MakeBST(A) = 
where MakeBST(A) = 
{
Node BST-A = null;
for (int i = 0; i < A.Length; i++)
      Insert(BST-A, A[i])
return BST-A;
}
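
For completeness, here is a minimal sketch of the Node type and the Insert and tree-comparison helpers the pseudocode above assumes; the names are illustrative.

class Node
{
    public int Value;
    public Node Left, Right;
    public Node(int value) { Value = value; }
}

static class BstHelpers
{
    // Standard BST insertion; returns the (possibly new) root.
    public static Node Insert(Node root, int value)
    {
        if (root == null) return new Node(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else root.Right = Insert(root.Right, value);
        return root;
    }

    // Two BSTs are identical when their roots match and both subtrees match.
    public static bool SameTree(Node a, Node b)
    {
        if (a == null || b == null) return a == b;
        return a.Value == b.Value && SameTree(a.Left, b.Left) && SameTree(a.Right, b.Right);
    }
}
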
If the two BSTs are identical, then each node occupies the same level and has the same subtrees in both trees. We can check this without building the trees: the first element of each array is the root of its BST, so the two roots must be equal, and the subsequences of elements lower than the root and those higher than the root must recursively represent the same left and right subtrees. This means checking separate ranges of items, those that are lower than the root and those that are higher. Done naively by pairwise comparison this costs up to n-1 comparisons for each of the n elements, which is less efficient than simply rendering the BSTs with insertions. Note that merely sorting the elements of the two arrays and comparing them for equality is not enough, since it only shows the arrays contain the same elements, not that they produce the same BST. The recursive range check, sketched below, answers the question without manipulating the arrays.
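
Here is a minimal sketch of that recursive range check, assuming the array elements are distinct; the SameBst name and the bound parameters are introduced here for illustration.

static class SameBstCheck
{
    // Returns true when inserting the elements of a and b in order would
    // build the same BST. i and j index the current positions in each array;
    // (min, max) bounds the values that belong to the current subtree.
    static bool SameBst(int[] a, int[] b, int i, int j, long min, long max)
    {
        // The next element in each array that falls strictly inside (min, max)
        // is the root of the current subtree for that array.
        while (i < a.Length && (a[i] <= min || a[i] >= max)) i++;
        while (j < b.Length && (b[j] <= min || b[j] >= max)) j++;

        bool aDone = i == a.Length, bDone = j == b.Length;
        if (aDone || bDone) return aDone == bDone;   // both subtrees empty, or mismatch
        if (a[i] != b[j]) return false;              // roots differ

        // Left subtree takes values below the root, right subtree values above it.
        return SameBst(a, b, i + 1, j + 1, min, a[i])
            && SameBst(a, b, i + 1, j + 1, a[i], max);
    }

    public static bool SameBst(int[] a, int[] b)
    {
        return a.Length == b.Length && SameBst(a, b, 0, 0, long.MinValue, long.MaxValue);
    }
}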
