Cluster computing

Thursday, March 9, 2017

We continue with a detailed study of Microsoft Azure stack as inferred from an introduction of Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how its different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. we reviewed the load balancing, split and merge operations. Then we reviewed the intrastamp and interstamp replication. We also took stock of TCO, performance, application profiles and their comparision study.
We now review some of their design choices. First, they separated nodes dedicated to computation from nodes dedicated to storage. This lets them both scale and load balance independently. Second, they decided to use range based partitioning /indexing instead of hash based indexing for the partition layer's object table. An account's objects are stored within a set of RangePartitions. This makes object enumeration easier because of the locality of the objects. Additionally, performance isolation is easier because we can go object by object. The hash based schemes have the simplicity of distributing the load across server but do not have the objects local. Objects grouped together by account also enables account throttling. Third, throttling or isolation is very useful when accounts are not well behaved. The statistics is collected by the partition server which keeps track of request rates for AccountNames and PartitionNames. The same determination is used for load balancing. Fourth, automatic load balancing can now be built on range based partitioning approach and the account based throttling. This improves multi-tenancy in the environment as well as handling of peaks in traffic patterns. Moreover, the algorithm can be adaptive even choosing appropriate metrics to determine traffic patterns that are better known and have been handled before. Their approach first started with a single number to quantify load on each RangePartition and each server and then used the product of request latency and request rate to represent loads.Still it did not capture high CPU utilization. CPU and network load were then also used to quantify the load. This strategy is different from the one used for splitting partitions. That decision is based on whether partitions are becoming full. Both splitting and load balancing allow hot partitions to be reallocated to different servers.
It's interesting to observe that the adaptive algorithm and the movement of partitions is equally applicable to compute resources from an IaaS. Today many private compute resource providers are asking customers to choose regions. This can be automated for load balancing and handling traffic patterns when there is little or no difference to the customers which location their resources are tenants on. The strategy applies equally to storage nodes just as much as compute nodes. Furthermore, resources can be grouped and tiered to meet different service level classifications.
#codingexercise
Count of N digit numbers whose sum of digits equals to a given sum
long GetCount(int N, int sum)
{
if (N == 0) return sum == 0;
long count = 0;
for (int i = 0; i <= 9; i++)
if (sum-i >= 0)
count += GetCount(N-1, sum-i);
return count;
}
To skip leading zeros, we use a wrapper that counts from 1 to 9
long GetCountWrapper(int N, int sum)
{
long count = 0;
for (int i = 1; i<=9; i++)
if (sum-i >= 0)
count += GetCount(N-1, sum-i);
return count;
}

To count total number of N digit numbers whose difference between sum of even and odd digits is 1, we modify the above to keep track of both even and odd sum. The check inside thr for loop is then to see if the current sum is odd or even and toggle accordingly.

Cluster computing

Thursday, March 9, 2017

No comments:

Post a Comment