Friday, March 10, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations. Then we reviewed the intrastamp and interstamp replication. We also took stock of TCO, performance, application profiles and their comparison study.
We reviewed some of their design choices. They separated nodes dedicated to computation from nodes dedicated to storage. They decided to use range-based partitioning/indexing instead of hash-based indexing for the partition layer's Object Table. They used throttling or isolation, which proved very useful when accounts are not well behaved. They built automatic load balancing on the range-based partitioning approach and the account-based throttling. They used separate log streams for each RangePartition to isolate the load time of each RangePartition. They used journaling to overcome the read/write contention on the same drive.
 #codingexercise
To count the total number of N-digit numbers whose difference between the sum of even and odd digits is 1, we modify it as follows:
long GetCountDiff(int digits, int even, int odd, int n, bool isOdd)
{
    // once all n digits are placed, count the number if the two sums differ by exactly one
    if (digits == n) return (even - odd == 1 || odd - even == 1) ? 1 : 0;
    long count = 0;
    if (isOdd){
        for (int i = 0; i <= 9; i++)
            count += GetCountDiff(digits+1, even, odd+i, n, false);
    } else {
        for (int i = 0; i <= 9; i++)
            count += GetCountDiff(digits+1, even+i, odd, n, true);
    }
    return count;
}

To skip leading zeros, we use a wrapper that counts from 1 to 9

long GetCountWrapper(int N)
{
    long count = 0;
    int digits = 0;
    int even = 0;
    int odd = 0;
    for (int i = 1; i <= 9; i++)
        count += GetCountDiff(digits+1, even+i, odd, N, true);
    return count;
}
Notice that the recursion chains until the number of digits reaches the specified N.
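For small N, a brute-force cross-check is easy to write. This Python sketch enumerates every N-digit number directly (the position-parity convention, counting from the most significant digit, is an assumption; the recursive version may number positions differently):

```python
def count_diff_one(n):
    # count n-digit numbers (no leading zero) where the sums of digits in
    # odd and even positions differ by exactly 1
    count = 0
    for x in range(10 ** (n - 1), 10 ** n):
        digits = [int(c) for c in str(x)]
        odd = sum(digits[0::2])   # 1st, 3rd, ... digits
        even = sum(digits[1::2])  # 2nd, 4th, ... digits
        if abs(even - odd) == 1:
            count += 1
    return count

print(count_diff_one(2))  # 17: numbers like 10, 12, 21, 23, ...
```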

Thursday, March 9, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations. Then we reviewed the intrastamp and interstamp replication. We also took stock of TCO, performance, application profiles and their comparison study.
We now review some of their design choices. First, they separated nodes dedicated to computation from nodes dedicated to storage. This lets them both scale and load balance independently. Second, they decided to use range-based partitioning/indexing instead of hash-based indexing for the partition layer's Object Table. An account's objects are stored within a set of RangePartitions. This makes object enumeration easier because of the locality of the objects. Additionally, performance isolation is easier because we can go object by object. The hash-based schemes have the simplicity of distributing the load across servers but do not keep the objects local. Objects grouped together by account also enable account throttling. Third, throttling or isolation is very useful when accounts are not well behaved. These statistics are collected by the partition server, which keeps track of request rates for AccountNames and PartitionNames. The same determination is used for load balancing. Fourth, automatic load balancing can now be built on the range-based partitioning approach and the account-based throttling. This improves multi-tenancy in the environment as well as the handling of peaks in traffic patterns. Moreover, the algorithm can be adaptive, even choosing appropriate metrics to determine traffic patterns that are better known and have been handled before. Their approach first started with a single number to quantify load on each RangePartition and each server, and then used the product of request latency and request rate to represent loads. Still, it did not capture high CPU utilization. CPU and network load were then also used to quantify the load. This strategy is different from the one used for splitting partitions. That decision is based on whether partitions are becoming full. Both splitting and load balancing allow hot partitions to be reallocated to different servers.
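As a rough illustration of why a single latency-times-rate number falls short, consider this Python sketch. The metric names and thresholds are assumptions for illustration, not WAS internals:

```python
def is_hot(request_rate, request_latency, cpu_util, net_util,
           load_threshold=1.0, util_threshold=0.9):
    # the original single-number metric: request latency * request rate
    base_load = request_rate * request_latency
    # augmented check: a partition can be hot on CPU or network alone
    return (base_load > load_threshold or
            cpu_util > util_threshold or
            net_util > util_threshold)

# a busy partition, and a CPU-hot but low-traffic partition
print(is_hot(1000, 0.002, 0.30, 0.20))  # True: high latency * rate
print(is_hot(10, 0.001, 0.95, 0.10))    # True: latency * rate alone (0.01) would miss it
```

A real balancer would rank partitions by such signals and move the hottest RangePartitions to lightly loaded servers.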
It's interesting to observe that the adaptive algorithm and the movement of partitions is equally applicable to compute resources from an IaaS. Today many private compute resource providers are asking customers to choose regions. This can be automated for load balancing and handling traffic patterns when there is little or no difference to the customers which location their resources are tenants on. The strategy applies equally to storage nodes just as much as compute nodes. Furthermore, resources can be grouped and tiered to meet different service level classifications.
#codingexercise
Count of N digit numbers whose sum of digits equals to a given sum
long GetCount(int N, int sum)
{
    if (N == 0) return (sum == 0) ? 1 : 0;
    long count = 0;
    for (int i = 0; i <= 9; i++)
        if (sum - i >= 0)
            count += GetCount(N-1, sum-i);
    return count;
}
To skip leading zeros, we use a wrapper that counts from 1 to 9
long GetCountWrapper(int N, int sum)
{
   long count = 0;
   for (int i = 1; i<=9; i++)
         if (sum-i >= 0)
              count += GetCount(N-1, sum-i);
   return count;
}

To count the total number of N-digit numbers whose difference between the sum of even and odd digits is 1, we modify the above to keep track of both the even and the odd sum. The check inside the for loop is then to see if the current digit position is odd or even and toggle accordingly.
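That modification can be sketched as follows (a Python version; the convention that the first digit lands in the odd-position sum is an assumption):

```python
def get_count_diff(n, pos, even, odd):
    # count completions of an n-digit number, pos digits already placed,
    # so that the even- and odd-position digit sums differ by exactly 1
    if pos == n:
        return 1 if abs(even - odd) == 1 else 0
    total = 0
    for d in range(10):
        if pos % 2 == 0:
            total += get_count_diff(n, pos + 1, even, odd + d)  # odd position (1st, 3rd, ...)
        else:
            total += get_count_diff(n, pos + 1, even + d, odd)  # even position (2nd, 4th, ...)
    return total

def get_count_wrapper(n):
    # skip leading zeros: the first digit ranges over 1..9
    return sum(get_count_diff(n, 1, 0, d) for d in range(1, 10))

print(get_count_wrapper(2))  # 17
```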

Wednesday, March 8, 2017


We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations. Then we reviewed the intrastamp and interstamp replication. We also took stock of TCO. We also reviewed the performance of WAS and a comparison study of the usages of Blobs, Tables and Queues by different cloud applications and services.
The applications and services were Facebook and Twitter search for Bing, XBox Gamesaves, XBox Telemetry and the Microsoft Zune backend. The percentage of requests for all services showed that 17.9% of the requests used Blobs, 46.88% used Tables and 35.22% used Queues. But in terms of capacity, 70.31% of capacity was in Blobs, 29.68% of capacity was in Tables and 0.01% was used by Queues. The tiny fraction for Queues was expected because Queues use very little space as compared to Blobs and Tables. Application profiles also seemed to have different weights of their usages of Blobs, Tables and Queues. Zune and XBox Gamesaves had mostly unstructured data and put them into Blobs, whereas customers like Bing and XBox Telemetry have to index a lot of data.
Earlier I brought up that one of their design decisions was to separate VM-based computation from storage for Windows Azure so that both could scale and load balance independently given the nature of their access patterns. Therefore nodes running a customer's service code are separate from nodes providing their storage, and WAS focused on high bandwidth without the data being on the same node or on the same rack. To improve this further, the datacenter networking topology will be flattened out to provide full bisectional bandwidth between compute and storage.
#codingexercise
Friends Pairing problem:
Given n friends, each one can remain single or can be paired up with some other friend. Each friend can be paired only once, so ordering is irrelevant.
The total number of ways in which the friends can be paired is given by:
int GetPairs(int n)
{
    if (n <= 2) return n;
    // friend n stays single, or pairs with one of the other n-1 friends
    return GetPairs(n-1) + GetPairs(n-2)*(n-1);
}

Shortest superstring problem:
Given a set of distinct strings, find the shortest superstring that contains each string in the given set as a substring. The distinct strings cannot be substrings of each other.
The method to solve this is as follows:
1. Make a copy of the array to begin with.
2. Find the pair that has maximum overlap, say a and b.
3. Replace a and b with the concatenation ab.
4. The size of the array reduces by one; we repeat steps 2 to 4 until we are left with only one superstring.
5. Since we find the maximum overlap at each point, we make a greedy choice for the solution. This is an approximate solution and not necessarily the most optimal.
string FindShortestSuperString(List<string> input)
{
    if (input == null || input.Count == 0) return string.Empty;
    var tmp = new List<string>(input);
    while (tmp.Count > 1)
    {
        int max = 0;
        int l = 0;
        int r = 1;
        for (int i = 0; i < tmp.Count; i++)
            for (int j = 0; j < tmp.Count; j++)
            {
                if (i != j)
                {
                    var cur = GetOverlapLength(tmp[i], tmp[j]); // longest suffix of tmp[i] that is a prefix of tmp[j]
                    if (cur > max) { max = cur; l = i; r = j; }
                }
            }
        var merged = tmp[l] + tmp[r].Substring(max); // append only the non-overlapping remainder
        // remove the higher index first so the lower index remains valid
        tmp.RemoveAt(Math.Max(l, r));
        tmp.RemoveAt(Math.Min(l, r));
        tmp.Insert(0, merged);
    }
    return tmp.First();
}
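The GetOverlapLength helper is left undefined above; one way to write it, together with the whole greedy loop, is this Python sketch (the helper's behavior is my assumption):

```python
def overlap(a, b):
    # length of the longest suffix of a that is a prefix of b,
    # plus the corresponding merged string
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k, a + b[k:]
    return 0, a + b

def shortest_superstring(strings):
    # greedy: repeatedly merge the pair with maximum overlap
    items = list(strings)
    while len(items) > 1:
        best_len, bi, bj, best_merged = -1, 0, 1, items[0] + items[1]
        for i in range(len(items)):
            for j in range(len(items)):
                if i != j:
                    k, merged = overlap(items[i], items[j])
                    if k > best_len:
                        best_len, bi, bj, best_merged = k, i, j, merged
        # remove the higher index first so the lower one stays valid
        for idx in sorted((bi, bj), reverse=True):
            items.pop(idx)
        items.append(best_merged)
    return items[0]

print(shortest_superstring(["catg", "atgc", "gcat"]))  # gcatgc
```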


Tuesday, March 7, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations. Then we reviewed the intrastamp and interstamp replication. We also took stock of TCO.
The throughput for applications increased linearly as the application increased the number of computing resources used to access WAS objects. The throughput increases linearly up to eight VMs but tapers off as the network capacity is reached. For table operations, batch puts offer about three times more throughput compared to single entity puts because of reduced network round trips and fewer stream writes.
A variety of applications use WAS as cloud storage. Facebook and Twitter search for Bing uses WAS. XBox GameSaves saves game data into the cloud for XBox users. Similarly, the XBox telemetry services store console-generated diagnostics and telemetry information for later secure retrieval and offline processing. Microsoft's Zune backend uses Windows Azure for media file storage and delivery, and it stores files as Blobs. A comparison study of Blobs, Tables and Queues was made for the applications in terms of requests, capacity, ingress and egress. Generally for these applications, 17.9% of all requests used Blobs, 46.88% used Tables and 35.22% used Queues.



#codingexercise

Given a sequence of digits, find the number of subsequences that are divisible by m.
int GetCount(List<int> digits, int index, int rem, int m)
{
    if (digits.Count == 0 || m == 0) return 0;
    if (index == digits.Count) return (rem == 0) ? 1 : 0;
    // either exclude the current digit or include it in the subsequence
    return GetCount(digits, index+1, rem, m) +
           GetCount(digits, index+1, (rem*10 + digits[index]) % m, m);
}
Alternatively, enumerate all subsequences using combinations of lengths 1 to digits.Length (and their permutations if reordering is requested) and check each for divisibility:
void Combine(ref string digits, ref StringBuilder b, int m, int start, int level, ref List<int> results)
{
    for (int i = start; i < digits.Length; i++)
    {
        b.Append(digits[i]);
        if (IsDivisible(ToInt(b.ToString()), m))
            results.Add(ToInt(b.ToString()));
        if (i < digits.Length - 1)
            Combine(ref digits, ref b, m, i+1, level+1, ref results);
        b.Remove(b.Length - 1, 1); // backtrack
    }
}
The test for divisibility can be treated as an interchangeable implementation detail.
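A brute-force cross-check over all non-empty subsequences is straightforward in Python (note that the recursion above, as written, also counts the empty subsequence, since its remainder is 0):

```python
from itertools import combinations

def count_divisible_subsequences(digits, m):
    # enumerate every non-empty subsequence (order preserved) and count
    # those whose value is divisible by m
    count = 0
    for r in range(1, len(digits) + 1):
        for combo in combinations(digits, r):
            value = int("".join(map(str, combo)))
            if value % m == 0:
                count += 1
    return count

print(count_divisible_subsequences([1, 2, 3], 3))  # 3: the subsequences 3, 12 and 123
```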

Monday, March 6, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations. Then we reviewed intrastamp and interstamp replication.
Let's digress a little to talk about TCO. Yan Han, in his tutorial on TCO in December 2011, gave two examples of cost analysis citing examples on AWS. According to him, at that time the TCO of an AWS small instance for five years was $2750-$3750, all of which was operational costs, not hardware costs, which he put at $0. The TCO of a physical server comparable to an AWS small instance for 5 years was $5868-$7608. The operation expense was $7190-$10,690, ignoring downtime and failure expenses, insurance cost, technology training, and backup process. Similarly, he cited costs for an example using storage instead of compute. He said the TCO of 10TB in Amazon S3 per year is $16,800, all of which is exclusively operational expense. The TCO of an on-premises 10TB physical storage per year was $11,212-$12,612. He broke down the hardware costs as $4168 per year and the operation expense as $1438-$2138 per year. With his analysis, he concluded that on-premises costs run higher than any public cloud offering. In such a case, an organization will find it an uphill task to reduce or justify the cost of a private cloud to the point of tipping it below the costs incurred on the public cloud. Therefore paving the migration path to the public cloud becomes mandatory.
The migration path for applications and customers could include the following initiatives:
1) classifying usages of customers based on IaaS, PaaS, and SaaS and separating them accordingly
2) surveying all compute and storage usages to determine priority and severity levels of workloads for their migration
3) Onboarding new applications to public cloud with translation of jargons for core services such as logging, backup, monitoring, databases and so on.
4) Determining security and compliance defaults for application destinations in the public cloud.
5) Transferring account ownership and tracking to these resources for individuals and groups based on accounts and roles in the cloud .
6) Setting up alerts and notifications  for management and activities involved.
7) Mapping project, resource and data transfers from private to public cloud
8) Performing automatic migrations where customer interactions can be avoided.
The migration plan may also depend on the TCO. Martens, Walterbusch and Teuteberg mention that a major issue of any TCO model is the disregard of a differentiation between private, public and hybrid cloud.


#codingexercise
Print Top view of a binary tree
void PrintTopView(Node root)
{
    // level-order traversal, tracking each node's horizontal distance from the root
    if (root == null) return;
    var d = new SortedDictionary<int, int>();
    int left = 0;
    int right = 0;
    var q = new Queue<Node>();
    root.dist = 0;
    d[0] = root.data;
    q.Enqueue(root);
    while (q.Count > 0)
    {
        var item = q.Dequeue();
        // the first node encountered beyond the current span is visible from the top
        if (item.dist < left) { left = item.dist; d[item.dist] = item.data; }
        if (item.dist > right) { right = item.dist; d[item.dist] = item.data; }
        if (item.left != null) {
            item.left.dist = item.dist - 1;
            q.Enqueue(item.left);
        }
        if (item.right != null) {
            item.right.dist = item.dist + 1;
            q.Enqueue(item.right);
        }
    }
    foreach (var kvp in d)
        Console.Write("{0} ", kvp.Value);
}
int GetMid(Node root)
{
    if (root == null) return 0;
    var current = root;
    while (current.left != null)
        current = current.left;
    var min = current.data; // leftmost node holds the minimum
    current = root;
    while (current.right != null)
        current = current.right;
    var max = current.data; // rightmost node holds the maximum
    return (min + max) / 2;
}

Sunday, March 5, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We reviewed the load balancing, split and merge operations.
We now look at partition layer interstamp replication. All access pertaining to an AccountName goes to the primary stamp associated with the account as determined by the Location Service. WAS performs interstamp replication between primary and secondary stamps. One of the main scenarios for inter-stamp replication is to geo-replicate an account's data between two data centers for disaster recovery. The DNS naming convention is AccountName.service.core.windows.net, which points to the VIP for the primary stamp. For every write in the primary stamp, the change is fully replicated by intra-stamp replication. Once the changes are committed to the primary stamp, the changes are geo-replicated asynchronously to the secondary stamp using interstamp replication. When a change arrives at the secondary stamp, it is applied in the partition layer, which fully replicates it using intra-stamp replication within the secondary stamp.
We may observe that since interstamp replication is done asynchronously, recent updates that have not been inter-stamp replicated may be lost during disaster recovery. This is minimized by reducing the delay in interstamp replication. The interstamp replication is used for both account geo-replication and migration across stamps. In the disaster recovery case, there is an abrupt failover, so changes may be lost, whereas in the migration case we can perform a clean migration. The URIs for blobs, tables and queues do not change after the failover.
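The loss window can be seen in a toy model. This Python sketch is purely an illustration of asynchronous geo-replication, not WAS code:

```python
from collections import deque

primary, secondary = {}, {}
pending = deque()  # changes committed on the primary, not yet geo-replicated

def write(key, value):
    primary[key] = value          # durable via intrastamp replication
    pending.append((key, value))  # interstamp replication happens later

def replicate_one():
    # one asynchronous geo-replication step
    if pending:
        key, value = pending.popleft()
        secondary[key] = value

write("blob1", "v1")
write("blob2", "v2")
replicate_one()
# an abrupt failover at this point would serve only blob1 from the
# secondary; blob2 sits in the window described above
print(sorted(secondary))  # ['blob1']
```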
Compute and storage are traditionally separated into independent clusters because they can do their own load balancing and scale independently. The throughput for applications increased linearly as the application increased the number of computing resources used to access WAS objects. The throughput increases linearly up to eight VMs but tapers off as the network capacity is reached. This means we could have expected the throughput to increase even more if the network were not the limiting factor.
#codingexercise
int GetMax(Node root)
{
    if (root == null) return INT_MIN; // an empty tree has no maximum
    while (root.right != null)
        root = root.right;
    return root.data;
}

Does the bottom view of a tree equal its leaves?
If so, can we print the leaves of the tree as follows:
void GetLeaves(Node root, ref List<Node> leaves)
{
if (root == null) return;
GetLeaves(root.left, ref leaves);
if (root.left == null && root.right == null) 
     leaves.Add(root);
GetLeaves(root.right, ref leaves);
}
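The answer to the question above is no: an internal node can be the lowest node visible at its horizontal distance. A small Python counterexample:

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def bottom_view(root):
    # level-order traversal; the last node seen at each horizontal
    # distance is the one visible from below
    view, queue = {}, [(root, 0)]
    for node, dist in queue:
        view[dist] = node.data
        if node.left:
            queue.append((node.left, dist - 1))
        if node.right:
            queue.append((node.right, dist + 1))
    return [view[d] for d in sorted(view)]

def leaves(root):
    if root is None:
        return []
    if root.left is None and root.right is None:
        return [root.data]
    return leaves(root.left) + leaves(root.right)

#     1
#    / \
#   2   3      node 4 hangs under 2 at horizontal distance 0
#    \
#     4
root = Node(1, Node(2, right=Node(4)), Node(3))
print(bottom_view(root))  # [2, 4, 3] -- internal node 2 is visible from below
print(leaves(root))       # [4, 3]
```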

Print bottom view of a binary tree
void PrintBottomView(Node root)
{
    // level-order traversal; the last node seen at each horizontal distance is visible from the bottom
    if (root == null) return;
    var d = new SortedDictionary<int, int>();
    var q = new Queue<Node>();
    root.dist = 0;
    q.Enqueue(root);
    while (q.Count > 0)
    {
        var item = q.Dequeue();
        d[item.dist] = item.data; // later nodes overwrite earlier ones at the same distance
        if (item.left != null) {
            item.left.dist = item.dist - 1;
            q.Enqueue(item.left);
        }
        if (item.right != null) {
            item.right.dist = item.dist + 1;
            q.Enqueue(item.right);
        }
    }
    foreach (var kvp in d)
        Console.Write("{0} ", kvp.Value);
}

Saturday, March 4, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from an introduction to Azure by Microsoft. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage aka WAS and how it's different from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We reviewed the partition layer which translates objects such as blob, table or queue to storage. We saw how the partition manager maintains a massive Object Table that is sharded on different partition servers which maintain their consistency across concurrent transactions using a lock service. We read the partition layer data model and the supported data types and operations, the partition layer architecture and the range partition data structures. We saw that the RangePartitions are made up of the metadata stream, commit log stream, row data stream and blob data stream. The in-memory data structures use a Memory Table, Index Cache, Row Data Cache and Bloom Filters. When the PS receives a write request to the RangePartition, it appends the operations into the commit log, and then puts the newly changed row into the memory table. The Partition Manager is responsible for assigning the sharded massive Object Table to partition servers and for their associated maintenance operations such as load balance, split and merge. The load for each RangePartition as well as the overall load for each partition server is tracked on the basis of transactions/second, average pending transaction count, throttling rate, CPU usage, network usage, request latency and the data size of the RangePartition. This information is sent by the PS in response to the heartbeat request from the PM. The PM decides if a PS has too much load, and the PS then performs the actual split of the RangePartitions. The PM could also take away RangePartitions and assign them to another PS.
For each stamp, there are usually 75 splits and merges and 200 RangePartition load balances per day. In order to keep the loading time of a freshly arrived set of RangePartitions low, the commit log is kept small with a checkpoint prior to the handoff.
The Split and Merge operations are also done by the Partition Servers on request from the PM. The splits are made at the location specified by the key comprising AccountName and PartitionName. The RangePartitions maintain the sizes of the objects and the key at which the size can be halved. The PS chooses a key based on Adaptive Range Profiling, which is a technique to find the key ranges that have the most load and to use these to determine the split. Prior to each split, the RangePartition is checkpointed. During the split, the PS uses a special operation called MultiModify that takes the source stream and creates a new set of streams for the splits with the same extents in the same order as in the source. This is pretty fast because it's merely a pointer manipulation.
The Merge operation requires that the source RangePartitions all come to the same Partition Server. The PM moves them to one PS. The PS checkpoints both and runs the MultiModify operation, which in this case is a concatenation of all of the extents from their respective streams. The concatenation takes all of the extents of one and then follows them with all of the extents of the other. The PS then makes the metadata stream with the names of the new commit log and data stream, as well as the root of the data index in the merged data stream, which can now be correctly loaded.
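The pointer-manipulation nature of MultiModify can be modeled in a few lines. This is a Python toy model; the stream and extent representations are assumptions for illustration:

```python
# streams modeled as ordered lists of extent references
def split_stream(extents, at):
    # MultiModify for a split: both children reference the same extents,
    # in the same order, with no data copied
    return extents[:at], extents[at:]

def merge_streams(a, b):
    # MultiModify for a merge: all extents of one stream followed by
    # all extents of the other
    return a + b

source = ["extent-1", "extent-2", "extent-3", "extent-4"]
left, right = split_stream(source, 2)
print(left, right)                           # ['extent-1', 'extent-2'] ['extent-3', 'extent-4']
print(merge_streams(left, right) == source)  # True
```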
#codingexercise
Print top view of a binary tree
void PrintTopView(Node root)
{
    // level-order traversal, tracking each node's horizontal distance from the root
    if (root == null) return;
    var d = new SortedDictionary<int, int>();
    var q = new Queue<Node>();
    root.dist = 0;
    q.Enqueue(root);
    while (q.Count > 0)
    {
        var item = q.Dequeue();
        // only the first node at each horizontal distance is visible from the top
        if (d.ContainsKey(item.dist) == false)
            d[item.dist] = item.data;
        if (item.left != null) {
            item.left.dist = item.dist - 1;
            q.Enqueue(item.left);
        }
        if (item.right != null) {
            item.right.dist = item.dist + 1;
            q.Enqueue(item.right);
        }
    }
    foreach (var kvp in d)
        Console.Write("{0} ", kvp.Value);
}