Sunday, July 31, 2016

In yesterday's post, we talked about extending OneFS over mesos. There we maintained a separation between the layers of OneFS services and mesos. However, we could also consider a hypothetical substitute for mesos, one that is virtual and integrated with OneFS such that the nodes of the cluster are maintained directly by OneFS. This differs from the current use of OneFS clusters in that OneFS could use its own cluster nodes as well as mesos, so long as it can delegate. One of the requirements for OneFS is a high-speed network between the nodes so that data transfer and metadata operations are supported. When we replace the current physical resources with virtual resources for OneFS, we should be mindful that performance stays at par so that the services continue to work seamlessly.
Another important consideration is that some datacenters, for the sake of performance, pull away from commodity hardware and use racks rather than individual servers as the basic building block of the modern data center. A rack-scale computer is a class of hardware with pooled storage, compute and network such that thousands of cores are interconnected at high speed. For example, in Microsoft's rack-scale computing effort, the hardware, storage, networking and OS stacks are redesigned in a cross-layer co-design manner to improve efficiency. By using a direct server-to-server network topology, they overcome the hardware restrictions that conventional OneFS deployments have. While this is another dimension from which to source resources for OneFS, it highlights the abstraction the OneFS core services can make over what they operate on while continuing to provide their respective functionalities to form a scalable, self-healing file system.
Neither the use of mesos nor the use of racks takes away any of the functionality that OneFS services provide to stretch the filesystem over resources well beyond an individual server. The design of OneFS already makes the assumptions necessary to support growth and shrinkage of resources. The difference is that while this is currently done directly on nodes, we now suggest a redirection through an abstraction that is suitable for all three - direct virtual resources, mesos delegation and rack-scale computing.
This abstraction implies that customers can expect the same seamless managed storage layer across deployments, whether in the on-premise enterprise, in datacenters or in large-scale cloud deployments.
Moreover, by providing this managed layer instead of raw storage, applications are relieved of many storage and data-maintenance considerations of scale, cost and performance, enabling them to focus more on compute and to push the storage concerns down to the service, so that the applications themselves can be rolled from one place to another. This improves operations while reducing the impact on architecture.
#codingexercise
Given a string and a pattern, state whether the string follows the pattern. For example, input = "redblueredgreen" matches pattern "abac" but not "aaab", since 'red' maps to 'a', 'blue' maps to 'b' and 'green' maps to 'c' here. The string input only contains the words "red", "green" and "blue", and the pattern can have any characters.
We could optionally generalize the problem to arbitrary text, so that the input is not limited to the well-known demarcations red, green and blue.
Solution: we build a hashtable mapping pattern characters to substrings and backtrack over candidate prefixes.
bool IsSimilar(string text, string pattern, Hashtable h)
{
    if (text == "" && pattern == "") return true;
    if (text == "" || pattern == "") return false;
    for (int i = 1; i <= text.Length; i++)
    {
        var sub = text.Substring(0, i);   // candidate substring for pattern[0]
        bool added = false;
        if (h.ContainsKey(pattern[0]))
        {
            // the pattern character is already bound; the prefix must match that binding
            if ((string)h[pattern[0]] != sub) continue;
        }
        else
        {
            h.Add(pattern[0], sub);
            added = true;
        }
        if (IsSimilar(text.Substring(i), pattern.Substring(1), h))
            return true;
        if (added) h.Remove(pattern[0]);  // undo the tentative binding and try a longer prefix
    }
    return false;
}
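As a quick check of the sketch above, a call like IsSimilar("redblueredgreen", "abac", new Hashtable()) should return true with the bindings a -> "red", b -> "blue", c -> "green", while IsSimilar("redblueredgreen", "aaab", new Hashtable()) should return false, since no single substring can repeat three times at the start of that input.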

Given an N*M matrix, how many different sizes of rectangles or squares can be formed?
int GetNumberOfSquaresRectangles(int n, int m)
{
    // every pair (height i, width j) with 1 <= i <= n and 1 <= j <= m is a distinct size
    int count = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            count += 1;
    return count;
}
Now count the number of rectangles formed with each cell as the top-left corner:
int GetAll(int n, int m)
{
    int count = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            count += GetNumberOfSquaresRectangles(n - i + 1, m - j + 1);
    return count;
}
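As a sanity check on the two routines above: the first returns n*m distinct sizes, while the second sums (n - i + 1) * (m - j + 1) over every starting cell, which equals n(n+1)/2 * m(m+1)/2 rectangles in total. For a 2x2 grid that is 4 distinct sizes and 9 rectangles.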

Saturday, July 30, 2016

Debunking data is sticky in cloud computing
Traditional enterprise software has focused on accumulating incredible amounts of data for its systems. As the data grows, it becomes more and more sticky, requiring applications and services to change accordingly. Entire ecosystems then evolve around the data, bringing many routines and a heavy maintenance burden.
Compute and storage have often gone hand in hand, with requirements shaping each other mutually. Yet they have fundamentally different needs and require different services to form their own layers. While the separation between the layers may not matter for desktop applications, it does for cloud computing. On premise, data translated to a connection string, a file system or a message bus for the application. In the cloud this was replaced by blobs, key-values, queue messages, file shares and object storage. A database or a file no longer needs to be static; it can be moved around, backed up and replicated, compressed, encrypted, aged and archived.
Object storage gave some of these benefits by exposing a raw storage platform that distributed the failure domains for the data. By adding metadata and a globally unique identifier, the data could now live on multiple virtual machines or storage nodes, increasing availability, scalability and self-healing abilities. Applications took advantage of this by storing photos and songs.
However, object storage is not a managed platform; it is an alternative to file and block storage. If it were a managed platform, it would provide the services that storage appliances are used for, such as deduplication, backup, aging and archival, and self-healing enhancements. One such platform came close to providing these managed services. Isilon OneFS provides a distributed file system over a storage cluster. It combines three layers of the traditional storage architecture, namely the file system, the volume manager and data protection, and it scales out because it relies on intelligent software, commodity hardware and a distributed architecture.
However, OneFS uses cluster storage from on-premise hardware. If Isilon could build cluster storage out of cloud resources such as Mesos clusters, then it would harness more cloud computing power with very little physical on-premise footprint. By using a distributed systems kernel that runs on every machine, Isilon could use APIs for resource management and scheduling across entire data centers and cloud environments. This provides several benefits:
It provides linear scalability in that clusters can grow to tens of thousands of nodes, whereas a OneFS cluster maxes out at around 143 nodes with a hundred terabytes each.
It provides high availability in that the master and agents are replicated in a fault tolerant manner and even the upgrades are non-disruptive.
It provides the same cluster to support both cloud-native and legacy applications, with pluggable scheduling policies so that OneFS services can be ported or rewritten.
It provides APIs for operating and monitoring the cluster, leaving much of the onus to mesos while enabling customizations. These APIs are already exercised in the web user interface made available from this layer.
It provides pluggable isolation in terms of CPU, memory, disks, ports, GPU and modules for custom resource isolation. This can help OneFS distribute and differentiate the workloads.
It is cross-platform, running on Linux, OSX and Windows while being cloud-provider agnostic. This makes deployments universal and thus provides significant benefits for adoption and onboarding.
With an existing customer base to validate the viability of OneFS and mesos individually, it should be safe to assume the viability and purpose of one stacked over the other. Lastly, this provides a managed platform for storage rather than raw services, which lets applications be more compute-oriented and less storage-conscious.

#codingexercise
Given a number n and k (the number of swaps allowed), make the biggest number from n by making at most k swaps. If it is already the biggest, return the number itself.
To solve the problem, we greedily bring the largest remaining digit toward the front, as it would appear in descending sorted order.
int getnum(int num, int k)
{
    var digits = num.toDigits();           // most significant digit first (assumed helper)
    int swaps = 0;
    for (int i = 0; i < digits.Count && swaps < k; i++)
    {
        int pos = getmax(digits, i + 1);   // index of the largest digit at or after i+1 (assumed helper)
        if (pos != -1 && digits[pos] > digits[i])
        {
            Swap(digits, i, pos);          // greedily place the larger digit up front
            swaps++;
        }
    }
    return digits.toNumber();              // assumed helper: digits back to a number
}
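For example, with the assumed helpers above, getnum(254, 1) would greedily swap the 5 into the leading position and return 524, and getnum(254, 2) would continue to 542.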

Friday, July 29, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
We were reading about the IO scheduler, which does request ordering and rate limiting. The former batches sets of operations for the same group to amortize the group spin-up latency over the set of operations. The latter makes rebuild operations complete within an upper time bound.
The scheduler maintains two queues - one for rebuild and another for regular operations - and performs weighted fair queuing.
If the blob size is uniform, the upper bound on the tolerated reordering is expressed in terms of the number of operations. If the blob size is non-uniform, it is expressed in terms of wall clock time. For each queued request, this process is greedily repeated. Batching is maximized unless it violates fairness for some requests, and the reordering bound is enforced for each request. Both guarantees rely on the upper bound, which controls the reordering. This upper bound expresses the tradeoff between throughput and fairness: if its value is close to zero, the queues provide fairness; if it is close to infinity, the scheduler minimizes the number of spin-ups, which increases throughput.

#codingexercise
Print all possible strings that can be made by placing spaces between the characters of a given string.

void PrintWithSpaces(string a, int i, string b)
{
    // b is the output built so far; at each position we either append the next
    // character directly or precede it with a single space
    if (i == a.Length)
    {
        Console.WriteLine(b);
        return;
    }
    PrintWithSpaces(a, i + 1, b + a[i]);
    if (i > 0)   // never place a space before the first character
        PrintWithSpaces(a, i + 1, b + " " + a[i]);
}
// usage: PrintWithSpaces(input, 0, "")
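For instance, for the input "ABC" the recursion above prints ABC, AB C, A BC and A B C.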
Given a sorted dictionary (array of words) of an alien language, find order of characters in the language.
Recall that a topological sort lists vertices in an order such that for every edge (u, v), u appears before v; equivalently, if u is before v in the list, then there is no edge from v to u. The same criterion can be used to order the letters of a dictionary whose word listing follows that order. All we have to do is add an edge for each pair of adjacent words based on their first differing character and then run the topological sort.
TOPOLOGICAL-SORT(V, E)
  for each vertex v in V
        v.color = white
        v.d = nil
  time = 0
  order = empty list
  for each vertex v in V
        if v.color == white
              DFS-VISIT(V, E, v)
  return order

DFS-VISIT(V, E, u)
  time = time + 1
  u.d = time
  u.color = gray
  for each vertex v adjacent to u
        if v.color == white
              DFS-VISIT(V, E, v)
        else if v.color == gray
              throw back-edge exception      // a cycle means no consistent ordering exists
  u.color = black
  time = time + 1
  u.f = time
  prepend u to order                         // vertices finish in reverse topological order

void GetDictOrder(List<string> words, int V)
{
    var g = new Graph(V);   // Graph and AddE are assumed helpers, as in the original
    for (int i = 0; i < words.Count - 1; i++)
    {
        var first = words[i];
        var second = words[i + 1];
        for (int j = 0; j < Math.Min(first.Length, second.Length); j++)
        {
            if (first[j] != second[j])
            {
                // the first differing character gives a precedence edge
                g.AddE(first[j] - 'a', second[j] - 'a');
                break;
            }
        }
    }
    TopologicalSort(g.V, g.E);   // the DFS-based sort sketched above yields the character order
}
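For instance, with the dictionary ["baa", "abcd", "abca", "cab", "cad"] the adjacent-word comparisons yield the edges b->a, d->a, a->c and b->d, and a topological sort gives the character order b, d, a, c.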

Given a number n and k (the number of swaps allowed), make the biggest number from n by making at most k swaps. If it is already the biggest, return the number itself. Here is another take that compares the digits against their descending sorted order.
int getnum(int num, int k)
{
    var digits = num.toDigits();                             // most significant digit first (assumed helper)
    var desc = digits.OrderByDescending(d => d).ToList();    // target arrangement: digits in descending order
    int swaps = 0;
    for (int i = 0; i < digits.Count && swaps < k; i++)
    {
        if (digits[i] == desc[i]) continue;                  // this position already holds the best digit
        int pos = digits.LastIndexOf(desc[i]);               // bring the desired digit forward (greedy)
        Swap(digits, i, pos);
        swaps++;
    }
    return digits.toNumber();                                // assumed helper
}


Thursday, July 28, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
We were reading about the IO scheduler. It works on groups and classes rather than the hardware constraints. It reorders requests between the classes to reduce spin latency. The IO scheduler manages costs from the requests as well as costs from the rebuild traffic. It does this using two mechanisms - request ordering and rate limiting.
If the blob sizes are uniform, the reordering is expressed in terms of the number of operations. If it is non-uniform, it is expressed in terms of wall clock time.



We now look at rate limiting methods. The rebuild operation interferes with the regular operations that are queued, hence the rate at which each queue is serviced is then controlled. The goal here is to make the rebuild operations complete within an upper time bound. This helps ensure data durability when the annual failure rates of the hardware are known.
If x is the amount of data on the failed disk and t is the average throughput of a single disk, then x / t is the time taken to transfer the data; if w is the fraction of the resources the scheduler allocates to rebuild, then the total time to repair after a single disk failure is (x / t) * (1 / w).
This is used to compute the weights for two queues - one for rebuild and another for regular operations and the scheduler merely does weighted fair queuing.
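To make the repair-time formula concrete with illustrative numbers (not taken from the paper): if the failed disk held x = 4 TB, a single disk sustains t = 100 MB/s, and the scheduler allocates w = 0.5 of the group's resources to rebuild, then x / t is roughly 11 hours of transfer and the total repair time (x / t) * (1 / w) is roughly 22 hours.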


Building bridges problem


Consider a 2D map with a river flowing from left to right through the center. There are n cities on the southern bank of the river and the same number on the northern bank. City i on the northern bank must be bridged to city i on the southern bank, but the order of the cities along each bank can be arbitrary. We have to draw as many bridges as possible without having them cross each other.

// Pair each city's position on the northern bank with its partner's position on the
// southern bank, order the pairs by the northern position, and take the longest
// increasing subsequence of the southern positions; those bridges do not cross.
int GetBridges(List<int> north, List<int> south)
{
    int n = north.Count;
    var order = Enumerable.Range(0, n).OrderBy(i => north[i]).Select(i => south[i]).ToList();
    var best = Enumerable.Repeat(1, n).ToList();   // best[i] = longest non-crossing chain ending at i
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (order[j] < order[i] && best[j] + 1 > best[i])
                best[i] = best[j] + 1;
    return best.Max();
}
// usage: GetBridges(NorthernCities, SouthernCities)
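With the sketch above, north = [1, 2, 3, 4] and south = [2, 1, 4, 3] (the positions of cities 0 through 3 on each bank) would yield 2, since at most two bridges, for example 1->2 and 4->3, can be drawn without crossing.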

Find the largest subarray with equal number of zeros and ones
Tuple<int, int> GetLargestSubArray(List<int> nums)
{
    int first = -1, second = -1;   // boundaries of the best subarray found so far
    int size = 0;
    for (int i = 0; i < nums.Count; i++)
    {
        int sum = (nums[i] == 0) ? -1 : 1;   // count zeros as -1 and ones as +1
        for (int j = i + 1; j < nums.Count; j++)
        {
            sum += (nums[j] == 0) ? -1 : 1;
            if (sum == 0 && size < j - i + 1)   // equal zeros and ones over [i, j]
            {
                size = j - i + 1;
                first = i;
                second = j;
            }
        }
    }
    return Tuple.Create(first, second);
}
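For instance, with the sketch above, nums = [1, 0, 1, 1, 0, 0] returns the pair (0, 5), since the whole array has three ones and three zeros.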

Wednesday, July 27, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
We were reading about the IO scheduler. It works on groups and classes rather than the hardware constraints. It reorders requests between the classes to reduce spin latency. The IO scheduler manages costs from the requests as well as costs from the rebuild traffic. It does this using two mechanisms - request ordering and rate limiting.
The scheduler does reordering to batch sets of operations for the same group, so that the group spin-up latency is amortized over the set of operations. The larger the batch size, the smaller the cost per request, but the greater the queuing delay for some operations. Let us see how they quantify this delay.
Queued operations have a timestamp. An operation r has a timestamp tr and is assigned a reordering counter or. The difference between or and tr is the absolute change in ordering compared to first-come first-served. There is an upper bound u on the tolerated reordering, and the difference for every request must stay within this limit. The scheduler first examines the queue to find l, the last queued operation in the same group as r. If there is no such operation, r is appended to the tail of the queue and the process completes. Otherwise, the scheduler checks the impact of inserting r after l by considering every operation i that follows l: if for any of these operations the absolute change would exceed the upper bound, r is appended to the tail. Otherwise all the oi counters are incremented by one, and r is inserted after l with its own reordering counter reduced by the number of requests it has jumped.
When blob sizes are uniform, the reordering bound is expressed in terms of the number of operations; when they are non-uniform, it is expressed in terms of wall-clock time.
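For concreteness, here is a minimal sketch of that insertion check; the Request type with Group and Reorder fields and the LinkedList-based queue are illustrative assumptions, not the paper's implementation.
class Request
{
    public int Group;     // the spin group this request targets
    public int Reorder;   // reordering counter o_r, starts at zero
}

void Enqueue(LinkedList<Request> queue, Request r, int u)
{
    // find l, the last queued request for the same group as r
    LinkedListNode<Request> last = null;
    for (var cur = queue.First; cur != null; cur = cur.Next)
        if (cur.Value.Group == r.Group)
            last = cur;
    if (last == null) { queue.AddLast(r); return; }
    // the requests r would jump past if inserted right after l
    var jumped = new List<Request>();
    for (var cur = last.Next; cur != null; cur = cur.Next)
        jumped.Add(cur.Value);
    // if any of them would exceed the tolerated reordering u, fall back to the tail
    foreach (var x in jumped)
        if (x.Reorder + 1 > u) { queue.AddLast(r); return; }
    foreach (var x in jumped)
        x.Reorder += 1;              // each jumped request moves back by one position
    r.Reorder -= jumped.Count;       // r moves forward past them
    queue.AddAfter(last, r);
}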
#codingexercise
Given a binary array and an integer m, find the positions of the zeroes which, when flipped, create the maximum number of consecutive 1s in the array.
The solution is a sliding window with the following steps:
while the number of zeros is no more than m, expand the window to the right and update the count of zeros
while the count of zeros exceeds m, shrink the window from the left and update the count of zeros
update the widest window along the way.
The positions of output zeros are inside the window.
void findZeroes(List<int> arr, int n, int m)
{
    int left = 0, right = 0;
    int start = 0, window = 0;
    int count = 0;            // number of zeros inside the current window [left, right)
    while (right < n)
    {
        if (count <= m)
        {
            if (arr[right] == 0)
                count++;
            right++;
        }
        if (count > m)
        {
            if (arr[left] == 0)
                count--;
            left++;
        }
        if (count <= m && right - left > window)
        {
            window = right - left;
            start = left;
        }
    }
    // the zeros to flip are the ones inside the widest window
    for (int i = 0; i < window; i++)
    {
        if (arr[start + i] == 0)
            Console.WriteLine("{0}", start + i);
    }
}
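For instance, arr = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1] with m = 2 should print the positions 5 and 7; flipping those zeros yields eight consecutive ones.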

#codingexercise
int getmin(int[] num, int n)
{
    int min = int.MaxValue;
    for (int i = 0; i < n; i++)
        if (num[i] < min)
            min = num[i];
    return min;
}




Tuesday, July 26, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
Groups are used instead of the underlying hardware constraints. The IO Scheduler in Pelican simply considers which class the group is in rather than the constraints on all the drives. If we increase the number of groups which totally collide, we also increase the number of independent groups leading to better throughput and lower latency.
Let us review next how this IO scheduler works. Pelican reorders requests in order to minimize the impact of spin latency. A set of groups form a class. Within a class only one group can be spinning at a time because of the domain conflicts. The remaining domains are not conflicted and form other classes of groups. In this case, the 48 groups divide into four classes of 12 groups and each class is independent from the others. An independent instance of the scheduler is run for each class and it services requests only for its class. Reordering is therefore class-level.
Traditional IO reordering attempts to order requests based on a continuous cost function. In contrast, in Pelican there is a fixed constant cost of spinning up a group, which is independent of the current set of disks spinning. The cost function is binary: it is zero if the currently spinning group and the group to perform IO are the same, and one otherwise.
The cost of a proposed schedule of IO requests is the sum of the costs of each request. Only one group can be spun up at a time, and if there are q requests queued then the expected cost is 0.92q, because the probability that two consecutive operations target different groups is 0.92; the worst case is one where every request causes a group to spin up. The goal of the IO scheduler is to minimize this cost. In the worst case each spin-up incurs a latency of 8 seconds, so only about 8 requests are serviced per minute.
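As a minimal sketch of that binary cost, assuming each queued request is represented simply by the id of the group it targets (an illustrative simplification):
int ScheduleCost(int spinningGroup, List<int> requestGroups)
{
    int cost = 0;
    int current = spinningGroup;
    foreach (var g in requestGroups)
    {
        if (g != current)
            cost += 1;     // a request for a different group forces a spin-up
        current = g;
    }
    return cost;
}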
There is also a cost from failures in addition to the successful runs. A disk failure triggers a set of rebuild operations to regenerate the lost stripe-stacks. The rebuild requires activity on the group for a length of time equal to the amount of data on the failed disk divided by the disk throughput. Rebuild traffic cannot simply be prioritized over the incoming requests. The IO scheduler addresses these two challenges using two mechanisms - request reordering and rate limiting.

#codingexercise
Given a Binary Search Tree (BST) and a range [min, max], remove all keys which are outside the given range. The modified tree should also be BST.

void Prune(ref Node root, int min, int max)
{
    if (root == null) return;
    Prune(ref root.left, min, max);
    Prune(ref root.right, min, max);
    if (root.data < min)
    {
        // the root and its left subtree are below range; the already pruned right subtree replaces it
        root = root.right;
        return;
    }
    if (root.data > max)
    {
        // the root and its right subtree are above range; the already pruned left subtree replaces it
        root = root.left;
    }
}

Monday, July 25, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
Pelican uses a metadata service called Catalog that is durable and highly available. It maintains bookkeeping information such as the mapping from a blob key to the group and the 18 disks that store it. It is updated during write, rebuild and delete requests and looked up during read requests.

#codingexercise
Write a method to delete all the nodes from a binary tree that lie on a path whose sum from root to leaf is less than a given value K.

Node DelPathSums(Node root, int k, ref int cur)
{
    if (root == null) return null;
    int left = cur + root.data;    // best root-to-leaf sum continuing through the left child
    int right = cur + root.data;   // best root-to-leaf sum continuing through the right child
    root.left = DelPathSums(root.left, k, ref left);
    root.right = DelPathSums(root.right, k, ref right);
    cur = Math.Max(left, right);   // best complete root-to-leaf sum through this node
    if (cur < k)
        root = null;               // every path through this node falls short of k
    return root;
}

#DataDaySeattle2016 continued
Here's the link to the file:

https://1drv.ms/w/s!Ashlm-Nw-wnWk3yjDmBBSqrd4PSr

https://1drv.ms/w/s!Ashlm-Nw-wnWk3y1Lm87oOphR2j6

Sunday, July 24, 2016

Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over-provisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
The data layout algorithm of Pelican divides the disks into l groups. The value of l is chosen to be as large as possible subject to each group having size g, with l x g >= 1152 (the total number of disks) and g >= k + r. Here k is the number of data fragments and r is the number of additional fragments containing redundancy information generated with a Cauchy Reed-Solomon erasure code; the k + r fragments together are referred to as a stripe. It therefore follows that each group must be at least as large as k + r.
One of the advantages of using groups is that it spans multiple failure domains. Disks belonging to a group are distributed across the trays and all the backplanes. In addition, groups reduce time required to recover from a failed disk because all the required data is contained within the same group.
Data is stored in unstructured immutable chunks called blobs. Blobs vary in size from 200MB to 1TB and each blob is uniquely identified by a 20 byte key. Pelican is designed to store blobs which are infrequently accessed. A blob is written to k+r disks in a single randomly selected group. 
Disks are selected to store the blob. The group's disks are first split into six sets, each containing disks from the same backplane failure domain; the disks in each set are then ordered by spare capacity and the three with the highest spare capacity are selected.
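Purely as an illustration of that selection (the Disk type and its Backplane and SpareCapacity fields are assumed names, not the paper's API, and LINQ is assumed to be available):
List<Disk> SelectDisksForBlob(List<Disk> group)
{
    return group.GroupBy(d => d.Backplane)                        // the six backplane failure domains
                .SelectMany(set => set.OrderByDescending(d => d.SpareCapacity)
                                      .Take(3))                   // three disks per backplane, 18 in total
                .ToList();
}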
#codingquestion
int Max(int[] nums)
{
    int max = int.MinValue;
    for (int i = 0; i < nums.Length; i++)
        if (nums[i] > max)
            max = nums[i];
    return max;
}

#DataDaySeattleContinued:
https://1drv.ms/w/s!Ashlm-Nw-wnWk3r3LxoLq77q9TSY 

Saturday, July 23, 2016

#codingexercise
Merge two binary search trees
Node Merge(Node tree1, Node tree2)
{
    var in1 = new List<int>();
    InOrder(tree1, in1);
    var in2 = new List<int>();
    InOrder(tree2, in2);
    var result = MergeSorted(in1, in2);   // standard merge of two sorted lists (sketched below)
    return ToBST(result, 0, result.Count - 1);
}

void InOrder(Node root, List<int> tree)
{
    if (root == null) return;
    InOrder(root.left, tree);
    tree.Add(root.data);
    InOrder(root.right, tree);
}

Node ToBST(List<int> nums, int start, int end)
{
    if (start > end) return null;
    int mid = (start + end) / 2;
    var root = new Node();
    root.data = nums[mid];
    root.left = ToBST(nums, start, mid - 1);
    root.right = ToBST(nums, mid + 1, end);
    return root;
}
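The MergeSorted helper assumed above is just the standard merge of two sorted lists; a minimal version might look like this.
List<int> MergeSorted(List<int> a, List<int> b)
{
    var result = new List<int>();
    int i = 0, j = 0;
    while (i < a.Count && j < b.Count)
        result.Add(a[i] <= b[j] ? a[i++] : b[j++]);   // always take the smaller head
    while (i < a.Count) result.Add(a[i++]);
    while (j < b.Count) result.Add(b[j++]);
    return result;
}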
Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its overprovisioned counterpart racks by using computations in software stacks.
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
Pelican reduces the disk layout problem from choosing 18 disks out of 1152 to one involving l ^ 2 pairs of logical groups. Furthermore, if one of the disks in a group collides with another group, all the disks of that group will collide with those in the other. This reduces the collision probability from being proportional to g ^ 2 to being proportional to g. This is close to the lower bound on the domain collision probability, because g is the number of cooling domains used by a group (only one disk can be active per cooling domain and disks within a group are domain disjoint).
l and g can be determined in the following manner: we want to maximize the number l of groups of size g, given l x g = 1152 and g >= k + r, so that a group is large enough to store a stripe. In the current implementation the authors use g = 24 rather than g = 18 so that a blob can reside entirely within a single group even after some disks have failed. Stripe-stacks stored on failed drives are regenerated and stored on other drives in the group. This gives l = 48, which is then divided into 4 classes of 12 groups each. Each class is independent of the others.
Notice how the disk layout algorithm tries to form groups to improve concurrency. In the paper on local reconstruction codes, there was another attempt to form groups in order to find reconstruction coefficients. In this case each group holds all the constraints within itself, because the disks in a group are all domain-disjoint, which means they do not share any common resource domain.

Data day seattle talks:

https://1drv.ms/w/s!Ashlm-Nw-wnWk3Uo1l-ol2OcM4Nl