Wednesday, August 3, 2016

Today we continue our discussion of the paper titled "Pelican: a building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations, and failure domains are expressed as constraints over these units.
With the help of these resource constraints and scheduling units, Pelican aims to do better in software than a counterpart rack that is simply over-provisioned with resources.
Pelican uses resources from a set of resource domains, each of which is a subset of the disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. We were discussing implementation considerations. Now we discuss the evaluation of Pelican, for which a discrete-event Pelican simulator was developed and cross-validated against the storage stack using micro-benchmarks from the rack. The mount delay distributions were generated by sampling mount latencies under different load regimes on the Pelican. CDFs were plotted against spin-up delay (ms), mount delay (ms), unmount delay (ms), blob size (MB), and volume capacity (TB). The CDF, or cumulative distribution function, of a uniform distribution on [a, b] is written Fa,b(x) as
Fa,b(x) = 0 for x < a,
Fa,b(x) = (x - a) / (b - a) for a <= x <= b,
Fa,b(x) = 1 for x > b.
A CDF is monotonically non-decreasing and lies in the range 0 to 1, which makes it well suited to these charts.
The spin-up delay and the mount and unmount delays for a drive after spin-up were measured by spinning the disks up and down 100,000 times. The CDF charts for both spin-up and unmount delays showed a steep initial rise followed by near saturation, indicating that these delays fell within a narrow range and their distribution stayed the same across runs. Mount delays behaved differently: their distribution was widely affected by the load on the system. In general, to study the effect of workloads, Pelican was measured under different load regimes.
The disk throughput was measured on a quiescent system. Seeks were simulated by adding a constant latency of 4.2 ms to every disk access, and seek latency was found to have negligible impact on performance: the latency of spinning the disk up and mounting the volume heavily dominates over seek time.
As the blob size increased, the CDF rose irregularly as well, probably because large blobs were stored as a single contiguous file on disk. The simulator modeled disk sizes with a normal distribution.
#codingexercise
Yesterday we were discussing counting the number of squares within a matrix with an alternative approach. This uses an iteration like the following:
int GetCountSquaresFromPosition(int n, int m, int startrow, int startcol)
{
    int count = 0;
    for (int i = startrow; i <= m; i++) {
        for (int j = startcol; j <= n; j++) {
            if (i == startrow && j == startcol) continue;  // skip the 1x1 square at the start
            if (i - startrow + 1 == j - startcol + 1)      // (i, j) completes a square with top-left (startrow, startcol)
                count++;
        }
    }
    return count;
}
Now we can use this method with each cell as the starting position:
int GetCountSquares(int n, int m)
{
    int count = 0;
    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++)
            count += GetCountSquaresFromPosition(n, m, i, j);
    return count;
}
