Cluster computing

Wednesday, July 13, 2016

#codingexercise
Find the number of islands of 1s in a binary 2d matrix. This problem is similar to the one in graph where we find connected components. In a 2d matrix, every cell has eight neighbors. A depth first search explores all these eight neighbors.
Int getIslands(int[,] m, int row, int col)
{
Var visited = new int[row,col];
Int count = 0;
For(int i =0; i < row; i++)
For(int j =0; j < col; j++)
If (m[i,j] == 1&& !visited[i,j])
{
DFS(m, ref visited, i,j);
Count++;
}
Return count;
}

Earlier we started reading about a paper titled "Pelican: a building block for exascale cold data storage". Cold data is the data that is rarely accessed. Pelican is a rack-scale hard-disk based storage unit designed as the basic building block for Exabyte scale storage for cold data. A rack-scale set of computers is a class of hardware with pooled storage, compute and network such that thousands of cores are interconnected at high speed. A rack is increasingly replacing the individual server as the basic building block of modern data centers. At Microsoft, the rack-scale computing, the Rack-scale computing project aims to redesign hardware, storage, networking and OS stacks in a cross layer co-design manner so as to improve efficiencies. By choosing a building block of this size, Pelican aims to make a building block of this size for cold data workloads. The provisioning is considered just right when the total cost of ownership reduces as compared to traditional disk-based storage. For example, only 8% of the drives can be concurrently spinning in this building block. This introduces complex resource management to be handled by Pelican.
This is met in the following way. Resource restrictions are expressed as constraints over the harddrive. The data layer and IO scheduling ensure that these constraints are not violated. Pelican and conventional storage are compared side by side using a cross validated simulator. Pelican demonstrates high throughput with acceptable latency for cold workloads.
The cost of storing data appears as a sliding scale. The storage in Amazon S3 for example is near the high end. But cold data is also growing phenomenally. Amazon Glacier and Facebook cold data storage are just a few examples where S3 doesn't suffice and cloud scale tiers need to be provisioned. Moreover these systems have to minimize up front capital costs of buying the storage as well as the costs of running the storage. By co-designing hardware and software stacks, Pelican aims to allow the entire rack's resources to be carefully balanced to support cold data workloads.
Pelican can withstand a peak sustainable read rate of 1 GB per second per PB of storage ( 1 GB/PB/sec) with 1GB blobs. A single rack can support upto 5 PB but since the rate is low, the entire data in Pelican can be transferred out every 13 days.
Pelican differs from a conventional rack in its behavior not structure. It uses a 52-U standard sized rack. It uses 40 Gbps of full duplex bandwidth and 1152 archive grade hard disks packed into the rack and using currently available drives of average size While a traditional storage rack would be provisioned for peak-performance involving all drives, Pelican allows only 96 drives to be spun up. All disks which are not spun up are in standby mode. However Pelican uses the software stack to give good performance even with restricted resources by using innovative algorithms to handle data layout and IO scheduling.
Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
There are 48 groups of such units and each disk is assigned to a group. Files are stored within a single group and the scheduling is done using groups rather than disks.
With the help of resource constraints and scheduling units, Pelican aims to be better than its overprovisioned counterpart racks.

Cluster computing

Wednesday, July 13, 2016

No comments:

Post a Comment