Monday, January 7, 2019

Today we continue discussing best practices from storage engineering:

295) A RAID level 4 uses block-interleaved parity with a striping unit of one disk block. Block-level striping has the advantage that read requests of the size of a disk block can be served entirely by the disk where the requested block resides. The write of a single block still requires a read-modify-write cycle, but only one data disk and the check disk are involved: the difference between the old data block and the new data block is used to update the parity.
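
The read-modify-write parity update above can be sketched with plain XOR. This is a minimal illustration; the method name is mine, not any real controller's API:

```java
// Sketch of the RAID level 4 small-write path: the new parity is derived
// from the old data block, the new data block and the old parity block,
// so only one data disk and the check disk are touched.
public class Raid4SmallWrite {
    // newParity = oldParity XOR oldData XOR newData, byte by byte
    static byte[] updateParity(byte[] oldParity, byte[] oldData, byte[] newData) {
        byte[] newParity = new byte[oldParity.length];
        for (int i = 0; i < oldParity.length; i++) {
            newParity[i] = (byte) (oldParity[i] ^ oldData[i] ^ newData[i]);
        }
        return newParity;
    }

    public static void main(String[] args) {
        byte[] oldData = {1, 2, 3};
        byte[] newData = {1, 0, 3};
        byte[] parity  = {7, 7, 7};   // parity over the whole stripe
        byte[] updated = updateParity(parity, oldData, newData);
        // only the positions where old and new data differ flip in the parity
        System.out.println(updated[1]); // 7 ^ 2 ^ 0 = 5
    }
}
```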

296) A RAID level 5 uses block-interleaved distributed parity. The parity blocks are distributed uniformly over all disks, instead of storing them on a single check disk. This has two advantages. First, several write requests can potentially be processed in parallel, since the bottleneck of a single check disk is removed. Second, read requests have a higher degree of parallelism. This level usually has the best performance.
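
One way to rotate parity across disks can be sketched as follows. The layout formula here is one common placement among several used in practice, and the names are illustrative:

```java
// Sketch of RAID level 5 parity placement: the parity block rotates across
// the disks from stripe to stripe rather than living on one check disk.
public class Raid5Layout {
    // One common placement: parity walks backwards across the disks.
    static int parityDisk(int stripe, int numDisks) {
        return (numDisks - 1) - (stripe % numDisks);
    }

    public static void main(String[] args) {
        int disks = 4;
        for (int stripe = 0; stripe < 4; stripe++) {
            System.out.println("stripe " + stripe + " -> parity on disk "
                    + parityDisk(stripe, disks));
        }
        // parity visits disks 3, 2, 1, 0 so no single disk is a bottleneck
    }
}
```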

297) A RAID level 6 uses P+Q redundancy. Recovery from the failure of a single disk is usually not sufficient in very large disk arrays: a second disk might fail before the first is replaced, and in a large array the probability of a second disk failing is not negligible. A RAID level 6 system uses Reed-Solomon codes to be able to recover from two simultaneous disk failures.

298) A RAID level 10+0 is a stripe of RAID 1+0 arrays. In this arrangement, the top-level RAID 0 is implemented in software while the underlying RAID 1+0 is implemented in hardware.

299) RAID 1+0, RAID 3+0, RAID 5+0, RAID 6+0 and RAID 10+0 are referred to as nested or hybrid RAIDs, and their usage diminishes with the growing and now established popularity of Network Attached Storage and cluster-based partitioned approaches. Blade servers with consolidated storage and centralized management have made it easy to carve out VM slices.

300) Disks compete not only with other disks but also with other forms of storage such as Solid-State Drives. 

Sunday, January 6, 2019

Today we continue our discussion on storage engineering


291) A RAID level 0 uses data striping to increase the maximum available bandwidth. No redundant information is maintained.

292) A RAID level 1 uses two copies of the data. This type of redundancy is often called mirroring. There can be combinations of levels 0 and 1 where, as in level 1, read requests can be scheduled to both a disk and its mirror image, and bandwidth for contiguous blocks is improved by aggregating all the disks.

293) A RAID level 2 uses a single bit as the striping unit. The number of check disks grows logarithmically with the number of data disks.
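
The logarithmic growth follows from the Hamming-code condition that k check bits cover d data bits when 2^k >= d + k + 1. A small sketch of that count, with an illustrative method name:

```java
// Number of Hamming-code check disks needed for a given number of data
// disks in a RAID level 2 array: the smallest k with 2^k >= d + k + 1,
// which grows logarithmically with d.
public class Raid2CheckDisks {
    static int checkDisks(int dataDisks) {
        int k = 1;
        while ((1 << k) < dataDisks + k + 1) {
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(checkDisks(4));   // 3 check disks for 4 data disks
        System.out.println(checkDisks(11));  // 4 check disks for 11 data disks
        System.out.println(checkDisks(26));  // 5 check disks for 26 data disks
    }
}
```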

294) A RAID level 3 uses bit-interleaved parity, observing that level 2 keeps more redundant information than is necessary. Instead of using several disks to store a Hamming code that identifies which disk has failed, we rely on that information from the disk controller and use a single check disk with parity information, which is the lowest overhead possible.
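
Recovery with a single parity check disk is a straight XOR over the surviving disks. A minimal sketch, assuming the controller already knows which disk failed:

```java
// Sketch of single-parity recovery: a failed disk's block equals the XOR of
// the corresponding blocks on every surviving disk (data disks plus parity).
public class Raid3Recover {
    static byte[] reconstruct(byte[][] surviving) {
        byte[] lost = new byte[surviving[0].length];
        for (byte[] block : surviving) {
            for (int i = 0; i < block.length; i++) {
                lost[i] ^= block[i];
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2};
        byte[] d1 = {5, 7};
        byte[] parity = {(byte) (1 ^ 5), (byte) (2 ^ 7)};
        // pretend d1 failed: XOR of d0 and the parity block recovers it
        byte[] recovered = reconstruct(new byte[][]{d0, parity});
        System.out.println(recovered[0]); // 5
    }
}
```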

Saturday, January 5, 2019

Today we continue discussing best practices from storage engineering:

286) File systems may implement byte-range locking to enable concurrent access. Typically, byte-range locks are not supported by file-mapping operations. Poor use of file locks can result in performance issues or deadlock.

287) Disks are potential bottlenecks for system performance and storage system reliability. If the disk fails, the data is lost. A disk array is used to increase performance and reliability through data striping and redundancy.

288) Instead of having a single copy of data, redundant information is maintained and carefully organized so that, in the case of a disk failure, it can be used to reconstruct the contents of the failed disk. These redundant-array-of-independent-disks organizations are referred to as RAID levels, and each level represents a tradeoff between reliability and performance.

289) In Data Striping, the data is segmented into equal-size partitions that are distributed over multiple disks. The size of the partition is called the striping unit. The partitions are usually distributed using a round robin mechanism.
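
The round-robin placement can be sketched as a pair of arithmetic mappings from a logical striping unit to a disk and an offset. The method names are illustrative:

```java
// Round-robin data striping: logical striping unit i lands on disk
// i mod N, at position i / N within that disk.
public class Striping {
    static int diskFor(long unit, int numDisks) {
        return (int) (unit % numDisks);
    }

    static long offsetOn(long unit, int numDisks) {
        return unit / numDisks;
    }

    public static void main(String[] args) {
        int disks = 3;
        for (long unit = 0; unit < 6; unit++) {
            System.out.println("unit " + unit + " -> disk " + diskFor(unit, disks)
                    + ", offset " + offsetOn(unit, disks));
        }
    }
}
```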

290) On redundancy: if the mean time to failure of a single disk is a few years, the mean time to the first failure in a disk array is much smaller. Hence check disks and parity schemes are used to improve reliability. The check disk contains information that can be used to recover from the failure of any one disk in the array. A group of data disks and its check disks together constitute a reliability group.
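
As a back-of-the-envelope illustration of why this matters, assuming independent, exponentially distributed failures (the MTTF figure below is hypothetical):

```java
// With N independent disks, the mean time to the first failure in the array
// is roughly the single-disk MTTF divided by N, which is why check disks
// and parity schemes are needed.
public class ArrayMttf {
    static double arrayMttfHours(double diskMttfHours, int numDisks) {
        return diskMttfHours / numDisks;
    }

    public static void main(String[] args) {
        double disk = 50_000.0; // hypothetical single-disk MTTF, ~5.7 years
        // a 100-disk array sees its first failure in about 500 hours (~3 weeks)
        System.out.println(arrayMttfHours(disk, 100)); // 500.0
    }
}
```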

#codingexercise
List<String> getDiskGroups(List<String> disks) {
    return disks.stream().map(x -> getGroup(x)).distinct().collect(Collectors.toList());
}

Friday, January 4, 2019

Today we continue discussing best practices from storage engineering:

280) Optimistic concurrency control was introduced to allow each transaction to maintain histories of reads and writes so that transactions causing isolation conflicts can be rolled back.

281) Shared-memory systems have been popular for storage products. They include SMPs, multi-core systems and a combination of both. The simplest way to use it is to create threads in the same process. Shared-memory parallelism is widely used with big data.

282) The Shared-Nothing model supports shared-nothing parallelism. When each node is independent and self-sufficient, there is no single point of contention. None of the nodes share memory or disk storage. Generally, these systems compete with any model that has a single point of contention in the form of shared memory or disk.

283) Shared-Disk: This model is suited where a large storage space is needed. Some products implement shared-disk and some implement shared-nothing; the two do not go together in the same code base.

284) The implementation of a content-distribution network, such as for images or videos, generally translates to random disk reads, which means caching may not always help. Therefore, the disks that are RAIDed are tuned. It used to be a monolithic RAID 10 served from a single master with multiple slaves. Nowadays a sharded approach is taken instead, preferably served from Object Storage.

285) Image and video libraries will constantly run into cache misses, especially with slow replication. It is better to separate traffic to different cluster pools. Replication and caching come into the picture to handle the load. With a distribution across different cluster pools, we can spread the load and avoid the misses.

Thursday, January 3, 2019

Today we continue discussing best practices from storage engineering:

275) Workloads that are not well-behaved may be throttled till they are well-behaved. A workload with high request rate is more likely to be throttled. The opposite is also true.

276) Serialization of objects enables reconstruction on the remote destination. It is more than a protocol for packing and unpacking data on the wire. It includes constraints that enable data validation and helps prevent failures down the line. If the serialization includes encryption, it becomes tamper-proof.
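
A minimal sketch of packing and unpacking over a byte-array "wire", using Java's built-in object serialization; the Record payload type is hypothetical:

```java
import java.io.*;

// Sketch of object serialization as a data packing/unpacking protocol: the
// object is flattened to bytes and reconstructed on the receiving side.
public class SerializationSketch {
    static class Record implements Serializable {
        final String name;
        final int size;
        Record(String name, int size) { this.name = name; this.size = size; }
    }

    static byte[] pack(Record r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(r);
            }
            return bytes.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static Record unpack(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (Record) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Record copy = unpack(pack(new Record("blob-1", 4096)));
        System.out.println(copy.name + " " + copy.size); // blob-1 4096
    }
}
```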

277) Serializability, by contrast, is the notion of correctness when simultaneous updates happen to a resource. When multiple transactions commit their actions, the result corresponds to one from a serial execution of those transactions. This is very helpful in eliminating inconsistencies across transactions. Serializability differs from isolation only in that the latter tries to do the same from the point of view of a single transaction.

278) Databases were the veritable storage systems that guaranteed transactions. Two-phase locking was introduced with transactions, where a shared lock is acquired before a read and an exclusive lock before a write. The two phases refer to acquiring locks (growing) and releasing them (shrinking). With transactions blocking on a wait queue, this was a way to enforce serializability.
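
The shared/exclusive pairing can be sketched with a read-write lock. Note this shows only the lock modes; a full two-phase discipline would also hold every lock until commit:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the lock modes behind two-phase locking: a shared lock before a
// read and an exclusive lock before a write. The guarded int stands in for
// any protected resource.
public class TwoPhaseLockingSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    int read() {
        lock.readLock().lock();       // shared lock before read
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    void write(int v) {
        lock.writeLock().lock();      // exclusive lock before write
        try {
            value = v;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        TwoPhaseLockingSketch resource = new TwoPhaseLockingSketch();
        resource.write(42);
        System.out.println(resource.read()); // 42
    }
}
```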

279) Transaction locking and logging proved onerous and complicated. Multi-Version Concurrency Control was brought in for the purpose of not acquiring locks. With a consistent view of data at some points of time in the past, we no longer need to keep track of every change made since the latest such point of time.
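
A minimal sketch of the versioned reads that make this possible: every write creates a timestamped version, and a reader sees the latest version at or before its snapshot without taking read locks. All names here are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of multi-version concurrency control: writes append timestamped
// versions; readers consult the latest version at or before their snapshot,
// giving a consistent past view with no read locks.
public class MvccStore {
    private final Map<String, TreeMap<Long, String>> versions = new ConcurrentHashMap<>();
    private long clock = 0;

    synchronized long write(String key, String value) {
        long ts = ++clock;                              // stamp the new version
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
        return ts;
    }

    String readAt(String key, long snapshot) {
        TreeMap<Long, String> v = versions.get(key);
        if (v == null) return null;
        Map.Entry<Long, String> e = v.floorEntry(snapshot); // latest <= snapshot
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        MvccStore store = new MvccStore();
        long t1 = store.write("k", "old");
        long t2 = store.write("k", "new");
        System.out.println(store.readAt("k", t1)); // old (consistent past view)
        System.out.println(store.readAt("k", t2)); // new
    }
}
```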

Wednesday, January 2, 2019

Today we continue discussing best practices from storage engineering:

265) Most of the log-based replication methods are proprietary.  A standard for this is hard to enforce and accepting all proprietary formats is difficult to maintain.

266) Statistics gathering: Every accounting operation within the storage product uses some form of statistics, ranging from summation to building histograms, and these inevitably take up memory, especially if they can't be done in one pass. Some of these operations were done as synchronous aggregations, but when the size of the data is very large, they were translated to batch or stream operations. With SQL-like queries using partition and over clauses, smaller chunks can be processed in an online manner. However, most such operations can be delegated to the background.
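
Single-pass aggregation of this kind is directly expressible with Java streams. A small sketch (the helper name statsOf is mine):

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

// One-pass statistics: a single scan of the data yields count, sum, min,
// max and mean with constant memory, so it scales to very large inputs.
public class OnePassStats {
    static IntSummaryStatistics statsOf(int[] data) {
        return Arrays.stream(data).summaryStatistics(); // single pass
    }

    public static void main(String[] args) {
        int[] data = IntStream.rangeClosed(1, 100).toArray();
        IntSummaryStatistics stats = statsOf(data);
        System.out.println(stats.getSum());     // 5050
        System.out.println(stats.getAverage()); // 50.5
    }
}
```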

267) Index reconstruction: Users may request that data be re-organized in the background such as sorting them on different attributes or to repartition them across multiple disks. Online re-organization of files is very inefficient and costly to the user. Therefore, some form of separation is called for.

268) Physical re-organization: With data accesses over time that require multiple insertions and deletions, the storage on disk may become fragmented. In order to overcome this, routine reorganization becomes necessary. The same holds for index reconstructions, which are fairly expensive and generally done at the restart of a service. Out-of-rotation methods, where one index is fully reconstructed prior to switching, are also tolerated in some cases.

269) Backup/Export: All storage products enable data to be imported and exported. Backup and replication are some of the common techniques to export the data. Since they are long-running processes, they cannot take locks. Instead, a fuzzy dump is taken and then the logs are processed to meet some form of consistency.

270) Queries consume a lot of resources that conflict with the read-write operations on the data path. When they cannot be separated, the system must allow prioritization of queries and tolerate the elapsed time of long-running queries.

Tuesday, January 1, 2019

Today we continue discussing best practices from storage engineering:

261) Hardware techniques for replication are helpful when the inventory is something we can control, along with the deployment of the storage product. Even so, there has been a shift to software-defined stacks, and replication per se is not required to be implemented in hardware any more. If it is offloaded to hardware, the total cost of ownership increases, so that must be offset by some gains.

262) However, the notion of physical replication, when implemented in the software stack, is perhaps the simplest of all. If the data is large, the time to replicate is proportional to the bandwidth. Then there are costs to reinstall the storage container and make sure it is consistent. This is therefore an option for end users and typically a client-side workaround.

263) The notion of trigger-based replication is the idea of using incremental changes as and when they happen so that only those changes are propagated to the destination. The incremental changes are captured and shipped to the remote site, and the modifications are replayed there.

264) Log-based replication is probably the most performant scheme: the log is actively watched for data changes, which are intercepted and sent to the remote system. Either the raw data changes read from the log or the captured log records themselves may be shipped to the destination. This technique is performant because it has a low overhead.

Conclusion: Storage engineering has demonstrated consistent success in the industry with these and more salient considerations. We can find manifestations in the tiniest to the largest products.