Friday, April 1, 2016

Today we continue to review the paper Queues are Databases by Jim Gray.
He talks about message-oriented middleware (MOM) and makes the observation that queues are better kept inside the database for transactions. When queues live outside the database, they pose performance and concurrency control problems. The mechanisms for concurrency control within a database include Read_Past, Read_Through, and Notify.
Read_Past allows skipping over dirty records to find the first committed record and is used in dequeue(). Read_Through allows a program to examine records that have not yet been committed. Notify allows a program to wait for a state change in a lock.
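As an aside, here is a minimal C# sketch of dequeue() with Read_Past semantics; QueueEntry and its IsCommitted flag are hypothetical stand-ins for the record and its lock state:
class QueueEntry { public bool IsCommitted; public string Payload; }
// Read_Past dequeue: skip dirty (uncommitted) records and remove the first committed one.
QueueEntry Dequeue(List<QueueEntry> queue)
{
    for (int i = 0; i < queue.Count; i++)
    {
        if (!queue[i].IsCommitted)
            continue;               // Read_Past: skip over dirty records
        var entry = queue[i];
        queue.RemoveAt(i);          // found the first committed record; dequeue it
        return entry;
    }
    return null;                    // with Notify, the caller could wait here for a state change
}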
Queue managers are simple TP-monitors
Queues may have worker pools associated with them to service the entries in the queue. TP monitors configure, manage, and load balance these pools. Queued processing has many variants, including periodic, event-driven, and batch, and queues may also have a scheduling policy.
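As a rough illustration, here is a minimal worker-pool sketch; the queue type and worker count are arbitrary choices for the sketch, not anything the paper prescribes:
using System;
using System.Collections.Concurrent;
using System.Threading;
// Start n workers that each loop dequeuing entries and servicing them,
// much as a TP monitor would keep a pool of servers draining a queue.
static void StartWorkerPool<T>(BlockingCollection<T> queue, Action<T> service, int n)
{
    for (int w = 0; w < n; w++)
        new Thread(() =>
        {
            foreach (var entry in queue.GetConsumingEnumerable())
                service(entry);   // blocks until an entry arrives
        }).Start();
}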
When the queues are implemented as part of the database, triggers may be invoked to give a simple queued transaction processing system.
Queue managers can be built in an Object Relational database system
#coding exercise
Implement promises on message queues.
var promise = new Promise(function (resolve, reject) {
    // send the request; resolve with the reply, reject on error (queue is hypothetical)
    queue.send(message, function (err, reply) {
        err ? reject(err) : resolve(reply);
    });
});
The executor passed to the constructor sends the message to the queue.
The callback receives the reply from the queue and settles the promise.

Thursday, March 31, 2016

Today we review the paper Queues are Databases by Jim Gray.
He talks about message-oriented middleware (MOM) and makes the following observations:
Queued transaction processing is less general than direct transaction processing. Queued systems are built on top of direct systems. Direct, conversational or distributed transactions are not built on top of a queued system.
Queues are interesting databases with interesting concurrency control. He suggests building queues directly into the database.
Queue systems need DBMS functionality. Queues need security, configuration, performance monitoring, recovery, and reorganization utilities. Database systems already have these features.
Queue managers are simple TP-monitors managing server pools driven by queues. Database systems are encompassing many server pool features as they evolve into TP-like systems.
He mentions that there are three ACID transaction units in queued processing of a request-response pair: 1) the client places the request in the input queue, 2) the server dequeues the request, performs the task, and enqueues the response, and 3) the requester dequeues the response from the output queue.
Because of these three separate units, a multistep dialog cannot be wrapped in a single ACID transaction. There is a lot of overhead in implementing distributed transactions on top of queues. By contrast, it is easier for a direct transaction processing system to add a queue mechanism as a new transaction type.
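A minimal sketch of those three units; Queue, Db, Request, Response, and Process are hypothetical stand-ins, and each using block is one ACID transaction:
// Unit 1: the client places the request in the input queue.
void ClientSubmit(Queue requests, Request r)
{
    using (var tx = Db.Begin()) { requests.Enqueue(r); tx.Commit(); }
}
// Unit 2: the server dequeues the request, performs the task, enqueues the response.
void ServerStep(Queue requests, Queue responses)
{
    using (var tx = Db.Begin())
    {
        var request = requests.Dequeue();
        responses.Enqueue(Process(request));
        tx.Commit();
    }
}
// Unit 3: the requester dequeues the response from the output queue.
Response ClientCollect(Queue responses)
{
    using (var tx = Db.Begin())
    {
        var response = (Response)responses.Dequeue();
        tx.Commit();
        return response;
    }
}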
He recognizes that queues are important when he says that queue processing is increasingly important for workflow, disconnected operation, and replication applications. He adds that it has always been the mainstay of batch and spool operations. With the acronyms MOM for queues and DAD for databases, he says both MOM and DAD are needed.
When queues become part of the database, the queue is a database class that gets create(), enqueue(), dequeue(), poll() and destroy() methods. By using a database, the queue manager becomes a native resource manager with no special code for startup, shutdown, checkpoint, commit, query, security or utilities. The database does all the hard stuff such as locking, logging, access paths, recovery, utilities, security, and performance monitoring. The queue manager even gets query, backup, restore, reorganize, and replication of data for free.
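One possible shape for that class in C#, a sketch only, since the paper names the methods but not their signatures:
interface IQueue<T>
{
    void Create(string name);       // create the queue as a database object
    void Enqueue(T item);           // add an entry under the current transaction
    T Dequeue();                    // remove the first committed entry (Read_Past)
    T Poll(TimeSpan timeout);       // wait up to the timeout for an entry (Notify)
    void Destroy();                 // drop the queue and its entries
}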
If the queues were outside the database, they would pose performance and concurrency control issues.
In cloud computing with a distributed message queue cluster, we could perhaps do without transactions and instead use idempotent incremental operations with retries.
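A rough sketch of that style, with a hypothetical queue client (IQueueClient, Receive, Delete, and ProcessIdempotently are all illustrative names); the key property is that handling the same message twice is harmless:
void Consume(IQueueClient queue)
{
    while (true)
    {
        var msg = queue.Receive(TimeSpan.FromSeconds(30));   // message becomes invisible, not deleted
        if (msg == null) continue;
        ProcessIdempotently(msg);     // safe to repeat if a previous attempt died mid-way
        queue.Delete(msg);            // delete only after success; otherwise it reappears for retry
    }
}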
#codingexercise
Implement a Futures interface on top of queues
// A sketch using TaskCompletionSource; IQueue is a hypothetical client whose
// Send takes a reply callback. The returned task is the future for the reply.
Task<string> SendAsync(IQueue queue, string request)
{
    var tcs = new TaskCompletionSource<string>();
    queue.Send(request, reply => tcs.SetResult(reply));
    return tcs.Task;
}

Wednesday, March 30, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were reviewing the performance of LRC in WAS. We now review related work and how LRC differs from it.
Erasure coding has been applied in many large-scale distributed storage systems. It is better than simple replication because it provides higher reliability, and it requires much lower storage for the same reliability. Even when distributed systems apply erasure coding, they typically apply only Reed-Solomon codes. LRC has proved better than that in terms of both efficiency and the performance tradeoff. Reed-Solomon codes are maximum distance separable (MDS) codes, which require minimum storage overhead for a given fault tolerance. LRC is not MDS and requires additional storage overhead for the same fault tolerance; this is exploited for efficient reconstruction. The tradeoff between storage overhead and reconstruction efficiency is also attempted by other systems such as Weaver codes, HoVer codes, and Stepped Combination codes. LRC achieves better trade-offs than all of the above to meet the WAS erasure coding design goals. One alternative to reconstruction from fewer fragments is to read from more fragments but reduce the amount read from each; however, this does not have the same savings in I/O and bandwidth as LRC does.
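To make the arithmetic concrete, a small sketch assuming the (12, 2, 2) parameters the paper deploys in WAS:
// Storage overhead of a (k, l, r) LRC is (k + l + r) / k; for (12, 2, 2)
// this is 16/12, about the 1.33x target mentioned below.
double LrcOverhead(int k, int l, int r) => (double)(k + l + r) / k;
// A single failed data fragment is rebuilt from its local group, reading
// k/l fragments (6 here) instead of the k (12) that Reed-Solomon would read.
int LrcReconstructionReads(int k, int l) => k / l;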
Thus we have seen that erasure coding reduces the cost of cloud storage, where our target storage overhead is 1.33x of the original data. The challenge in erasure coding is the fast reconstruction of offline data fragments, whether due to disk, node, rack, or switch failures, or during upgrades. LRC reduces the number of fragments that need to be read to perform the reconstruction. It carries slightly more storage overhead but offers comparable latency for small I/Os and better latency for large I/Os. At the same time, it provides better durability than the traditional approach of keeping three copies. Lastly, we reviewed some of the implementation and design considerations.
#codingexercise
Reed-Solomon corrects errors, theoretically, by finding the most popular message polynomial. The candidate polynomials are repeatedly produced from subsets of the received symbols. How many subsets are there?
Solution : The number of subsets is the binomial coefficient n choose k.
int GetSubsets(int n, int k)
{
    long result = 1;   // binomial coefficient n choose k, built iteratively to limit overflow
    for (int i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return (int)result;
}

Tuesday, March 29, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were reviewing the performance of LRC in WAS. We now review related work.
Erasure coding has been applied in many large-scale distributed storage systems. It is better than simple replication because it provides higher reliability, and it requires much lower storage for the same reliability. Even when distributed systems apply erasure coding, they typically apply only Reed-Solomon codes. LRC has proved better than that in terms of both efficiency and the performance tradeoff. Moreover, LRC reduces costs. For example, node failures trigger reconstruction, which is a significant cost. No data is lost in such transient errors, and they account for nearly 90% of data center failures. During these periods, as well as during upgrades, even though there is no background data rebuilding, reads that access the unavailable nodes are served through reconstruction. System techniques like load balancing and prioritization help, but LRC explores whether the erasure coding scheme itself can be optimized to reduce repair traffic and improve user I/Os. In addition, LRC exploits non-uniform parity degrees, where some parities connect to fewer data nodes than others. This facilitates efficient reconstruction and follows a new trend in the storage industry. Previous work applied enumeration and heuristic methods to search for parity-check erasure codes of small length. Due to the exponential search space, the exploration was limited to 3, 4, and 5 parities. Those codes could not tolerate three arbitrary failures, whereas LRC can.
#codingexercise
// A sketch of Reed-Solomon encoding over plain integers (real implementations
// work in a Galois field): the message symbols are the coefficients of a
// polynomial, and the codeword is that polynomial evaluated at n distinct points.
List<int> GetReedSolomon(List<int> message, int n)
{
    var codeword = new List<int>();
    for (int x = 1; x <= n; x++)
        codeword.Add(GetPolynomial(message, x));
    return codeword;
}

// Horner's rule: message[i] is the coefficient of x^i.
int GetPolynomial(List<int> message, int x)
{
    int sum = 0;
    for (int i = message.Count - 1; i >= 0; i--)
        sum = sum * x + message[i];
    return sum;
}

Monday, March 28, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were reviewing the performance of LRC in WAS with small I/Os and large I/Os. Finally, we read up on the latency comparison for decoding fragments between LRC and Reed-Solomon in this performance study. Decoding latency is expected to improve with LRC due to the reduced number of fragments. However, decoding is orders of magnitude faster than transferring the fragments for reconstruction, so the overall latency is comparable between LRC and Reed-Solomon.
#coding exercise
Implement Galois field polynomial subtraction.
A polynomial is represented as (coefficient, power) pairs in ascending order of power.
Terms of equal power are grouped and operated on together.
// Note: over GF(2^m) subtraction is the same as addition (coefficients XOR),
// but this sketch keeps plain integer coefficients as the original did.
class Term { public int coefficient, power; public Term(int c, int p) { coefficient = c; power = p; } }

List<Term> SubtractPolynomial(List<Term> first, List<Term> second)
{
    var ret = new List<Term>();
    // order both term lists by ascending power so they can be merged
    first.Sort((a, b) => a.power.CompareTo(b.power));
    second.Sort((a, b) => a.power.CompareTo(b.power));
    int i = 0;
    int j = 0;
    while (i < first.Count && j < second.Count)
    {
        if (first[i].power == second[j].power)
        {
            ret.Add(new Term(first[i].coefficient - second[j].coefficient, first[i].power));
            i++;
            j++;
        }
        else if (first[i].power < second[j].power)
        {
            ret.Add(new Term(first[i].coefficient, first[i].power));
            i++;
        }
        else
        {
            // the term appears only in the subtrahend, so its coefficient is negated
            ret.Add(new Term(-second[j].coefficient, second[j].power));
            j++;
        }
    }
    while (i < first.Count)
    {
        ret.Add(new Term(first[i].coefficient, first[i].power));
        i++;
    }
    while (j < second.Count)
    {
        ret.Add(new Term(-second[j].coefficient, second[j].power));
        j++;
    }
    return ret;
}

Sunday, March 27, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were discussing data consistency checks and the use of CRC. We now look at performance considerations in WAS. Storage is in the form of blobs, tables, queues, and drives. Applications have different workload styles, with smaller I/Os for tables and queues and larger I/Os, typically greater than 4MB, for blobs and drives. A production cluster with significant load variations over time was used, and LRC was compared with Reed-Solomon codes. The key metrics for small I/Os were latency and the number of I/Os made by the requests; the key metrics for large I/Os were latency and bandwidth. In the small I/O case, the experiments were run with a mixture of direct reads of a single fragment and reconstruction reads. When the cluster load is light, the reconstruction cost is very low; under heavier load, reconstruction reads perform poorly because they are bounded by the slowest of the fragments being read. LRC performed better than Reed-Solomon due to the lower number of fragments used.
The results from the large I/O case were very different from the small I/O case. Even when the cluster load is light, reconstruction with erasure coding is much slower than a direct read due to the bandwidth consumed. When the cluster load was heavy, the latency came from the network card. Even here, LRC performed better than Reed-Solomon due to the reduced number of fragments.
#coding exercise
Implement Galois field polynomial addition.
A polynomial is represented as (coefficient, power) pairs in ascending order of power.
Terms of equal power are grouped and operated on together.
// Term is a (coefficient, power) pair as in the subtraction exercise above.
List<Term> AddPolynomial(List<Term> first, List<Term> second)
{
    var ret = new List<Term>();
    int max = Math.Max(first.Last().power, second.Last().power);
    for (int i = 0; i <= max; i++)
    {
        // FirstOrDefault returns null when a polynomial has no term of this power
        var firstTerm = first.FirstOrDefault(t => t.power == i);
        var secondTerm = second.FirstOrDefault(t => t.power == i);
        int coefficient = (firstTerm == null ? 0 : firstTerm.coefficient)
                        + (secondTerm == null ? 0 : secondTerm.coefficient);
        if (coefficient != 0)
            ret.Add(new Term(coefficient, i));
    }
    return ret;
}

Saturday, March 26, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We saw how to distribute the fragments in LRC for WAS. We were also reviewing designing for erasure coding, where we saw how the various I/O types need to be scheduled, as well as reconstruction read-ahead and caching. Now we read about the consistency of coded data. Consistency checking is largely done to mitigate data corruption, which can occur when data is at rest, being read and written, or in transit. Therefore it is essential to check the consistency of data in every step of the storage system and to periodically scrub the data at rest. This is done with checksums and parity, the two primary mechanisms to protect against data corruption. In Windows Azure Storage, each append block contains a CRC of the data block, which can help detect data and metadata corruptions. If the CRC check fails, the data is reconstructed using other combinations of the erasure-coded fragments on the next available EN.
To ensure that the erasure coding operation does not itself introduce data inconsistency, each operation is validated by a few decoding operations. For LRC, these validations include: randomly choosing one data fragment in each local group and reconstructing it from its local group; doing the same with one global parity and the remaining data fragments; repeating with the other global parity and the remaining data fragments; and randomly choosing two, three, and four data fragments and reconstructing them. Finally, the coordinator EN computes a CRC of all the final data fragments and checks it against the CRC of the original extent. Together these checks ensure data consistency.
Erasure coding involves Galois field arithmetic, but using it directly is quite expensive. Instead, addition and multiplication tables are used. Reed-Solomon codes are better implemented with XOR operations, and more so by ordering the XOR operations based on the patterns in the coding and decoding matrices; this avoids some redundancies and repetitions.
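To illustrate the table-driven approach, here is a sketch of GF(2^8) multiplication using log/antilog tables; 0x11D is one common choice of primitive polynomial, and the paper does not specify which field or tables WAS uses:
static readonly int[] Exp = new int[512];
static readonly int[] Log = new int[256];

static void BuildTables()
{
    int x = 1;
    for (int i = 0; i < 255; i++)
    {
        Exp[i] = x;
        Log[x] = i;
        x <<= 1;
        if (x >= 256) x ^= 0x11D;          // reduce modulo the primitive polynomial
    }
    for (int i = 255; i < 512; i++)
        Exp[i] = Exp[i - 255];             // duplicate the table so Multiply can skip a mod
}

static int Multiply(int a, int b)
{
    if (a == 0 || b == 0) return 0;
    return Exp[Log[a] + Log[b]];           // multiplication becomes lookups and one add
}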
#codingexercise
Implement a 3-bit CRC on a 14-bit message.
// Long division over GF(2): bits are ints (0 or 1), most significant bit first;
// the polynomial includes its leading 1 bit, so a 3-bit CRC uses a 4-bit polynomial.
List<int> GetCRC(List<int> message, List<int> polynomial, int n)
{
    var padded = new List<int>(message);
    for (int i = 0; i < n; i++) padded.Add(0);                 // pad right with n zero bits
    for (int start = 0; start <= padded.Count - polynomial.Count; start++)
        if (padded[start] != 0)                                // divide where the leading bit is set
            for (int i = 0; i < polynomial.Count; i++)
                padded[start + i] ^= polynomial[i];
    return padded.GetRange(padded.Count - n, n);               // the remainder is the CRC
}