Cluster computing

Sunday, March 3, 2019

We were discussing the implementation of a ledger with our first and second posts. We end it today.
There have been two modes to improve sequential access. First is the batch processing which allows data to be partitioned in batches on parallel threads. This works very well for data where the results can also behave as if they were data for the same calculations and the data can be abstracted into batches This is called summation form. Second, the batches can be avoided if the partitions are tiled over. This is called streaming access and it uses a window over partitions to make calculations and adjust them accordingly as the windows slides over the data in a continuous manner. This works well for data which is viewed as continuous and limitless such as from a pipeline.
Operations on the writer side too can be streamlined when it has to scale to large volumes. Some form of parallelization is also used here after the load is split into groups of incoming requests. To facilitate faster and better ledger writes, they are written once and as detailed as possible to avoid conflicts with others and enable more operations to be read-only. This separation of read-write and read-only activities on the ledger improve not only the ledger but also let it remain the source of truth. Finally, ledgers have grown to be distributed even while most organizations continue to keep the ledger in-house and open up only for troubleshooting, inspection, auditing and compliance.
Translations are one of the most frequently performed operations in the background. An example of translation is one where two different entries are to be reconciled the same as one uniform entry. These entries so that the calculations can be simpler.
Some of these background operations involve forward only scanning of a table or list with no skipping. They achieve this with the help of a progress marker for themselves where they keep track of the sequence number that they last completed their actions on. This works well in the case where the listing order remains unchanged.
Let us consider a case where this progressive scan may skip range. Such a case might arise when the listing is ordered but not continuous. There are breaks in the table as it gets fragmented between writes and the scanner does not see the writes between the reads. There are two ways to handle this. The first way to handle it is to prevent the write between the reads. This can be enforced with a simple sealing of the table prior to reading so that the writes cascade to a new page. The second way is to revisit the range and see if the count of processed table entries matches the sequence and redo it when it doesn’t agree. Since the range is finite, the retries are not very expensive and requires no alteration of the storage. Both approaches will stamp the progress marker at the end of the last processed range. Typically there is only one progress marker which moves from the ends on one range to the next.
Sometimes it is helpful to take actions to check that the table is stable and serving even for analysis. A very brief lock and release is sufficient in this regard.

#codingexercise
int GetPermutations (int n, int k) {
If (n == 0 || k > n) return 0;
if (k == 0 || k == n)
return 1;
return Factorial (n) / Factorial (n-k);
}

Cluster computing

Sunday, March 3, 2019

No comments:

Post a Comment