Thursday, March 7, 2019

Today we continue discussing best practices from storage engineering:

539) From supercomputers to large-scale clusters, the scale of compute, storage, and network can vary quite a bit. However, the need to own or manage such capability diminishes significantly once it is commoditized and outsourced.

540) Some tasks are high priority and are usually fewer in number than the general class of tasks. If their arrival is uncontrolled, they can impose significant cost, so most storage products try to control the upstream workload for which they are designed. For example, if the high-priority tasks can be clearly distinguished from the rest, they can be handled to advantage.

541) Scheduling policies vary from scheduler to scheduler. Usually a simple policy scales much better than a complicated one. For example, if all the tasks hold shares in a pie that represents the scheduler's capacity, it is simpler to expand the pie than to re-adjust the slices dynamically to accommodate new tasks.

542) The weights associated with tasks are set statically and then used in computations that determine the scheduling of the tasks. Execution can be measured in quanta of time, and a task that takes more than what is expected is called a quantum thief. A scheduler uses tallying to find a quantum thief and make it yield to other tasks.
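As a concrete, purely illustrative sketch of such tallying: the snippet below assumes a hypothetical Task shape whose statically set weight translates into an expected quantum of time, and a tally that flags the task as a quantum thief once its measured run time exceeds that quantum. None of these names come from a specific product.

```python
# Illustrative sketch only: Task, BASE_QUANTUM_MS and tally() are
# hypothetical names, not from any particular scheduler.
BASE_QUANTUM_MS = 10  # assumed time quantum granted per unit of weight

class Task:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight   # set statically, as described above
        self.used_ms = 0       # tallied by the scheduler as the task runs

    @property
    def quantum_ms(self):
        # The static weight translates into an expected quantum of time.
        return self.weight * BASE_QUANTUM_MS

def tally(task, elapsed_ms):
    """Record elapsed time; True means the task is a quantum thief and must yield."""
    task.used_ms += elapsed_ms
    return task.used_ms > task.quantum_ms

t = Task("compaction", weight=2)   # expected quantum: 20 ms
assert tally(t, 15) is False       # still within its quantum
assert tally(t, 10) is True        # 25 ms used: quantum thief, must yield
```

The book-keeping here is the `used_ms` tally; a real scheduler would also reset or decay it when the task yields.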

543) Book-keeping is essential for both scheduler and allocator not only to keep track of grants but also for analysis and diagnostics.

544) A scheduler and an allocator can each have their own manager, which separates the concerns of management from their work.

545) The more general purpose the scheduler and allocator become, the easier it is to use them in different components. Commodity implementations win hands down against specialized ones because they scale.

546) Requests for remote resources are expected to take longer than local operations. If they incur timeouts, the quantum grants may need to stretch to cover them.

547) A timeout must expand to include timeouts from nested operations.



548) Event notification schemes are helpful for handling events at the appropriate scope.

549) A recovery state machine can help with global event handling for outages and recovery.

550) The number of steps taken to recover from outages can be reduced by dropping scoped containers in favor of standbys.

551) Adding and dropping containers is an easy way to address cleanup.

552) The number of replication groups is determined by the data that needs to be replicated. Generally, very little data requires replication.

Wednesday, March 6, 2019

The operation on inactive entries
Recently I came across an unusual problem of maintaining active and inactive entries. It was unusual because there were two sets that were updated differently and at different times. The sets were operated on separately, and there was no indication whether they were mutually exclusive. Although we could assume the operations on the sets came from the same component, they were invoked in different iterations. This meant that the operations taken on a set would be repeated several times.
The component originally took actions only on the active set. It then needed to take an action on the entries that were inactive, and this action was added after the action on the active set. However, since the sets were not guaranteed to be mutually exclusive, they had to be differentiated to see what was in one but not the other. Instead, this was overcome by separating the sets at the source, which made it much easier to work with the actions because the sets could be fetched each time with some confidence that they were mutually exclusive. There was still no confirmation that the source was giving up-to-date sets, which called for validating the entries prior to taking the action. The validation was merely a check of whether the entry was active or not.
However, an entry does not remain in the same set forever. It can move from the active set to the inactive set and back. The active and inactive sets always correspond to their respective actions, which means the actions needed to be inverted between the entries so that an entry could flip its state between the two passes.
There were four cases for trying this out. The first case was when the active set was called twice. The second case was when the inactive set was called twice. The third case was when the active set was followed by the inactive set. The fourth case was when the inactive set was followed by the active set.
With these four cases, the active and the inactive set could have the same operations taken deterministically no matter how many times they were repeated and in what order.
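One way the four cases become deterministic is to record, per entry, the last action applied: repeating the same action is a no-op, while the opposite action flips the state. The sketch below is a minimal illustration under that assumption; the names are hypothetical.

```python
# Hypothetical sketch: repeating a pass is a no-op, while the opposite
# pass inverts the entry's state, so any order and number of repetitions
# of the four cases yields the same deterministic result.
ACTIVATE, DEACTIVATE = "activate", "deactivate"

def apply_action(entry, action, state):
    """Apply the action only if the entry is not already in that state."""
    if state.get(entry) == action:
        return False           # same set processed again: no-op
    state[entry] = action      # entry flips between active and inactive
    return True

state = {}
assert apply_action("e1", ACTIVATE, state) is True    # active set, first pass
assert apply_action("e1", ACTIVATE, state) is False   # active set repeated: no-op
assert apply_action("e1", DEACTIVATE, state) is True  # inactive set: state flips
```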
The only task that remained was to ensure that the sets returned from the source were good to begin with. The source was merely subscribed to events that added entries to the sets. However, the events could arrive in any order and an arbitrary number of times, and the event handlers did not all exercise the same logic, so the entries did not appear final in all cases. This contributed to invalid entries in the sets. Once the methods used to retrieve the active and inactive sets were made consistent, deterministic, robust, and correct, it became much easier to work with the operations on the sets in the calling component.
This concluded the cleaning up of the logic to handle the active and inactive sets.
#codingexercise
Selecting four from a set of n:
double Factorial(double n)
{
    if (n <= 1) return 1;
    return n * Factorial(n - 1);
}

double GetNChooseK(double n, double k)
{
    if (k < 0 || k > n || n < 0) return 0;
    if (k == 0 || k == n) return 1;
    return Factorial(n) / (Factorial(n - k) * Factorial(k));
}

GetNChooseK(n, 4);

Monday, March 4, 2019

Today we continue discussing best practices from storage engineering:

537) The number of times a network is traversed also matters in the overall cost of data. The cost is lowest when data is at rest rather than in transit.

538) The choice between a faster processor, larger storage, or both is a flexible one if the dollar cost is the same. In such cases, the processing strategy can be sequential, streaming, or batched. Once a strategy is in place, however, the total cost of ownership rises significantly when business needs change.


Sunday, March 3, 2019

We were discussing the implementation of a ledger with our first and second posts. We end it today.
There have been two modes to improve sequential access. First is batch processing, which allows data to be partitioned into batches on parallel threads. This works very well when the results of a calculation can themselves be treated as input data for the same calculation, so that the data can be abstracted into batches. This is called summation form. Second, batches can be avoided if the partitions are tiled over. This is called streaming access: it uses a window over the partitions to make calculations and adjusts them as the window slides over the data in a continuous manner. This works well for data that is viewed as continuous and limitless, such as from a pipeline.
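The two modes can be sketched with a trivial sum as the calculation; `batched_sum` and `windowed_sums` are hypothetical names used only to illustrate summation form and the sliding window.

```python
# Summation form: per-batch partial results behave as data for the same
# calculation and combine into the final result.
def batched_sum(data, batch_size):
    partials = [sum(data[i:i + batch_size])        # each batch independently
                for i in range(0, len(data), batch_size)]
    return sum(partials)                           # partials combine as data

# Streaming access: a window slides over the data, adjusting the running
# result instead of recomputing it from scratch.
def windowed_sums(data, width):
    total = sum(data[:width])
    out = [total]
    for i in range(width, len(data)):
        total += data[i] - data[i - width]         # adjust as window slides
        out.append(total)
    return out

assert batched_sum([1, 2, 3, 4, 5], 2) == 15
assert windowed_sums([1, 2, 3, 4], 2) == [3, 5, 7]
```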
Operations on the writer side can also be streamlined when the ledger has to scale to large volumes. Some form of parallelization is used here as well, after the load is split into groups of incoming requests. To facilitate faster and better ledger writes, entries are written once and in as much detail as possible, to avoid conflicts with others and to enable more operations to be read-only. This separation of read-write and read-only activities not only improves the ledger but also lets it remain the source of truth. Finally, ledgers have grown to be distributed, even while most organizations continue to keep the ledger in-house and open it up only for troubleshooting, inspection, auditing, and compliance.
Translations are one of the most frequently performed background operations. An example of a translation is reconciling two different entries into one uniform entry so that the calculations can be simpler.
Some of these background operations involve forward-only scanning of a table or list with no skipping. They achieve this with the help of a progress marker, in which they keep track of the sequence number of the last entry they completed their actions on. This works well as long as the listing order remains unchanged.
Let us consider a case where this progressive scan may skip a range. Such a case might arise when the listing is ordered but not continuous: there are breaks in the table as it gets fragmented between writes, and the scanner does not see writes that occur between reads. There are two ways to handle this. The first is to prevent writes between reads; this can be enforced by sealing the table prior to reading so that writes cascade to a new page. The second is to revisit the range, check whether the count of processed entries matches the sequence, and redo the scan when it doesn't agree. Since the range is finite, the retries are not very expensive and require no alteration of the storage. Both approaches stamp the progress marker at the end of the last processed range. Typically there is only one progress marker, which moves from the end of one range to the next.
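A minimal sketch of the second approach, retrying the range until the count agrees, might look like the following. `scan_range` and its parameters are hypothetical names, and `process` is assumed to be idempotent since a retried range is re-processed.

```python
# Hypothetical sketch: scan [start, end), verify the count of entries
# against the expected sequence, and retry the range when they disagree
# (e.g. a write slipped in between reads). process must be idempotent
# because a retried range is processed again from the beginning.
def scan_range(read_range, process, start, end, max_retries=3):
    """Process the range and return the new progress marker position."""
    for _ in range(max_retries):
        entries = read_range(start, end)     # may differ between retries
        for e in entries:
            process(e)
        if len(entries) == end - start:      # count agrees with sequence
            return end                       # stamp marker at range end
    raise RuntimeError("range did not stabilize after retries")

seen = []
marker = scan_range(lambda s, e: list(range(s, e)), seen.append, 0, 4)
assert marker == 4 and seen == [0, 1, 2, 3]
```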
Sometimes it is helpful to check that the table is stable and serving, even for analysis. A very brief lock and release is sufficient in this regard.
#codingexercise
int Factorial(int n) {
            if (n <= 1) return 1;
            return n * Factorial(n - 1);
}

int GetPermutations(int n, int k) {
            if (k < 0 || k > n) return 0;
            if (k == 0) return 1;
            return Factorial(n) / Factorial(n - k);
}

Saturday, March 2, 2019

Today we continue discussing best practices from storage engineering:


528) Live updates versus backup traffic, for instance, qualify for separate products. Aging and tiering of data also qualify for separate storage. Data for reporting can similarly be separated into its own stack. Generated data that drains into logs can similarly feed diagnostic stacks.

529) The number of processors or resources assigned to a specific stack is generally earmarked with T-shirt sizing. This is helpful for cases where the increment or decrement of resources doesn't have to be done notch by notch.

530) Public-cloud and hybrid-cloud storage are discussed at length on many forums. The hybrid storage provider is focused on letting the public cloud serve as the front end that harnesses traffic from users, while applying storage best practices to the on-premise data.

531) Data can be pushed or pulled from source to destination. Where pulling is possible, it helps by shifting the workload to another process.

532) Lower level data transfers are favored over higher level data transfers involving say HTTP.

533) The smaller the data transfers, the larger their number, which results in chattier and potentially more fault-prone traffic, especially when the amount of data per request is very small.

534) Large reads and writes are best served by multipart transfers, as opposed to long-running requests with frequent restarts.

535) Traversals up and down the layers of the stack are expensive operations and need to be curtailed.

        static int GetNChooseKDP(int n, int k)
        {
            // Check the base cases first so that C(0, 0) = 1.
            if (k == 0 || k == n)
                return 1;
            if (n == 0 || k < 0 || k > n)
                return 0;
            // Pascal's rule: C(n, k) = C(n - 1, k - 1) + C(n - 1, k)
            return GetNChooseKDP(n - 1, k - 1) + GetNChooseKDP(n - 1, k);
        }
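The recurrence above is Pascal's rule, but without memoization the recursion recomputes the same subproblems exponentially many times. A memoized sketch of the same recurrence (shown in Python for brevity):

```python
from functools import lru_cache

# Memoized Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k).
# Caching each (n, k) pair turns the exponential recursion into
# roughly O(n * k) work.
@lru_cache(maxsize=None)
def n_choose_k(n, k):
    if k == 0 or k == n:
        return 1
    if k < 0 or k > n:
        return 0
    return n_choose_k(n - 1, k - 1) + n_choose_k(n - 1, k)

assert n_choose_k(5, 2) == 10
assert n_choose_k(10, 4) == 210
```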

Friday, March 1, 2019

Today we continue discussing best practices from storage engineering:

517) Customers also prefer the ability to switch products and stacks. They are willing to try new solutions but have become increasingly wary of being tied to any one product or its growing encumbrances.

518) Customers have a genuine problem with data being sticky: they cannot keep up with data transfers.

519) Customers want the expedient solution first, but they are not willing to pay for re-architectures.

520) Customers need to evaluate even the cost of data transfer over the network. The priority and severity of their own workloads matter most to them.

521) Customers have concerns about the dollar cost per resource, whether network, compute, or storage. They have to secure ownership of their data and yet have it spread across geographical regions, so their trade-offs come from business perspectives rather than technical ones.

522) Sometimes the trade-offs come not from the business but from the compliance and regulatory considerations around housing and securing data. The public cloud is great for harnessing traffic to the data stores, but there are considerations when data has to stay on-premise.

523) Customers have a genuine problem with anticipating growth and planning for capacity. An implementation done right enables future prospects, but implementations don't always follow the design, and it is also hard to get the design right.

524) Similarly, customers cannot predict which technology will hold up in the near and long term and which won't. They are most concerned about the investments they make and the choices they face.

525) Traffic, usage, and access patterns are good indicators for prediction once the implementation is ready to scale.