Sunday, March 10, 2019

Today we continue discussing best practices from storage engineering:

565) The focus on business value is not confined to the people who interface directly with customers. It comes from deep within product development and engineering.

566) The storage product can relieve compute altogether when results are computed once and saved for all subsequent uses. This works well for data that does not change over time.

567) When the data changes frequently, it helps to organize it so that the parts that do not change sit on one side and the parts that do sit on the other. This makes it easier to build incremental results from the data.
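A sketch of that split, under the assumption that an aggregate (here a sum) is the desired result; the names `frozen` and `live` are illustrative:

```python
# Keep data that never changes apart from data that does, so results over
# the immutable side are computed once and only the small mutable side is
# re-scanned on each query.
frozen = [10, 20, 30]        # never changes after load
frozen_total = sum(frozen)   # computed once

live = []                    # changes frequently

def total():
    # Incremental result: cached frozen part + fresh scan of the live part.
    return frozen_total + sum(live)
```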

568) Data will inevitably have patterns with reuse. We can call them sequences. While most data might be stored in a general-purpose B-tree, sequences call for more efficient data structures such as a radix tree, which makes inserting and looking up sequences easier.
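As a sketch, here is a minimal prefix tree for storing sequences. A production radix tree would compress single-child chains into one node; this simplified trie keeps one node per element just to show insert and lookup:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # element -> child node
        self.terminal = False  # a stored sequence ends here

def insert(root, seq):
    node = root
    for item in seq:
        node = node.children.setdefault(item, TrieNode())
    node.terminal = True

def lookup(root, seq):
    node = root
    for item in seq:
        if item not in node.children:
            return False
        node = node.children[item]
    return node.terminal
```

Shared prefixes are stored once, which is what makes this structure efficient for sequences with reuse.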

569) Sequences are stored more efficiently if they are sorted. Sorting canonicalizes them and also allows lookups to use binary search.
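A sketch of that lookup path using the standard library's bisect, with an illustrative list of sequences:

```python
import bisect

def contains(sorted_seqs, seq):
    # Binary search over the canonically sorted list of sequences.
    i = bisect.bisect_left(sorted_seqs, seq)
    return i < len(sorted_seqs) and sorted_seqs[i] == seq

sequences = sorted(["ac", "ab", "ba"])  # canonical order: ["ab", "ac", "ba"]
```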

570) The number of sequences might become very large. In that case, it might be better not to keep them all in the same tree, and instead use other data structures, such as shards that are easier to navigate.
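One way to read this is routing each sequence to a smaller per-shard structure instead of one giant tree. The sketch below assumes hash routing on the first element and a fixed shard count, both illustrative choices:

```python
NUM_SHARDS = 4
shards = [set() for _ in range(NUM_SHARDS)]  # one small structure per shard

def shard_for(seq):
    # Route by the sequence's first element; any stable rule would do.
    return hash(seq[0]) % NUM_SHARDS

def add(seq):
    shards[shard_for(seq)].add(seq)

def has(seq):
    # Only one shard needs to be searched per lookup.
    return seq in shards[shard_for(seq)]
```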

Saturday, March 9, 2019

Today we continue discussing best practices from storage engineering:

560) The number of applications using the same storage is usually not a concern. The ability to serve them with storage classes is noteworthy.

561) When an application wants to change the workload on the storage, architects prefer to swap the storage product for something more suitable. However, a performance engineer can often avoid the swap with optimizations that leverage the existing product. It is always good practice to give this a try.

562) System architecture evolves in response to changing business needs, from the smallest components to the overall product. However, it is rather centralized, and sometimes using another instance of the product with customizations can mitigate the urgency while allowing ample time for consolidation.

563) The adoption of a storage product also depends on the developer community. Many products, such as time-series databases and graph databases, have won greater acceptance by endearing themselves to developers.

564) Sales and support need to be armed with the latest information and remain current on all features for customers. They need those features to work exactly as they say they would.

565) The focus on business value is not confined to the people who interface directly with customers. It comes from deep within product development and engineering.

#codingexercise

When we have to select groups as well as individuals, we can use the stars and bars method:
n identical objects can be placed into k non-empty bins in (n-1) choose (k-1) ways.
int getGroups(int n, int k) {
    // Stars and bars: n objects into k non-empty bins.
    return GetNChooseK(n - 1, k - 1);
}

double GetNChooseK(double n, double k)
{
    if (k < 0 || k > n) return 0;
    if (k == 0 || k == n) return 1;
    return Factorial(n) / (Factorial(n - k) * Factorial(k));
}
Alternatively, using Pascal's rule recursively:

static int GetNChooseKDP(int n, int k)
{
    if (k < 0 || k > n)
        return 0;
    if (k == 0 || k == n)
        return 1;
    return GetNChooseKDP(n - 1, k - 1) + GetNChooseKDP(n - 1, k);
}

Friday, March 8, 2019

Today we continue discussing best practices from storage engineering:

551) Adding and dropping containers makes cleanup easy to address.

552) The number of replication groups is determined by the data that needs to be replicated.

553) Some containers can remain open all the time. Some of these can even be reserved for System purposes.

554) When containers are split, they contribute individually to shared statistics. Such stats do not differentiate between containers. Consequently, either the statistics must be differentiated or the origin must be registered with the collector.
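A sketch of the origin-registration option, with an illustrative in-memory collector standing in for the real statistics store:

```python
from collections import defaultdict

stats = defaultdict(int)

def record(origin, metric, value):
    # Keyed by (origin, metric) so per-container stats stay distinguishable.
    stats[(origin, metric)] += value

def total(metric):
    # The shared view across all origins is still available by aggregation.
    return sum(v for (o, m), v in stats.items() if m == metric)
```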

555) The statistics may themselves be stored in a container belonging to the system. Since system containers are treated differently from user containers, they will need to be serviced separately.

556) System and shared notions go well together. Neither has the isolation required for user containers; system merely adds privilege and ownership to otherwise shared containers. The elevation to system may not be required in all cases.

557) Application and system both publish statistics. They may each need to be the source of truth for their own data.

558) When the same container is replicated in different zones, there is a notion of local and remote. Only one of them is designated primary; the remote is usually secondary.

559) With primary and secondary containers for a replicated container, they become four when the replication group is split.


Thursday, March 7, 2019

Today we continue discussing best practices from storage engineering:

539) From supercomputers to large scale clusters, the size of compute, storage and network can be made to vary quite a bit. However, the need to own or manage such capability reduces significantly once it is commoditized and outsourced.

540) Some tasks are high priority and are usually far fewer in number than the general class of tasks. If they arrive uncontrolled, they can incur significant cost. Most storage products try to control the upstream workload for which they are designed; for example, if high-priority tasks can be contrasted sharply with the rest, they can be handled advantageously.

541) The scheduling policies for tasks can vary from scheduler to scheduler. Usually a simple policy scales much better than complicated policies. For example, if all the tasks have a share in a pie representing the scheduler, then it is simpler to expand the pie rather than re-adjusting the pie slices dynamically to accommodate the tasks.
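A sketch of the "expand the pie" policy described above: each task holds a fixed number of shares, and admitting a new task grows the total rather than re-slicing existing grants. The share values are illustrative:

```python
shares = {}        # task -> share count (set once, never readjusted)
total_shares = 0   # the "pie"

def admit(task, share):
    global total_shares
    shares[task] = share
    total_shares += share   # expand the pie instead of re-slicing

def fraction(task):
    # Each task's fraction shrinks proportionally as tasks are admitted,
    # without touching any other task's grant.
    return shares[task] / total_shares
```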

542) The weights associated with tasks are set statically and then used in computations to determine task scheduling. Usage can be measured in quanta of time, and a task that takes more than expected is called a quantum thief. A scheduler uses tallying to find a quantum thief and make it yield to other tasks.
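A sketch of that tallying, assuming an illustrative quantum size and a grant proportional to the task's static weight:

```python
QUANTUM = 10  # time units granted per weight unit (illustrative)

class Task:
    def __init__(self, name, weight):
        self.name = name
        self.grant = weight * QUANTUM  # static weight fixes the grant
        self.used = 0

def charge(task, elapsed):
    # Tally consumed time against the task's grant.
    task.used += elapsed

def must_yield(task):
    # A quantum thief has consumed more than its grant and must yield.
    return task.used > task.grant
```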

543) Book-keeping is essential for both scheduler and allocator not only to keep track of grants but also for analysis and diagnostics.

544) A scheduler and allocator can each have their own manager that separates the concerns of management from their work.

545) The more general purpose the scheduler and allocator become, the easier it is to use them in different components. Commodity implementations win hands down against specialized ones because they scale.

546) Requests for remote resources are expected to take longer than local operations. If they incur timeouts, the quantum grants may need to be stretched to cover them.

547) Timeouts must expand to include timeouts from nested operations.
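A common way to make this hold is to pass one absolute deadline down, so every nested call sees only the budget that remains. A minimal sketch, with the function names as illustrative assumptions:

```python
import time

def remaining(deadline):
    # Budget left before the parent's absolute deadline; monotonic clock
    # avoids wall-clock jumps.
    return max(0.0, deadline - time.monotonic())

def nested_op(deadline):
    budget = remaining(deadline)
    if budget == 0.0:
        raise TimeoutError("budget exhausted before nested operation")
    return budget  # a real call would pass `budget` as its own timeout
```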



548) Some event notification schemes are helpful for handling events at the appropriate scope.

549) A recovery state machine can help with global event handling for outages and recovery.

550) The number of steps taken to recover from outages can be reduced by dropping scoped containers in favor of standbys.

551) Adding and dropping containers makes cleanup easy to address.

552) The number of replication groups is determined by the data that needs to be replicated. Generally there is very little such data.
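A sketch of a recovery state machine of the kind point 549 mentions: outage and recovery events drive explicit transitions, so every component observes the same sequence of states. The states and events here are illustrative:

```python
TRANSITIONS = {
    ("healthy", "outage"): "degraded",
    ("degraded", "recover"): "recovering",
    ("recovering", "done"): "healthy",
}

def step(state, event):
    # Unknown (state, event) pairs leave the state unchanged rather than
    # corrupting it.
    return TRANSITIONS.get((state, event), state)
```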

Wednesday, March 6, 2019

The operation on inactive entries
Recently I came across an unusual problem of maintaining active and inactive entries. It was unusual because there were two sets, updated differently and at different times. Both sets were processed separately, and there was no indication whether they were mutually exclusive. Although we could assume the operations on the sets were invoked from the same component, they were invoked in different iterations. This meant that the operations taken on a set would be repeated several times.
The component only took actions on the active set. It needed to take an action on the entries that were inactive, and this action was added after the action on the active set. However, the sets were not mutually exclusive, so they had to be differentiated to see what was in one but not the other. Instead, this was overcome by separating the sets at the source. This made it much easier to act on the sets, because each fetch came with some confidence that they would be mutually exclusive. There was no confirmation that the source was indeed giving up-to-date sets, which called for validating the entries in each set before taking the action. The validation was merely to check whether the entry was active or not.
However, an entry does not remain in the same set forever. It could move from the active set to the inactive set and back. The active set and the inactive set would always correspond to their respective actions. This meant that the actions needed to be inverted between the entries so that they could flip their state between the two passes.
There were four cases for trying this out. The first case was when the active set was called twice. The second case was when the inactive set was called twice. The third case was when the active set was followed by the inactive set. The fourth case was when the inactive set was followed by the active set.
With these four cases, the active and the inactive set could have the same operations taken deterministically no matter how many times they were repeated and in what order.
The only task that remained now was to ensure that the sets returned from the source were good to begin with. The source was merely subscribed to events that added entries to the sets. However, the events could be called in any order and an arbitrary number of times. The event handling did not all exercise the same logic, so the entries did not appear final in all cases. This contributed to the invalid entries in the set. When the methods used to retrieve the active and inactive sets were made consistent, deterministic, robust, and correct, it became easier to work with the operations on the sets in the calling component.
This concluded the cleaning up of the logic to handle the active and inactive sets.
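The pattern above can be sketched as follows: fetch disjoint active and inactive sets, validate each entry before acting, and make the actions idempotent so repeated iterations converge in any order. The entry shape, the `is_active` predicate, and the action names are illustrative assumptions:

```python
def process(active, inactive, is_active, state):
    # state maps entry -> last action applied; repeating an action is a no-op,
    # so calling this any number of times, in any order, is deterministic.
    for entry in active:
        if is_active(entry) and state.get(entry) != "activated":
            state[entry] = "activated"      # validated, then acted on
    for entry in inactive:
        if not is_active(entry) and state.get(entry) != "deactivated":
            state[entry] = "deactivated"    # inverted action for the other set
    return state
```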
#codingexercise
Selecting four from a set of n:
double GetNChooseK(double n, double k)
{
    if (k < 0 || k > n) return 0;
    if (k == 0 || k == n) return 1;
    return Factorial(n) / (Factorial(n - k) * Factorial(k));
}

GetNChooseK(n, 4);

Tuesday, March 5, 2019

Today we continue discussing best practices from storage engineering:

539) From supercomputers to large scale clusters, the size of compute, storage and network can be made to vary quite a bit. However, the need to own or manage such capability reduces significantly once it is commoditized and outsourced.

540) Some tasks are high priority and are usually far fewer in number than the general class of tasks. If they arrive uncontrolled, they can incur significant cost. Most storage products try to control the upstream workload for which they are designed; for example, if high-priority tasks can be contrasted sharply with the rest, they can be handled advantageously.

541) The scheduling policies for tasks can vary from scheduler to scheduler. Usually a simple policy scales much better than complicated policies. For example, if all the tasks have a share in a pie representing the scheduler, then it is simpler to expand the pie rather than re-adjusting the pie slices dynamically to accommodate the tasks.

542) The weights associated with tasks are set statically and then used in computations to determine task scheduling. Usage can be measured in quanta of time, and a task that takes more than expected is called a quantum thief. A scheduler uses tallying to find a quantum thief and make it yield to other tasks.

543) Book-keeping is essential for both scheduler and allocator not only to keep track of grants but also for analysis and diagnostics.

544) A scheduler and allocator can each have their own manager that separates the concerns of management from their work.

545) The more general purpose the scheduler and allocator become, the easier it is to use them in different components. Commodity implementations win hands down against specialized ones because they scale.

546) Requests for remote resources are expected to take longer than local operations. If they incur timeouts, the quantum grants may need to be stretched to cover them.

547) Timeouts must expand to include timeouts from nested operations.


Monday, March 4, 2019

Today we continue discussing best practices from storage engineering:

537) The number of times a network is traversed also matters in the overall cost for data. The best cost for data is when data is at rest rather than in transit.

538) The choice between a faster processor, larger storage, or both is flexible if the dollar value is the same. In such cases, the strategy can be sequential, streaming, or batched. Once a strategy is in place, the dollar TCO increases significantly when business needs change.

539) From supercomputers to large scale clusters, the size of compute, storage and network can be made to vary quite a bit. However, the need to own or manage such capability reduces significantly once it is commoditized and outsourced.

540) Some tasks are high priority and are usually far fewer in number than the general class of tasks. If they arrive uncontrolled, they can incur significant cost. Most storage products try to control the upstream workload for which they are designed; for example, if high-priority tasks can be contrasted sharply with the rest, they can be handled advantageously.

541) The scheduling policies for tasks can vary from scheduler to scheduler. Usually a simple policy scales much better than complicated policies. For example, if all the tasks have a share in a pie representing the scheduler, then it is simpler to expand the pie rather than re-adjusting the pie slices dynamically to accommodate the tasks.

542) The weights associated with tasks are set statically and then used in computations to determine task scheduling. Usage can be measured in quanta of time, and a task that takes more than expected is called a quantum thief. A scheduler uses tallying to find a quantum thief and make it yield to other tasks.

543) Book-keeping is essential for both scheduler and allocator not only to keep track of grants but also for analysis and diagnostics.

544) A scheduler and allocator can each have their own manager that separates the concerns of management from their work.

545) The more general purpose the scheduler and allocator become, the easier it is to use them in different components. Commodity implementations win hands down against specialized ones because they scale.