Cluster computing

Sunday, March 17, 2019

The operation on inactive entries
Recently I came across an unusual problem of maintaining active and inactive entries. It was unusual because there were two sets and they were updated differently and at different times. Both the sets were invoked separately and there was no sign whether the sets were mutually exclusive. Although we could assume the invocations of operations on the sets were from the same component, they were invoked in different iterations. This meant that the operations taken on the set would be repeated several times.
The component only took actions on the active set. It needed to take an action on the entries that were inactive. This action was added subsequent to the action on the active set. However, the sets were not mutually exclusive so they had to be differentiated to see what was available in one but not the other. Instead this was overcome by delegating the sets to be separated at the source. This made it a lot easier to work with the actions on the sets because they would be fetched each time with some confidence that they would be mutually exclusive. There was no confirmation that the source was indeed giving up to date sets. This called for a validation on the entries in the set prior to taking the action. The validation was merely to check if the entry was active or not.
However, an entry does not remain in the same set forever. It could move from the active set to the inactive set and back. The active set and the inactive set would always correspond to their respective actions. This meant that the actions needed to be inverted between the entries so that they could flip their state between the two processing.
There were four cases for trying this out. The first case was when the active set was called twice. The second case was when the inactive set was called twice. The third case was when the active set was followed by the inactive set. The fourth case was when the inactive set was followed by the active set.
With these four cases, the active and the inactive set could have the same operations taken deterministically no matter how many times they were repeated and in what order.
The only task that remained now was to ensure that the sets returned from the source were good to begin with. The source was merely subscribed to events that added entries to the sets. However, the events could be called in any order and for arbitrary number of times. The event handling did not all exercise the same logic so the entries did not appear final in all the cases. This contributed to the invalid entries in the set. When the methods used to retrieve the active and inactive set were made consistent, deterministic, robust and correct, it became easier to work with the operations on the set in the calling component.
This concluded the cleaning up of the logic to handle the active and inactive sets.
We now follow up with the improvements to the source when possible:
There are different lists maintained for active, inactive, failover, scanned collections. They could all be part of the same collection and synchronized so that all accesses are serialized. Attributes on the entries can describe the state of the entry. If the entries need to be on separate collections, they could be synchronized with the same lock.
The operations on the entries may be made available as simpler full service methods where validation and updates are included. When the source is rewritten, it must make sure that all the existing behavior is unchanged.

Saturday, March 16, 2019

Today we continue discussing the best practice from storage engineering:

595) Virtually every service utilized from the infrastructure is candidate for standardization and consistency so that one component/vendor in the infrastructure may be replaced with another with little or no disruption

596) There are a number of stack frames that a developer has to traverse in order to find the code path taken by the execution thread and they don’t always pertain to layers but if the stack frames can get simpler for the developer, the storage product on the whole improves tremendously. This is not a rule but just a rule of thumb that the simpler the better.

597) As with all one-point maintenance code, there is bloating and complexity to handle different use cases from the same code. Unfortunately, developers don’t have the luxury to rewrite core components without significant investment of time and effort. Therefore version 1 of the product must always strive for building it right from the get go.

598) As use cases increase and the business improves, the product management pays a lot of attention to sustainable growth in the face of business needs. It is at this cusp of technology and business that the system architecture plays its best.

599) There are very few cases where the process goes wrong. On the other hand, there is a lot of advantage to trusting the process. Consequently, the product must be improved with sound processes.

600) As with any product, a storage product also qualifies for the Specific-measureable-attainable-realistic-timely aka SMART process where improvements can be measured and the feedback used to improve the process and the product.

Friday, March 15, 2019

Today we continue discussing the best practice from storage engineering:
es time and cost.

588) There are notions of SaaS, PaaS and, IaaS with clear separation of concerns in the cloud. The same applies to a storage layer in terms of delegating to dedicated products.

589) The organization in the cloud does not limit the number and type of services available from the cloud. The same holds true for the feature as services within a storage product.

590) The benefits that come with the cloud can also come from a storage product.

591) There are times when the storage product will have imbalanced load. They will need to be load balanced. Since this is an ongoing activity, it can be periodically scheduled or responded when thresholds are crossed.

592) When the layers of infrastructure and storage services are clearly differentiated, the upper layer may utilize the alerting from the lower layers for health checks and to take corrective actions.

593) There are a number of ways to monitor a system whether it is for performance, statistics or health checks. A system center management system can consolidate and unify the operations management. The storage product merely needs to publish to a system center.

594) There are several formats of metrics and monitoring data and generally they are proprietary. Utilizing an external stack for these purposes via APIs helps alleviate the concerns from the storage service.

595) Virtually every service utilized from the infrastructure is candidate for standardization and consistency so that one component/vendor in th

Thursday, March 14, 2019

Today we continue discussing the best practice from storage engineering:

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferable to be done via user friendly mechanism because they matter more to the user than to the system.

585) There are several times that customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for the instance. One of the best techniques is to plan the upgrade.

586) Storage products embrace compute as much as the services are needed over the raw storage but the line of separation between compute and storage remains clear in solutions that use storage. The purer the compute over the storage, the better for the storage.

587) The dependency of storage on healthy nodes is maintained with the help of detection and remedial measures. If the administrator does not have to rush to replace a bad unit, it saves time and cost.

588) There are notions of SaaS, PaaS and, IaaS with clear separation of concerns in the cloud. The same applies to a storage layer in terms of delegating to dedicated products.

589) The organization in the cloud does not limit the number and type of services available from the cloud. The same holds true for the feature as services within a storage product.

590) The benefits that come with the cloud can also come from a storage product.

Wednesday, March 13, 2019

Today we continue discussing the best practice from storage engineering:

578) This export of logic is very helpful in overcoming the limitations of static configuration and reload of service. Regardless of the need for a runtime to execute the logic, even listenable config values can help with changes to rules.

579) The rules can be flat conjunctive filters or expression trees. Their evaluation is in program order.

580) The outcome of the processing of rules is the treatment given to the resource. There can be classes in outcome.

581) The number of rules is generally not a concern to compute and it is also not a concern of storage. However, the maintenance of rules is a significant onus and is preferable to avoid first.

582) The type of rules and the classes of outcome generally don’t change even in most heavily used filters. IPSec for example has a lot of attributes to secure the network but its type of rules and outcomes are well-known. Rules can therefore be rewritten periodically to make them more efficient.

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferable to be done via user friendly mechanism because they matter more to the user than to the system.

585) There are several times that customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for the instance. One of the best techniques is to plan the upgrade.

Tuesday, March 12, 2019

Today we continue discussing the best practice from storage engineering:

574) If the range of sequences can be limited to a window, the user and application can take on much of the processing relieving the compute requirements from storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and applicable only to that data, it is possible to register logic and load a runtime to execute that logic specific to data just as it is possible to externalize query processing over an iterative data set.

576) There are several containers for logic usually packaged as modules and they can be invoke by a common runtime. However, at its simplest form, this logic is merely a set of rules.

577) The rules are scoped to the artifacts they secure. For system wide resources, there is only a singleton. For user resources, they can be dynamically fetched and executed as long as they are registered.

578) This export of logic is very helpful in
overcoming the limitations of static
configuration and reload of service.
Regardless of the need for a runtime to execute the logic,
even listenable config values can help with changes to rules.

Monday, March 11, 2019

Today we continue discussing the best practice from storage engineering:

571) Storage products often make use of bitmap index to store sequences efficiently when they are rather sparse. Bitmaps also help with conjunctive filters and this is useful in sequences with repeating members

572) The sequences can be more efficiently queried than standard query operators if the predicates are pushed down closer to the storage.

573) Sequences work well with bloom filters which test whether a member is part of the sequence or not. Sometimes it is enough to rule out that a member is not part of the set

574) If the range of sequences can be limited to a window, the user and application can take on much of the processing relieving the compute requirements from storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and applicable only to that data, it is possible to register logic and load a runtime to execute that logic specific to data just as it is possible to externalize query processing over an iterative data set.

#codingexercise
GetCombinations with repetitions for r items among n
int GetCombinations (int n, int r) {
return GetNChooseK ( (n+r-1) , r ) ;
}

We put n objects in k bins with (n-1) Choose (k-1)
int getGroups ( int n, int k) {
return GetNChooseK (n-1, k-1);

}

We can do permutations with n!/(n-r)!