Cluster computing

Friday, March 15, 2019

Today we continue discussing the best practice from storage engineering:
es time and cost.

588) There are notions of SaaS, PaaS and, IaaS with clear separation of concerns in the cloud. The same applies to a storage layer in terms of delegating to dedicated products.

589) The organization in the cloud does not limit the number and type of services available from the cloud. The same holds true for the feature as services within a storage product.

590) The benefits that come with the cloud can also come from a storage product.

591) There are times when the storage product will have imbalanced load. They will need to be load balanced. Since this is an ongoing activity, it can be periodically scheduled or responded when thresholds are crossed.

592) When the layers of infrastructure and storage services are clearly differentiated, the upper layer may utilize the alerting from the lower layers for health checks and to take corrective actions.

593) There are a number of ways to monitor a system whether it is for performance, statistics or health checks. A system center management system can consolidate and unify the operations management. The storage product merely needs to publish to a system center.

594) There are several formats of metrics and monitoring data and generally they are proprietary. Utilizing an external stack for these purposes via APIs helps alleviate the concerns from the storage service.

595) Virtually every service utilized from the infrastructure is candidate for standardization and consistency so that one component/vendor in th

Thursday, March 14, 2019

Today we continue discussing the best practice from storage engineering:

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferable to be done via user friendly mechanism because they matter more to the user than to the system.

585) There are several times that customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for the instance. One of the best techniques is to plan the upgrade.

586) Storage products embrace compute as much as the services are needed over the raw storage but the line of separation between compute and storage remains clear in solutions that use storage. The purer the compute over the storage, the better for the storage.

587) The dependency of storage on healthy nodes is maintained with the help of detection and remedial measures. If the administrator does not have to rush to replace a bad unit, it saves time and cost.

588) There are notions of SaaS, PaaS and, IaaS with clear separation of concerns in the cloud. The same applies to a storage layer in terms of delegating to dedicated products.

589) The organization in the cloud does not limit the number and type of services available from the cloud. The same holds true for the feature as services within a storage product.

590) The benefits that come with the cloud can also come from a storage product.

Wednesday, March 13, 2019

Today we continue discussing the best practice from storage engineering:

578) This export of logic is very helpful in overcoming the limitations of static configuration and reload of service. Regardless of the need for a runtime to execute the logic, even listenable config values can help with changes to rules.

579) The rules can be flat conjunctive filters or expression trees. Their evaluation is in program order.

580) The outcome of the processing of rules is the treatment given to the resource. There can be classes in outcome.

581) The number of rules is generally not a concern to compute and it is also not a concern of storage. However, the maintenance of rules is a significant onus and is preferable to avoid first.

582) The type of rules and the classes of outcome generally don’t change even in most heavily used filters. IPSec for example has a lot of attributes to secure the network but its type of rules and outcomes are well-known. Rules can therefore be rewritten periodically to make them more efficient.

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferable to be done via user friendly mechanism because they matter more to the user than to the system.

585) There are several times that customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for the instance. One of the best techniques is to plan the upgrade.

Tuesday, March 12, 2019

Today we continue discussing the best practice from storage engineering:

574) If the range of sequences can be limited to a window, the user and application can take on much of the processing relieving the compute requirements from storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and applicable only to that data, it is possible to register logic and load a runtime to execute that logic specific to data just as it is possible to externalize query processing over an iterative data set.

576) There are several containers for logic usually packaged as modules and they can be invoke by a common runtime. However, at its simplest form, this logic is merely a set of rules.

577) The rules are scoped to the artifacts they secure. For system wide resources, there is only a singleton. For user resources, they can be dynamically fetched and executed as long as they are registered.

578) This export of logic is very helpful in
overcoming the limitations of static
configuration and reload of service.
Regardless of the need for a runtime to execute the logic,
even listenable config values can help with changes to rules.

Monday, March 11, 2019

Today we continue discussing the best practice from storage engineering:

571) Storage products often make use of bitmap index to store sequences efficiently when they are rather sparse. Bitmaps also help with conjunctive filters and this is useful in sequences with repeating members

572) The sequences can be more efficiently queried than standard query operators if the predicates are pushed down closer to the storage.

573) Sequences work well with bloom filters which test whether a member is part of the sequence or not. Sometimes it is enough to rule out that a member is not part of the set

574) If the range of sequences can be limited to a window, the user and application can take on much of the processing relieving the compute requirements from storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and applicable only to that data, it is possible to register logic and load a runtime to execute that logic specific to data just as it is possible to externalize query processing over an iterative data set.

#codingexercise
GetCombinations with repetitions for r items among n
int GetCombinations (int n, int r) {
return GetNChooseK ( (n+r-1) , r ) ;
}

We put n objects in k bins with (n-1) Choose (k-1)
int getGroups ( int n, int k) {
return GetNChooseK (n-1, k-1);

}

We can do permutations with n!/(n-r)!

Sunday, March 10, 2019

Today we continue discussing the best practice from storage engineering:

565) The focus on business value does not remain confined to the people on the border with the customers. It comes from deep within product development and engineering.

566) The storage product can relieve compute altogether where results are computed once and saved for all subsequent usages. This works well for data that does not change over time.

567) When the data changes frequently, it helps to organize it in a way such that those that don’t change are on one side and those that do are on the other side. This helps to making incremental results from the data.

568) Data will inevitably have patterns with reuse. We can call them sequences. While most data might be stored with general purpose btree, the sequences call for more efficient data structures such as radix tree. These help insert and lookup sequences easier.

569) Sequences are more efficiently stored if they are sorted. This canonicalizes them. It also makes lookup use binary search.

570) The number of sequences might become very large. In such case, it might be better to not make it part of the same tree and user other data structures like better to navigate shards

Saturday, March 9, 2019

Today we continue discussing the best practice from storage engineering:

560) The number of applications using the same storage is usually not a concern. The ability to serve them with storage classes is noteworthy

561) When an application wants to change the workload on the storage, architects to prefer to swap the storage product with something more suitable. However, a performance engineer can circumvent the approach with optimizations that leverage the existing product. It is always a good practice to give this a try.

562) System architecture holds in favor of changing business needs from the smallest components to the overall product. However, it is rather centralized and sometimes using another instance of the product with customizations can mitigate the urgency while giving ample time for consolidation.

563) The use of storage product also depends on the developer community. Many products such as time series databases and graph databases have generated greater acceptance by endearing the product to developers.

564) Sales and support need to be armed with the latest information and remain current on all features from the customers. They need to have those features work exactly as they say it would.

565) The focus on business value does not remain confined to the people on the border with the customers. It comes from deep within product development and engineering.

#codingexercise

when we have to select groups as well as individuals, we use stars and bars methods
We put n objects in k bins with (n-1) Choose (k-1)
int getGroups ( int n, int k) {
return GetNChooseK (n-1, k-1);
}

double GetNChooseK(double n, double k)
{
if (k <0 || k > n || n = 0) return 0;
if ( k == 0 || k == n) return 1;
return Factorial(n) / (Factorial(n-k) * Factorial(k));
}
Alternatively,

static int GetNChooseKDP(int n, int k)
{
if ( k <0 || k > n || n = 0)
return 0;
if (k == 0 || k == n)
return 1;
return GetNChooseKDP(n - 1, k - 1) + GetNChooseKDP(n - 1, k);
}