Thursday, March 14, 2019

Today we continue discussing the best practices from storage engineering:

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferably done via a user-friendly mechanism because these artifacts matter more to the user than to the system.

585) There are several occasions when customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for a live instance, so one of the best techniques is to plan the upgrade carefully.

586) Storage products embrace compute to the extent that services are needed over the raw storage, but the line of separation between compute and storage remains clear in solutions that use storage. The purer the compute layered over the storage, the better for the storage.

587) The dependency of storage on healthy nodes is maintained with the help of detection and remedial measures. If the administrator does not have to rush to replace a bad unit, it saves time and cost.

588) There are notions of SaaS, PaaS, and IaaS with clear separation of concerns in the cloud. The same applies to a storage layer in terms of delegating to dedicated products.

589) The organization in the cloud does not limit the number and type of services available from the cloud. The same holds true for features offered as services within a storage product.

590) The benefits that come with the cloud can also come from a storage product.

Wednesday, March 13, 2019

Today we continue discussing the best practices from storage engineering:

578) This export of logic is very helpful in overcoming the limitations of static configuration and reload of service. Regardless of the need for a runtime to execute the logic, even listenable config values can help with changes to rules.
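As an illustration (not from the original post), here is a minimal sketch of listenable configuration in C#, assuming the rules live in a single file; RuleConfigWatcher and the onChange callback are hypothetical names:

using System;
using System.IO;

class RuleConfigWatcher
{
    // Invokes onChange with the new file contents whenever the rules file changes,
    // so the rules can be reloaded without restarting the service.
    public static FileSystemWatcher Watch(string fullPathToRulesFile, Action<string> onChange)
    {
        var watcher = new FileSystemWatcher(
            Path.GetDirectoryName(fullPathToRulesFile),
            Path.GetFileName(fullPathToRulesFile));
        watcher.Changed += (s, e) => onChange(File.ReadAllText(fullPathToRulesFile));
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}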

579) The rules can be flat conjunctive filters or expression trees. Their evaluation is in program order.
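For example (an illustrative sketch, not any specific product's API), a flat conjunctive filter can be modeled as an ordered list of predicates evaluated in program order, short-circuiting on the first failure:

using System;
using System.Collections.Generic;

class ConjunctiveFilter<T>
{
    private readonly List<Func<T, bool>> rules = new List<Func<T, bool>>();

    public ConjunctiveFilter<T> Add(Func<T, bool> rule) { rules.Add(rule); return this; }

    // Evaluate in program order; stop at the first rule that fails.
    public bool Matches(T resource)
    {
        foreach (var rule in rules)
            if (!rule(resource)) return false;
        return true;
    }
}

A caller might chain rules such as new ConjunctiveFilter<string>().Add(path => path.StartsWith("user/")).Add(path => path.Length < 256) and then call Matches on each resource.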

580) The outcome of the processing of rules is the treatment given to the resource. There can be classes of outcomes.

581) The number of rules is generally not a concern for compute, and it is also not a concern for storage. However, the maintenance of rules is a significant burden and is preferable to avoid in the first place.

582) The types of rules and the classes of outcomes generally don't change even in the most heavily used filters. IPSec, for example, has a lot of attributes to secure the network, but its types of rules and outcomes are well-known. Rules can therefore be rewritten periodically to make them more efficient.

583) Storage products have a tendency to accumulate user artifacts such as rules, containers and settings. It should be easy to migrate and upgrade them.

584) The migration mentioned above is preferably done via a user-friendly mechanism because these artifacts matter more to the user than to the system.

585) There are several occasions when customers will run into issues with upgrade and migration. Unfortunately, there is usually no dry run for a live instance, so one of the best techniques is to plan the upgrade carefully.

Tuesday, March 12, 2019

Today we continue discussing the best practices from storage engineering:

574) If the range of sequences can be limited to a window, the user and the application can take on much of the processing, relieving the compute requirements on storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and is applicable only to that data, it is possible to register the logic and load a runtime to execute it against that data, just as it is possible to externalize query processing over an iterable data set.
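A possible sketch of such registration, with a hypothetical LogicRegistry that maps a data label to a delegate executed only against that data:

using System;
using System.Collections.Generic;

class LogicRegistry
{
    private readonly Dictionary<string, Func<byte[], byte[]>> handlers =
        new Dictionary<string, Func<byte[], byte[]>>();

    // Register logic that applies only to data carrying the given label.
    public void Register(string dataLabel, Func<byte[], byte[]> logic) => handlers[dataLabel] = logic;

    // Run the registered logic against the data, or pass the data through unchanged.
    public byte[] Execute(string dataLabel, byte[] data) =>
        handlers.TryGetValue(dataLabel, out var logic) ? logic(data) : data;
}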

576) There are several containers for logic, usually packaged as modules, and they can be invoked by a common runtime. However, in its simplest form, this logic is merely a set of rules.

577) The rules are scoped to the artifacts they secure. For system-wide resources, there is only a singleton. For user resources, they can be dynamically fetched and executed as long as they are registered.

578) This export of logic is very helpful in overcoming the limitations of static configuration and reload of service. Regardless of the need for a runtime to execute the logic, even listenable config values can help with changes to rules.



Monday, March 11, 2019

Today we continue discussing the best practices from storage engineering:

571) Storage products often make use of bitmap indexes to store sequences efficiently when they are rather sparse. Bitmaps also help with conjunctive filters, and this is useful in sequences with repeating members.
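A small illustration of the conjunctive-filter point using System.Collections.BitArray; the rows and predicates here are made up for the example:

using System;
using System.Collections;

class BitmapIndexDemo
{
    static void Main()
    {
        // One bitmap per predicate: bit i is set when row i satisfies the predicate.
        var isRecent = new BitArray(new bool[] { true, false, true, true });
        var isLarge  = new BitArray(new bool[] { true, true, false, true });

        // The conjunction of the two predicates is a bitwise AND (And mutates isRecent in place).
        var both = isRecent.And(isLarge);
        for (int i = 0; i < both.Count; i++)
            if (both[i]) Console.WriteLine("row " + i + " matches both predicates");
    }
}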

572) The sequences can be queried more efficiently than with standard query operators if the predicates are pushed down closer to the storage.

573) Sequences work well with bloom filters, which test whether a member is part of the sequence or not. Sometimes it is enough to rule out members that are definitely not part of the set.
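A toy bloom filter sketch; the hashing scheme here is simplistic and only meant to show that a negative answer is definitive while a positive answer may be a false positive:

using System;
using System.Collections;

class BloomFilter
{
    private readonly BitArray bits;
    private readonly int k;

    public BloomFilter(int size, int hashes) { bits = new BitArray(size); k = hashes; }

    // Derive the i-th bit position from the member's hash code (toy scheme, not stable across processes).
    private int Position(string member, int i)
    {
        long h = (uint)member.GetHashCode();
        return (int)((h + (long)i * 0x9E3779B9) % bits.Count);
    }

    public void Add(string member)
    {
        for (int i = 0; i < k; i++) bits[Position(member, i)] = true;
    }

    // False means definitely not in the set; true means possibly in the set.
    public bool MightContain(string member)
    {
        for (int i = 0; i < k; i++)
            if (!bits[Position(member, i)]) return false;
        return true;
    }
}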

574) If the range of sequences can be limited to a window, the user and the application can take on much of the processing, relieving the compute requirements on storage. Such intensive scripts can run anywhere the user wants as long as the data is available.

575) If logic pertains specifically to some data and is applicable only to that data, it is possible to register the logic and load a runtime to execute it against that data, just as it is possible to externalize query processing over an iterable data set.


#codingexercise
GetCombinations with repetitions for r items among n
int GetCombinations(int n, int r) {
      return (int) GetNChooseK(n + r - 1, r);
}

We put n objects in k bins with (n-1) Choose (k-1)
int getGroups(int n, int k) {
      return (int) GetNChooseK(n - 1, k - 1);
}

We can do permutations with n!/(n-r)!
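A matching helper in the same bare-function style, assuming the same Factorial routine that GetNChooseK relies on:
int GetPermutations(int n, int r) {
      return (int) (Factorial(n) / Factorial(n - r));
}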

Sunday, March 10, 2019

Today we continue discussing the best practices from storage engineering:

565) The focus on business value does not remain confined to the people who interface with the customers. It comes from deep within product development and engineering.

566) The storage product can relieve compute altogether where results are computed once and saved for all subsequent usages. This works well for data that does not change over time.
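A minimal compute-once sketch, assuming results are keyed by the query text and the underlying data does not change; ConcurrentDictionary.GetOrAdd invokes the expensive computation only the first time a key is seen:

using System;
using System.Collections.Concurrent;

class ResultCache
{
    private readonly ConcurrentDictionary<string, long> cache =
        new ConcurrentDictionary<string, long>();

    // Compute the result once and serve all subsequent usages from the cache.
    public long GetOrCompute(string query, Func<string, long> expensiveCompute) =>
        cache.GetOrAdd(query, expensiveCompute);
}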

567) When the data changes frequently, it helps to organize it so that the parts that don't change are on one side and those that do are on the other. This helps in computing incremental results from the data.
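One way to picture this, as a sketch that assumes the frozen portion truly never changes: keep a precomputed aggregate for the frozen side and re-aggregate only the small changing side on each read.

using System.Collections.Generic;
using System.Linq;

class IncrementalSum
{
    private readonly long frozenTotal;                       // computed once over the immutable side
    private readonly List<long> changing = new List<long>(); // the small, frequently changing side

    public IncrementalSum(IEnumerable<long> frozen) { frozenTotal = frozen.Sum(); }

    public void Append(long value) { changing.Add(value); }

    // Only the changing side is re-aggregated on each read.
    public long Total() { return frozenTotal + changing.Sum(); }
}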

568) Data will inevitably have patterns with reuse. We can call them sequences. While most data might be stored in a general-purpose B-tree, sequences call for more efficient data structures such as a radix tree. These make it easier to insert and look up sequences.
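A compact trie-style sketch for inserting and looking up sequences; a real radix tree would additionally compress single-child chains, so treat this as a simplified stand-in:

using System.Collections.Generic;

class SequenceTrie
{
    private sealed class Node
    {
        public readonly Dictionary<int, Node> Children = new Dictionary<int, Node>();
        public bool Terminal;
    }

    private readonly Node root = new Node();

    // Insert a sequence one member at a time.
    public void Insert(IEnumerable<int> sequence)
    {
        var node = root;
        foreach (var member in sequence)
        {
            if (!node.Children.TryGetValue(member, out var child))
                node.Children[member] = child = new Node();
            node = child;
        }
        node.Terminal = true;
    }

    // Returns true only if the exact sequence was inserted.
    public bool Contains(IEnumerable<int> sequence)
    {
        var node = root;
        foreach (var member in sequence)
            if (!node.Children.TryGetValue(member, out node)) return false;
        return node.Terminal;
    }
}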

569) Sequences are stored more efficiently if they are sorted. This canonicalizes them. It also allows lookups to use binary search.
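A short illustration: once the members are sorted into canonical order, membership lookup is a binary search.

using System;

class CanonicalSequence
{
    static void Main()
    {
        int[] sequence = { 42, 7, 19, 7, 3 };
        Array.Sort(sequence);                                 // canonical, sorted form
        bool found = Array.BinarySearch(sequence, 19) >= 0;   // O(log n) membership test
        Console.WriteLine(found);
    }
}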

570) The number of sequences might become very large. In that case, it might be better not to make them part of the same tree and instead to use other data structures, such as shards, that are easier to navigate.

Saturday, March 9, 2019

Today we continue discussing the best practices from storage engineering:

560) The number of applications using the same storage is usually not a concern. The ability to serve them with storage classes is noteworthy.

561) When an application wants to change the workload on the storage, architects prefer to swap the storage product with something more suitable. However, a performance engineer can circumvent this approach with optimizations that leverage the existing product. It is always a good practice to give this a try.

562) System architecture evolves in favor of changing business needs, from the smallest components to the overall product. However, it is rather centralized, and sometimes using another instance of the product with customizations can mitigate the urgency while giving ample time for consolidation.

563) The use of a storage product also depends on the developer community. Many products, such as time series databases and graph databases, have gained greater acceptance by endearing the product to developers.

564) Sales and support need to be armed with the latest information and remain current on all features for customers. They need to have those features work exactly as they say they would.

565) The focus on business value does not remain confined to the people who interface with the customers. It comes from deep within product development and engineering.

#codingexercise

When we have to select groups as well as individuals, we use the stars and bars method.
We put n objects in k bins with (n-1) Choose (k-1)
int getGroups(int n, int k) {
      return (int) GetNChooseK(n - 1, k - 1);
}

double GetNChooseK(double n, double k)
{
      // Factorial is assumed to be defined elsewhere among the post's helpers.
      if (k < 0 || k > n) return 0;
      if (k == 0 || k == n) return 1;
      return Factorial(n) / (Factorial(n - k) * Factorial(k));
}
Alternatively,
 static int GetNChooseKDP(int n, int k)
        {
            if (k < 0 || k > n)
                return 0;
            if (k == 0 || k == n)
                return 1;
            // Pascal's identity: C(n, k) = C(n-1, k-1) + C(n-1, k)
            return GetNChooseKDP(n - 1, k - 1) + GetNChooseKDP(n - 1, k);
        }
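For example, both implementations return 10 for n = 5 and k = 2; the recursive version is simply Pascal's identity applied until it reaches the base cases.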

Friday, March 8, 2019

Today we continue discussing the best practices from storage engineering:

551) Adding and dropping containers is an easy way to address cleanup.

552) The number of replication groups is determined by the data that needs to be replicated.

553) Some containers can remain open all the time. Some of these can even be reserved for system purposes.

554) When containers are split, they contribute individually to shared statistics. Such stats do not differentiate between containers. Consequently, either the statistics must be differentiated or the origin must be registered with the collector.
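A sketch of the second option, where the origin (a container id) is registered with a shared collector so statistics stay attributable after a split; the names here are hypothetical:

using System.Collections.Concurrent;

class StatsCollector
{
    // Counter values keyed by (containerId, metricName) so split containers stay distinguishable.
    private readonly ConcurrentDictionary<(string, string), long> counters =
        new ConcurrentDictionary<(string, string), long>();

    public void Increment(string containerId, string metric, long delta = 1) =>
        counters.AddOrUpdate((containerId, metric), delta, (_, current) => current + delta);

    public long Get(string containerId, string metric) =>
        counters.TryGetValue((containerId, metric), out var value) ? value : 0;
}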

555) The statistics may themselves be stored in a container belonging to the system. Since the system containers are treated differently from user containers, they will need to be serviced separately.

556) System and shared notions go well together. They don't have the isolation required for user containers. System only adds privilege and ownership to otherwise merely shared containers. The elevation to system may not be required in all cases.

557) Application and system both publish statistics. They may both need to be the source of truth for their data.

558) When the same container is replicated in different zones, there is a notion of local and remote. Only one of them is designated as primary. The remote is usually secondary.

559) With primary and secondary containers for a replicated container, they become four when the replication group is split.