Friday, December 14, 2018

Today we continue discussing best practices in storage engineering:

165) Event subscription versus appenders: Just as log appenders fan out the same log entries to multiple targets, event subscriptions make it possible to deliver the same event collection to a large number of destinations simultaneously. These destinations can include files, databases, email recipients, and so on.

166) Optional versus mandatory: Every feature in the storage server that is not on the critical data path is a candidate for being turned off to save resources and improve the data path. The same applies to components that are not essential. Reducing the number of publishers and subscribers is another example of this improvement.

167) The number of layers encountered in some operations may grow large. In such cases the layering can accommodate components that talk directly to lower layers; layering is not always strict. However, the justification for bypassing layers must be well made. This counts towards performance by design.

168) Any storage product has periods of peak workload, such as annual holiday sales, specific anniversaries, and other anticipated spikes in demand, during which utilization is unusually high. While capacity may be planned to meet the demand, there are also ways to tune the existing system to extract more performance. These efforts include shifting from disk-intensive operations to more in-memory computation and pairing the storage server with other software such as memcached (a cache-aside sketch follows this list).

169) When the load is high, it is difficult to turn on the profiler to study bottlenecks; this can, however, be done safely in advance in performance labs. An even easier strategy is to selectively turn off components that are not required and to scale out the component that is under duress. A prioritized list of mitigation steps may be drawn up before periods of heavy load.

170) Monitoring the Service Level Agreements on the storage server allows us to determine the steps needed to maintain optimum system performance. Keeping standby servers, replacement cluster nodes, and spare hardware, either on the chassis or as a plug-in, helps with the speedy resolution of outages.
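
The in-memory shift in item 168 can be sketched as a simple cache-aside read path. This is only a minimal illustration: the CacheAsideReader class and its loadFromDisk callback are made-up names, and the in-process map merely stands in for a shared cache such as memcached.

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Read path: consult the in-memory cache first and fall back to the slower
// disk-backed store only on a miss, populating the cache for later reads.
class CacheAsideReader {
public:
    explicit CacheAsideReader(std::function<std::string(const std::string&)> loadFromDisk)
        : loadFromDisk_(std::move(loadFromDisk)) {}

    std::string get(const std::string& key) {
        auto hit = cache_.find(key);
        if (hit != cache_.end()) return hit->second;   // served from memory
        std::string value = loadFromDisk_(key);        // expensive disk read
        cache_[key] = value;                           // remember for next time
        return value;
    }

private:
    std::function<std::string(const std::string&)> loadFromDisk_;  // hypothetical disk reader
    std::unordered_map<std::string, std::string> cache_;           // stand-in for memcached
};

int main() {
    CacheAsideReader reader([](const std::string& key) { return "value-for-" + key; });
    reader.get("object-1");                       // miss: goes to disk and fills the cache
    std::cout << reader.get("object-1") << "\n";  // hit: served from memory
    return 0;
}

A real deployment would also bound the cache size and invalidate entries on writes.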

# usage example
https://1drv.ms/w/s!Ashlm-Nw-wnWuCiBH-iYeyyYCrFG


Wednesday, December 12, 2018

Today we continue discussing best practices in storage engineering:

160) Nativity of registries – User registries, on the other hand, are welcome and can be arbitrary. In such cases, the registries are about the users' own artifacts. Such registries can be stored just the same way as user data. Consequently, the system does not need to participate in the user registries, and users can earmark designated storage artifacts for this purpose.
161) Sequences – Sequences hold special significance in the storage server. If the storage server takes several actions that don’t belong to the same group and there is no way to assign a sequence number, we rely on the names of the actions as they appear on the actual timeline, such as in the logs. When the names can be collected as sequences, we can perform standard query operations on the collections to determine patterns. This kind of pattern recognition is very useful when there are heterogeneous entries and the order in which the user initiates them is dynamic.
162) Event-driven framework: Not all user-defined actions are fire-and-forget. Some may be interactive, and since there can be an arbitrary delay between interactions, some form of event-driven framework usually finds its way into the storage server. Storage drivers, background processors, and even certain advanced UI controls use an event-driven framework.
163) Tracing: The most important and useful application of sequences is the tracing of actions for any activity. Just like logging, the event-driven framework may provide the ability to trace user actions as different system components participate in their completion. Tracing is very similar to profiling, but it requires a publisher-subscriber model. Most user-mode tasks are completed with the help of a message queue and a sink.
164) Event querying: Event-driven frameworks have the ability to operate not just on whole data sets but also on streams. This makes it very popular to write stream-based queries that involve partitions and iterate over a partition.
165) Event subscription versus appenders: Just as log appenders fan out the same log entries to multiple targets, event subscriptions make it possible to deliver the same event collection to a large number of destinations simultaneously. These destinations can include files, databases, email recipients, and so on.
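
Items 162, 163, and 165 above all revolve around a publisher-subscriber model, so here is a minimal fan-out sketch. The EventBus class and the file, database, and email sinks are hypothetical stand-ins that just print to stdout; they are not part of any particular product.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// A single published event is delivered to every registered subscriber,
// much like a log appender fanning out to multiple targets.
class EventBus {
public:
    using Handler = std::function<void(const std::string&)>;

    void subscribe(Handler handler) { subscribers_.push_back(std::move(handler)); }

    void publish(const std::string& event) {
        for (const auto& handler : subscribers_) handler(event);  // fan out to all sinks
    }

private:
    std::vector<Handler> subscribers_;
};

int main() {
    EventBus bus;
    bus.subscribe([](const std::string& e) { std::cout << "file sink: " << e << "\n"; });
    bus.subscribe([](const std::string& e) { std::cout << "database sink: " << e << "\n"; });
    bus.subscribe([](const std::string& e) { std::cout << "email sink: " << e << "\n"; });
    bus.publish("object-created");   // every subscriber receives the same event
    return 0;
}

The same shape supports tracing: a tracing sink is simply one more subscriber.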

Tuesday, December 11, 2018

Today we continue discussing best practices in storage engineering:

155) Pooled external resources: It is not just the host and its operating system resources that the storage product requires; it may also require resources that are remote from the local stack. Since such resources can be expensive, it is helpful for the storage product to be thrifty by pooling them and servicing as many workers in the storage product as possible.

156) Leveraging monitoring of the host: When an application is deployed to a Platform-as-a-Service, it no longer has the burden of maintaining its own monitoring. The same applies to the storage server as a product, depending on where it is deployed. The deeper we go in the stack, including the fabric below the storage server, the more amenable the layers become to monitoring the layer above them.

157) Immutables: Throughout the storage server we use constants for everything from identifiers and names to temporary data, treating them as immutables. While we can differentiate them with number sequences, it is more beneficial to use strings. Strings not only allow names to be given but also support prefix and suffix matching. Even enums have names, and we can store them as single instances throughout the system (see the sketch after this list).

158) System artifacts must have names that are not user-friendly, because they are reserved and should not get in the way of the names that the user wants to use. Moreover, these names have to be somewhat hidden from the users.

159) Registries – When there are collections of artifacts that are reserved for a purpose, they need to be maintained somewhere as a registry, which facilitates lookups. However, registries, like lists, cannot keep piling up. As long as we encapsulate the logic that determines the list, the list itself is redundant because we can execute the logic over and over again. However, this is often hard to enforce as a sound architecture principle.

160) Nativity of registries – User registries, on the other hand, are welcome and can be arbitrary. In such cases, the registries are about the users' own artifacts. Such registries can be stored just the same way as user data. Consequently, the system does not need to participate in the user registries, and users can earmark designated storage artifacts for this purpose.
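
Here is a small sketch of items 157 and 158, assuming C++14 or later; the reserved names and the "__sys_" prefix are purely illustrative. String-named immutables allow prefix matching, which is one way to keep system artifacts out of the user's namespace.

#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical reserved names kept as single, immutable string instances.
namespace reserved {
    const std::string kSystemPrefix = "__sys_";
    const std::string kJournalName  = "__sys_journal";
    const std::string kMetadataName = "__sys_metadata";
}

enum class ArtifactKind { Journal, Metadata };

// Single mapping from enum values to their display names, shared system-wide.
const std::unordered_map<ArtifactKind, std::string> kArtifactNames = {
    {ArtifactKind::Journal,  reserved::kJournalName},
    {ArtifactKind::Metadata, reserved::kMetadataName},
};

bool isSystemArtifact(const std::string& name) {
    return name.rfind(reserved::kSystemPrefix, 0) == 0;   // prefix match
}

int main() {
    std::cout << kArtifactNames.at(ArtifactKind::Journal) << " system? "
              << std::boolalpha << isSystemArtifact(reserved::kJournalName) << "\n";
    std::cout << "user-bucket system? " << isSystemArtifact("user-bucket") << "\n";
    return 0;
}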

// Returns the zero-based index of the node 'moved' in the singly linked
// list starting at 'root', or -1 if it is not present.
int getIndexOf(node* root, node* moved) {
    int count = 0;
    for (node* cur = root; cur != nullptr; cur = cur->next) {
        if (cur == moved) return count;   // found: report its position
        count++;
    }
    return -1;                            // reached the end without finding it
}

Monday, December 10, 2018

Today we continue discussing best practices in storage engineering:

151) Querying: Storage products are not required to provide a query layer. Still, they may participate in the query layer on an enumeration basis. Since the query layer can sit separately over the storage entities, it is generally written in the form of a web service. However, SQL-like queries may also be supported on the enumerations, as shown here: https://1drv.ms/w/s!Ashlm-Nw-wnWt1QqRW-GkZlfg0yz

152) Storage product code is usually a large code base. Component interaction diagrams can be generated, and reorganizations may be done to improve the code. The practices associated with large code bases apply to storage server code as well.

153) Side-by-side: Most storage products have to be elastic to growing data. However, customers may choose to use more than one instance. In such a case, the storage product must work well side by side. This is generally possible when we don't bind to predetermined operating system resources and instead acquire them on initialization, using the Resource Acquisition Is Initialization (RAII) principle (see the sketch after this list).

154) Multiplexing: By the same argument, it is not necessary for the operating system to see multiple instances of the storage product as separate processes. For example, if the storage product is accessible via an HTTP endpoint, multiple instances can register different endpoints while relying on host virtualization for operating system resources. This is the case with Linux containers.

155) Pooled external resources: It is not just the host and its operating system resources that the storage product requires; it may also require resources that are remote from the local stack. Since such resources can be expensive, it is helpful for the storage product to be thrifty by pooling them and servicing as many workers in the storage product as possible.
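
Items 153 and 155 can be illustrated together: a pooled remote resource handed out through an RAII guard. This is a minimal sketch under simplifying assumptions; the ConnectionPool and PooledConnection names and the endpoints are made up, and a real pool would block or grow when empty.

#include <iostream>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical pool of expensive remote connections. A worker borrows a
// connection through an RAII guard that returns it automatically when the
// guard goes out of scope, so a few connections can service many workers.
struct RemoteConnection { std::string endpoint; };

class ConnectionPool {
public:
    explicit ConnectionPool(std::vector<RemoteConnection> conns) : free_(std::move(conns)) {}

    RemoteConnection acquire() {             // simplification: assumes the pool is never empty
        std::lock_guard<std::mutex> lock(mu_);
        RemoteConnection conn = free_.back();
        free_.pop_back();
        return conn;
    }

    void release(RemoteConnection conn) {
        std::lock_guard<std::mutex> lock(mu_);
        free_.push_back(std::move(conn));
    }

private:
    std::mutex mu_;
    std::vector<RemoteConnection> free_;
};

// RAII guard: acquires on construction, releases on destruction.
class PooledConnection {
public:
    explicit PooledConnection(ConnectionPool& pool) : pool_(pool), conn_(pool.acquire()) {}
    ~PooledConnection() { pool_.release(std::move(conn_)); }
    PooledConnection(const PooledConnection&) = delete;
    PooledConnection& operator=(const PooledConnection&) = delete;
    RemoteConnection& get() { return conn_; }
private:
    ConnectionPool& pool_;
    RemoteConnection conn_;
};

int main() {
    ConnectionPool pool({{"replica-1:9000"}, {"replica-2:9000"}});
    {
        PooledConnection lease(pool);                      // borrow
        std::cout << "using " << lease.get().endpoint << "\n";
    }                                                      // returned to the pool here
    return 0;
}

Because the guard releases the connection in its destructor, a worker cannot leak a pooled connection even when it exits early.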

Sunday, December 9, 2018

Today we continue discussing best practices in storage engineering:

148) Standalone mode: Most storage products offer their capabilities in a standalone mode. This makes it easy to try out the product without dependencies; one-box deployments also work this way. Removing the dependency on specific hardware makes the product easier to study and its features easier to try out.


149) End User License Agreement: Products have EULAs which the user is required to accept. If the EULA conditions are not read or accepted, certain functionalities of the server are turned off. Typically, this check happens when the product is initialized or the service is restarted. However, the initialization code is global and has a very sensitive sequence of steps, so the EULA check must be placed without leaving the product in a bad state.

150) Provisioning: Storage artifacts are not just created by the user. Some may be offered as built-ins so that the user can start working with their data rather than dealing with the overhead of setting up containers and describing them. These ready-to-use built-in storage artifacts let the user focus on their own tasks rather than system tasks. A judicious choice of such artifacts and routines is therefore useful to the user.

151) Querying: Storage products are not required to provide a query layer. Still, they may participate in the query layer on an enumeration basis. Since the query layer can sit separately over the storage entities, it is generally written in the form of a web service. However, SQL-like queries may also be supported on the enumerations (see the sketch after this list), as shown here: https://1drv.ms/w/s!Ashlm-Nw-wnWt1QqRW-GkZlfg0yz

152) Storage product code is usually a large code base. Component interaction diagrams can be generated, and reorganizations may be done to improve the code. The practices associated with large code bases apply to storage server code as well.
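
To make item 151 concrete, here is a hedged sketch of SQL-like filtering over an enumeration. The StorageObject type and the where/selectNames helpers are hypothetical; they simply mimic a SELECT ... WHERE over enumerated entities without the storage product owning a query layer.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// A storage entity exposed through an enumeration.
struct StorageObject {
    std::string name;
    long long   sizeBytes;
};

// WHERE clause: keep only the objects matching the predicate.
template <typename Pred>
std::vector<StorageObject> where(const std::vector<StorageObject>& objects, Pred pred) {
    std::vector<StorageObject> out;
    std::copy_if(objects.begin(), objects.end(), std::back_inserter(out), pred);
    return out;
}

// SELECT name: project each object to its name.
std::vector<std::string> selectNames(const std::vector<StorageObject>& objects) {
    std::vector<std::string> names;
    for (const auto& o : objects) names.push_back(o.name);
    return names;
}

int main() {
    std::vector<StorageObject> enumeration = {
        {"logs/app.log", 1024}, {"backups/db.bak", 5368709120LL}, {"tmp/scratch", 42}};
    // SELECT name FROM enumeration WHERE sizeBytes > 1 GB
    for (const auto& name : selectNames(where(enumeration,
            [](const StorageObject& o) { return o.sizeBytes > 1024LL * 1024 * 1024; })))
        std::cout << name << "\n";
    return 0;
}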


#codingexercise
// Counts the nodes with an even value in a singly linked list.
int getCountEven(node* root)
{
    if (root == nullptr) return 0;
    int count = 0;
    while (root) {
        if (root->value % 2 == 0) {
            count++;
        }
        root = root->next;
    }
    return count;
}

Saturday, December 8, 2018

Today we continue discussing best practices in storage engineering:

146) Footprint: The code for a storage server can run on any device. Java, for example, runs on billions of devices, and a storage server written in Java can run even on pocket devices. If the storage is flash and the entire storage server runs only on flash, it makes a great candidate for usability.

147) Editions: Just as the code for the storage server can be made suitable for different devices, it can ship as different editions. One way to determine the different editions is to base them on where customers demand them. Although many perspectives feed into these decisions, the product ultimately serves the customer.


148) Standalone mode: Most storage products offer their capabilities in a standalone mode. This makes it easy to try out the product without dependencies; one-box deployments also work this way. Removing the dependency on specific hardware makes the product easier to study and its features easier to try out.


149) End User License Agreement: Products have EULAs which the user is required to accept. If the EULA conditions are not read or accepted, certain functionalities of the server are turned off. Typically, this check happens when the product is initialized or the service is restarted. However, the initialization code is global and has a very sensitive sequence of steps, so the EULA check must be placed without leaving the product in a bad state (see the sketch after this list).

150) Provisioning: Storage artifacts are not just created by the user. Some may be offered as built-ins so that the user can start working with their data rather than dealing with the overhead of setting up containers and describing them. These ready-to-use built-in storage artifacts let the user focus on their own tasks rather than system tasks. A judicious choice of such artifacts and routines is therefore useful to the user.
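
Item 149 can be sketched as a gate that only records which optional features remain disabled, so the sensitive initialization sequence never aborts halfway. The FeatureFlags structure and the applyEulaGate function are assumptions made purely for illustration.

#include <iostream>

// Hypothetical EULA gate evaluated during initialization. Rather than failing
// mid-initialization, the check decides which optional features stay off.
struct FeatureFlags {
    bool replicationEnabled = false;
    bool analyticsEnabled   = false;
};

FeatureFlags applyEulaGate(bool eulaAccepted) {
    FeatureFlags flags;
    if (eulaAccepted) {                  // full functionality only after acceptance
        flags.replicationEnabled = true;
        flags.analyticsEnabled   = true;
    }
    return flags;                        // the core data path stays available either way
}

int main() {
    FeatureFlags flags = applyEulaGate(/*eulaAccepted=*/false);
    std::cout << "replication enabled: " << std::boolalpha << flags.replicationEnabled << "\n";
    return 0;
}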

#codingexercise
// Counts the nodes with an odd value in a singly linked list.
int getCountOdd(node* root)
{
    if (root == nullptr) return 0;
    int count = 0;
    while (root) {
        if (root->value % 2 != 0) {
            count++;
        }
        root = root->next;
    }
    return count;
}