Sunday, December 2, 2018

Today we continue discussing best practices from storage engineering:

127) Directory Tree: Files and folders are perhaps the most common way of organizing data, and many other forms of storage also enable some notion of such organization. While documents, graphs, and other forms of storage are becoming popular, they generally require their own query language and do not map naturally to a tree. Storage organized as a tree should facilitate importing from and exporting to other data stores, freeing the user from the notion that a storage product is defined by its storage syntax; the data should transfer regardless of whether it is stored on file shares or in databases.
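
As an illustration, here is a minimal sketch of exporting a directory tree as flat key-value pairs that any other store could import; the dict stand-in and the path are placeholders, not product APIs:

    import os

    def export_tree(root):
        """Flatten a directory tree into (relative path, bytes) pairs
        that a key-value or object store could import; paths are keys."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    yield os.path.relpath(path, root), f.read()

    # Example: copy a tree into a dict acting as a stand-in object store.
    store = {key: data for key, data in export_tree("/tmp/example")}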

128) Runtime: Storage has never intruded into compute. Programs requiring storage have no expectation that the storage does their processing. Yet predicates are often pushed down from the compute so that they execute as close to the data as possible. However, storage does have a notion of callbacks that can be registered; FileSystemWatchers are an example of this requirement. It is even possible for the storage layer to host a runtime so that the logic at its layer can be customized.
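
A minimal sketch of the callback idea, with a hypothetical StorageLayer standing in for the product:

    class StorageLayer:
        """Toy storage layer that lets clients register callbacks,
        in the spirit of a FileSystemWatcher."""
        def __init__(self):
            self._data = {}
            self._watchers = []

        def watch(self, callback):
            # callback(key, value) fires on every write
            self._watchers.append(callback)

        def put(self, key, value):
            self._data[key] = value
            for cb in self._watchers:
                cb(key, value)

    layer = StorageLayer()
    layer.watch(lambda k, v: print(f"changed: {k}"))
    layer.put("a.txt", b"hello")   # prints "changed: a.txt"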

129) Privacy: Guest access to artifacts and routines is sometimes allowed in the product to provide a minimum level of functionality that is available without provisioning. The same access can also be used when the user does not need to be identified. In such cases, there is no audit information available on the user or the actions.

130) Boundaries: The boundaries in a continuous stream of data usually depend on one of two things: a fixed segment length, or a variable length determined by application logic. Here, application refers to the component and the scope performing the demarcation on the data. Consequently, boundaries can be nested or non-overlapping and frequently require translation between layers.
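
A minimal sketch contrasting the two boundary styles on a byte stream:

    def fixed_segments(data, size):
        """Fixed-length boundaries: every segment is `size` bytes."""
        return [data[i:i + size] for i in range(0, len(data), size)]

    def delimited_segments(data, delim=b"\n"):
        """Variable-length boundaries decided by application logic,
        here simply a delimiter."""
        return data.split(delim)

    stream = b"rec1\nrec2\nrec3"
    print(fixed_segments(stream, 4))    # [b'rec1', b'\nrec', b'2\nre', b'c3']
    print(delimited_segments(stream))   # [b'rec1', b'rec2', b'rec3']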

131) Virtual Timeline: Most entries captured in the log are based on the actual, wall-clock timeline. With messages passed between distributed components of the storage, there is a provision for using sequence numbers as a virtual timeline instead. Together, boundaries and a virtual timeline enable spatial and temporal capture of the changes to the data, which is suitable for watching.
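
A Lamport-clock-style counter is one way to realize a virtual timeline; the class name here is illustrative:

    import itertools

    class VirtualTimeline:
        """Hands out monotonically increasing sequence numbers so that
        distributed log entries can be ordered without wall clocks."""
        def __init__(self):
            self._counter = itertools.count(1)

        def stamp(self, entry):
            return (next(self._counter), entry)

    timeline = VirtualTimeline()
    log = [timeline.stamp("open"), timeline.stamp("write"), timeline.stamp("close")]
    # [(1, 'open'), (2, 'write'), (3, 'close')] -- ordered by sequence, not time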

Saturday, December 1, 2018

Today we continue discussing best practices from storage engineering:

123) StackTraces: When the layers of a software product are traversed by shared data structures such as login contexts, it is helpful to capture and accumulate stack traces at the layer boundaries for troubleshooting purposes. This ring buffer of stack traces provides an instant history for the data structures.
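
A minimal sketch, with TracedContext as a hypothetical stand-in for such a shared data structure:

    import traceback
    from collections import deque

    class TracedContext:
        """Shared data structure that records where it has been.
        A bounded deque acts as the ring buffer of stack traces."""
        def __init__(self, history=8):
            self.traces = deque(maxlen=history)

        def checkpoint(self):
            # Capture the current call stack at a layer boundary.
            self.traces.append("".join(traceback.format_stack(limit=5)))

    ctx = TracedContext()
    ctx.checkpoint()          # call at each boundary crossing
    print(ctx.traces[-1])     # most recent crossing, for troubleshooting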

124) Wrapping: This is done not just for certificates or exceptions. Wrapping works for any artifact that is refreshed or renewed where we want to preserve the old alongside the new. It may apply even to headers and metadata of data structures that fall in the control path.
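
A sketch of a renewable artifact, a certificate say, whose new version wraps and preserves its predecessor; the class is hypothetical:

    class Renewable:
        """Renewable artifact whose renewal wraps the older version."""
        def __init__(self, payload, previous=None):
            self.payload = payload
            self.previous = previous    # the wrapped, older version

        def renew(self, payload):
            return Renewable(payload, previous=self)

        def history(self):
            node = self
            while node:
                yield node.payload
                node = node.previous

    cert = Renewable("v1").renew("v2").renew("v3")
    print(list(cert.history()))   # ['v3', 'v2', 'v1']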

125) Bundling: Objects are not only useful standalone; they also appear in collections. Files, for instance, can be archived into a tarball that behaves like any other file. The same is true for all storage artifacts that can be bundled, and products do well to promote this.
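
Using Python's standard tarfile module as an example of bundling; the file names are placeholders assumed to exist locally:

    import tarfile

    # Bundle several files into one artifact that is itself a file.
    with tarfile.open("bundle.tar.gz", "w:gz") as tar:
        for name in ("a.txt", "b.txt"):
            tar.add(name)

    # The bundle can be listed or extracted like any other archive.
    with tarfile.open("bundle.tar.gz", "r:gz") as tar:
        print(tar.getnames())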

126) Mount: A remote file share may be mounted so that it appears local to the user. The same is true for any storage product that does not need to be reached over HTTP. File protocols already enable remote file shares, and there are many to choose from; syslog and ssh are also used to transfer data. A storage product may therefore choose from all conventional modes of data transfer from remote storage.

Friday, November 30, 2018

Today we continue discussing best practices from storage engineering:

119) When storage operations don’t go as planned, exceptions need to be raised and reported. Since exceptions bubble up from deep layers, they need to be properly wrapped and translated to be actionable to the user. Such exception handling and chaining often breaks, leading to costly troubleshooting. Consequently, code revisits and cleanup become a routine chore.
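
Python's exception chaining shows the wrap-and-translate pattern; StorageError is a hypothetical user-facing type:

    class StorageError(Exception):
        """User-facing error; illustrative name, not a product API."""

    def read_block(path):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as e:
            # Wrap the low-level error but chain it so the deep-layer
            # cause survives for troubleshooting.
            raise StorageError(f"could not read block at {path}") from e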

120) Exceptions and alerts don’t matter to the customer if they don’t come with wording that explains the mitigating action the user needs to take. Error code, level, and severity are other useful ways to describe the error. Diligence in preparing better error messages goes a long way toward helping end users.

121) The ratio of background jobs to data-path workers is important. It is easy to delegate jobs to the background in order to keep the data path fast. However, if there is only one data-path worker and the number of background jobs is very high, efficiency drops and message passing increases. In that case it might be better to serialize the tasks on the same worker. The trade-off is even more glaring when the background workers poll or execute on scheduled cycles, because that introduces delays.
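
A toy sketch of the trade-off between serializing a task on the data-path worker and delegating it to a background queue; all names are illustrative:

    import queue, threading

    tasks = queue.Queue()

    def background_worker():
        while True:
            job = tasks.get()
            if job is None:
                break
            job()                      # runs later, off the data path
            tasks.task_done()

    threading.Thread(target=background_worker, daemon=True).start()

    def handle_write(data, inline=True):
        persist = lambda: print("persisted", len(data), "bytes")
        if inline:
            persist()                  # serialize on the data-path worker
        else:
            tasks.put(persist)         # delegate; adds queueing latency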

122) Event-based programming is harder to coordinate and diagnose than sequential programming, yet it is fondly used in many storage drivers and even in user-mode components that do not need to be highly responsive or where there might be significant delay between action triggers. It requires something like a driver verifier to analyze all the code paths. Often, synchronous execution suffices, with object-oriented design for better organization and easier troubleshooting. While it is possible to mix the two, keeping the execution aligned with the timeline in the logs of the activities performed by the storage product helps reduce the overall cost of maintenance.

Thursday, November 29, 2018

Today we continue discussing best practices from storage engineering:

115) Upgrade scenarios: As with any other product, a storage server has concerns about changes to data structures and requests/responses. While it is important for each feature to be backward compatible, it is also important to have a design that can introduce flexibility without encumbrance.
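
One common way to keep data structures backward compatible is an explicit version field with upgrade-on-read; a sketch with made-up field names:

    import json

    def load_record(blob):
        """Read a record that may have been written by an older
        release; the version field keeps the format extensible."""
        rec = json.loads(blob)
        version = rec.get("version", 1)
        if version == 1:
            # v1 stored a single "owner"; v2 renamed it to "owners".
            rec["owners"] = [rec.pop("owner")]
            rec["version"] = 2
        return rec

    old = '{"version": 1, "owner": "alice"}'
    print(load_record(old))   # upgraded in place to the v2 shape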

116) Multi-purpose applicability of logic: When we covered diagnostic tools and scripts, we mentioned logic that makes use of feedback from monitored data paths. That is one example, but verification and validation such as this are equally applicable to external monitoring and troubleshooting of the product. The same logic may serve in a diagnostic API as well as in product code for active data-path corrections. Furthermore, such logic may be called from a user interface, the command line, or other SDKs. Therefore, validations throughout the product are candidates for repurposing.
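
A sketch of one validation routine shared by the data path, a diagnostic API, and the CLI; the rules themselves are illustrative:

    def validate_object_name(name):
        """Single validation routine, reusable everywhere."""
        errors = []
        if not name:
            errors.append("name must not be empty")
        if len(name) > 255:
            errors.append("name exceeds 255 characters")
        if "/" in name:
            errors.append("name must not contain '/'")
        return errors

    # Data path: reject the request.  Diagnostic API: report findings.
    assert validate_object_name("logs/2018") == ["name must not contain '/'"]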

117) Read-only / read-write: It is better to separate the read-only from the read-write portion of the data because it separates the kinds of access to it. Usually online transaction processing can be isolated to the read-write portion while analytical processing can be restricted to read-only. The same holds true for static plans versus dynamic policies, and for server-side resources versus client-side policies.
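
A toy request router along these lines; the backend names are hypothetical:

    class Router:
        """Writes go to the primary; reads can be served read-only."""
        def __init__(self, primary, replica):
            self.primary, self.replica = primary, replica

        def route(self, op):
            return self.primary if op in ("PUT", "DELETE") else self.replica

    r = Router("primary-store", "readonly-replica")
    print(r.route("GET"))   # readonly-replica
    print(r.route("PUT"))   # primary-store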

118) While the control path is easier to describe and maintain, the data path is more difficult to determine upfront because customers use it any way they want. When we discussed assigning labels to incoming data and workloads, it was a way to classify the usages of the product so we can gain insight into how it is being used. In a feedback cycle, such labels provide a convenient understanding of the temporal and spatial nature of the data flow.
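
A sketch of labeling incoming requests and tallying the mix; the thresholds and label names are made up for illustration:

    from collections import Counter

    def label_request(req):
        """Assign a coarse workload label to an incoming request."""
        if req["op"] == "GET":
            return "read-heavy"
        return "large-write" if req["size"] > 1 << 20 else "small-write"

    usage = Counter(label_request(r) for r in [
        {"op": "GET", "size": 0},
        {"op": "PUT", "size": 4096},
        {"op": "PUT", "size": 8 << 20},
    ])
    print(usage)   # Counter({'read-heavy': 1, 'small-write': 1, 'large-write': 1})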


Wednesday, November 28, 2018

Today we continue discussing best practices from storage engineering:

111) Memory configuration: In a cluster environment, most of the nodes are commodity machines. Typically, they have reasonable memory. However, the amount of storage data that can be processed by a node depends on fitting the corresponding data structures in memory. The larger the memory, the higher the capability of the server component in the control path. Therefore, there must be some choice of memory versus capability in the overall topology of the server so that it can be recommended to customers.

112) CPU configuration: Typically, VMs added as nodes to a storage cluster come in T-shirt-size configurations, with the number of CPUs and the memory defined for each size. There is no restriction on the storage server being deployed in a container or a single virtual machine. And since the virtualization of the compute makes it harder to tell how the host scales up, the general rule of thumb has been the more the better. It does not need to be so; a certain configuration may provide the best advantage. Again, the choice and the recommendation must be conveyed to the customer.

113) Serverless computing: Many functions of a storage server/product are written in the form of microservices, or as components within layers if they are not separable. However, the notion of modularity can be taken further in the form of serverless computing, so that the lease on named compute servers does not affect the storage server.

114) Expansion: Although some configurations may be optimal, the storage server must be flexible about what the end user wants as the configuration of system resources. Availability of flash storage and its configuration via external additions to the hardware is a special case, but upgrading the storage server from one set of hardware to another must be permitted.

Tuesday, November 27, 2018

Today we continue discussing best practices from storage engineering:

106) The CAP theorem states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. However, it is possible to work around this when we use layering and design the system around a specific fault model. An append-only layer provides high availability, while a partitioning layer provides strong consistency guarantees. Together they can handle a specific set of faults with both high availability and strong consistency.

107) Workload profiles: Every storage engineering product will have data I/O, and one of the ways to try out the product is to use a set of workload profiles with varying patterns of data access (a sketch of such a profile appears after this list).

108) Intra/Inter: Since data I/O crosses multiple layers, a lower layer may perform operations that are similarly applicable to artifacts in another layer at a higher scope. For example, replication may occur between copies of objects within a single PUT operation and may be equally applicable to objects spanning sites designated for content replication. This not only emphasizes reusability but also provides a way to check the architecture for consistency.

109) Local/Remote: While many components within the storage server take the disk operations to be local, certain components gather information across disks and write to them directly. In such cases, even if the disk is local, it is more consistent to access it via a loopback and simplify the logic by treating every such operation as remote.

110) Resource consumption: We referred to performance engineering for improving the data operations. However, the number of resources used per request was not called out, because it may be perfectly acceptable as long as the elapsed time stays within bounds. Still, resource conservation has a lot to do with reducing interactions, which in turn leads to efficiency.
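
For item 107, a minimal sketch of a workload profile generator; the profile names, ratios, and object sizes are assumptions for illustration:

    import random

    def workload(profile, n=1000, seed=42):
        """Generate a synthetic request mix from a named profile."""
        rng = random.Random(seed)
        read_ratio = {"read-heavy": 0.9, "mixed": 0.5, "write-heavy": 0.1}[profile]
        for _ in range(n):
            op = "GET" if rng.random() < read_ratio else "PUT"
            yield op, rng.choice([4096, 65536, 1 << 20])   # object sizes

    for op, size in list(workload("read-heavy"))[:3]:
        print(op, size)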

Monday, November 26, 2018

Today we continue discussing best practices from storage engineering:
100) Conformance to verbs: The service-oriented architecture framework for web services defined contract and behavior in addition to address and binding, but the general shift in the industry has been from that architecture toward RESTful services. This paradigm introduces well-known verbs for the operations permitted. Storage products that provide RESTful services must conform to the well-defined mapping of verbs to create, read, update, and delete operations on their resources.

101) Storage products tend to form a large code base, which significantly hurts developer productivity when the build takes more than a few minutes. Consequently, the code base may need to be constantly refactored, or the build needs to be completed with more workers, more memory, and profiling.

102) Profiling is not limited to build time. Like the performance counters mentioned earlier, there is a way to build instrumented code so that bottlenecks may be identified. As with build profiling, this has to be repeated and the trends need to be monitored.
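
Python's standard cProfile shows the idea of instrumenting a code path to find bottlenecks; hot_path is a placeholder for a storage operation worth measuring:

    import cProfile, pstats, io

    def hot_path():
        # Placeholder for a storage operation under instrumentation.
        return sorted(range(100000), key=lambda x: -x)

    profiler = cProfile.Profile()
    profiler.enable()
    hot_path()
    profiler.disable()

    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    print(out.getvalue())   # top 5 entries point at the bottleneck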

103) Stress testing the storage product also helps gain valuable insight into whether the product’s performance changes over time. This covers everything from memory leaks to resource starvation.
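
A simple stress-loop sketch that watches latency and memory growth; `op` stands in for whatever storage call is under test:

    import time, tracemalloc

    def stress(op, iterations=10000):
        """Run an operation repeatedly, tracking elapsed time and
        peak memory so drift across runs can be compared."""
        tracemalloc.start()
        start = time.perf_counter()
        for _ in range(iterations):
            op()
        elapsed = time.perf_counter() - start
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"{iterations} ops in {elapsed:.2f}s, peak memory {peak} bytes")

    stress(lambda: bytes(4096))   # trivial stand-in for a storage op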

104) Diagnostic tools and scripts, including the log queries used to troubleshoot during development, also become useful artifacts to share with the developer community for the storage product. Even if the storage product is used mostly for archival, there is value in sharing these, along with documentation, with the community.

105) Older versions of the storage product may have had to be diagnosed with scripts and log queries, but bringing these into the current version of the product as a diagnostic API makes them mainstream. Documentation for these and other APIs makes things easier for the developer community.