Friday, December 7, 2018

Today we continue discussing the best practices from storage engineering:

143) One of the most common sources of software faults is heap memory usage, especially by the Java virtual machine. Such faults require a lot of effort to investigate and narrow down. Often the remedial step taken is simply to raise the maximum heap size, sometimes all the way to 4GB for the process. Since leaks come with their own stack trace only when they occur deterministically, finding the root cause otherwise involves repeated trials.
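
As a starting point for such an investigation, the heap can be sampled from inside the process with the standard java.lang.management API. This is a minimal sketch; the sampling interval is an illustrative choice, and the heap ceiling itself is raised with a JVM flag such as -Xmx4g.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// A minimal sketch for watching heap usage during a leak investigation.
// Run the process with a larger ceiling (e.g., -Xmx4g) while narrowing down.
public class HeapWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = bean.getHeapMemoryUsage();
            System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            Thread.sleep(5000); // sample every five seconds
        }
    }
}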

144) Among the various process-level statistics, memory is one of the most frequently reviewed. Since functionality within the storage server can be broken down into processes, as with microservices, we are able to narrow an investigation down to an individual process. Since processes can be restarted, their restart count is a good indicator of malfunction.
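
A minimal supervisor sketch makes the restart count explicit. The command line and the restart threshold below are placeholders, not any particular product's configuration.

import java.io.IOException;

// A minimal supervisor sketch: restart a service process when it exits
// and treat the restart count as a malfunction indicator.
// The jar name below is a hypothetical stand-in for an actual microservice.
public class Supervisor {
    public static void main(String[] args) throws IOException, InterruptedException {
        int restarts = 0;
        while (restarts < 5) { // give up after repeated failures
            Process p = new ProcessBuilder("java", "-jar", "storage-service.jar")
                    .inheritIO()
                    .start();
            int exitCode = p.waitFor();
            restarts++;
            System.err.printf("service exited with %d; restart #%d%n", exitCode, restarts);
        }
        System.err.println("too many restarts: likely malfunction, raising an alert");
    }
}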

145) Distributed ledger: This is gaining popularity where there is no central ledger and no requirement for central ownership to verify grants and revocations. It mitigates tampering. It is a great store for digital identity data and works for products that do not need to belong to an organization or a cloud.
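
The tamper-evidence comes from hash chaining: each entry commits to its predecessor, so altering an earlier grant or revocation breaks every later hash. The following is a single-node sketch of that core idea, not a full distributed consensus protocol.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// A minimal hash-chain sketch: each entry commits to its predecessor,
// so tampering with any earlier grant or revocation breaks the chain.
public class Ledger {
    private final List<String> entries = new ArrayList<>();
    private String previousHash = "";

    public void append(String record) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest((previousHash + record).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        previousHash = hex.toString();
        entries.add(previousHash + " " + record);
    }
}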

146) Footprint: The code for a storage server can run on any device. Java, for example, runs on billions of devices, and a storage server written in Java can run even on pocket devices. If the storage is flash and the entire storage server runs only on flash, it makes a great candidate for usability.

147) Editions: Just as the code for a storage server can be made suitable for different devices, it can ship as different editions. One way to determine the different editions is to base them on where customers demand the product. Although there are many perspectives in these decisions, the ultimate service of the product is to the customer.

#codingexercise
In the segregated list discussed earlier, find the count of nodes swapped:
int getCountSwapped(node* root) {
      return getCount(root) / 2; // getCount() is from the earlier exercise; each swap moves two nodes
}

Thursday, December 6, 2018

Today we're discussing the best practices from storage engineering:
140) The topology for data transfers has been changing together with the technology stack. Previously, even the master data or product catalog of a company was a singleton; today there is a practice of rebuilding it constantly. The data is also allowed to remain stagnant, as with data lakes, and is generally hosted in the cloud. On-premise servers and SANs are being replaced in favor of cloud technologies wherever possible. Therefore, toolsets and operations differ widely, and conformance to ETL semantics for data transfers from the product will generally be preferred by its audience.
141) Most storage products work best when they are warmed up. A storage product may use its own initialization sequence and internal activities to reach this stage. Since initialization is not done often, it is a convenient time to group such activities together so that subsequent operations are efficient. This has been true at all levels, from class design to product design.
142) The graceful shutdown of products and recovery from bad states is equally important to most products, even those not required to maintain strong guarantees. This is particularly the case for those that are susceptible to a large number of faults and failures. For example, faults may range from disk and node errors to power failures, network issues, bit flips, and random hardware failures. Reed-Solomon erasure coding and Pelican-style coding try to overcome these faults by computing m error-correction chunks from n data chunks.
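
Full Reed-Solomon is beyond a snippet, but XOR parity is its degenerate m = 1 case and shows the principle: one parity chunk lets any single lost data chunk be rebuilt from the survivors. A simplified sketch, assuming all chunks are the same length:

// Simplified erasure-coding sketch: XOR parity is the degenerate m = 1 case.
// Any single lost data chunk can be rebuilt by XOR-ing the parity with the
// surviving chunks; full Reed-Solomon generalizes this to m > 1.
public class XorParity {
    public static byte[] parity(byte[][] chunks) {
        byte[] p = new byte[chunks[0].length];
        for (byte[] chunk : chunks)
            for (int i = 0; i < p.length; i++)
                p[i] ^= chunk[i];
        return p;
    }

    // Recover the missing chunk by XOR-ing the parity with the survivors.
    public static byte[] recover(byte[] parity, byte[][] survivors) {
        byte[] missing = parity.clone();
        for (byte[] chunk : survivors)
            for (int i = 0; i < missing.length; i++)
                missing[i] ^= chunk[i];
        return missing;
    }
}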

143) One of the most common sources of software faults is heap memory usage, especially by the Java virtual machine. Such faults require a lot of effort to investigate and narrow down. Often the remedial step taken is simply to raise the maximum heap size, sometimes all the way to 4GB for the process. Since leaks come with their own stack trace only when they occur deterministically, finding the root cause otherwise involves repeated trials.

Wednesday, December 5, 2018

Today we continue discussing the best practices from storage engineering:
136) Data transfer does not always have to be to and from the software product. Transfer within the product, using its organizational hierarchy, could also be supported. For example, object storage provides the convenience of copying a bucket of objects at a time, even if the objects have folder-path-like prefixes.
137) One of the overlooked facts about data transfer is that it is often done by production support personnel because of the sensitivity of the data involved. They prefer safe options over complicated, even if more efficient, operations. If the data transfer can be done with the help of a tool and a shell script, it works very well for such transfers. Consequently, there must be a handoff between the developer and production support, and the interface must be something that is easy to use from the production support side.
138) The administrative chores around production data also increase significantly compared to the datasets that the product is built and tested on. There is absolutely no room for data corruption or unplanned outages. If the data transfer tool itself is defective, it cannot be handed over to production. Consequently, data transfer tools must be proven, and preferably part of the product, so that merely the connections need to be set up.
139) Data transfers that involve read-only operations on production data are favored over those that involve writes. This reason, together with the above, accounts for the general shift toward Extract-Transform-Load packages for use with production data instead of writing and deploying custom code for such transfers.
140) The topology for data transfers has been changing together with the technology stack. Previously, even the master data or product catalog of a company was a singleton; today there is a practice of rebuilding it constantly. The data is also allowed to remain stagnant, as with data lakes, and is generally hosted in the cloud. On-premise servers and SANs are being replaced in favor of cloud technologies wherever possible. Therefore, toolsets and operations differ widely, and conformance to ETL semantics for data transfers from the product will generally be preferred by its audience.

Tuesday, December 4, 2018

Today we continue discussing the best practices from storage engineering:

134) Data migration wizards help move data as collections of storage artifacts. With a judicious choice of namespace hierarchy and organizational units, the users are relieved from the syntax of storage artifacts and their operations and can view their data as either in transit or at rest.

135) Extract-Transform-Load operations are very helpful when transferring data between data stores. In such cases, it is best to author them via a user interface with the help of designer-like tools. Logic for these operations may even be customized and saved as modules.
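
At its core, such a module composes three stages. A minimal sketch, where the record format, the transform, and the sink are all illustrative stand-ins for what a designer tool would let users author and save:

import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

// A minimal ETL sketch: extract, transform, and load are composable stages.
public class EtlPipeline {
    public static void run(List<String> sourceRows,              // extract
                           Function<String, String> transform,   // transform
                           Consumer<List<String>> load) {        // load
        List<String> transformed = sourceRows.stream()
                .map(transform)
                .collect(Collectors.toList());
        load.accept(transformed);
    }

    public static void main(String[] args) {
        run(List.of("a,1", "b,2"),
            row -> row.toUpperCase(),                    // example transform
            rows -> rows.forEach(System.out::println));  // example sink
    }
}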

136) Data transfer does not always have to be to and from the software product. Transfer within the product, using its organizational hierarchy, could also be supported. For example, object storage provides the convenience of copying a bucket of objects at a time, even if the objects have folder-path-like prefixes.
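
Such an intra-product transfer might look like the sketch below, which copies every object sharing a prefix from one bucket to another. The ObjectStore interface here is hypothetical, standing in for whatever client the product exposes.

import java.util.List;

// A sketch of an intra-product transfer: copy every object that shares a
// folder-path-like prefix from one bucket to another. The ObjectStore
// interface is a hypothetical stand-in, not a specific product's API.
interface ObjectStore {
    List<String> listKeys(String bucket, String prefix);
    void copy(String srcBucket, String key, String dstBucket);
}

class PrefixCopier {
    static void copyByPrefix(ObjectStore store, String src, String dst, String prefix) {
        for (String key : store.listKeys(src, prefix)) {
            store.copy(src, key, dst); // objects keep their prefix in the destination
        }
    }
}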

#codingexercise
Segregate odd and even nodes in a linked list

node* segregate(node* root) {
    if (root == nullptr || root->next == nullptr) return root;

    node* odd = root;              // nodes in odd positions (1st, 3rd, ...)
    node* even = root->next;       // nodes in even positions
    node* evenHead = even;         // saved so the two halves can be joined

    while (even && even->next) {
        odd->next = even->next;    // link the next odd-position node
        odd = odd->next;
        even->next = odd->next;    // link the next even-position node
        even = even->next;
    }

    odd->next = evenHead;          // append even-position nodes after the odd ones
    return root;
}

Monday, December 3, 2018

Today we continue discussing the best practices from storage engineering:
131) Virtual Timeline: Most entries captured in the log are based on an actual timeline. With messages passed between distributed components of the storage product, there is a provision for using sequence numbers as a virtual timeline. Together, boundaries and the virtual timeline enable spatial and temporal capture of the changes to the data, suitable for watching. Recording the virtual time with the help of a counter and using it for comparisons is one thing; reconstructing the event sequence is another.
132) Tags: Storage artifacts should support tags so that users may form collections based on them. With the help of tags, we can specify access-control policies, cross-region replication, and object life-cycle management.
133) Classes: Classes are service levels of storage. Object storage, for instance, supports two or more storage classes. A class named standard is used for objects that require frequent access to data. A class named infrequent may be used for objects whose data is accessed less often. Objects with access patterns that change over time may be placed in yet another class (see the sketch below).
134) Data migration wizards help move data as collections of storage artifacts. With a judicious choice of namespace hierarchy and organizational units, the users are relieved from the syntax of storage artifacts and their operations and can view their data as either in transit or at rest.
135) Extract-Transform-Load operations are very helpful when transferring data between data stores. In such cases, it is best to author them via a user interface with the help of designer-like tools. Logic for these operations may even be customized and saved as modules.
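
Tags and classes naturally live together on per-object metadata. The names below are illustrative, not any particular product's API; the life-cycle rule shows how a class transition might be driven by access patterns.

import java.util.Map;

// An illustrative sketch of per-object metadata: tags drive collections and
// policies, and the storage class records the service level.
enum StorageClass { STANDARD, INFREQUENT_ACCESS, ARCHIVE }

class ObjectMetadata {
    final Map<String, String> tags;   // e.g., {"team": "billing", "replicate": "true"}
    StorageClass storageClass;

    ObjectMetadata(Map<String, String> tags, StorageClass storageClass) {
        this.tags = tags;
        this.storageClass = storageClass;
    }

    // A life-cycle rule might demote rarely accessed objects to a cheaper class.
    void demoteIfCold(long daysSinceLastAccess) {
        if (daysSinceLastAccess > 30 && storageClass == StorageClass.STANDARD) {
            storageClass = StorageClass.INFREQUENT_ACCESS;
        }
    }
}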

#codingexercise


List<int> SegregateBasedOddAndEvenValueOfElements(List<int> A)
{
    // requires System.Linq; even values first, then odd values, preserving relative order
    var result = A.Where(x => x % 2 == 0).ToList();
    result.AddRange(A.Where(x => x % 2 != 0));
    return result;
}


Sunday, December 2, 2018

Today we continue discussing the best practices from storage engineering:

127) Directory Tree: Files and folders are perhaps the most common way of organizing data, and many other forms of storage also enable some notion of such organization. While documents, graphs, and other forms are also becoming popular, they generally require their own language for storage and do not lend themselves to a tree. Storage organized as a tree should facilitate importing and exporting from other data stores, so that the user is freed from the notion that a storage product is defined by the syntax of storage, and the data transfers regardless of whether it is stored on file shares or in databases.
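
The standard java.nio.file API already treats a tree as something to walk for import or export. A small sketch that enumerates every file under a root as export candidates:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Walking a directory tree with java.nio.file: a starting point for the
// import/export facilities described above.
public class TreeExport {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(System.out::println); // export each file here
        }
    }
}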

128) Runtime: Storage has never intruded into compute. Programs requiring storage have no expectation that storage will do their processing. Yet predicates are often pushed down from the compute so that they run as close to the data as possible. Storage also has a notion of callbacks that can be registered; FileSystemWatchers are an example of this requirement. It is also possible for the storage layer to host a runtime so that the logic at its layer can be customized.
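
In Java, the WatchService in java.nio.file is the standard way to register such callbacks against a directory, along the lines of a FileSystemWatcher:

import java.io.IOException;
import java.nio.file.*;

// Registering for storage-layer callbacks with java.nio.file.WatchService,
// the Java analogue of a FileSystemWatcher.
public class Watcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE);
        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                System.out.println(event.kind() + ": " + event.context());
            }
            key.reset();
        }
    }
}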

129) Privacy: Guest access to artifacts and routines is sometimes allowed in the product to provide a minimum functionality that is available outside provisioning. The same access can also be used when the user does not need to be identified. In such cases, there is no audit information available on the user or the actions.

130) Boundaries: The boundaries in a continuous stream of data usually depend on one of two things: a fixed length for each segment, or a variable length determined by application logic. Here, application refers to the component and the scope performing the demarcation of the data. Therefore, boundaries can be nested or non-overlapping and frequently require translations.
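
A fixed-length scheme is the simplest to sketch; a variable-length scheme would replace the constant segment size with application logic such as a rolling hash. The segment size below is deliberately tiny for illustration.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Fixed-length boundary detection over a stream. A variable-length scheme
// would replace SEGMENT_SIZE with application logic (e.g., a rolling hash).
public class Segmenter {
    static final int SEGMENT_SIZE = 4; // tiny, for illustration only

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abcdefghij".getBytes());
        byte[] segment = new byte[SEGMENT_SIZE];
        int read;
        while ((read = in.read(segment)) > 0) {
            System.out.println("segment of " + read + " bytes"); // last one may be short
        }
    }
}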

131) Virtual Timeline: Most entries captured in the log are based on an actual timeline. With messages passed between distributed components of the storage product, there is a provision for using sequence numbers as a virtual timeline. Together, boundaries and the virtual timeline enable spatial and temporal capture of the changes to the data, suitable for watching.
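
A sequence-number virtual timeline is essentially a Lamport-style logical clock. A minimal sketch: local events tick the counter, and a received message folds in the sender's timestamp so that causally later events always compare greater.

import java.util.concurrent.atomic.AtomicLong;

// A Lamport-style logical clock: a sketch of the sequence-number virtual
// timeline. tick() stamps local events; merge() folds in a peer's timestamp.
public class LogicalClock {
    private final AtomicLong counter = new AtomicLong();

    public long tick() {                       // local event or message send
        return counter.incrementAndGet();
    }

    public long merge(long received) {         // on message receipt
        return counter.updateAndGet(local -> Math.max(local, received) + 1);
    }
}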

Saturday, December 1, 2018

Today we continue discussing the best practices from storage engineering:

123) StackTraces: When the layers of a software product are traversed by shared data structures, such as login contexts, it is helpful to capture and accumulate the stack traces at the boundaries for troubleshooting purposes. This ring buffer of stack traces provides an instant history for the data structures.
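
A sketch of such a ring buffer: the capacity is an illustrative choice, and record() would be called at each layer boundary the shared structure crosses.

import java.util.ArrayDeque;
import java.util.Deque;

// A sketch of a ring buffer of stack traces: each boundary crossing records
// where the shared data structure (e.g., a login context) has been.
public class TraceRing {
    private static final int CAPACITY = 16;
    private final Deque<StackTraceElement[]> ring = new ArrayDeque<>(CAPACITY);

    public synchronized void record() {
        if (ring.size() == CAPACITY) {
            ring.removeFirst();            // drop the oldest trace
        }
        ring.addLast(Thread.currentThread().getStackTrace());
    }

    public synchronized Deque<StackTraceElement[]> history() {
        return new ArrayDeque<>(ring);     // instant history for troubleshooting
    }
}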

124) Wrapping: This is done not just for certificates or exceptions. Wrapping works for any artifact that is refreshed or renewed where we want to preserve the old with the new. This may apply even to headers and metadata of data structures that fall in the control path.
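
Exception chaining is the familiar instance of this pattern: the new artifact carries the old one inside it, so both survive for diagnosis. The renewal step below is a hypothetical stand-in.

// Exception chaining is the simplest form of wrapping: the old artifact is
// preserved inside the new one. The same pattern extends to renewed
// certificates or refreshed metadata headers.
public class Wrapping {
    static void refresh() {
        try {
            renew(); // hypothetical renewal step
        } catch (Exception old) {
            // the new exception wraps the old; both survive for diagnosis
            throw new IllegalStateException("renewal failed", old);
        }
    }

    static void renew() throws Exception {
        throw new Exception("stale credential");
    }
}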

125) Bundling: Objects are not only useful standalone; they also appear in collections. Files, for instance, can be archived into a tar ball that behaves like any other file. The same is true for all storage artifacts that can be bundled, and products do well to promote this.
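
The JDK has no built-in tar support, so a zip archive stands in for the tar-ball example in this sketch; the resulting bundle behaves like any other file.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Bundling files into a single archive with java.util.zip.
public class Bundle {
    public static void main(String[] args) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("bundle.zip"))) {
            for (String name : args) {
                Path file = Paths.get(name);
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                zip.write(Files.readAllBytes(file));
                zip.closeEntry();
            }
        }
    }
}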

126) Mount: A remote file share may be mounted so that it appears local to the user. The same is true for any storage product that does not need to be reached over HTTP. File protocols already enable remote file shares, and there are many protocols to choose from; syslog and ssh are also used to transfer data. Therefore, a storage product may choose from all the conventional modes of data transfer from remote storage.

127) Directory Tree: Files and folders are perhaps the most common way of organizing data, and many other forms of storage also enable some notion of such organization. While documents, graphs, and other forms are also becoming popular, they generally require their own language for storage and do not lend themselves to a tree. Storage organized as a tree should facilitate importing and exporting from other data stores, so that the user is freed from the notion that a storage product is defined by the syntax of storage, and the data transfers regardless of whether it is stored on file shares or in databases.