Sunday, December 9, 2018

Today we continue discussing the best practices from storage engineering:

148) Standalone mode: Most storage products offer capabilities in a standalone mode. This makes it easy to try out the product without dependencies. One Box deployments also work in this manner. When we remove the dependency on the hardware, we make the product easier to study and its features easier to try out.


149) End User License Agreement:  Products have EULAs which the user is required to accept. If the EULA conditions are not read or accepted, certain functionalities of the server are turned off. Typically, this check happens at the initialization of the product or the restart of the service. However, the initialization code is global and has a very sensitive sequence of steps. Therefore, the EULA check must be placed so that it does not leave the product in a bad state.
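As a rough illustration of placing that check, the sketch below gates only optional functionality on acceptance while the core initialization sequence runs unconditionally; the IEulaStore interface and the state fields are hypothetical names invented for this example.

// Hypothetical sketch: the EULA check toggles optional features without
// disturbing the sensitive, global initialization sequence.
public interface IEulaStore { bool IsAccepted(); }

public sealed class ServerState
{
    public bool CoreServicesStarted { get; set; }
    public bool OptionalFeaturesEnabled { get; set; }
}

public sealed class ServerInitializer
{
    private readonly IEulaStore eulaStore;   // hypothetical persistence of the acceptance flag

    public ServerInitializer(IEulaStore eulaStore) { this.eulaStore = eulaStore; }

    public ServerState Initialize()
    {
        var state = new ServerState();

        // Core initialization runs unconditionally so the product never ends up half-configured.
        state.CoreServicesStarted = true;

        // The EULA check only enables or disables optional functionality; it never aborts the sequence.
        state.OptionalFeaturesEnabled = eulaStore.IsAccepted();

        return state;
    }
}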

150) Provisioning: Storage artifacts are not just created by the user. Some may be offered as built-ins so that the user can start working with their data rather than handling the overhead of setting up containers and describing them. These ready-to-use built-in storage artifacts let the user focus on their tasks rather than the system tasks. A judicious choice of such artifacts and routines is therefore useful to the user.

151) Querying: Storage products are not required to provide a query layer. Still, they may participate in the query layer on an enumeration basis. Since the query layer can come separately over the storage entities, it is generally written in the form of a web service. However, SQL-like queries may also be supported on the enumerations as shown here: https://1drv.ms/w/s!Ashlm-Nw-wnWt1QqRW-GkZlfg0yz
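To make the enumeration-basis idea concrete, a LINQ filter over enumerated storage entities behaves much like a SELECT ... WHERE ... ORDER BY statement; the StorageEntity type below is made up purely for this illustration.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical storage entity used only for this illustration.
public sealed class StorageEntity
{
    public string Name { get; set; }
    public long SizeInBytes { get; set; }
}

public static class EnumerationQuery
{
    // Roughly equivalent to: SELECT Name FROM entities WHERE SizeInBytes > @threshold ORDER BY Name
    public static IEnumerable<string> LargeEntityNames(IEnumerable<StorageEntity> entities, long threshold)
    {
        return entities
            .Where(e => e.SizeInBytes > threshold)
            .OrderBy(e => e.Name)
            .Select(e => e.Name);
    }
}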

152) Storage product code is usually a large code base. Component interaction diagrams can be generated, and reorganizations may be done to improve the code. The practices associated with large code bases apply to storage server code as well.


#codingexercise
int getCountEven(node* root)
{
    if (root == NULL) return 0;
    int count = 0;
    while (root) {
        if (root->value % 2 == 0) {
            count++;
        }
        root = root->next;
    }
    return count;
}

Saturday, December 8, 2018

Today we continue discussing the best practices from storage engineering:

146) Footprint: The code for a storage server can run on any device. Java, for example, runs on billions of devices, and a storage server written in Java can run even on pocket devices. If the storage is flash and the entire storage server runs only on the flash, it makes a great candidate for usability.

147) Editions: Just like the code for a storage server can be made suitable for different devices, it can ship as different editions. One way to determine the different editions is to base them on where customers demand them. Although many perspectives weigh into these decisions, the product ultimately serves the customer.


148) Standalone mode: Most storage products offer capabilities in a standalone mode. This makes it easy to try out the product without dependencies. One Box deployments also work in this manner. When we remove the dependency on the hardware, we make the product easier to study and its features easier to try out.


149) End User License Agreement:  Products have EULAs which the user is required to accept. If the EULA conditions are not read or accepted, certain functionalities of the server are turned off. Typically, this check happens at the initialization of the product or the restart of the service. However, the initialization code is global and has a very sensitive sequence of steps. Therefore, the EULA check must be placed so that it does not leave the product in a bad state.

150) Provisioning: Storage artifacts are not just created by the user. Some may be offered as built-ins so that the user can start working with their data rather than handling the overhead of setting up containers and describing them. These ready-to-use built-in storage artifacts let the user focus on their tasks rather than the system tasks. A judicious choice of such artifacts and routines is therefore useful to the user.

#codingexercise
int getCountOdd(node* root)
{
    if (root == NULL) return 0;
    int count = 0;
    while (root) {
        if (root->value % 2 == 1) {
            count++;
        }
        root = root->next;
    }
    return count;
}

Friday, December 7, 2018

Today we continue discussing the best practices from storage engineering:

143) One of the most fault-prone areas of the software is heap memory usage, especially under the Java virtual machine. It requires a lot of effort to investigate and narrow down. Often the remedial step taken is to increase the maximum heap size for the process, all the way up to 4GB. Since leaks have their own stack trace when they occur deterministically, finding the root cause involves trials.

144) Among the various process-level statistics, one of the most reviewed is memory usage. Since functionality within the storage server can be broken down into processes for microservices, we are able to narrow down on an individual process. Since processes can be restarted, their restarts are a good indication of malfunction.
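A small sketch of sampling per-process memory with the .NET Process API is shown below; the threshold is a placeholder, and a real deployment would sample each microservice process rather than only the current one.

using System;
using System.Diagnostics;

public static class ProcessMemoryMonitor
{
    // Samples the working set of the current process and flags it against a placeholder threshold.
    public static void ReportWorkingSet(long thresholdBytes)
    {
        using (var process = Process.GetCurrentProcess())
        {
            process.Refresh();
            long workingSet = process.WorkingSet64;
            Console.WriteLine($"{process.ProcessName}: {workingSet} bytes");
            if (workingSet > thresholdBytes)
            {
                Console.WriteLine("Working set exceeds threshold; candidate for investigation.");
            }
        }
    }
}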

145) Distributed ledger: this is gaining popularity where there is no central ledger and no requirement for central ownership to verify grants and revocations. It mitigates tampering. It is a great store for storing and processing digital identity data and works for products that do not need to belong to an organization or a cloud.

146) Footprint: The code for a storage server can run on any device. Java, for example, runs on billions of devices, and a storage server written in Java can run even on pocket devices. If the storage is flash and the entire storage server runs only on the flash, it makes a great candidate for usability.

147) Editions: Just like the code for a storage server can be made suitable for different devices, it can ship as different editions. One way to determine the different editions is to base them on where customers demand them. Although many perspectives weigh into these decisions, the product ultimately serves the customer.

#codingexercise
In the segregated list discussed earlier, find the count of nodes swapped:
int getCountSwapped(node* root) {
    // getCount refers to the helper from the earlier segregated-list discussion
    return getCount(root) / 2;
}

Thursday, December 6, 2018

Today we're discussing the best practices from storage engineering:
140) The topology for data transfers has been changing together with the technology stack. Previously, even master data or the product catalog of a company was a singleton; today there is a practice of rebuilding it constantly. The data is also allowed to remain stagnant, as with data lakes, and is generally hosted in the cloud. On-premise servers and SANs are being replaced in favor of cloud technologies wherever possible. Therefore, toolsets and operations differ widely, and conformance to ETL semantics for data transfers from the product will generally be preferred by the audience.
141) Most storage products work best when they are warmed up. A storage product may use its own initialization sequence and internal activities to reach this stage. Since the initialization is not done often, it is a convenient time to put all the activities together so that the subsequent operations are efficient. This has been true at all levels starting from class design to product design.
142) The graceful shutdown of products and recovery from bad states has been equally important for most products, even if they are not required to maintain strong guarantees. This is particularly the case with those that are susceptible to a large number of faults and failures. For example, faults may range from disk and node errors to power failures, network issues, bit flips, and random hardware failures. Reed-Solomon erasure coding or Pelican coding tries to overcome such faults by computing m error-correction chunks from n data chunks.
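Real Reed-Solomon or Pelican coding involves Galois-field arithmetic and is far more capable; the toy sketch below only covers the simplest case of m = 1, where a single XOR parity chunk allows any one lost data chunk to be rebuilt from the survivors.

using System.Collections.Generic;

public static class SingleParity
{
    // Compute one parity chunk as the byte-wise XOR of n equally sized data chunks.
    public static byte[] ComputeParity(IReadOnlyList<byte[]> dataChunks)
    {
        var parity = new byte[dataChunks[0].Length];
        foreach (var chunk in dataChunks)
            for (int i = 0; i < parity.Length; i++)
                parity[i] ^= chunk[i];
        return parity;
    }

    // Rebuild a single missing chunk by XOR-ing the parity with the surviving data chunks.
    public static byte[] RecoverMissing(IReadOnlyList<byte[]> survivingChunks, byte[] parity)
    {
        var recovered = (byte[])parity.Clone();
        foreach (var chunk in survivingChunks)
            for (int i = 0; i < recovered.Length; i++)
                recovered[i] ^= chunk[i];
        return recovered;
    }
}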

143) One of the most fault-prone areas of the software is heap memory usage, especially under the Java virtual machine. It requires a lot of effort to investigate and narrow down. Often the remedial step taken is to increase the maximum heap size for the process, all the way up to 4GB. Since leaks have their own stack trace when they occur deterministically, finding the root cause involves trials.

Wednesday, December 5, 2018

Today we continue discussing the best practices from storage engineering:
136) Data transfer does not always have to be to and from the software product. Transfer within the product, using its organizational hierarchy, could also be supported. For example, object storage provides the convenience of copying a bucket of objects at a time even if the objects have any folder-path-like prefix.
137) One of the overlooked facts about data transfer is that it is often done by production support personnel because of the sensitivity of the data involved. They prefer safe options to complicated, albeit more efficient, operations. If the data transfer can be done with the help of a tool and a shell script, it works very well for such transfers. Consequently, there must be a handoff between the developer and production support, and the interface must be easy to use from the production support side.
138) The administrative chores around production data also increase significantly as compared to the datasets that the product is built and tested on. There is absolutely no room for data corruption or unplanned outages. If the data transfer tool itself is defective, it cannot be handed over to production. Consequently, the data transfer tools must be proven and preferably part of the product so that merely the connections can be set up.
139) Data transfers that involve read-only operations on the production data are favored a lot more than those that write to it. This reason, together with the above, constitutes the general shift towards using Extract-Transform-Load packages with production data instead of writing and maintaining custom code for such transfers.
140) The topology for data transfers has been changing together with the technology stack. Previously, even master data or the product catalog of a company was a singleton; today there is a practice of rebuilding it constantly. The data is also allowed to remain stagnant, as with data lakes, and is generally hosted in the cloud. On-premise servers and SANs are being replaced in favor of cloud technologies wherever possible. Therefore, toolsets and operations differ widely, and conformance to ETL semantics for data transfers from the product will generally be preferred by the audience.

Tuesday, December 4, 2018

Today we continue discussing the best practices from storage engineering:

134) Data migration wizards help move data as collections of storage artifacts. With a judicious choice of namespace hierarchy and organizational units, users are relieved from the syntax of storage artifacts and their operations, and can view their data as either in transit or at rest.

135) Extract-Transform-Load operations are very helpful when transferring data between data stores. In such cases, it is best to author them via a user interface with the help of designer-like tools. Logic for these operations may even be customized and saved as modules.
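One way such customized logic can be kept as modules is to treat each transform as a reusable function that the designer tool composes into a pipeline; the record shape below (a string-keyed dictionary) and the pipeline itself are invented purely for illustration.

using System;
using System.Collections.Generic;

public static class EtlPipeline
{
    // Runs extract -> transform modules (in order) -> load for each record.
    public static void Run(
        IEnumerable<Dictionary<string, string>> extract,
        IList<Func<Dictionary<string, string>, Dictionary<string, string>>> transformModules,
        Action<Dictionary<string, string>> load)
    {
        foreach (var record in extract)
        {
            var transformed = record;
            foreach (var module in transformModules)
                transformed = module(transformed);
            load(transformed);   // write the transformed record to the destination store
        }
    }
}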

136) Data transfer does not always have to be to and from the software product. Transfer within the product, using its organizational hierarchy, could also be supported. For example, object storage provides the convenience of copying a bucket of objects at a time even if the objects have any folder-path-like prefix.
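A rough sketch of such an in-product transfer appears below: every object whose key shares a source prefix is copied under a destination prefix, preserving the folder-path-like remainder of the key. The in-memory ObjectStore here is hypothetical and stands in for a real object storage API.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory object store used only to illustrate a prefix-wide copy.
public sealed class ObjectStore
{
    private readonly Dictionary<string, byte[]> objects = new Dictionary<string, byte[]>();

    public void Put(string key, byte[] data) { objects[key] = data; }

    // Copy every object whose key starts with sourcePrefix to the destination prefix.
    public void CopyPrefix(string sourcePrefix, string destinationPrefix)
    {
        var matching = objects.Keys
            .Where(k => k.StartsWith(sourcePrefix, StringComparison.Ordinal))
            .ToList();
        foreach (var key in matching)
        {
            var destinationKey = destinationPrefix + key.Substring(sourcePrefix.Length);
            objects[destinationKey] = objects[key];
        }
    }
}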







#codingexercise
Segregate odd and even nodes in a linked list

node* cut(node* next, node* current);

node* segregate(node* root) {
    if (root == NULL) return root;
    if (root->next == NULL) return root;
    node* prev = NULL;
    node* current = root;
    node* next = current->next;
    node* head = root;
    node* tail = head;

    // find the original tail; nodes moved past it are not revisited
    while (tail->next) tail = tail->next;
    node* mid = tail;

    // detach every alternate node and append it to the end of the list
    // until the original tail is reached
    while (next && next->next &&
           prev != mid && current != mid && next != mid) {
        prev = current;
        current = cut(next, current);
        prev->next = current;

        tail->next = next;
        tail = next;
        tail->next = NULL;
        next = current->next;
    }

    return root;
}

// unlink the node 'next' that follows 'current' and return the node after it
node* cut(node* next, node* current) {
    current->next = next->next;
    next->next = NULL;
    current = current->next;
    return current;
}

Monday, December 3, 2018

Today we continue discussing the best practices from storage engineering:
131) Virtual Timeline: Most entries captured in the log are based on the actual timeline. With messages passed between distributed components of the storage, there is a provision for using sequence numbers as a virtual timeline. Together, boundaries and the virtual timeline enable spatial and temporal capture of the changes to the data, suitable for watching. Recording the virtual time with the help of a counter and using it for comparisons is one thing; reconstructing the event sequence is another (a minimal counter sketch follows this list).
132) Tags: Storage artifacts should support tags so that users can form collections based on them. With the help of tags, we can specify access control policies, set up cross-region replication, and manage object life-cycles.
133) Classes: Classes are service levels of storage. Object storage, for instance, supports two or more storage classes. A class named standard is used for objects that require frequent access to data. A class named infrequent may be used for objects with less frequently accessed data. Objects whose access patterns change over time may be placed in yet another class.
134) Data migration wizards help move data as collections of storage artifacts. With a judicious choice of namespace hierarchy and organizational units, users are relieved from the syntax of storage artifacts and their operations, and can view their data as either in transit or at rest.
135) Extract-Transform-Load operations are very helpful when transferring data between data stores. In such cases, it is best to author them via a user interface with the help of designer-like tools. Logic for these operations may even be customized and saved as modules.
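Returning to item 131, a minimal sketch of maintaining a virtual timeline with a Lamport-style counter is given below; the method names are invented for the example, and a real system would persist the counter and carry it on every message.

using System;

// A Lamport-style logical clock: each component stamps events with its counter and
// advances past any larger counter it observes, so changes can be ordered without wall clocks.
public sealed class VirtualTimeline
{
    private readonly object gate = new object();
    private long counter;

    // Stamp a local event, such as a log entry or an outgoing message.
    public long NextLocal()
    {
        lock (gate) { return ++counter; }
    }

    // Merge the sequence number carried by an incoming message.
    public long ObserveRemote(long remoteSequence)
    {
        lock (gate)
        {
            counter = Math.Max(counter, remoteSequence) + 1;
            return counter;
        }
    }
}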

#codingexercise


List<int> SegregateBasedOddAndEvenValueOfElements(List<int> A)
{
    List<int> result = new List<int>();
    // collect the even values first
    A.ForEach(x => { if (x % 2 == 0) result.Add(x); });
    // then append the odd values, preserving duplicates and relative order
    var odd = A.Where(x => x % 2 != 0).ToList();
    result.AddRange(odd);
    return result;
}