Today we're discussing best practices from storage engineering:
140) The topology for data transfers has been changing together with the technology stack. Previously, even a company's master data or product catalog was a singleton; today the practice is to rebuild it constantly. Data is also allowed to remain stagnant, as with data lakes, and is generally hosted in the cloud. On-premises servers and SANs are being replaced by cloud technologies wherever possible. Since toolsets and operations differ widely, conformance to ETL semantics for data transfers out of the product will generally be preferred by its audience, as the sketch below illustrates.
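As a minimal sketch of what ETL-style transfer semantics might look like in code, consider the following Java outline; all names here (CatalogRecord, extract, transform, load) are hypothetical stand-ins for product-specific types and connectors:

import java.util.List;
import java.util.stream.Collectors;

public class CatalogExport {
    // Hypothetical record type standing in for a product catalog entry.
    record CatalogRecord(String id, String payload) {}

    // Extract: read records from the product's store (stubbed here).
    static List<CatalogRecord> extract() {
        return List.of(new CatalogRecord("sku-1", "raw"),
                       new CatalogRecord("sku-2", "raw"));
    }

    // Transform: normalize each record for the destination schema.
    static List<CatalogRecord> transform(List<CatalogRecord> in) {
        return in.stream()
                 .map(r -> new CatalogRecord(r.id(), r.payload().toUpperCase()))
                 .collect(Collectors.toList());
    }

    // Load: write to the destination, e.g., a cloud object store (stubbed).
    static void load(List<CatalogRecord> out) {
        out.forEach(r -> System.out.println("loading " + r.id()));
    }

    public static void main(String[] args) {
        load(transform(extract()));
    }
}

Keeping the three stages as separate, composable steps is what lets the same transfer run against on-premises or cloud destinations without changing the extraction logic.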
141) Most storage products work best when they are warmed up. A storage product may use its own initialization sequence and internal activities to reach this stage. Since initialization is not done often, it is a convenient time to group all such activities together so that subsequent operations are efficient. This holds at every level, from class design to product design.
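A minimal Java sketch of this pattern follows, with hypothetical names (WarmStore, preloadIndex, primeBlockCache); the point is that the expensive activities are grouped into a one-time initialization rather than paid per operation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WarmStore {
    private final Map<String, byte[]> blockCache = new ConcurrentHashMap<>();
    private volatile boolean initialized = false;

    // One-time initialization: batch the expensive warm-up activities here
    // so that subsequent reads hit warm state instead of paying per request.
    public synchronized void initialize() {
        if (initialized) return;
        preloadIndex();      // e.g., bring the on-disk index into memory
        primeBlockCache();   // e.g., fault in frequently used blocks
        initialized = true;
    }

    private void preloadIndex() { /* stub: read index pages sequentially */ }

    private void primeBlockCache() {
        // stub: populate the cache with hot blocks
        blockCache.put("superblock", new byte[4096]);
    }

    public byte[] read(String key) {
        if (!initialized) initialize();  // lazy fallback; ideally done at startup
        return blockCache.get(key);
    }
}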
142) Graceful shutdown and recovery from bad states are equally important to most products, even those that are not required to maintain strong guarantees. This is particularly true of products susceptible to a large number of faults and failures. For example, faults may range from disk and node errors to power failures, network issues, bit flips, and random hardware failures. Reed-Solomon erasure coding or Pelican coding overcome such faults by computing m error-correction chunks from n data chunks, so that the original data can be reconstructed even after some chunks are lost.
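To make the n-data/m-parity idea concrete, here is a deliberately simplified Java sketch of the degenerate m = 1 case: a single XOR parity chunk in the style of RAID-5. Real Reed-Solomon codes compute m parity chunks over a Galois field so that any m lost chunks can be recovered; this sketch only survives one loss:

public class XorParity {
    // Encode: compute one parity chunk from n equal-sized data chunks.
    static byte[] encode(byte[][] dataChunks) {
        byte[] parity = new byte[dataChunks[0].length];
        for (byte[] chunk : dataChunks)
            for (int i = 0; i < parity.length; i++)
                parity[i] ^= chunk[i];
        return parity;
    }

    // Recover a single lost data chunk by XOR-ing parity with the survivors.
    static byte[] recover(byte[][] survivingChunks, byte[] parity) {
        byte[] lost = parity.clone();
        for (byte[] chunk : survivingChunks)
            for (int i = 0; i < lost.length; i++)
                lost[i] ^= chunk[i];
        return lost;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2}, {3, 4}, {5, 6} };            // n = 3 data chunks
        byte[] parity = encode(data);                          // m = 1 parity chunk
        // Simulate losing data[1] and rebuild it from the survivors + parity.
        byte[] recovered = recover(new byte[][] { data[0], data[2] }, parity);
        System.out.println(java.util.Arrays.toString(recovered)); // prints [3, 4]
    }
}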
143) One of the most failure-prone areas of software is heap memory usage, especially in the Java virtual machine. It requires considerable effort to investigate and narrow down. Often the remedial step taken is simply to increase the maximum heap size for the process, all the way to 4 GB. Leaks only come with a telltale stack trace when they occur deterministically, so finding the root cause usually involves trial and error.
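As an illustration, the following hypothetical Java snippet shows the classic deterministic leak pattern, an unbounded static cache, together with standard HotSpot flags that capture a heap dump at the point of failure for offline analysis in a tool such as Eclipse MAT:

// Run with, for example:
//   java -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp LeakDemo
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // Entries are added but never evicted, so the heap grows until
    // an OutOfMemoryError is thrown.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]); // 1 MiB per iteration, never freed
        }
    }
}

The heap dump taken at failure shows which objects retain the most memory, which narrows the trial-and-error search considerably compared to raising -Xmx alone.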