Cluster computing

Wednesday, December 5, 2018

Today we continue discussing best practices from storage engineering:

136) Data transfer does not always have to be to and from the software product. Transfer within the product, using its organizational hierarchy, could also be supported. For example, object storage provides the convenience of copying a whole bucket of objects at a time, even though the objects may have any folder-path-like prefix.
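
As a rough illustration of a transfer that stays inside the product, the sketch below copies every object under a prefix from one bucket to another. The `Store` type, the `copy_prefix` function, and the bucket and key names are all hypothetical; a plain dictionary stands in for the object store, so this is not any vendor's API.

```python
from typing import Dict

# Hypothetical in-memory stand-in for an object store:
# bucket name -> {object key -> object bytes}.
Store = Dict[str, Dict[str, bytes]]


def copy_prefix(store: Store, src_bucket: str, dst_bucket: str, prefix: str = "") -> int:
    """Copy every object whose key starts with `prefix` into dst_bucket.

    Keys are preserved, so folder-like prefixes carry over unchanged.
    An empty prefix copies the whole bucket. Returns the number of objects copied.
    """
    source = store[src_bucket]
    target = store.setdefault(dst_bucket, {})
    copied = 0
    # Snapshot the items so the loop stays safe even if both buckets are the same.
    for key, data in list(source.items()):
        if key.startswith(prefix):
            target[key] = data
            copied += 1
    return copied


# Illustrative data only.
store: Store = {
    "reports": {
        "2018/11/summary.csv": b"a,b\n1,2\n",
        "2018/12/summary.csv": b"a,b\n3,4\n",
        "2017/12/summary.csv": b"a,b\n5,6\n",
    }
}
copied = copy_prefix(store, "reports", "archive", prefix="2018/")
```

The data never leaves the store here; only the bucket that owns each key changes, which is what makes this kind of transfer convenient to expose as a single operation.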

137) One overlooked fact about data transfer is that it is often performed by production support personnel because of the sensitivity of the data involved. They prefer safe options over complicated ones, even when the latter are more efficient. A data transfer that can be done with a tool and a shell script works very well for them. Consequently, there must be a handoff between the developer and production support, and the interface must be easy to use from the production support side.

138) The administrative chores around production data also increase significantly compared to those around the datasets on which the product is built and tested. There is absolutely no room for data corruption or unplanned outages. If the data transfer tool itself is defective, it cannot be handed over to production. Consequently, data transfer tools must be proven and preferably part of the product, so that only the connections need to be set up.

139) Data transfers that involve read-only operations on production data are far more favored than those that write to it. Together, this reason and the one above drive the general shift towards using Extract-Transform-Load packages with production data instead of writing and leveraging custom code.
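
A minimal sketch of the Extract-Transform-Load shape with a read-only source might look like the following. It assumes SQLite as a stand-in for the production database; the `products` and `catalog` tables, their columns, and the `extract`, `transform`, and `load` functions are hypothetical names chosen for illustration. SQLite's `PRAGMA query_only` is used to make the source connection reject writes.

```python
import sqlite3


def extract(source: sqlite3.Connection):
    """Read rows from the source; only a SELECT is issued against it."""
    yield from source.execute(
        "SELECT id, name, price_cents FROM products ORDER BY id"
    )


def transform(rows):
    """Tidy up names and convert integer cents to a two-decimal string."""
    for product_id, name, price_cents in rows:
        yield product_id, name.strip().title(), f"{price_cents / 100:.2f}"


def load(target: sqlite3.Connection, rows) -> None:
    """Write the transformed rows to a separate target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS catalog "
        "(id INTEGER PRIMARY KEY, name TEXT, price TEXT)"
    )
    target.executemany("INSERT INTO catalog VALUES (?, ?, ?)", rows)
    target.commit()


# Stand-in for the production database (illustrative data only).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)"
)
source.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "  widget ", 1999), (2, "gadget", 250)],
)
source.commit()
# From here on, this connection refuses statements that would modify the database.
source.execute("PRAGMA query_only = ON")

target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))
loaded = target.execute("SELECT id, name, price FROM catalog ORDER BY id").fetchall()
```

Keeping the three stages as separate functions means the only code that touches production is the extract step, and that step can be reviewed for read-only behavior on its own.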

140) The topology for data transfers has been changing together with the technology stack. Previously, even the master data or the product catalog of a company was a singleton; today the practice is to rebuild it constantly. Data is also allowed to sit stagnant, as in data lakes, and is generally hosted in the cloud. On-premise servers and SANs are being replaced by cloud technologies wherever possible. Therefore, toolsets and operations differ widely, and conformance to ETL semantics for data transfers from the product will generally be preferred by their audience.
