Friday, November 30, 2018

Today we continue discussing the best practices from storage engineering:

119) When storage operations don’t go as planned, exceptions need to be raised and reported. Since exceptions bubble up from deep layers, they need to be properly wrapped and translated before they are actionable for the user. This exception handling and chaining often breaks, which leads to costly troubleshooting. Consequently, revisiting and cleaning up this code becomes a routine chore.
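
As an illustration of wrapping and chaining, here is a minimal Python sketch. The layer functions (read_block, load_object) and the StorageError type are hypothetical stand-ins for the deep and upper layers of a storage product. The upper layer translates a low-level OSError into a user-facing message and keeps the original exception attached with `raise ... from ...`, so the chain stays intact for troubleshooting.

```python
class StorageError(Exception):
    """Hypothetical user-facing error raised at the API boundary."""


def read_block(device: str, offset: int) -> bytes:
    # Lowest layer (hypothetical): simulate a device failure (errno 5 is EIO).
    raise OSError(5, "Input/output error", device)


def load_object(name: str) -> bytes:
    # Upper layer: translate the low-level error into an actionable message,
    # chaining the original so the root cause is not lost.
    try:
        return read_block("/dev/sdb", offset=4096)
    except OSError as exc:
        raise StorageError(
            f"Could not read object '{name}': the backing device reported an I/O error. "
            "Check the device health and retry."
        ) from exc


if __name__ == "__main__":
    try:
        load_object("photos/2018/cat.jpg")
    except StorageError as err:
        print(err)
        # __cause__ holds the wrapped low-level exception.
        print("caused by:", repr(err.__cause__))
```

The user sees only the translated message, while `err.__cause__` (and the chained traceback Python prints for unhandled errors) still points at the original device failure.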

120) Exceptions and alerts don’t matter to the customer unless they are worded to explain the mitigating action the user needs to take. Error code, level and severity are other useful ways to describe the error. Diligence in preparing better error messages goes a long way toward helping end users.
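
One way to make every error carry its mitigation is to give it a fixed shape. The following is a minimal sketch under that assumption: the UserError type, the Severity levels and the "STG-0042" code scheme are all hypothetical, and a single severity field stands in for both "level" and "severity".

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class UserError:
    code: str            # stable identifier users can search for (hypothetical scheme)
    severity: Severity
    message: str         # what went wrong
    mitigation: str      # what the user should do about it

    def render(self) -> str:
        return f"[{self.code}] {self.severity.name}: {self.message} Action: {self.mitigation}"


QUOTA_EXCEEDED = UserError(
    code="STG-0042",
    severity=Severity.ERROR,
    message="The bucket has reached its storage quota.",
    mitigation="Delete unused objects or request a quota increase.",
)

print(QUOTA_EXCEEDED.render())
# prints "[STG-0042] ERROR: The bucket has reached its storage quota. Action: Delete unused objects or request a quota increase."
```

Because `mitigation` is a required field, an error cannot be defined without stating what the user should do next.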

121) The ratio of background jobs to data path workers is an important one. It is easy to delegate jobs to the background to keep the data path fast. However, if there is only one data path worker and the number of background jobs is very high, efficiency drops and message passing increases. In that case it might be better to serialize the tasks on the same worker. The trade-off is even more glaring when the background workers poll or execute in scheduled cycles, because each handed-off task then waits for the next cycle, which introduces delays.
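
To make the trade-off concrete, this sketch contrasts the two options with one data path worker. The function names and the 10 ms cycle are hypothetical. `run_serialized` runs every task inline on the caller. `run_offloaded` hands each task to a single background thread through a queue (one message per task), and that thread only wakes on a fixed schedule.

```python
import queue
import threading
import time
from typing import Callable, List

Task = Callable[[], None]


def run_serialized(tasks: List[Task]) -> None:
    # All tasks run on the calling (data path) worker in submission order:
    # no queue, no hand-off, and no waiting for a background cycle.
    for task in tasks:
        task()


def run_offloaded(tasks: List[Task], cycle_seconds: float = 0.01) -> None:
    # Every task is passed to one background worker through a queue. The worker
    # wakes only on a fixed schedule, so each task may wait up to one cycle.
    q: queue.Queue = queue.Queue()

    def background() -> None:
        while True:
            time.sleep(cycle_seconds)  # scheduled cycle
            while True:
                try:
                    task = q.get_nowait()
                except queue.Empty:
                    break  # drained for this cycle; sleep until the next one
                if task is None:
                    return  # sentinel: all submitted work is done
                task()

    worker = threading.Thread(target=background)
    worker.start()
    for task in tasks:
        q.put(task)
    q.put(None)
    worker.join()


if __name__ == "__main__":
    for runner in (run_serialized, run_offloaded):
        done: List[int] = []
        tasks = [lambda i=i: done.append(i) for i in range(100)]
        start = time.perf_counter()
        runner(tasks)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{runner.__name__}: {len(done)} tasks in {elapsed_ms:.2f} ms")
```

Both runners complete the same work in the same order. The offloaded version adds one queue message per task plus at least one cycle of latency, which is the cost the paragraph above weighs against keeping the data path free.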

122) Event-based programming is harder to coordinate and diagnose than sequential programming, yet it is still favored in many storage drivers and even in user-mode components that do not need to be highly responsive or where there might be a significant delay between action triggers. This style requires a driver verifier to analyze all the code paths. Instead, synchronous execution combined with object-oriented design suffices, giving better organization and easier troubleshooting. While it is possible to mix the two, execution that follows the timeline in the logs for the activities performed by the storage product helps reduce the overall cost of maintenance.
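
A minimal sketch of the synchronous, object-oriented alternative: a hypothetical CopyObject operation whose steps are ordinary method calls, each finishing before the next begins. The in-memory ListHandler is only there to capture the log so the timeline can be inspected. With no callbacks or event queues in between, the log order is the execution order.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    # Collects formatted log lines in memory so the timeline can be inspected.
    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


log = logging.getLogger("storage.copy")
log.setLevel(logging.INFO)
handler = ListHandler()
log.addHandler(handler)


class CopyObject:
    """Hypothetical storage operation written as plain sequential steps."""

    def __init__(self, src: str, dst: str) -> None:
        self.src, self.dst = src, dst
        self.data = b""

    def run(self) -> None:
        # Each step completes before the next starts, so reading the log
        # top to bottom replays exactly what the code did.
        self.read()
        self.write()
        self.verify()

    def read(self) -> None:
        log.info("read %s", self.src)
        self.data = b"payload"  # stand-in for a real read

    def write(self) -> None:
        log.info("write %s (%d bytes)", self.dst, len(self.data))

    def verify(self) -> None:
        log.info("verify %s", self.dst)


CopyObject("bucket-a/obj", "bucket-b/obj").run()
print("\n".join(handler.lines))
# prints:
# read bucket-a/obj
# write bucket-b/obj (7 bytes)
# verify bucket-b/obj
```

When a step fails, the last log line names the step that was running, and the traceback walks the same `run` method a reader would follow in the source.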
