Tuesday, February 26, 2019

We continue the discussion on the ledger parser.
Checks and balances also need to be performed by some of the readers. Entries may become spurious, and there are ways a ghost entry can be introduced. There is therefore a trade-off in deciding whether the readers, the writers, or both do the cleanup work.
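As a rough sketch of the reader-side checks, assuming each entry carries an id, an amount, and a reference to the transaction that produced it (the field names here are hypothetical), a reader can flag ghost entries and duplicates without touching the ledger:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    entry_id: str
    txn_id: str      # transaction that produced this entry
    amount: float

def find_suspect_entries(entries, known_txn_ids):
    """Reader-side check: flag ghost entries (no known transaction)
    and duplicate entry ids, without modifying the ledger."""
    seen = set()
    suspects = []
    for e in entries:
        if e.txn_id not in known_txn_ids:
            suspects.append((e, "ghost: unknown transaction"))
        elif e.entry_id in seen:
            suspects.append((e, "duplicate entry id"))
        seen.add(e.entry_id)
    return suspects
```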
The trouble with cleanup is that it requires modifications to the ledger. While these can be made independently by designated agents, they do not inspire as much confidence as changes coming from the source of truth, the online writer. Since cleanup is a read-write task, it involves some form of writer either way.
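Continuing the sketch above, a designated cleanup agent is just another writer: it takes the suspects found by the reader-side check and applies compensating writes. The in-memory ledger and its void_entry call are stand-ins for whatever write path the real ledger exposes, not its actual API:

```python
class InMemoryLedger:
    """Minimal stand-in for the ledger's write path (an assumption, not the real API)."""
    def __init__(self, entries):
        self.entries = {e.entry_id: e for e in entries}
        self.voided = {}

    def void_entry(self, entry_id, reason):
        # Tombstone rather than delete, so the history stays auditable.
        self.voided[entry_id] = reason

def cleanup_agent(ledger, suspects):
    """Designated cleanup agent: applies compensating writes for suspect entries."""
    for entry, reason in suspects:
        ledger.void_entry(entry.entry_id, reason=reason)
```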
If all the writes were uniform, clean entries, the need for designated agents and their lags and delays would go away. At this end of the spectrum, the writer-only solution, we have traded away performance. At the other end, high-performance online writing, we introduce all kinds of background processors, stages, and cascading complexity in the analysis.
The savvy online writer may choose partitioned ledgers for clean, accurately differentiated entries, or a quick translation to uniform entries by piggy-backing the information on the transactions, where it can keep a trace of the entries for each transaction. The cleaner the data, the simpler the solution for downstream systems.
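A minimal sketch of the piggy-backing idea, assuming the writer controls the transaction record: the writer normalizes each entry to a uniform shape at write time and keeps the list of entry ids on the transaction itself, so downstream readers get clean data and a per-transaction trace for free. The field names and normalization rules are assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Transaction:
    txn_id: str
    entry_trace: List[str] = field(default_factory=list)  # ids of entries written for this txn

def write_entry(ledger: Dict[str, dict], txn: Transaction, raw: dict) -> None:
    """Writer-side normalization: translate a raw entry to a uniform shape
    and piggy-back its id on the transaction's trace."""
    entry = {
        "entry_id": raw["id"],
        "txn_id": txn.txn_id,
        "amount": round(float(raw.get("amount", 0)), 2),  # uniform numeric type and precision
        "kind": raw.get("kind", "unknown").lower(),       # uniform category labels
    }
    ledger[entry["entry_id"]] = entry
    txn.entry_trace.append(entry["entry_id"])
```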
The elapsed time for the analysis is the sum of its parts, so all lags and delays from background operations are included. Streamlining the analysis therefore requires the data to be clean, so that the analysis operations can be read-only transactions themselves.
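To make the accounting concrete with purely hypothetical numbers: if the analysis has to wait on background cleanup and translation, those lags add directly to the elapsed time.

```python
# Hypothetical stage timings (seconds); every background lag counts toward the total.
stage_seconds = {
    "background_cleanup_lag": 30.0,   # waiting for the designated agent to catch up
    "translation_to_uniform": 12.0,
    "read_only_analysis": 5.0,
}
total_elapsed = sum(stage_seconds.values())  # 47.0 seconds end to end
```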
If the operations themselves contribute to the delays, streamlining the overall analysis calls for persisting the results of those operations whose input does not change. The operations then need not be run each time, and their results can be reused across queries. If the stages are sequential, the overall analysis becomes streamlined.
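One way to realize this, as a sketch: cache each stage's result keyed by a fingerprint of its input, so a stage reruns only when its input actually changes. The hashing scheme and stage signature here are assumptions:

```python
import hashlib
import json

_stage_cache = {}

def run_stage(stage_name, stage_fn, input_rows):
    """Persist and reuse a stage's result for as long as its input is unchanged."""
    fingerprint = hashlib.sha256(
        json.dumps(input_rows, sort_keys=True, default=str).encode()
    ).hexdigest()
    key = (stage_name, fingerprint)
    if key not in _stage_cache:
        _stage_cache[key] = stage_fn(input_rows)   # run only when the input changed
    return _stage_cache[key]

# Sequential pipeline: each stage reads the persisted result of the previous one.
# cleaned = run_stage("clean",     clean_entries, raw_entries)
# uniform = run_stage("translate", to_uniform,    cleaned)
# report  = run_stage("analyze",   summarize,     uniform)
```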
Unfortunately, the data from users cannot be confined to a fixed size. Passing results from one stage to the next breaks down when a stage has a large amount of data to process, and sequential processing does not scale. In such cases, partial results retrieved early give downstream systems something meaningful to work with, and this is often adequate even when all the data has not yet been processed.
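A hedged sketch of the partial-results idea: instead of returning one complete batch, a stage yields results in chunks so downstream consumers can start early. The chunk size and the placeholder aggregation are assumptions:

```python
def summarize(chunk):
    # Placeholder aggregation; the real analysis would go here.
    return {"count": len(chunk), "total": sum(e.get("amount", 0) for e in chunk)}

def process_in_chunks(entries, chunk_size=1000):
    """Yield partial results as each chunk is processed, instead of
    waiting for the whole dataset before handing anything downstream."""
    chunk = []
    for entry in entries:
        chunk.append(entry)
        if len(chunk) >= chunk_size:
            yield summarize(chunk)   # partial result for this chunk
            chunk = []
    if chunk:
        yield summarize(chunk)

# A downstream consumer can act on each partial summary as it arrives:
# for partial in process_in_chunks(all_entries):
#     update_dashboard(partial)
```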
