Thursday, September 26, 2013

In the previous post, I mentioned an archival policy for a token table. Today, during implementation, I used the following logic:
While there exist records to be moved from active to archive
BEGIN
SELECT the ids of a few candidates at a time
INSERT the candidates that have not already been inserted
DELETE these records from the active table
END
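
To make this concrete, here is a minimal T-SQL sketch of the loop. The table names (ActiveToken, ArchiveToken), the columns (Id, TokenValue, ExpiryDate), the batch size, and the archival predicate are all assumptions made only for illustration, and the check that a candidate is not actively in use is omitted because it depends on the schema.

DECLARE @BatchSize int = 10;
DECLARE @Ids TABLE (Id int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Ids;

    -- Step 1: pick the ids of a handful of candidates; the predicate is stable,
    -- so rows left unfinished by an aborted run match again on the next pass.
    INSERT INTO @Ids (Id)
    SELECT TOP (@BatchSize) Id
    FROM dbo.ActiveToken
    WHERE ExpiryDate < DATEADD(DAY, -30, GETUTCDATE());

    IF NOT EXISTS (SELECT 1 FROM @Ids)
        BREAK;  -- nothing left to move

    -- Step 2: copy only the candidates that are not already in the archive.
    INSERT INTO dbo.ArchiveToken (Id, TokenValue, ExpiryDate)
    SELECT a.Id, a.TokenValue, a.ExpiryDate
    FROM dbo.ActiveToken AS a
    JOIN @Ids AS i ON i.Id = a.Id
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ArchiveToken AS x WHERE x.Id = a.Id);

    -- Step 3: delete the candidates from the active table so they never come up again.
    DELETE a
    FROM dbo.ActiveToken AS a
    JOIN @Ids AS i ON i.Id = a.Id;
END

Note that both the INSERT and the DELETE join only against the handful of ids captured in the table variable, never the source against the destination in full.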
There are a couple of points that should be called out. For one, there is no user transaction scope involved. This is intentional. I don't want a bulk insert that can take an enormous amount of time compared to these small batches. A bulk insert is also subject to failures, and when it fails, it is not effective in moving any records at all.
Similarly, a user transaction covering all the steps is almost unnecessary when we can structure the operations so that each step checks the work of the preceding ones. Those checks help with identifying failures and taking corrective action. Moreover, the user transaction only ties the operations together, while each individual operation is itself transacted; a user transaction is typically used with larger data movement. By taking only a few records at a time, even just a handful, checking that they are not actively in use, checking that they were not already copied to the destination by a previous aborted run, and then deleting them from the source so that they do not come up again, we remove the need for a user transaction and for taking locks on a range of records. Besides, by keeping each move to a handful of records, we don't have to join the source and destination tables in full; we join only against the handful of records we are interested in, which tremendously improves query performance.
But how do we really guarantee that we are indeed moving the records without failures, and that this is done in a rolling manner until all records have been moved?
We do this by finding the records correctly.
How? We identify the set of records with the TOP keyword, so that we select only the few we are interested in. Our search criteria are based on a predicate that does not change for these records, so the same records will match the predicate again and again. We then keep track of their IDs in a table variable that has a single column holding those IDs. As long as these records are not actively in use and our logic is the only one doing the move, we effectively own them. We then take these records by their IDs and compare them against both the source and the destination. Since we check for all the undesirable cases, such as originals still left behind in the source and duplicate inserts into the destination, we know that the move is succeeding. Lastly, once the operation is done, we will not find these records again to move over, which guarantees that the process works. We just repeat the process until all the records matching the criteria have been moved.
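
As a sanity check between passes, queries along these lines (again using the assumed names from the sketch above) can confirm that none of the undesirable cases have occurred: no copied record still lingers in the active table, and no id appears more than once in the archive.

-- ids present in both tables; should be zero once a pass has completed
SELECT COUNT(*) AS StragglerCount
FROM dbo.ActiveToken AS a
JOIN dbo.ArchiveToken AS x ON x.Id = a.Id;

-- duplicate ids in the archive; should return no rows
SELECT Id
FROM dbo.ArchiveToken
GROUP BY Id
HAVING COUNT(*) > 1;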
This process is robust, clean, tolerant of failures, and resumable from any point in the run.
