This article focuses on rolling back changes when a data transfer results in errors.
Both Azure Data Factory and the Data Migration tool flag errors that must be corrected before a migration to Azure can proceed. Warnings and errors can surface while preparing to migrate; after correcting each error, the validations can be run again to verify that all errors are resolved.
Any artifacts left over from a previous dry run must be removed before a new one begins. Starting from a clean slate is preferable in any data migration effort, because it is difficult to sift through leftover artifacts and determine whether they are still relevant. Renaming imported data and containers to prevent conflicts at the destination is another important preparation step. Reserving namespaces and allocating containers ahead of time are essential for a smooth migration.
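A simple way to reserve a namespace is to prefix every destination container with the migration run identifier, so that dry-run artifacts from different runs never collide. The naming scheme below is a minimal sketch and an assumption, not an Azure convention; the helper names are hypothetical.

```python
# Sketch of per-run namespace reservation. The run-id prefix scheme is
# an illustrative assumption; adapt it to your own standards. Azure
# container names must be lowercase, 3-63 characters long.

def reserve_container_name(base_name: str, run_id: str) -> str:
    """Prefix a destination container name with the migration run id."""
    candidate = f"{run_id}-{base_name}".lower()
    if not (3 <= len(candidate) <= 63):
        raise ValueError(f"container name out of range: {candidate!r}")
    return candidate

def stale_containers(existing: list[str], current_run_id: str) -> list[str]:
    """Containers left by earlier runs, to delete before a new dry run."""
    return [name for name in existing
            if "-" in name and not name.startswith(f"{current_run_id}-")]
```

With this scheme, cleaning up before a new dry run reduces to listing containers and deleting everything `stale_containers` returns.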
Even with the most careful planning, errors can arise from environmental factors such as API failures, network disconnects, disk failures, and rate limits. A proper response helps ensure that the data transfer has a safe start, makes incremental progress throughout its duration, and finishes cleanly. The monitoring and alerts from the copy activity during the transfer are an important tool for guaranteeing this, just as maximizing bandwidth utilization and parallelizing copy activities are important for reducing the overall transfer duration.
A few numbers help indicate the spectrum of copy activity in terms of size and duration. A 1 GB data transfer over a 50 Mbps connection takes about 2.7 minutes, and over a 5 Gbps connection about 0.03 minutes. Organizations usually hold data on the order of terabytes or petabytes, orders of magnitude more than a gigabyte. A 1 PB transfer over 50 Mbps takes over 64.7 months, and even over 10 Gbps it takes about 0.3 months.
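The arithmetic behind these figures is straightforward. The sketch below computes best-case transfer times; real-world throughput is lower because of protocol overhead and link utilization, which is why published estimates run somewhat higher.

```python
# Back-of-the-envelope transfer-time estimate for the figures above.
# Best-case numbers: no protocol overhead or contention is modeled.

def transfer_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to move `size_gb` gigabytes over a `link_mbps` link.

    Uses decimal units (1 GB = 8,000 megabits), matching how link
    speeds are quoted.
    """
    megabits = size_gb * 8_000
    return megabits / link_mbps / 60

# 1 GB over 50 Mbps -> roughly 2.7 minutes
# 1 PB (1,000,000 GB) over 10 Gbps (10,000 Mbps) -> roughly 9.3 days
```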
Restarting the whole data transfer is impractical when the duration is on the order of days or months, so some preparation is required to make progress incremental. Fortunately, workload segregation isolates the data transfers so that they can run in parallel, and writing to different containers reduces the scope and severity of errors.
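Workload segregation can be pictured as partitioning the dataset by destination container and copying each partition as an independent task, so one failure neither blocks nor corrupts the rest. The sketch below assumes such a partitioning; `copy_partition` is a placeholder for a real copy activity, not an Azure Data Factory API.

```python
# Sketch of workload segregation: per-container partitions copied in
# parallel, with failures confined to their own partition.

from concurrent.futures import ThreadPoolExecutor

def copy_partition(container: str, blobs: list[str]) -> tuple[str, int]:
    # Placeholder for the real copy; returns (container, blobs copied).
    return container, len(blobs)

def run_segregated(partitions: dict[str, list[str]], workers: int = 4) -> dict[str, int]:
    """Copy each container's blobs in parallel; errors stay per-container."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(copy_partition, c, b): c
                   for c, b in partitions.items()}
        for future, container in futures.items():
            try:
                name, count = future.result()
                results[name] = count
            except Exception:
                results[container] = -1  # mark this partition failed, keep going
    return results
```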
Calls made to copy data are idempotent and retriable: they detect the state of the destination and make no changes if the copy completed earlier, and they find no artifacts if it did not. Errors during copying are often transient, and the logs will show that a retry succeeds. However, some copies might not proceed further, and these become visible through the metrics and alerts that are set up. The dashboard provides continuous monitoring, indicates the source of the error, and helps zero in on the activity to rectify.
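The idempotent, retriable behavior described above can be sketched as follows. The destination check, the transient-error type, and the backoff policy here are illustrative assumptions; Azure Data Factory implements these semantics internally in its copy activity.

```python
# Minimal sketch of an idempotent, retriable copy step: completed work
# is skipped, transient failures are retried with exponential backoff,
# and persistent failures are left for metrics/alerts to surface.

import time

class TransientError(Exception):
    """Stand-in for a retryable failure (throttling, network blip)."""

def copy_if_needed(blob: str, destination: set, do_copy, retries: int = 3) -> bool:
    """Copy `blob` unless the destination already has it.

    Returns True if the blob is present at the destination afterwards.
    """
    if blob in destination:          # idempotence: completed work is skipped
        return True
    for attempt in range(retries):
        try:
            do_copy(blob)
            destination.add(blob)
            return True
        except TransientError:
            time.sleep(2 ** attempt * 0.01)   # exponential backoff
    return False                     # surfaced via metrics and alerts
```

Because the function checks the destination first, rerunning an interrupted transfer only performs the copies that are still missing.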
Finally, one of the most important considerations is that logic and customizations during the copy activity must be kept to a minimum, since the data transfers span the network. When restructuring becomes part of the transfer, or additional routines add tags or metadata in flight, they introduce more failure points. If these steps are deferred to the destination until after the data transfer completes, the copy activities go smoothly.
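The deferral can be pictured as two phases: a plain copy phase that only moves bytes, and an enrichment phase that runs entirely at the destination. The function names below are hypothetical placeholders, not Azure Data Factory APIs.

```python
# Sketch of deferring enrichment: phase 1 copies bytes over the
# network with no extra logic; phase 2 tags/annotates at the
# destination, where retries are cheap and local.

def migrate(blobs: list[str], copy, tag) -> list[str]:
    """Copy first, then enrich in place at the destination."""
    copied = [b for b in blobs if copy(b)]   # network-bound: keep it simple
    for b in copied:
        tag(b)                               # local to destination, retryable
    return copied
```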