Design considerations for backup of large files from vCenter:
Files in vCenter can be upwards of a hundred gigabytes, and several files may need to be downloaded before they are archived with tools like rsync or bbcp. When both the number of files downloaded simultaneously and the size of each file are large, even multiple streams may not be enough to collapse the transfer time. Moreover, the local storage needed for repackaging these files can grow arbitrarily large.
The optimal solution would be to read the source as a stream and write to the destination with the same stream. However, since repackaging is involved, a local copy is unavoidable, and the copy-and-transfer workflow can only be made tractable with the following techniques:
1. Keep the local file only until it has been repackaged and transferred to the destination.
2. Compress and archive the packaged file before transfer.
3. Maintain a database with mappings between source and destination files, their metadata, and a status field for retries.
4. Parallelize on individual files rather than on folders; this gives finer granularity.
5. Use a task parallel library such as Celery along with a message broker, with one task per file transfer (see the sketch after this list).
6. Tools like duplicity may require either the source or the destination to be local, which means they must be invoked twice when a local copy is used as an intermediate. If repackaging is not permissible at the source, it can be attempted at the destination instead; this works well for remote file storage.
7. The storage for local files must be large enough to support n active downloads of an earmarked size.
8. There must be a policy that prevents more than a certain number of active downloads; this can be enforced with the bookkeeping status in the database.
9. Instead of using transactions, it is helpful to track states and retries for such transfers.
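To make points 3, 5, 8 and 9 concrete, here is a minimal sketch (not an actual implementation) of a per-file Celery task that records its state in a SQLite bookkeeping table and retries on failure. The broker URL, the table layout and the download/repackage/upload helpers are assumptions; the real steps would call the vCenter APIs and tools such as rsync or bbcp.

import os
import sqlite3

from celery import Celery

# assumption: a Redis broker; any Celery-supported broker would do
app = Celery("vcenter_backup", broker="redis://localhost:6379/0")

DB = "transfers.db"  # bookkeeping: source/destination mapping and transfer status


def download_from_vcenter(source):
    """Placeholder: stream the file from vCenter to local scratch storage."""
    return os.path.join("/tmp", os.path.basename(source))


def repackage(local_path):
    """Placeholder: compress and archive the downloaded file (point 2)."""
    return local_path + ".tar.gz"


def upload_to_destination(packaged, destination):
    """Placeholder: push the packaged file with rsync, bbcp or similar."""


def set_status(source, destination, status):
    # record the current state so stalled or failed files can be retried later
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS transfers "
            "(source TEXT PRIMARY KEY, destination TEXT, status TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO transfers (source, destination, status) "
            "VALUES (?, ?, ?)",
            (source, destination, status),
        )


@app.task(bind=True, max_retries=3, default_retry_delay=300)
def transfer_file(self, source, destination):
    """One task per file gives per-file granularity (point 4) and lets the
    broker fan the work out across workers."""
    local_path = packaged = None
    try:
        set_status(source, destination, "downloading")
        local_path = download_from_vcenter(source)
        set_status(source, destination, "repackaging")
        packaged = repackage(local_path)
        set_status(source, destination, "uploading")
        upload_to_destination(packaged, destination)
        set_status(source, destination, "done")
    except Exception as exc:
        set_status(source, destination, "failed")
        # states plus retries rather than a transaction (point 9)
        raise self.retry(exc=exc)
    finally:
        # keep the local copy only as long as it is needed (point 1)
        for path in (local_path, packaged):
            if path and os.path.exists(path):
                os.remove(path)

Worker concurrency caps the number of simultaneously active downloads (point 8), for example by starting the worker with something like celery -A <module> worker --concurrency=4, and the status column can be queried before enqueueing more files.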
Overall, the local storage option is expensive. When it is unavoidable, the speed of the transfer, the number of active transfers, the ease of parallelization, and the robustness against failures together with retry logic address these pain points.
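For the cases where no repackaging is needed, the stream-to-stream approach mentioned at the start avoids the local footprint altogether. A minimal sketch, assuming the source and destination are already exposed as binary file-like objects (the open_source and open_destination handles in the usage comment are hypothetical stand-ins for whatever protocol-specific handles are actually used):

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks keep memory bounded for 100+ GB files


def stream_copy(src, dst, chunk_size=CHUNK_SIZE):
    # read from the source and write to the destination in a single pass;
    # equivalent to shutil.copyfileobj(src, dst, chunk_size)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


# usage (hypothetical handles):
#   with open_source("vm-disk.vmdk") as src, open_destination("vm-disk.vmdk") as dst:
#       stream_copy(src, dst)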
#codingexercise Shuffle a deck of cards:
Knuth shuffling:
split the cards into positions 1 to i and i+1 to n-1
pick a card from 1 to i uniformly at random (repeated i times)
replace it with a randomly chosen card from i+1 to n-1
Fisher-Yates shuffling:
loop over the array from the last element down
swap each element with a randomly chosen element at or before the current position
void Shuffle(List<int> cards)
{
    var random = new Random();
    // walk from the last card down, swapping each position with a
    // uniformly chosen position in [0, i]; Next's upper bound is exclusive
    for (int i = cards.Count - 1; i > 0; i--)
    {
        int j = random.Next(0, i + 1);
        int temp = cards[i];
        cards[i] = cards[j];
        cards[j] = temp;
    }
}