In the context of making backups, we typically have a long list of simple steps that, when executed, bring the backup in sync with the source. The order of steps is important, because some steps may depend on others. For example, we can't start copying a file until its parent folder is created, nor can we delete a folder until all of its contents are removed.
Fortunately, if a backup program implements a backup planner
all this is taken care of transparently. Two file trees go in -
one for the source, another for the backup - and a list of backup
steps comes out, in the right order and with all dependencies specified.
Still, the first detail not to be overlooked is that parallel execution needs to observe and respect these inter-step dependencies.
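To make this concrete, here is a minimal sketch of what dependency-aware parallel execution can look like, assuming the planner hands over steps with explicit "depends on" links. The Step type, the worker-pool layout and the step names are illustrative assumptions, not Bvckup 2's actual internals.

```go
package main

import (
	"fmt"
	"sync"
)

type Step struct {
	Name       string
	DependsOn  []*Step // steps that must finish before this one starts
	dependents []*Step // reverse links, filled in by Execute
	pending    int     // number of unfinished dependencies
}

// Execute runs the steps on a pool of workers, starting each step only
// after everything it depends on has completed. Assumes the planner
// produced an acyclic graph.
func Execute(steps []*Step, workers int) {
	// Wire up reverse links and count unfinished dependencies.
	for _, s := range steps {
		s.pending = len(s.DependsOn)
		for _, d := range s.DependsOn {
			d.dependents = append(d.dependents, s)
		}
	}

	// Buffered to the total step count, so handing a freed-up step
	// to the pool never blocks a worker.
	ready := make(chan *Step, len(steps))
	var mu sync.Mutex
	var wg sync.WaitGroup
	wg.Add(len(steps))

	// Seed the queue with steps that have nothing to wait for.
	for _, s := range steps {
		if s.pending == 0 {
			ready <- s
		}
	}

	for i := 0; i < workers; i++ {
		go func() {
			for s := range ready {
				fmt.Println("running:", s.Name) // the actual mkdir/copy/delete goes here

				// Release steps that were waiting on this one.
				mu.Lock()
				for _, d := range s.dependents {
					d.pending--
					if d.pending == 0 {
						ready <- d
					}
				}
				mu.Unlock()
				wg.Done()
			}
		}()
	}

	wg.Wait()    // all steps have run
	close(ready) // let the workers exit
}

func main() {
	mkdir := &Step{Name: "create backup/photos"}
	copyA := &Step{Name: "copy photos/a.jpg", DependsOn: []*Step{mkdir}}
	copyB := &Step{Name: "copy photos/b.jpg", DependsOn: []*Step{mkdir}}
	Execute([]*Step{mkdir, copyA, copyB}, 4)
}
```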
2. Thread count
Second is the question of how many threads to use. More is not necessarily better. For local operations there should be as many threads as there are CPU cores. Spawning more threads will only cause them to compete for CPU time, resulting in slower overall execution. Spawning fewer threads will usually leave some of that CPU capacity unused. For remote operations the optimal count is less clear-cut, but the local CPU count is also a good starting point. For links and networks with high latency, increasing the thread count may improve throughput, but for a regular LAN the effect is likely to be small, if any.
Conversely, there's little point in reducing the thread count, because spawning a few threads too many on our end only makes the request queue on the remote end a bit longer, and that is (usually) not harmful to performance.
That is, the rule of thumb for remote backups is to start with
the local CPU count and test larger counts when fine-tuning.
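For illustration, a thread-count heuristic along these lines might look as follows. The remote flag and the tunedOverride parameter are assumptions made for the example, not actual program settings.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount starts from the local CPU count and only goes above it
// when explicitly asked to, e.g. after fine-tuning a high-latency link.
func workerCount(remote bool, tunedOverride int) int {
	n := runtime.NumCPU()
	if remote && tunedOverride > n {
		return tunedOverride
	}
	return n
}

func main() {
	fmt.Println("local :", workerCount(false, 0))
	fmt.Println("remote:", workerCount(true, 16)) // e.g. a value found by testing
}
```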
3. File copying
The third gotcha has to do with file copying.
When copying smaller files, the time needed for opening and closing them is comparable to that spent on reading and writing them. If we are to copy a stream of small files, we will be looking at a lot of stalling.

=> We do want to be copying more than one small file at a time.
But remember how we can saturate the IO pipeline by copying a single large file? If we are to throw another large copy in the mix, it will cause the transfers to compete for bandwidth and slow each other down.

=> We do not want to copy more than one large file at a time.
The solution lies right on the surface - we limit file copying to a single large file at a time, where a file is considered large if it needs more than X requests to be read in full. Bvckup 2 uses a threshold of 32 requests, which may seem a bit high, but it has proved to work well in practice.
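One way such a gate could be wired up is sketched below. The 32-request threshold comes from the text; the 4 MB request size and the function names are assumptions made for illustration.

```go
package main

import "sync"

const (
	requestSize    = 4 << 20          // assumed size of one read request (4 MB)
	largeThreshold = 32 * requestSize // "large" = needs more than 32 requests to read in full
)

// largeCopyGate ensures that at most one large transfer is in flight.
var largeCopyGate sync.Mutex

func copyFile(path string, size int64) {
	if size > largeThreshold {
		// Large files queue up behind one another...
		largeCopyGate.Lock()
		defer largeCopyGate.Unlock()
	}
	// ...while small files proceed in parallel with everything else.
	doCopy(path, size)
}

// doCopy stands in for the actual open / read-write loop / close sequence.
func doCopy(path string, size int64) {}

func main() {
	files := map[string]int64{
		"notes.txt": 10 << 10, // 10 KB - small, copied concurrently
		"video.mkv": 2 << 30,  // 2 GB  - large, serialized by the gate
		"image.iso": 3 << 30,  // 3 GB  - large, serialized by the gate
	}
	var wg sync.WaitGroup
	for path, size := range files {
		wg.Add(1)
		go func(p string, s int64) {
			defer wg.Done()
			copyFile(p, s)
		}(path, size)
	}
	wg.Wait()
}
```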
4. Error handling
Retrying on transient errors in the presence of parallel execution
is another pitfall.
When there are X requests pending and one of them fails with, say, "network unreachable", what's the right thing to do?
On one hand, the program may just retry each failed request after a pause. However, if the network remains down, this will pollute the backup log with redundant failures and generally create a lot of fuss where none is needed. So instead the program needs to try to exit the retry-pause state gingerly, by retrying a single step first and following up with the rest if all goes well.
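A minimal sketch of this "retry one first, then the rest" approach might look as follows. The pause length and the names (retryFailed, retryPause) are illustrative, and a real implementation would also need to distinguish transient errors from permanent ones.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNetDown = errors.New("network unreachable")

// Shortened for the demo; real pauses between retries would be longer.
const retryPause = 2 * time.Second

// retryFailed probes with a single step after each pause before unleashing
// the rest, so a still-down network produces one log line instead of dozens.
func retryFailed(failed []func() error) {
	for len(failed) > 0 {
		time.Sleep(retryPause)

		// Canary: retry just the first failed step.
		if err := failed[0](); err != nil {
			fmt.Println("still failing:", err) // a single entry in the log
			continue
		}
		failed = failed[1:]

		// The canary went through - follow up with the remaining steps.
		var stillFailing []func() error
		for _, step := range failed {
			if err := step(); err != nil {
				stillFailing = append(stillFailing, step)
			}
		}
		failed = stillFailing
	}
}

func main() {
	tries := 0
	step := func() error {
		tries++
		if tries < 3 {
			return errNetDown // network still down on the first two attempts
		}
		return nil
	}
	retryFailed([]func() error{step, step})
}
```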
But then there are also edge cases. Sometimes a network hiccup will cause only some requests to fail right away. More will fail in a minute, a few more in 10 minutes, and some will actually manage to complete, but take an hour to do so.
As a result, the retry logic ends up containing a lot more complexity than may initially seem necessary. Caveat emptor.
5. Memory utilization
Using extra threads invariably means higher memory usage. Usually the per-thread overhead is not very big unless a thread needs to copy a file. Copying involves allocating multiple IO buffers, and with modern drives these often need to be large to maximize performance.
It's not atypical for the copying module to use 32 MB in buffers per file transfer. Multiplied, say, by 16 cores, that's half a gigabyte.
However, since we do not copy large files in parallel, we kill two birds with one stone and get a free pass here as well - at any given time only one transfer needs the full-sized buffers, so the memory usage stays well in check.
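As a back-of-the-envelope check of these numbers - the 32 MB per-transfer figure is from above, the 16-worker count is just an example:

```go
package main

import "fmt"

func main() {
	const buffersPerTransfer = 32 << 20 // ~32 MB of IO buffers per file copy
	const workers = 16

	// Naive worst case: every worker copying a large file at once.
	fmt.Println("unconstrained:", workers*buffersPerTransfer>>20, "MB") // 512 MB

	// With the one-large-copy-at-a-time rule only a single transfer
	// ever needs the full-sized buffers.
	fmt.Println("with the gate: ", buffersPerTransfer>>20, "MB") // 32 MB
}
```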