The ultra copier is the internal name of a brand new bulk and delta copying module, shipping with
A complete rewrite, it builds on what we learned in the past 4 years and implements significant improvements in several key areas.
very fast drives
scalable delta copying
First-class support for
resuming of copying
What follows is a tour of the first two items. The other two will be covered in a separate post.
Copying lots of small files quickly is a challenge.
The per-file overhead of
work is often comparable to the time needed for the actual
. This fixed cost is split between the program's own overhead and the time spent actually opening and closing files, copying metadata, etc.
drives this was not an area worthy of optimization, because the cost of merely
a file dwarfed that of any prep work that the app itself was doing.
drives and faster machines it's no longer the case. All that trivial activity like allocating buffers, writing to the log, pre-configuring the IO - it all suddenly adds up and starts to matter.
For this reason the ultra copier now aggressively pre-allocates, caches, recycles and otherwise streamlines the prep/post phases to keep its per-file overhead to an absolute minimum.
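To make the "pre-allocate and recycle" idea concrete, here is a minimal sketch of a buffer pool. It is not the ultra copier's actual code. `BufferPool` and `copy_many` are made-up names, and the "files" are plain in-memory byte strings standing in for real IO.

```python
from collections import deque


class BufferPool:
    """Hands out fixed-size buffers and takes them back for reuse."""

    def __init__(self, buffer_size, prealloc):
        self.buffer_size = buffer_size
        self.allocations = 0              # how many buffers were ever created
        self._free = deque()
        for _ in range(prealloc):         # pay the allocation cost once, up front
            self._free.append(self._new_buffer())

    def _new_buffer(self):
        self.allocations += 1
        return bytearray(self.buffer_size)

    def acquire(self):
        # Reuse a recycled buffer when one is available, allocate only as a fallback.
        return self._free.pop() if self._free else self._new_buffer()

    def release(self, buf):
        self._free.append(buf)


def copy_many(files, pool):
    """Copy each in-memory 'file' through a pooled buffer (hypothetical stand-in for real IO)."""
    out = {}
    for name, data in files.items():
        buf = pool.acquire()
        try:
            chunks = []
            for off in range(0, len(data), pool.buffer_size):
                n = min(pool.buffer_size, len(data) - off)
                buf[:n] = data[off:off + n]      # "read" into the buffer
                chunks.append(bytes(buf[:n]))    # "write" from the buffer
            out[name] = b"".join(chunks)
        finally:
            pool.release(buf)                    # recycle instead of discarding
    return out
```

However many files go through `copy_many`, the pool never allocates more than it did at startup, and that is the whole point: the per-file cost of setting up a buffer drops to a pop and a push.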
The effect of this obviously varies, but it
be as eye-popping as a
, for example, when cloning
Bvckup 2 has been using
multi-buffer async IO
from its very first release.
The core of the technique is that the program doesn't wait for read/write requests to complete, but rather just queues them with Windows and checks later whether they are done.
That last bit -
check if it's done
- is where the new code does things differently.
Ultra's IO pipeline is built around
IO completion ports
(IOCP) which it uses to track,
, the completion of IO requests.
The program issues read/write requests as before, but it also asks Windows to queue a "done" notification once a request is completed.
is called an "IO completion...
", which tends to muddy the waters somewhat, but it is one of the more elegant and useful mechanisms of the Windows kernel.
The key point of IOCP is in the last line:
With IOCP we no longer need to drag the full list of pending requests across the userspace boundary just to learn which of them have completed.
This makes IOCP really quite
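The difference between the two styles can be shown with a toy model. This is only an analogy and not Windows code: a `queue.Queue` plays the role of the completion port, and `Request`, `complete`, `poll_for_done` and `wait_on_port` are hypothetical names invented for the sketch.

```python
import queue


class Request:
    def __init__(self, req_id):
        self.req_id = req_id
        self.done = False


def complete(req, port=None):
    """Stand-in for the OS finishing a request."""
    req.done = True
    if port is not None:
        port.put(req)              # the "done" notification


def poll_for_done(pending):
    """Polling style: walk every pending request and ask if it has finished."""
    checks = 0
    finished = []
    for req in pending:
        checks += 1
        if req.done:
            finished.append(req)
    return finished, checks


def wait_on_port(port):
    """Completion-port style: block until one finished request is handed over."""
    return port.get()
```

With 100 requests in flight and one of them finished, `poll_for_done` has to look at all 100 to find it, while `wait_on_port` simply returns the finished one. The cost of the second approach doesn't grow with the number of pending requests.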
But wait, there's more.
IOCP can also accommodate async
The ultra copier makes full use of this when delta-copying a file. It feeds a stream of
requests to a pool of worker threads and then receives their completion events via the same port that it uses for IO requests.
This allows for uniform handling of
async operations in the IO code. Reading, hashing and writing now become equal parts of the IO pipeline, leading to simpler code.
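Here is a rough sketch of what "uniform handling" can look like, again as a portable analogy and not the actual implementation. One `queue.Queue` stands in for the port, a `ThreadPoolExecutor` stands in for the worker pool, and reads and writes are simulated. The event names, the `run_pipeline` function and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import queue
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(blocks):
    port = queue.Queue()               # one completion queue for every kind of event
    digests = {}
    written = {}

    with ThreadPoolExecutor(max_workers=4) as workers:
        def submit(kind, idx, fn):
            # Run fn asynchronously and post its result to the port when it is done.
            fut = workers.submit(fn)
            fut.add_done_callback(lambda f: port.put((kind, idx, f.result())))

        for idx, data in enumerate(blocks):
            submit("read", idx, lambda d=data: d)          # simulated read

        remaining = len(blocks)
        while remaining:
            kind, idx, payload = port.get()                # the only wait point
            if kind == "read":
                submit("hash", idx,
                       lambda d=payload: (d, hashlib.sha256(d).hexdigest()))
            elif kind == "hash":
                data, digest = payload
                digests[idx] = digest
                submit("write", idx, lambda d=data: d)     # simulated write
            else:                                          # "write" completed
                written[idx] = payload
                remaining -= 1
    return digests, written
```

The loop has a single place where it waits and a single switch on the event type. Adding another async stage means adding another branch, not another waiting mechanism.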
when async IO requests may complete synchronously. In fact, Microsoft also says that your code should be prepared for this to happen at all times.
We care about synchronous completion, because it makes the pipeline stutter, so it's not good for performance.
When this happens, Windows still queues an IOCP notification, so the simplest thing to do is to ignore
request completes and just wait for an IOCP ping.
This however adds a small delay to the IO flow, because we end up completing a request later than we could've.
You probably see where this is headed and you are correct - the ultra copier suppresses IOCP pings for sync reads/writes and processes them immediately.
If you ever wondered what
is for - here you go, now you know :)
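The two ways of dealing with synchronous completion can be compared in a small simulation. Nothing here is real Windows API. `FakeDevice`, the `skip_port_on_sync` switch and the 4096-byte "completes synchronously" threshold are all invented to illustrate the control flow.

```python
import queue


class FakeDevice:
    """Simulated device: small reads complete synchronously, larger ones later."""

    def __init__(self, port, skip_port_on_sync):
        self.port = port
        self.skip_port_on_sync = skip_port_on_sync
        self._deferred = []

    def read_async(self, req_id, size):
        if size <= 4096:                      # arbitrary threshold for the demo
            if not self.skip_port_on_sync:
                self.port.put(req_id)         # default: a notification is queued anyway
            return True                       # completed synchronously
        self._deferred.append(req_id)
        return False                          # still pending

    def finish_pending(self):
        for req_id in self._deferred:
            self.port.put(req_id)
        self._deferred.clear()


def copy(sizes, skip_port_on_sync):
    port = queue.Queue()
    dev = FakeDevice(port, skip_port_on_sync)
    order = []                                # (req_id, how it was handled)
    pending = 0
    for req_id, size in enumerate(sizes):
        done_now = dev.read_async(req_id, size)
        if done_now and skip_port_on_sync:
            order.append((req_id, "inline"))  # handle right away, no round trip
        else:
            pending += 1                      # wait for the port to tell us
    dev.finish_pending()
    while pending:
        order.append((port.get(), "port"))
        pending -= 1
    return order
```

With `skip_port_on_sync=False` every request, including the ones that finished on the spot, is only handled once its notification is dequeued. With `skip_port_on_sync=True` those requests are handled inline and only the truly asynchronous ones go through the port.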
Locked IO buffers
Among smaller performance tweaks, the ultra copier defaults to using
when reading larger files. Larger files aren't likely to be cached in full, so bypassing the cache has a small, but noticeable effect on the reading speed.
There also happens to be a way to further improve performance by
locking IO buffers
* Some conditions apply
In particular, this requires holding a rather exotic privilege and it
may lead to memory starvation for the rest of the system. Ask me how I know.
But when used with care it does appear to improve bulk IO rate on faster drives.
, the delta copying improvements.