On 2017-09-12 16:00, Adam Borowski wrote:
On Tue, Sep 12, 2017 at 03:11:52PM -0400, Austin S. Hemmelgarn wrote:
On 2017-09-12 14:43, Adam Borowski wrote:
On Tue, Sep 12, 2017 at 01:36:48PM -0400, Austin S. Hemmelgarn wrote:
On 2017-09-12 13:21, Adam Borowski wrote:
There's fallocate -d, but that for some reason touches mtime which makes
rsync go again. This can be handled manually but is still not nice.
Yeah, the underlying ioctl does modify the file, it's merely fallocate -d
calling it on regions that are already zero. The ioctl doesn't know that,
so fallocate would have to restore the mtime by itself.
There's also another problem: such a check + ioctl are racey. Unlike defrag
or FILE_EXTENT_SAME, you can't thus use it on a file that's in use (or could
suddenly become in use). Fixing this would need kernel support, either as
FILE_EXTENT_SAME with /dev/zero or as a new mode of fallocate.
A new fallocate mode would be more likely. Adding special code to the
EXTENT_SAME ioctl and then requiring implementation on filesystems that
don't otherwise support it is not likely to get anywhere. A new fallocate
mode though would be easy, especially considering that a naive
implementation is easy
Sounds like a good idea. If we go this way, there's a question about
interface: there's choice between:
A) check if the whole range is zero, if even a single bit is one, abort
B) dig many holes, with a given granulation (perhaps left to the
filesystem's choice)
or even both. The former is more consistent with FILE_EXTENT_SAME, the
latter can be smarter (like, digging a 4k hole is bad for fragmentation but
replacing a whole extent, no matter how small, is always a win).
The first. It's more flexible, and the logic required for the second
option is policy, which should not be in the kernel. Matching the
EXTENT_SAME semantics would probably also make the implementation
significantly easier, and with some minor work might give a trivial
implementation for any FS that already supports that ioctl.
That said, I'm not 100% certain if it's necessary. Intentionally calling
fallocate on a file in use is not something most people are going to do
normally anyway, since there is already a TOCTOU race in the fallocate -d
implementation as things are right now.
_Current_ fallocate -d suffers from races, the whole gain from doing this
kernel-side would be eliminating those races. Use cases about the same as
FILE_EXTENT_SAME: you don't need to stop the world. Heck, as I mentioned
before, it conceptually _is_ FILE_EXTENT_SAME with /dev/null, other than
your (good) point about non-btrfs non-xfs.
I meant we shouldn't worry about a race involving the mtime check given
that there's an existing race inherent in the ioctl already.
For now, though, I wonder -- should we send fine folks at util-linux a patch
to make fallocate -d restore mtime, either always or on an option?
It would need to be an option, because it also suffers from a TOCTOU race
(other things might have changed the mtime while you were punching holes),
and it breaks from existing behavior. I think such an option would be
useful, but not universally (for example, I don't care if the mtime on my VM
images changes, as it typically matches the current date and time since the
VM's are running constantly other than when doing maintenance like punching
holes in the images).
Noted. Both Marat's and my use cases, though, involve VMs that are off most
of the time, and at least for me, turned on only to test something.
Touching mtime makes rsync run again, and it's freaking _slow_: worse than
40 minutes for a 40GB VM (source:SSD target:deduped HDD).
40 minutes for 40GB is insanely slow (that's just short of 18 MB/s) if
you're going direct to a hard drive. I get better performance than that
on my somewhat pathetic NUC based storage cluster (I get roughly 20 MB/s
there, but it's for archival storage so I don't really care). I'm
actually curious what the exact rsync command you are using is (you can
obviously redact paths as you see fit), as the only way I can think of
that it should be that slow is if you're using both --checksum (but if
you're using this, you can tell rsync to skip the mtime check, and that
issue goes away) and --inplace, _and_ your HDD is slow to begin with.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html