On Tue, Dec 7, 2010 at 2:12 AM, Freddie Cash <fjwcash@xxxxxxxxx> wrote:
> On Mon, Dec 6, 2010 at 12:30 PM, Nirbheek Chauhan
> <nirbheek.chauhan@xxxxxxxxx> wrote:
>> But the behaviour of --inplace is not entirely to write out *only* the
>> blocks that have changed. From what I could make out, it does the
>> following:
>>
>> (1) Calculate a delta b/w the src and trg files
>> (2) Seek to the first difference in the target file
>> (3) Start writing data
>
> That may be true, I've never looked into the actual algorithm(s) that
> rsync uses. Just played around with CLI options until we found the
> set that works best in our situation (--inplace --delete-during
> --no-whole-file --numeric-ids --hard-links --archive, over SSH with
> HPN patches).
>
>> I'm glossing over the final step because I didn't look deeper, but I
>> think you can safely assume that after the first difference, all data
>> is rewritten. So this is halfway between "rewrite the whole file" and
>> "write only the changed bits into the file". It doesn't actually use
>> any CoW features from what I can see. There is lots of room for btrfs
>> reflinking magic. :)
>>
>> Note that I tested this behaviour on a btrfs partition with a vanilla
>> rsync-3.0.7 tarball; the copy you use with ZFS might be doing some CoW
>> magic.
>
> All the CoW "magic" is handled by the filesystem, and not the tools on
> top. If the tool only updates X bytes, which fit into 1 block on the
> fs, then only that 1 block gets updated via CoW.
>

I'm quite sure that's what happens in btrfs too, but the thing about
updating in-place is that if you have

ABCDXXXEFGH

which needs to change to

ABCDZZZEFGH

You're all good. Only the blocks corresponding to XXX will be updated.
But if the change is

ABCDZZZZEFGH

You'll need to start rewriting EFGH since there's no way to insert
data in the middle (afaik) of a file with standard syscalls.
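
To make the difference concrete, here is a minimal Python sketch of the
two cases. It is only an illustration of plain seek/read/write behaviour,
not what rsync does internally; `replace_range` and `demo` are
hypothetical helpers I made up for this mail.

```python
import os
import tempfile


def replace_range(path, offset, old_len, new_data):
    """Replace old_len bytes at offset with new_data; return bytes written."""
    with open(path, "r+b") as f:
        if len(new_data) == old_len:
            # Same length: overwrite just the changed bytes.
            f.seek(offset)
            return f.write(new_data)
        # Different length: the tail has to move, so read it and write it back.
        f.seek(offset + old_len)
        tail = f.read()
        f.seek(offset)
        written = f.write(new_data + tail)
        f.truncate()  # drop leftover bytes if the file shrank
        return written


def demo():
    """Run both cases on a scratch file; return (bytes_written, contents) pairs."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        results = []
        for new_data in (b"ZZZ", b"ZZZZ"):
            with open(path, "wb") as f:
                f.write(b"ABCDXXXEFGH")
            written = replace_range(path, 4, 3, new_data)
            with open(path, "rb") as f:
                results.append((written, f.read()))
        return results
    finally:
        os.remove(path)
```

With XXX -> ZZZ only 3 bytes are written. With XXX -> ZZZZ the call
writes 8 bytes: the new data plus all of EFGH. On a large file that
tail is everything after the first difference.
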
Maybe later you get a set of changes which sync you up with the
file's contents again, but the chances of that happening in a large
file are quite remote. That's why I said that it can be safely assumed
that after the first difference, all data is rewritten.

The only way to get around this on the filesystem level that I can
think of is data de-duplication; the filesystem doesn't let go of the
blocks for a while, and does reflinking if the same data is written
again. Perhaps that's what ZFS is doing, I have no idea :)

-- 
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
