Re: Any chance to get snapshot-aware defragmentation?

Mon, 21 May 2018 at 16:16, Austin S. Hemmelgarn <ahferroin7@xxxxxxxxx>:

> On 2018-05-19 04:54, Niccolò Belli wrote:
> > On venerdì 18 maggio 2018 20:33:53 CEST, Austin S. Hemmelgarn wrote:
> >> With a bit of work, it's possible to handle things sanely.  You can
> >> deduplicate data from snapshots, even if they are read-only (you need
> >> to pass the `-A` option to duperemove and run it as root), so it's
> >> perfectly reasonable to only defrag the main subvolume, and then
> >> deduplicate the snapshots against that (so that they end up all being
> >> reflinks to the main subvolume).  Of course, this won't work if you're
> >> short on space, but if you're dealing with snapshots, you should have
> >> enough space that this will work (because even without defrag, it's
> >> fully possible for something to cause the snapshots to suddenly take
> >> up a lot more space).
> >
> > Been there, tried that. Unfortunately even if I skip the defrag a simple
> >
> > duperemove -drhA --dedupe-options=noblock --hashfile=rootfs.hash rootfs
> >
> > is going to eat more space than it was previously available (probably
> > due to autodefrag?).
> It's not autodefrag (that doesn't trigger on use of the EXTENT_SAME
> ioctl).  There's two things involved here:

> * BTRFS has somewhat odd and inefficient handling of partial extents.
> When part of an extent becomes unused (because of a CLONE ioctl, or an
> EXTENT_SAME ioctl, or something similar), that part stays allocated
> until the whole extent would be unused.
> * You're using the default deduplication block size (128k), which is
> larger than your filesystem block size (which is at most 64k, most
> likely 16k, but might be 4k if it's an old filesystem), so deduplicating
> can split extents.

That's the metadata node/leaf size, which is not the same as the fs block
size. The btrfs fs block size is currently equal to the machine page size
(typically 4KiB).
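
(A quick way to check both values on a given filesystem, assuming
btrfs-progs is installed; /dev/sdX below is just a placeholder:

  btrfs inspect-internal dump-super /dev/sdX | grep -E 'sectorsize|nodesize'

sectorsize is the data block size, nodesize is the metadata node/leaf size.)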

> Because of this, if a duplicate region happens to overlap the front of
> an already shared extent, and the end of said shared extent isn't
> aligned with the deduplication block size, the EXTENT_SAME call will
> deduplicate the first part, creating a new shared extent, but not the
> tail end of the existing shared region, and all of that original shared
> region will stick around, taking up extra space that it wasn't before.

> Additionally, if only part of an extent is duplicated, then that area of
> the extent will stay allocated, because the rest of the extent is still
> referenced (so you won't necessarily see any actual space savings).

> You can mitigate this by telling duperemove to use the same block size
> as your filesystem using the `-b` option.   Note that using a smaller
> block size will also slow down the deduplication process and greatly
> increase the size of the hash file.

duperemove's -b option only controls how the data is hashed, nothing more
or less, and it only accepts block sizes from 4KiB to 1MiB.

The dedup block size does affect deduplication efficiency; the resulting
number of hash/block pairs determines the hash file size and the time
complexity.
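
For example, on a filesystem with a 4KiB block size, the command from
earlier in the thread could be rerun as something like (untested sketch):

  duperemove -drhA -b 4k --hashfile=rootfs.hash rootfs

Keep in mind that -b only changes the hash granularity, it does not change
how the kernel splits extents.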

Let's assume 'A' stands for 1KiB of data, so 'AAAA' is 4KiB of a repeated
pattern.

Say, for example, you have two 2x4KiB blocks:
1: 'AAAABBBB'
2: 'BBBBAAAA'

With -b 8KiB the hash of the first block is not the same as that of the
second, so nothing matches.
But with -b 4KiB duperemove will see both 'AAAA' and 'BBBB' in each block,
and those blocks will be deduped.
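
A minimal way to demonstrate this, assuming a btrfs filesystem mounted at
/mnt with a 4KiB block size (the file names are made up for illustration):

  # file 1: 4KiB of 'A' followed by 4KiB of 'B'
  ( head -c 4096 /dev/zero | tr '\0' 'A'; head -c 4096 /dev/zero | tr '\0' 'B' ) > /mnt/f1
  # file 2: the same two blocks in reverse order
  ( head -c 4096 /dev/zero | tr '\0' 'B'; head -c 4096 /dev/zero | tr '\0' 'A' ) > /mnt/f2
  sync

  # 8KiB hash blocks: the two files hash differently, nothing is deduped
  duperemove -d -b 8k /mnt/f1 /mnt/f2
  # 4KiB hash blocks: the 'A' and 'B' blocks match across files and get deduped
  duperemove -d -b 4k /mnt/f1 /mnt/f2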

Also, duperemove has two modes of deduping:
1. By extents
2. By blocks
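
If I remember correctly, the mode is selected via --dedupe-options (the
exact option names may differ between duperemove versions), e.g.:

  duperemove -dr --dedupe-options=noblock rootfs   # dedupe whole extents
  duperemove -dr --dedupe-options=block rootfs     # dedupe block by block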

Thanks.

--
Have a nice day,
Timofey.



