Re: btrfs-dedupe broken and unsupported but in official wiki


 



On Mon, Jun 22, 2020 at 09:49:55PM +0200, Goffredo Baroncelli wrote:
> On 6/19/20 3:11 PM, David Sterba wrote:
> > > If there wasn't some insurmountable reason
> > > why duperemove can't be merged with btrfs-progs, then it would have
> > > happened already, so there must be a reason why this can't ever happen
> > > (which might be as simple as neither maintainer wants to merge).
> > I'm not against adding the functionality to btrfs-progs, but merging
> > whole duperemove feature set might not happen due to additional
> > dependencies. This would need to be evaluated, but I'm not aware of any
> > other technical reasons.
> > 
> > I don't remember exactly why duperemove started as a separate project
> > instead of a subcommand of progs, but we can revisit that.
> > 
> Even though I don't think that this was the reason at the time, the
> ioctl FIDEDUPERANGE (aka BTRFS_IOC_FILE_EXTENT_SAME) is now "filesystem
> agnostic". So I think a tool more generic than btrfs(-progs) does make
> sense.
> 
> What I mean is: because this is not a BTRFS-specific ioctl anymore,
> why should we have a BTRFS-specific implementation?

First, to take advantage of unique btrfs capabilities:  incremental
scanning using transid and TREE_SEARCH_V2, and user data block csums.
Second, to take advantage of generic filesystem capabilities that
require btrfs-specific implementation details.  Third, btrfs has immutable
extents while other filesystems don't, and ignoring that fact in a generic
multi-filesystem tool will cost a lot of dedupe efficiency on btrfs.
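
For reference, the filesystem-agnostic FIDEDUPERANGE ioctl mentioned in the
quoted text can be called from userspace roughly as follows.  This is a
minimal sketch, not how duperemove or any other tool does it: the helper
names pack_request and dedupe_range are made up for illustration, the
struct layout follows struct file_dedupe_range from linux/fs.h, and the
ioctl number is computed with the common Linux _IOC encoding (x86, arm),
which is an assumption that does not hold on every architecture.

```python
import fcntl
import struct

# struct file_dedupe_range header: src_offset, src_length, dest_count,
# reserved1, reserved2 (24 bytes).
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved (32 bytes).
INFO = struct.Struct("=qQQiI")

# _IOWR(0x94, 54, struct file_dedupe_range), assuming the common _IOC layout.
FIDEDUPERANGE = (3 << 30) | (HEADER.size << 16) | (0x94 << 8) | 54


def pack_request(src_offset, length, dest_fd, dest_offset):
    """Build the ioctl argument for a single destination range."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to share one range; returns (bytes_deduped, status).

    status is 0 if the ranges were identical and deduped, 1 if the data
    differed, and a negative errno value on a per-destination error.
    """
    buf = pack_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # the kernel fills in buf
    _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

The kernel compares the two ranges itself before sharing them, so a caller
with a stale or wrong guess about duplicate data gets status 1 back rather
than corrupted files.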

On a big filesystem, a filesystem-specific dedupe tool could perform
many orders of magnitude better than a filesystem-agnostic one, and
recover twice as much space.

duperemove is implemented using generic filesystem APIs:  you point it at
a directory tree, it scans all the files in the tree (including
previously deduped files) and dedupes them.  In incremental mode it
scans the entire tree and compares the tree with a database.  This is
the slowest way to keep a filesystem deduplicated at scale.
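
To make the cost concrete, here is a minimal sketch of the generic
approach: walk a tree, read every block of every file, and group blocks
by hash.  It is not duperemove's actual algorithm; the function name
find_duplicate_blocks, the fixed 4096-byte block size, and the choice of
SHA-256 are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size for this sketch


def find_duplicate_blocks(root, block_size=BLOCK_SIZE):
    """Map block digest -> [(path, offset), ...] for blocks seen more than once."""
    seen = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                offset = 0
                while True:
                    block = f.read(block_size)
                    if len(block) < block_size:
                        break  # ignore the partial tail block
                    digest = hashlib.sha256(block).digest()
                    seen[digest].append((path, offset))
                    offset += block_size
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Every run reads all of the data through the normal file API, including
files that were already deduped, which is why this approach scales poorly.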

XFS and btrfs are both capable of doing dedupe at wire speeds by
bypassing most of the filesystem (similar to a scrub, and can even
be combined with scrub).  That level of performance makes incremental
scanning and filesystem csum support unnecessary for many use cases,
since users would just run full dedupe instead of scrub.  One tool
can support both XFS and btrfs this way, though it would have to have
specialized support for each individual filesystem as the details on each
filesystem are very different (GETFSMAP and pread, vs LOGICAL_INO and all
the different btrfs raid profiles and compression formats).  It could be
done as a dedupe core with plugin support for each filesystem, provided
that the core algorithm is designed to handle btrfs's immutable extents.
AFAIK nobody has built such a tool yet.

XFS doesn't maintain csums of user data or support incremental scans,
so XFS can dedupe _only_ as fast as it can scrub (*).  btrfs has the
extra information in the filesystem, so in theory we can start with the
wire-speed dedupe from above, and make it up to 1000 times faster by
reading the csums instead of reading the data blocks, and then faster
still by scanning only the parts of the filesystem that changed from one
dedupe run to the next.
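
The "up to 1000 times" figure is roughly the ratio of data block size to
csum size.  A back-of-the-envelope sketch, assuming 4096-byte data blocks
and the per-block csum sizes of the btrfs checksum algorithms, and
ignoring metadata overhead and seek costs (read_reduction is a made-up
helper name):

```python
BLOCK_SIZE = 4096  # assumed data block size in bytes

# Bytes of checksum stored per data block, by btrfs csum algorithm.
CSUM_SIZES = {"crc32c": 4, "xxhash64": 8, "sha256": 32, "blake2b": 32}


def read_reduction(csum_bytes, block_size=BLOCK_SIZE):
    """How many times less data is read when scanning csums instead of blocks."""
    return block_size // csum_bytes


for name, size in CSUM_SIZES.items():
    print(f"{name}: {read_reduction(size)}x less data to read")
```

With the default crc32c the ratio is 1024, which matches the estimate
above; the larger csums give a smaller reduction.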

(*) XFS has some very fast tools for rapidly finding modified inodes,
and it doesn't have immutable extents like btrfs does.  XFS might win
by brute force against btrfs's slower equivalents.  It would depend on
the mix of file sizes in the workload.

> From a technical point of view: duperemove could take advantage of
> the btrfs csum. So the question could be: is it better to add the
> capability to use the BTRFS csum to duperemove or to add the code of
> duperemove to BTRFS?

The options are orthogonal.  csum read support can be added to any dedupe
tool, whether it's part of the official btrfs code or not.  We can decide
on an official tool and add csum support to that tool in either order.

> From a user point of view, I think that the former makes sense.
> 
> BR
> G.Baroncelli
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


