On 2014-08-01 09:23, David Sterba wrote:
> On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
>> I do think however that having the option of a background thread doing
>> deduplication asynchronously is a good idea, but then you would have to
>> have some way to trigger it on individual files/trees, and triggering on
>> writes like the autodefrag thread does doesn't make much sense. Having
>> some userspace program to tell it to run on a given set of files would
>> probably be the best approach for a trigger. I don't remember if this
>> kind of thing was also included in the online deduplication patches that
>> got posted a while back or not.
>
> IIRC the proposed implementation only merged new writes with existing
> data.
>
> For the out-of-band ("off-line") dedup there's bedup
> (https://github.com/g2p/bedup) or Mark's duperemove tool
> (https://github.com/markfasheh/duperemove) that work on a set of files.
>
Something kernel-side to do the work asynchronously would be nice,
especially if it could leverage the checksums that BTRFS already stores
for the blocks. Having a userspace interface for offline deduplication
similar to that for scrub operations would be even better.
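
To make the out-of-band idea concrete, here is a rough sketch of the
userspace side: hash fixed-size blocks across a given set of files and
report which blocks occur more than once. This is only an illustration
under my own assumptions (a 4096-byte block size, SHA-256 computed in
userspace, and the hypothetical helper name find_duplicate_blocks); it is
not how bedup or duperemove is implemented, and it does not read the
checksums BTRFS already stores. A real tool would still have to hand the
candidates to the kernel to verify and share the extents.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Return {digest: [(path, offset), ...]} for blocks seen more than once.

    Hypothetical helper: it only identifies dedup candidates and never
    modifies the files.
    """
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # Skip a short trailing block; only full blocks are compared.
                if len(block) == block_size:
                    digest = hashlib.sha256(block).digest()
                    seen[digest].append((path, offset))
                offset += len(block)
    # Keep only digests that appear at more than one location.
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Calling find_duplicate_blocks(["a.img", "b.img"]) (file names made up)
would return the candidate locations; a matching hash is only a hint, so
the data should still be compared byte for byte before anything is shared.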
