Re: out-of-band dedup status?


On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> OK something's wrong.
>>
>> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> (mkfs.btrfs -dsingle -msingle, default mount options) and two
>> identical files separately copied.
>>
>> [chris@f25s]$ ls -li /mnt/test
>> total 2811904
>> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>
>> [chris@f25s]$ filefrag /mnt/test/*
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>>
>>
>> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> Using 128K blocks
>> Using hash: murmur3
>> Gathering file list...
>> Using 4 threads for file hashing phase
>> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> [2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> Total files:  2
>> Total hashes: 21968
>> Loading only duplicated hashes from hashfile.
>> Using 4 threads for dedupe phase
>> [0xba8400] (00001/10947) Try to dedupe extents with id e47862ea
>> [0xba84a0] (00003/10947) Try to dedupe extents with id ffed44f2
>> [0xba84f0] (00002/10947) Try to dedupe extents with id ffeefcdd
>> [0xba8540] (00004/10947) Try to dedupe extents with id ffe4cf64
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> 1182924800 (4)
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> 1182924800 (5)
>> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>
> Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> 12000 ioctl calls.  Each of those 12000 calls has to lock the two
> inodes, read the file contents, remap the blocks, etc.  instead of
> finding the maximal identical range and making a single call for the
> whole range.
>
> That's probably why it's taking forever to dedupe.

Yes, but it also looks like it's heavily fragmenting the files as a
result.
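
To put rough numbers on the point about per-block calls versus one
call over the maximal identical range, here is a minimal sketch. It is
not duperemove's code: coalesce() is a hypothetical helper, and it
assumes the two files match at every 128K-aligned offset, which is
what two identical copies would give. It only counts requests and
makes no dedupe ioctl calls.

```python
BLOCK = 128 * 1024       # block size reported by duperemove in the log
FILE_SIZE = 1439694848   # size of each ISO from the ls output


def coalesce(offsets, block=BLOCK):
    """Merge block-aligned matching offsets into (start, length) ranges.

    Hypothetical helper for illustration only: adjacent matching blocks
    are folded into one range so each range could be one dedupe request.
    """
    ranges = []
    for off in sorted(offsets):
        if ranges and ranges[-1][0] + ranges[-1][1] == off:
            # This block starts exactly where the previous range ends.
            start, length = ranges[-1]
            ranges[-1] = (start, length + block)
        else:
            ranges.append((off, block))
    return ranges


# Assumption: every 128K block of one file matches the other file.
offsets = range(0, FILE_SIZE, BLOCK)
per_block_calls = len(offsets)   # one request per 128K block
merged = coalesce(offsets)       # one request per contiguous range

print(per_block_calls, len(merged), merged[0])
# prints: 10984 1 (0, 1439694848)
```

The 10984 blocks per file agree with the "Total hashes: 21968" line
for two files. A real tool might still need to split a merged range if
the kernel limits how much one request can process, so the "1" is a
lower bound rather than a promise.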

-- 
Chris Murphy