On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> OK something's wrong.
>>
>> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> (mkfs.btrfs -dsingle -msingle, default mount options) and two
>> identical files separately copied.
>>
>> [chris@f25s]$ ls -li /mnt/test
>> total 2811904
>> 260 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> 259 -rw-r--r--. 1 root root 1439694848 Dec 8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>
>> [chris@f25s]$ filefrag /mnt/test/*
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>>
>>
>> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> Using 128K blocks
>> Using hash: murmur3
>> Gathering file list...
>> Using 4 threads for file hashing phase
>> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> [2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> Total files: 2
>> Total hashes: 21968
>> Loading only duplicated hashes from hashfile.
>> Using 4 threads for dedupe phase
>> [0xba8400] (00001/10947) Try to dedupe extents with id e47862ea
>> [0xba84a0] (00003/10947) Try to dedupe extents with id ffed44f2
>> [0xba84f0] (00002/10947) Try to dedupe extents with id ffeefcdd
>> [0xba8540] (00004/10947) Try to dedupe extents with id ffe4cf64
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> 1182924800 (4)
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> 1182924800 (5)
>> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>
> Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> 12000 ioctl calls. Each of those 12000 calls has to lock the two
> inodes, read the file contents, remap the blocks, etc. instead of
> finding the maximal identical range and making a single call for the
> whole range.
>
> That's probably why it's taking forever to dedupe.

Yes but it looks like it's also heavily fragmenting the files as a
result as well.

-- 
Chris Murphy
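
The difference between one dedupe call per 128K block and one call per maximal identical range can be illustrated with a rough sketch. This is not duperemove's code and it does not issue any dedupe ioctl; the function name identical_ranges and the in-memory test data are assumptions made for illustration. It only shows how adjacent identical blocks at the same offset could be coalesced into one (offset, length) request. A real tool would likely still need to loop, since the kernel may dedupe fewer bytes per call than requested.

```python
import io

BLOCK = 128 * 1024  # block size reported by duperemove in the log above


def identical_ranges(a, b, block=BLOCK):
    """Yield (offset, length) for maximal runs of equal blocks at equal offsets.

    `a` and `b` are binary file objects. Hypothetical helper for illustration.
    """
    offset = 0
    run_start = None  # start of the current run of identical blocks, if any
    while True:
        chunk_a = a.read(block)
        chunk_b = b.read(block)
        if not chunk_a or not chunk_b:
            break
        if chunk_a == chunk_b:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # A differing block ends the current run: emit it as one range.
            yield run_start, offset - run_start
            run_start = None
        offset += len(chunk_a)
    if run_start is not None:
        yield run_start, offset - run_start


# Per-block requests for one 1439694848-byte file at 128K each
# (two such files give the 21968 hashes shown in the log).
blocks_per_file = 1439694848 // BLOCK

# Small stand-in for two identical files: 16 blocks of 64 bytes each.
data = bytes(range(256)) * 4
ranges = list(identical_ranges(io.BytesIO(data), io.BytesIO(data), block=64))
```

For two byte-identical inputs, ranges holds a single entry covering the whole file, so the number of requests drops from one per block to one per identical run.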
