On Tue, Jun 09, 2020 at 11:31:41AM -0400, Ellis H. Wilson III wrote:
> We have a few engineers looking through BTRFS code presently for
> answers to this, but I was interested to get input from the experts in
> parallel to hopefully understand this issue quickly.
>
> We find that removes of large amounts of data can take a significant
> amount of time in BTRFS on HDDs -- in fact, it appears to scale
> linearly with the size of the file. I'd like to better understand the
> mechanics underpinning that behavior.
>
> See the attached graph for a quick experiment that demonstrates this
> behavior. In this experiment I use 40 threads to perform deletions of
> previously written data in parallel. There are 10,000 files in every
> case, and I scale file sizes by powers of two from 16MB to 16GB; thus,
> the raw amount of data deleted also doubles at every step. Frankly, I
> expected deletion of a file to be predominantly a metadata operation
> and not to scale with the size of the file, but perhaps I'm
> misunderstanding that.

The size of the metadata is, apart from a small constant part,
proportional to the number of extents, which in turn depends on the
file size. With compression off, extents may be as big as 1GB (which
would make their number negligible), but that's clearly not happening
in your case.

There are tools that can show the extent layout. I'd recommend
python3-btrfs, which ships
/usr/share/doc/python3-btrfs/examples/show_file.py; it prints
everything available about a file's extents.
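If all you need is a quick extent count per file (rather than the full
per-extent dump show_file.py gives), something like the sketch below
will do. Treat it as a rough illustration only -- it just parses the
summary line of filefrag(8) from e2fsprogs rather than using the
python3-btrfs API, and filefrag's FIEMAP-based accounting can differ
slightly from what the btrfs trees actually store:

  #!/usr/bin/env python3
  # Rough sketch: count extents per file by parsing filefrag(8) output.
  # filefrag prints a summary line like "foo: 12 extents found".
  import re
  import subprocess
  import sys

  def extent_count(path):
      out = subprocess.run(["filefrag", path], capture_output=True,
                           text=True, check=True)
      m = re.search(r"(\d+) extents? found", out.stdout)
      return int(m.group(1)) if m else 0

  for path in sys.argv[1:]:
      print(f"{path}: {extent_count(path)} extents")

If the extent count grows roughly linearly with file size, so does the
amount of metadata the unlink has to walk and free, which would match
the linear delete times you're seeing.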
Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ in the beginning was the boot and root floppies and they
⢿⡄⠘⠷⠚⠋⠀ were good.  -- <willmore> on #linux-sunxi
⠈⠳⣄⠀⠀⠀⠀