On 09.06.2020 19:53 Adam Borowski wrote:
On Tue, Jun 09, 2020 at 11:31:41AM -0400, Ellis H. Wilson III wrote:
We have a few engineers looking through BTRFS code presently for answers to
this, but I was interested to get input from the experts in parallel to
hopefully understand this issue quickly.
We find that removes of large amounts of data can take a significant amount
of time in BTRFS on HDDs -- in fact it appears to scale linearly with the
size of the file. I'd like to better understand the mechanics underpinning
that behavior.
See the attached graph for a quick experiment that demonstrates this
behavior. In this experiment I use 40 threads to perform deletions of
previously written data in parallel. There are 10,000 files in every case, and I scale the
file size by powers of two from 16MB to 16GB. Thus, the raw amount of data
deleted also expands by 2x every step. Frankly I expected deletion of a
file to be predominantly a metadata operation and not scale with the size of
the file, but perhaps I'm misunderstanding that.
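The delete phase described there boils down to roughly the sketch below, with hypothetical paths standing in for the real harness:

import os
from concurrent.futures import ThreadPoolExecutor

# 10,000 files, deleted by a pool of 40 worker threads, as in the experiment.
FILES = [f"/mnt/btrfs/testdata/file_{i:05d}" for i in range(10_000)]

with ThreadPoolExecutor(max_workers=40) as pool:
    # unlink() itself is a metadata operation from the caller's side, but
    # btrfs still has to drop every extent and csum item the file owned.
    list(pool.map(os.unlink, FILES))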
The size of metadata is, after a small constant bit, proportional to the
number of extents, which in turn depends on file size. With compression
off, extents may be as big as 1GB (which would make their number
negligible), but that's clearly not happening in your case.
There are tools which can show extent layout. I'd recommend python3-btrfs,
which includes /usr/share/doc/python3-btrfs/examples/show_file.py that
prints everything available about the list of extents.
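If you only want a quick per-file extent count before digging into show_file.py's
full dump, a rough sketch like the following is enough (it shells out to filefrag
from e2fsprogs and only counts extents; it won't show the bytenr/compression
detail the python3-btrfs example prints):

import re
import subprocess
import sys

def extent_count(path):
    # "filefrag FILE" prints e.g. "FILE: 137 extents found"
    out = subprocess.run(["filefrag", path], capture_output=True,
                         text=True, check=True).stdout
    m = re.search(r":\s*(\d+) extents? found", out)
    return int(m.group(1)) if m else 0

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {extent_count(p)} extents")

Run it over a sample of your 16GB files; if the counts come back in the
thousands rather than the dozens, that's where the unlink time is going.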
Also, there is a 4-byte CRC checksum per 4 KiB block that has to be
removed when the file goes away. Mount with "nodatasum" and re-run your
tests to confirm whether this is the cause.
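For a sense of scale, a back-of-the-envelope calculation (assuming the default
4 KiB block size and 4-byte crc32c sums, ignoring item headers and tree overhead):

BLOCK = 4 * 1024      # data block size
CSUM  = 4             # bytes of crc32c per block

for size_mib in (16, 16 * 1024):   # smallest and largest files in the test
    size_bytes = size_mib * 1024 * 1024
    csum_bytes = (size_bytes // BLOCK) * CSUM
    print(f"{size_mib:>6} MiB file -> {csum_bytes // 1024:>6} KiB of csum items to drop")

So each 16GB file carries roughly 16 MiB of checksum items that have to be
found and deleted from the csum tree at unlink time, versus about 16 KiB for
the 16MB files; that part alone scales linearly with file size.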