Hi folks,

We have a few engineers looking through the BTRFS code at the moment for answers to this, but I wanted to get input from the experts in parallel in the hope of understanding the issue quickly.
We find that deleting large amounts of data on BTRFS on HDDs can take a significant amount of time -- in fact, deletion time appears to scale linearly with the size of the file. I'd like to better understand the mechanics underpinning that behavior.
See the attached graph for a quick experiment that demonstrates this behavior. In this experiment I use 40 threads to delete previously written data in parallel: 10,000 files in every case, with the file size scaled by powers of two from 16MB to 16GB, so the raw amount of data deleted also doubles at each step. Frankly, I expected deleting a file to be predominantly a metadata operation that would not scale with the size of the file, but perhaps I'm misunderstanding that. While the overall deletion rate is relatively fast (hovering between 30GB/s and 50GB/s) compared with raw ingest to the disk array we're using (~1.5GB/s in our case), it can still take a very long time to delete data from the drives, and the removes block until that data is deleted, unlike in some other filesystems. The deletions also compete aggressively with foreground I/O due to the intense seeking on the HDDs.
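Roughly, the delete phase of the test looks like the sketch below (a simplified illustration, not our actual harness; the directory layout, batching, and throughput reporting here are my own assumptions):

#!/usr/bin/env python3
# Minimal sketch of the parallel-delete test: unlink all files in a
# directory using 40 worker threads and report aggregate throughput.
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 40

def delete_files(paths):
    """Unlink a batch of files; return total bytes removed."""
    freed = 0
    for p in paths:
        freed += os.path.getsize(p)
        os.unlink(p)
    return freed

def main(target_dir):
    files = [os.path.join(target_dir, f) for f in os.listdir(target_dir)]
    # Split the file list into one batch per worker thread.
    batches = [files[i::NUM_THREADS] for i in range(NUM_THREADS)]

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        freed = sum(pool.map(delete_files, batches))
    elapsed = time.monotonic() - start

    print(f"deleted {freed / 2**30:.1f} GiB in {elapsed:.1f} s "
          f"({freed / 2**30 / elapsed:.2f} GiB/s)")

if __name__ == "__main__":
    main(sys.argv[1])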
This is with an older kernel (4.12.14-lp150.12.73-default), so if the algorithms have changed substantially in newer versions such that deletion rate is no longer tied to file size, please just say so and I'll be glad to look at that change and try a newer version.
FWIW, we are using the v2 free space cache. If any other information would be relevant, just let me know and I'll be glad to share it.
Thank you for any time people can spare to help us understand this better!

ellis
[Attachment: parallel_delete_speed.png]
