Thanks for the answer. I had the exact same issue as in the thread you linked, and my monitoring graphs show that btrfs-cleaner wrote constantly for 12 hours right after I upgraded to linux 5.16. Weirdly enough, the issue almost disappeared after I ran a btrfs balance filtered on 10% data usage. But that's why I initially disabled autodefrag, which led me to discover this bug when I switched to manual defragmentation (which, in the end, makes more sense anyway with my setup).
I tried this patch, but sadly it doesn't help with the initial issue. I cannot speak for the bug in the other thread, as the problem with btrfs-cleaner disappeared (I can still see some writes from it, but they are so rare that I cannot say whether that's normal or not).
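For reference, the manual defragmentation workaround from the original report (skipping files smaller than 3.9KB) could be sketched roughly as below. This is only an illustration, not the script actually in use: the helper names `defrag_candidates` and `defrag_command` are hypothetical, "3.9KB" is assumed to mean 3.9 * 1024 bytes, and the command is only built, never executed.

```python
import os
from pathlib import Path

# Assumption: "3.9KB" from the report is read as 3.9 * 1024 bytes (3993 after truncation).
MIN_SIZE_BYTES = int(3.9 * 1024)


def defrag_candidates(root, min_size=MIN_SIZE_BYTES):
    """Return regular files under `root` whose size is at least `min_size` bytes."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks and anything that is not a regular file.
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_size >= min_size:
                candidates.append(path)
    return sorted(candidates)


def defrag_command(path, target_extent="128K"):
    """Build (but do not run) the defrag command line for a single file."""
    return ["btrfs", "filesystem", "defragment", "-t", target_extent, str(path)]
```

Each returned path could then be passed to `defrag_command` and run with `subprocess.run`, so that the small files that trigger the hang on 5.16 are never handed to the defrag ioctl.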
Thanks,
Anthony

On 17/01/2022 at 13:10, Filipe Manana wrote:
> On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
>> Hi,
>>
>> Since I upgraded from linux 5.15 to 5.16, `btrfs filesystem defrag -t128K` hangs on small files (~1 byte) and triggers what seems to be a loop in the kernel. As a result, one CPU thread is used at 100%. I cannot kill the process, and rebooting is blocked by btrfs.
>> This is a copy of the bug https://bugzilla.kernel.org/show_bug.cgi?id=215498
>>
>> Rebooting to linux 5.15 shows no issue. I have no issue running a defrag on bigger files (I filter out files smaller than 3.9KB).
>>
>> I had a conversation on #btrfs on IRC, so here's what we debugged:
>>
>> I can replicate the issue by copying a file impacted by this bug, using `cp --reflink=never`. I attached one of the impacted files to this bug, named README.md.
>>
>> Someone told me that it could be a bug due to the inline extent. So we tried to check that.
>>
>> filefrag shows that the file README.md is 1 inline extent. I tried to create a new file with random text, of 18 bytes (slightly bigger than the other file), that is also 1 inline extent. This file doesn't trigger the bug and can be defragmented without issue.
>>
>> I tried to mount my system with `max_inline=0` and created a copy of README.md. `filefrag` shows me that the new file is now 1 extent, not inline. This new file also triggers the bug, so it doesn't seem to be due to the inline extent.
>>
>> Someone asked me to provide the output of a perf top when the defrag is stuck:
>>
>>     28.70%  [kernel]  [k] generic_bin_search
>>     14.90%  [kernel]  [k] free_extent_buffer
>>     13.17%  [kernel]  [k] btrfs_search_slot
>>     12.63%  [kernel]  [k] btrfs_root_node
>>      8.33%  [kernel]  [k] btrfs_get_64
>>      3.88%  [kernel]  [k] __down_read_common.llvm
>>      3.00%  [kernel]  [k] up_read
>>      2.63%  [kernel]  [k] read_block_for_search
>>      2.40%  [kernel]  [k] read_extent_buffer
>>      1.38%  [kernel]  [k] memset_erms
>>      1.11%  [kernel]  [k] find_extent_buffer
>>      0.69%  [kernel]  [k] kmem_cache_free
>>      0.69%  [kernel]  [k] memcpy_erms
>>      0.57%  [kernel]  [k] kmem_cache_alloc
>>      0.45%  [kernel]  [k] radix_tree_lookup
>>
>> I can reproduce the bug on 2 different machines, running 2 different linux distributions (Arch and Gentoo) with 2 different kernel configs. This kernel is compiled with clang, the other with GCC.
>>
>> Kernel version: 5.16.0
>> Mount options:
>>     Machine 1: rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
>>     Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2
>>
>> When the error happens, no message is shown in dmesg.
>
> This is very likely the same issue as reported at this thread:
>
> https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@xxxxxxxxxxxx/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb
>
> Are you able to test the patch posted there?
>
> Thanks.
>
>> Thanks,
>> Anthony Ruhier
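To make the perf top figures quoted above easier to compare, here is a small sketch that parses them and adds up the share spent in the search-related symbols. The percentages are copied from the report; the choice of which symbols count as "search" is an assumption made purely from their names, and the helper `parse_perf_top` is hypothetical.

```python
import re

# perf top lines copied verbatim from the report.
PERF_TOP = """\
28.70% [kernel] [k] generic_bin_search
14.90% [kernel] [k] free_extent_buffer
13.17% [kernel] [k] btrfs_search_slot
12.63% [kernel] [k] btrfs_root_node
8.33% [kernel] [k] btrfs_get_64
3.88% [kernel] [k] __down_read_common.llvm
3.00% [kernel] [k] up_read
2.63% [kernel] [k] read_block_for_search
2.40% [kernel] [k] read_extent_buffer
1.38% [kernel] [k] memset_erms
1.11% [kernel] [k] find_extent_buffer
0.69% [kernel] [k] kmem_cache_free
0.69% [kernel] [k] memcpy_erms
0.57% [kernel] [k] kmem_cache_alloc
0.45% [kernel] [k] radix_tree_lookup
"""

# One sample line: percentage, "[kernel]", "[k]", then the symbol name.
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+\[kernel\]\s+\[k\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Map each kernel symbol to its reported percentage."""
    samples = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            samples[match.group(2)] = float(match.group(1))
    return samples


samples = parse_perf_top(PERF_TOP)

# Assumption: these symbols are grouped as "search" based on their names only.
SEARCH_SYMBOLS = ["generic_bin_search", "btrfs_search_slot", "read_block_for_search"]
search_share = sum(samples[name] for name in SEARCH_SYMBOLS)
print(f"search-related symbols: {search_share:.2f}% of samples")
```

A profile dominated by these functions is at least consistent with the kernel repeatedly searching the same tree, which matches the looping behaviour described in the report.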
