Re: btrfs fi defrag hangs on small files, 100% CPU thread

Thanks for the answer.

I had the exact same issue as in the thread you linked. My monitoring graphs show that btrfs-cleaner wrote constantly for 12 hours right after I upgraded to Linux 5.16. Oddly, the issue almost disappeared after I ran a btrfs balance with a 10% data usage filter. That is why I initially disabled autodefrag, which led me to discover this bug when I switched to manual defragmentation (which, in the end, makes more sense with my setup anyway).

I tried this patch, but sadly it doesn't fix the initial issue. I cannot say whether it fixes the bug in the other thread, as the problem with btrfs-cleaner has disappeared (I still see some writes from it, but they are so rare that I cannot tell whether they are normal).

Thanks,
Anthony

On 17/01/2022 at 13:10, Filipe Manana wrote:
On Sun, Jan 16, 2022 at 08:15:37PM +0100, Anthony Ruhier wrote:
Hi,
Since I upgraded from Linux 5.15 to 5.16, `btrfs filesystem defrag -t128K`
hangs on small files (~1 byte) and triggers what seems to be a loop in
the kernel. As a result, one CPU thread stays at 100% usage. I
cannot kill the process, and rebooting is blocked by btrfs.
This is a copy of the bug report at https://bugzilla.kernel.org/show_bug.cgi?id=215498

After rebooting into Linux 5.15, the issue does not occur. On 5.16, defragmenting
bigger files works without issue (I filter out files smaller than 3.9KB).
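
For illustration, the size filter mentioned above could be sketched as follows. This is only a minimal sketch, not the script actually used: the function name `list_defrag_candidates`, the 3994-byte default (3.9 × 1024, rounded up), and the example path are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print regular files under DIR whose size is at least
# MIN_BYTES bytes. The 3994-byte default approximates "3.9KB" (assumption).
list_defrag_candidates() {
    dir=$1
    min_bytes=${2:-3994}
    # find's "-size +Nc" matches files strictly larger than N bytes,
    # so subtract 1 to get "at least MIN_BYTES".
    # -xdev keeps the walk on a single filesystem.
    find "$dir" -xdev -type f -size +"$((min_bytes - 1))"c
}

# Usage (not run here; /path/to/subvolume is a placeholder):
# list_defrag_candidates /path/to/subvolume | while IFS= read -r f; do
#     btrfs filesystem defrag -t128K "$f"
# done
```

Files below the threshold are simply never passed to the defrag command, which avoids the hang but leaves the small files untouched.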

I had a conversation on #btrfs on IRC, so here's what we debugged:

I can replicate the issue by copying a file impacted by this bug, by using
`cp --reflink=never`. I attached one of the impacted files to this bug,
named README.md.

Someone told me that it could be a bug due to the inline extent. So we tried
to check that.

`filefrag` shows that the file README.md is 1 inline extent. I created
a new file with random text, of 18 bytes (slightly bigger than the other
file), that is also 1 inline extent. This file doesn't trigger the bug and
can be defragmented without issue.

I mounted my system with `max_inline=0` and created a copy of README.md.
`filefrag` shows that the new file is now 1 regular extent, not inline. This new
file also triggers the bug, so it doesn't seem to be due to the inline
extent.

Someone asked me to provide the output of a perf top when the defrag is
stuck:

     28.70%  [kernel]          [k] generic_bin_search
     14.90%  [kernel]          [k] free_extent_buffer
     13.17%  [kernel]          [k] btrfs_search_slot
     12.63%  [kernel]          [k] btrfs_root_node
      8.33%  [kernel]          [k] btrfs_get_64
      3.88%  [kernel]          [k] __down_read_common.llvm
      3.00%  [kernel]          [k] up_read
      2.63%  [kernel]          [k] read_block_for_search
      2.40%  [kernel]          [k] read_extent_buffer
      1.38%  [kernel]          [k] memset_erms
      1.11%  [kernel]          [k] find_extent_buffer
      0.69%  [kernel]          [k] kmem_cache_free
      0.69%  [kernel]          [k] memcpy_erms
      0.57%  [kernel]          [k] kmem_cache_alloc
      0.45%  [kernel]          [k] radix_tree_lookup

I can reproduce the bug on 2 different machines, running 2 different Linux
distributions (Arch and Gentoo) with 2 different kernel configs.
The kernel on this machine is compiled with clang, the other with GCC.

Kernel version: 5.16.0
Mount options:
     Machine 1:
rw,noatime,compress-force=zstd:2,ssd,discard=async,space_cache=v2,autodefrag
     Machine 2: rw,noatime,compress-force=zstd:3,nossd,space_cache=v2

When the error happens, no message is shown in dmesg.
This is very likely the same issue as reported at this thread:

https://lore.kernel.org/linux-btrfs/YeVawBBE3r6hVhgs@xxxxxxxxxxxx/T/#ma1c8a9848c9b7e4edb471f7be184599d38e288bb

Are you able to test the patch posted there?

Thanks.

Thanks,
Anthony Ruhier


