On 2020/7/5 10:53 PM, Hans van Kranenburg wrote:
> On 7/5/20 3:13 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/5 8:49 PM, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> This is Linux kernel 5.7.6 (the Debian package, 5.7.6-1).
>>>
>>> So, I wanted to try out this new quicker balance interrupt thing, and
>>> the result was that I could crash the fs on my very first try using it,
>>> which was simply doing balance, and then pressing Ctrl-C.
>>>
>>> Recipe to reproduce: Start balance, wait a few seconds, then press
>>> Ctrl-C. For me here, ~ 5 out of 10 times, it ends up exploding:
>>>
>>> -# btrfs balance start --full /btrfs/
>>> ^C
>>>
>>> [41190.572977] BTRFS info (device xvdb): balance: start -d -m -s
>>> [41190.573035] BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
>>> [41205.409600] BTRFS info (device xvdb): found 12236 extents, stage: move data extents
>>> [41205.509316] BTRFS info (device xvdb): relocating block group 71928119296 flags data
>>> [41205.695319] BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>> [41205.723009] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>>> [41205.750590] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
>>> [41208.183424] BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
>>
>> -4 means -EINTR.
>
> From extent-tree.c:
>
> 5495     /*
> 5496      * So if we need to stop dropping the snapshot for whatever reason we
> 5497      * need to make sure to add it back to the dead root list so that we
> 5498      * keep trying to do the work later. This also cleans up roots if we
> 5499      * don't have it in the radix (like when we recover after a power fail
> 5500      * or unmount) so we don't leak memory.
> 5501      */
> 5502     if (!for_reloc && !root_dropped)
> 5503         btrfs_add_dead_root(root);
> 5504     if (err && err != -EAGAIN)
> 5505         btrfs_handle_fs_error(fs_info, err, NULL);
> 5506     return err;
> 5507 }
>
>> It means during btrfs balance, a signal could interrupt code running in
>> kernel space??!!
>
> What a wonderful world.
>
> In the cases where the fs does not crash, it displays e.g.:
>
> [ 1749.607057] BTRFS info (device xvdb): balance: start -d -m -s
> [ 1749.607154] BTRFS info (device xvdb): relocating block group 69780635648 flags data
> [ 1749.732598] BTRFS info (device xvdb): found 3 extents, stage: move data extents
> [ 1750.087368] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
> [ 1750.109675] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
> [ 1758.021840] BTRFS info (device xvdb): balance: ended with status: -4
>
> ...and, fairly quickly after pressing Ctrl-C, it exits with status 130
> because of SIGINT (128+2).

I could reproduce this now, with a fuller fs.

Although I haven't yet reproduced the transaction abort, it should already
count as a valid bug: in this case, the next balance run can cause a kernel
warning because the reloc tree has not been cleaned up yet.

This really exposed a new set of problems.

Thanks for the report, now it's time to debug it.

Thanks,
Qu

>
> But when it goes wrong, then in between pressing Ctrl-C and the forced
> readonly happening, the balance in kernel continues for some time (this
> can even span multiple subsequent block groups), until it hits the code
> path seen above (in btrfs_drop_snapshot), and it's *always* at that line.
>
> So, it seems that depending on what part of the kernel code is running
> when the signal is sent, it's queued for being processed in that
> (different) part of the running code?
>
>> I thought when we fall into the balance ioctl, we're unable to
>> receive/handle signals, as we are in kernel space, while signal
>> handling is all done in user space.
>
> System calls can be interrupted from user space, e.g. a large read that
> goes too slowly.
>
> Previously, ^C on the btrfs balance execution would exit when the
> current block group in progress was finished. So, in that case the signal
> would also be picked up somewhere in the kernel.
>
>> Or is there some config or out-of-tree patch that makes it possible? Is
>> this specific to Debian kernels?
>> At least, I tried several times with an upstream kernel and was unable to
>> reproduce it yet (maybe my fs is too small?)
>
> So, it at least seems to depend on the moment when Ctrl-C is pressed.
>
> This is a two-disk fs, where I reflinked a single file many tens of
> thousands of times to generate quite a lot of metadata. You might need
> some more data or metadata to have enough chance to hit Ctrl-C at
> the right time, but I can only make guesses about that now.
>
> -# btrfs fi show /btrfs/
> Label: none  uuid: 4771ea11-6ec6-4c00-a5f5-58acb3233659
>         Total devices 2 FS bytes used 5.76GiB
>         devid    1 size 10.00GiB used 3.50GiB path /dev/xvdb
>         devid    2 size 10.00GiB used 3.53GiB path /dev/xvdc
>
> -# btrfs-search-metadata block_groups /btrfs
> block group vaddr 78370570240 length 1073741824 flags DATA used 1072177152 used_pct 100
> block group vaddr 79444312064 length 268435456 flags METADATA used 219824128 used_pct 82
> block group vaddr 79712747520 length 33554432 flags SYSTEM used 16384 used_pct 0
> block group vaddr 79746301952 length 1073741824 flags DATA used 1071206400 used_pct 100
> block group vaddr 80820043776 length 268435456 flags METADATA used 214712320 used_pct 80
> block group vaddr 81088479232 length 1073741824 flags DATA used 1073045504 used_pct 100
> block group vaddr 82162221056 length 268435456 flags METADATA used 262979584 used_pct 98
> block group vaddr 85920317440 length 1073741824 flags DATA used 1069948928 used_pct 100
> block group vaddr 86994059264 length 1073741824 flags DATA used 15978496 used_pct 1
> block group vaddr 90349502464 length 1073741824 flags DATA used 1073246208 used_pct 100
> block group vaddr 91423244288 length 268435456 flags METADATA used 109608960 used_pct 41
>
>> If it's config related, then we must reconsider a lot of error handling.
>
> I don't know, but I don't think so.
>
>>
>> Thanks,
>> Qu
>>> [41208.183450] BTRFS info (device xvdb): forced readonly
>>> [41208.183469] BTRFS info (device xvdb): balance: ended with status: -4
>>>
>>> Boom, readonly FS.
>>>
>>> Hans
>>>
>>
>
> Hans
>
