On 7/5/20 3:13 PM, Qu Wenruo wrote:
>
>
> On 2020/7/5 8:49 PM, Hans van Kranenburg wrote:
>> Hi,
>>
>> This is Linux kernel 5.7.6 (the Debian package, 5.7.6-1).
>>
>> So, I wanted to try out this new quicker balance interrupt thing, and
>> the result was that I could crash the fs at my very first try using it,
>> which was simply doing balance, and then pressing Ctrl-C.
>>
>> Recipe to reproduce: Start balance, wait a few seconds, then press
>> Ctrl-C. For me here, ~ 5 out of 10 times, it ends up exploding:
>>
>> -# btrfs balance start --full /btrfs/
>> ^C
>>
>> [41190.572977] BTRFS info (device xvdb): balance: start -d -m -s
>> [41190.573035] BTRFS info (device xvdb): relocating block group
>> 73001861120 flags metadata
>> [41205.409600] BTRFS info (device xvdb): found 12236 extents, stage:
>> move data extents
>> [41205.509316] BTRFS info (device xvdb): relocating block group
>> 71928119296 flags data
>> [41205.695319] BTRFS info (device xvdb): found 3 extents, stage: move
>> data extents
>> [41205.723009] BTRFS info (device xvdb): found 3 extents, stage: update
>> data pointers
>> [41205.750590] BTRFS info (device xvdb): relocating block group
>> 60922265600 flags metadata
>> [41208.183424] BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505:
>> errno=-4 unknown
>
> -4 means -EINTR.

From extent-tree.c:

5495         /*
5496          * So if we need to stop dropping the snapshot for whatever reason we
5497          * need to make sure to add it back to the dead root list so that we
5498          * keep trying to do the work later. This also cleans up roots if we
5499          * don't have it in the radix (like when we recover after a power fail
5500          * or unmount) so we don't leak memory.
5501          */
5502         if (!for_reloc && !root_dropped)
5503                 btrfs_add_dead_root(root);
5504         if (err && err != -EAGAIN)
5505                 btrfs_handle_fs_error(fs_info, err, NULL);
5506         return err;
5507 }

> It means during btrfs balance, signal could interrupt code running in
> kernel space??!! What a wonderful world.

In the cases where the fs does not crash, it displays e.g.:

[ 1749.607057] BTRFS info (device xvdb): balance: start -d -m -s
[ 1749.607154] BTRFS info (device xvdb): relocating block group 69780635648 flags data
[ 1749.732598] BTRFS info (device xvdb): found 3 extents, stage: move data extents
[ 1750.087368] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
[ 1750.109675] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
[ 1758.021840] BTRFS info (device xvdb): balance: ended with status: -4

...and the btrfs command exits fairly quickly after pressing Ctrl-C,
with exit status 130 because of SIGINT (128+2).

But when it goes wrong, the balance continues in the kernel for some
time between pressing Ctrl-C and the forced readonly (this can even
span several subsequent block groups), until it hits the code path seen
above (in btrfs_drop_snapshot), and it's *always* at that line.

So it seems that, depending on which part of the kernel code is running
when the signal is sent, it gets processed in that (different) part of
the running code?

> I thought when we fall into the balance ioctl, we're unable to
> receive/handle signal, as we are in the kernel space, while signal
> handling are all handled in user space.

System calls can be interrupted from user space, e.g. a large read that
takes too long. Previously, ^C on a running btrfs balance would make it
exit once the block group currently in progress was finished. So in
that case the signal was also picked up somewhere in the kernel.

> Or is there some config or out-of-tree patches make it possible? Is this
> specific to Debian kernels?
> At least I tried several times with upstream kernel, unable to reproduce
> it yet (maybe my fs is too small?)

So it at least seems to depend on the exact moment when Ctrl-C is
pressed.

This is a two-disk fs, where I reflinked a single file many tens of
thousands of times to generate quite a lot of metadata.

You might need more data or metadata to have enough chance of hitting
Ctrl-C at the right time, but I can only guess about that for now.

-# btrfs fi show /btrfs/
Label: none uuid: 4771ea11-6ec6-4c00-a5f5-58acb3233659
	Total devices 2 FS bytes used 5.76GiB
	devid 1 size 10.00GiB used 3.50GiB path /dev/xvdb
	devid 2 size 10.00GiB used 3.53GiB path /dev/xvdc

-# btrfs-search-metadata block_groups /btrfs
block group vaddr 78370570240 length 1073741824 flags DATA used 1072177152 used_pct 100
block group vaddr 79444312064 length 268435456 flags METADATA used 219824128 used_pct 82
block group vaddr 79712747520 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 79746301952 length 1073741824 flags DATA used 1071206400 used_pct 100
block group vaddr 80820043776 length 268435456 flags METADATA used 214712320 used_pct 80
block group vaddr 81088479232 length 1073741824 flags DATA used 1073045504 used_pct 100
block group vaddr 82162221056 length 268435456 flags METADATA used 262979584 used_pct 98
block group vaddr 85920317440 length 1073741824 flags DATA used 1069948928 used_pct 100
block group vaddr 86994059264 length 1073741824 flags DATA used 15978496 used_pct 1
block group vaddr 90349502464 length 1073741824 flags DATA used 1073246208 used_pct 100
block group vaddr 91423244288 length 268435456 flags METADATA used 109608960 used_pct 41

> If it's config related, then we must re-consider a lot of error handling.

I don't know, but I don't think so.

>
> Thanks,
> Qu
>> [41208.183450] BTRFS info (device xvdb): forced readonly
>> [41208.183469] BTRFS info (device xvdb): balance: ended with status: -4
>>
>> Boom, readonly FS.
>>
>> Hans
>>
>
Hans
