Re: Balance + Ctrl-C = forced readonly


 



On 7/5/20 3:13 PM, Qu Wenruo wrote:
> 
> 
> On 2020/7/5 8:49 PM, Hans van Kranenburg wrote:
>> Hi,
>>
>> This is Linux kernel 5.7.6 (the Debian package, 5.7.6-1).
>>
>> So, I wanted to try out this new quicker balance interrupt thing, and
>> the result was that I could crash the fs at my very first try using it,
>> which was simply doing balance, and then pressing Ctrl-C.
>>
>> Recipe to reproduce: Start balance, wait a few seconds, then press
>> Ctrl-C. For me here, ~ 5 out of 10 times, it ends up exploding:
>>
>> -# btrfs balance start --full /btrfs/
>> ^C
>>
>> [41190.572977] BTRFS info (device xvdb): balance: start -d -m -s
>> [41190.573035] BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
>> [41205.409600] BTRFS info (device xvdb): found 12236 extents, stage: move data extents
>> [41205.509316] BTRFS info (device xvdb): relocating block group 71928119296 flags data
>> [41205.695319] BTRFS info (device xvdb): found 3 extents, stage: move data extents
>> [41205.723009] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>> [41205.750590] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
>> [41208.183424] BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
> 
> -4 means -EINTR.

From extent-tree.c:

  5495         /*
  5496          * So if we need to stop dropping the snapshot for whatever reason we
  5497          * need to make sure to add it back to the dead root list so that we
  5498          * keep trying to do the work later.  This also cleans up roots if we
  5499          * don't have it in the radix (like when we recover after a power fail
  5500          * or unmount) so we don't leak memory.
  5501          */
  5502         if (!for_reloc && !root_dropped)
  5503                 btrfs_add_dead_root(root);
  5504         if (err && err != -EAGAIN)
  5505                 btrfs_handle_fs_error(fs_info, err, NULL);
  5506         return err;
  5507 }
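
To make the effect of that last check concrete, here is a minimal
user-space sketch in Python. It is not kernel code: the function name
forces_readonly is made up for illustration, and it assumes Linux errno
numbering. It only mirrors the `err && err != -EAGAIN` test at lines
5504-5505, where anything that passes is handed to
btrfs_handle_fs_error().

```python
import errno


def forces_readonly(err: int) -> bool:
    """Mirror the check at lines 5504-5505 of the excerpt.

    Hypothetical helper: returns True for every nonzero error except
    -EAGAIN, i.e. for every value that would reach btrfs_handle_fs_error().
    """
    return err != 0 and err != -errno.EAGAIN


# Kernel code returns negated errno values, so "errno=-4" is errno 4.
print(errno.errorcode[4])               # symbolic name for errno 4
print(forces_readonly(-errno.EINTR))    # interrupted by a signal
print(forces_readonly(-errno.EAGAIN))   # the one error that is tolerated
print(forces_readonly(0))               # success
```

So with this check, an -EINTR coming out of the snapshot drop is treated
like any other fatal error.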

> It means during btrfs balance, signal could interrupt code running in
> kernel space??!!

What a wonderful world.

In the cases where the fs does not crash, it displays e.g.:

[ 1749.607057] BTRFS info (device xvdb): balance: start -d -m -s
[ 1749.607154] BTRFS info (device xvdb): relocating block group 69780635648 flags data
[ 1749.732598] BTRFS info (device xvdb): found 3 extents, stage: move data extents
[ 1750.087368] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
[ 1750.109675] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
[ 1758.021840] BTRFS info (device xvdb): balance: ended with status: -4

...and fairly quickly after pressing Ctrl-C it exits with status 130
because of the SIGINT (128+2).
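
For reference, the 128+2 is just the shell convention for a process that
was terminated by signal 2. A small Python sketch of that arithmetic,
assuming a POSIX system; the child here is a stand-in that sends SIGINT
to itself, not the btrfs command:

```python
import signal
import subprocess
import sys

# Stand-in child: restore the default SIGINT action, then send SIGINT to
# itself, which is roughly what Ctrl-C does to a foreground command.
child_code = (
    "import os, signal;"
    "signal.signal(signal.SIGINT, signal.SIG_DFL);"
    "os.kill(os.getpid(), signal.SIGINT)"
)
proc = subprocess.run([sys.executable, "-c", child_code])

# subprocess reports death by signal as a negative signal number ...
killed_by = -proc.returncode
# ... while a shell reports the same thing in $? as 128 + signal number.
shell_status = 128 + killed_by
print(killed_by, shell_status)
```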

But when it goes wrong, then between pressing Ctrl-C and the forced
readonly happening, the balance in the kernel continues for some time
(sometimes for several more block groups), until it hits the code path
seen above (in btrfs_drop_snapshot), and it's *always* at that line.

So it seems that, depending on which part of the kernel code is running
when the signal is sent, the signal stays queued until it is processed
in that (different) part of the code?

> I thought when we fall into the balance ioctl, we're unable to
> receive/handle signal, as we are in the kernel space, while signal
> handling are all handled in user space.

System calls can be interrupted from user space, e.g. a large read that
is going too slowly.
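
A minimal user-space illustration of that, as a Python sketch for a POSIX
system. The names Interrupted, on_alarm and blocking_read_interrupted are
made up for this example. Python normally retries an interrupted system
call by itself (PEP 475) unless the signal handler raises, so the handler
raises here; in C the same read() would typically return -1 with errno
set to EINTR when the handler is installed without SA_RESTART.

```python
import os
import signal


class Interrupted(Exception):
    """Raised from the signal handler to abandon the blocked call."""


def on_alarm(signum, frame):
    raise Interrupted(signum)


def blocking_read_interrupted(timeout: float = 0.2) -> bool:
    """Block in read(2) on an empty pipe; report whether a signal cut it short."""
    rfd, wfd = os.pipe()                  # nothing is ever written to wfd
    old = signal.signal(signal.SIGALRM, on_alarm)
    try:
        signal.setitimer(signal.ITIMER_REAL, timeout)
        os.read(rfd, 1)                   # would block forever without the signal
        return False
    except Interrupted:
        return True
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel the timer
        signal.signal(signal.SIGALRM, old)        # restore previous handler
        os.close(rfd)
        os.close(wfd)


print(blocking_read_interrupted())
```

This has to run in the main thread, since Python only allows installing
signal handlers there.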

Previously, pressing ^C during a btrfs balance would make it exit once
the block group in progress was finished. So in that case, too, the
signal would be picked up somewhere in the kernel.

> Or is there some config or out-of-tree patches make it possible? Is this
> specific to Debian kernels?
> At least I tried several times with upstream kernel, unable to reproduce
> it yet (maybe my fs is too small?)

So, it at least seems to depend on the moment when Ctrl-C is pressed.

This is a two-disk fs, where I reflinked a single file many tens of
thousands of times to generate quite some metadata. You might need
some more data or metadata to have enough chance to hit Ctrl-C at
the right time, but I can only make guesses about that now.

-# btrfs fi show /btrfs/
Label: none  uuid: 4771ea11-6ec6-4c00-a5f5-58acb3233659
	Total devices 2 FS bytes used 5.76GiB
	devid    1 size 10.00GiB used 3.50GiB path /dev/xvdb
	devid    2 size 10.00GiB used 3.53GiB path /dev/xvdc

-# btrfs-search-metadata block_groups /btrfs
block group vaddr 78370570240 length 1073741824 flags DATA used 1072177152 used_pct 100
block group vaddr 79444312064 length 268435456 flags METADATA used 219824128 used_pct 82
block group vaddr 79712747520 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 79746301952 length 1073741824 flags DATA used 1071206400 used_pct 100
block group vaddr 80820043776 length 268435456 flags METADATA used 214712320 used_pct 80
block group vaddr 81088479232 length 1073741824 flags DATA used 1073045504 used_pct 100
block group vaddr 82162221056 length 268435456 flags METADATA used 262979584 used_pct 98
block group vaddr 85920317440 length 1073741824 flags DATA used 1069948928 used_pct 100
block group vaddr 86994059264 length 1073741824 flags DATA used 15978496 used_pct 1
block group vaddr 90349502464 length 1073741824 flags DATA used 1073246208 used_pct 100
block group vaddr 91423244288 length 268435456 flags METADATA used 109608960 used_pct 41

> If it's config related, then we must re-consider a lot of error handling.

I don't know, but I don't think so.

> 
> Thanks,
> Qu
>> [41208.183450] BTRFS info (device xvdb): forced readonly
>> [41208.183469] BTRFS info (device xvdb): balance: ended with status: -4
>>
>> Boom, readonly FS.
>>
>> Hans
>>
> 

Hans


