Re: Balance + Ctrl-C = forced readonly

On 2020/7/5 10:53 PM, Hans van Kranenburg wrote:
> On 7/5/20 3:13 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/7/5 8:49 PM, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> This is Linux kernel 5.7.6 (the Debian package, 5.7.6-1).
>>>
>>> So, I wanted to try out this new quicker balance interrupt thing, and
>>> the result was that I could crash the fs at my very first try using it,
>>> which was simply doing balance, and then pressing Ctrl-C.
>>>
>>> Recipe to reproduce: Start balance, wait a few seconds, then press
>>> Ctrl-C. For me here, ~ 5 out of 10 times, it ends up exploding:
>>>
>>> -# btrfs balance start --full /btrfs/
>>> ^C
>>>
>>> [41190.572977] BTRFS info (device xvdb): balance: start -d -m -s
>>> [41190.573035] BTRFS info (device xvdb): relocating block group 73001861120 flags metadata
>>> [41205.409600] BTRFS info (device xvdb): found 12236 extents, stage: move data extents
>>> [41205.509316] BTRFS info (device xvdb): relocating block group 71928119296 flags data
>>> [41205.695319] BTRFS info (device xvdb): found 3 extents, stage: move data extents
>>> [41205.723009] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
>>> [41205.750590] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
>>> [41208.183424] BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown
>>
>> -4 means -EINTR.
> 
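For reference, the mapping from the logged value to the errno name can be
checked from user space. This is a minimal sketch, assuming Linux errno
numbering (where EINTR is 4); kernel_err_name is a hypothetical helper, not
a kernel function, and it only covers the two codes that come up in this
thread.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a negative kernel-style return code to the
 * symbolic errno name. Only the codes discussed in this thread are handled. */
const char *kernel_err_name(int err)
{
    assert(err <= 0);   /* kernel-style codes are zero or negative */

    switch (-err) {
    case EINTR:
        return "EINTR";
    case EAGAIN:
        return "EAGAIN";
    default:
        return "unknown";
    }
}
```

Calling kernel_err_name(-4) on Linux yields "EINTR", matching the
"errno=-4" in the log.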
> From extent-tree.c:
> 
>   5495         /*
>   5496          * So if we need to stop dropping the snapshot for whatever reason we
>   5497          * need to make sure to add it back to the dead root list so that we
>   5498          * keep trying to do the work later.  This also cleans up roots if we
>   5499          * don't have it in the radix (like when we recover after a power fail
>   5500          * or unmount) so we don't leak memory.
>   5501          */
>   5502         if (!for_reloc && !root_dropped)
>   5503                 btrfs_add_dead_root(root);
>   5504         if (err && err != -EAGAIN)
>   5505                 btrfs_handle_fs_error(fs_info, err, NULL);
>   5506         return err;
>   5507 }
> 
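The condition at lines 5504-5505 can be isolated to see why -EINTR ends up
in btrfs_handle_fs_error(). This is only a user-space sketch of that one
test; would_escalate_to_fs_error and check_thread_cases are hypothetical
names, and nothing else about btrfs_drop_snapshot is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the quoted check "if (err && err != -EAGAIN)":
 * returns true when the error would be handed to btrfs_handle_fs_error(). */
bool would_escalate_to_fs_error(int err)
{
    return err && err != -EAGAIN;
}

/* Self-check for the three return values relevant to this thread. */
void check_thread_cases(void)
{
    assert(!would_escalate_to_fs_error(0));        /* success: nothing to do */
    assert(!would_escalate_to_fs_error(-EAGAIN));  /* explicitly exempted */
    assert(would_escalate_to_fs_error(-EINTR));    /* the Ctrl-C case */
}
```

Only -EAGAIN is exempted, so an interrupted drop is treated like any other
failure.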
>> It means that during btrfs balance, a signal could interrupt code running
>> in kernel space??!!
> 
> What a wonderful world.
> 
> In the cases where the fs does not crash, it displays e.g.:
> 
> [ 1749.607057] BTRFS info (device xvdb): balance: start -d -m -s
> [ 1749.607154] BTRFS info (device xvdb): relocating block group 69780635648 flags data
> [ 1749.732598] BTRFS info (device xvdb): found 3 extents, stage: move data extents
> [ 1750.087368] BTRFS info (device xvdb): found 3 extents, stage: update data pointers
> [ 1750.109675] BTRFS info (device xvdb): relocating block group 60922265600 flags metadata
> [ 1758.021840] BTRFS info (device xvdb): balance: ended with status: -4
> 
> ...and it fairly quickly after pressing Ctrl-C exits 130 because SIGINT.
> (128+2)
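
The 130 follows the usual shell convention of reporting 128 plus the signal
number for a command terminated by a signal. A small sketch of that
arithmetic, assuming a shell such as bash that uses this convention;
shell_status_for_signal is a hypothetical name.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: exit status that a shell following the 128+N
 * convention reports for a command killed by signal `sig`. */
int shell_status_for_signal(int sig)
{
    assert(sig > 0);   /* signal numbers start at 1 */
    return 128 + sig;
}
```

With SIGINT being signal 2, shell_status_for_signal(SIGINT) gives 130.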

I can reproduce this now, with a fuller fs.

Although I haven't reproduced the transaction abort yet, this is already
a valid bug.

In this case, the next balance run can trigger a kernel warning because
the reloc tree has not been cleaned up yet.

This really exposes a new set of problems.

Thanks for the report, now it's time to debug it.

Thanks,
Qu

> 
> But when it goes wrong, then in between pressing Ctrl-C and the forced
> readonly happening, the balance in kernel continues for some time (this
> can be even multiple next block groups), until it hits the code path
> seen above (in btrfs_drop_snapshot), and it's *always* at that line.
> 
> So, it seems that depending on what part of the kernel code is running
> when the signal is sent, it's queued for being processed in that
> (different) part of the running code?
> 
>> I thought that once we are in the balance ioctl, we cannot receive or
>> handle signals, since we are in kernel space and signal handling is all
>> done in user space.
> 
> System calls can be interrupted from user space, e.g. a large read that
> goes too slow.
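
The user-space side of this is easy to demonstrate: a blocking read() that
is interrupted by a signal returns with EINTR. This is a minimal sketch,
not btrfs code, assuming a POSIX system; the function name, the empty pipe,
and the 50 ms timer are all choices made for the illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* The handler only needs to exist so that SIGALRM does not kill us. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Block in read() on an empty pipe, let SIGALRM fire after 50 ms, and
 * return the errno that read() reported (0 if read() did not fail). */
int errno_of_interrupted_read(void)
{
    int fds[2];
    char byte;
    struct sigaction sa, old;
    struct itimerval timer;
    ssize_t n;
    int saved;

    if (pipe(fds) != 0)
        return errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART: read() is not resumed */
    sigaction(SIGALRM, &sa, &old);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50000;  /* one-shot timer, 50 ms */
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fds[0], &byte, 1);      /* nothing is ever written, so it blocks */
    saved = (n < 0) ? errno : 0;

    sigaction(SIGALRM, &old, NULL);  /* restore the previous disposition */
    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

errno_of_interrupted_read() is expected to return EINTR, the same code
that shows up negated as -4 in the balance log.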
> 
> Previously, ^C on the btrfs balance execution would exit when the
> current block group in progress was ended. So, in that case the signal
> would also be picked up somewhere in the kernel.
> 
>> Or is there some config or out-of-tree patch that makes it possible? Is
>> this specific to Debian kernels?
>> At least I tried several times with an upstream kernel and was unable to
>> reproduce it yet (maybe my fs is too small?)
> 
> So, it at least seems to depend on the moment when Ctrl-C is pressed.
> 
> This is a two-disk fs, where I reflinked a single file many tens of
> thousands of times to generate quite some metadata. You might need
> some more data or metadata to have enough chance to hit Ctrl-C at
> the right time, but I can only make guesses about that now.
> 
> -# btrfs fi show /btrfs/
> Label: none  uuid: 4771ea11-6ec6-4c00-a5f5-58acb3233659
> 	Total devices 2 FS bytes used 5.76GiB
> 	devid    1 size 10.00GiB used 3.50GiB path /dev/xvdb
> 	devid    2 size 10.00GiB used 3.53GiB path /dev/xvdc
> 
> -# btrfs-search-metadata block_groups /btrfs
> block group vaddr 78370570240 length 1073741824 flags DATA used 1072177152 used_pct 100
> block group vaddr 79444312064 length 268435456 flags METADATA used 219824128 used_pct 82
> block group vaddr 79712747520 length 33554432 flags SYSTEM used 16384 used_pct 0
> block group vaddr 79746301952 length 1073741824 flags DATA used 1071206400 used_pct 100
> block group vaddr 80820043776 length 268435456 flags METADATA used 214712320 used_pct 80
> block group vaddr 81088479232 length 1073741824 flags DATA used 1073045504 used_pct 100
> block group vaddr 82162221056 length 268435456 flags METADATA used 262979584 used_pct 98
> block group vaddr 85920317440 length 1073741824 flags DATA used 1069948928 used_pct 100
> block group vaddr 86994059264 length 1073741824 flags DATA used 15978496 used_pct 1
> block group vaddr 90349502464 length 1073741824 flags DATA used 1073246208 used_pct 100
> block group vaddr 91423244288 length 268435456 flags METADATA used 109608960 used_pct 41
> 
>> If it's config related, then we must re-consider a lot of error handling.
> 
> I don't know, but I don't think so.
> 
>>
>> Thanks,
>> Qu
>>> [41208.183450] BTRFS info (device xvdb): forced readonly
>>> [41208.183469] BTRFS info (device xvdb): balance: ended with status: -4
>>>
>>> Boom, readonly FS.
>>>
>>> Hans
>>>
>>
> 
> Hans
> 


