Re: Balance loops: what we know so far

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/5/15 上午1:44, Zygo Blaxell wrote:
> On Thu, May 14, 2020 at 04:55:18PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/5/14 下午4:08, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/5/13 下午8:21, Zygo Blaxell wrote:
>>>> On Wed, May 13, 2020 at 07:23:40PM +0800, Qu Wenruo wrote:
>>>>>
>>>>>
>>> [...]
>>>>
>>>> Kernel log:
>>>>
>>>> 	[96199.614869][ T9676] BTRFS info (device dm-0): balance: start -d
>>>> 	[96199.616086][ T9676] BTRFS info (device dm-0): relocating block group 4396679168000 flags data
>>>> 	[96199.782217][ T9676] BTRFS info (device dm-0): relocating block group 4395605426176 flags data
>>>> 	[96199.971118][ T9676] BTRFS info (device dm-0): relocating block group 4394531684352 flags data
>>>> 	[96220.858317][ T9676] BTRFS info (device dm-0): found 13 extents, loops 1, stage: move data extents
>>>> 	[...]
>>>> 	[121403.509718][ T9676] BTRFS info (device dm-0): found 13 extents, loops 131823, stage: update data pointers
>>>> 	(qemu) stop
>>>>
>>>> btrfs-image URL:
>>>>
>>>> 	http://www.furryterror.org/~zblaxell/tmp/.fsinqz/image.bin
>>>>
>>> The image shows several very strange result.
>>>
>>> For one, although we're relocating block group 4394531684352, the
>>> previous two block groups doesn't really get relocated.
>>>
>>> There are still extents there, all belongs to data reloc tree.
>>>
>>> Furthermore, the data reloc tree inode 620 should be evicted when
>>> previous block group relocation finishes.
>>>
>>> So I'm considering something went wrong in data reloc tree, would you
>>> please try the following diff?
>>> (Either on vanilla kernel or with my previous useless patch)
>>
>> Oh, my previous testing patch is doing wrong inode put for data reloc
>> tree, thus it's possible to lead to such situation.
>>
>> Thankfully the v2 for upstream gets the problem fixed.
>>
>> Thus it goes back to the original stage, still no faster way to
>> reproduce the problem...
> 
> Can we attack the problem by logging kernel activity?  Like can we
> log whenever we add or remove items from the data reloc tree, or
> why we don't?
> 
> I can get a filesystem with a single data block group and a single
> (visible) extent that loops, and somehow it's so easy to do that that I'm
> having problems making filesystems _not_ do it.  What can we do with that?

OK, finally got it reproduced, but it's way more complex than I thought.

First, we need to cancel some balance.
Then if the canceling timing is good, next balance will always hang.

Furthermore, if the kernel has CONFIG_BTRFS_DEBUG compiled, the kernel
would report leaking reloc tree, then followed by NULL pointer dereference.

Now since I can reproduce it reliably, I guess I don't need to bother
you every time I have some new things to try.

Thanks for your report!
Qu

> 
> What am I (and everyone else with this problem) doing that you are not?
> Usually that difference is "I'm running bees" but we're running out of
> bugs related to LOGICAL_INO and the dedupe ioctl, and I think other people
> are reporting the problem without running bees.  I'm also running balance
> cancels, which seem to increase the repro rate (though they might just
> be increasing the number of balances tested per day, and there could be
> just a fixed percentage of balances that loop).
> 
> I will see if I can build a standalone kvm image that generates balance
> loops on blank disks.  If I'm successful, you can download it and then
> run all the experiments you want.
> 
> I also want to see if reverting the extended reloc tree lifespan patch
> (d2311e698578 "btrfs: relocation: Delay reloc tree deletion after
> merge_reloc_roots") stops the looping on misc-next.  I found that
> reverting that patch stops the balance looping on 5.1.21 in an earlier
> experiment.  Maybe there are two bugs here, and we've already fixed one,
> but the symptom won't go away because some second bug has appeared.
> 
> 
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Qu
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index 9afc1a6928cf..ef9e18bab6f6 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -3498,6 +3498,7 @@ struct inode *create_reloc_inode(struct
>>> btrfs_fs_info *fs_info,
>>>         BTRFS_I(inode)->index_cnt = group->start;
>>>
>>>         err = btrfs_orphan_add(trans, BTRFS_I(inode));
>>> +       WARN_ON(atomic_read(inode->i_count) != 1);
>>>  out:
>>>         btrfs_put_root(root);
>>>         btrfs_end_transaction(trans);
>>> @@ -3681,6 +3682,7 @@ int btrfs_relocate_block_group(struct
>>> btrfs_fs_info *fs_info, u64 group_start)
>>>  out:
>>>         if (err && rw)
>>>                 btrfs_dec_block_group_ro(rc->block_group);
>>> +       WARN_ON(atomic_read(inode->i_count) != 1);
>>>         iput(rc->data_inode);
>>>         btrfs_put_block_group(rc->block_group);
>>>         free_reloc_control(rc);
>>>
>>
> 
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux