On 2020/5/15 1:44 AM, Zygo Blaxell wrote:
> On Thu, May 14, 2020 at 04:55:18PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/5/14 4:08 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/5/13 8:21 PM, Zygo Blaxell wrote:
>>>> On Wed, May 13, 2020 at 07:23:40PM +0800, Qu Wenruo wrote:
>>>>>
>>>>>
>>> [...]
>>>>
>>>> Kernel log:
>>>>
>>>> [96199.614869][ T9676] BTRFS info (device dm-0): balance: start -d
>>>> [96199.616086][ T9676] BTRFS info (device dm-0): relocating block group 4396679168000 flags data
>>>> [96199.782217][ T9676] BTRFS info (device dm-0): relocating block group 4395605426176 flags data
>>>> [96199.971118][ T9676] BTRFS info (device dm-0): relocating block group 4394531684352 flags data
>>>> [96220.858317][ T9676] BTRFS info (device dm-0): found 13 extents, loops 1, stage: move data extents
>>>> [...]
>>>> [121403.509718][ T9676] BTRFS info (device dm-0): found 13 extents, loops 131823, stage: update data pointers
>>>> (qemu) stop
>>>>
>>>> btrfs-image URL:
>>>>
>>>> http://www.furryterror.org/~zblaxell/tmp/.fsinqz/image.bin
>>>>
>>> The image shows several very strange results.
>>>
>>> For one, although we're relocating block group 4394531684352, the
>>> previous two block groups don't really get relocated.
>>>
>>> There are still extents there, all belonging to the data reloc tree.
>>>
>>> Furthermore, the data reloc tree inode 620 should be evicted when
>>> the previous block group relocation finishes.
>>>
>>> So I suspect something went wrong in the data reloc tree; would you
>>> please try the following diff?
>>> (Either on a vanilla kernel or with my previous useless patch)
>>
>> Oh, my previous testing patch does a wrong inode put for the data reloc
>> tree, so it could lead to exactly such a situation.
>>
>> Thankfully the v2 sent upstream fixes that problem.
>>
>> Thus we're back to the original state: still no faster way to
>> reproduce the problem...
>
> Can we attack the problem by logging kernel activity? Like, can we
> log whenever we add or remove items from the data reloc tree, or
> why we don't?
>
> I can get a filesystem with a single data block group and a single
> (visible) extent that loops, and somehow it's so easy to do that that I'm
> having problems making filesystems _not_ do it. What can we do with that?

OK, finally got it reproduced, but it's way more complex than I thought.

First, we need to cancel a balance. Then, if the cancellation timing is
right, the next balance will always hang. (A rough ioctl-level sketch of
this cancel-then-balance sequence is appended at the end of this mail.)

Furthermore, if the kernel is compiled with CONFIG_BTRFS_DEBUG, it
reports a leaking reloc tree, followed by a NULL pointer dereference.

Now that I can reproduce it reliably, I guess I don't need to bother you
every time I have something new to try.

Thanks for your report!
Qu

> What am I (and everyone else with this problem) doing that you are not?
> Usually that difference is "I'm running bees", but we're running out of
> bugs related to LOGICAL_INO and the dedupe ioctl, and I think other people
> are reporting the problem without running bees. I'm also running balance
> cancels, which seem to increase the repro rate (though they might just
> be increasing the number of balances tested per day, and there could be
> just a fixed percentage of balances that loop).
>
> I will see if I can build a standalone kvm image that generates balance
> loops on blank disks. If I'm successful, you can download it and then
> run all the experiments you want.
>
> I also want to see if reverting the extended reloc tree lifespan patch
> (d2311e698578 "btrfs: relocation: Delay reloc tree deletion after
> merge_reloc_roots") stops the looping on misc-next. I found that
> reverting that patch stops the balance looping on 5.1.21 in an earlier
> experiment. Maybe there are two bugs here, and we've already fixed one,
> but the symptom won't go away because some second bug has appeared.
>
>> Thanks,
>> Qu
>>>
>>> Thanks,
>>> Qu
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index 9afc1a6928cf..ef9e18bab6f6 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -3498,6 +3498,7 @@ struct inode *create_reloc_inode(struct btrfs_fs_info *fs_info,
>>>  	BTRFS_I(inode)->index_cnt = group->start;
>>>
>>>  	err = btrfs_orphan_add(trans, BTRFS_I(inode));
>>> +	WARN_ON(atomic_read(&inode->i_count) != 1);
>>>  out:
>>>  	btrfs_put_root(root);
>>>  	btrfs_end_transaction(trans);
>>> @@ -3681,6 +3682,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
>>>  out:
>>>  	if (err && rw)
>>>  		btrfs_dec_block_group_ro(rc->block_group);
>>> +	WARN_ON(atomic_read(&rc->data_inode->i_count) != 1);
>>>  	iput(rc->data_inode);
>>>  	btrfs_put_block_group(rc->block_group);
>>>  	free_reloc_control(rc);
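
For reference, here is roughly what the cancel-then-balance sequence
described above looks like at the ioctl level. This is a sketch, not a
tested reproducer: BTRFS_IOC_BALANCE_V2 and BTRFS_IOC_BALANCE_CTL are the
stock balance interface from <linux/btrfs.h>, but the one-second sleep
merely stands in for the "good timing" of the cancel and will almost
certainly need tuning:

    /*
     * Sketch: start a data balance, cancel it mid-flight, then start
     * another balance.  Per the report above, the second balance is
     * the one expected to hang.  Needs root and a btrfs mount point.
     */
    #include <fcntl.h>
    #include <linux/btrfs.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
    	struct btrfs_ioctl_balance_args args;
    	int fd;

    	if (argc != 2) {
    		fprintf(stderr, "usage: %s <btrfs-mountpoint>\n", argv[0]);
    		return 1;
    	}
    	fd = open(argv[1], O_RDONLY);
    	if (fd < 0)
    		return 1;

    	memset(&args, 0, sizeof(args));
    	args.flags = BTRFS_BALANCE_DATA;   /* like "btrfs balance start -d" */

    	if (fork() == 0) {
    		/* child: blocks until the balance finishes or is cancelled */
    		ioctl(fd, BTRFS_IOC_BALANCE_V2, &args);
    		_exit(0);
    	}

    	sleep(1);                          /* the timing-sensitive part */
    	ioctl(fd, BTRFS_IOC_BALANCE_CTL, BTRFS_BALANCE_CTL_CANCEL);
    	wait(NULL);

    	/* second balance: the one expected to hang */
    	memset(&args, 0, sizeof(args));
    	args.flags = BTRFS_BALANCE_DATA;
    	ioctl(fd, BTRFS_IOC_BALANCE_V2, &args);
    	return 0;
    }

If the hang reproduces, the second ioctl never returns and the kernel log
should show the same "found N extents, loops ..." pattern as the capture
quoted above.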
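
As for the logging approach Zygo asks about above, a minimal sketch of
the kind of instrumentation one could drop into fs/btrfs/relocation.c.
The helper name and its call site are hypothetical; struct reloc_control,
its stage/extents_found/block_group fields, and the MOVE_DATA_EXTENTS /
UPDATE_DATA_PTRS constants are the ones already used in that file, so no
extra includes should be needed:

    /*
     * Sketch only: log each relocation pass so we can see whether the
     * "found 13 extents, loops 131823" counter keeps chewing on the
     * same extents.  Meant to live inside fs/btrfs/relocation.c and be
     * called from the main loop in relocate_block_group().
     */
    static void log_reloc_pass(struct reloc_control *rc)
    {
    	trace_printk("reloc bg %llu stage '%s': %llu extents found\n",
    		     rc->block_group->start,
    		     rc->stage == MOVE_DATA_EXTENTS ?
    				"move data extents" : "update data pointers",
    		     rc->extents_found);
    }

A matching trace_printk() wherever extents are added to or dropped from
the data reloc tree would cover the "or why we don't" half of the
question, and reading the output back via /sys/kernel/debug/tracing/trace
is cheap enough to leave running across many balances.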
