Re: [PATCH][RFC] btrfs: fix relocation panic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 25, 2019 at 05:45:51PM +0000, Filipe Manana wrote:
> On Mon, Feb 25, 2019 at 4:16 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > We've been seeing the following sporadically throughout our fleet
> >
> > panic: kernel BUG at fs/btrfs/relocation.c:4584!
> > netversion: 5.0-0
> > Backtrace:
> >  #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8
> >  #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c
> >  #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad
> >  #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a
> >  #4 [ffffc90003adb9c0] do_trap at ffffffff81019114
> >  #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0
> >  #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b
> >     [exception RIP: btrfs_reloc_cow_block+692]
> >     RIP: ffffffff8143b614  RSP: ffffc90003adbb68  RFLAGS: 00010246
> >     RAX: fffffffffffffff7  RBX: ffff8806b9c32000  RCX: ffff8806aad00690
> >     RDX: ffff880850b295e0  RSI: ffff8806b9c32000  RDI: ffff88084f205bd0
> >     RBP: ffff880849415000   R8: ffffc90003adbbe0   R9: ffff88085ac90000
> >     R10: ffff8805f7369140  R11: 0000000000000000  R12: ffff880850b295e0
> >     R13: ffff88084f205bd0  R14: 0000000000000000  R15: 0000000000000000
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >  #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd
> >  #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3
> >  #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c
> >
> > The way relocation moves data extents is by creating a reloc inode and
> > preallocating extents in this inode and then copying the data into these
> > preallocated extents.  Once we've done this for all of our extents,
> > we'll write out these dirty pages, which marks the extent written, and
> > goes into btrfs_reloc_cow_block().  From here we get our current
> > reloc_control, which _should_ match the reloc_control for the current
> > block group we're relocating.
> >
> > However if we get an ENOSPC in this path at some point we'll bail out,
> > never initiating writeback on this inode.  Not a huge deal, unless we
> > happen to be doing relocation on a different block group, and this block
> > group is now rc->stage == UPDATE_DATA_PTRS.  This trips the BUG_ON() in
> > btrfs_reloc_cow_block(), because we expect to be done modifying the data
> > inode.  We are in fact done modifying the metadata for the data inode
> > we're currently using, but not the one from the failed block group, and
> > thus we BUG_ON().
> >
> > Fix this by writing out the reloc data inode always, and then breaking
> > out of the loop after that point to keep from tripping this BUG_ON()
> > later.
> >
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> 
> Perhaps it's worth noting that this happens when writeback finishes
> for extents from the previous group, when we are
> at btrfs_finish_ordered_io() which updates the data reloc tree (inode
> item, drops/adds extent items, etc).
> 
> Anyway, looks good to me as it is.

I'll add it to the changelog, and add the patch to misc-next, thanks.



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux