Re: [PATCH] btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info

On Mon, Apr 27, 2020 at 02:50:14PM +0800, Qu Wenruo wrote:
> [BUG]
> One run of btrfs/063 triggered the following lockdep warning:
>   ============================================
>   WARNING: possible recursive locking detected
>   5.6.0-rc7-custom+ #48 Not tainted
>   --------------------------------------------
>   kworker/u24:0/7 is trying to acquire lock:
>   ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
> 
>   but task is already holding lock:
>   ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(sb_internal#2);
>     lock(sb_internal#2);
> 
>    *** DEADLOCK ***
> 
>    May be due to missing lock nesting notation
> 
>   4 locks held by kworker/u24:0/7:
>    #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80
>    #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80
>    #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs]
>    #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs]
> 
>   stack backtrace:
>   CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ #48
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>   Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
>   Call Trace:
>    dump_stack+0xc2/0x11a
>    __lock_acquire.cold+0xce/0x214
>    lock_acquire+0xe6/0x210
>    __sb_start_write+0x14e/0x290
>    start_transaction+0x66c/0x890 [btrfs]
>    btrfs_join_transaction+0x1d/0x20 [btrfs]
>    find_free_extent+0x1504/0x1a50 [btrfs]
>    btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
>    btrfs_alloc_tree_block+0x1ac/0x570 [btrfs]
>    btrfs_copy_root+0x213/0x580 [btrfs]
>    create_reloc_root+0x3bd/0x470 [btrfs]
>    btrfs_init_reloc_root+0x2d2/0x310 [btrfs]
>    record_root_in_trans+0x191/0x1d0 [btrfs]
>    btrfs_record_root_in_trans+0x90/0xd0 [btrfs]
>    start_transaction+0x16e/0x890 [btrfs]
>    btrfs_join_transaction+0x1d/0x20 [btrfs]
>    btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs]
>    finish_ordered_fn+0x15/0x20 [btrfs]
>    btrfs_work_helper+0x116/0x9a0 [btrfs]
>    process_one_work+0x632/0xb80
>    worker_thread+0x80/0x690
>    kthread+0x1a3/0x1f0
>    ret_from_fork+0x27/0x50
> 
> It's pretty hard to reproduce; only one hit so far.

I haven't found this report in any of my logs, so it does seem hard to hit.

> [CAUSE]
> This is because we're calling btrfs_join_transaction() without re-using
> the currently running transaction:
> 
> btrfs_finish_ordered_io()
> |- btrfs_join_transaction()		<<< Call #1
>    |- btrfs_record_root_in_trans()
>       |- btrfs_reserve_extent()
>          |- btrfs_join_transaction()	<<< Call #2
> 
> Normally such a btrfs_join_transaction() call should re-use the existing
> one, without trying to re-start a transaction.
> 
> But the problem is, in btrfs_join_transaction() call #1, we call
> btrfs_record_root_in_trans() before initializing current::journal_info.
> 
> And in btrfs_join_transaction() call #2, we're relying on
> current::journal_info to avoid such a deadlock.
> 
> [FIX]
> Call btrfs_record_root_in_trans() after we have initialized
> current::journal_info.
> 
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

Added to misc-next, thanks.
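
For anyone reading this later, the ordering problem from [CAUSE] can be
modeled in a few lines of userspace C. This is only a sketch, not the btrfs
code: every name below (trans_handle, join_transaction, record_root,
count_acquisitions, the init_first flag) is hypothetical, a plain counter
stands in for taking sb_internal, and a global pointer stands in for
current::journal_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's real definitions. */
struct trans_handle {
	int use_count;
};

/* Stands in for current::journal_info. */
static struct trans_handle *journal_info;
/* Counts how often the modeled sb_internal lock is taken. */
static int sb_internal_acquisitions;

static struct trans_handle *join_transaction(int init_first, int depth);

static void end_transaction(struct trans_handle *h)
{
	assert(h->use_count > 0);
	if (--h->use_count > 0)
		return;		/* a nested user dropped its reference */
	if (journal_info == h)
		journal_info = NULL;
	free(h);
}

/* Models the record-root path, which ends up joining a transaction again. */
static void record_root(int init_first, int depth)
{
	if (depth == 0) {
		struct trans_handle *nested = join_transaction(init_first, depth + 1);

		end_transaction(nested);
	}
}

static struct trans_handle *join_transaction(int init_first, int depth)
{
	struct trans_handle *h;

	if (journal_info) {
		/* Re-use path: no second lock acquisition. */
		journal_info->use_count++;
		return journal_info;
	}

	sb_internal_acquisitions++;	/* modeled lock acquisition */
	h = calloc(1, sizeof(*h));
	if (!h)
		abort();
	h->use_count = 1;

	if (init_first) {
		/* Fixed order: publish the handle, then record the root. */
		journal_info = h;
		record_root(init_first, depth);
	} else {
		/* Buggy order: the nested join cannot see the handle yet. */
		record_root(init_first, depth);
		journal_info = h;
	}
	return h;
}

/* Runs one outer join/end cycle and reports how often the lock was taken. */
static int count_acquisitions(int init_first)
{
	struct trans_handle *h;

	sb_internal_acquisitions = 0;
	journal_info = NULL;
	h = join_transaction(init_first, 0);
	end_transaction(h);
	return sb_internal_acquisitions;
}
```

In this model, count_acquisitions(0) takes the modeled lock twice on one
task, which is the recursive acquisition lockdep complained about, while
count_acquisitions(1) takes it once because the nested join finds the
handle and only bumps its use count.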


