On Thu, Jun 13, 2019 at 05:30:34PM +0800, Qu Wenruo wrote:
> [BUG]
> Lockdep will report the following circular locking dependency:
>
> WARNING: possible circular locking dependency detected
> 5.2.0-rc2-custom #24 Tainted: G O
> ------------------------------------------------------
> btrfs/8631 is trying to acquire lock:
> 000000002536438c (&fs_info->qgroup_ioctl_lock#2){+.+.}, at: btrfs_qgroup_inherit+0x40/0x620 [btrfs]
>
> but task is already holding lock:
> 000000003d52cc23 (&fs_info->tree_log_mutex){+.+.}, at: create_pending_snapshot+0x8b6/0xe60 [btrfs]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&fs_info->tree_log_mutex){+.+.}:
> __mutex_lock+0x76/0x940
> mutex_lock_nested+0x1b/0x20
> btrfs_commit_transaction+0x475/0xa00 [btrfs]
> btrfs_commit_super+0x71/0x80 [btrfs]
> close_ctree+0x2bd/0x320 [btrfs]
> btrfs_put_super+0x15/0x20 [btrfs]
> generic_shutdown_super+0x72/0x110
> kill_anon_super+0x18/0x30
> btrfs_kill_super+0x16/0xa0 [btrfs]
> deactivate_locked_super+0x3a/0x80
> deactivate_super+0x51/0x60
> cleanup_mnt+0x3f/0x80
> __cleanup_mnt+0x12/0x20
> task_work_run+0x94/0xb0
> exit_to_usermode_loop+0xd8/0xe0
> do_syscall_64+0x210/0x240
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&fs_info->reloc_mutex){+.+.}:
> __mutex_lock+0x76/0x940
> mutex_lock_nested+0x1b/0x20
> btrfs_commit_transaction+0x40d/0xa00 [btrfs]
> btrfs_quota_enable+0x2da/0x730 [btrfs]
> btrfs_ioctl+0x2691/0x2b40 [btrfs]
> do_vfs_ioctl+0xa9/0x6d0
> ksys_ioctl+0x67/0x90
> __x64_sys_ioctl+0x1a/0x20
> do_syscall_64+0x65/0x240
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&fs_info->qgroup_ioctl_lock#2){+.+.}:
> lock_acquire+0xa7/0x190
> __mutex_lock+0x76/0x940
> mutex_lock_nested+0x1b/0x20
> btrfs_qgroup_inherit+0x40/0x620 [btrfs]
> create_pending_snapshot+0x9d7/0xe60 [btrfs]
> create_pending_snapshots+0x94/0xb0 [btrfs]
> btrfs_commit_transaction+0x415/0xa00 [btrfs]
> btrfs_mksubvol+0x496/0x4e0 [btrfs]
> btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
> btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
> btrfs_ioctl+0xa90/0x2b40 [btrfs]
> do_vfs_ioctl+0xa9/0x6d0
> ksys_ioctl+0x67/0x90
> __x64_sys_ioctl+0x1a/0x20
> do_syscall_64+0x65/0x240
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
> &fs_info->qgroup_ioctl_lock#2 --> &fs_info->reloc_mutex --> &fs_info->tree_log_mutex
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&fs_info->tree_log_mutex);
> lock(&fs_info->reloc_mutex);
> lock(&fs_info->tree_log_mutex);
> lock(&fs_info->qgroup_ioctl_lock#2);
>
> *** DEADLOCK ***
>
> 6 locks held by btrfs/8631:
> #0: 00000000ed8f23f6 (sb_writers#12){.+.+}, at: mnt_want_write_file+0x28/0x60
> #1: 000000009fb1597a (&type->i_mutex_dir_key#10/1){+.+.}, at: btrfs_mksubvol+0x70/0x4e0 [btrfs]
> #2: 0000000088c5ad88 (&fs_info->subvol_sem){++++}, at: btrfs_mksubvol+0x128/0x4e0 [btrfs]
> #3: 000000009606fc3e (sb_internal#2){.+.+}, at: start_transaction+0x37a/0x520 [btrfs]
> #4: 00000000f82bbdf5 (&fs_info->reloc_mutex){+.+.}, at: btrfs_commit_transaction+0x40d/0xa00 [btrfs]
> #5: 000000003d52cc23 (&fs_info->tree_log_mutex){+.+.}, at: create_pending_snapshot+0x8b6/0xe60 [btrfs]
>
> [CAUSE]
> Due to the delayed subvolume creation, we need to call
> btrfs_qgroup_inherit() inside commit transaction code, with a lot of
> other mutex hold.
> This hell of lock chain can lead to above problem.
>
> [FIX]
> On the other hand, we don't really need to hold qgroup_ioctl_lock if
> we're in the context of create_pending_snapshot().
> As in that context, we're the only one being able to modify qgroup.
>
> All other qgroup functions which needs qgroup_ioctl_lock are either
> holding a transaction handle, or will start a new transaction:
> Functions will start a new transaction():
> * btrfs_quota_enable()
> * btrfs_quota_disable()
> Functions hold a transaction handler:
> * btrfs_add_qgroup_relation()
> * btrfs_del_qgroup_relation()
> * btrfs_create_qgroup()
> * btrfs_remove_qgroup()
> * btrfs_limit_qgroup()
> * btrfs_qgroup_inherit() call inside create_subvol()
>
> So we have a higher level protection provided by transaction, thus we
> don't need to always hold qgroup_ioctl_lock in btrfs_qgroup_inherit().
>
> Only the btrfs_qgroup_inherit() call in create_subvol() needs to hold
> qgroup_ioctl_lock, while the btrfs_qgroup_inherit() call in
> create_pending_snapshot() is already protected by transaction.
>
> So the fix is to detect the context by checking
> trans->transaction->state.
> If we're at TRANS_STATE_COMMIT_DOING, then we're in commit transaction
> context and no need to get the mutex.
>
> Reported-by: Nikolay Borisov <nborisov@xxxxxxxx>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> fs/btrfs/qgroup.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 3e6ffbbd8b0a..f8a3c1b0a15a 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2614,6 +2614,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> int ret = 0;
> int i;
> u64 *i_qgroups;
> + bool committing = false;
> struct btrfs_fs_info *fs_info = trans->fs_info;
> struct btrfs_root *quota_root;
> struct btrfs_qgroup *srcgroup;
> @@ -2621,7 +2622,25 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> u32 level_size = 0;
> u64 nums;
>
> - mutex_lock(&fs_info->qgroup_ioctl_lock);
> + /*
> + * There are only two callers of this function.
> + *
> + * One in create_subvol() in the ioctl context, which needs to hold
> + * the qgroup_ioctl_lock.
> + *
> + * The other one in create_pending_snapshot() where no other qgroup
> + * code can modify the fs as they all need to either start a new trans
> + * or hold a trans handler, thus we don't need to hold
> + * qgroup_ioctl_lock.
> + * This would avoid long and complex lock chain and make lockdep happy.
> + */
> + spin_lock(&fs_info->trans_lock);
> + if (trans->transaction->state == TRANS_STATE_COMMIT_DOING)
> + committing = true;
> + spin_unlock(&fs_info->trans_lock);
> +
> + if (!committing)
> + mutex_lock(&fs_info->qgroup_ioctl_lock);
That's different from what I had in mind but is better in a way. The
conditional locking is not always a good option but I think it'll be ok
as the function is not heavily used in random contexts and it's
commented.