Hi,
On Thu, Apr 09, 2015 at 12:34:41PM +0800, Zhaolei wrote:
> From: Zhao Lei <zhaolei@xxxxxxxxxxxxxx>
>
> Steps to reproduce:
> while true; do
>         dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
>         rm /btrfs_dir/file
>         sync
> done
>
> And we'll see dd fail because btrfs returns ENOSPC.
>
> Reason:
> Normally, btrfs_commit_transaction() calls btrfs_run_delayed_iputs()
> at the end to free fs space for the next write, but sometimes that
> work is not finished in time, because the btrfs-cleaner thread has
> already taken the delayed iputs off the list and only completes the
> iput() calls after the next write has started.
>
> This is the log:
> [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin
>
> [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
> [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
> [ 2569.087554] comm=sync func=btrfs_commit_transaction() end
>
> [ 2569.191081] comm=dd begin
> [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28
>
> [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
> [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
> ...
> [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
> [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end
>
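To spell out the window the reasoning and the log above describe: the
cleaner detaches the whole pending list first and only runs iput() on
each entry afterwards, so the space pinned by those inodes comes back
well after btrfs_commit_transaction() has returned to the writer.  A
simplified sketch of that cleaner path, not the real code (struct
delayed_iput_sketch is a stand-in for the real entry type; the fs_info
list and lock fields match the hunks quoted below):

/*
 * Simplified stand-in for the cleaner-side path; it mirrors the
 * btrfs_run_delayed_iputs() hunk quoted further down.
 */
struct delayed_iput_sketch {
	struct list_head list;
	struct inode *inode;
};

static void run_delayed_iputs_sketch(struct btrfs_fs_info *fs_info)
{
	LIST_HEAD(list);
	struct delayed_iput_sketch *delayed, *tmp;

	spin_lock(&fs_info->delayed_iput_lock);
	/* after this splice the pending list looks empty to everyone else */
	list_splice_init(&fs_info->delayed_iputs, &list);
	spin_unlock(&fs_info->delayed_iput_lock);

	list_for_each_entry_safe(delayed, tmp, &list, list) {
		/*
		 * A writer that hits ENOSPC in this window sees no pending
		 * iputs, yet the space is only released by the iput() below.
		 */
		iput(delayed->inode);
		kfree(delayed);
	}
}
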
> Fix:
> Make btrfs_commit_transaction() wait at the end for the delayed iputs
> that the btrfs-cleaner thread is currently running to finish.
>
> Test:
> Using a script similar to the one above (but more complex):
> before the patch:
> 7 failures in 100 * 20 loops.
> after the patch:
> 0 failures in 100 * 20 loops.
>
> Signed-off-by: Zhao Lei <zhaolei@xxxxxxxxxxxxxx>
> ---
> fs/btrfs/ctree.h | 1 +
> fs/btrfs/disk-io.c | 3 ++-
> fs/btrfs/extent-tree.c | 6 ++++++
> fs/btrfs/inode.c | 4 ++++
> 4 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index f9c89ca..54d4d78 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1513,6 +1513,7 @@ struct btrfs_fs_info {
>
> spinlock_t delayed_iput_lock;
> struct list_head delayed_iputs;
> + struct rw_semaphore delayed_iput_sem;
>
> /* this protects tree_mod_seq_list */
> spinlock_t tree_mod_seq_lock;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 639f266..6867471 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2241,11 +2241,12 @@ int open_ctree(struct super_block *sb,
> spin_lock_init(&fs_info->qgroup_op_lock);
> spin_lock_init(&fs_info->buffer_lock);
> spin_lock_init(&fs_info->unused_bgs_lock);
> - mutex_init(&fs_info->unused_bg_unpin_mutex);
> rwlock_init(&fs_info->tree_mod_log_lock);
> + mutex_init(&fs_info->unused_bg_unpin_mutex);
> mutex_init(&fs_info->reloc_mutex);
> mutex_init(&fs_info->delalloc_root_mutex);
> seqlock_init(&fs_info->profiles_lock);
> + init_rwsem(&fs_info->delayed_iput_sem);
>
> init_completion(&fs_info->kobj_unregister);
> INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 203ac63..6fd7dca 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3732,6 +3732,12 @@ commit_trans:
> ret = btrfs_commit_transaction(trans, root);
> if (ret)
> return ret;
> + /*
> + * make sure that all running delayed iputs are
> + * done
> + */
> + down_write(&root->fs_info->delayed_iput_sem);
> + up_write(&root->fs_info->delayed_iput_sem);
> goto again;
> } else {
> btrfs_end_transaction(trans, root);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d2e732d..34d10be 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3110,6 +3110,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
> if (empty)
> return;
>
> + down_read(&fs_info->delayed_iput_sem);
> +
> spin_lock(&fs_info->delayed_iput_lock);
> list_splice_init(&fs_info->delayed_iputs, &list);
> spin_unlock(&fs_info->delayed_iput_lock);
> @@ -3120,6 +3122,8 @@ void btrfs_run_delayed_iputs(struct btrfs_root *root)
> iput(delayed->inode);
> kfree(delayed);
> }
> +
> + up_read(&root->fs_info->delayed_iput_sem);
By introducing this "delayed_iput_sem", fstests' generic/241 complains
about a possible deadlock; can we use a waitqueue to avoid this?
(A rough, untested sketch of that idea follows the trace below.)
[ 2061.345955] =============================================
[ 2061.346027] [ INFO: possible recursive locking detected ]
[ 2061.346027] 4.1.0+ #268 Tainted: G W
[ 2061.346027] ---------------------------------------------
[ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
[ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] but task is already holding lock:
[ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] other info that might help us debug this:
[ 2061.346027] Possible unsafe locking scenario:
[ 2061.346027] CPU0
[ 2061.346027] ----
[ 2061.346027] lock(&fs_info->delayed_iput_sem);
[ 2061.346027] lock(&fs_info->delayed_iput_sem);
[ 2061.346027]  *** DEADLOCK ***
[ 2061.346027] May be due to missing lock nesting notation
[ 2061.346027] 2 locks held by btrfs-cleaner/3045:
[ 2061.346027] #0: (&fs_info->cleaner_mutex){+.+.+.}, at: [<ffffffff813f6cc4>] cleaner_kthread+0x64/0x1a0
[ 2061.346027] #1: (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] stack backtrace:
[ 2061.346027] CPU: 0 PID: 3045 Comm: btrfs-cleaner Tainted: G W 4.1.0+ #268
[ 2061.346027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[ 2061.346027] ffff88013760a700 ffff8800b4aff888 ffffffff818bb68e 0000000000000007
[ 2061.346027] ffffffff828fa0a0 ffff8800b4aff968 ffffffff810db18d ffff8800b4aff978
[ 2061.346027] ffffffff00000000 ffff88013760a750 ffffffff00000001 ffff88013760b470
[ 2061.346027] Call Trace:
[ 2061.346027] [<ffffffff818bb68e>] dump_stack+0x4c/0x65
[ 2061.346027] [<ffffffff810db18d>] __lock_acquire+0x214d/0x22f0
[ 2061.346027] [<ffffffff810d5e3d>] ? trace_hardirqs_off+0xd/0x10
[ 2061.346027] [<ffffffff810d649d>] ? __lock_is_held+0x4d/0x70
[ 2061.346027] [<ffffffff810dc050>] lock_acquire+0xd0/0x280
[ 2061.346027] [<ffffffff814063ab>] ? btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] [<ffffffff818c275c>] down_read+0x4c/0x70
[ 2061.346027] [<ffffffff814063ab>] ? btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40
[ 2061.346027] [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100
[ 2061.346027] [<ffffffff8140042f>] ? btrfs_commit_transaction+0xa1f/0xc50
[ 2061.346027] [<ffffffff81400454>] btrfs_commit_transaction+0xa44/0xc50
[ 2061.346027] [<ffffffff810cdbd0>] ? wait_woken+0xa0/0xa0
[ 2061.346027] [<ffffffff813e5247>] flush_space+0x97/0x4b0
[ 2061.346027] [<ffffffff818c48cb>] ? _raw_spin_unlock+0x2b/0x40
[ 2061.346027] [<ffffffff813e5a3b>] ? reserve_metadata_bytes+0x20b/0x6d0
[ 2061.346027] [<ffffffff813e5a52>] reserve_metadata_bytes+0x222/0x6d0
[ 2061.346027] [<ffffffff813e6245>] ? btrfs_block_rsv_refill+0x65/0x90
[ 2061.346027] [<ffffffff813e6257>] btrfs_block_rsv_refill+0x77/0x90
[ 2061.346027] [<ffffffff8140c723>] btrfs_evict_inode+0x473/0x700
[ 2061.346027] [<ffffffff8123dcbb>] evict+0xab/0x170
[ 2061.346027] [<ffffffff8123e6fd>] iput+0x1cd/0x350
[ 2061.346027] [<ffffffff81406409>] btrfs_run_delayed_iputs+0xc9/0x100
[ 2061.346027] [<ffffffff813f6cc4>] ? cleaner_kthread+0x64/0x1a0
[ 2061.346027] [<ffffffff813f6dba>] cleaner_kthread+0x15a/0x1a0
[ 2061.346027] [<ffffffff813f6c60>] ? btree_invalidatepage+0x90/0x90
[ 2061.346027] [<ffffffff810a3d5f>] kthread+0xef/0x110
[ 2061.346027] [<ffffffff810a3c70>] ? kthread_create_on_node+0x210/0x210
[ 2061.346027] [<ffffffff818c550f>] ret_from_fork+0x3f/0x70
[ 2061.346027] [<ffffffff810a3c70>] ? kthread_create_on_node+0x210/0x210
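
Something like the following is what I have in mind, as a rough and
untested sketch only; the delayed_iputs_running / delayed_iputs_wait
fields are invented names and would need atomic_set() /
init_waitqueue_head() in open_ctree():

/* cleaner/commit side, wrapping the existing splice-and-iput loop */
void btrfs_run_delayed_iputs(struct btrfs_root *root)
{
	struct btrfs_fs_info *fs_info = root->fs_info;

	atomic_inc(&fs_info->delayed_iputs_running);

	/* ... splice fs_info->delayed_iputs and iput() each entry, as today ... */

	if (atomic_dec_and_test(&fs_info->delayed_iputs_running))
		wake_up_all(&fs_info->delayed_iputs_wait);
}

/* ENOSPC retry path, instead of the down_write()/up_write() pair */
static void wait_for_delayed_iputs_sketch(struct btrfs_fs_info *fs_info)
{
	wait_event(fs_info->delayed_iputs_wait,
		   atomic_read(&fs_info->delayed_iputs_running) == 0);
}

The recursion shown in the trace would still need care (the waiter must
not be the same thread that bumped the counter), but unlike the rwsem a
plain wait should not show up in lockdep as a lock dependency.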
Thanks,
-liubo
> }
>
> /*
> --
> 1.8.5.1
>