Re: [PATCH RFC] btrfs: introduce a separate mutex for caching_block_groups list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This sounds reasonable to me, I'll look at it more when I'm on the ground and can look at the code and see for sure.  Thanks,

Josef

Sent from my iPhone

> On Mar 19, 2017, at 1:19 PM, Alex Lyakas <alex@xxxxxxxxxxxxxxxxx> wrote:
> 
> We have a commit_root_sem, which is a read-write semaphore that protects the commit roots.
> But it is also used to protect the list of caching block groups.
> 
> As a result, while doing "slow" caching, the following issue is seen:
> 
> Some of the caching threads are scanning the extent tree with commit_root_sem
> acquired in shared mode, with stack like:
> [<ffffffffc0ad27b2>] read_extent_buffer_pages+0x2d2/0x300 [btrfs]
> [<ffffffffc0a9fbe7>] btree_read_extent_buffer_pages.constprop.50+0xb7/0x1e0 [btrfs]
> [<ffffffffc0aa1550>] read_tree_block+0x40/0x70 [btrfs]
> [<ffffffffc0a7aa7c>] read_block_for_search.isra.33+0x12c/0x370 [btrfs]
> [<ffffffffc0a7ce86>] btrfs_search_slot+0x3c6/0xb10 [btrfs]
> [<ffffffffc0a975b9>] caching_thread+0x1b9/0x820 [btrfs]
> [<ffffffffc0adfa36>] normal_work_helper+0xc6/0x340 [btrfs]
> [<ffffffffc0adfd22>] btrfs_cache_helper+0x12/0x20 [btrfs]
> 
> IO requests that want to allocate space are waiting in cache_block_group()
> to acquire the commit_root_sem in exclusive mode. But they only want to add
> the caching control structure to the list of caching block-groups:
> [<ffffffff817136c9>] schedule+0x29/0x70
> [<ffffffff81716085>] rwsem_down_write_failed+0x145/0x320
> [<ffffffff813a1ae3>] call_rwsem_down_write_failed+0x13/0x20
> [<ffffffffc0a86d0b>] cache_block_group+0x25b/0x450 [btrfs]
> [<ffffffffc0a94d36>] find_free_extent+0xd16/0xdb0 [btrfs]
> [<ffffffffc0a94e7f>] btrfs_reserve_extent+0xaf/0x160 [btrfs]
> 
> Other caching threads want to continue their scanning, and for that they
> are waiting to acquire commit_root_sem in shared mode. But since there are
> IO threads that want the exclusive lock, the caching threads are unable
> to continue the scanning, because (I presume) rw_semaphore guarantees some fairness:
> [<ffffffff817136c9>] schedule+0x29/0x70
> [<ffffffff81715ee5>] rwsem_down_read_failed+0xc5/0x120
> [<ffffffff813a1ab4>] call_rwsem_down_read_failed+0x14/0x30
> [<ffffffffc0a975a1>] caching_thread+0x1a1/0x820 [btrfs]
> [<ffffffffc0adfa36>] normal_work_helper+0xc6/0x340 [btrfs]
> [<ffffffffc0adfd22>] btrfs_cache_helper+0x12/0x20 [btrfs]
> [<ffffffff8108bd56>] process_one_work+0x146/0x410
> 
> This causes slowness of the IO, especially when there are many block groups
> that need to be scanned for free space. In some cases it takes minutes
> until a single IO thread is able to allocate free space.
> 
> I don't see a deadlock here, because the caching threads that were able to acquire
> the commit_root_sem will call rwsem_is_contended() and should give up the semaphore,
> so that IO threads are able to acquire it in exclusive mode.
> 
> However, introducing a separate mutex that protects only the list of caching
> block groups makes things move forward much faster.
> 
> This patch is based on kernel 3.18.
> Unfortunately, I am not able to submit a patch based on one of the latest kernels, because
> here btrfs is part of the larger system, and upgrading the kernel is a significant effort.
> Hence marking the patch as RFC.
> Hopefully, this patch still has some value to the community.
> 
> Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 42d11e7..74feacb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1490,6 +1490,8 @@ struct btrfs_fs_info {
>    struct list_head trans_list;
>    struct list_head dead_roots;
>    struct list_head caching_block_groups;
> +    /* protects the above list */
> +    struct mutex caching_block_groups_mutex;
> 
>    spinlock_t delayed_iput_lock;
>    struct list_head delayed_iputs;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5177954..130ec58 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2229,6 +2229,7 @@ int open_ctree(struct super_block *sb,
>    INIT_LIST_HEAD(&fs_info->delayed_iputs);
>    INIT_LIST_HEAD(&fs_info->delalloc_roots);
>    INIT_LIST_HEAD(&fs_info->caching_block_groups);
> +    mutex_init(&fs_info->caching_block_groups_mutex);
>    spin_lock_init(&fs_info->delalloc_root_lock);
>    spin_lock_init(&fs_info->trans_lock);
>    spin_lock_init(&fs_info->fs_roots_radix_lock);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a067065..906fb08 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -637,10 +637,10 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
>        return 0;
>    }
> 
> -    down_write(&fs_info->commit_root_sem);
> +    mutex_lock(&fs_info->caching_block_groups_mutex);
>    atomic_inc(&caching_ctl->count);
>    list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> -    up_write(&fs_info->commit_root_sem);
> +    mutex_unlock(&fs_info->caching_block_groups_mutex);
> 
>    btrfs_get_block_group(cache);
> 
> @@ -5693,6 +5693,7 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
> 
>    down_write(&fs_info->commit_root_sem);
> 
> +    mutex_lock(&fs_info->caching_block_groups_mutex);
>    list_for_each_entry_safe(caching_ctl, next,
>                 &fs_info->caching_block_groups, list) {
>        cache = caching_ctl->block_group;
> @@ -5704,6 +5705,7 @@ void btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans,
>            cache->last_byte_to_unpin = caching_ctl->progress;
>        }
>    }
> +    mutex_unlock(&fs_info->caching_block_groups_mutex);
> 
>    if (fs_info->pinned_extents == &fs_info->freed_extents[0])
>        fs_info->pinned_extents = &fs_info->freed_extents[1];
> @@ -8849,14 +8851,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>    struct btrfs_caching_control *caching_ctl;
>    struct rb_node *n;
> 
> -    down_write(&info->commit_root_sem);
> +    mutex_lock(&info->caching_block_groups_mutex);
>    while (!list_empty(&info->caching_block_groups)) {
>        caching_ctl = list_entry(info->caching_block_groups.next,
>                     struct btrfs_caching_control, list);
>        list_del(&caching_ctl->list);
>        put_caching_control(caching_ctl);
>    }
> -    up_write(&info->commit_root_sem);
> +    mutex_unlock(&info->caching_block_groups_mutex);
> 
>    spin_lock(&info->unused_bgs_lock);
>    while (!list_empty(&info->unused_bgs)) {
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html





[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux