On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> The throttle path doesn't take cleaner_delayed_iput_mutex, which means
> we could think we're done flushing iputs in the data space reservation
> path when we could have a throttler doing an iput. There's no real
> reason to serialize the delayed iput flushing, so instead of taking the
> cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
> replace it with an atomic counter and a waitqueue. This removes the
> short (or long depending on how big the inode is) window where we think
> there are no more pending iputs when there really are some.
>
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
> fs/btrfs/ctree.h | 4 +++-
> fs/btrfs/disk-io.c | 5 ++---
> fs/btrfs/extent-tree.c | 13 ++++++++-----
> fs/btrfs/inode.c | 21 +++++++++++++++++++++
> 4 files changed, 34 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index dc56a4d940c3..20af5d6d81f1 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -915,7 +915,8 @@ struct btrfs_fs_info {
>
> spinlock_t delayed_iput_lock;
> struct list_head delayed_iputs;
> - struct mutex cleaner_delayed_iput_mutex;
> + atomic_t nr_delayed_iputs;
> + wait_queue_head_t delayed_iputs_wait;
>
Have you considered whether the same could be achieved with a completion
rather than an open-coded waitqueue? I tried prototyping it and it could
be done but it becomes messy regarding when the completion should be
initialised i.e only when it's not in btrfs_add_delayed_iput
<snip>
> @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info,
> * bunch of pinned space, so make sure we run the iputs before
> * we do our pinned bytes check below.
> */
> - mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> btrfs_run_delayed_iputs(fs_info);
> - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> + btrfs_wait_on_delayed_iputs(fs_info);
Waiting on delayed iputs here is pointless since they are run
synchronously form this context.
>
> ret = may_commit_transaction(fs_info, space_info);
> break;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0b9f3e482cea..958e30c7c744 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
> if (atomic_add_unless(&inode->i_count, -1, 1))
> return;
>
> + atomic_inc(&fs_info->nr_delayed_iputs);
> spin_lock(&fs_info->delayed_iput_lock);
> ASSERT(list_empty(&binode->delayed_iput));
> list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> @@ -3280,11 +3281,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> list_del_init(&inode->delayed_iput);
> spin_unlock(&fs_info->delayed_iput_lock);
> iput(&inode->vfs_inode);
> + if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
> + wake_up(&fs_info->delayed_iputs_wait);
> spin_lock(&fs_info->delayed_iput_lock);
> }
> spin_unlock(&fs_info->delayed_iput_lock);
> }
>
> +/**
> + * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running
> + * @fs_info - the fs_info for this fs
> + * @return - EINTR if we were killed, 0 if nothing's pending
> + *
> + * This will wait on any delayed iputs that are currently running with KILLABLE
> + * set. Once they are all done running we will return, unless we are killed in
> + * which case we return EINTR.
> + */
> +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info)
> +{
> + int ret = wait_event_killable(fs_info->delayed_iputs_wait,
> + atomic_read(&fs_info->nr_delayed_iputs) == 0);
> + if (ret)
> + return -EINTR;
> + return 0;
> +}
> +
> /*
> * This creates an orphan entry for the given inode in case something goes wrong
> * in the middle of an unlink.
>