Re: [PATCH] btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 14, 2019 at 7:36 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
> [Background]
> Btrfs qgroup uses two types of reserved space for METADATA space,
> PERTRANS and PREALLOC.
>
> PERTRANS is metadata space reserved for each transaction started by
> btrfs_start_transaction().
> While PREALLOC is for delalloc, where we reserve space before joining a
> transaction, and finally it will be converted to PERTRANS after the
> writeback is done.
>
> [Inconsistency]
> However there is inconsistency in how we handle PREALLOC metadata space.
>
> The most obvious one is:
> In btrfs_buffered_write():
>         btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
>
> We always free qgroup PREALLOC meta space.
>
> While in btrfs_truncate_block():
>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
>
> We only free qgroup PREALLOC meta space when something went wrong.
>
> [The Correct Behavior]
> The correct behavior should be the one in btrfs_buffered_write(), we
> should always free PREALLOC metadata space.
>
> The reason is, the btrfs_delalloc_* mechanism works by:
> - Reserve metadata first, even it's not necessary
>   In btrfs_delalloc_reserve_metadata()
>
> - Free the unused metadata space
>   Normally in:
>   btrfs_delalloc_release_extents()
>   |- btrfs_inode_rsv_release()
>      Here we do calculation on whether we should release or not.
>
> E.g. for 64K buffered write, the metadata rsv works like:
>
> /* The first page */
> reserve_meta:   num_bytes=calc_inode_reservations()
> free_meta:      num_bytes=0
> total:          num_bytes=calc_inode_reservations()
> /* The first page caused one outstanding extent, thus needs metadata
>    rsv */
>
> /* The 2nd page */
> reserve_meta:   num_bytes=calc_inode_reservations()
> free_meta:      num_bytes=calc_inode_reservations()
> total:          not changed
> /* The 2nd page doesn't cause new outstanding extent, needs no new meta
>    rsv, so we free what we have reserved */
>
> /* The 3rd~16th pages */
> reserve_meta:   num_bytes=calc_inode_reservations()
> free_meta:      num_bytes=calc_inode_reservations()
> total:          not changed (still space for one outstanding extent)
>
> This means, if btrfs_delalloc_release_extents() determines to free some
> space, then those space should be freed NOW.
> So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
> than btrfs_qgroup_convert_reserved_meta().
>
> The good news is:
> - The callers are not that hot
>   The hottest caller is in btrfs_buffered_write(), which is already
>   fixed by commit 336a8bb8e36a ("btrfs: Fix wrong
>   btrfs_delalloc_release_extents parameter"). Thus it's not that
>   easy to cause false EDQUOT.
>
> - The trans commit in advance for qgroup would hide the bug
>   Since commit f5fef4593653 ("btrfs: qgroup: Make qgroup async transaction
>   commit more aggressive"), when btrfs qgroup metadata free space is slow,
>   it will try to commit transaction and free the wrongly converted
>   PERTRANS space, so it's not that easy to hit such bug.
>
> [FIX]
> So to fix the problem, remove the @qgroup_free parameter for
> btrfs_delalloc_release_extents(), and always pass true to
> btrfs_inode_rsv_release().
>
> Reported-by: Filipe Manana <fdmanana@xxxxxxxx>
> Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

Thanks for looking into this!

> ---
>  fs/btrfs/ctree.h          |  3 +--
>  fs/btrfs/delalloc-space.c |  6 ++----
>  fs/btrfs/file.c           |  7 +++----
>  fs/btrfs/inode-map.c      |  4 ++--
>  fs/btrfs/inode.c          | 12 ++++++------
>  fs/btrfs/ioctl.c          |  6 ++----
>  fs/btrfs/relocation.c     |  7 +++----
>  7 files changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 19d669d12ca1..6b50bafd6a64 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2489,8 +2489,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>                                      int nitems, bool use_global_rsv);
>  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
>                                       struct btrfs_block_rsv *rsv);
> -void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
> -                                   bool qgroup_free);
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
>
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
>  u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo);
> diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
> index d949d7d2abed..571e7b31ea2f 100644
> --- a/fs/btrfs/delalloc-space.c
> +++ b/fs/btrfs/delalloc-space.c
> @@ -418,7 +418,6 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
>   * btrfs_delalloc_release_extents - release our outstanding_extents
>   * @inode: the inode to balance the reservation for.
>   * @num_bytes: the number of bytes we originally reserved with
> - * @qgroup_free: do we need to free qgroup meta reservation or convert them.
>   *
>   * When we reserve space we increase outstanding_extents for the extents we may
>   * add.  Once we've set the range as delalloc or created our ordered extents we
> @@ -426,8 +425,7 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
>   * temporarily tracked outstanding_extents.  This _must_ be used in conjunction
>   * with btrfs_delalloc_reserve_metadata.
>   */
> -void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
> -                                   bool qgroup_free)
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
>  {
>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>         unsigned num_extents;
> @@ -441,7 +439,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
>         if (btrfs_is_testing(fs_info))
>                 return;
>
> -       btrfs_inode_rsv_release(inode, qgroup_free);
> +       btrfs_inode_rsv_release(inode, true);
>  }
>
>  /**
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 8fe4eb7e5045..59b20fb89abe 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1692,7 +1692,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                                     force_page_uptodate);
>                 if (ret) {
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
> -                                                      reserve_bytes, true);
> +                                                      reserve_bytes);
>                         break;
>                 }
>
> @@ -1704,7 +1704,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                         if (extents_locked == -EAGAIN)
>                                 goto again;
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
> -                                                      reserve_bytes, true);
> +                                                      reserve_bytes);
>                         ret = extents_locked;
>                         break;
>                 }
> @@ -1761,8 +1761,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 if (extents_locked)
>                         unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>                                              lockstart, lockend, &cached_state);
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
> -                                              true);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
>                 if (ret) {
>                         btrfs_drop_pages(pages, num_pages);
>                         break;
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 63cad7865d75..37345fb6191d 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -501,13 +501,13 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>         ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
>                                               prealloc, prealloc, &alloc_hint);
>         if (ret) {
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, true);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
>                 btrfs_delalloc_release_metadata(BTRFS_I(inode), prealloc, true);
>                 goto out_put;
>         }
>
>         ret = btrfs_write_out_ino_cache(root, trans, path, inode);
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, false);
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
>  out_put:
>         iput(inode);
>  out_release:
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a0546401bc0a..166b3acfbb1f 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2206,7 +2206,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>
>         ClearPageChecked(page);
>         set_page_dirty(page);
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, false);
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>  out:
>         unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end,
>                              &cached_state);
> @@ -4951,7 +4951,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>         if (!page) {
>                 btrfs_delalloc_release_space(inode, data_reserved,
>                                              block_start, blocksize, true);
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>                 ret = -ENOMEM;
>                 goto out;
>         }
> @@ -5018,7 +5018,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>         if (ret)
>                 btrfs_delalloc_release_space(inode, data_reserved, block_start,
>                                              blocksize, true);
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>         unlock_page(page);
>         put_page(page);
>  out:
> @@ -8706,7 +8706,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>                 } else if (ret >= 0 && (size_t)ret < count)
>                         btrfs_delalloc_release_space(inode, data_reserved,
>                                         offset, count - (size_t)ret, true);
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), count, false);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), count);
>         }
>  out:
>         if (wakeup)
> @@ -9056,7 +9056,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>         unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
>
>         if (!ret2) {
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>                 sb_end_pagefault(inode->i_sb);
>                 extent_changeset_free(data_reserved);
>                 return VM_FAULT_LOCKED;
> @@ -9065,7 +9065,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>  out_unlock:
>         unlock_page(page);
>  out:
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>         btrfs_delalloc_release_space(inode, data_reserved, page_start,
>                                      reserved_space, (ret != 0));
>  out_noreserve:
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index de730e56d3f5..7c145a41decd 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1360,8 +1360,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>                 unlock_page(pages[i]);
>                 put_page(pages[i]);
>         }
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
> -                                      false);
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
>         extent_changeset_free(data_reserved);
>         return i_done;
>  out:
> @@ -1372,8 +1371,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>         btrfs_delalloc_release_space(inode, data_reserved,
>                         start_index << PAGE_SHIFT,
>                         page_cnt << PAGE_SHIFT, true);
> -       btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
> -                                      true);
> +       btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
>         extent_changeset_free(data_reserved);
>         return ret;
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 00504657b602..205b35ee2fb3 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3297,7 +3297,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>                                 btrfs_delalloc_release_metadata(BTRFS_I(inode),
>                                                         PAGE_SIZE, true);
>                                 btrfs_delalloc_release_extents(BTRFS_I(inode),
> -                                                              PAGE_SIZE, true);
> +                                                              PAGE_SIZE);
>                                 ret = -EIO;
>                                 goto out;
>                         }
> @@ -3326,7 +3326,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>                         btrfs_delalloc_release_metadata(BTRFS_I(inode),
>                                                          PAGE_SIZE, true);
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
> -                                                      PAGE_SIZE, true);
> +                                                      PAGE_SIZE);
>
>                         clear_extent_bits(&BTRFS_I(inode)->io_tree,
>                                           page_start, page_end,
> @@ -3342,8 +3342,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>                 put_page(page);
>
>                 index++;
> -               btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE,
> -                                              false);
> +               btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>                 balance_dirty_pages_ratelimited(inode->i_mapping);
>                 btrfs_throttle(fs_info);
>         }
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux