Re: [PATCH] btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 30, 2020 at 10:36 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> On 2020/1/30 下午6:02, Filipe Manana wrote:
> > On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
> >>
> >> [BUG]
> >> When the data space is exhausted, even the inode has NOCOW attribute,
> >> btrfs will still refuse to truncate unaligned range due to ENOSPC.
> >>
> >> The following script can reproduce it pretty easily:
> >>   #!/bin/bash
> >>
> >>   dev=/dev/test/test
> >>   mnt=/mnt/btrfs
> >>
> >>   umount $dev &> /dev/null
> >>   umount $mnt&> /dev/null
> >>
> >>   mkfs.btrfs -f $dev -b 1G
> >>   mount -o nospace_cache $dev $mnt
> >>   touch $mnt/foobar
> >>   chattr +C $mnt/foobar
> >>
> >>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
> >>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
> >>   sync
> >>
> >>   xfs_io -c "fpunch 0 2k" $mnt/foobar
> >>   umount $mnt
> >>
> >> Current btrfs will fail at the fpunch part.
> >>
> >> [CAUSE]
> >> Because btrfs_truncate_block() always reserve space without checking the
> >> NOCOW attribute.
> >>
> >> Since the writeback path follows NOCOW bit, we only need to bother the
> >> space reservation code in btrfs_truncate_block().
> >>
> >> [FIX]
> >> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
> >> reserve data space first, and falls back to NOCOW check only when we
> >> don't have enough space.
> >>
> >> Such always-try-reserve is an optimization introduced in
> >> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
> >>
> >> Since now check_can_nocow() is needed outside of inode.c, also export it
> >> and rename it to btrfs_check_can_nocow().
> >>
> >> Reported-by: Martin Doucha <martin.doucha@xxxxxxxx>
> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> >> ---
> >> Test case will be submitted to fstests by the reporter.
> >
> > Well, this is a sudden change of mind, isn't it? :)
> >
> > We had btrfs/172, which you removed very recently, that precisely tested this:
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126
>
> I didn't notice the nodatacow mount option. Super duper big facepalm.
>
> All my bad, especially feel sorry for Anand.
>
> With nodatacow mount option there, that test case in fact makes a lot of
> sense.
> Sorry again for that.
>
> Anand, mind to resubmit it to generic group?

Why the generic group?
The nodatacow mount option is btrfs specific, and most filesystems
don't support chattr +C (ext4 for example).

>
> Thanks,
> Qu
>
> >
> > Even though there are several reasons why this can still fail (at
> > writeback time), like regular buffered writes through the family of
> > write() syscalls can, I think it's perfectly fine to have this
> > behaviour.
> >
> > Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
> >
> > So I think we can just resurrect btrfs/172 now...
> >
> >> ---
> >>  fs/btrfs/ctree.h |  2 ++
> >>  fs/btrfs/file.c  | 10 +++++-----
> >>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
> >>  3 files changed, 41 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> >> index 54efb21c2727..b5639f3461e4 100644
> >> --- a/fs/btrfs/ctree.h
> >> +++ b/fs/btrfs/ctree.h
> >> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
> >>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
> >>                               struct file *file_out, loff_t pos_out,
> >>                               loff_t len, unsigned int remap_flags);
> >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> +                         size_t *write_bytes);
> >>
> >>  /* tree-defrag.c */
> >>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
> >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> >> index 8d47c76b7bd1..8dc084600f4e 100644
> >> --- a/fs/btrfs/file.c
> >> +++ b/fs/btrfs/file.c
> >> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
> >>         return ret;
> >>  }
> >>
> >> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> -                                   size_t *write_bytes)
> >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> >> +                         size_t *write_bytes)
> >>  {
> >>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
> >>         struct btrfs_root *root = inode->root;
> >> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> >>                 if (ret < 0) {
> >>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >>                                                       BTRFS_INODE_PREALLOC)) &&
> >> -                           check_can_nocow(BTRFS_I(inode), pos,
> >> -                                       &write_bytes) > 0) {
> >> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
> >> +                                                 &write_bytes) > 0) {
> >>                                 /*
> >>                                  * For nodata cow case, no need to reserve
> >>                                  * data space.
> >> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> >>                  */
> >>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >>                                               BTRFS_INODE_PREALLOC)) ||
> >> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
> >> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
> >>                         inode_unlock(inode);
> >>                         return -EAGAIN;
> >>                 }
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 5509c41a4f43..b5ae4bbf1ad4 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         struct extent_state *cached_state = NULL;
> >>         struct extent_changeset *data_reserved = NULL;
> >>         char *kaddr;
> >> +       bool only_release_metadata = false;
> >>         u32 blocksize = fs_info->sectorsize;
> >>         pgoff_t index = from >> PAGE_SHIFT;
> >>         unsigned offset = from & (blocksize - 1);
> >>         struct page *page;
> >>         gfp_t mask = btrfs_alloc_write_mask(mapping);
> >> +       size_t write_bytes = blocksize;
> >>         int ret = 0;
> >>         u64 block_start;
> >>         u64 block_end;
> >> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         block_start = round_down(from, blocksize);
> >>         block_end = block_start + blocksize - 1;
> >>
> >> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> >> -                                          block_start, blocksize);
> >> -       if (ret)
> >> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
> >> +                                         blocksize);
> >> +       if (ret < 0) {
> >> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> >> +                                             BTRFS_INODE_PREALLOC)) &&
> >> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
> >> +                                         &write_bytes) > 0) {
> >> +                       /* For nocow case, no need to reserve data space. */
> >> +                       only_release_metadata = true;
> >> +               } else {
> >> +                       goto out;
> >> +               }
> >> +       }
> >> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
> >> +       if (ret < 0) {
> >> +               if (!only_release_metadata)
> >> +                       btrfs_free_reserved_data_space(inode, data_reserved,
> >> +                                       block_start, blocksize);
> >>                 goto out;
> >> -
> >> +       }
> >>  again:
> >>         page = find_or_create_page(mapping, index, mask);
> >>         if (!page) {
> >> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
> >>         set_page_dirty(page);
> >>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
> >>
> >> +       if (only_release_metadata)
> >> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
> >> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
> >> +                               GFP_NOFS);
> >> +
> >>  out_unlock:
> >> -       if (ret)
> >> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
> >> -                                            blocksize, true);
> >> +       if (ret) {
> >> +               if (!only_release_metadata)
> >> +                       btrfs_delalloc_release_space(inode, data_reserved,
> >> +                                       block_start, blocksize, true);
> >> +               else
> >> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
> >> +                                       blocksize, true);
> >
> > I usually find it more intuitive to have it the other way around:
> >
> > if (only_release_metadata)
> >   ...
> > else
> >   ...
> >
> > E.g., positive case first, negative in the else branch. But that's
> > likely too much of a personal preference.
> >
> > Thanks.
> >
> >> +       }
> >>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
> >>         unlock_page(page);
> >>         put_page(page);
> >> --
> >> 2.25.0
> >>
> >
> >
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux