Re: [PATCH] btrfs: Allow btrfs_truncate_block() to fallback to nocow for data space reservation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/1/30 下午6:02, Filipe Manana wrote:
> On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>>
>> [BUG]
>> When the data space is exhausted, even the inode has NOCOW attribute,
>> btrfs will still refuse to truncate unaligned range due to ENOSPC.
>>
>> The following script can reproduce it pretty easily:
>>   #!/bin/bash
>>
>>   dev=/dev/test/test
>>   mnt=/mnt/btrfs
>>
>>   umount $dev &> /dev/null
>>   umount $mnt&> /dev/null
>>
>>   mkfs.btrfs -f $dev -b 1G
>>   mount -o nospace_cache $dev $mnt
>>   touch $mnt/foobar
>>   chattr +C $mnt/foobar
>>
>>   xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
>>   xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
>>   sync
>>
>>   xfs_io -c "fpunch 0 2k" $mnt/foobar
>>   umount $mnt
>>
>> Current btrfs will fail at the fpunch part.
>>
>> [CAUSE]
>> Because btrfs_truncate_block() always reserve space without checking the
>> NOCOW attribute.
>>
>> Since the writeback path follows NOCOW bit, we only need to bother the
>> space reservation code in btrfs_truncate_block().
>>
>> [FIX]
>> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to
>> reserve data space first, and falls back to NOCOW check only when we
>> don't have enough space.
>>
>> Such always-try-reserve is an optimization introduced in
>> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
>>
>> Since now check_can_nocow() is needed outside of inode.c, also export it
>> and rename it to btrfs_check_can_nocow().
>>
>> Reported-by: Martin Doucha <martin.doucha@xxxxxxxx>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> ---
>> Test case will be submitted to fstests by the reporter.
> 
> Well, this is a sudden change of mind, isn't it? :)
> 
> We had btrfs/172, which you removed very recently, that precisely tested this:
> 
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126

I didn't notice the nodatacow mount option. Super duper big facepalm.

All my bad, especially feel sorry for Anand.

With nodatacow mount option there, that test case in fact makes a lot of
sense.
Sorry again for that.

Anand, mind to resubmit it to generic group?

Thanks,
Qu

> 
> Even though there are several reasons why this can still fail (at
> writeback time), like regular buffered writes through the family of
> write() syscalls can, I think it's perfectly fine to have this
> behaviour.
> 
> Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
> 
> So I think we can just resurrect btrfs/172 now...
> 
>> ---
>>  fs/btrfs/ctree.h |  2 ++
>>  fs/btrfs/file.c  | 10 +++++-----
>>  fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++-------
>>  3 files changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 54efb21c2727..b5639f3461e4 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>  loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in,
>>                               struct file *file_out, loff_t pos_out,
>>                               loff_t len, unsigned int remap_flags);
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes);
>>
>>  /* tree-defrag.c */
>>  int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 8d47c76b7bd1..8dc084600f4e 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>         return ret;
>>  }
>>
>> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> -                                   size_t *write_bytes)
>> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
>> +                         size_t *write_bytes)
>>  {
>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>         struct btrfs_root *root = inode->root;
>> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 if (ret < 0) {
>>                         if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                                       BTRFS_INODE_PREALLOC)) &&
>> -                           check_can_nocow(BTRFS_I(inode), pos,
>> -                                       &write_bytes) > 0) {
>> +                           btrfs_check_can_nocow(BTRFS_I(inode), pos,
>> +                                                 &write_bytes) > 0) {
>>                                 /*
>>                                  * For nodata cow case, no need to reserve
>>                                  * data space.
>> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>>                  */
>>                 if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>                                               BTRFS_INODE_PREALLOC)) ||
>> -                   check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
>>                         inode_unlock(inode);
>>                         return -EAGAIN;
>>                 }
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 5509c41a4f43..b5ae4bbf1ad4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         struct extent_state *cached_state = NULL;
>>         struct extent_changeset *data_reserved = NULL;
>>         char *kaddr;
>> +       bool only_release_metadata = false;
>>         u32 blocksize = fs_info->sectorsize;
>>         pgoff_t index = from >> PAGE_SHIFT;
>>         unsigned offset = from & (blocksize - 1);
>>         struct page *page;
>>         gfp_t mask = btrfs_alloc_write_mask(mapping);
>> +       size_t write_bytes = blocksize;
>>         int ret = 0;
>>         u64 block_start;
>>         u64 block_end;
>> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         block_start = round_down(from, blocksize);
>>         block_end = block_start + blocksize - 1;
>>
>> -       ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>> -                                          block_start, blocksize);
>> -       if (ret)
>> +       ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>> +                                         blocksize);
>> +       if (ret < 0) {
>> +               if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>> +                                             BTRFS_INODE_PREALLOC)) &&
>> +                   btrfs_check_can_nocow(BTRFS_I(inode), block_start,
>> +                                         &write_bytes) > 0) {
>> +                       /* For nocow case, no need to reserve data space. */
>> +                       only_release_metadata = true;
>> +               } else {
>> +                       goto out;
>> +               }
>> +       }
>> +       ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
>> +       if (ret < 0) {
>> +               if (!only_release_metadata)
>> +                       btrfs_free_reserved_data_space(inode, data_reserved,
>> +                                       block_start, blocksize);
>>                 goto out;
>> -
>> +       }
>>  again:
>>         page = find_or_create_page(mapping, index, mask);
>>         if (!page) {
>> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>>         set_page_dirty(page);
>>         unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>>
>> +       if (only_release_metadata)
>> +               set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
>> +                               block_end, EXTENT_NORESERVE, NULL, NULL,
>> +                               GFP_NOFS);
>> +
>>  out_unlock:
>> -       if (ret)
>> -               btrfs_delalloc_release_space(inode, data_reserved, block_start,
>> -                                            blocksize, true);
>> +       if (ret) {
>> +               if (!only_release_metadata)
>> +                       btrfs_delalloc_release_space(inode, data_reserved,
>> +                                       block_start, blocksize, true);
>> +               else
>> +                       btrfs_delalloc_release_metadata(BTRFS_I(inode),
>> +                                       blocksize, true);
> 
> I usually find it more intuitive to have it the other way around:
> 
> if (only_release_metadata)
>   ...
> else
>   ...
> 
> E.g., positive case first, negative in the else branch. But that's
> likely too much of a personal preference.
> 
> Thanks.
> 
>> +       }
>>         btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
>>         unlock_page(page);
>>         put_page(page);
>> --
>> 2.25.0
>>
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux