On Thu, Jan 30, 2020 at 10:36 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote: > > > > On 2020/1/30 下午6:02, Filipe Manana wrote: > > On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > >> > >> [BUG] > >> When the data space is exhausted, even the inode has NOCOW attribute, > >> btrfs will still refuse to truncate unaligned range due to ENOSPC. > >> > >> The following script can reproduce it pretty easily: > >> #!/bin/bash > >> > >> dev=/dev/test/test > >> mnt=/mnt/btrfs > >> > >> umount $dev &> /dev/null > >> umount $mnt&> /dev/null > >> > >> mkfs.btrfs -f $dev -b 1G > >> mount -o nospace_cache $dev $mnt > >> touch $mnt/foobar > >> chattr +C $mnt/foobar > >> > >> xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null > >> xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null > >> sync > >> > >> xfs_io -c "fpunch 0 2k" $mnt/foobar > >> umount $mnt > >> > >> Current btrfs will fail at the fpunch part. > >> > >> [CAUSE] > >> Because btrfs_truncate_block() always reserve space without checking the > >> NOCOW attribute. > >> > >> Since the writeback path follows NOCOW bit, we only need to bother the > >> space reservation code in btrfs_truncate_block(). > >> > >> [FIX] > >> Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to > >> reserve data space first, and falls back to NOCOW check only when we > >> don't have enough space. > >> > >> Such always-try-reserve is an optimization introduced in > >> btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call. > >> > >> Since now check_can_nocow() is needed outside of inode.c, also export it > >> and rename it to btrfs_check_can_nocow(). > >> > >> Reported-by: Martin Doucha <martin.doucha@xxxxxxxx> > >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > >> --- > >> Test case will be submitted to fstests by the reporter. > > > > Well, this is a sudden change of mind, isn't it? :) > > > > We had btrfs/172, which you removed very recently, that precisely tested this: > > > > https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126 > > I didn't notice the nodatacow mount option. Super duper big facepalm. > > All my bad, especially feel sorry for Anand. > > With nodatacow mount option there, that test case in fact makes a lot of > sense. > Sorry again for that. > > Anand, mind to resubmit it to generic group? Why the generic group? The nodatacow mount option is btrfs specific, and most filesystems don't support chattr +C (ext4 for example). > > Thanks, > Qu > > > > > Even though there are several reasons why this can still fail (at > > writeback time), like regular buffered writes through the family of > > write() syscalls can, I think it's perfectly fine to have this > > behaviour. > > > > Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> > > > > So I think we can just resurrect btrfs/172 now... > > > >> --- > >> fs/btrfs/ctree.h | 2 ++ > >> fs/btrfs/file.c | 10 +++++----- > >> fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++------- > >> 3 files changed, 41 insertions(+), 12 deletions(-) > >> > >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > >> index 54efb21c2727..b5639f3461e4 100644 > >> --- a/fs/btrfs/ctree.h > >> +++ b/fs/btrfs/ctree.h > >> @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); > >> loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in, > >> struct file *file_out, loff_t pos_out, > >> loff_t len, unsigned int remap_flags); > >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos, > >> + size_t *write_bytes); > >> > >> /* tree-defrag.c */ > >> int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, > >> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > >> index 8d47c76b7bd1..8dc084600f4e 100644 > >> --- a/fs/btrfs/file.c > >> +++ b/fs/btrfs/file.c > >> @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, > >> return ret; > >> } > >> > >> -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, > >> - size_t *write_bytes) > >> +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos, > >> + size_t *write_bytes) > >> { > >> struct btrfs_fs_info *fs_info = inode->root->fs_info; > >> struct btrfs_root *root = inode->root; > >> @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, > >> if (ret < 0) { > >> if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > >> BTRFS_INODE_PREALLOC)) && > >> - check_can_nocow(BTRFS_I(inode), pos, > >> - &write_bytes) > 0) { > >> + btrfs_check_can_nocow(BTRFS_I(inode), pos, > >> + &write_bytes) > 0) { > >> /* > >> * For nodata cow case, no need to reserve > >> * data space. > >> @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > >> */ > >> if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > >> BTRFS_INODE_PREALLOC)) || > >> - check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) { > >> + btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) { > >> inode_unlock(inode); > >> return -EAGAIN; > >> } > >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > >> index 5509c41a4f43..b5ae4bbf1ad4 100644 > >> --- a/fs/btrfs/inode.c > >> +++ b/fs/btrfs/inode.c > >> @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > >> struct extent_state *cached_state = NULL; > >> struct extent_changeset *data_reserved = NULL; > >> char *kaddr; > >> + bool only_release_metadata = false; > >> u32 blocksize = fs_info->sectorsize; > >> pgoff_t index = from >> PAGE_SHIFT; > >> unsigned offset = from & (blocksize - 1); > >> struct page *page; > >> gfp_t mask = btrfs_alloc_write_mask(mapping); > >> + size_t write_bytes = blocksize; > >> int ret = 0; > >> u64 block_start; > >> u64 block_end; > >> @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > >> block_start = round_down(from, blocksize); > >> block_end = block_start + blocksize - 1; > >> > >> - ret = btrfs_delalloc_reserve_space(inode, &data_reserved, > >> - block_start, blocksize); > >> - if (ret) > >> + ret = btrfs_check_data_free_space(inode, &data_reserved, block_start, > >> + blocksize); > >> + if (ret < 0) { > >> + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > >> + BTRFS_INODE_PREALLOC)) && > >> + btrfs_check_can_nocow(BTRFS_I(inode), block_start, > >> + &write_bytes) > 0) { > >> + /* For nocow case, no need to reserve data space. */ > >> + only_release_metadata = true; > >> + } else { > >> + goto out; > >> + } > >> + } > >> + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize); > >> + if (ret < 0) { > >> + if (!only_release_metadata) > >> + btrfs_free_reserved_data_space(inode, data_reserved, > >> + block_start, blocksize); > >> goto out; > >> - > >> + } > >> again: > >> page = find_or_create_page(mapping, index, mask); > >> if (!page) { > >> @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > >> set_page_dirty(page); > >> unlock_extent_cached(io_tree, block_start, block_end, &cached_state); > >> > >> + if (only_release_metadata) > >> + set_extent_bit(&BTRFS_I(inode)->io_tree, block_start, > >> + block_end, EXTENT_NORESERVE, NULL, NULL, > >> + GFP_NOFS); > >> + > >> out_unlock: > >> - if (ret) > >> - btrfs_delalloc_release_space(inode, data_reserved, block_start, > >> - blocksize, true); > >> + if (ret) { > >> + if (!only_release_metadata) > >> + btrfs_delalloc_release_space(inode, data_reserved, > >> + block_start, blocksize, true); > >> + else > >> + btrfs_delalloc_release_metadata(BTRFS_I(inode), > >> + blocksize, true); > > > > I usually find it more intuitive to have it the other way around: > > > > if (only_release_metadata) > > ... > > else > > ... > > > > E.g., positive case first, negative in the else branch. But that's > > likely too much of a personal preference. > > > > Thanks. > > > >> + } > >> btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize); > >> unlock_page(page); > >> put_page(page); > >> -- > >> 2.25.0 > >> > > > > > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
