On Thu, Jan 30, 2020 at 5:30 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > > [BUG] > When the data space is exhausted, even the inode has NOCOW attribute, > btrfs will still refuse to truncate unaligned range due to ENOSPC. > > The following script can reproduce it pretty easily: > #!/bin/bash > > dev=/dev/test/test > mnt=/mnt/btrfs > > umount $dev &> /dev/null > umount $mnt&> /dev/null > > mkfs.btrfs -f $dev -b 1G > mount -o nospace_cache $dev $mnt > touch $mnt/foobar > chattr +C $mnt/foobar > > xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null > xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null > sync > > xfs_io -c "fpunch 0 2k" $mnt/foobar > umount $mnt > > Current btrfs will fail at the fpunch part. > > [CAUSE] > Because btrfs_truncate_block() always reserve space without checking the > NOCOW attribute. > > Since the writeback path follows NOCOW bit, we only need to bother the > space reservation code in btrfs_truncate_block(). > > [FIX] > Make btrfs_truncate_block() to follow btrfs_buffered_write() to try to > reserve data space first, and falls back to NOCOW check only when we > don't have enough space. > > Such always-try-reserve is an optimization introduced in > btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call. > > Since now check_can_nocow() is needed outside of inode.c, also export it > and rename it to btrfs_check_can_nocow(). > > Reported-by: Martin Doucha <martin.doucha@xxxxxxxx> > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > --- > Test case will be submitted to fstests by the reporter. Well, this is a sudden change of mind, isn't it? :) We had btrfs/172, which you removed very recently, that precisely tested this: https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?id=538d8a4bcc782258f8f95fae815d5e859dee9126 Even though there are several reasons why this can still fail (at writeback time), like regular buffered writes through the family of write() syscalls can, I think it's perfectly fine to have this behaviour. Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> So I think we can just resurrect btrfs/172 now... > --- > fs/btrfs/ctree.h | 2 ++ > fs/btrfs/file.c | 10 +++++----- > fs/btrfs/inode.c | 41 ++++++++++++++++++++++++++++++++++------- > 3 files changed, 41 insertions(+), 12 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 54efb21c2727..b5639f3461e4 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -2954,6 +2954,8 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); > loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in, > struct file *file_out, loff_t pos_out, > loff_t len, unsigned int remap_flags); > +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos, > + size_t *write_bytes); > > /* tree-defrag.c */ > int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 8d47c76b7bd1..8dc084600f4e 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1544,8 +1544,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, > return ret; > } > > -static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, > - size_t *write_bytes) > +int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos, > + size_t *write_bytes) > { > struct btrfs_fs_info *fs_info = inode->root->fs_info; > struct btrfs_root *root = inode->root; > @@ -1645,8 +1645,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, > if (ret < 0) { > if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > BTRFS_INODE_PREALLOC)) && > - check_can_nocow(BTRFS_I(inode), pos, > - &write_bytes) > 0) { > + btrfs_check_can_nocow(BTRFS_I(inode), pos, > + &write_bytes) > 0) { > /* > * For nodata cow case, no need to reserve > * data space. > @@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > */ > if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > BTRFS_INODE_PREALLOC)) || > - check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) { > + btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) { > inode_unlock(inode); > return -EAGAIN; > } > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 5509c41a4f43..b5ae4bbf1ad4 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4974,11 +4974,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > struct extent_state *cached_state = NULL; > struct extent_changeset *data_reserved = NULL; > char *kaddr; > + bool only_release_metadata = false; > u32 blocksize = fs_info->sectorsize; > pgoff_t index = from >> PAGE_SHIFT; > unsigned offset = from & (blocksize - 1); > struct page *page; > gfp_t mask = btrfs_alloc_write_mask(mapping); > + size_t write_bytes = blocksize; > int ret = 0; > u64 block_start; > u64 block_end; > @@ -4990,11 +4992,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > block_start = round_down(from, blocksize); > block_end = block_start + blocksize - 1; > > - ret = btrfs_delalloc_reserve_space(inode, &data_reserved, > - block_start, blocksize); > - if (ret) > + ret = btrfs_check_data_free_space(inode, &data_reserved, block_start, > + blocksize); > + if (ret < 0) { > + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > + BTRFS_INODE_PREALLOC)) && > + btrfs_check_can_nocow(BTRFS_I(inode), block_start, > + &write_bytes) > 0) { > + /* For nocow case, no need to reserve data space. */ > + only_release_metadata = true; > + } else { > + goto out; > + } > + } > + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize); > + if (ret < 0) { > + if (!only_release_metadata) > + btrfs_free_reserved_data_space(inode, data_reserved, > + block_start, blocksize); > goto out; > - > + } > again: > page = find_or_create_page(mapping, index, mask); > if (!page) { > @@ -5063,10 +5080,20 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, > set_page_dirty(page); > unlock_extent_cached(io_tree, block_start, block_end, &cached_state); > > + if (only_release_metadata) > + set_extent_bit(&BTRFS_I(inode)->io_tree, block_start, > + block_end, EXTENT_NORESERVE, NULL, NULL, > + GFP_NOFS); > + > out_unlock: > - if (ret) > - btrfs_delalloc_release_space(inode, data_reserved, block_start, > - blocksize, true); > + if (ret) { > + if (!only_release_metadata) > + btrfs_delalloc_release_space(inode, data_reserved, > + block_start, blocksize, true); > + else > + btrfs_delalloc_release_metadata(BTRFS_I(inode), > + blocksize, true); I usually find it more intuitive to have it the other way around: if (only_release_metadata) ... else ... E.g., positive case first, negative in the else branch. But that's likely too much of a personal preference. Thanks. > + } > btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize); > unlock_page(page); > put_page(page); > -- > 2.25.0 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
