On 2020/6/16 1:49 AM, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> A RWF_NOWAIT write is not supposed to wait on filesystem locks that can be
> held for a long time or for ongoing IO to complete.
>
> However when calling check_can_nocow(), if the inode has prealloc extents
> or has the NOCOW flag set, we can block on extent (file range) locks
> through the call to btrfs_lock_and_flush_ordered_range(). Such lock can
> take a significant amount of time to be available. For example, a fiemap
> task may be running, and iterating through the entire file range checking
> all extents and doing backref walking to determine if they are shared,
> or a readpage operation may be in progress.
>
> Also at btrfs_lock_and_flush_ordered_range(), called by check_can_nocow(),
> after locking the file range we wait for any existing ordered extent that
> is in progress to complete, which is another operation that can take a
> significant amount of time and defeats the purpose of RWF_NOWAIT.
>
> So fix this by trying to lock the file range and if it's currently locked
> return -EAGAIN to user space. If we are able to lock the file range without
> waiting and there is an ordered extent in the range, return -EAGAIN as
> well, instead of waiting for it to complete. Finally, don't bother trying
> to lock the snapshot lock of the root when attempting a RWF_NOWAIT write,
> as that is only important for buffered writes.
>
> Fixes: edf064e7c6fec3 ("btrfs: nowait aio support")
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> ---
> fs/btrfs/file.c | 37 ++++++++++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 78481d1e5e6e..e5da2508f002 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1533,7 +1533,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
> }
>
> static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> - size_t *write_bytes)
> + size_t *write_bytes, bool nowait)
> {
> struct btrfs_fs_info *fs_info = inode->root->fs_info;
> struct btrfs_root *root = inode->root;
> @@ -1541,27 +1541,43 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
> u64 num_bytes;
> int ret;
>
> - if (!btrfs_drew_try_write_lock(&root->snapshot_lock))
> + if (!nowait && !btrfs_drew_try_write_lock(&root->snapshot_lock))
Did I read this incorrectly, or is something wrong here?
This means that if nowait == true, we won't try to take the snapshot_lock
at all and just continue.
Whereas if nowait == false (which means waiting is allowed) and we can't
grab the lock, we return -EAGAIN.
This doesn't look correct to me.
To me, @nowait shouldn't affect the btrfs_drew_try_write_lock() call at
all, since that call doesn't sleep.
Thanks,
Qu
> return -EAGAIN;
>
> lockstart = round_down(pos, fs_info->sectorsize);
> lockend = round_up(pos + *write_bytes,
> fs_info->sectorsize) - 1;
> + num_bytes = lockend - lockstart + 1;
>
> - btrfs_lock_and_flush_ordered_range(inode, lockstart,
> - lockend, NULL);
> + if (nowait) {
> + struct btrfs_ordered_extent *ordered;
> +
> + if (!try_lock_extent(&inode->io_tree, lockstart, lockend))
> + return -EAGAIN;
> +
> + ordered = btrfs_lookup_ordered_range(inode, lockstart,
> + num_bytes);
> + if (ordered) {
> + btrfs_put_ordered_extent(ordered);
> + ret = -EAGAIN;
> + goto out_unlock;
> + }
> + } else {
> + btrfs_lock_and_flush_ordered_range(inode, lockstart,
> + lockend, NULL);
> + }
>
> - num_bytes = lockend - lockstart + 1;
> ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
> NULL, NULL, NULL);
> if (ret <= 0) {
> ret = 0;
> - btrfs_drew_write_unlock(&root->snapshot_lock);
> + if (!nowait)
> + btrfs_drew_write_unlock(&root->snapshot_lock);
> } else {
> *write_bytes = min_t(size_t, *write_bytes ,
> num_bytes - pos + lockstart);
> }
> -
> +out_unlock:
> unlock_extent(&inode->io_tree, lockstart, lockend);
>
> return ret;
> @@ -1633,7 +1649,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
> if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> BTRFS_INODE_PREALLOC)) &&
> check_can_nocow(BTRFS_I(inode), pos,
> - &write_bytes) > 0) {
> + &write_bytes, false) > 0) {
> /*
> * For nodata cow case, no need to reserve
> * data space.
> @@ -1911,12 +1927,11 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> */
> if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> BTRFS_INODE_PREALLOC)) ||
> - check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes) <= 0) {
> + check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes,
> + true) <= 0) {
> inode_unlock(inode);
> return -EAGAIN;
> }
> - /* check_can_nocow() locks the snapshot lock on success */
> - btrfs_drew_write_unlock(&root->snapshot_lock);
> /*
> * There are holes in the range or parts of the range that must
> * be COWed (shared extents, RO block groups, etc), so just bail
>