On Thu, Feb 07, 2013 at 03:12:07AM -0700, Miao Xie wrote:
> The deadlock problem happened when running fsstress(a test program in LTP).
>
> Steps to reproduce:
> # mkfs.btrfs -b 100M <partition>
> # mount <partition> <mnt>
> # <Path>/fsstress -p 3 -n 10000000 -d <mnt>
>
> The reason is:
> btrfs_direct_IO()
> |->do_direct_IO()
> |->get_page()
> |->get_blocks()
> | |->btrfs_delalloc_resereve_space()
> | |->btrfs_add_ordered_extent() ------- Add a new ordered extent
> |->dio_send_cur_page(page0) -------------- We didn't submit bio here
> |->get_page()
> |->get_blocks()
> |->btrfs_delalloc_resereve_space()
> |->flush_space()
> |->btrfs_start_ordered_extent()
> |->wait_event() ---------- Wait the completion of
> the ordered extent that is
> mentioned above
>
> But because we didn't submit the bio that is mentioned above, the ordered
> extent can not complete, we would wait for its completion forever.
>
> There are two methods which can fix this deadlock problem:
> 1. submit the bio before we invoke get_blocks()
> 2. reserve the space before we do dio
>
> Though the 1st is the simplest way, we need modify the code of VFS, and it
> is likely to break contiguous requests, and introduce performance regression
> for the other filesystems.
>
> So we have to choose the 2nd way.
I ran into this a few weeks ago and as Chris said I opted for option 4, just
do all the direct io stuff ourselves and ditch the generic stuff. This approach
works for now though so I'm good with it.
>
> Signed-off-by: Miao Xie <miaox@xxxxxxxxxxxxxx>
> Cc: Josef Bacik <jbacik@xxxxxxxxxxxx>
> ---
> fs/btrfs/extent-tree.c | 3 +-
> fs/btrfs/inode.c | 81 ++++++++++++++++++++++++-----------------------
> 2 files changed, 43 insertions(+), 41 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 85b8454..ca9afc4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4670,7 +4670,8 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
> spin_lock(&BTRFS_I(inode)->lock);
> dropped = drop_outstanding_extent(inode);
>
> - to_free = calc_csum_metadata_size(inode, num_bytes, 0);
> + if (num_bytes)
> + to_free = calc_csum_metadata_size(inode, num_bytes, 0);
> spin_unlock(&BTRFS_I(inode)->lock);
> if (dropped > 0)
> to_free += btrfs_calc_trans_metadata_size(root, dropped);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ca7ace7..c5d829d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6004,16 +6004,15 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
> u64 len = bh_result->b_size;
> struct btrfs_trans_handle *trans;
> int unlock_bits = EXTENT_LOCKED;
> - int ret;
> + int ret = 0;
>
> if (create) {
> - ret = btrfs_delalloc_reserve_space(inode, len);
> - if (ret)
> - return ret;
> + spin_lock(&BTRFS_I(inode)->lock);
> + BTRFS_I(inode)->outstanding_extents++;
> + spin_unlock(&BTRFS_I(inode)->lock);
> unlock_bits |= EXTENT_DELALLOC | EXTENT_DIRTY;
> - } else {
> + } else
> len = min_t(u64, len, root->sectorsize);
> - }
>
> lockstart = start;
> lockend = start + len - 1;
> @@ -6025,14 +6024,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
> if (lock_extent_direct(inode, lockstart, lockend, &cached_state, create))
> return -ENOTBLK;
>
> - if (create) {
> - ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> - lockend, EXTENT_DELALLOC, NULL,
> - &cached_state, GFP_NOFS);
> - if (ret)
> - goto unlock_err;
> - }
> -
> em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
> if (IS_ERR(em)) {
> ret = PTR_ERR(em);
> @@ -6064,7 +6055,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
> if (!create && (em->block_start == EXTENT_MAP_HOLE ||
> test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
> free_extent_map(em);
> - ret = 0;
This doesn't look right, this should be left here.
> goto unlock_err;
> }
>
> @@ -6162,6 +6152,11 @@ unlock:
> */
> if (start + len > i_size_read(inode))
> i_size_write(inode, start + len);
> +
> + ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> + lockstart + len - 1, EXTENT_DELALLOC, NULL,
> + &cached_state, GFP_NOFS);
> + BUG_ON(ret);
Return the error if there is one, there's no reason to add new BUG_ON()'s.
Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html