Re: [RFC][PATCH] Btrfs: fix deadlock due to unsubmitted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 07, 2013 at 03:12:07AM -0700, Miao Xie wrote:
> The deadlock problem happened when running fsstress(a test program in LTP).
> 
> Steps to reproduce:
>  # mkfs.btrfs -b 100M <partition>
>  # mount <partition> <mnt>
>  # <Path>/fsstress -p 3 -n 10000000 -d <mnt>
> 
> The reason is:
> btrfs_direct_IO()
>  |->do_direct_IO()
>      |->get_page()
>      |->get_blocks()
>      |	 |->btrfs_delalloc_resereve_space()
>      |	 |->btrfs_add_ordered_extent() -------	Add a new ordered extent
>      |->dio_send_cur_page(page0) --------------	We didn't submit bio here
>      |->get_page()
>      |->get_blocks()
> 	 |->btrfs_delalloc_resereve_space()
> 	     |->flush_space()
> 		 |->btrfs_start_ordered_extent()
> 		     |->wait_event() ----------	Wait the completion of
> 						the ordered extent that is
> 						mentioned above
> 
> But because we didn't submit the bio that is mentioned above, the ordered
> extent can not complete, we would wait for its completion forever.
> 
> There are two methods which can fix this deadlock problem:
> 1. submit the bio before we invoke get_blocks()
> 2. reserve the space before we do dio
> 
> Though the 1st is the simplest way, we need modify the code of VFS, and it
> is likely to break contiguous requests, and introduce performance regression
> for the other filesystems.
> 
> So we have to choose the 2nd way.

I ran into this a few weeks ago and as Chris said I opted for option 4, just
do all the direct io stuff ourselves and ditch the generic stuff.  This approach
works for now though so I'm good with it.

> 
> Signed-off-by: Miao Xie <miaox@xxxxxxxxxxxxxx>
> Cc: Josef Bacik <jbacik@xxxxxxxxxxxx>
> ---
>  fs/btrfs/extent-tree.c |    3 +-
>  fs/btrfs/inode.c       |   81 ++++++++++++++++++++++++-----------------------
>  2 files changed, 43 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 85b8454..ca9afc4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4670,7 +4670,8 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>  	spin_lock(&BTRFS_I(inode)->lock);
>  	dropped = drop_outstanding_extent(inode);
>  
> -	to_free = calc_csum_metadata_size(inode, num_bytes, 0);
> +	if (num_bytes)
> +		to_free = calc_csum_metadata_size(inode, num_bytes, 0);
>  	spin_unlock(&BTRFS_I(inode)->lock);
>  	if (dropped > 0)
>  		to_free += btrfs_calc_trans_metadata_size(root, dropped);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ca7ace7..c5d829d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6004,16 +6004,15 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>  	u64 len = bh_result->b_size;
>  	struct btrfs_trans_handle *trans;
>  	int unlock_bits = EXTENT_LOCKED;
> -	int ret;
> +	int ret = 0;
>  
>  	if (create) {
> -		ret = btrfs_delalloc_reserve_space(inode, len);
> -		if (ret)
> -			return ret;
> +		spin_lock(&BTRFS_I(inode)->lock);
> +		BTRFS_I(inode)->outstanding_extents++;
> +		spin_unlock(&BTRFS_I(inode)->lock);
>  		unlock_bits |= EXTENT_DELALLOC | EXTENT_DIRTY;
> -	} else {
> +	} else
>  		len = min_t(u64, len, root->sectorsize);
> -	}
>  
>  	lockstart = start;
>  	lockend = start + len - 1;
> @@ -6025,14 +6024,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>  	if (lock_extent_direct(inode, lockstart, lockend, &cached_state, create))
>  		return -ENOTBLK;
>  
> -	if (create) {
> -		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> -				     lockend, EXTENT_DELALLOC, NULL,
> -				     &cached_state, GFP_NOFS);
> -		if (ret)
> -			goto unlock_err;
> -	}
> -
>  	em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
>  	if (IS_ERR(em)) {
>  		ret = PTR_ERR(em);
> @@ -6064,7 +6055,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>  	if (!create && (em->block_start == EXTENT_MAP_HOLE ||
>  			test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
>  		free_extent_map(em);
> -		ret = 0;

This doesn't look right, this should be left here.

>  		goto unlock_err;
>  	}
>  
> @@ -6162,6 +6152,11 @@ unlock:
>  		 */
>  		if (start + len > i_size_read(inode))
>  			i_size_write(inode, start + len);
> +
> +		ret = set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
> +				     lockstart + len - 1, EXTENT_DELALLOC, NULL,
> +				     &cached_state, GFP_NOFS);
> +		BUG_ON(ret);

Return the error if there is one, there's no reason to add new BUG_ON()'s.
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux