Re: [PATCH v4 2/2] btrfs: extent-tree: Ensure we trim ranges across block group boundary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 25, 2019 at 04:59:56PM +0800, Qu Wenruo wrote:
> [BUG]
> When deleting large files (which cross block group boundary) with discard
> mount option, we find some btrfs_discard_extent() calls only trimmed part
> of its space, not the whole range:
> 
>   btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50%
> 
> type:		bbio->map_type, in above case, it's SINGLE DATA.
> start:		Logical address of this trim
> len:		Logical length of this trim
> trimmed:	Physically trimmed bytes
> ratio:		trimmed / len
> 
> Thus leading some unused space not discarded.
> 
> [CAUSE]
> When discard mount option is specified, after a transaction is fully
> committed (super block written to disk), we begin to cleanup pinned
> extents in the following call chain:
> 
> btrfs_commit_transaction()
> |- btrfs_finish_extent_commit()
>    |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY);
>    |- btrfs_discard_extent()
> 
> However pinned extents are recorded in an extent_io_tree, which can
> merge adjacent extent states.
> 
> When a large file get deleted and it has adjacent file extents across
> block group boundary, we will get a large merged range, like this:
> 
>       |<---    BG1    --->|<---      BG2     --->|
>       |//////|<--   Range to discard   --->|/////|
> 
> To discard that range, we have the following calls:
> btrfs_discard_extent()
> |- btrfs_map_block()
> |  Returned bbio will end at BG1's end. As btrfs_map_block()
> |  never returns result across block group boundary.
> |- btrfs_issuse_discard()
>    Issue discard for each stripe.
> 
> So we will only discard the range in BG1, not the remaining part in BG2.
> 
> Furthermore, this bug is not that reliably observed, for above case, if
> there is no other extent in BG2, BG2 will be empty and btrfs will trim
> all space of BG2, covering up the bug.
> 
> [FIX]
> - Allow __btrfs_map_block_for_discard() to modify @length parameter
>   btrfs_map_block() uses its @length paramter to notify the caller how
>   many bytes are mapped in current call.
>   With __btrfs_map_block_for_discard() also modifing the @length,
>   btrfs_discard_extent() now understands when to do extra trim.
> 
> - Call btrfs_map_block() in a loop until we hit the range end
>   Since we now know how many bytes are mapped each time, we can iterate
>   through each block group boundary and issue correct trim for each
>   range.
> 
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  fs/btrfs/extent-tree.c | 41 +++++++++++++++++++++++++++++++----------
>  fs/btrfs/volumes.c     |  6 ++++--
>  2 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 49cb26fa7c63..ff2838bd677d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1306,8 +1306,10 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>  int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>  			 u64 num_bytes, u64 *actual_bytes)
>  {
> -	int ret;
> +	int ret = 0;
>  	u64 discarded_bytes = 0;
> +	u64 end = bytenr + num_bytes;
> +	u64 cur = bytenr;
>  	struct btrfs_bio *bbio = NULL;
>  
>  
> @@ -1316,15 +1318,23 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>  	 * associated to its stripes that don't go away while we are discarding.
>  	 */
>  	btrfs_bio_counter_inc_blocked(fs_info);
> -	/* Tell the block device(s) that the sectors can be discarded */
> -	ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, bytenr, &num_bytes,
> -			      &bbio, 0);
> -	/* Error condition is -ENOMEM */
> -	if (!ret) {
> -		struct btrfs_bio_stripe *stripe = bbio->stripes;
> +	while (cur < end) {
> +		struct btrfs_bio_stripe *stripe;
>  		int i;
>  
> +		num_bytes = end - cur;
> +		/* Tell the block device(s) that the sectors can be discarded */
> +		ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, cur,
> +				      &num_bytes, &bbio, 0);
> +		/*
> +		 * Error can be -ENOMEM, -ENOENT (no such chunk mapping) or
> +		 * -EOPNOTSUPP. For any such error, @num_bytes is not updated,
> +		 * thus we can't continue anyway.
> +		 */
> +		if (ret < 0)
> +			goto out;
>  
> +		stripe = bbio->stripes;
>  		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>  			u64 bytes;
>  			struct request_queue *req_q;
> @@ -1341,10 +1351,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>  						  stripe->physical,
>  						  stripe->length,
>  						  &bytes);
> -			if (!ret)
> +			if (!ret) {
>  				discarded_bytes += bytes;
> -			else if (ret != -EOPNOTSUPP)
> -				break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */
> +			} else if (ret != -EOPNOTSUPP) {
> +				/*
> +				 * Logic errors or -ENOMEM, or -EIO but I don't
> +				 * know how that could happen JDM
> +				 *
> +				 * Ans since there are two loops, explicitly

Hate for there to be a v5 at this point, but it should be "and".  Anyway

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,

Josef



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux