Re: [PATCH 03/19] btrfs: keep track of which extents have been discarded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 07, 2019 at 06:38:10PM -0400, Dennis Zhou wrote:
> On Mon, Oct 07, 2019 at 04:37:28PM -0400, Josef Bacik wrote:
> > On Mon, Oct 07, 2019 at 04:17:34PM -0400, Dennis Zhou wrote:
> > > Async discard will use the free space cache as backing knowledge for
> > > which extents to discard. This patch plumbs knowledge about which
> > > extents need to be discarded into the free space cache from
> > > unpin_extent_range().
> > > 
> > > An untrimmed extent can merge with everything as this is a new region.
> > > Absorbing trimmed extents is a tradeoff to for greater coalescing which
> > > makes life better for find_free_extent(). Additionally, it seems the
> > > size of a trim isn't as problematic as the trim io itself.
> > > 
> > > When reading in the free space cache from disk, if sync is set, mark all
> > > extents as trimmed. The current code ensures at transaction commit that
> > > all free space is trimmed when sync is set, so this reflects that.
> > > 
> > > Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>
> > > ---
> > >  fs/btrfs/extent-tree.c      | 15 ++++++++++-----
> > >  fs/btrfs/free-space-cache.c | 38 ++++++++++++++++++++++++++++++-------
> > >  fs/btrfs/free-space-cache.h | 10 +++++++++-
> > >  fs/btrfs/inode-map.c        | 13 +++++++------
> > >  4 files changed, 57 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 77a5904756c5..b9e3bedad878 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -2782,7 +2782,7 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
> > >  }
> > >  
> > >  static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> > > -			      u64 start, u64 end,
> > > +			      u64 start, u64 end, u32 fsc_flags,
> > >  			      const bool return_free_space)
> > >  {
> > >  	struct btrfs_block_group_cache *cache = NULL;
> > > @@ -2816,7 +2816,9 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> > >  		if (start < cache->last_byte_to_unpin) {
> > >  			len = min(len, cache->last_byte_to_unpin - start);
> > >  			if (return_free_space)
> > > -				btrfs_add_free_space(cache, start, len);
> > > +				__btrfs_add_free_space(fs_info,
> > > +						       cache->free_space_ctl,
> > > +						       start, len, fsc_flags);
> > >  		}
> > >  
> > >  		start += len;
> > > @@ -2894,6 +2896,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> > >  
> > >  	while (!trans->aborted) {
> > >  		struct extent_state *cached_state = NULL;
> > > +		u32 fsc_flags = 0;
> > >  
> > >  		mutex_lock(&fs_info->unused_bg_unpin_mutex);
> > >  		ret = find_first_extent_bit(unpin, 0, &start, &end,
> > > @@ -2903,12 +2906,14 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> > >  			break;
> > >  		}
> > >  
> > > -		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> > > +		if (btrfs_test_opt(fs_info, DISCARD_SYNC)) {
> > >  			ret = btrfs_discard_extent(fs_info, start,
> > >  						   end + 1 - start, NULL);
> > > +			fsc_flags |= BTRFS_FSC_TRIMMED;
> > > +		}
> > >  
> > >  		clear_extent_dirty(unpin, start, end, &cached_state);
> > > -		unpin_extent_range(fs_info, start, end, true);
> > > +		unpin_extent_range(fs_info, start, end, fsc_flags, true);
> > >  		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
> > >  		free_extent_state(cached_state);
> > >  		cond_resched();
> > > @@ -5512,7 +5517,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
> > >  int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
> > >  				   u64 start, u64 end)
> > >  {
> > > -	return unpin_extent_range(fs_info, start, end, false);
> > > +	return unpin_extent_range(fs_info, start, end, 0, false);
> > >  }
> > >  
> > >  /*
> > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> > > index d54dcd0ab230..f119895292b8 100644
> > > --- a/fs/btrfs/free-space-cache.c
> > > +++ b/fs/btrfs/free-space-cache.c
> > > @@ -747,6 +747,14 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
> > >  			goto free_cache;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Sync discard ensures that the free space cache is always
> > > +		 * trimmed.  So when reading this in, the state should reflect
> > > +		 * that.
> > > +		 */
> > > +		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> > > +			e->flags |= BTRFS_FSC_TRIMMED;
> > > +
> > >  		if (!e->bytes) {
> > >  			kmem_cache_free(btrfs_free_space_cachep, e);
> > >  			goto free_cache;
> > > @@ -2165,6 +2173,7 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	bool merged = false;
> > >  	u64 offset = info->offset;
> > >  	u64 bytes = info->bytes;
> > > +	bool is_trimmed = btrfs_free_space_trimmed(info);
> > >  
> > >  	/*
> > >  	 * first we want to see if there is free space adjacent to the range we
> > > @@ -2178,7 +2187,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	else
> > >  		left_info = tree_search_offset(ctl, offset - 1, 0, 0);
> > >  
> > > -	if (right_info && !right_info->bitmap) {
> > > +	if (right_info && !right_info->bitmap &&
> > > +	    (!is_trimmed || btrfs_free_space_trimmed(right_info))) {
> > >  		if (update_stat)
> > >  			unlink_free_space(ctl, right_info);
> > >  		else
> > > @@ -2189,7 +2199,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	}
> > >  
> > >  	if (left_info && !left_info->bitmap &&
> > > -	    left_info->offset + left_info->bytes == offset) {
> > > +	    left_info->offset + left_info->bytes == offset &&
> > > +	    (!is_trimmed || btrfs_free_space_trimmed(left_info))) {
> > 
> > So we allow merging if we haven't trimmed this entry, or if the adjacent entry
> > is already trimmed?  This means we'll merge if we trimmed the new entry
> > regardless of the adjacent entries status, or if the new entry is drity.  Why is
> > that?  Thanks,
> > 
> 
> This is the tradeoff I called out above here:
> 
> > > Absorbing trimmed extents is a tradeoff to for greater coalescing which
> > > makes life better for find_free_extent(). Additionally, it seems the
> > > size of a trim isn't as problematic as the trim io itself.
> 
> A problematic example case:
> 
> |----trimmed----|/////X/////|-----trimmed-----|
> 
> If region X gets freed and returned to the free space cache, we end up
> with the following:
> 
> |----trimmed----|-untrimmed-|-----trimmed-----|
> 
> This isn't great because now we need to teach find_free_extent() to span
> multiple btrfs_free_space entries, something I didn't want to do. So the
> other option is to overtrim trading for a simpler find_free_extent().
> Then the above becomes:
> 
> |-------------------trimmed-------------------|
> 
> It makes the assumption that if we're inserting, it's generally is free
> space being returned rather than we needed to slice out from the middle
> of a block. It does still have degenerative cases, but it's better than
> the above. The merging also allows for stuff to come out of bitmaps more
> proactively too.
> 
> Also from what it seems, the cost of a discard operation is quite costly
> relative to the amount your discarding (1 larger discard is better than
> several smaller discards) as it will clog up the device too.


OOOOOh I fucking get it now.  That's going to need a comment, because it's not
obvious at all.

However I still wonder if this is right.  Your above examples are legitimate,
but say you have

| 512mib adding back that isn't trimmed |------- 512mib trimmed ------|

we'll merge these two, but really we should probably trim that 512mib chunk
we're adding right?  Thanks,

Josef



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux