On Wed, Oct 23, 2019 at 06:52:57PM -0400, Dennis Zhou wrote:
> Async discard will use the free space cache as backing knowledge for
> which extents to discard. This patch plumbs knowledge about which
> extents need to be discarded into the free space cache from
> unpin_extent_range().
>
> An untrimmed extent can merge with everything as this is a new region.
> Absorbing trimmed extents is a tradeoff to for greater coalescing which
> makes life better for find_free_extent(). Additionally, it seems the
> size of a trim isn't as problematic as the trim io itself.
>
> When reading in the free space cache from disk, if sync is set, mark all
> extents as trimmed. The current code ensures at transaction commit that
> all free space is trimmed when sync is set, so this reflects that.
>
> Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>
> ---
> fs/btrfs/extent-tree.c | 15 +++++++---
> fs/btrfs/free-space-cache.c | 60 ++++++++++++++++++++++++++++++++-----
> fs/btrfs/free-space-cache.h | 17 ++++++++++-
> fs/btrfs/inode-map.c | 13 ++++----
> 4 files changed, 87 insertions(+), 18 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 77a5904756c5..6a40bba3cb19 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2783,6 +2783,7 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
>
> static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> u64 start, u64 end,
> + enum btrfs_trim_state trim_state,
> const bool return_free_space)
> {
> struct btrfs_block_group_cache *cache = NULL;
> @@ -2816,7 +2817,9 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> if (start < cache->last_byte_to_unpin) {
> len = min(len, cache->last_byte_to_unpin - start);
> if (return_free_space)
> - btrfs_add_free_space(cache, start, len);
> + __btrfs_add_free_space(fs_info,
> + cache->free_space_ctl,
> + start, len, trim_state);
> }
>
> start += len;
> @@ -2894,6 +2897,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
>
> while (!trans->aborted) {
> struct extent_state *cached_state = NULL;
> + enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
>
> mutex_lock(&fs_info->unused_bg_unpin_mutex);
> ret = find_first_extent_bit(unpin, 0, &start, &end,
> @@ -2903,12 +2907,14 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> break;
> }
>
> - if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> + if (btrfs_test_opt(fs_info, DISCARD_SYNC)) {
> ret = btrfs_discard_extent(fs_info, start,
> end + 1 - start, NULL);
> + trim_state = BTRFS_TRIM_STATE_TRIMMED;
> + }
>
> clear_extent_dirty(unpin, start, end, &cached_state);
> - unpin_extent_range(fs_info, start, end, true);
> + unpin_extent_range(fs_info, start, end, trim_state, true);
> mutex_unlock(&fs_info->unused_bg_unpin_mutex);
> free_extent_state(cached_state);
> cond_resched();
> @@ -5512,7 +5518,8 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
> int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
> u64 start, u64 end)
> {
> - return unpin_extent_range(fs_info, start, end, false);
> + return unpin_extent_range(fs_info, start, end,
> + BTRFS_TRIM_STATE_UNTRIMMED, false);
> }
>
> /*
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index d54dcd0ab230..d7f0cb961496 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -747,6 +747,14 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
> goto free_cache;
> }
>
> + /*
> + * Sync discard ensures that the free space cache is always
> + * trimmed. So when reading this in, the state should reflect
> + * that.
> + */
> + if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> + e->trim_state = BTRFS_TRIM_STATE_TRIMMED;
> +
BTRFS_TRIM_STATE_TRIMMED == 0, so if we don't have DISCARD_SYNC set, we'll still
end up with TRIMMED. Which means we'll end up with weirdness later because
unpinned extents will come back as UNTRIMMED in the !DISCARD_SYNC case. I'm
thinking about spinning rust here where we'll never have discard, so we should
either always treat the free space as trimmed, or always treat it as untrimmed,
so we don't end up with different allocator behavior for the undiscardable
devices. Thanks,
Josef