On 5/4/18 1:56 AM, Qu Wenruo wrote:
> Under the following case, qgroup rescan can double account cowed tree
> blocks:
>
> In this case, extent tree only has one tree block.
>
> -
> | transid=5 last committed=4
> | btrfs_qgroup_rescan_worker()
> | |- btrfs_start_transaction()
> | | transid = 5
> | |- qgroup_rescan_leaf()
> | |- btrfs_search_slot_for_read() on extent tree
> | Get the only extent tree block from commit root (transid = 4).
> | Scan it, set qgroup_rescan_progress to the last
> | EXTENT/META_ITEM + 1
> | now qgroup_rescan_progress = A + 1.
> |
> | fs tree get CoWed, new tree block is at A + 16K
> | transid 5 get committed
> -
> | transid=6 last committed=5
> | btrfs_qgroup_rescan_worker()
> | btrfs_qgroup_rescan_worker()
> | |- btrfs_start_transaction()
> | | transid = 5
> | |- qgroup_rescan_leaf()
> | |- btrfs_search_slot_for_read() on extent tree
> | Get the only extent tree block from commit root (transid = 5).
> | scan it using qgroup_rescan_progress (A + 1).
> | found new tree block beyong A, and it's fs tree block,
> | account it to increase qgroup numbers.
> -
>
> In above case, tree block A, and tree block A + 16K get accounted twice,
> while qgroup rescan should stop when it already reach the last leaf,
> other than continue using its qgroup_rescan_progress.
>
> Such case could happen by just looping btrfs/017 and with some
> possibility it can hit such double qgroup accounting problem.
>
> Fix it by checking the path to determine if we should finish qgroup
> rescan, other than relying on next loop to exit.
>
> Reported-by: Nikolay Borisov <nborisov@xxxxxxxx>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> fs/btrfs/qgroup.c | 48 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 34 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 829e8fe5c97e..2ee2d21d43ab 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2579,6 +2579,21 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
> spin_unlock(&fs_info->qgroup_lock);
> }
>
> +/*
> + * Check if the leaf is the last leaf. Which means all node pointers
> + * are at their last position.
> + */
> +static bool is_last_leaf(struct btrfs_path *path)
> +{
> + int i;
> +
> + for (i = 1; i < BTRFS_MAX_LEVEL && path->nodes[i]; i++) {
> + if (path->slots[i] != btrfs_header_nritems(path->nodes[i]) - 1)
> + return false;
> + }
> + return true;
> +}
> +
> /*
> * returns < 0 on error, 0 when more leafs are to be scanned.
> * returns 1 when done.
> @@ -2592,6 +2607,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> struct ulist *roots = NULL;
> struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
> u64 num_bytes;
> + bool done;
> int slot;
> int ret;
>
> @@ -2606,20 +2622,9 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> fs_info->qgroup_rescan_progress.type,
> fs_info->qgroup_rescan_progress.offset, ret);
>
> - if (ret) {
> - /*
> - * The rescan is about to end, we will not be scanning any
> - * further blocks. We cannot unset the RESCAN flag here, because
> - * we want to commit the transaction if everything went well.
> - * To make the live accounting work in this phase, we set our
> - * scan progress pointer such that every real extent objectid
> - * will be smaller.
> - */
> - fs_info->qgroup_rescan_progress.objectid = (u64)-1;
> - btrfs_release_path(path);
> - mutex_unlock(&fs_info->qgroup_rescan_lock);
> - return ret;
> - }
> + done = is_last_leaf(path);
> + if (ret)
> + goto finish;
>
> btrfs_item_key_to_cpu(path->nodes[0], &found,
> btrfs_header_nritems(path->nodes[0]) - 1);
> @@ -2665,8 +2670,23 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> free_extent_buffer(scratch_leaf);
> }
> btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> + if (done && !ret)
> + goto finish;
This causes a double unlock. The lock was released prior to iterating
the leaf. Otherwise, looks good.
-Jeff
>
> return ret;
> +finish:
> + /*
> + * The rescan is about to end, we will not be scanning any
> + * further blocks. We cannot unset the RESCAN flag here, because
> + * we want to commit the transaction if everything went well.
> + * To make the live accounting work in this phase, we set our
> + * scan progress pointer such that every real extent objectid
> + * will be smaller.
> + */
> + fs_info->qgroup_rescan_progress.objectid = (u64)-1;
> + btrfs_release_path(path);
> + mutex_unlock(&fs_info->qgroup_rescan_lock);
> + return 1;
> }
>
> static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>
--
Jeff Mahoney
SUSE Labs
Attachment:
signature.asc
Description: OpenPGP digital signature
