Hi Filipe,
Can you please explain more what is the scenario you are worried about.
Let's say we have two FS trees (subvolumes) subv1 and subv2, subv2
being a RO snapshot of subv1. And they have a shared subtree at
logical==X. Now we change subv1, so its subtree is COW'ed and some
other logical address (Y) is being allocated for subtree root. But X
still cannot be reused as long as subv2 exists. That's the essence of
the extent tree providing refcount for each tree/data block in the FS,
no?
Now finally we delete subv2 and block X is freed. So it can be
reallocated as a root of another subtree. And then it might be
snapshotted again and shared as before.
So where do you see a problem?
If we have two FS-tree subtrees starting at the same logical=X, how
can they be different? This means we allocated logical=X again, while
it was still in use, which is very very bad.
Am I missing something here?
Thanks,
Alex.
P.S.: by "logical" I (and hopefully you) mean the extent-tree level
addresses, i.e., if we have a tree block with logical=X, then we also
have an EXTENT_ITEM with key (X, EXTENT_ITEM, nodesize/leafsize).
On Fri, Feb 21, 2014 at 12:15 AM, Filipe David Borba Manana
<fdmanana@xxxxxxxxx> wrote:
> Just comparing the pointers (logical disk addresses) of the btree nodes is
> not completely bullet proof, we have to check if their generation numbers
> match too.
>
> It is guaranteed that a COW operation will result in a block with a different
> logical disk address than the original block's address, but over time we can
> reuse that former logical disk address.
>
> For example, creating a 2Gb filesystem on a loop device, and having a script
> running in a loop always updating the access timestamp of a file, resulted in
> the same logical disk address being reused for the same fs btree block in about
> only 4 minutes.
>
> This could make us skip entire subtrees when doing an incremental send (which
> is currently the only user of btrfs_compare_trees). However the odds of getting
> 2 blocks at the same tree level, with the same logical disk address, equal first
> slot keys and different generations, should hopefully be very low.
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@xxxxxxxxx>
> ---
> fs/btrfs/ctree.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index cbd3a7d..88d1b1e 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -5376,6 +5376,8 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> int advance_right;
> u64 left_blockptr;
> u64 right_blockptr;
> + u64 left_gen;
> + u64 right_gen;
> u64 left_start_ctransid;
> u64 right_start_ctransid;
> u64 ctransid;
> @@ -5640,7 +5642,14 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> right_blockptr = btrfs_node_blockptr(
> right_path->nodes[right_level],
> right_path->slots[right_level]);
> - if (left_blockptr == right_blockptr) {
> + left_gen = btrfs_node_ptr_generation(
> + left_path->nodes[left_level],
> + left_path->slots[left_level]);
> + right_gen = btrfs_node_ptr_generation(
> + right_path->nodes[right_level],
> + right_path->slots[right_level]);
> + if (left_blockptr == right_blockptr &&
> + left_gen == right_gen) {
> /*
> * As we're on a shared block, don't
> * allow to go deeper.
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html