On Fri, Oct 30, 2009 at 09:48:55AM +0800, Yan, Zheng wrote:
> On Fri, Oct 30, 2009 at 4:35 AM, Josef Bacik <josef@xxxxxxxxxx> wrote:
> > If there are alot of snapshots we could conceivably end up using much more
> > metadata space than what the worst case calculation would come up with, so
> > instead of trying to figure out how many extents will be modified by changing
> > items, just calculate it per tree, and then reserve space specifically for the
> > extent root. This patch does a max(size of extent root tree, 20% of free
> > metadata space) reservation for the extent root, and then tracks how much
> > space is used per transaction for the extent root. Then if we get close to
> > this amount in our transaction we will force the commit of the transaction to
> > free up space. This will help us better deal with the massive space
> > requirements that will come when users have alot of snapshots. This was
> > tested with my normal enospc tests. Thanks,
> >
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxx>
> > ---
> > fs/btrfs/ctree.h | 2 +
> > fs/btrfs/extent-tree.c | 115 +++++++++++++++++++++++++++++++++++++++++-------
> > fs/btrfs/transaction.c | 19 +++++++-
> > fs/btrfs/transaction.h | 1 +
> > 4 files changed, 119 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index e5dd628..3e00df1 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2049,6 +2049,8 @@ void btrfs_delalloc_reserve_space(struct btrfs_root *root, struct inode *inode,
> > u64 bytes);
> > void btrfs_delalloc_free_space(struct btrfs_root *root, struct inode *inode,
> > u64 bytes);
> > +int btrfs_check_reserved_extent_space(struct btrfs_trans_handle *trans,
> > + struct btrfs_root *root);
> > /* ctree.c */
> > int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
> > int level, int *slot);
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index c56f916..7a1f1a4 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -2768,6 +2768,12 @@ static u64 calculate_bytes_needed(struct btrfs_root *root, int num_items)
> > u64 num_bytes;
> > int level;
> >
> > + /*
> > + * The level is the number of node level's we could possibly cow. So
> > + * since we track root item reservations seperately we don't need that,
> > + * and we don't need to count the leaf level, so the max number of
> > + * levels that could have nodes is MAX_LEVEL - 2.
> > + */
> > level = BTRFS_MAX_LEVEL - 2;
> > /*
> > * NOTE: these calculations are absolutely the worst possible case.
> > @@ -2776,17 +2782,11 @@ static u64 calculate_bytes_needed(struct btrfs_root *root, int num_items)
> > */
> >
> > /*
> > - * for every item we insert we could insert both an extent item and a
> > - * extent ref item. Then for ever item we insert, we will need to cow
> > - * both the original leaf, plus the leaf to the left and right of it.
> > - *
> > - * Unless we are talking about the extent root, then we just want the
> > - * number of items * 2, since we just need the extent item plus its ref.
> > + * For each item we need we'll assume that we're going to add a leaf for
> > + * it, and then we may cow the leaves on either side to make room for
> > + * the items we're inserting, so multiply by 3.
> > */
> > - if (root == root->fs_info->extent_root)
> > - num_bytes = num_items * 2;
> > - else
> > - num_bytes = (num_items + (2 * num_items)) * 3;
> > + num_bytes = num_items * 3;
> >
> > /*
> > * num_bytes is total number of leaves we could need times the leaf
> > @@ -3031,6 +3031,64 @@ out:
> > }
> >
> > /*
> > + * This checks how many bytes we've allocated for the extent tree vs. how much
> > + * we reserved, and if we need to go ahead and force the transaction to commit.
> > + */
> > +int btrfs_check_reserved_extent_space(struct btrfs_trans_handle *trans,
> > + struct btrfs_root *root)
> > +{
> > + struct btrfs_fs_info *info = root->fs_info;
> > + struct btrfs_space_info *meta_sinfo;
> > + u64 used;
> > + u64 alloc_target;
> > +
> > + /* don't force a commit if this is an embedded transaction */
> > + if (current->journal_info != trans)
> > + return 0;
> > +
> > + /* get the space info for where the metadata will live */
> > + alloc_target = btrfs_get_alloc_profile(root, 0);
> > + meta_sinfo = __find_space_info(info, alloc_target);
> > +
> > + spin_lock(&meta_sinfo->lock);
> > +
> > + if (unlikely(!meta_sinfo->bytes_root)) {
> > + u64 bytes;
> > +
> > + /*
> > + * somebody should have already reserved space before starting
> > + * the transaction, so WARN_ON so we can fix that person.
> > + */
> > + WARN_ON(1);
> > +
> > + used = meta_sinfo->bytes_used + meta_sinfo->bytes_reserved +
> > + meta_sinfo->bytes_pinned + meta_sinfo->bytes_readonly +
> > + meta_sinfo->bytes_super + meta_sinfo->bytes_root +
> > + meta_sinfo->bytes_may_use + meta_sinfo->bytes_delalloc;
> > +
> > + bytes = meta_sinfo->total_bytes - used;
> > + meta_sinfo->bytes_root = calculate_bytes_needed(root, 6);
> > + bytes *= 20;
> > + bytes = div64_u64(bytes, 100);
> > + bytes = max(bytes,
> > + btrfs_root_used(&info->extent_root->root_item));
>
> This won't work since the used space accounting in root item isn't accuracy.
> We can fix the bug and use btrfsck to fix the accounting. But it introduces
> backwards compatibility issue. I think we can use 2% of total space as
> a temporary substitution.
>
I didn't look too thoroughly at this, but it looked like the accounting was done
properly, since we update it in btrfs_update_block_group. Oh though I guess we
could always be using the extent_root, is that the problem? If you let me know
what the issue is I will fix it tomorrow. Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html