On Mon, Aug 19, 2019 at 11:30:16AM +0300, Nikolay Borisov wrote:
>
>
> On 16.08.19 г. 18:05 ч., Josef Bacik wrote:
> > btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that
> > it doesn't take into account any splitting at the levels, because
> > truncate will never split nodes. However truncate _and_ changing will
> > never split nodes, so rename btrfs_calc_trunc_metadata_size to
> > btrfs_calc_metadata_size. Also btrfs_calc_trans_metadata_size is purely
> > for inserting items, so rename this to btrfs_calc_insert_metadata_size.
> > Making these clearer will help when I start using them differently in
> > upcoming patches.
> >
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> > ---
> > fs/btrfs/block-group.c | 4 ++--
> > fs/btrfs/ctree.h | 14 +++++++++-----
> > fs/btrfs/delalloc-space.c | 8 ++++----
> > fs/btrfs/delayed-inode.c | 4 ++--
> > fs/btrfs/delayed-ref.c | 8 ++++----
> > fs/btrfs/file.c | 4 ++--
> > fs/btrfs/free-space-cache.c | 4 ++--
> > fs/btrfs/inode-map.c | 2 +-
> > fs/btrfs/inode.c | 6 +++---
> > fs/btrfs/props.c | 2 +-
> > fs/btrfs/root-tree.c | 2 +-
> > fs/btrfs/space-info.c | 2 +-
> > fs/btrfs/transaction.c | 4 ++--
> > 13 files changed, 34 insertions(+), 30 deletions(-)
>
> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx> see one discussion
> point below.
> >
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index afae5c731904..3147e840f839 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -3014,8 +3014,8 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
> > num_devs = get_profile_num_devs(fs_info, type);
> >
> > /* num_devs device items to update and 1 chunk item to add or remove */
> > - thresh = btrfs_calc_trunc_metadata_size(fs_info, num_devs) +
> > - btrfs_calc_trans_metadata_size(fs_info, 1);
> > + thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
> > + btrfs_calc_insert_metadata_size(fs_info, 1);
> >
> > if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
> > btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 85b808e3ea42..f352aa098015 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2450,17 +2450,21 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
> >
> > u64 btrfs_csum_bytes_to_leaves(struct btrfs_fs_info *fs_info, u64 csum_bytes);
> >
> > -static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_fs_info *fs_info,
> > - unsigned num_items)
> > +/*
> > + * Use this if we would be adding new items, as we could split nodes as we cow
> > + * down the tree.
> > + */
> > +static inline u64 btrfs_calc_insert_metadata_size(struct btrfs_fs_info *fs_info,
> > + unsigned num_items)
> > {
> > return (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
> > }
> >
>
> Isn't assuming that we are going to split on every level of the cow
> rather pessimistic, bordering on impossible. Isn't it realistically
> possible that we will only ever split up until root->level.
>
I had this discussion with Chris when I messed with this. We do pass in the
root we intend on messing with so we could very well do this, but it sort of
scares me because maybe we've been using more of our worst case reservations
than I expected.
My plan is to get these last corners worked out, get a few more months of
production testing, and then start experimenting with using the root->level + 1
for the maximum level and see how that goes. The ramp up from 1 level to 3
level roots happens pretty fast, so I suspect there'll be weird corner cases
going from empty->not empty, but I _think_ we should be fine to make this change
further down the road?
Also I will probably start with _just_ the transaction reservations, and leave
the delayed refs calculations the absolute worst case since those can explode.
Thanks,
Josef