Re: [PATCH] btrfs: Remove the duplicated and sometimes too cautious btrfs_can_relocate()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2019/7/18 下午7:16, Filipe Manana wrote:
> On Thu, Jul 18, 2019 at 6:50 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>>
>> [BUG]
>> The following script can easily cause unexpected ENOSPC:
>>   umount $dev &> /dev/null
>>   umount $mnt &> /dev/null
>>
>>   mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>
>>   mount $dev $mnt -o enospc_debug
>>
>>   for i in $(seq -w 0 511); do
>>         xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>>   done
>>   sync
>>
>>   btrfs balance start --full $mnt || return 1
>>   sync
>>
>>   # This will report -ENOSPC
>>   btrfs balance start --full $mnt || return 1
>>   umount $mnt
>>
>> Also, btrfs/156 can also fail due to ENOSPC.
> 
> Well, that script you pasted is btrfs/156 essentially.
> 
> When did the test started failing? When the test was added, it didn't
> fail, did it?

Biosect to commit 302167c50b32 ("btrfs: don't end the transaction for
delayed refs in throttle").

But that commit itself looks pretty valid to me.

And as described, it failed at btrfs_can_relocate(), not exactly the
test case is expected to fail.
(IIRC when the test case is submitted, the error happens at
inc_block_group_ro(), hence why I added some debug message there)

> 
>>
>> [CAUSE]
>> The ENOSPC is reported by btrfs_can_relocate().
>>
>> In btrfs_can_relocate(), it does the following check:
>> - If the block group is empty
>>   If empty, definitely we can relocate this block group.
>> - If we are not the only block group and we have enough space
>>   Then we can relocate this block group.
>>
>> Above two checks are completely OK, although I could argue they doesn't
>> make much sense, but the following check is vague and even sometimes
>> too cautious to cause ENOSPC:
>> - If we can allocate a new block group as large as current one.
>>   If we failed previous two checks, we must pass this to relocate this
>>   block group.
>>
>> There are several problems here:
>> 1. We don't need to allocate as large as the source block group.
>>    E.g. source block group is 1G sized, but only 1M used. We only need
>>    to allocated a data chunk larger than 1M to continue relocation.
> 
> Right. But where does btrfs_can_relocate() do such assumption?
> It only tries to check if there's enough space for an amount that
> corresponds to the amount used in the block group, that is, not the
> size of the block group (unless the block group is completely full).

You're right, my description here is wrong.

I'll remove this paragraph completely.

Thanks,
Qu

> 
>>
>> 2. The check in btrfs_can_relocate() is vague and impossible to be as
>>    accurate as __btrfs_alloc_chunk()
>>    How could this less than 200 lines code do the same work as
>>    __btrfs_alloc_chunk()? And it's hard to maintain two different
>>    functions to do similar work.
>>
>> 3. We have more accurate check in btrfs_inc_block_group_ro().
>>    Btrfs_inc_block_group_ro() is doing similar check but much better.
>>    In btrfs_inc_block_group_ro() we do:
>>    * Forced chunk allocation if we're converting
>>
>>    * Try to mark block group ro first
>>      in inc_btrfs_block_group_ro(), we will do comprehensive space
>>      check to ensure we have enough free space for the used and reserved
>>      space of the block group.
>>      If succeeded, we're done.
>>
>>    * Force chunk allocation for more space
>>      If we failed here, we indeed hits ENOSPC.
>>
>>    * Try to mark block group ro again
>>      As we have extra space, we can try again.
>>      This is the last chance, either we have enough space now and
>>      success, or the newly allocated space is not large enough, ENOSPC
>>      is returned.
>>
>>    Such try-allocate-try behavior is way more accurate in every way
>>    compared to btrfs_can_relocate(), we can rely on
>>    btrfs_inc_block_group_ro() to replace btrfs_can_relocate()
>>    completely.
>>
>> [FIX]
>> Since regular balance routine already has a better ENOSPC detector,
>> there is no need to keep the false-alert-prone btrfs_can_relocate().
>>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> ---
>>  fs/btrfs/ctree.h       |   1 -
>>  fs/btrfs/extent-tree.c | 141 -----------------------------------------
>>  fs/btrfs/volumes.c     |   4 --
>>  3 files changed, 146 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 0a61dff27f57..965d1e5a4af7 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -2772,7 +2772,6 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle *trans);
>>  int btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr);
>>  int btrfs_free_block_groups(struct btrfs_fs_info *info);
>>  int btrfs_read_block_groups(struct btrfs_fs_info *info);
>> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr);
>>  int btrfs_make_block_group(struct btrfs_trans_handle *trans,
>>                            u64 bytes_used, u64 type, u64 chunk_offset,
>>                            u64 size);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 5faf057f6f37..822a4102980d 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -9774,147 +9774,6 @@ void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache)
>>         spin_unlock(&sinfo->lock);
>>  }
>>
>> -/*
>> - * Checks to see if it's even possible to relocate this block group.
>> - *
>> - * @return - -1 if it's not a good idea to relocate this block group, 0 if its
>> - * ok to go ahead and try.
>> - */
>> -int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr)
>> -{
>> -       struct btrfs_block_group_cache *block_group;
>> -       struct btrfs_space_info *space_info;
>> -       struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> -       struct btrfs_device *device;
>> -       u64 min_free;
>> -       u64 dev_min = 1;
>> -       u64 dev_nr = 0;
>> -       u64 target;
>> -       int debug;
>> -       int index;
>> -       int full = 0;
>> -       int ret = 0;
>> -
>> -       debug = btrfs_test_opt(fs_info, ENOSPC_DEBUG);
>> -
>> -       block_group = btrfs_lookup_block_group(fs_info, bytenr);
>> -
>> -       /* odd, couldn't find the block group, leave it alone */
>> -       if (!block_group) {
>> -               if (debug)
>> -                       btrfs_warn(fs_info,
>> -                                  "can't find block group for bytenr %llu",
>> -                                  bytenr);
>> -               return -1;
>> -       }
>> -
>> -       min_free = btrfs_block_group_used(&block_group->item);
>> -
>> -       /* no bytes used, we're good */
>> -       if (!min_free)
>> -               goto out;
>> -
>> -       space_info = block_group->space_info;
>> -       spin_lock(&space_info->lock);
>> -
>> -       full = space_info->full;
>> -
>> -       /*
>> -        * if this is the last block group we have in this space, we can't
>> -        * relocate it unless we're able to allocate a new chunk below.
>> -        *
>> -        * Otherwise, we need to make sure we have room in the space to handle
>> -        * all of the extents from this block group.  If we can, we're good
>> -        */
>> -       if ((space_info->total_bytes != block_group->key.offset) &&
>> -           (btrfs_space_info_used(space_info, false) + min_free <
>> -            space_info->total_bytes)) {
>> -               spin_unlock(&space_info->lock);
>> -               goto out;
>> -       }
>> -       spin_unlock(&space_info->lock);
>> -
>> -       /*
>> -        * ok we don't have enough space, but maybe we have free space on our
>> -        * devices to allocate new chunks for relocation, so loop through our
>> -        * alloc devices and guess if we have enough space.  if this block
>> -        * group is going to be restriped, run checks against the target
>> -        * profile instead of the current one.
>> -        */
>> -       ret = -1;
>> -
>> -       /*
>> -        * index:
>> -        *      0: raid10
>> -        *      1: raid1
>> -        *      2: dup
>> -        *      3: raid0
>> -        *      4: single
>> -        */
>> -       target = get_restripe_target(fs_info, block_group->flags);
>> -       if (target) {
>> -               index = btrfs_bg_flags_to_raid_index(extended_to_chunk(target));
>> -       } else {
>> -               /*
>> -                * this is just a balance, so if we were marked as full
>> -                * we know there is no space for a new chunk
>> -                */
>> -               if (full) {
>> -                       if (debug)
>> -                               btrfs_warn(fs_info,
>> -                                          "no space to alloc new chunk for block group %llu",
>> -                                          block_group->key.objectid);
>> -                       goto out;
>> -               }
>> -
>> -               index = btrfs_bg_flags_to_raid_index(block_group->flags);
>> -       }
>> -
>> -       if (index == BTRFS_RAID_RAID10) {
>> -               dev_min = 4;
>> -               /* Divide by 2 */
>> -               min_free >>= 1;
>> -       } else if (index == BTRFS_RAID_RAID1) {
>> -               dev_min = 2;
>> -       } else if (index == BTRFS_RAID_DUP) {
>> -               /* Multiply by 2 */
>> -               min_free <<= 1;
>> -       } else if (index == BTRFS_RAID_RAID0) {
>> -               dev_min = fs_devices->rw_devices;
>> -               min_free = div64_u64(min_free, dev_min);
>> -       }
>> -
>> -       mutex_lock(&fs_info->chunk_mutex);
>> -       list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) {
>> -               u64 dev_offset;
>> -
>> -               /*
>> -                * check to make sure we can actually find a chunk with enough
>> -                * space to fit our block group in.
>> -                */
>> -               if (device->total_bytes > device->bytes_used + min_free &&
>> -                   !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
>> -                       ret = find_free_dev_extent(device, min_free,
>> -                                                  &dev_offset, NULL);
>> -                       if (!ret)
>> -                               dev_nr++;
>> -
>> -                       if (dev_nr >= dev_min)
>> -                               break;
> 
> And here's a bug in that code. Before breaking out of the loop, ret
> should be set to 0.
> 
> In general I'm ok with the change, but would like an answer to those questions.
> 
> Thanks.
> 
>> -
>> -                       ret = -1;
>> -               }
>> -       }
>> -       if (debug && ret == -1)
>> -               btrfs_warn(fs_info,
>> -                          "no space to allocate a new chunk for block group %llu",
>> -                          block_group->key.objectid);
>> -       mutex_unlock(&fs_info->chunk_mutex);
>> -out:
>> -       btrfs_put_block_group(block_group);
>> -       return ret;
>> -}
>> -
>>  static int find_first_block_group(struct btrfs_fs_info *fs_info,
>>                                   struct btrfs_path *path,
>>                                   struct btrfs_key *key)
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 1c2a6e4b39da..f209127a8bc6 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3071,10 +3071,6 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>          */
>>         lockdep_assert_held(&fs_info->delete_unused_bgs_mutex);
>>
>> -       ret = btrfs_can_relocate(fs_info, chunk_offset);
>> -       if (ret)
>> -               return -ENOSPC;
>> -
>>         /* step one, relocate all the extents inside this chunk */
>>         btrfs_scrub_pause(fs_info);
>>         ret = btrfs_relocate_block_group(fs_info, chunk_offset);
>> --
>> 2.22.0
>>
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux