On 18.07.19 at 8:48, Qu Wenruo wrote:
> [BUG]
> The following script can easily cause unexpected ENOSPC:
> umount $dev &> /dev/null
> umount $mnt &> /dev/null
>
> mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>
> mount $dev $mnt -o enospc_debug
>
> for i in $(seq -w 0 511); do
> xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
> done
> sync
>
> btrfs balance start --full $mnt || return 1
> sync
>
> # This will report -ENOSPC
> btrfs balance start --full $mnt || return 1
> umount $mnt
>
> Also, btrfs/156 can also fail due to ENOSPC.
>
> [CAUSE]
> The ENOSPC is reported by btrfs_can_relocate().
>
> In btrfs_can_relocate(), it does the following check:
> - If the block group is empty
> If empty, definitely we can relocate this block group.
> - If we are not the only block group and we have enough space
> Then we can relocate this block group.
>
> Above two checks are completely OK, although I could argue they doesn't
> make much sense, but the following check is vague and even sometimes
> too cautious to cause ENOSPC:
> - If we can allocate a new block group as large as current one.
> If we failed previous two checks, we must pass this to relocate this
> block group.

btrfs_can_relocate() requires min_free to be allocatable. min_free is
defined as the used space in the block group being relocated, which I
think is correct. I also find the logic that adjusts min_free and
dev_min to be correct. Finally, the function checks whether the device's
free space is fragmented by trying to find a device chunk of the
appropriate size.

The question is: can we really have a device that has enough free space,
yet is so fragmented that find_free_dev_extent() fails, which in turn
fails the allocation? I think the answer is no, since we allocate in
chunk granularity. What am I missing?
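
To make the question concrete, here is a minimal userspace sketch of the
distinction that check draws: total free bytes on a device versus the
largest contiguous hole. This is not the kernel's find_free_dev_extent();
struct dev_extent and scan_holes() are hypothetical names used only for
illustration, and whether such a layout can arise when allocation happens
in chunk granularity is exactly what I am asking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one allocated device extent, [start, start + len). */
struct dev_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Walk the allocated extents (sorted by start, non-overlapping) and
 * report the total free bytes and the largest contiguous hole.
 */
static void scan_holes(const struct dev_extent *ext, size_t nr,
		       uint64_t dev_size, uint64_t *total_free,
		       uint64_t *max_hole)
{
	uint64_t cursor = 0;
	size_t i;

	*total_free = 0;
	*max_hole = 0;
	for (i = 0; i <= nr; i++) {
		/* The last iteration covers the tail of the device. */
		uint64_t next = (i < nr) ? ext[i].start : dev_size;
		uint64_t hole;

		/* Input must be sorted, non-overlapping, and within the device. */
		assert(next >= cursor);
		hole = next - cursor;

		*total_free += hole;
		if (hole > *max_hole)
			*max_hole = hole;
		if (i < nr)
			cursor = ext[i].start + ext[i].len;
	}
}
```

An allocation of min_free bytes would fit in this model only if
max_hole >= min_free, even when total_free >= min_free.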

OTOH, in btrfs_inc_block_group_ro() we only allocate a chunk if:

a) we are changing raid profiles
b) inc_block_group_ro() fails for our block group.

And for b) I'm a bit puzzled as to what the code is supposed to mean. We have:

    num_bytes = cache->key.offset - cache->reserved - cache->pinned -
                cache->bytes_super - btrfs_block_group_used(&cache->item);
    sinfo_used = btrfs_space_info_used(sinfo, true);
    if (sinfo_used + num_bytes + min_allocable_bytes <=
        sinfo->total_bytes) {
            // set ro
    }

This means: if the free space in the block group, plus the used space in
the space info, plus min_allocable_bytes, is no larger than the total
space in the space info, make this block group RO. What's the rationale
behind that?
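
For reference, the condition can be modelled in userspace as below. This
is only a sketch of the arithmetic in the snippet above, not the kernel
code: struct bg_model, bg_unused_bytes() and can_set_ro() are hypothetical
names, and sinfo_used is passed in as a plain number instead of being
computed by btrfs_space_info_used().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the block group fields quoted above. */
struct bg_model {
	uint64_t length;	/* cache->key.offset */
	uint64_t reserved;	/* cache->reserved */
	uint64_t pinned;	/* cache->pinned */
	uint64_t bytes_super;	/* cache->bytes_super */
	uint64_t used;		/* btrfs_block_group_used(&cache->item) */
};

/* Bytes in the block group not covered by any of the counters. */
static uint64_t bg_unused_bytes(const struct bg_model *bg)
{
	return bg->length - bg->reserved - bg->pinned -
	       bg->bytes_super - bg->used;
}

/* Same comparison as the quoted snippet, with explicit inputs. */
static bool can_set_ro(const struct bg_model *bg, uint64_t sinfo_used,
		       uint64_t sinfo_total, uint64_t min_allocable_bytes)
{
	uint64_t num_bytes = bg_unused_bytes(bg);

	return sinfo_used + num_bytes + min_allocable_bytes <= sinfo_total;
}
```

Rearranged, the test reads
sinfo_total - sinfo_used >= num_bytes + min_allocable_bytes, i.e. what is
left in the space info must cover the block group's unused bytes plus the
margin. Whether that is the intended meaning is the part I don't follow.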