Re: [PATCH] btrfs: scrub: Require mandatory block group RO for dev-replace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/1/25 下午7:35, David Sterba wrote:
> On Sat, Jan 25, 2020 at 08:36:14AM +0800, Qu Wenruo wrote:
>>>> The purpose seems to be to catch generic error codes other than
>>>> EINPROGRESS and ECNACELED, I don't see much point printing a warning in
>>>> that case. But it' a new ENOSPC problem, likely caused by the
>>>> read-only status switching.
>>>>
>>>> My test devices are 12G, there's full log of the test to see at which
>>>> phase it happened.
>>>
>>> It passes for me on 20G devices, haven't tested with 12G however
>>> (can't afford to reboot any of my vms now).
>>
>> 5G for all scratch devices, and failed to reproduce it.
>> (The full run before submitting the patch also failed to reproduce it)
> 
> 5G is not actually enough to run some of the tests that require at least
> 10G of free space (so the block device needs to be a bit larger).

BTW, that ENOSPC sounds like a metadata over-commit bug, there if we can
allocate 2 or more meta chunks, and meta rsv already needs 2 new chunks,
then btrfs_inc_block_group_ro() will always fail no matter whatever the
parameter we're using for any metadata chunk.
(Since btrfs_block_group_ro() can only pre-alloc one chunk, while our
current meta over-commit allows way more meta rsv than just one chunk)

That one can be solved by my previous per-profile avail space patchset.

> 
>>> I think that happens because before this patch we ignored ENOSPC
>>> errors, when trying to set a block group to RO, for device replace and
>>> scrub.
>>> But with this patch we don't ignore ENOSPC for device replace anymore
>>> - this is actually correct because if we ignore we can corrupt nocow
>>> writes (including writes into prealloc extents).
>>>
>>> Now if it's a real ENOSPC situation or just a bug in the space
>>> management code, it's a different thing to look at.
>>
>> I tend to take a middle land of the problem.
>>
>> For current stage, the WARN_ON() is indeed overkilled, at least for ENOSPC.
>>
>> But on the other handle, the full RO of a block group for scrub/replace
>> is also a little overkilled.
>> Just as Filipe mentioned, we only want to kill nocow writes into a block
>> group, but still allow COW writes.
>>
>> It looks like something like mark_block_group_nocow_ro() in the future
>> could reduce the possibility if not fully kill it.
> 
> Yeah this sounds doable.

But it may need more consideration since the rw -> nocow_ro -> ro state
machine can be more complex than my initial thought.

> 
>> It looks like changing the WARN_ON(ret) to WARN_ON(ret != -ENOSPC) would
>> be needed for this patch as a quick fix.
> 
> I'll remove the warning completely, as a separate patch.

That's great.

Thanks,
Qu




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux