On 2020/1/25 下午7:35, David Sterba wrote: > On Sat, Jan 25, 2020 at 08:36:14AM +0800, Qu Wenruo wrote: >>>> The purpose seems to be to catch generic error codes other than >>>> EINPROGRESS and ECNACELED, I don't see much point printing a warning in >>>> that case. But it' a new ENOSPC problem, likely caused by the >>>> read-only status switching. >>>> >>>> My test devices are 12G, there's full log of the test to see at which >>>> phase it happened. >>> >>> It passes for me on 20G devices, haven't tested with 12G however >>> (can't afford to reboot any of my vms now). >> >> 5G for all scratch devices, and failed to reproduce it. >> (The full run before submitting the patch also failed to reproduce it) > > 5G is not actually enough to run some of the tests that require at least > 10G of free space (so the block device needs to be a bit larger). BTW, that ENOSPC sounds like a metadata over-commit bug, there if we can allocate 2 or more meta chunks, and meta rsv already needs 2 new chunks, then btrfs_inc_block_group_ro() will always fail no matter whatever the parameter we're using for any metadata chunk. (Since btrfs_block_group_ro() can only pre-alloc one chunk, while our current meta over-commit allows way more meta rsv than just one chunk) That one can be solved by my previous per-profile avail space patchset. > >>> I think that happens because before this patch we ignored ENOSPC >>> errors, when trying to set a block group to RO, for device replace and >>> scrub. >>> But with this patch we don't ignore ENOSPC for device replace anymore >>> - this is actually correct because if we ignore we can corrupt nocow >>> writes (including writes into prealloc extents). >>> >>> Now if it's a real ENOSPC situation or just a bug in the space >>> management code, it's a different thing to look at. >> >> I tend to take a middle land of the problem. >> >> For current stage, the WARN_ON() is indeed overkilled, at least for ENOSPC. >> >> But on the other handle, the full RO of a block group for scrub/replace >> is also a little overkilled. >> Just as Filipe mentioned, we only want to kill nocow writes into a block >> group, but still allow COW writes. >> >> It looks like something like mark_block_group_nocow_ro() in the future >> could reduce the possibility if not fully kill it. > > Yeah this sounds doable. But it may need more consideration since the rw -> nocow_ro -> ro state machine can be more complex than my initial thought. > >> It looks like changing the WARN_ON(ret) to WARN_ON(ret != -ENOSPC) would >> be needed for this patch as a quick fix. > > I'll remove the warning completely, as a separate patch. That's great. Thanks, Qu
