On Sat, Jan 25, 2020 at 08:36:14AM +0800, Qu Wenruo wrote: > >> The purpose seems to be to catch generic error codes other than > >> EINPROGRESS and ECNACELED, I don't see much point printing a warning in > >> that case. But it' a new ENOSPC problem, likely caused by the > >> read-only status switching. > >> > >> My test devices are 12G, there's full log of the test to see at which > >> phase it happened. > > > > It passes for me on 20G devices, haven't tested with 12G however > > (can't afford to reboot any of my vms now). > > 5G for all scratch devices, and failed to reproduce it. > (The full run before submitting the patch also failed to reproduce it) 5G is not actually enough to run some of the tests that require at least 10G of free space (so the block device needs to be a bit larger). > > I think that happens because before this patch we ignored ENOSPC > > errors, when trying to set a block group to RO, for device replace and > > scrub. > > But with this patch we don't ignore ENOSPC for device replace anymore > > - this is actually correct because if we ignore we can corrupt nocow > > writes (including writes into prealloc extents). > > > > Now if it's a real ENOSPC situation or just a bug in the space > > management code, it's a different thing to look at. > > I tend to take a middle land of the problem. > > For current stage, the WARN_ON() is indeed overkilled, at least for ENOSPC. > > But on the other handle, the full RO of a block group for scrub/replace > is also a little overkilled. > Just as Filipe mentioned, we only want to kill nocow writes into a block > group, but still allow COW writes. > > It looks like something like mark_block_group_nocow_ro() in the future > could reduce the possibility if not fully kill it. Yeah this sounds doable. > It looks like changing the WARN_ON(ret) to WARN_ON(ret != -ENOSPC) would > be needed for this patch as a quick fix. I'll remove the warning completely, as a separate patch.
