On Wed, May 27, 2020 at 1:51 PM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
>
> On 5/27/20 8:40 PM, Chris Murphy wrote:
> > On Wed, May 27, 2020 at 10:23 AM Goffredo Baroncelli <kreijack@xxxxxxxxx> wrote:
> >>
> >> Hi All,
> >>
> >> On 5/27/20 8:25 AM, Chris Murphy wrote:
> >>> On Tue, May 26, 2020 at 11:22 PM Andrei Borzenkov <arvidjaar@xxxxxxxxx> wrote:
> >>>>
> >>>> 27.05.2020 05:20, Chris Murphy wrote:
> >>>>>
> >>>>> single, dup, raid0, raid1 (all), raid10 are safe and stable.
> >>>>
> >>>> Until btrfs can reliably detect and automatically handle outdated devices
> >>>> I would not call any multi-device profiles "safe", at least unconditionally.
> >>>
> >>> I agree.
> >>>
> >>
> >> Checking the generation of each device should be sufficient to detect
> >> "outdated" devices. Why is this check not performed?
> >> Maybe I am missing something?
> >
> > But transid isn't unique enough except in isolation. Degraded volumes
> > are treated completely independently. So if I take a 2x raid1, mount
> > each one degraded on separate computers and modify them, then join
> > them back together, how can Btrfs resolve the differences? It's a
> > mess. Yes that is obviously a kind of sabotage. While not literal
> > sabotage, the effect is the same if you have alternating degraded
> > drives in successive boots.
>
> Even though we can't close all the holes, we can reduce the likelihood of this issue.
>
> Anyway mounting a filesystem with different generation numbers is wrong. And the
> fact that we can't prevent all kinds of mismatches doesn't mean that
> we shouldn't do anything.

Yep. You're right.

>
> I am thinking about adding an "opt in" check. I.e. if the mismatch happens
> btrfs should raise a warning. If a flag is passed at mount (like
> mount -o prevent-generation-mismatch) and the generations don't match,
> the mount fails.

I wonder about using a compat_flag to set a device as having been
mounted degraded. The next time a mount happens, all devices with
compat_flag degraded set should have identical transids or we know
something is screwy. If there is a device that does not have the
degraded flag, and has an older transid, there could be some kind of
sanity check to make sure the last 1-3 transactions are the same (?)
and if so (a) allow a non-degraded mount, (b) warn, (c) "replay" the
transactions between stale and current, so that all devices are caught
up, similar to the partial rebuild mdadm does using the write intent
bitmap as the hint for what needs to be caught up.

--
Chris Murphy
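
To make the two ideas above concrete, here is a rough sketch of the decision logic only, written as a standalone Python model rather than kernel code. Everything in it is hypothetical: `Device`, `was_degraded` (standing in for the proposed compat_flag), `check_devices`, and the `strict` argument (standing in for the proposed opt-in mount option) are names made up for illustration, not anything that exists in btrfs. It also skips the "last 1-3 transactions match" sanity check and the replay step, which would need real on-disk data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Verdict(Enum):
    OK = "ok"          # all devices agree, mount normally
    WARN = "warn"      # stale device present, mount but warn (catch-up would go here)
    REFUSE = "refuse"  # state looks inconsistent, or strict mode forbids a mismatch


@dataclass
class Device:
    name: str
    transid: int        # generation recorded on this device
    was_degraded: bool  # hypothetical "has been mounted degraded" flag


def check_devices(devices: List[Device], strict: bool = False) -> Verdict:
    """Model of the proposed mount-time check; expects at least one device."""
    newest = max(d.transid for d in devices)

    # Devices carrying the degraded flag must all share one transid.
    # If they differ, they were modified independently of each other.
    flagged_transids = {d.transid for d in devices if d.was_degraded}
    if len(flagged_transids) > 1:
        return Verdict.REFUSE

    # No device is behind the newest generation: nothing to do.
    stale = [d for d in devices if d.transid < newest]
    if not stale:
        return Verdict.OK

    # A device is behind. With the opt-in (strict) behaviour the mount
    # fails; otherwise it proceeds with a warning.
    return Verdict.REFUSE if strict else Verdict.WARN
```

For example, a raid1 pair where one device was mounted degraded at transid 100 while the other was left at 90 gives `WARN` by default and `REFUSE` with `strict=True`. Two devices that both carry the flag but sit at different transids give `REFUSE` either way. Note that this model cannot catch the case raised earlier in the thread where two independently modified devices happen to end up at the same transid.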
