On Tue, Sep 24, 2019, 18:34 Chris Murphy, <lists@xxxxxxxxxxxxxxxxx> wrote:
> On Tue, Sep 24, 2019 at 4:04 PM Nick Bowler <nbowler@xxxxxxxxxx> wrote:
> > - Running Linux 5.2.14, I pushed this system to OOM; the oom killer
> > ran and killed some userspace tasks. At this point many of the
> > remaining tasks were stuck in uninterruptible sleeps. Not really
> > worried, I turned the machine off and on again to just get everything
> > back to normal. But I guess now that everything had gone horribly
> > wrong already at this point...
>
> Yeah the kernel oomkiller is pretty much only about kernel
> preservation, not user space preservation.

Indeed, I am not bothered at all by needing to turn it off and on again
in this situation. But filesystems being completely trashed is another
matter...

> > - Upon reboot, the system boots OK but now btrfs is throwing zillions
> > of checksum errors. After some time the filesystem is remounted
> > readonly and I lose the ability to interact with the system at all, so
> > it gets powered off.
> >
> > - Now the filesystem is unmountable.
> The transid errors look like they might be caused by the 5.2 regression
>
> https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@xxxxxxxxxx/T/#u
>
> Fixed since 5.2.15 and 5.3.0.

Yikes, so my decision to update to the latest kernel two weeks ago was
perhaps a very bad one. I should've stuck with 4.19.y, I guess.

> So if you're willing to blow shit up again, you can try to reproduce
> with one of those.

Well, I could try, but it sounds like this might be hard to reproduce...

> I was also doing oomkiller blow shit up tests a few weeks ago with
> these same problem kernels and never hit this bug, or any others. I
> also had to do a LOT of force power offs because the system just
> became totally wedged in and I had no way of estimating how long it
> would be for recovery so after 30 minutes I hit the power button. Many
> times. Zero corruptions. That's with a single Samsung 840 EVO in a
> laptop relegated to such testing.

Just a thought... the system was still alive and I was able to briefly
inspect the situation and notice that tasks were blocked and
unkillable... until my shell hung too and then I was hosed. But rather
than hitting the power button, I rebooted with sysrq+e, sysrq+u,
sysrq+b. Not sure if that makes a difference.

> Might be a different bug. Not sure. But also, this is with
>
> > [ 347.551595] CPU: 3 PID: 1143 Comm: mount Not tainted 4.19.34-1-lts #1
>
> So I don't know how an older kernel will report on the problem caused
> by the 5.2 bug.

This is the kernel from SystemRescueCd. I can try taking a disk image
and mounting it on another machine with a newer Linux version.

Thanks,
  Nick
