On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.duncan@xxxxxxx> wrote:
>
> Btrfs raid1 you say, and you have existing compressed files it's trying
> to read in the backtrace?
>
> Sounds like the issues I see sometimes and have posted about where after
> a crash that resulted in one device of my raid1 pair getting behind the
> other, the kernel will crash if it sees too many csum-errors, even tho
> it's /supposed/ to check the other copy and read from it if valid (which
> it is as a btrfs scrub resolves the issue).
>
> When booted to rescue/single-user mode, can you run a scrub?

After a few reboots trying to capture the initial panic message (even with
panic_on_oops=1 I was getting multiple of them, and only the tainted one
stayed on screen), the system managed to stay up. I completed a scrub and
it found no errors.

I also haven't had any issues since, but I haven't attempted another
reboot. I figured the safest course was to just leave it on for a good
week so that whatever was in the log/etc that was giving it trouble works
its way out. I'm also doing a balance, which may or may not help (and which
is useful anyway since I increased the size of the drive I replaced).

I'm still pretty skeptical of a hardware problem, but once I'm confident
the system can be safely rebooted I'll go ahead and run a longer memory
test/etc. This really doesn't seem like a memory problem, and I don't see a
corrupted binary as an issue, since everything running in kernel space is
versioned and the problem happens with multiple kernel versions (and the
older ones haven't been touched on-disk in ages). A problem in glibc/etc
shouldn't cause a kernel oops absent a bug. But if there is a hardware
problem I obviously want to know about it, and I've had a few RAM failures
over the years...

--
Rich
