On Mon, Dec 3, 2018 at 8:32 PM Mike Javorski <mike.javorski@xxxxxxxxx> wrote:
>
> Need a bit of advice here ladies / gents. I am running into an issue
> which Qu Wenruo seems to have posted a patch for several weeks ago
> (see https://patchwork.kernel.org/patch/10694997/).
>
> Here is the relevant dmesg output which led me to Qu's patch.
> ----
> [ 10.032475] BTRFS critical (device sdb): corrupt leaf: root=2
> block=24655027060736 slot=20 bg_start=13188988928 bg_len=10804527104,
> invalid block group size, have 10804527104 expect (0, 10737418240]
> [ 10.032493] BTRFS error (device sdb): failed to read block groups: -5
> [ 10.053365] BTRFS error (device sdb): open_ctree failed
> ----
>
> This server has a 16 disk btrfs filesystem (RAID6) which I boot
> periodically to btrfs-send snapshots to. This machine is running
> ArchLinux and I had just updated to their latest 4.19.4 kernel
> package (from 4.18.10 which was working fine). I've tried updating to
> the 4.19.6 kernel that is in testing, but that doesn't seem to resolve
> the issue. From what I can see on kernel.org, the patch above is not
> pushed to stable or to Linus' tree.
>
> At this point the question is what to do. Is my FS toast? Could I
> revert to the 4.18.10 kernel and boot safely? I don't know if the 4.19
> boot process may have flipped some bits which would make reverting
> problematic.

That patch is not yet merged in linux-next, so to use it you'd need to
apply it yourself and compile a kernel. I can't tell for sure if it'd
help.

But the less you change the file system, the better the chance of
saving it. I have no idea why there'd be a corrupt leaf just due to a
kernel version change, though.

Needless to say, raid56 just seems fragile once it runs into any kind
of trouble. I personally wouldn't boot off it at all. I would only
mount it from another system, ideally an installed system, but a live
system with the kernel versions you need would also work.
That way you can get more information without making changes. Booting
from it will almost immediately mount it rw, if the mount succeeds at
all, and that will write a bunch of changes to the file system.

Whether it's a case of 4.18.10 not detecting corruption that 4.19
sees, or 4.19 having already caused it, the best chance is to not
mount it rw, and not run check --repair, until you get some feedback
from a developer.

The thing I'd like to see is

# btrfs rescue super -v /anydevice/
# btrfs insp dump-s -f /anydevice/

The first command will tell us if all the supers are the same and
valid across all devices. The second one, hopefully pointed at a
device with a valid super, will tell us if there's a log root value
other than 0. Both of those are read-only commands.

--
Chris Murphy
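
Not part of the original message: a small sketch for sanity-checking the numbers in the quoted dmesg line. It only does arithmetic on the two values the kernel printed (bg_len and the upper bound in "expect (0, 10737418240]") and assumes a shell with 64-bit arithmetic. The variable names are made up for illustration, and the script does not touch the file system.

```shell
#!/bin/sh
# Values copied from the quoted dmesg line.
bg_len=10804527104    # "have 10804527104"
max_len=10737418240   # upper bound in "expect (0, 10737418240]"

# 10 GiB in bytes, to compare against the reported upper bound.
ten_gib=$((10 * 1024 * 1024 * 1024))

# How far the block group length exceeds the bound, in bytes and MiB.
excess=$((bg_len - max_len))
excess_mib=$((excess / 1024 / 1024))

if [ "$max_len" -eq "$ten_gib" ]; then
    echo "upper bound is exactly 10 GiB"
fi
echo "bg_len exceeds the bound by $excess bytes ($excess_mib MiB)"
```

So the reported upper bound works out to exactly 10 GiB, and the block group in question is 64 MiB larger than that. This says nothing about whether the metadata is actually damaged; it only makes the numbers in the error easier to read.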
