On Sat, Mar 9, 2019 at 2:36 PM Glenn Trigg <ggtrigg@xxxxxxxxx> wrote:

> I had some random machine freezing events which I suspected was due to
> issues with a raid1 filesystem and kernel module crashes.

Hard to say with the available information. It's more likely hardware
related, and then there's on-disk corruption.

This:

> % mount -r /dev/sda1 /data
> mount: /data: can't read superblock on /dev/sda1.

and this:

> % btrfs rescue super-recover /dev/sda1
> All supers are valid, no need to recover

seem to be in conflict. I don't really understand how the kernel can
complain about a bad super while the user space tools say they're all OK.

What happens if you try:

# mount -o ro,nologreplay,usebackuproot

If that doesn't work, include the kernel messages again, and also
include the output from:

# btrfs insp dump-s -fa /dev/sda1
# btrfs insp dump-s -fa /dev/sdb1

> and dmesg says:
>
> [15944.017629] BTRFS info (device sda1): disk space caching is enabled
> [15944.017632] BTRFS info (device sda1): has skinny extents
> [15944.024480] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> [15944.024487] BTRFS info (device sda1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> [15944.029292] BTRFS error (device sda1): parent transid verify failed on 628168376320 wanted 37601 found 37700
> [15944.029466] BTRFS error (device sda1): parent transid verify failed on 628168376320 wanted 37601 found 37700

That's usually bad.

> Other system information is:
> % uname -a
> Linux izen 4.18.0-16-generic #17-Ubuntu SMP Fri Feb 8 00:06:57 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux

It looks like extent tree corruption, so I don't think a newer kernel
will help, but I'd try one anyway in the meantime until a developer gets
around to responding. Distro-specific kernels tend to be supported by
that distribution, whereas upstream lists tend to support mainline.
So I suggest 5.0.4 or 4.19.32. Or you can be brave, download this, and
image it to a USB stick (dd if=file of=/dev/ bs=1M oflag=direct), which
of course will erase everything on the stick.

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190327.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20190327.n.0.iso

That might have 5.1-rc2 on it, or something in between rc1 and rc2.
You're still going to try to mount it read-only per the above command,
so even if it blows up it's not going to make this worse.

> % btrfs check /dev/sda1
> Checking filesystem on /dev/sda1
> UUID: d5e50511-3e31-4de6-ba37-c5841895be9f
> checking extents
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700

The transids are really far apart; something definitely went really
wrong. It could be hardware, or both hardware and a btrfs bug. That it
affected *both* copies is a little weird unless it's memory corruption
related, and then a lot of things can go wrong.

> Where do I go from here?

If it can't be mounted, then the only chance is `btrfs-find-root` and
`btrfs restore` to try to scrape out whatever data you need that isn't
already backed up. The priority, before trying to repair it, is to get
anything important off, because trying to repair it has a good chance of
causing permanent data loss. The latest tools are definitely recommended
for repair; the kernel version doesn't matter so much.

-- 
Chris Murphy
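
As a rough aid for comparing the "parent transid verify failed" messages
quoted above, a small shell sketch along these lines could summarize how
far apart the wanted and found generations are. The function name
`transid_gaps` is made up for illustration and is not part of btrfs-progs,
and the sketch assumes each message sits on a single, unwrapped line.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): summarize
# "parent transid verify failed" lines read from stdin. Prints one line
# per distinct (block, wanted, found) triple with the generation gap
# and the number of times that message was seen.
# Assumption: each message is on a single line (not wrapped by a mailer).
transid_gaps() {
    awk '
        /parent transid verify failed on/ {
            block = ""; wanted = ""; found = ""
            # Scan the fields so that a leading "[timestamp] BTRFS error"
            # prefix from dmesg, or a "> " quote marker, is ignored.
            for (i = 1; i < NF; i++) {
                if ($i == "on")     block  = $(i + 1)
                if ($i == "wanted") wanted = $(i + 1)
                if ($i == "found")  found  = $(i + 1)
            }
            if (block == "" || wanted == "" || found == "") next
            key = block " " wanted " " found
            if (!(key in count)) order[++n] = key   # keep first-seen order
            count[key]++
        }
        END {
            for (j = 1; j <= n; j++) {
                split(order[j], f, " ")
                printf "block=%s wanted=%s found=%s gap=%d seen=%d\n",
                       f[1], f[2], f[3], f[3] - f[2], count[order[j]]
            }
        }
    '
}
```

Fed the dmesg lines quoted above (for example via `dmesg | transid_gaps`),
this would report a gap of 99 for block 628168376320; fed the `btrfs check`
lines, a gap of 9537 for block 628168343552.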
