Re: help request for an unmountable raid1 filesystem

On Sat, Mar 9, 2019 at 2:36 PM Glenn Trigg <ggtrigg@xxxxxxxxx> wrote:

> I had some random machine freezing events which I suspected was due to
> issues with a raid1 filesystem and kernel module crashes.

Hard to say with the available information. It's more likely hardware
related, with on-disk corruption following from that.


This:

> % mount -r /dev/sda1 /data
> mount: /data: can't read superblock on /dev/sda1.

and this:

> % btrfs rescue super-recover /dev/sda1
> All supers are valid, no need to recover

seem to be in conflict. I don't really understand how the kernel can
complain about a bad super while the user space tools say they're all
OK. What happens if you try:

# mount -o ro,nologreplay,usebackuproot /dev/sda1 /data

If that doesn't work, include the kernel messages again, and also
include the output from:

# btrfs insp dump-s -fa /dev/sda1
# btrfs insp dump-s -fa /dev/sdb1



>
> and dmesg says:
>
> [15944.017629] BTRFS info (device sda1): disk space caching is enabled
> [15944.017632] BTRFS info (device sda1): has skinny extents
> [15944.024480] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd
> 0, flush 0, corrupt 1, gen 0
> [15944.024487] BTRFS info (device sda1): bdev /dev/sdb1 errs: wr 0, rd
> 0, flush 0, corrupt 4, gen 0
> [15944.029292] BTRFS error (device sda1): parent transid verify failed
> on 628168376320 wanted 37601 found 37700
> [15944.029466] BTRFS error (device sda1): parent transid verify failed
> on 628168376320 wanted 37601 found 37700

That's usually bad.


> Other system information is:
> % uname -a
> Linux izen 4.18.0-16-generic #17-Ubuntu SMP Fri Feb 8 00:06:57 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux

It looks like extent tree corruption, so I don't think a newer kernel
will help; but I'd try one anyway until a developer gets around to
responding. Distro-specific kernels tend to be supported by that
distribution, whereas upstream lists tend to support mainline. So I
suggest 5.0.4 or 4.19.32, or you can be brave and download this and
image it to a USB stick (dd if=file of=/dev/ bs=1M oflag=direct),
which of course will erase everything on the stick.

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190327.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20190327.n.0.iso

That might have 5.1rc2 on it, or something in between rc1 and rc2.
You're still going to try to mount it read-only per the above command,
so even if it blows up it's not going to make things worse.


> % btrfs check /dev/sda1
> Checking filesystem on /dev/sda1
> UUID: d5e50511-3e31-4de6-ba37-c5841895be9f
> checking extents
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700
> parent transid verify failed on 628168343552 wanted 28163 found 37700

The transids are really far apart, so something definitely went really
wrong. It could be hardware, or both hardware and a btrfs bug. That it
affected *both* copies is a little weird unless it's memory corruption
related, in which case a lot of things can go wrong.
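
To put a number on "far apart", here is a rough shell sketch that pulls
the wanted/found generations out of such messages and prints the
difference. The function name transid_gaps is made up for illustration,
and it assumes each message sits on a single line (not wrapped the way
the quoted dmesg output is in this mail).

```shell
# Hypothetical helper: reads log text on stdin and prints one line per
# distinct failure as "<bytenr> <wanted> <found> <found - wanted>".
# Assumes each "parent transid verify failed" message is on one line.
transid_gaps() {
    awk '
        /parent transid verify failed/ {
            bytenr = wanted = found = ""
            for (i = 1; i < NF; i++) {
                if ($i == "on")     bytenr = $(i + 1)
                if ($i == "wanted") wanted = $(i + 1)
                if ($i == "found")  found  = $(i + 1)
            }
            # Skip lines where any of the three values is missing.
            if (bytenr == "" || wanted == "" || found == "") next
            key = bytenr " " wanted " " found
            # Report each distinct failure only once.
            if (!(key in seen)) {
                seen[key] = 1
                print bytenr, wanted, found, found - wanted
            }
        }
    '
}
```

For example, `btrfs check /dev/sda1 2>&1 | transid_gaps` on the output
quoted above should report a gap of 9537 generations for block
628168343552, while the dmesg message for block 628168376320 works out
to a gap of 99.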


>
> Where do I go from here?

If it can't be mounted, then the only chance is `btrfs-find-root` and
`btrfs restore` to try to scrape out whatever data you need that
isn't already backed up. The priority, before trying to repair it, is
to get anything important off, because a repair attempt has a good
chance of causing permanent data loss. The latest tools are definitely
recommended for repair; the kernel version doesn't matter so much.
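
As a rough sketch of that order of operations (not a definitive
procedure), the helper below only prints the commands rather than
running them. The name salvage_plan and the destination /mnt/rescue are
made up for illustration; the destination must be on a different, healthy
filesystem. The root finder in btrfs-progs is named btrfs-find-root, and
the optional third argument stands for a candidate tree root bytenr
taken from its output.

```shell
# Hypothetical helper: prints a read-only salvage sequence for review.
# Nothing is executed; copy the lines out once they look right.
#   $1 = source device, $2 = destination directory on another filesystem,
#   $3 = optional tree root bytenr reported by btrfs-find-root
salvage_plan() {
    dev=$1
    dest=$2
    bytenr=${3:-}
    root_opt=""
    if [ -n "$bytenr" ]; then
        root_opt=" -t $bytenr"
    fi
    # Step 1: list candidate tree roots (only reads the device).
    printf '%s\n' "btrfs-find-root $dev"
    # Step 2: dry run (-D) to see what restore would copy out.
    printf '%s\n' "btrfs restore -D -v$root_opt $dev $dest"
    # Step 3: the real copy, ignoring errors (-i) so it keeps going.
    printf '%s\n' "btrfs restore -v -i$root_opt $dev $dest"
}
```

Running `salvage_plan /dev/sda1 /mnt/rescue` prints the three commands
without a -t option; if the default root doesn't yield anything useful,
pass one of the bytenr values from btrfs-find-root as the third
argument and try again.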


-- 
Chris Murphy


