On 2018/12/24 7:31 PM, Peter Chant wrote:
On 12/24/18 12:58 AM, Chris Murphy wrote:
On Sat, Dec 22, 2018 at 10:22 AM Peter Chant <pete@xxxxxxxxxxxxxxx> wrote:
btrfs rescue super -v /dev/sdb2
...
All supers are valid, no need to recover
btrfs insp dump-s -f <dev>
...
generation 7937947
...
backup 0:
backup_tree_root: 1113909100544 gen: 7937935 level: 1
...
backup 1:
backup_tree_root: 1113907347456 gen: 7937936 level: 1
...
backup 2:
backup_tree_root: 1113911951360 gen: 7937937 level: 1
...
backup 3:
backup_tree_root: 1113907494912 gen: 7937934 level: 1
...
The kernel wrote out three valid checksummed supers, with what seems
to be a rather significant sanity violation. The super generation and
tree root address do not match any of the backup tree roots. The
*current* tree root is supposed to be in one of the backups as well.
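To make Chris's point concrete, here is a minimal Python sketch (an illustration only, not part of btrfs-progs). It extracts the superblock generation and the backup root generations from the text printed by btrfs insp dump-s -f, then checks whether the current generation appears among the backups. It assumes the 'generation N' and 'backup_tree_root: ADDR gen: N level: L' line formats shown above. check_backup_roots is a hypothetical name, and the sample is just the excerpt quoted in this thread.

```python
import re

def check_backup_roots(dump_super_text):
    """Return (super_gen, backup_gens, match) parsed from dump-super text.

    Assumes the line formats quoted in this thread:
      'generation N' and 'backup_tree_root: ADDR gen: N level: L'.
    """
    # Anchor at line start so fields like 'chunk_root_generation' don't match.
    m = re.search(r"^generation\s+(\d+)", dump_super_text, re.MULTILINE)
    if m is None:
        raise ValueError("no superblock generation found")
    super_gen = int(m.group(1))
    # Collect the 'gen:' value of every backup root slot, in slot order.
    backup_gens = [int(g) for g in re.findall(
        r"backup_tree_root:\s+\d+\s+gen:\s+(\d+)", dump_super_text)]
    # The current generation is expected to be among the backups.
    return super_gen, backup_gens, super_gen in backup_gens

# Excerpt from the dump-super output above (other fields elided).
sample = """generation 7937947
backup 0:
backup_tree_root: 1113909100544 gen: 7937935 level: 1
backup 1:
backup_tree_root: 1113907347456 gen: 7937936 level: 1
backup 2:
backup_tree_root: 1113911951360 gen: 7937937 level: 1
backup 3:
backup_tree_root: 1113907494912 gen: 7937934 level: 1
"""

super_gen, backups, match = check_backup_roots(sample)
print(super_gen, max(backups), super_gen - max(backups), match)
# prints: 7937947 7937937 10 False
```

With the numbers from this thread, the newest backup root (gen 7937937) is 10 generations behind the superblock's 7937947. That gap is the mismatch Chris describes.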
I wonder if this is a result of my trying to fix things? E.g. btrfs
rescue super-recover or my attempts using the tools (and kernel) in Mint
18.1 at one point?
At least super-recover is not responsible for this. However, btrfs
check --repair could indeed cause problems, so it may be the case.
I must admit that early on I assumed this file system was either a
simple fix or completely trashed, so I thought I'd have a quick go at
fixing it, or else wipe it and start again. But then I seemed to get
close, with only the one error left, yet it was still unmountable.
Qu, any idea how this is even theoretically possible? Bit flip right
before the super is computed and checksummed? Seems like some kind of
corruption before checksum is computed.
I'm getting suspicious of the drive, because when I was trying the
various btrfs rescue * tools I saw a 'bad block' (or similar) error
displayed. I also have a separate basic install on ext4 on the same
disk. Though e2fsck shows no errors and it mounts fine, I cannot log
into that install. Maybe it's a coincidence, but so many bad things
turning up makes me suspicious. Whatever is happening, this seems to be
really fighting me.
I'm not sure how even a bad device accounts for the super generation
and backup mismatches. That's damn strange.
I'm less suspicious of the drive now. I've been using an ext4 partition
on the same drive for a few days, having reinstalled onto it, and
everything _seems_ fine. Mind you, apart from USB sticks, I've never
experienced an SSD failure. Perhaps my HDD failure experience is not
relevant, i.e. maybe SSDs work until they start throwing errors and
then fail rapidly?
I don't really believe a drive could corrupt just those particular bits
so neatly while all the other bits are OK.
If you get bored with the back and forth and just want to give up,
that's fine. If you have the time and space, I suggest taking a
btrfs-image in case Qu or some other developer wants to look at this
file system at some point. btrfs-image is a read-only process, can be
set to scrub filenames, and only captures metadata. The resulting file
is around half the size of the metadata usage reported by 'btrfs
filesystem usage' or 'btrfs filesystem df', so you'll need at least
that much free space wherever you direct the output.
btrfs-image -ss -c9 -t4 <devicetoimage> pathtofile
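As a rough illustration of that sizing rule, here is a minimal Python sketch that compares the free space at the destination with half of the metadata figure. It assumes you read the metadata "used" value off 'btrfs filesystem df' yourself and pass it in as bytes. has_room_for_image is a hypothetical helper. The 0.5 ratio is only the rule of thumb above, because the real image size also depends on the -c compression level.

```python
import shutil

def has_room_for_image(target_dir, metadata_used_bytes, ratio=0.5):
    """Rough check that target_dir can hold a btrfs-image dump.

    ratio=0.5 encodes the 'about half the metadata size' rule of thumb;
    the actual image size varies with compression and the filesystem.
    Returns (ok, estimated_bytes, free_bytes).
    """
    estimated = int(metadata_used_bytes * ratio)
    # Free space on the filesystem that contains target_dir.
    free = shutil.disk_usage(target_dir).free
    return free >= estimated, estimated, free

# Hypothetical example: 2 GiB of metadata, checking the current directory.
ok, estimated, free = has_room_for_image(".", 2 * 1024**3)
print(f"estimated image ~{estimated} bytes, free {free} bytes, ok={ok}")
```

Point target_dir at the directory you intend to write the image to (for example the backup mount), since free space is measured per filesystem.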
Just done that:
bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2 /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
WARNING: cannot find a hash collision for '..', generating garbage, it won't match indexes
It might fail, if so you can try adding -w and see if that helps.
OK, trying with -w:
OK, many, many complaints about hash collisions:
...
WARNING: cannot find a hash collision for 'ifup', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'catv', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FDPC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'LIBS', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'INTC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SPI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'PDCA', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'EBI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SMC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'WIFI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'LWIP', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'HID', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'yun', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'avr4', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'avr6', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'WiFi', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'TFT', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'Knob', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FP.h', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SD.h', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'Beep', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FORK', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'CHM', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'HandS', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'dm-0', generating garbage, it won't match indexes
It now seems to have stopped producing output, and I can't tell whether
it is doing anything useful. (Note: it started again, with more such
messages.)
I don't know about other developers, but normally I don't like
btrfs-image -ss at all. Even plain btrfs-image isn't that helpful,
especially considering its size.
Anyway, from all the data you collected, I suspect corruption in tree
block allocation, maybe caused by a btrfs bug in older kernels, which
planted a dangerous seed in the fs and broke metadata CoW. Then one day
an unexpected power loss made the seed grow and screwed up the fs.
Just a personal recommendation: for btrfs, especially with older
kernels, it's highly recommended to run btrfs check --readonly after a
power loss, before mounting the fs.
Thanks,
Qu
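Qu's suggestion to run btrfs check --readonly after a power loss, before mounting, can be scripted. The following is a minimal Python sketch under stated assumptions: check_then_mount is a hypothetical helper, it must be run as root against an unmounted device, and it treats any non-zero exit status from btrfs check as "do not mount". The run parameter exists only so the logic can be exercised without a real device.

```python
import subprocess

def check_then_mount(device, mountpoint, run=subprocess.run):
    """Run 'btrfs check --readonly' and mount only if it exits cleanly.

    'run' defaults to subprocess.run; it is injectable for testing.
    Returns True if the mount command was issued, False otherwise.
    """
    # Read-only check: does not modify the filesystem.
    result = run(["btrfs", "check", "--readonly", device])
    if result.returncode != 0:
        # Non-zero exit: errors were found, or the check itself failed.
        print(f"btrfs check reported problems on {device}; not mounting")
        return False
    # check=True raises CalledProcessError if the mount itself fails.
    run(["mount", "-t", "btrfs", device, mountpoint], check=True)
    return True
```

For example, check_then_mount("/dev/sdd2", "/mnt") would refuse to mount if the read-only check reported errors. Mount options such as subvol= are left out for brevity.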
There is no log tree listed in the super, so zero-log isn't indicated;
that also tells me there were no fsyncs still being flushed at the time
of the crash. The loss should have been at most a minute of data, not
an inconsistent file system that can't be mounted anymore. Pretty weird.
I think I ran zero-log to see if that helped. Given that there was no
important data, and that I'd assumed I'd either fix it easily or wipe it
and start over, I may have taken the 'monkey randomly pounding the
buttons' approach, short of 'btrfs check --repair'. I only posted here
because I thought I'd fixed everything apart from the one error! If it
were a simple fix, then it was worth asking.
What were your mount options? Defaults? Anything custom like discard,
commit=, notreelog? Any non-default mount options themselves would not
be the cause of the problem, but might suggest partial ideas for what
might have happened.
fstab states:
autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.2,compress=lzo
However, I used an initrd, so I'm not sure those options were actually
in effect.
OK, digging into init within my initrd, here is the line where the root
partition is mounted:
mount -o ro -t $ROOTFS $ROOTDEV /mnt
Where $ROOTFS is:
btrfs -o subvol=_r_sl14.2
and $ROOTDEV is:
/dev/disk/by-uuid/6496aabd-d6aa-49e0-96ca-e49c316edd8e
Pete