On 2018/12/24 7:31 PM, Peter Chant wrote:
> On 12/24/18 12:58 AM, Chris Murphy wrote:
>> On Sat, Dec 22, 2018 at 10:22 AM Peter Chant <pete@xxxxxxxxxxxxxxx> wrote:
>>
>>> btrfs rescue super -v /dev/sdb2
>> ...
>>> All supers are valid, no need to recover
>>>
>>>
>>> btrfs insp dump-s -f <dev>
>> ...
>>> generation 7937947
>> ...
>>> backup 0:
>>> backup_tree_root: 1113909100544 gen: 7937935 level: 1
>> ...
>>> backup 1:
>>> backup_tree_root: 1113907347456 gen: 7937936 level: 1
>> ...
>>> backup 2:
>>> backup_tree_root: 1113911951360 gen: 7937937 level: 1
>> ...
>>> backup 3:
>>> backup_tree_root: 1113907494912 gen: 7937934 level: 1
>> ...
>>
>>
>> The kernel wrote out three valid checksummed supers, with what seems
>> to be a rather significant sanity violation. The super generation and
>> tree root address do not match any of the backup tree roots. The
>> *current* tree root is supposed to be in one of the backups as well.
>>
>
> I wonder if this is a result of my trying to fix things? E.g. btrfs
> rescue super-recover or my attempts using the tools (and kernel) in Mint
> 18.1 at one point?

At least super-recover is not responsible for this.

btrfs check --repair, on the other hand, could indeed cause problems,
so that may be the case.

>
> I must admit, early on I had assumed that either this file system was a
> simple fix or was completely trashed, so I thought I'd have a quick go
> at fixing it, or wipe it and start again. But then I seemed to get
> close with only the one error, but unmountable.
>
>
>> Qu, any idea how this is even theoretically possible? Bit flip right
>> before the super is computed and checksummed? Seems like some kind of
>> corruption before checksum is computed.
>>
>>
>>> I'm getting suspicious of the drive as when I was trying the various
>>> btrfs rescue * tools I saw a 'bad block', or similar, error displayed.
>>> I also have a separate basic install on ext4 on the same disk. Though
>>> e2fsck shows no errors and mounts fine I cannot log into that install.
>>> Maybe a coincidence, but too many bad things thrown up make me
>>> suspicious. Whatever is happening this seems to be really fighting me.
>>
>> I'm not sure how even a bad device accounts for the super generation
>> and backup mismatches. That's damn strange.
>
> I'm less suspicious of the drive now. I've been using an ext4 partition
> on the same drive for a few days now, having reinstalled on that and
> everything _seems_ fine. Mind you, apart from usb sticks, I've not
> experienced a ssd failure. Perhaps my hdd failure experience is not
> relevant, i.e. they work until they start throwing errors and then
> rapidly fail?

I don't really believe a drive can so easily corrupt just certain bits
while all other bits are OK.

>
>
>>
>> If you get bored with the back and forth and just want to give up,
>> that's fine. I suggest that if you have the time and space, to take a
>> btrfs-image in case Qu or some other developer wants to look at this
>> file system at some point. The btrfs-image is a read only process, can
>> be set to scrub filenames, and only contains metadata. Size of the
>> resulting file is around 1/2 of the size of metadata, when doing
>> 'btrfs filesystem usage' or 'btrfs filesystem df'. So you'll need that
>> much free space to direct the command to.
>>
>> btrfs-image -ss -c9 -t4 <devicetoimage> pathtofile
>
> Just done that:
> bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2
> /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
> WARNING: cannot find a hash collision for '..', generating garbage, it
> won't match indexes
>
>
>
>>
>> It might fail, if so you can try adding -w and see if that helps.
>
>
> OK, try with -w:
>
> OK, many many complaints about hash collisions:
> ...
> WARNING: cannot find a hash collision for 'ifup', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'catv', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FDPC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'LIBS', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'INTC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SPI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'PDCA', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'EBI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SMC', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'WIFI', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'LWIP', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'HID', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'yun', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'avr4', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'avr6', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'WiFi', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'TFT', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'Knob', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FP.h', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'SD.h', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'Beep', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'FORK', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'CHM', generating garbage, it
> won't match indexes
> WARNING: cannot find a hash collision for 'HandS', generating garbage,
> it won't match indexes
> WARNING: cannot find a hash collision for 'dm-0', generating garbage, it
> won't match indexes
>
>
> Now seems to have stopped producing output. Can't see if it is doing
> something useful. (note, started again, more such messages)

I don't know about other developers, but I normally don't like
btrfs-image -ss at all.
Even plain btrfs-image isn't that helpful, especially considering its
size.

Anyway, from all the data you collected, I suspect a corruption in tree
block allocation, maybe caused by a btrfs bug in older kernels, which
planted a dangerous seed in the fs by breaking the metadata CoW.
Then one day an unexpected power loss made the seed grow and screw up
the fs.

Just a personal recommendation: for btrfs, especially when used with
older kernels, it's highly recommended to run btrfs check --readonly
after a power loss, before mounting the fs.

Thanks,
Qu

>
>
>>
>> There is no log listed in the super so zero-log isn't indicated, and
>> also tells me there were no fsync's still flushing at the time of the
>> crash. The loss should be at most a minute of data, not an
>> inconsistent file system that can't be mounted anymore. Pretty weird.
>>
>
> I think I ran zero-log to see if that helped. Given that there was no
> important data and I'd assumed I'd either easily fix it, or wipe it and
> start over I may have taken the 'monkey randomly pounding the buttons'
> approach, short of 'btrfs check --repair'. I only posted here as I
> thought I'd fixed it apart from the one error! If it were a simple fix
> then it was worth asking.
>
>
>> What were your mount options? Defaults? Anything custom like discard,
>> commit=, notreelog? Any non-default mount options themselves would not
>> be the cause of the problem, but might suggest partial ideas for what
>> might have happened.
>>
> fstab states:
> autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.2,compress=lzo
>
> However, I used an initrd, so I'm not sure if that is correct?
>
> Ok, digging into init within my initrd, the line where the root partition
> is mounted:
> mount -o ro -t $ROOTFS $ROOTDEV /mnt
>
> Where $ROOTFS is:
> btrfs -o subvol=_r_sl14.2
>
> and $ROOTDEV is:
> /dev/disk/by-uuid/6496aabd-d6aa-49e0-96ca-e49c316edd8e
>
>
>
> Pete
>
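
The sanity violation Chris describes (the super generation not matching any backup tree root) can be checked mechanically from saved dump-super text. What follows is only a minimal Python sketch, assuming the output has the simple "key value" shape quoted in this thread; real `btrfs inspect-internal dump-super -f` output has many more fields and different spacing. The function names are hypothetical and are not part of btrfs-progs, and the sample text contains only the values quoted above.

```python
import re

# Excerpt of the dump-super output quoted in this thread. Only the fields
# shown in the message are included; this is not complete tool output.
DUMP_SUPER_EXCERPT = """\
generation 7937947
backup 0:
backup_tree_root: 1113909100544 gen: 7937935 level: 1
backup 1:
backup_tree_root: 1113907347456 gen: 7937936 level: 1
backup 2:
backup_tree_root: 1113911951360 gen: 7937937 level: 1
backup 3:
backup_tree_root: 1113907494912 gen: 7937934 level: 1
"""


def super_generation(text):
    """Return the superblock generation, or None if the line is missing."""
    # Anchored at line start so keys such as chunk_root_generation do not match.
    match = re.search(r"^\s*generation\s+(\d+)\s*$", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def backup_root_generations(text):
    """Map each backup_tree_root address to the generation recorded for it."""
    pattern = re.compile(r"^\s*backup_tree_root:\s+(\d+)\s+gen:\s+(\d+)", re.MULTILINE)
    return {int(addr): int(gen) for addr, gen in pattern.findall(text)}


def generation_in_backups(text):
    """True if the super generation equals the generation of some backup root."""
    gen = super_generation(text)
    return gen is not None and gen in backup_root_generations(text).values()


if __name__ == "__main__":
    gen = super_generation(DUMP_SUPER_EXCERPT)
    backups = backup_root_generations(DUMP_SUPER_EXCERPT)
    newest = max(backups.values())
    print(f"super generation: {gen}")
    print(f"newest backup generation: {newest} (gap: {gen - newest})")
    print("matches a backup root:", generation_in_backups(DUMP_SUPER_EXCERPT))
```

The sketch compares generations only. The quoted output elides the current tree root address, so the address half of the check is left out rather than guessed.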
