On 12/24/18 12:58 AM, Chris Murphy wrote:
> On Sat, Dec 22, 2018 at 10:22 AM Peter Chant <pete@xxxxxxxxxxxxxxx> wrote:
>
>> btrfs rescue super -v /dev/sdb2
> ...
>> All supers are valid, no need to recover
>>
>>
>> btrfs insp dump-s -f <dev>
> ...
>> generation 7937947
> ...
>> backup 0:
>> backup_tree_root: 1113909100544 gen: 7937935 level: 1
> ...
>> backup 1:
>> backup_tree_root: 1113907347456 gen: 7937936 level: 1
> ...
>> backup 2:
>> backup_tree_root: 1113911951360 gen: 7937937 level: 1
> ...
>> backup 3:
>> backup_tree_root: 1113907494912 gen: 7937934 level: 1
> ...
>
>
> The kernel wrote out three valid checksummed supers, with what seems
> to be a rather significant sanity violation. The super generation and
> tree root address do not match any of the backup tree roots. The
> *current* tree root is supposed to be in one of the backups as well.
>

I wonder if this is a result of my trying to fix things? E.g. btrfs
rescue super-recover or my attempts using the tools (and kernel) in Mint
18.1 at one point? I must admit, early on I had assumed that either
this file system was a simple fix or was completely trashed, so I
thought I'd have a quick go at fixing it, or wipe it and start again.
But then I seemed to get close with only the one error, but unmountable.

> Qu, any idea how this is even theoretically possible? Bit flip right
> before the super is computed and checksummed? Seems like some kind of
> corruption before checksum is computed.
>
>
>> I'm getting suspicious of the drive as when I was trying the various
>> btrfs rescue * tools I saw a 'bad block', or similar, error displayed.
>> I also have a separate basic install on ext4 on the same disk. Though
>> e2fsck shows no errors and mounts fine I cannot log into that install.
>> Maybe a coincidence, but too many bad things thrown up make me
>> suspicious. Whatever is happening this seems to be really fighting me.
>
> I'm not sure how even a bad device accounts for the super generation
> and backup mismatches. That's damn strange.

I'm less suspicious of the drive now. I've been using an ext4 partition
on the same drive for a few days now, having reinstalled on that, and
everything _seems_ fine. Mind you, apart from USB sticks, I've not
experienced an SSD failure. Perhaps my HDD failure experience is not
relevant, i.e. they work until they start throwing errors and then
rapidly fail?

>
> If you get bored with the back and forth and just want to give up,
> that's fine. I suggest that if you have the time and space, to take a
> btrfs-image in case Qu or some other developer wants to look at this
> file system at some point. The btrfs-image is a read only process, can
> be set to scrub filenames, and only contains metadata. Size of the
> resulting file is around 1/2 of the size of metadata, when doing
> 'btrfs filesystem usage' or 'btrfs filesystem df'. So you'll need that
> much free space to direct the command to.
>
> btrfs-image -ss -c9 -t4 <devicetoimage> pathtofile

Just done that:

bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2 /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
WARNING: cannot find a hash collision for '..', generating garbage, it won't match indexes

>
> It might fail, if so you can try adding -w and see if that helps.

OK, try with -w:

OK, many many complaints about hash collisions:
...
WARNING: cannot find a hash collision for 'ifup', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'catv', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FDPC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'LIBS', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'INTC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SPI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'PDCA', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'EBI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SMC', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'WIFI', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'LWIP', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'HID', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'yun', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'avr4', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'avr6', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'WiFi', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'TFT', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'Knob', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FP.h', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'SD.h', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'Beep', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'FORK', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'CHM', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'HandS', generating garbage, it won't match indexes
WARNING: cannot find a hash collision for 'dm-0', generating garbage, it won't match indexes

Now it seems to have stopped producing output. I can't see whether it is
doing something useful. (Note: started again, more such messages.)

>
> There is no log listed in the super so zero-log isn't indicated, and
> also tells me there were no fsync's still flushing at the time of the
> crash. The loss should be at most a minute of data, not an
> inconsistent file system that can't be mounted anymore. Pretty weird.
>

I think I ran zero-log to see if that helped. Given that there was no
important data and I'd assumed I'd either easily fix it, or wipe it and
start over, I may have taken the 'monkey randomly pounding the buttons'
approach, short of 'btrfs check --repair'. I only posted here as I
thought I'd fixed it apart from the one error! If it were a simple fix
then it was worth asking.

> What were your mount options? Defaults? Anything custom like discard,
> commit=, notreelog? Any non-default mount options themselves would not
> be the cause of the problem, but might suggest partial ideas for what
> might have happened.
>

fstab states:
autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.2,compress=lzo

However, I used an initrd, so I'm not sure if that is correct?

Ok, digging into init within my initrd, the line where the root
partition is mounted:

mount -o ro -t $ROOTFS $ROOTDEV /mnt

Where $ROOTFS is:
btrfs -o subvol=_r_sl14.2

and $ROOTDEV is:
/dev/disk/by-uuid/6496aabd-d6aa-49e0-96ca-e49c316edd8e

Pete
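
For reference, the mismatch Chris points out between the super generation
and the backup roots can be checked mechanically. This is only a minimal
sketch in Python, assuming the dump-s output uses the line layout quoted
at the top of this mail. The excerpt is copied from this thread, and the
function name check_backup_roots is made up for illustration. It only
compares generations, since the super's own tree root address is not
quoted here.

```python
import re

# Excerpt of the `btrfs insp dump-s -f` output quoted in this thread.
DUMP_EXCERPT = """\
generation 7937947
backup 0:
backup_tree_root: 1113909100544 gen: 7937935 level: 1
backup 1:
backup_tree_root: 1113907347456 gen: 7937936 level: 1
backup 2:
backup_tree_root: 1113911951360 gen: 7937937 level: 1
backup 3:
backup_tree_root: 1113907494912 gen: 7937934 level: 1
"""


def check_backup_roots(text):
    """Hypothetical helper: return (super_gen, backup_gens, gap).

    gap is how many generations the super is ahead of the newest backup
    root. Assumes the line layout shown in the excerpt above.
    """
    # The super generation sits on a line that starts with "generation".
    super_gen = int(re.search(r"^generation\s+(\d+)", text, re.M).group(1))
    # Each backup root line carries its own "gen:" field.
    backup_gens = [
        int(g)
        for g in re.findall(r"backup_tree_root:\s+\d+\s+gen:\s+(\d+)", text)
    ]
    return super_gen, backup_gens, super_gen - max(backup_gens)


super_gen, backup_gens, gap = check_backup_roots(DUMP_EXCERPT)
print("super generation found in backups:", super_gen in backup_gens)
print("generations ahead of newest backup:", gap)
```

With the values quoted above, the super generation is not among the four
backup generations and is 10 generations ahead of the newest one.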
