On Mon, Dec 24, 2018 at 4:31 AM Peter Chant <pete@xxxxxxxxxxxxxxx> wrote:
>
> Just done that:
> bash-4.3# btrfs-image -ss -c9 -t4 /dev/sdd2
> /mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img
> WARNING: cannot find a hash collision for '..', generating garbage, it
> won't match indexes

That's normal for some files, including .. and files with really short
names.

> Now seems to stopped producing output. Can't see if it is doing
> something useful. (note, started again, more such messages)

You can tell whether it's still working by watching whether the image
file size is increasing. With an SSD it's maybe around 4 MB/s
throughput, depending on the CPU.

>
> >
> > There is no log listed in the super so zero-log isn't indicated, and
> > also tells me there were no fsync's still flushing at the time of the
> > crash. The loss should be at most a minute of data, not an
> > inconsistent file system that can't be mounted anymore. Pretty weird.
> >
>
> I think I ran zero-log to see if that helped. Given that there was no
> important data and I'd assume I'd either easily fix it, or wipe it and
> start over I may have taken the 'monkey radomly pounding the buttons'
> approach, short of 'btrfs check --repair'. I only posted here as I
> though I'd fixed it apart from the one error! If it were a simple fix
> then it was worth asking.

Yeah, I don't know how zero-log actually works: whether it just removes
the log tree address in the superblock, or whether it's more involved
than that and also changes the root tree address and generation.

> >
> > What were your mount options? Defaults? Anything custom like discard,
> > commit=, notreelog? Any non-default mount options themselves would not
> > be the cause of the problem, but might suggest partial ideas for what
> > might have happened.
> >
> fstab states:
> autodefrag,ssd,discard,noatime,defaults,subvol=_r_sl14.
> 2,compress=lzo

I suggest not using discard when it passes through to the SSD. Firmware
bugs abound that can cause weird problems.
And Btrfs does not delay discarding any backup metadata once it's
dereferenced. So all the backup trees in the super, once dereferenced,
also get TRIM'd by the SSD, rendering them useless.

I was running with the discard mount option on an NVMe drive in my
laptop, with no problems. But once I learned how aggressively metadata
gets TRIM'd under the discard mount option, I dropped it. These days I
just depend on the included systemd fstrim.service to issue TRIM once a
week.

I suppose it's possible your problem could be discard/TRIM related, but
somehow I kinda doubt it. Usually such bugs show up as entire block
ranges being wiped out when they shouldn't be (either replaced with
zeros or returning corrupt data). In the meantime, drop discard for all
file systems. And also check to make sure the SSD has the latest
firmware version.

-- 
Chris Murphy
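
P.S. In case it's useful, here is a rough sketch of the two checks
suggested above: whether the btrfs-image output file is still growing,
and which fstab entries still carry discard. The function names
is_growing and discard_mounts are made up for illustration (they are
not part of btrfs-progs), `stat -c %s` assumes GNU coreutils, and the
fstab check assumes the usual whitespace-separated layout with the
options in field 4.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not from btrfs-progs.

# is_growing FILE [SECONDS]
# Succeeds if FILE is larger after SECONDS (default 5) than before.
# Assumes GNU coreutils stat (-c %s prints the size in bytes).
is_growing() {
    before=$(stat -c %s "$1") || return 2
    sleep "${2:-5}"
    after=$(stat -c %s "$1") || return 2
    echo "$1: $before -> $after bytes"
    [ "$after" -gt "$before" ]
}

# discard_mounts [FSTAB]
# Prints "mountpoint: options" for every non-comment entry whose option
# list contains "discard" or "discard=...". Defaults to /etc/fstab.
discard_mounts() {
    awk '$1 !~ /^#/ {
             n = split($4, opt, ",")
             for (i = 1; i <= n; i++)
                 if (opt[i] == "discard" || opt[i] ~ /^discard=/) {
                     print $2 ": " $4
                     break
                 }
         }' "${1:-/etc/fstab}"
}
```

For example, `is_growing
/mnt/backup/btrfs_issue_dec_2018/btrfs_root_image_error_20181224.img 10`
reports the size before and after a 10 second wait, and `discard_mounts`
with no argument lists the entries to edit. Once discard is removed,
periodic TRIM would have to come from something like the fstrim timer,
or from a weekly cron job running fstrim on systems without systemd.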
