On Fri, Feb 7, 2020 at 2:22 PM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Feb 7, 2020 at 10:52 AM John Hendy <jw.hendy@xxxxxxxxx> wrote:
>
> > As an update, I'm now running off of a different drive (ssd, not the
> > nvme) and I got the error again! I'm now inclined to think this might
> > not be hardware after all, but something related to my setup or a bug
> > with chromium.
>
> Even if there's a Chromium bug, it should result in file system
> corruption like what you're seeing.

I'm assuming you meant "*shouldn't* result in file system corruption"?

> > dmesg after trying to start chromium:
> > - https://pastebin.com/CsCEQMJa
>
> Could you post the entire dmesg, start to finish, for the boot in
> which this first occurred?

Indeed. Just reproduced it:
- https://pastebin.com/UJ8gbgFE

Aside: is there a preferred way of sharing these? The page I read about
this list said text couldn't exceed 100kb, but my original post appears
to have bounced and the dmesg alone is >100kb... Just want to make sure
pastebin is okay here; I'm happy to use something better/preferred.

> This transid isn't realistic, in particular for a filesystem this new.

Clarification, and apologies for the confusion:

- the m2.sata in my original post was my primary drive and had an
  issue; I wiped it, ran mkfs.btrfs from scratch, reinstalled linux,
  etc., and it happened again.
- the ssd I'm now running on was the boot drive in my last computer. I
  had been using it as a backup drive for the /mnt/vault pool, but it
  still had the old root fs, so after the m2.sata failure I started
  booting from it. It is not a new fs but >2yrs old.

If you'd like, let's stick to troubleshooting the ssd for now.

> [ 60.697438] BTRFS error (device dm-0): parent transid verify failed
> on 202711384064 wanted 68719924810 found 448074
> [ 60.697457] BTRFS info (device dm-0): no csum found for inode 19064
> start 2392064
> [ 60.697777] BTRFS warning (device dm-0): csum failed root 339 ino
> 19064 off 2392064 csum 0x8941f998 expected csum 0x00000000 mirror 1
>
> Expected csum null? Are these files using chattr +C? Something like
> this might help figure it out:
>
> $ sudo btrfs insp inod -v 19064 /home

$ sudo btrfs insp inod -v 19056 /home/jwhendy
ioctl ret=0, bytes_left=4039, bytes_missing=0, cnt=1, missed=0
/home/jwhendy/.config/chromium/Default/Cookies

> $ lsattr /path/to/that/file/

$ lsattr /home/jwhendy/.config/chromium/Default/Cookies
-------------------- /home/jwhendy/.config/chromium/Default/Cookies

> Report output for both.
>
> > Thanks for any pointers, as it would now seem that my purchase of a
> > new m2.sata may not buy my way out of this problem! While I didn't
> > want to reinstall, at least new hardware is a simple fix. Now I'm
> > worried there is a deeper issue bound to recur :(
>
> Yep. And fixing Btrfs is not simple.
>
> > nvme0n1p3 is encrypted with dm-crypt/LUKS.
>
> I don't think the problem is here, except that I sooner believe
> there's a regression in dm-crypt or Btrfs with discards, than I
> believe two different drives have discard related bugs.
>
> > > The only thing I've stumbled on is that I have been mounting with
> > > rd.luks.options=discard and that manually running fstrim is preferred.
>
> This was the case for both the NVMe and SSD drives?

Yes, though I have turned that off for the SSD ever since I started
booting from it. That said, I realized that discard is still in my
fstab... is this a potential source of the transid/csum issues? I've
now removed it and am about to reboot after I send this.
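In case it's useful, here's roughly what I plan to do for trim once
discard is out of fstab and out of rd.luks.options. Just a sketch on my
end, assuming the stock util-linux fstrim.timer unit on Arch (not
something from your mail):

# assumes the fstrim.timer/fstrim.service units shipped with util-linux
# are installed; switch to periodic trim instead of the discard option
$ sudo systemctl enable --now fstrim.timer

# or a one-off manual trim of everything mounted that supports it
$ sudo fstrim -av

For reference, here's the setup that's been running up to now, discard
included: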
$ cat /etc/fstab
/dev/mapper/luks-0712af67-3f01-4dde-9d45-194df9d29d14 on / type btrfs (rw,relatime,compress=lzo,ssd,discard,space_cache,subvolid=263,subvol=/arch)
/dev/mapper/luks-0712af67-3f01-4dde-9d45-194df9d29d14 on /home/jwhendy type btrfs (rw,relatime,compress=lzo,ssd,discard,space_cache,subvolid=339,subvol=/jwhendy)
/dev/mapper/luks-0712af67-3f01-4dde-9d45-194df9d29d14 on /mnt/vault type btrfs (rw,relatime,compress=lzo,ssd,discard,space_cache,subvolid=265,subvol=/vault)

> What was the kernel version this problem first appeared on with NVMe?
> For the (new) SSD you're using 5.5.1, correct?

I just updated today, which put me at 5.5.2, but in theory yes. And as
I went to check that, I got an Input/Output error trying to read the
pacman log! Here's the dmesg with those new errors included:
- https://pastebin.com/QzYQ2RRg

I'm still mounted rw, but my gosh... what the heck is happening. The
output is for a different root/inode:

$ sudo btrfs insp inod -v 273 /
ioctl ret=0, bytes_left=4053, bytes_missing=0, cnt=1, missed=0
//var/log/pacman.log

Is the double // a concern for that file?

$ sudo lsattr /var/log/pacman.log
-------------------- /var/log/pacman.log

> Can you correlate both corruption events to recent use of fstrim?

I've never used fstrim manually on either drive.

> What are the make/model of both drives?

- ssd: Samsung 850 evo, 250G
- m2.sata: nvme Samsung 960 evo, 250G

> In the meantime, I suggest refreshing backups. Btrfs won't allow files
> with checksums that it knows are corrupt to be copied to user space.
> But it sounds like so far the only files affected are Chrome cache
> files? If so this is relatively straight forward to get back to a
> healthy file system. And then it's time to start iterating some of the
> setup to find out what's causing the problem.

So far it seems limited to chromium, though I'm not sure what to make
of the new input/output error when trying to cat/grep
/var/log/pacman.log. I can also still mount my old drive ro just fine,
and I have not done anything significant on the new one. If/when we get
to potentially destructive operations, I'll certainly refresh backups
before doing those.

Really appreciate the help!
John

>
> --
> Chris Murphy
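P.S. Before refreshing backups I'm tempted to run a read-only scrub to
see the full extent of the csum damage. This is just a sketch of what I
have in mind, not something you suggested:

# foreground (-B), per-device stats (-d), read-only (-r): report errors
# without attempting any repair
$ sudo btrfs scrub start -Bdr /
$ sudo btrfs scrub status /

# the csum warnings in dmesg give root/inode numbers that can be fed
# back into 'btrfs insp inod' as you showed above
$ sudo dmesg | grep -i csum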
