On 2020/2/8 12:48 PM, John Hendy wrote:
> On Fri, Feb 7, 2020 at 5:42 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>> On 2020/2/8 1:52 AM, John Hendy wrote:
>>> Greetings,
>>>
>>> I'm resending, as this isn't showing in the archives. Perhaps it was
>>> the attachments, which I've converted to pastebin links.
>>>
>>> As an update, I'm now running off of a different drive (ssd, not the
>>> nvme) and I got the error again! I'm now inclined to think this might
>>> not be hardware after all, but something related to my setup or a bug
>>> with chromium.
>>>
>>> After a reboot, chromium wouldn't start for me and dmesg showed
>>> similar parent transid/csum errors to my original post below. I used
>>> btrfs-inspect-internal to trace the inode to
>>> ~/.config/chromium/History. I deleted that, and got a new set of
>>> errors tracing to ~/.config/chromium/Cookies. After I deleted that and
>>> tried starting chromium, I found that my btrfs /home/jwhendy pool was
>>> mounted ro, just like the original problem below.
>>>
>>> dmesg after trying to start chromium:
>>> - https://pastebin.com/CsCEQMJa
>>
>> So far, it's only the transid bug in your csum tree.
>>
>> And two backref mismatches in data backrefs.
>>
>> In theory, you can fix your problem with `btrfs check --repair
>> --init-csum-tree`.
>>
>
> Now that I might be narrowing in on offending files, I'll wait to see
> what you think of my last response to Chris. I did try the above
> when I first ran into this:
> - https://lore.kernel.org/linux-btrfs/CA+M2ft8FpjdDQ7=XwMdYQazhyB95aha_D4WU_n15M59QrimrRg@xxxxxxxxxxxxxx/

That RO is caused by the missing data backref, which can be fixed by
btrfs check --repair.

Then you should be able to delete the offending files.
(Or the whole chromium cache, and switch to firefox if you wish :P )

But please also keep in mind that the transid mismatch appears to be in
your csum tree, which means your csum tree is no longer reliable and
may cause -EIO when reading unrelated files.

Thus it's recommended to rebuild the csum tree with --init-csum-tree.

Both can be done in one go with --repair --init-csum-tree, but to be
safe, please run --repair alone first, then make sure btrfs check
reports no errors, and only then run --init-csum-tree. (A rough sketch
of the sequence is appended at the end of this mail.)

>
>> But I'm more interested in how this happened.
>
> Me too :)
>
>> Have you ever experienced any power loss on your NVMe drive?
>> I'm not saying btrfs is unsafe against power loss (all filesystems
>> should be safe against power loss); I'm just curious whether
>> mount-time log replay is involved, or just regular internal log
>> replay.
>>
>> From your smartctl output, the drive has experienced 61 unsafe
>> shutdowns over 2144 power cycles.
>
> Uhhh, hell yes, sadly. I'm a dummy running i3, and every time I get
> caught off guard by low battery and instant power-off, I kick myself
> and mean to set up a script to force a poweroff before that happens.
> So, indeed, I've lost power a ton. Surprised it was 61 times, but
> maybe not over ~2 years. And actually, I mis-stated the age: I haven't
> *booted* from this drive in almost 2 years. It's a corporate laptop,
> issued every 3, so the ssd drive is more like 5 years old.
>
>> Not sure if it's related.
>>
>> Another interesting point: do you remember what the oldest kernel
>> run on this fs was? v5.4 or v5.5?
>
> Hard to say, but Arch Linux maintains a package archive. The nvme
> drive is from ~May 2018.
> The archives only go back to Jan 2019, and the kernel/btrfs-progs was
> at 4.20 then:
> - https://archive.archlinux.org/packages/l/linux/

There is a known bug in v5.2.0~v5.2.14 (fixed in v5.2.15) which could
cause metadata corruption, and the symptom is a transid error, which
also matches your problem.

Thanks,
Qu

>
> Searching my Amazon orders, the SSD was from the 2015 time frame, so
> the kernel version would have been even older.
>
> Thanks for your input,
> John
>
>>
>> Thanks,
>> Qu
>>>
>>> Thanks for any pointers, as it would now seem that my purchase of a
>>> new M.2 SATA drive may not buy my way out of this problem! While I
>>> didn't want to reinstall, at least new hardware is a simple fix. Now
>>> I'm worried there is a deeper issue bound to recur :(
>>>
>>> Best regards,
>>> John
>>>
>>> On Wed, Feb 5, 2020 at 10:01 AM John Hendy <jw.hendy@xxxxxxxxx> wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I've had this issue occur twice, once ~1 month ago and once a couple
>>>> of weeks ago. Chromium suddenly quit on me, and when trying to start
>>>> it again, it complained about a lock file in ~. I tried to delete it
>>>> manually and was informed I was on a read-only fs! I ended up biting
>>>> the bullet and re-installing linux due to the number of dead-end
>>>> threads and slow response rates on diagnosing these issues, and the
>>>> issue occurred again shortly after.
>>>>
>>>> $ uname -a
>>>> Linux whammy 5.5.1-arch1-1 #1 SMP PREEMPT Sat, 01 Feb 2020 16:38:40
>>>> +0000 x86_64 GNU/Linux
>>>>
>>>> $ btrfs --version
>>>> btrfs-progs v5.4
>>>>
>>>> $ btrfs fi df /mnt/misc/  # full device; normally a subvol is mounted on /
>>>> Data, single: total=114.01GiB, used=80.88GiB
>>>> System, single: total=32.00MiB, used=16.00KiB
>>>> Metadata, single: total=2.01GiB, used=769.61MiB
>>>> GlobalReserve, single: total=140.73MiB, used=0.00B
>>>>
>>>> This is a single device, no RAID, not on a VM. HP ZBook 15.
>>>> nvme0n1     259:5  0 232.9G 0 disk
>>>> ├─nvme0n1p1 259:6  0   512M 0 part (/boot/efi)
>>>> ├─nvme0n1p2 259:7  0     1G 0 part (/boot)
>>>> └─nvme0n1p3 259:8  0 231.4G 0 part (btrfs)
>>>>
>>>> I have the following subvols:
>>>> arch: used for / when booting arch
>>>> jwhendy: used for /home/jwhendy on arch
>>>> vault: shared data between distros on /mnt/vault
>>>> bionic: root when booting ubuntu bionic
>>>>
>>>> nvme0n1p3 is encrypted with dm-crypt/LUKS.
>>>>
>>>> dmesg, smartctl, btrfs check, and btrfs dev stats attached.
>>>
>>> Edit: links now:
>>> - btrfs check: https://pastebin.com/nz6Bc145
>>> - dmesg: https://pastebin.com/1GGpNiqk
>>> - smartctl: https://pastebin.com/ADtYqfrd
>>>
>>> btrfs dev stats (not worth a link):
>>>
>>> [/dev/mapper/old].write_io_errs    0
>>> [/dev/mapper/old].read_io_errs     0
>>> [/dev/mapper/old].flush_io_errs    0
>>> [/dev/mapper/old].corruption_errs  0
>>> [/dev/mapper/old].generation_errs  0
>>>
>>>
>>>> If these are of interest, here are the reddit threads where I posted
>>>> the issue and was referred here.
>>>> 1) https://www.reddit.com/r/btrfs/comments/ejqhyq/any_hope_of_recovering_from_various_errors_root/
>>>> 2) https://www.reddit.com/r/btrfs/comments/erh0f6/second_time_btrfs_root_started_remounting_as_ro/
>>>>
>>>> It has been suggested this is a hardware issue. I've already ordered
>>>> a replacement M.2 SATA drive, but for sanity it would be great to
>>>> know definitively that this was the case. If anything stands out
>>>> above that could indicate I'm not set up properly re. btrfs, that
>>>> would also be fantastic so I don't repeat the issue!
>>>>
>>>> The only thing I've stumbled on is that I have been mounting with
>>>> rd.luks.options=discard, and that manually running fstrim is
>>>> preferred.
>>>>
>>>> Many thanks for any input/suggestions,
>>>> John
>>
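
A minimal sketch of the repair sequence described above, assuming the
filesystem is unmounted first (e.g. from a live/rescue environment) and
using the /dev/mapper/old name from the dev stats output; adjust the
device path to your actual LUKS mapping:

  # run against the unmounted filesystem, e.g. from rescue media
  btrfs check --repair /dev/mapper/old          # fix the missing data backrefs
  btrfs check /dev/mapper/old                   # read-only check; should report no errors now
  btrfs check --init-csum-tree /dev/mapper/old  # then rebuild the unreliable csum tree
  btrfs check /dev/mapper/old                   # final read-only verification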