On 2020/5/6 11:29 PM, John Hendy wrote:
> On Wed, May 6, 2020 at 1:13 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2020/5/6 12:37 PM, John Hendy wrote:
>>> Greetings,
>>>
>>>
>>> I'm following up to the below as this just occurred again. I think
>>> there is something odd between btrfs behavior and browsers. Since the
>>> last time, I was able to recover my drive, and have disabled
>>> continuous trim (and have not manually trimmed for that matter).
>>>
>>> I've switched to firefox almost exclusively (I can think of a handful
>>> of times using it), but the problem was related to chromium cache and
>>> the problem this time was the file:
>>>
>>> .cache/mozilla/firefox/tqxxilph.default-release/cache2/entries/D8FD7600C30A3A68D18D98B233F9C5DD3F7DDAD0
>>>
>>> In this particular instance, I suspended my computer, and resumed to
>>> find it read only. I opened it to reboot into windows, finding I
>>> couldn't save my open file in emacs.
>>>
>>> The dmesg is here: https://pastebin.com/B8nUkYzB
>>
>> The reason is the write time tree checker; I'm surprised it got triggered:
>>
>> [68515.682152] BTRFS critical (device dm-0): corrupt leaf: root=257
>> block=156161818624 slot=22 ino=1312604, name hash mismatch with key,
>> have 0x000000007a63c07f expect 0x00000000006820bc
>>
>> In the dump included in the dmesg, unfortunately it doesn't include the
>> file name so I'm not sure which one is the culprit, but it has the inode
>> number, 1312604.
>
> Thanks for the input. The inode resolves to this path, but it's the
> same base path as the problematic file for btrfs scrub.
>
> $ sudo btrfs inspect-internal inode-resolve 1312604 /home/jwhendy
> /home/jwhendy/.cache/mozilla/firefox/tqxxilph.default-release/cache2/entries
>
>> But considering this is from the write time tree checker, not from the
>> read time tree checker, this means it's not your on-disk data corrupted
>> from the very beginning, but possibly your RAM (maybe related to
>> suspension?) causing the problem.
>
> Interesting. I suspend all the time and have never encountered this,
> but I do recall sending an email (in firefox) and quickly closing my
> computer afterward as the last thing I did.
>
>>>
>>> The file above was found uncorrectable via btrfs scrub, but after I
>>> manually deleted it the scrub succeeded on the second try with no
>>> errors.
>>
>> Unfortunately, it may not be related to that file, unless that file has
>> the inode number 1312604.
>>
>> That is to say, this is a completely different case.
>>
>> Considering your previous csum corruption, have you considered a full
>> memtest?
>
> I can certainly do this. At what point could hardware be ruled out and
> something else pursued or troubleshot? Or is this a lost cause to try
> and understand?

If a full memtest run finishes without problem, then we're hitting
something impossible, as there shouldn't be anything causing a write time
tree checker error, especially for name hash.

Thanks,
Qu

>
> Many thanks,
> John
>
>> Thanks,
>> Qu
>>
>>>
>>> $ btrfs --version
>>> btrfs-progs v5.6
>>>
>>> $ uname -a
>>> Linux voltaur 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54
>>> +0000 x86_64 GNU/Linux
>>>
>>> I don't know how to reproduce this at all, but it's always been
>>> browser cache related. There are similar issues out there, but no
>>> obvious pattern/solutions.
>>> - https://forum.manjaro.org/t/root-and-home-become-read-only/46944
>>> - https://bbs.archlinux.org/viewtopic.php?id=224243
>>>
>>> Anything else to check on why this might occur?
>>>
>>> Best regards,
>>> John
>>>
>>>
>>> On Wed, Feb 5, 2020 at 10:01 AM John Hendy <jw.hendy@xxxxxxxxx> wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I've had this issue occur twice, once ~1mo ago and once a couple of
>>>> weeks ago. Chromium suddenly quit on me, and when trying to start it
>>>> again, it complained about a lock file in ~. I tried to delete it
>>>> manually and was informed I was on a read-only fs! I ended up biting
>>>> the bullet and re-installing linux due to the number of dead end
>>>> threads and slow response rates on diagnosing these issues, and the
>>>> issue occurred again shortly after.
>>>>
>>>> $ uname -a
>>>> Linux whammy 5.5.1-arch1-1 #1 SMP PREEMPT Sat, 01 Feb 2020 16:38:40
>>>> +0000 x86_64 GNU/Linux
>>>>
>>>> $ btrfs --version
>>>> btrfs-progs v5.4
>>>>
>>>> $ btrfs fi df /mnt/misc/ # full device; normally would be mounting a subvol on /
>>>> Data, single: total=114.01GiB, used=80.88GiB
>>>> System, single: total=32.00MiB, used=16.00KiB
>>>> Metadata, single: total=2.01GiB, used=769.61MiB
>>>> GlobalReserve, single: total=140.73MiB, used=0.00B
>>>>
>>>> This is a single device, no RAID, not on a VM. HP Zbook 15.
>>>> nvme0n1       259:5    0 232.9G  0 disk
>>>> ├─nvme0n1p1   259:6    0   512M  0 part (/boot/efi)
>>>> ├─nvme0n1p2   259:7    0     1G  0 part (/boot)
>>>> └─nvme0n1p3   259:8    0 231.4G  0 part (btrfs)
>>>>
>>>> I have the following subvols:
>>>> arch: used for / when booting arch
>>>> jwhendy: used for /home/jwhendy on arch
>>>> vault: shared data between distros on /mnt/vault
>>>> bionic: root when booting ubuntu bionic
>>>>
>>>> nvme0n1p3 is encrypted with dm-crypt/LUKS.
>>>>
>>>> dmesg, smartctl, btrfs check, and btrfs dev stats attached.
>>>>
>>>> If these are of interest, here are reddit threads where I posted the
>>>> issue and was referred here.
>>>> 1) https://www.reddit.com/r/btrfs/comments/ejqhyq/any_hope_of_recovering_from_various_errors_root/
>>>> 2) https://www.reddit.com/r/btrfs/comments/erh0f6/second_time_btrfs_root_started_remounting_as_ro/
>>>>
>>>> It has been suggested this is a hardware issue. I've already ordered a
>>>> replacement m2.sata, but for sanity it would be great to know
>>>> definitively this was the case. If anything stands out above that
>>>> could indicate I'm not set up properly re. btrfs, that would also be
>>>> fantastic so I don't repeat the issue!
>>>>
>>>> The only thing I've stumbled on is that I have been mounting with
>>>> rd.luks.options=discard and that manually running fstrim is preferred.
>>>>
>>>>
>>>> Many thanks for any input/suggestions,
>>>> John
>>
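Editorial note on the "name hash mismatch with key" message quoted above:
the two reported values can be compared directly. The sketch below is not
taken from the kernel source. It assumes the btrfs name hash is a raw
CRC-32C (no final inversion) seeded with 0xFFFFFFFE, and the helper names
crc32c_raw, name_hash and differing_bits are made up for illustration.

```python
def _make_crc32c_table():
    """Build the byte-wise lookup table for CRC-32C (Castagnoli)."""
    poly = 0x82F63B78  # reflected form of the Castagnoli polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c_raw(seed, data):
    """CRC-32C over data, starting from seed, with no final XOR."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def name_hash(name):
    """Hash a directory entry name (bytes).

    Assumption: seed 0xFFFFFFFE, i.e. (u32)~1, and no final inversion.
    """
    return crc32c_raw(0xFFFFFFFE, name)


def differing_bits(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# The two values reported in the dmesg line quoted in this thread.
have, expect = 0x7A63C07F, 0x006820BC
print(differing_bits(have, expect))
```

For the two values from the dmesg line this reports 15 differing bits, so
they do not differ by a single flipped bit. That alone does not say whether
the name or the key was altered in memory; with the real file name in hand,
name_hash(b"...") could be compared against both values.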
