Re: btrfs root fs started remounting ro

On 2020/5/6 11:29 PM, John Hendy wrote:
> On Wed, May 6, 2020 at 1:13 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2020/5/6 12:37 PM, John Hendy wrote:
>>> Greetings,
>>>
>>>
>>> I'm following up to the below as this just occurred again. I think
>>> there is something odd between btrfs behavior and browsers. Since the
>>> last time, I was able to recover my drive, and have disabled
>>> continuous trim (and have not manually trimmed for that matter).
>>>
>>> I've switched to firefox almost exclusively (I can think of only a
>>> handful of times I've used chromium since). Previously the problem was
>>> related to the chromium cache; this time the problem was the file:
>>>
>>> .cache/mozilla/firefox/tqxxilph.default-release/cache2/entries/D8FD7600C30A3A68D18D98B233F9C5DD3F7DDAD0
>>>
>>> In this particular instance, I suspended my computer, and resumed to
>>> find it read only. I opened it to reboot into windows, finding I
>>> couldn't save my open file in emacs.
>>>
>>> The dmesg is here: https://pastebin.com/B8nUkYzB
>>
>> The reason is the write-time tree checker; I'm surprised it got triggered:
>>
>> [68515.682152] BTRFS critical (device dm-0): corrupt leaf: root=257
>> block=156161818624 slot=22 ino=1312604, name hash mismatch with key,
>> have 0x000000007a63c07f expect 0x00000000006820bc
>>
>> Unfortunately, the dump included in the dmesg doesn't contain the file
>> name, so I'm not sure which file is the culprit, but it does have the
>> inode number, 1312604.
> 
> Thanks for the input. The inode resolves to this path, but it's the
> same base path as the problematic file for btrfs scrub.
> 
> $ sudo btrfs inspect-internal inode-resolve 1312604 /home/jwhendy
> /home/jwhendy/.cache/mozilla/firefox/tqxxilph.default-release/cache2/entries
> 
>> But considering this is from the write-time tree checker, not the
>> read-time tree checker, it means your on-disk data was not corrupted
>> from the very beginning; possibly your RAM (maybe related to
>> suspension?) is causing the problem.
> 
> Interesting. I suspend all the time and have never encountered this,
> but I do recall sending an email (in firefox) and quickly closing my
> computer afterward as the last thing I did.
> 
>>>
>>> The file above was found uncorrectable via btrfs scrub, but after I
>>> manually deleted it the scrub succeeded on the second try with no
>>> errors.
>>
>> Unfortunately, it may not be related to that file, unless that file has
>> the inode number 1312604.
>>
>> That is to say, this is a completely different case.
>>
>> Considering your previous csum corruption, have you considered a full
>> memtest?
> 
> I can certainly do this. At what point could hardware be ruled out and
> something else pursued or troubleshot? Or is this a lost cause to try
> and understand?

If a full memtest run finishes without problems, then we're hitting
something that should be impossible.

There shouldn't be anything that causes a write-time tree checker error,
especially for a name hash.
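
To illustrate why a name hash mismatch points at memory rather than at the
filesystem logic, here is a minimal sketch. It assumes the directory-entry
name hash is a raw CRC-32C of the entry name seeded with ~1 (no final XOR)
and that this value is what the key offset must match. The helper names
(crc32c_raw, btrfs_name_hash, flip_bit) and the sample name are made up for
this example; they are not kernel or btrfs-progs APIs. The hash is derived
purely from the name bytes, so a single bit flipped in RAM before the leaf
is written is enough to make the stored key and the name disagree.

```python
# Sketch only. Assumption: the btrfs name hash is a raw CRC-32C
# (Castagnoli polynomial, reflected) seeded with ~1, with no final XOR.

CRC32C_POLY_REFLECTED = 0x82F63B78


def _build_table():
    """Precompute the 256-entry lookup table for reflected CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c_raw(seed: int, data: bytes) -> int:
    """CRC-32C update step without the usual pre/post inversion."""
    crc = seed & 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc


def btrfs_name_hash(name: bytes) -> int:
    """Hypothetical helper: name hash under the assumption stated above."""
    return crc32c_raw(0xFFFFFFFE, name)  # 0xFFFFFFFE is (u32)~1


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a RAM error."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


if __name__ == "__main__":
    # Sample name only; the entry that tripped the checker is not known.
    name = b"D8FD7600C30A3A68D18D98B233F9C5DD3F7DDAD0"
    good = btrfs_name_hash(name)
    bad = btrfs_name_hash(flip_bit(name, 3))
    print(f"original name hash: 0x{good:016x}")
    print(f"after one bit flip: 0x{bad:016x}")
```

A CRC always detects a single-bit error, so any one-bit corruption of the
name (or of the key itself) will trip the checker, which is consistent with
suggesting a memtest first.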

Thanks,
Qu

> 
> Many thanks,
> John
> 
>> Thanks,
>> Qu
>>
>>>
>>> $ btrfs --version
>>> btrfs-progs v5.6
>>>
>>> $ uname -a
>>> Linux voltaur 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54
>>> +0000 x86_64 GNU/Linux
>>>
>>> I don't know how to reproduce this at all, but it's always been
>>> browser cache related. There are similar issues out there, but no
>>> obvious pattern/solutions.
>>> - https://forum.manjaro.org/t/root-and-home-become-read-only/46944
>>> - https://bbs.archlinux.org/viewtopic.php?id=224243
>>>
>>> Anything else to check on why this might occur?
>>>
>>> Best regards,
>>> John
>>>
>>>
>>> On Wed, Feb 5, 2020 at 10:01 AM John Hendy <jw.hendy@xxxxxxxxx> wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I've had this issue occur twice, once ~1mo ago and once a couple of
>>>> weeks ago. Chromium suddenly quit on me, and when trying to start it
>>>> again, it complained about a lock file in ~. I tried to delete it
>>>> manually and was informed I was on a read-only fs! I ended up biting
>>>> the bullet and re-installing linux due to the number of dead end
>>>> threads and slow response rates on diagnosing these issues, and the
>>>> issue occurred again shortly after.
>>>>
>>>> $ uname -a
>>>> Linux whammy 5.5.1-arch1-1 #1 SMP PREEMPT Sat, 01 Feb 2020 16:38:40
>>>> +0000 x86_64 GNU/Linux
>>>>
>>>> $ btrfs --version
>>>> btrfs-progs v5.4
>>>>
>>>> $ btrfs fi df /mnt/misc/ # full device; normally would be mounting a subvol on /
>>>> Data, single: total=114.01GiB, used=80.88GiB
>>>> System, single: total=32.00MiB, used=16.00KiB
>>>> Metadata, single: total=2.01GiB, used=769.61MiB
>>>> GlobalReserve, single: total=140.73MiB, used=0.00B
>>>>
>>>> This is a single device, no RAID, not on a VM. HP Zbook 15.
>>>> nvme0n1                                       259:5    0 232.9G  0 disk
>>>> ├─nvme0n1p1                                   259:6    0   512M  0 part  (/boot/efi)
>>>> ├─nvme0n1p2                                   259:7    0     1G  0 part  (/boot)
>>>> └─nvme0n1p3                                   259:8    0 231.4G  0 part (btrfs)
>>>>
>>>> I have the following subvols:
>>>> arch: used for / when booting arch
>>>> jwhendy: used for /home/jwhendy on arch
>>>> vault: shared data between distros on /mnt/vault
>>>> bionic: root when booting ubuntu bionic
>>>>
>>>> nvme0n1p3 is encrypted with dm-crypt/LUKS.
>>>>
>>>> dmesg, smartctl, btrfs check, and btrfs dev stats attached.
>>>>
>>>> If these are of interest, here are reddit threads where I posted the
>>>> issue and was referred here.
>>>> 1) https://www.reddit.com/r/btrfs/comments/ejqhyq/any_hope_of_recovering_from_various_errors_root/
>>>> 2)  https://www.reddit.com/r/btrfs/comments/erh0f6/second_time_btrfs_root_started_remounting_as_ro/
>>>>
>>>> It has been suggested this is a hardware issue. I've already ordered a
>>>> replacement m2.sata, but for sanity it would be great to know
>>>> definitively this was the case. If anything stands out above that
>>>> could indicate I'm not set up properly re. btrfs, that would also be
>>>> fantastic so I don't repeat the issue!
>>>>
>>>> The only thing I've stumbled on is that I have been mounting with
>>>> rd.luks.options=discard and that manually running fstrim is preferred.
>>>>
>>>>
>>>> Many thanks for any input/suggestions,
>>>> John
>>
