Re: btrfs root fs started remounting ro

On 2020/2/8 12:48 PM, John Hendy wrote:
> On Fri, Feb 7, 2020 at 5:42 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>>
>>
>> On 2020/2/8 1:52 AM, John Hendy wrote:
>>> Greetings,
>>>
>>> I'm resending, as this isn't showing in the archives. Perhaps it was
>>> the attachments, which I've converted to pastebin links.
>>>
>>> As an update, I'm now running off of a different drive (ssd, not the
>>> nvme) and I got the error again! I'm now inclined to think this might
>>> not be hardware after all, but something related to my setup or a bug
>>> with chromium.
>>>
>>> After a reboot, chromium wouldn't start for me and dmesg showed
>>> similar parent transid/csum errors to my original post below. I used
>>> btrfs-inspect-internal to find that the inode traced to
>>> ~/.config/chromium/History. I deleted that, and got a new set of
>>> errors tracing to ~/.config/chromium/Cookies. After I deleted that and
>>> tried starting chromium, I found that my btrfs /home/jwhendy pool was
>>> mounted ro just like the original problem below.
>>>
>>> dmesg after trying to start chromium:
>>> - https://pastebin.com/CsCEQMJa
>>
>> So far, it's only a transid bug in your csum tree.
>>
>> And two backref mismatches in the data backrefs.
>>
>> In theory, you can fix your problem with `btrfs check --repair
>> --init-csum-tree`.
>>
> 
> Now that I might be narrowing in on offending files, I'll wait to see
> what you think from my last response to Chris. I did try the above
> when I first ran into this:
> - https://lore.kernel.org/linux-btrfs/CA+M2ft8FpjdDQ7=XwMdYQazhyB95aha_D4WU_n15M59QrimrRg@xxxxxxxxxxxxxx/

That RO remount is caused by the missing data backref, which can be
fixed by `btrfs check --repair`.

Then you should be able to delete the offending files. (Or the whole
chromium cache, and switch to firefox if you wish :P )

But please also keep in mind that the transid mismatch appears to be in
your csum tree, which means your csum tree is no longer reliable and may
cause -EIO when reading unrelated files.

Thus it's recommended to rebuild the csum tree with --init-csum-tree.

Both can be done at once with --repair --init-csum-tree, but to be safe,
please run --repair alone first and make sure a plain btrfs check reports
no errors afterwards. Then run --init-csum-tree.
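
For example, with the fs unmounted (using /dev/mapper/old from your dev
stats output purely as a stand-in for whichever mapping the affected fs
lives on), the sequence would look roughly like:

$ btrfs check --repair /dev/mapper/old          # fix the missing data backrefs
$ btrfs check /dev/mapper/old                   # verify it now reports no errors
$ btrfs check --init-csum-tree /dev/mapper/old  # then rebuild the csum tree

Run these only against the unmounted filesystem (e.g. from a live USB),
and ideally with a backup of anything important taken first.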

> 
>> But I'm more interested in how this happened.
> 
> Me too :)
> 
>> Have you ever experienced any power loss on your NVMe drive?
>> I'm not saying btrfs is unsafe against power loss; all filesystems should
>> be safe against power loss. I'm just curious whether mount-time log replay
>> is involved, or just regular internal log replay.
>>
>> From your smartctl output, the drive experienced 61 unsafe shutdowns over
>> 2144 power cycles.
> 
> Uhhh, hell yes, sadly. I'm a dummy running i3 and every time I get
> caught off guard by low battery and instant power-off, I kick myself
> and mean to set up a script to force poweroff before that happens. So,
> indeed, I've lost power a ton. Surprised it was 61 times, but maybe
> not over ~2 years. And actually, I mis-stated the age. I haven't
> *booted* from this drive in almost 2yrs. It's a corporate laptop,
> issued every 3, so the ssd drive is more like 5 years old.
> 
>> Not sure if it's related.
>>
>> Another interesting point: do you remember what the oldest kernel to run
>> on this fs was? v5.4 or v5.5?
> 
> Hard to say, but Arch Linux maintains a package archive. The nvme
> drive is from ~May 2018. The archives only go back to Jan 2019, and the
> kernel/btrfs-progs were at 4.20 then:
> - https://archive.archlinux.org/packages/l/linux/

There is a known bug in v5.2.0~v5.2.14 (fixed in v5.2.15) which could
cause metadata corruption. Its symptom is a transid error, which also
matches your problem.
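
If /var/log/pacman.log survived your reinstall (just a hypothetical check,
assuming a standard Arch setup), something like this should show whether a
v5.2.x kernel was ever installed on that machine:

$ grep -E '(installed|upgraded) linux ' /var/log/pacman.log | grep '5\.2\.'

Any hit in the 5.2.0~5.2.14 range would make that bug a likely suspect.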

Thanks,
Qu

> 
> Searching my Amazon orders, the SSD was in the 2015 time frame, so the
> kernel version would have been even older.
> 
> Thanks for your input,
> John
> 
>>
>> Thanks,
>> Qu
>>>
>>> Thanks for any pointers, as it would now seem that my purchase of a
>>> new m2.sata may not buy my way out of this problem! While I didn't
>>> want to reinstall, at least new hardware is a simple fix. Now I'm
>>> worried there is a deeper issue bound to recur :(
>>>
>>> Best regards,
>>> John
>>>
>>> On Wed, Feb 5, 2020 at 10:01 AM John Hendy <jw.hendy@xxxxxxxxx> wrote:
>>>>
>>>> Greetings,
>>>>
>>>> I've had this issue occur twice, once ~1mo ago and once a couple of
>>>> weeks ago. Chromium suddenly quit on me, and when trying to start it
>>>> again, it complained about a lock file in ~. I tried to delete it
>>>> manually and was informed I was on a read-only fs! I ended up biting
>>>> the bullet and re-installing linux due to the number of dead end
>>>> threads and slow response rates on diagnosing these issues, and the
>>>> issue occurred again shortly after.
>>>>
>>>> $ uname -a
>>>> Linux whammy 5.5.1-arch1-1 #1 SMP PREEMPT Sat, 01 Feb 2020 16:38:40
>>>> +0000 x86_64 GNU/Linux
>>>>
>>>> $ btrfs --version
>>>> btrfs-progs v5.4
>>>>
>>>> $ btrfs fi df /mnt/misc/ # full device; normally would be mounting a subvol on /
>>>> Data, single: total=114.01GiB, used=80.88GiB
>>>> System, single: total=32.00MiB, used=16.00KiB
>>>> Metadata, single: total=2.01GiB, used=769.61MiB
>>>> GlobalReserve, single: total=140.73MiB, used=0.00B
>>>>
>>>> This is a single device, no RAID, not on a VM. HP Zbook 15.
>>>> nvme0n1                                       259:5    0 232.9G  0 disk
>>>> ├─nvme0n1p1                                   259:6    0   512M  0 part  (/boot/efi)
>>>> ├─nvme0n1p2                                   259:7    0     1G  0 part  (/boot)
>>>> └─nvme0n1p3                                   259:8    0 231.4G  0 part (btrfs)
>>>>
>>>> I have the following subvols:
>>>> arch: used for / when booting arch
>>>> jwhendy: used for /home/jwhendy on arch
>>>> vault: shared data between distros on /mnt/vault
>>>> bionic: root when booting ubuntu bionic
>>>>
>>>> nvme0n1p3 is encrypted with dm-crypt/LUKS.
>>>>
>>>> dmesg, smartctl, btrfs check, and btrfs dev stats attached.
>>>
>>> Edit: links now:
>>> - btrfs check: https://pastebin.com/nz6Bc145
>>> - dmesg: https://pastebin.com/1GGpNiqk
>>> - smartctl: https://pastebin.com/ADtYqfrd
>>>
>>> btrfs dev stats (not worth a link):
>>>
>>> [/dev/mapper/old].write_io_errs    0
>>> [/dev/mapper/old].read_io_errs     0
>>> [/dev/mapper/old].flush_io_errs    0
>>> [/dev/mapper/old].corruption_errs  0
>>> [/dev/mapper/old].generation_errs  0
>>>
>>>
>>>> If these are of interest, here are the reddit threads where I posted the
>>>> issue and was referred here.
>>>> 1) https://www.reddit.com/r/btrfs/comments/ejqhyq/any_hope_of_recovering_from_various_errors_root/
>>>> 2)  https://www.reddit.com/r/btrfs/comments/erh0f6/second_time_btrfs_root_started_remounting_as_ro/
>>>>
>>>> It has been suggested this is a hardware issue. I've already ordered a
>>>> replacement m2.sata, but for my sanity it would be great to know
>>>> definitively whether that was the case. If anything stands out above
>>>> that could indicate I'm not set up properly re. btrfs, that would also
>>>> be fantastic so I don't repeat the issue!
>>>>
>>>> The only thing I've stumbled on is that I have been mounting with
>>>> rd.luks.options=discard, whereas manually running fstrim is apparently preferred.
>>>>
>>>>
>>>> Many thanks for any input/suggestions,
>>>> John
>>


