On 2020/1/21 10:06 AM, Robbie Smith wrote:
> On Tue, 21 Jan 2020 at 12:49, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>
>> On 2020/1/21 9:39 AM, Robbie Smith wrote:
>>> On Tue, 21 Jan 2020 at 11:10, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>>>>
>>>> On 2020/1/20 10:45 PM, Robbie Smith wrote:
>>>>> I put my laptop into hibernation mode for a few days so I could boot
>>>>> up into Windows 10 to do some things, and upon waking up BTRFS has
>>>>> borked itself, spitting out errors and locking itself into read-only
>>>>> mode. Is there any up-to-date information on how to fix it, short of
>>>>> wiping the partition and reinstalling (which is what I ended up
>>>>> resorting to last time after none of the attempts to fix it worked)?
>>>>> The error messages in my journal are:
>>>>>
>>>>> BTRFS error (device dm-0): parent transid verify failed on 223458705408 wanted 144360 found 144376
>>>>
>>>> The fs is already corrupted at this point.
>>>>
>>>>> BTRFS critical (device dm-0): corrupt leaf: block=223455346688 slot=23 extent bytenr=223451267072 len=16384 invalid generation, have 144376 expect (0, 144375]
>>>>
>>>> This is a newer tree-checker check added in the latest kernel.
>>>>
>>>> It can be fixed with btrfs check in this branch:
>>>> https://github.com/adam900710/btrfs-progs/tree/extent_gen_repair
>>>>
>>>> But that transid error can't be repaired, so it doesn't make much sense.
>>>>
>>>>> BTRFS error (device dm-0): block=223455346688 read time tree block corruption detected
>>>>> BTRFS error (device dm-0): error loading props for ino 1032412 (root 258): -5
>>>>>
>>>>> The parent transid messages are repeated a few times. There's nothing
>>>>> fancy about my BTRFS setup: subvolumes are used to emulate my root and
>>>>> home partition. No RAID, no compression, though the partition does sit
>>>>> beneath a dm-crypt layer using LUKS. Hibernation is done onto a
>>>>> separate swap partition on the same drive.
>>>>
>>>> Please provide the output of "btrfs check" and the kernel version.
>>>
>>> Here's the kernel and btrfs information:
>>>
>>>> # uname -a
>>>> Linux rocinante 5.4.10-arch1-1 #1 SMP PREEMPT Thu, 09 Jan 2020 10:14:29 +0000 x86_64 GNU/Linux
>>>>
>>>> # btrfs --version
>>>> btrfs-progs v5.4
>>>>
>>>> # btrfs fi df /
>>>> Data, single: total=541.01GiB, used=538.54GiB
>>>> System, DUP: total=8.00MiB, used=80.00KiB
>>>> Metadata, DUP: total=3.00GiB, used=1.56GiB
>>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>
>>>> # btrfs fi show
>>>> Label: 'rootfs' uuid: 25ac1f63-5986-4eb8-920f-ed7a5354c076
>>>> Total devices 1 FS bytes used 540.11GiB
>>>> devid 1 size 794.25GiB used 547.02GiB path /dev/mapper/cryptroot
>>>
>>> I tried a btrfs check and it failed almost immediately.
>>>
>>>> # btrfs check /dev/mapper/cryptroot
>>>> Opening filesystem to check...
>>>> ERROR: /dev/mapper/cryptroot is currently mounted, use --force if you really intend to check the filesystem
>>>>
>>>> # btrfs check --force /dev/mapper/cryptroot
>>>> Opening filesystem to check...
>>>> WARNING: filesystem mounted, continuing because of --force
>>>> Checking filesystem on /dev/mapper/cryptroot
>>>> UUID: 25ac1f63-5986-4eb8-920f-ed7a5354c076
>>>> [1/7] checking root items
>>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
>>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
>>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
>>>> Ignoring transid failure
>>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
>>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
>>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
>>>> Ignoring transid failure
>>>> ERROR: child eb corrupted: parent bytenr=223602655232 item=233 parent level=1 child level=2
>>>> ERROR: failed to repair root items: Input/output error
>>
>> The corruption looks to have happened on the root tree.
>> Which is almost certain to cause problems on the next mount.
>>
>> It's highly recommended to start data salvage.
>>
>>>
>>> I haven't rebooted the laptop, in case this issue makes the laptop
>>> unbootable, but I could try re-running the check from a live USB and
>>> an unmounted filesystem. My Arch Live USB is from June last year, and
>>> it's got kernel 4.20 and btrfs-progs 4.19.1 on it. Will they be new
>>> enough, or should I fetch the latest Arch disk and flash a new one?
>>
>> I don't believe newer btrfs-progs can handle it at all.
>> But you can still consider it as a last try.
>>
>> BTW did you have anything weird in dmesg?
>
> dmesg is full of errors from journalctl because the filesystem is
> read-only. Journalctl had paused after resume due to this, and I
> thought I could catch newer messages by running it (isn't it supposed
> to temporarily store logs in volatile storage?), and that made my
> laptop completely die. Every program I had open segfaulted at once,
> and now it's just spooling through dmesg with thousands (if not
> millions) of lines about journalctl being unable to rotate the logs.
> Amazingly enough, I'm still logged in remotely via ssh/mosh, but I
> can't run any commands due to a bus error. I can't even su to root.

Well, when a fs gets fully corrupted, anything can happen.

>
> I guess I'll try rebooting it with a Live USB, and running the check
> again, and if that fails, looks like I'll be spending my day
> reinstalling everything. Do I have any better options? The only thing
> that isn't backed up on this machine is my music collection, but
> that's a local lossy copy generated from my lossless library on my
> other machine, so I can recreate it if I need to (I'd rather not; if I
> can mount the fs readonly, I might be able to copy that to a separate
> drive).
>
> What on Earth could possibly cause BTRFS to fail so badly like this,
> with this specific error?
> I've been using BTRFS for years without
> problems, except this and the exact same error on the same machine six
> months ago.

It's really hard to say; there are at least 3 things that could be
related to this problem:

- Btrfs itself
- Hibernation
- Dm-crypt (less likely)

For btrfs, if you have used a kernel between v5.2.0 and v5.2.15, then
it's possible the fs was already corrupted but the corruption went
undetected.

As for hibernation, Linux is not the best place to use it in the first
place.
(My ThinkPad X1 Carbon 6th has problems with hibernation, so I rarely
use suspend/hibernation.)

Since Linux development is mostly server-oriented, such everyday
consumer operations may not be that well tested.

Things like Windows updating certain firmware could change the
controller's behavior and cause unexpected results.

So my personal recommendation is to avoid hibernation/suspend and to
use Windows as little as possible.

Thanks,
Qu

>
>>
>>>
>>> In answer to Nikolay's questions, both Windows and Linux share a disk
>>> but are on separate partitions, and Windows did update itself. I've
>>> had Windows updates occur while Linux is hibernated before, and it has
>>> no reason to touch a partition it can't read and never mounts.
>>
>> For the cause, I don't believe it's related to Windows, but to the
>> hibernation part.
>>
>> Not sure how hibernation would interact with the fs, but my guess is it
>> should at least sync the fs.
>>
>> Anyway, if something extra happened, dmesg should have some clue.
>>
>>
>> Another possible cause is that some older (still v5.x) upstream kernels
>> had a bug, e.g. before v5.2.15/v5.3 there was a bug in btrfs which could
>> cause part of the metadata not to be synced to disk, causing the same
>> transid corruption.
>>
>> And since you were not rebooting, only hibernating, the problem remained
>> undetected until today...
>>
>> Thanks,
>> Qu
>>
>>>
>>> Robbie
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> This is the second time in six months this has happened on this
>>>>> laptop.
>>>>> The only other thing I can think of is that the laptop BIOS
>>>>> reported that the charger wasn't supplying the correct wattage, and I
>>>>> have no idea why it would do that; both laptop and charger are nearly
>>>>> brand-new, less than a year old. The laptop model is a Lenovo ThinkPad
>>>>> T470.
>>>>>
>>>>> I've got backups, but reinstalling is a nuisance and I really don't
>>>>> want to spend a couple of days getting the laptop working again. I
>>>>> don't have a conveniently large drive lying around to mirror this one
>>>>> onto.
>>>>>
>>>>> Robbie
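
For reference, the "parent transid verify failed" lines quoted in this thread can be compared mechanically. Below is a minimal sketch in Python, assuming the message format shown in the journal and btrfs check output above. The names TRANSID_RE, TransidFailure and parse_transid_failures are hypothetical and not part of btrfs-progs. It only summarizes how far the generation found in each block is ahead of the one its parent expects; it does not diagnose or repair anything.

```python
import re
from typing import List, NamedTuple

# Pattern assumed from the messages quoted in this thread, e.g.
# "parent transid verify failed on 223458705408 wanted 144360 found 144376"
TRANSID_RE = re.compile(
    r"parent transid verify failed on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


class TransidFailure(NamedTuple):
    bytenr: int  # logical address of the tree block
    wanted: int  # generation the parent pointer expects
    found: int   # generation actually stored in the block


def parse_transid_failures(log_text: str) -> List[TransidFailure]:
    """Return the unique transid failures in log_text, in first-seen order."""
    seen = set()
    failures = []
    for match in TRANSID_RE.finditer(log_text):
        failure = TransidFailure(*(int(g) for g in match.groups()))
        if failure not in seen:  # the same message is often repeated
            seen.add(failure)
            failures.append(failure)
    return failures


# Sample input copied from the btrfs check output in this thread.
SAMPLE_LOG = """\
parent transid verify failed on 223455674368 wanted 144355 found 144376
parent transid verify failed on 223455674368 wanted 144355 found 144376
parent transid verify failed on 223455674368 wanted 144355 found 144376
Ignoring transid failure
parent transid verify failed on 223452872704 wanted 144358 found 144376
"""

if __name__ == "__main__":
    for f in parse_transid_failures(SAMPLE_LOG):
        print(
            f"block {f.bytenr}: wanted {f.wanted}, found {f.found} "
            f"(found is {f.found - f.wanted} generations newer)"
        )
```

In the output quoted above, every affected block carries generation 144376 while its parent expects an older one, which a summary like this makes easy to see at a glance.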
