On Tue, 21 Jan 2020 at 13:26, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
> On 2020/1/21 10:06 AM, Robbie Smith wrote:
> > On Tue, 21 Jan 2020 at 12:49, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
> >>
> >> On 2020/1/21 9:39 AM, Robbie Smith wrote:
> >>> On Tue, 21 Jan 2020 at 11:10, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
> >>>>
> >>>> On 2020/1/20 10:45 PM, Robbie Smith wrote:
> >>>>> I put my laptop into hibernation mode for a few days so I could boot
> >>>>> up into Windows 10 to do some things, and upon waking up BTRFS has
> >>>>> borked itself, spitting out errors and locking itself into read-only
> >>>>> mode. Is there any up-to-date information on how to fix it, short of
> >>>>> wiping the partition and reinstalling (which is what I ended up
> >>>>> resorting to last time after none of the attempts to fix it worked)?
> >>>>> The error messages in my journal are:
> >>>>>
> >>>>> BTRFS error (device dm-0): parent transid verify failed on
> >>>>> 223458705408 wanted 144360 found 144376
> >>>>
> >>>> The fs is already corrupted at this point.
> >>>>
> >>>>> BTRFS critical (device dm-0): corrupt leaf: block=223455346688 slot=23
> >>>>> extent bytenr=223451267072 len=16384 invalid generation, have 144376
> >>>>> expect (0, 144375]
> >>>>
> >>>> This is one of the newer tree-checker checks added in the latest kernel.
> >>>>
> >>>> It can be fixed with btrfs check in this branch:
> >>>> https://github.com/adam900710/btrfs-progs/tree/extent_gen_repair
> >>>>
> >>>> But that transid error can't be repaired, so it doesn't make much sense.
> >>>>
> >>>>> BTRFS error (device dm-0): block=223455346688 read time tree block
> >>>>> corruption detected
> >>>>> BTRFS error (device dm-0): error loading props for ino 1032412 (root 258): -5
> >>>>>
> >>>>> The parent transid messages are repeated a few times. There's nothing
> >>>>> fancy about my BTRFS setup: subvolumes are used to emulate my root and
> >>>>> home partition.
> >>>>> No RAID, no compression, though the partition does sit
> >>>>> beneath a dm-crypt layer using LUKS. Hibernation is done onto a
> >>>>> separate swap partition on the same drive.
> >>>>
> >>>> Please provide the output of "btrfs check" and the kernel version.
> >>>
> >>> Here's the kernel and btrfs information:
> >>>
> >>>> # uname -a
> >>>> Linux rocinante 5.4.10-arch1-1 #1 SMP PREEMPT Thu, 09 Jan 2020 10:14:29 +0000 x86_64 GNU/Linux
> >>>>
> >>>> # btrfs --version
> >>>> btrfs-progs v5.4
> >>>>
> >>>> # btrfs fi df /
> >>>> Data, single: total=541.01GiB, used=538.54GiB
> >>>> System, DUP: total=8.00MiB, used=80.00KiB
> >>>> Metadata, DUP: total=3.00GiB, used=1.56GiB
> >>>> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>>>
> >>>> # btrfs fi show
> >>>> Label: 'rootfs' uuid: 25ac1f63-5986-4eb8-920f-ed7a5354c076
> >>>> Total devices 1 FS bytes used 540.11GiB
> >>>> devid 1 size 794.25GiB used 547.02GiB path /dev/mapper/cryptroot
> >>>
> >>> I tried a btrfs check and it failed almost immediately.
> >>>
> >>>> # btrfs check /dev/mapper/cryptroot
> >>>> Opening filesystem to check...
> >>>> ERROR: /dev/mapper/cryptroot is currently mounted, use --force if you really intend to check the filesystem
> >>>>
> >>>> # btrfs check --force /dev/mapper/cryptroot
> >>>> Opening filesystem to check...
> >>>> WARNING: filesystem mounted, continuing because of --force
> >>>> Checking filesystem on /dev/mapper/cryptroot
> >>>> UUID: 25ac1f63-5986-4eb8-920f-ed7a5354c076
> >>>> [1/7] checking root items
> >>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
> >>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
> >>>> parent transid verify failed on 223455674368 wanted 144355 found 144376
> >>>> Ignoring transid failure
> >>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
> >>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
> >>>> parent transid verify failed on 223452872704 wanted 144358 found 144376
> >>>> Ignoring transid failure
> >>>> ERROR: child eb corrupted: parent bytenr=223602655232 item=233 parent level=1 child level=2
> >>>> ERROR: failed to repair root items: Input/output error
> >>
> >> The corruption looks like it happened in the root tree, which is almost
> >> certain to cause problems at the next mount.
> >>
> >> It's highly recommended to start data salvage.
> >>
> >>>
> >>> I haven't rebooted the laptop, in case this issue makes the laptop
> >>> unbootable, but I could try re-running the check from a live USB and
> >>> an unmounted filesystem. My Arch Live USB is from June last year, and
> >>> it's got kernel 4.20 and btrfs-progs 4.19.1 on it. Will they be new
> >>> enough, or should I fetch the latest Arch disk and flash a new one?
> >>
> >> I don't believe newer btrfs-progs can handle it at all.
> >> But you can still try it as a last resort.
> >>
> >> BTW did you have anything weird in dmesg?
> >
> > dmesg is full of errors from journalctl because the filesystem is
> > read-only. Journalctl had paused after resume due to this, and I
> > thought I could catch newer messages by running it (isn't it supposed
> > to temporarily store logs in volatile storage?), and that made my
> > laptop completely die.
> > Every program I had open segfaulted at once,
> > and now it's just spooling through dmesg with thousands (if not
> > millions) of lines about journalctl being unable to rotate the logs.
> > Amazingly enough, I'm still logged in remotely via ssh/mosh, but I
> > can't run any commands due to a bus error. I can't even su to root.
>
> Well, when a fs gets fully corrupted, anything can happen.
>
> >
> > I guess I'll try rebooting it with a Live USB, and running the check
> > again, and if that fails, looks like I'll be spending my day
> > reinstalling everything. Do I have any better options? The only thing
> > that isn't backed up on this machine is my music collection, but
> > that's a local lossy copy generated from my lossless library on my
> > other machine, so I can recreate it if I need to (I'd rather not; if I
> > can mount the fs readonly, I might be able to copy that to a separate
> > drive).
> >
> > What on Earth could possibly cause BTRFS to fail so badly like this,
> > with this specific error? I've been using BTRFS for years without
> > problems, except this and the exact same error on the same machine six
> > months ago.
>
> Really hard to say, there are at least 3 things related to this problem.
>
> - Btrfs itself
> - Hibernation
> - Dm-crypt (less likely)
>
> For btrfs, if you have used a kernel between v5.2.0 and v5.2.15,
> then it's possible the fs was already corrupted but not detected.
>
> For the hibernation part, Linux is not the best platform for it in
> the first place.
> (My ThinkPad X1 Carbon 6th suffers from hibernation, so I rarely use
> suspension/hibernation)
>
> Since Linux development is mostly server oriented, such daily consumer
> operations may not be that well tested.
>
> Things like Windows updating certain firmware could break the controller
> behavior and cause unexpected behavior.
>
> So my personal recommendation is to avoid hibernation/suspension and to
> use Windows as little as possible.
>
> Thanks,
> Qu

Suspension works flawlessly for me, and hibernation usually does as
well. The one thing that has happened both times I've had a failure
has been something weird with the power: the first time was a static
shock from walking on carpet and then touching the laptop, the second
time was the BIOS reporting a wattage error with the charger.

I tried mounting the FS from a live USB and the mount said: "can't
read superblock on /dev/mapper/cryptroot" in addition to the transid
failures. Should I try running a `btrfs check --repair`? At this point
I'm pretty much resigned to reinstalling today, so I can't make things
any worse, can I?

I've also used kernels between versions 5.2.0 and 5.2.15 on both my
machines, so does that mean there's a risk of undetected disk errors
on my desktop as well? I don't have backups of my backups, and all my
drives use BTRFS because I like the subvolume/snapshot features. I
also don't have a backup of my music/video library because I don't
have another 5 TB HDD.

> >
> >>
> >>>
> >>> In answer to Nikolay's questions, both Windows and Linux share a disk
> >>> but are on separate partitions, and Windows did update itself. I've
> >>> had Windows updates occur while Linux is hibernated before, and it has
> >>> no reason to touch a partition it can't read and never mounts.
> >>
> >> For the cause, I don't believe it's related to Windows, but to the
> >> hibernation part.
> >>
> >> Not sure how hibernation would interact with the fs, but my guess is it
> >> should at least sync the fs.
> >>
> >> Anyway, if something extra happened, dmesg should have some clue.
> >>
> >> Another possible cause is that some older (still v5.x) upstream kernels
> >> had a bug, e.g. before v5.2.15/v5.3 there was a bug in btrfs which could
> >> cause part of the metadata not to be synced to disk, causing the same
> >> transid corruption.
> >>
> >> And since you're not rebooting, but only hibernating, the problem
> >> remained undetected until today...
> >>
> >> Thanks,
> >> Qu
> >>
> >>>
> >>> Robbie
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>
> >>>>>
> >>>>> This is the second time in six months this has happened on this
> >>>>> laptop. The only other thing I can think of is that the laptop BIOS
> >>>>> reported that the charger wasn't supplying the correct wattage, and I
> >>>>> have no idea why it would do that; both laptop and charger are nearly
> >>>>> brand-new, less than a year old. The laptop model is a Lenovo ThinkPad
> >>>>> T470.
> >>>>>
> >>>>> I've got backups, but reinstalling is a nuisance and I really don't
> >>>>> want to spend a couple of days getting the laptop working again. I
> >>>>> don't have a conveniently large drive lying around to mirror this one
> >>>>> onto.
> >>>>>
> >>>>> Robbie
