Re: BTRFS failure after resume from hibernate

On Tue, Jan 21, 2020 at 2:01 PM Robbie Smith <zoqaeski@xxxxxxxxx> wrote:
>
> I think I have a hunch as to why this issue has occurred. I've had two
> btrfs partition failures, and both times it was upon resuming from
> hibernation. The key file for the encrypted swap was stored in
> /root/key-file, and the openswap hook unlocks the encrypted root,
> mounts it, reads the keyfile for the swap partition, and then unmounts
> it again. Could this action be causing the transid to be incremented
> somehow?
>

Of course. It means the on-disk state differs from the in-memory state
after resuming. You must not access a filesystem that was mounted when
the hibernation image was created until the system has resumed.

File a bug report against whichever component does this.

> > /etc/initcpio/hooks/openswap
> > run_hook ()
> > {
> >     ## Optional: To avoid race conditions
> >     x=0;
> >     while [ ! -b /dev/mapper/cryptroot ] && [ $x -le 10 ]; do
> >        x=$((x+1))
> >        sleep .2
> >     done
> >     ## End of optional
> >
> >     mkdir crypto_key_device
> >     mount /dev/mapper/cryptroot crypto_key_device

What /may/ work is mounting read-only, although even then btrfs may
still replay the tree log. "mount -o ro,nologreplay" may work.

> >     cryptsetup open --key-file crypto_key_device/root/key-file /dev/disk/by-uuid/<UUID> swapDevice
> >     umount crypto_key_device
> > }
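A minimal sketch of the hook with the read-only mount suggested above, assuming the rest of the hook stays as quoted. `<UUID>` is kept as the placeholder from the original and has to be replaced with the swap partition's UUID. Whether `ro,nologreplay` is enough to keep the on-disk state untouched is not verified here; it only follows the "may work" suggestion.

```sh
# /etc/initcpio/hooks/openswap (sketch): same as the quoted hook, but the
# root filesystem is mounted read-only with btrfs log replay disabled.
run_hook ()
{
    ## Optional: wait roughly 2 s for cryptroot to appear
    x=0
    while [ ! -b /dev/mapper/cryptroot ] && [ "$x" -le 10 ]; do
        x=$((x+1))
        sleep .2
    done

    mkdir -p crypto_key_device
    # "ro" alone may still replay the btrfs tree log; "nologreplay" is only
    # accepted together with "ro" and skips that replay.
    if mount -t btrfs -o ro,nologreplay /dev/mapper/cryptroot crypto_key_device; then
        # Replace <UUID> with the swap partition's UUID, as in the original hook.
        cryptsetup open --key-file crypto_key_device/root/key-file \
            "/dev/disk/by-uuid/<UUID>" swapDevice
        umount crypto_key_device
    fi
}
```

If the mount fails, this version skips `cryptsetup` instead of trying to read a key file that is not there.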
>
> The very first line of swsusp[1] has a big fat warning about touching
> data on the disk between suspend and resume, and in hindsight I
> imagine this action may count. The openswap hook doesn't write
> anything, but it's still accessing the disk (however, atime is
> disabled in my mount options).
>
> [1]https://www.kernel.org/doc/Documentation/power/swsusp.txt
>
> On Tue, 21 Jan 2020 at 14:51, Robbie Smith <zoqaeski@xxxxxxxxx> wrote:
> >
> > On Tue, 21 Jan 2020 at 14:05, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
> > >
> > >
> > >
> > > On 2020/1/21 10:58 AM, Robbie Smith wrote:
> > > [...]
> > > >>
> > > >> Really hard to say, there are at least 3 things related to this problem.
> > > >>
> > > >> - Btrfs itself
> > > >> - Hibernation
> > > >> - Dm-crypt (less possible)
> > > >>
> > > >> For btrfs, if you have used a kernel between versions v5.2.0 and v5.2.15,
> > > >> then it's possible the fs is already corrupted but not detected.
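To check whether a given kernel version string falls in the range mentioned here, a small POSIX shell helper is enough. The name `is_suspect_kernel` is hypothetical, and treating both v5.2.0 and v5.2.15 as included is an assumption that errs on the side of flagging. It only inspects a version string; it cannot tell whether a filesystem was actually damaged.

```sh
# Hypothetical helper: succeed if VERSION lies in 5.2.0 .. 5.2.15 (inclusive).
# Accepts strings like "5.2.7", "5.2.15-arch1-1" or "5.2.14.arch1".
is_suspect_kernel() {
    ver=${1%%-*}                       # drop any "-suffix" from uname -r
    major=$(echo "$ver" | cut -d. -f1)
    minor=$(echo "$ver" | cut -d. -f2)
    patch=$(echo "$ver" | cut -d. -f3)
    [ -z "$patch" ] && patch=0         # "5.2" means 5.2.0
    case $patch in
        *[!0-9]*) return 1 ;;          # not a plain number: do not guess
    esac
    [ "$major" = 5 ] && [ "$minor" = 2 ] && [ "$patch" -le 15 ]
}
```

For example, `is_suspect_kernel "$(uname -r)" && echo "running a suspect kernel"` checks the running kernel; kernels used in the past have to be looked up elsewhere, such as in the package manager's history.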
> > > >>
> > > >> As for hibernation, Linux is not the best platform for it in
> > > >> the first place.
> > > >> (My ThinkPad X1 Carbon 6th has problems with hibernation, so I rarely use
> > > >> suspend/hibernation.)
> > > >>
> > > >> Since Linux development is mostly server-oriented, everyday consumer
> > > >> operations like this may not be that well tested.
> > > >>
> > > >> Things like Windows updating certain firmware could change the controller's
> > > >> behavior and cause unexpected problems.
> > > >>
> > > >> So my personal recommendation is to avoid hibernation/suspend and to use
> > > >> Windows as little as possible.
> > > >>
> > > >> Thanks,
> > > >> Qu
> > > >
> > > > Suspension works flawlessly for me, and hibernation usually does as
> > > > well. The one thing that has happened both times I've had a failure
> > > > has been something weird with the power: first time was a static shock
> > > > from walking on carpet and then touching the laptop, second time was
> > > > the BIOS reporting a wattage error with the charger.
> > >
> > > This doesn't sound right for a ThinkPad T-series machine...
> > >
> > > >
> > > > I tried mounting the FS from a live USB and the mount said: "can't
> > > > read superblock on /dev/mapper/cryptroot" in addition to the transid
> > > > failures. Should I try running a `btrfs check --repair`? At this point
> > > > I'm pretty much resigned to reinstalling today, so I can't make things
> > > > any worse, can I?
> > >
> > > Full output please.
> >
> > I can't get the output from that mount run as it's lost in the shell
> > history. Attempting to mount now does nothing and just spits out:
> > > # mount -t btrfs -o ro,usebackuproot /dev/mapper/cryptroot /mnt/cryptroot
> > > [dmesg timestamp] BTRFS error (device dm-0): parent transid verify failed on 223452889088 wanted 144360 found 144376
> > > [dmesg timestamp] BTRFS error (device dm-0): parent transid verify failed on 223452889088 wanted 144360 found 144376
> >
> > btrfs check prints the UUID, and that's it.
> > > # btrfs check /dev/mapper/cryptroot
> > > Opening filesystem to check...
> > > Checking filesystem on /dev/mapper/cryptroot
> > > UUID: 25ac1f63-5986-4eb8-920f-ed7a5354c076
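Since the earlier output was lost with the shell history, a sketch like the following keeps a copy of everything for the "full output" request. `run_logged`, `collect_diagnostics` and the log path are hypothetical; the mount point and mount options are the ones used above. `btrfs check` runs first because it refuses to work on a mounted filesystem.

```sh
# Hypothetical log location; override by setting LOG before calling.
LOG=${LOG:-/tmp/btrfs-diag.log}

# Run one command, appending a header, its stdout+stderr and its exit
# status to $LOG. Returns the command's own exit status.
run_logged() {
    printf '\n### %s\n' "$*" >>"$LOG"
    "$@" >>"$LOG" 2>&1
    status=$?
    printf '### exit status: %s\n' "$status" >>"$LOG"
    return "$status"
}

collect_diagnostics() {
    dev=${1:-/dev/mapper/cryptroot}
    # btrfs check needs the filesystem unmounted, so do it before mounting.
    run_logged btrfs check --readonly "$dev"
    run_logged mount -t btrfs -o ro,usebackuproot "$dev" /mnt/cryptroot
    # The transid errors are kernel messages, so grab the end of dmesg too.
    run_logged sh -c 'dmesg | tail -n 50'
}
```

Running `collect_diagnostics /dev/mapper/cryptroot` and attaching the log file to the reply would give the complete output in one place.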
> >
> > Attempting a dry-run of btrfs restore gave me these messages. The fact
> > that it can read some files and find my /home subvolume gives me some
> > hope.
> > > # btrfs restore -D /dev/mapper/cryptroot /mnt/restore
> > > This is a dry-run, no files are going to be restored
> > > We have looped trying to restore files in /@home/robbie/.cache/chromium/Default/Code Cache/js too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/chromium/Default/Cache too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 1/Cache too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 2/Code Cache/js too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 2/Cache too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/thumbnails/large too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.cache/mozilla/firefox/eedh8ma4.default-release/cache2/entries too many times to be making progress, stopping
> > > We have looped trying to restore files in /@home/robbie/.config/discord/Cache too many times to be making progress, stopping
> >
> > I'm going to go get myself a new external drive, reformat it as ext4
> > or something (what would be the best filesystem to use? They always
> > come out of the box as NTFS for Windows), and then try restoring my
> > filesystem to that. Maybe I can recover things before attempting a
> > `btrfs check --repair`. Worst case scenario then is that I have a few
> > corrupted files on a spare disk.
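A sketch of that plan, assuming ext4 as proposed and that the new drive's partition is passed in as a device path (`/dev/sdX1` in the comments is a placeholder, not a real device). `restore_to_new_disk` is a hypothetical name. `mkfs.ext4` erases the target, so double-check the device before running it.

```sh
# Hypothetical helper: format a new disk as ext4 and copy whatever
# btrfs restore can recover from the damaged filesystem onto it.
# WARNING: mkfs.ext4 destroys all data on the target device.
restore_to_new_disk() {
    src=$1           # damaged filesystem, e.g. /dev/mapper/cryptroot
    target_dev=$2    # partition on the new drive, e.g. /dev/sdX1 (placeholder)
    mnt=${3:-/mnt/restore}

    if [ ! -b "$src" ] || [ ! -b "$target_dev" ]; then
        echo "usage: restore_to_new_disk SRC_DEV TARGET_DEV [MOUNTPOINT]" >&2
        return 2
    fi

    mkfs.ext4 -L restore "$target_dev" || return 1
    mkdir -p "$mnt"
    mount "$target_dev" "$mnt" || return 1

    # Same as the dry run above, without -D:
    #   -i ignore errors and keep going   -v list restored files
    #   -x restore extended attributes    -m restore owner/mode/times
    #   -S restore symbolic links
    btrfs restore -i -v -x -m -S "$src" "$mnt"
}
```

Doing the restore before any `btrfs check --repair` keeps the order proposed above, so a failed repair cannot take the recovered files with it.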
> >
> > >
> > > >
> > > > I've also used a kernel between versions 5.2.0 and 5.2.15 on both my
> > > > machines, so does that mean there's a risk of undetected disk errors
> > > > on my desktop as well?
> > >
> > > It's possible.
> > >
> > > > I don't have backups of my backups, and all my
> > > > drives use BTRFS because I like the subvolume/snapshot features. I
> > > > also don't have a backup of my music/video library because I don't
> > > > have another 5 TB HDD.
> > >
> > > You can just run "btrfs check" from a live USB to check if the fs is OK.
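For the desktop drives, the read-only check suggested here can be run over several devices in one go. This is a sketch: `check_all` is a hypothetical name, the filesystems must be unmounted (hence booting from a live USB), and for encrypted drives the argument would be the unlocked device-mapper path, as with `/dev/mapper/cryptroot` above. A clean result only means `btrfs check` found nothing; it is not a guarantee.

```sh
# Hypothetical helper: run a read-only btrfs check on every device given.
# Returns 0 only if every check succeeded.
check_all() {
    failed=0
    for dev in "$@"; do
        echo "=== $dev"
        if btrfs check --readonly "$dev"; then
            echo "=== $dev: no errors reported"
        else
            # $? here is still the exit status of btrfs check
            echo "=== $dev: btrfs check failed (exit $?)"
            failed=1
        fi
    done
    return "$failed"
}
```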
> > >
> > > Thanks,
> > > Qu
> > >



