Re: BTRFS failure after resume from hibernate

I have a hunch as to why this issue occurred. I've had two btrfs
partition failures, and both times it was upon resuming from
hibernation. The key file for the encrypted swap was stored in
/root/key-file, and the openswap hook waits for the encrypted root to
be unlocked, mounts it, reads the key file for the swap partition, and
then unmounts it again. Could this be causing the transid to be
incremented somehow?

> /etc/initcpio/hooks/openswap
> run_hook ()
> {
>     ## Optional: To avoid race conditions
>     x=0;
>     while [ ! -b /dev/mapper/cryptroot ] && [ $x -le 10 ]; do
>        x=$((x+1))
>        sleep .2
>     done
>     ## End of optional
>
>     mkdir crypto_key_device
>     mount /dev/mapper/cryptroot crypto_key_device
>     cryptsetup open --key-file crypto_key_device/root/key-file /dev/disk/by-uuid/<UUID> swapDevice
>     umount crypto_key_device
> }

The very first line of the swsusp documentation[1] has a big fat
warning about touching data on the disk between suspend and resume, and
in hindsight I imagine this action may count. The openswap hook doesn't
write anything itself, but it still accesses the disk (although atime
is disabled in my mount options).

[1]https://www.kernel.org/doc/Documentation/power/swsusp.txt
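
If the hunch is right, one possible mitigation is to have the hook mount
the root filesystem read-only. The sketch below is a hypothetical,
untested variant of the hook above. It assumes the same btrfs root at
/dev/mapper/cryptroot and a kernel that accepts the btrfs `nologreplay`
mount option, which must be combined with `ro`. Without it, btrfs
replays its tree log even on a read-only mount, which modifies the
disk. This sketch does not establish that resuming is then safe.

```shell
# /etc/initcpio/hooks/openswap (hypothetical read-only variant, untested)
run_hook ()
{
    ## Optional: wait up to ~2 s for the unlocked root mapping to appear
    x=0
    while [ ! -b /dev/mapper/cryptroot ] && [ "$x" -le 10 ]; do
        x=$((x+1))
        sleep .2
    done

    mkdir -p crypto_key_device
    # ro plus nologreplay is meant to keep this mount from writing to disk;
    # without nologreplay, btrfs replays the tree log even when mounted ro
    mount -o ro,nologreplay /dev/mapper/cryptroot crypto_key_device
    cryptsetup open --key-file crypto_key_device/root/key-file /dev/disk/by-uuid/<UUID> swapDevice
    umount crypto_key_device
}
```

After changing the hook, regenerate the initramfs (for example with
`mkinitcpio -P`). Otherwise the change does not take effect.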

On Tue, 21 Jan 2020 at 14:51, Robbie Smith <zoqaeski@xxxxxxxxx> wrote:
>
> On Tue, 21 Jan 2020 at 14:05, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
> >
> >
> >
> > On 2020/1/21 10:58 AM, Robbie Smith wrote:
> > [...]
> > >>
> > >> Really hard to say, there are at least 3 things related to this problem.
> > >>
> > >> - Btrfs itself
> > >> - Hibernation
> > >> - Dm-crypt (less likely)
> > >>
> > >> For btrfs, if you have used a kernel between v5.2.0 and v5.2.15,
> > >> then it's possible the fs is already corrupted but not detected.
> > >>
> > >> As for hibernation, Linux is not the best platform for it in the
> > >> first place.
> > >> (My 6th-gen ThinkPad X1 Carbon has problems with hibernation, so I
> > >> rarely use suspend/hibernation.)
> > >>
> > >> Since Linux development is mostly server-oriented, everyday consumer
> > >> operations like this may not be as well tested.
> > >>
> > >> Things like Windows updating certain firmware could change the
> > >> controller's behavior and cause unexpected problems.
> > >>
> > >> So my personal recommendation is to avoid hibernation/suspend and to
> > >> use Windows as little as possible.
> > >>
> > >> Thanks,
> > >> Qu
> > >
> > > Suspend works flawlessly for me, and hibernation usually does as
> > > well. The one thing that has happened both times I've had a failure
> > > has been something weird with the power: first time was a static shock
> > > from walking on carpet and then touching the laptop, second time was
> > > the BIOS reporting a wattage error with the charger.
> >
> > This doesn't sound right for a ThinkPad T-series machine...
> >
> > >
> > > I tried mounting the FS from a live USB and the mount said: "can't
> > > read superblock on /dev/mapper/cryptroot" in addition to the transid
> > > failures. Should I try running a `btrfs check --repair`? At this point
> > > I'm pretty much resigned to reinstalling today, so I can't make things
> > > any worse, can I?
> >
> > Full output please.
>
> I can't get the output from that mount run as it's lost in the shell
> history. Attempting to mount now does nothing and just spits out:
> > # mount -t btrfs -o ro,usebackuproot /dev/mapper/cryptroot /mnt/cryptroot
> > [dmesg timestamp] BTRFS error (device dm-0): parent transid verify failed on 223452889088 wanted 144360 found 144376
> > [dmesg timestamp] BTRFS error (device dm-0): parent transid verify failed on 223452889088 wanted 144360 found 144376
>
> btrfs check prints the UUID, and that's it.
> > # btrfs check /dev/mapper/cryptroot
> > Opening filesystem to check...
> > Checking filesystem on /dev/mapper/cryptroot
> > UUID: 25ac1f63-5986-4eb8-920f-ed7a5354c076
>
> Attempting a dry-run of btrfs restore gave me these messages. The fact
> that it can read some files and find my /home subvolume gives me some
> hope.
> > # btrfs restore -D /dev/mapper/cryptroot /mnt/restore
> > This is a dry-run, no files are going to be restored
> > We have looped trying to restore files in /@home/robbie/.cache/chromium/Default/Code Cache/js too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/chromium/Default/Cache too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 1/Cache too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 2/Code Cache/js too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/chromium/Profile 2/Cache too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/thumbnails/large too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.cache/mozilla/firefox/eedh8ma4.default-release/cache2/entries too many times to be making progress, stopping
> > We have looped trying to restore files in /@home/robbie/.config/discord/Cache too many times to be making progress, stopping
>
> I'm going to go get myself a new external drive, reformat it as ext4
> or something (what would be the best filesystem to use? They always
> come out of the box as NTFS for Windows), and then try restoring my
> filesystem to that. Maybe I can recover things before attempting a
> `btrfs check --repair`. Worst case scenario then is that I have a few
> corrupted files on a spare disk.
>
> >
> > >
> > > I've also used kernels between 5.2.0 and 5.2.15 on both my
> > > machines, so does that mean there's a risk of undetected disk errors
> > > on my desktop as well?
> >
> > It's possible.
> >
> > > I don't have backups of my backups, and all my
> > > drives use BTRFS because I like the subvolume/snapshot features. I
> > > also don't have a backup of my music/video library because I don't
> > > have another 5 TB HDD.
> >
> > You can just run "btrfs check" from a live USB to check whether the fs is OK.
> >
> > Thanks,
> > Qu
> >



