Re: Unbootable root btrfs

On Fri, May 17, 2019 at 10:39 PM Robert White <rwhite@xxxxxxxxx> wrote:
>
> On 5/18/19 4:06 AM, Chris Murphy wrote:
> > On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@xxxxxxxxx> wrote:
> >>
> >> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things which I did. The data on this drive isn't important. I just wanted to see if it could be recovered before reinstalling.
> >>
> >> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel related. And it just didn't boot.
> >
> > OK the corrupted Btrfs volume is a guest file system?
>
> Was the reboot a reboot of the guest instance or the host? The reboot of
> the host can be indistinguishable from a crash to the guest file system
> images if shutdown is taking a long time. That meager fifteen-second gap
> between SIGTERM and SIGKILL can be a real VM killer even in an orderly
> shutdown. If you don't have a qemu shutdown script in your host
> environment then every orderly shutdown is a risk to any running VM.

Yep it's a good point.


>
> The question that comes to my mind is to ask what -blockdev and/or
> -drive parameters you are using? Some of the combinations of features
> and flags can, in the name of speed, "helpfully violate" the necessary
> I/O orderings that filesystems depend on.

Unsafe caching in particular. It does make for faster writes, though,
especially with NTFS and Btrfs in the VM guest.
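
As a rough reference, here is a small sketch of how the qemu -drive
cache= modes map onto the three underlying cache flags, as I understand
them from the qemu documentation. The helper names are made up for
illustration; only the mode names and flag names come from qemu itself.

```python
# qemu -drive cache=<mode> expressed as its three underlying flags.
# Tuple order: (cache.writeback, cache.direct, cache.no-flush)
CACHE_MODES = {
    "writeback":    (True,  False, False),
    "none":         (True,  True,  False),
    "writethrough": (False, False, False),
    "directsync":   (False, True,  False),
    "unsafe":       (True,  False, True),
}


def ignores_guest_flush(mode):
    """Return True if this cache mode drops flush requests from the guest."""
    _writeback, _direct, no_flush = CACHE_MODES[mode]
    return no_flush


def risky_modes():
    """Hypothetical helper: list the modes that do not honor guest flushes."""
    return sorted(m for m in CACHE_MODES if ignores_guest_flush(m))


if __name__ == "__main__":
    for mode in CACHE_MODES:
        verdict = "ignores flush" if ignores_guest_flush(mode) else "honors flush"
        print(f"{mode:13s} {verdict}")
```

Only cache=unsafe sets no-flush, so that is the one mode where a guest
fsync does not reach the host's storage.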


> So if the crash kills qemu before qemu has flushed and completed a
> guest-system-critical write to the host store you've suffered a
> corruption that has nothing to do with the filesystem code base.

For Btrfs, I think the worst case should be that you lose up to 30s of
writes. The super block should still point to a valid, completely
committed set of trees that point to valid data extents. But I have no
idea what the write ordering could be in a case like this: the guest
has written data > metadata > super, and the host, not honoring fsync
(some cache policies do ignore it), ends up writing out the new super
before it writes out the metadata. The host has no idea what these
writes from the guest are for. If the host then reboots before all the
metadata is written, you have a superblock that's pointing to a partial
metadata write, and that will show up as corruption.
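
To make that ordering argument concrete, here is a toy model, not
anything resembling real Btrfs or qemu code. It assumes a single guest
commit of data and metadata, a flush, then the superblock, and it
enumerates every order the host might persist those writes in and every
point at which the host might die. All names are invented for the
sketch.

```python
from itertools import permutations

FLUSH = "FLUSH"
# One guest commit: data and metadata, a flush barrier, then the superblock.
GUEST_COMMIT = ["data", "metadata", FLUSH, "super"]


def possible_orders(ops, honor_flush):
    """Yield every order in which the host might persist the guest writes."""
    if not honor_flush:
        # Flushes are dropped, so any ordering of the writes is possible.
        writes = [op for op in ops if op != FLUSH]
        for p in permutations(writes):
            yield list(p)
        return

    # Flushes are honored: writes may reorder only within a batch.
    batches, current = [], []
    for op in ops:
        if op == FLUSH:
            batches.append(current)
            current = []
        else:
            current.append(op)
    batches.append(current)

    def expand(i):
        if i == len(batches):
            yield []
            return
        for head in permutations(batches[i]):
            for tail in expand(i + 1):
                yield list(head) + tail

    yield from expand(0)


def consistent(on_disk):
    """A new super is valid only if its data and metadata are on disk too."""
    if "super" not in on_disk:
        return True  # the old super still points at the old, intact trees
    return {"data", "metadata"} <= set(on_disk)


def can_corrupt(honor_flush):
    """True if some persist order plus some crash point leaves a bad super."""
    for order in possible_orders(GUEST_COMMIT, honor_flush):
        for crash_after in range(len(order) + 1):
            if not consistent(order[:crash_after]):
                return True
    return False


if __name__ == "__main__":
    print("flush honored:", can_corrupt(True))
    print("flush ignored:", can_corrupt(False))
```

In this model, honoring the flush means a crash can only lose the newest
commit, while ignoring it allows a new super to land ahead of the
metadata it points to.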

What *should* still be true is that Btrfs can be made to fall back to a
previous root tree by using the mount option -o usebackuproot.
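
A minimal sketch of putting that mount command together, assuming a
first attempt should be read-only so nothing further gets written to
the damaged volume. The device and mount point below are placeholders,
not taken from this thread, and the function name is made up.

```python
import shlex


def backuproot_mount_cmd(device, mountpoint, read_only=True):
    """Build a mount command that asks Btrfs to try its backup tree roots."""
    opts = ["usebackuproot"]
    if read_only:
        # Assumption: mount read-only first to avoid touching the volume.
        opts.insert(0, "ro")
    return ["mount", "-t", "btrfs", "-o", ",".join(opts), device, mountpoint]


if __name__ == "__main__":
    # Placeholder device and mount point; substitute the real ones.
    print(shlex.join(backuproot_mount_cmd("/dev/vda2", "/mnt")))
```

This only prints the command; running it needs root and the real device
path.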



-- 
Chris Murphy



