On Fri, May 17, 2019 at 10:39 PM Robert White <rwhite@xxxxxxxxx> wrote:
>
> On 5/18/19 4:06 AM, Chris Murphy wrote:
> > On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@xxxxxxxxx> wrote:
> >>
> >> I didn't see that particular warning. I did see a warning that it
> >> could cause damage and should be tried after trying some other
> >> things, which I did. The data on this drive isn't important. I just
> >> wanted to see if it could be recovered before reinstalling.
> >>
> >> There was no crash, just a reboot. I was setting up KVM and I
> >> rebooted into a different kernel to see if some performance problems
> >> were kernel related. And it just didn't boot.
> >
> > OK, the corrupted Btrfs volume is a guest file system?
>
> Was the reboot a reboot of the guest instance or the host? The reboot
> of the host can be indistinguishable from a crash to the guest file
> system images if shutdown is taking a long time. That meager fifteen
> second gap between SIGTERM and SIGKILL can be a real VM killer even in
> an orderly shutdown. If you don't have a qemu shutdown script in your
> host environment then every orderly shutdown is a risk to any running
> VM.

Yep, it's a good point.

> The question that comes to my mind is to ask what -blockdev and/or
> -drive parameters you are using? Some of the combinations of features
> and flags can, in the name of speed, "helpfully violate" the necessary
> I/O orderings that filesystems depend on. In particular unsafe caching.

But it does make for faster writes, in particular for NTFS and Btrfs in
the VM guest.

> So if the crash kills qemu before qemu has flushed and completed a
> guest-system-critical write to the host store you've suffered a
> corruption that has nothing to do with the filesystem code base.

For Btrfs, I think the worst-case scenario should be that you lose up to
30 seconds of writes. The super block should still point to a valid,
completely committed set of trees that point to valid data extents.
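For reference, qemu's -drive cache= modes differ in exactly this respect: whether guest flush (fsync) requests are honored at the host. A sketch of the common settings (the image path and machine invocation are placeholders):

```shell
# Hypothetical invocations; guest.img is a placeholder image path.

# cache=writeback (the default): uses the host page cache but honors
# guest flush requests, so write ordering survives a qemu or host crash.
qemu-system-x86_64 -drive file=guest.img,format=raw,cache=writeback

# cache=none: bypasses the host page cache (O_DIRECT) and still honors
# guest flushes.
qemu-system-x86_64 -drive file=guest.img,format=raw,cache=none

# cache=unsafe: ignores guest flush requests entirely. Fast, but a host
# crash or hard reboot can leave the guest filesystem inconsistent --
# the "helpfully violated" ordering described above.
qemu-system-x86_64 -drive file=guest.img,format=raw,cache=unsafe
```

With cache=writeback or cache=none, a Btrfs guest should only lose the writes since its last commit; with cache=unsafe, the superblock can land on disk ahead of the trees it points to.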
But yeah, I have no idea what the write ordering could be if, say, the
guest has written data > metadata > super, and the host, not honoring
fsync (some cache policies do ignore it), ends up writing out a new
super before it writes out the metadata. Of course the host has no idea
what these writes from the guest are for. If the host then reboots
before all the metadata is written, you have a superblock pointing to a
partial metadata write, and that will show up as corruption. What
*should* still be true is that Btrfs can be made to fall back to a
previous root tree with the mount option -o usebackuproot.

--
Chris Murphy
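A minimal sketch of that fallback, under the assumption that the device path /dev/sdX1 and mount point /mnt are placeholders for the reader's own setup:

```shell
# Hypothetical recovery attempt; /dev/sdX1 and /mnt are placeholders.

# Inspect the superblock's recorded backup roots first; -f prints the
# full superblock, including the backup_roots entries:
btrfs inspect-internal dump-super -f /dev/sdX1

# Ask Btrfs to fall back to a usable backup root tree; mounting
# read-only keeps the attempt from writing anything while you verify
# the result:
mount -o ro,usebackuproot /dev/sdX1 /mnt
```

If the read-only mount succeeds and the data looks sane, the filesystem can then be remounted read-write or the data copied off.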
