On 5/18/19 4:06 AM, Chris Murphy wrote:
> On Fri, May 17, 2019 at 2:18 AM Lee Fleming <leeflemingster@xxxxxxxxx> wrote:
>> I didn't see that particular warning. I did see a warning that it could cause damage and should be tried after trying some other things, which I did. The data on this drive isn't important; I just wanted to see if it could be recovered before reinstalling.
>> There was no crash, just a reboot. I was setting up KVM and I rebooted into a different kernel to see if some performance problems were kernel-related. And it just didn't boot.
> OK, the corrupted Btrfs volume is a guest file system?
Was the reboot a reboot of the guest instance or the host? The reboot of
the host can be indistinguishable from a crash to the guest file system
images if shutdown is taking a long time. That meager fifteen-second gap
between SIGTERM and SIGKILL can be a real VM killer even in an orderly
shutdown. If you don't have a qemu shutdown script in your host
environment then every orderly shutdown is a risk to any running VM.
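As an illustration (mine, not from the original mail), a minimal host-side drain hook might look like the sketch below. It assumes a libvirt setup where `virsh` manages the guests; the function name `drain_guests` and the two-minute grace period are arbitrary choices:

```shell
#!/bin/sh
# Sketch of a host-side shutdown hook, assuming libvirt's virsh is how
# guests are managed. Ask each running guest for an orderly ACPI shutdown,
# then wait up to $1 seconds so the host's SIGTERM/SIGKILL countdown
# doesn't reach the qemu processes while guests are still writing.
drain_guests() {
    grace=${1:-120}
    for vm in $(virsh list --name); do
        virsh shutdown "$vm"    # ACPI power-button event, not a kill
    done
    while [ "$grace" -gt 0 ] && [ -n "$(virsh list --name)" ]; do
        sleep 2
        grace=$((grace - 2))
    done
}
```

Hooked into the host's shutdown sequence before qemu processes get signalled, something like this gives guests time to flush and unmount instead of racing the default grace period.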
The question that comes to my mind: what -blockdev and/or -drive
parameters are you using? Some of the combinations of features
and flags can, in the name of speed, "helpfully violate" the necessary
I/O orderings that filesystems depend on.
So if the crash kills qemu before it has flushed and completed a
guest-system-critical write to the host store, you've suffered
corruption that has nothing to do with the filesystem code base.
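For concreteness (my example, not from the original mail; `guest.img` is a placeholder path): among qemu's -drive cache modes, `cache=unsafe` drops guest flush requests entirely, while `cache=none` and `cache=writethrough` honor them, which is what lets a CoW filesystem like Btrfs keep its write ordering on the host image:

```shell
# Fast but dangerous: cache=unsafe ignores guest flush requests, so a
# host crash or an early SIGKILL can leave the image corrupted.
qemu-system-x86_64 -drive file=guest.img,format=qcow2,cache=unsafe

# Safer: cache=none (O_DIRECT on the host) still honors guest flushes,
# preserving the ordering that Btrfs's CoW metadata depends on.
qemu-system-x86_64 -drive file=guest.img,format=qcow2,cache=none
```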
So, for example, you shut down your host system. It sends SIGTERM to qemu.
The guest system sends SIGTERM to its processes. The guest is still
waiting out its nominal 15 seconds when the host evicts it from memory
with a SIGKILL, because its own 15-second timer started sooner.
(Fifteen seconds is the canonical time from my UNIX days; I don't know
what the real times are for every distribution.)
Loosening the write-caching behaviour can be just as deadly under some
conditions.
None of this may apply to the OP, but it's the thing I'd check before
digging too far.