On Tue, Jul 17, 2012 at 12:29:33AM +0300, Sami Liedes wrote:
> So, currently my idea is to boot the machine with a live USB stick,
> install kvm and make qemu qcow images backed by the real (2*1.1T)
> devices, but writing changes to the qcow images (I dare not mess with
> the actual devices, and don't happen to have quite 2.2T extra space
> outside of them...), and try to run scrub there. If that succeeds and
> the bug happens there too, debugging *should* be easier, and it
> *should* be possible to run it under KMEMCHECK too. If the bug doesn't
> happen inside a virtual machine, that would be interesting information
> too.
I have now been able to reproduce the bug in KVM with the setup
described above.
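
For reference, a minimal sketch of how copy-on-write overlays of this kind could be created with qemu-img. It is not necessarily the exact invocation used here: the device paths, image names and memory size are placeholders, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for qcow2 overlays
# whose backing files are real block devices. Guest writes go to the
# overlay; qemu only reads from the backing device.
# All device and image paths below are placeholders.

overlay_cmd() {
    # $1 = backing block device, $2 = overlay image to create
    printf 'qemu-img create -f qcow2 -b %s -F raw %s\n' "$1" "$2"
}

guest_cmd() {
    # $1 = root image, $2 and $3 = overlays for the two btrfs devices
    printf 'qemu-system-x86_64 -enable-kvm -smp 1 -m 2048'
    printf ' -drive file=%s,if=virtio' "$1" "$2" "$3"
    printf '\n'
}

overlay_cmd /dev/sdX1 part1.qcow2
overlay_cmd /dev/sdY1 part2.qcow2
guest_cmd root.img part1.qcow2 part2.qcow2
```

With virtio drives attached in this order, the guest would see them as /dev/vda, /dev/vdb and /dev/vdc. The host should not write to the backing devices while the overlays are in use.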
I think it's safe to say now that the bug depends on some kind of
interaction between btrfs and dm-crypt. With the following setup, the
bug does NOT happen:
* KVM, single CPU
* sees 3 disks: /dev/vda=root, /dev/vdb=btrfs-dev1, /dev/vdc=btrfs-dev2
* The btrfs devices are essentially snapshots of the real btrfs
  devices in raid-1 configuration (2*1.1T). As the real devices are
  encrypted, the decryption is done outside the KVM guest, i.e. the
  snapshots are backed by the decrypted devices.
With the following setup, the bug DOES happen:
* KVM, single CPU
* sees 3 disks: /dev/vda=root, /dev/vdb=part1, /dev/vdc=part2, where
  part[12] are LUKS containers containing the individual btrfs
  devices
* inside KVM, they are opened using
    cryptsetup luksOpen /dev/vdb root1
    cryptsetup luksOpen /dev/vdc root2
* after this, the filesystem is mounted with
    mount /dev/mapper/root1 /media -o device=/dev/mapper/root1,device=/dev/mapper/root2
* The devices are snapshots of the actual physical encrypted
  partitions containing the btrfs devices.
I have not yet figured out if this can be reproduced using a pristine,
smaller btrfs filesystem in raid-1 configuration inside KVM or if
there's something about my specific filesystem that causes this. I can
investigate that too; it's easier for me to do than the above testing,
as I don't need continuous physical access to the computer for that.
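
Such a smaller test could look roughly like the following inside the guest. This is only a sketch under assumptions: two small scratch virtual disks (here /dev/vdd and /dev/vde), and the mapper names test1/test2 are made up for illustration. The script prints the steps instead of running them, since luksFormat and mkfs.btrfs destroy whatever is on the target devices.

```shell
#!/bin/sh
# Dry-run sketch: print the steps for a small btrfs raid-1 on top of two
# LUKS containers, followed by a foreground scrub. Nothing is executed.
# Device names, mapper names and mount point are placeholders.

small_repro_plan() {
    dev1=$1; dev2=$2; mnt=$3
    cat <<EOF
cryptsetup luksFormat $dev1
cryptsetup luksFormat $dev2
cryptsetup luksOpen $dev1 test1
cryptsetup luksOpen $dev2 test2
mkfs.btrfs -m raid1 -d raid1 /dev/mapper/test1 /dev/mapper/test2
mount /dev/mapper/test1 $mnt -o device=/dev/mapper/test1,device=/dev/mapper/test2
# copy some data onto $mnt here so the scrub has something to read
btrfs scrub start -B $mnt
EOF
}

small_repro_plan /dev/vdd /dev/vde /media
```

Whether a freshly created filesystem of this size triggers the bug at all is exactly the open question.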
Here's the .config of the kernel I used inside KVM to reproduce this:
http://www.niksula.hut.fi/~sliedes/btrfs/config.3.4.4
I also ran the same tests with KMEMCHECK. Both with and without
crypto, there were quite a number of (of course possibly false)
warnings from btrfs code. I doubt any of them are related to this bug
- there were no KMEMCHECK warnings during the scrub operation. Here
are the logs, anyway:
http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.nocrypto.gz
http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.crypto.gz
Sami
