On 06/07/2019 02.38, Chris Murphy wrote:
> On Fri, Jul 5, 2019 at 6:05 PM Vladimir Panteleev
> <thecybershadow@xxxxxxxxx> wrote:
>> Unfortunately, as mentioned before, that wasn't an option. I was
>> performing this operation on a DM snapshot target backed by a file that
>> certainly could not fit the result of a RAID10-to-RAID1 rebalance.
> Then the total operation isn't possible. Maybe you could have made the
> volume a seed, then created a single-device sprout on a new single
> target, and later converted that sprout to raid1. But I'm not sure of
> the state of multiple-device seeds.
That's an interesting idea, thanks; I'll be sure to explore it if I run
into this situation again.
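For the archives, this is how I understand the suggested sequence - an
untested sketch, with placeholder names for the new devices, and
ignoring the open question about multiple-device seeds:

  # Untested sketch; /dev/sdX1 and /dev/sdY1 are placeholder names.
  btrfstune -S 1 /dev/sdd1             # flag the old filesystem as a seed
  mount /dev/sdd1 /mnt                 # seeds mount read-only
  btrfs device add /dev/sdX1 /mnt      # new single target becomes the sprout
  mount -o remount,rw /mnt             # writes now land on the sprout
  btrfs device delete /dev/sdd1 /mnt   # migrate the data off the seed
  # later, once a second new device is available:
  btrfs device add /dev/sdY1 /mnt
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt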
>>>> What I found surprising was that "btrfs device delete missing" deletes
>>>> exactly one device, instead of all missing devices. But that might
>>>> simply be because a device with RAID10 blocks should not have been
>>>> mountable rw with two missing drives in the first place.
>>> It's a really good question for the developers whether there is a good
>>> reason to permit rw mounting of a volume that's missing two or more
>>> devices for raid 1, 10, or 5, and missing three or more for raid6. I
>>> cannot think of a good reason to allow degraded,rw mounts for a raid10
>>> missing two devices.
>> Sorry, the code currently indeed does not permit mounting a RAID10
>> filesystem rw with more than one missing device. I needed to patch my
>> kernel to force it to allow it, as I was working on the assumption that
>> the two remaining drives contained a copy of all data (which turned out
>> to be true).
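For reference, as far as I can tell the check my patch had to loosen is
btrfs_check_rw_degradable():

  # in a 5.x kernel tree:
  grep -rn btrfs_check_rw_degradable fs/btrfs/
  # definition in fs/btrfs/volumes.c; callers in fs/btrfs/disk-io.c
  # (mount) and fs/btrfs/super.c (remount)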
> Wow, that's really interesting. So you did 'btrfs replace start' for
> one of the missing drives' devids, with a loop device as the
> replacement, and that worked and finished?!
Yes, that's right.
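For the archives, this is approximately what I ran (the backing-file
path is illustrative; devid 2 is the slot that /dev/loop0 took over):

  truncate -s 8T /tmp/replacement.img      # sparse file, >= the old device
  losetup -f --show /tmp/replacement.img   # attached as /dev/loop0
  btrfs replace start 2 /dev/loop0 /mnt    # 2 = devid of a missing drive
  btrfs replace status /mnt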
> Does this three-device volume mount rw and not degraded? I guess it
> must have, because 'btrfs fi us' worked on it.
>>   devid    1 size 7.28TiB used 2.71TiB path /dev/sdd1
>>   devid    2 size 7.28TiB used 22.01GiB path /dev/loop0
>>   devid    3 size 7.28TiB used 2.69TiB path /dev/sdf1
Indeed - with the loop device attached, I can mount the filesystem rw
just fine without any mount flags, with a stock kernel.
> OK so what happens now if you try to 'btrfs device remove /dev/loop0'?
Unfortunately it fails in the same way (warning followed by "kernel
BUG"). The same thing happens if I try to rebalance the metadata.
> Well, there's definitely something screwy if Btrfs needs something on a
> missing drive, which is indicated by its refusal to remove it from the
> volume, and yet at the same time it's possible to e.g. rsync every file
> to /dev/null without any errors. That's a bug somewhere.
As I understand it, btrfs doesn't actually "need" any data from that
device; it's just having trouble updating some metadata as it tries to
move one redundant copy of the data from there to somewhere else. It's
not refusing to remove the device either; rather, it tries and fails at
doing so.
> I'm not a developer, but a dev very well might need a simple
> reproducer for this in order to locate the problem. But the call trace
> might tell them what they need to know. I'm not sure.
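If it helps, here is an untested sketch of how I would try to reproduce
this at a small scale:

  # Assumes the same kernel patch that allows a degraded rw mount with
  # two raid10 members missing, and that devids 1-4 follow the mkfs
  # device order. Whether both copies of every chunk survive depends on
  # which two members are "failed".
  for i in 1 2 3 4 5; do truncate -s 2G /tmp/d$i.img; done
  for i in 1 2 3 4; do losetup /dev/loop$i /tmp/d$i.img; done
  mkfs.btrfs -d raid10 -m raid10 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
  mount /dev/loop1 /mnt
  cp -a /etc /mnt/
  umount /mnt
  losetup -d /dev/loop3; losetup -d /dev/loop4   # "fail" two drives
  mount -o degraded /dev/loop1 /mnt              # stock kernels refuse this
  losetup /dev/loop5 /tmp/d5.img
  btrfs replace start 3 /dev/loop5 /mnt          # take over missing devid 3
  btrfs device remove /dev/loop5 /mnt            # expected to hit the kernel BUG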
What I'm going to try next is to create another COW layer on top of the
three devices I have, attach them to a virtual machine, and boot that
(as it's not fun to reboot the physical machine each time the code
crashes). Then I can poke at the related kernel code to try to
understand the problem better. Roughly, the plan looks like this:
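  # Untested; sizes, paths and the qemu flags are illustrative.
  truncate -s 16G /tmp/sdd1.cow
  COW=$(losetup -f --show /tmp/sdd1.cow)
  SIZE=$(blockdev --getsz /dev/sdd1)                 # size in 512-byte sectors
  dmsetup create sdd1-cow \
      --table "0 $SIZE snapshot /dev/sdd1 $COW N 8"  # N = non-persistent COW
  # ...same for /dev/loop0 and /dev/sdf1, then hand the snapshots to the VM:
  qemu-system-x86_64 -m 4G \
      -drive file=/dev/mapper/sdd1-cow,format=raw \
      -drive file=/dev/mapper/loop0-cow,format=raw \
      -drive file=/dev/mapper/sdf1-cow,format=raw

The "N" (non-persistent) COW is discarded whenever the snapshot is torn
down, so every crashed experiment starts from a clean state.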
--
Best regards,
Vladimir