Re: "kernel BUG" and segmentation fault with "device delete"

On 06/07/2019 02.38, Chris Murphy wrote:
> On Fri, Jul 5, 2019 at 6:05 PM Vladimir Panteleev
> <thecybershadow@xxxxxxxxx> wrote:
>> Unfortunately, as mentioned before, that wasn't an option. I was
>> performing this operation on a DM snapshot target backed by a file
>> that certainly could not fit the result of a RAID10-to-RAID1
>> rebalance.
>
> Then the total operation isn't possible. Maybe you could have made the
> volume a seed, then created a single-device sprout on a new single
> target, and later converted that sprout to raid1. But I'm not sure of
> the state of multiple-device seeds.

That's an interesting idea, thanks; I'll be sure to explore it if I run into this situation again.
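
For the archives, I imagine the sequence would be roughly the following
(untested; /dev/sdX, /dev/sdY and /mnt are placeholders):

    # turn the existing volume into a seed device (it becomes read-only)
    btrfstune -S 1 /dev/sdX
    mount /dev/sdX /mnt
    # adding a new device to the mounted seed creates the sprout
    btrfs device add /dev/sdY /mnt
    mount -o remount,rw /mnt
    # once the sprout has enough devices, convert it to raid1
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

Whether that works at all when the seed is itself a degraded
multi-device volume is, I suppose, the open question.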

>> What I found surprising was that "btrfs device delete missing"
>> deletes exactly one device, instead of all missing devices. But that
>> might simply be because a device with RAID10 blocks should not have
>> been mountable rw with two missing drives in the first place.
>
> It's a really good question for the developers whether there is a good
> reason to permit rw mounting of a volume that's missing two or more
> devices for raid1, raid10, or raid5, and three or more for raid6. I
> cannot think of a good reason to allow degraded,rw mounts for a raid10
> missing two devices.

Sorry, the code indeed does not currently permit mounting a RAID10
filesystem rw with more than one missing device. I had to patch my
kernel to force it to allow that, as I was working on the assumption
that the two remaining drives contained a copy of all the data (which
turned out to be true).
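
To make that concrete (device paths as in my setup):

    # with two of the four RAID10 members absent, a stock kernel rejects
    # this, even with the degraded option:
    mount -o degraded /dev/sdd1 /mnt
    # it only succeeded after patching the kernel to skip the check on
    # the number of missing devices for rw mounts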

> Wow that's really interesting. So you did 'btrfs replace start' for
> one of the missing drives' devids, with a loop device as the
> replacement, and that worked and finished?!

Yes, that's right.
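
In outline, it was something like this (the image path and size are
from memory, so treat it as a sketch):

    # back the replacement with a large sparse file on another filesystem
    truncate -s 8T /scratch/missing-disk.img
    losetup /dev/loop0 /scratch/missing-disk.img
    # devid 2 is the missing device; replace it with the loop device
    btrfs replace start 2 /dev/loop0 /mnt
    btrfs replace status /mnt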

> Does this three-device volume mount rw and not degraded? I guess it
> must have because 'btrfs fi us' worked on it.
>
>          devid    1 size 7.28TiB used 2.71TiB path /dev/sdd1
>          devid    2 size 7.28TiB used 22.01GiB path /dev/loop0
>          devid    3 size 7.28TiB used 2.69TiB path /dev/sdf1

Indeed: with the loop device attached, I can mount the filesystem rw
just fine on a stock kernel, without any mount flags.

> OK, so what happens now if you try 'btrfs device remove /dev/loop0'?

Unfortunately, it fails in the same way (a warning followed by a
"kernel BUG"). The same thing happens if I try to rebalance the
metadata.
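
Concretely, both of these end in the same trace (the balance was
metadata-only; I don't remember whether I passed any filters beyond
that):

    # removing the loop device from the volume
    btrfs device remove /dev/loop0 /mnt
    # a metadata balance crashes the same way
    btrfs balance start -m /mnt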

> Well there's definitely something screwy if Btrfs needs something on a
> missing drive, which is indicated by its refusal to remove it from the
> volume, and yet at the same time it's possible to e.g. rsync every
> file to /dev/null without any errors. That's a bug somewhere.

As I understand it, it doesn't actually "need" any data from that
device; it's just having trouble updating some metadata as it tries to
move one redundant copy of the data from there to somewhere else. It's
not refusing to remove the device either; rather, it tries and fails to
do so.

> I'm not a developer, but a dev might very well need a simple
> reproducer for this in order to locate the problem. Then again, the
> call trace might tell them what they need to know. I'm not sure.

What I'm going to try next is to create another COW layer on top of the
three devices I have, attach them to a virtual machine, and boot that
(as it's not fun to reboot the physical machine each time the code
crashes). Then I can maybe poke at the related kernel code to
understand the problem better.
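
Probably something along these lines, using qcow2 overlays rather than
another dm-snapshot layer (file names and VM options are only an
illustration):

    # create COW overlays backed by the real block devices, so the VM
    # cannot modify them
    qemu-img create -f qcow2 -b /dev/sdd1  -F raw sdd1.qcow2
    qemu-img create -f qcow2 -b /dev/loop0 -F raw loop0.qcow2
    qemu-img create -f qcow2 -b /dev/sdf1  -F raw sdf1.qcow2
    # attach the overlays to a VM (kernel / boot media omitted here)
    qemu-system-x86_64 -enable-kvm -m 4G \
        -drive file=sdd1.qcow2,format=qcow2 \
        -drive file=loop0.qcow2,format=qcow2 \
        -drive file=sdf1.qcow2,format=qcow2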

--
Best regards,
 Vladimir


