Re: "kernel BUG" and segmentation fault with "device delete"

Hi Chris,

First, thank you very much for taking the time to reply. I greatly appreciate it.

On 05/07/2019 21.43, Chris Murphy wrote:
> There's no parity on either raid10 or raid1.

Right, thank you for the correction. Of course, I meant the duplicate copies of the RAID1 data.

> But I can't tell from the
> above exactly when each drive was disconnected. In this scenario you
> need to convert to raid1 first, wait for that to complete successfully
> before you can do a device remove. That's clear. Also clear is you
> must use 'btrfs device remove' and it must complete before that device
> is disconnected.

Unfortunately, as mentioned before, that wasn't an option. I was performing this operation on a DM snapshot target backed by a file that certainly could not fit the result of a RAID10-to-RAID1 rebalance.

> What I've never tried, but the man page implies, is you can specify
> two devices at one time for 'btrfs device remove' if the profile and
> the number of devices permits it.

What I found surprising was that "btrfs device delete missing" deletes exactly one device, instead of all missing devices. But that might simply be because a filesystem with RAID10 block groups should not have been mountable read-write with two missing drives in the first place.

>> After stubbing out btrfs_check_rw_degradable (because btrfs currently
>> can't realize when it has all drives needed for RAID10),

> Uhh? This implies it was still raid10 when you disconnected two drives
> of a four drive raid10. That's definitely data loss territory.
> However, your 'btrfs fi us' command suggests only raid1 chunks. What
> I'm suspicious of is this:

>> Data,RAID1: Size:2.66TiB, Used:2.66TiB
>>    /dev/sdd1   2.66TiB
>>    /dev/sdf1   2.66TiB

> All data block groups are only on sdf1 and sdd1.

>> Metadata,RAID1: Size:57.00GiB, Used:52.58GiB
>>    /dev/sdd1  57.00GiB
>>    /dev/sdf1  37.00GiB
>>    missing    20.00GiB

> There's metadata still on one of the missing devices. You need to
> physically reconnect this device. The device removal did not complete
> before this device was physically disconnected.

>> System,RAID1: Size:8.00MiB, Used:416.00KiB
>>    /dev/sdd1   8.00MiB
>>    missing     8.00MiB

> This is actually worse, potentially because it means there's only one
> copy of the system chunk on sdd1. It has not been replicated to sdf1,
> but is on the missing device.

I'm sorry, but that's not right. As I mentioned in my second email, if I use 'btrfs device replace', it successfully rebuilds all of the missing data. So there is no lost data with no remaining copies; btrfs is simply having trouble moving it off that device.

Here is the filesystem info with a loop device replacing the missing drive:

https://dump.thecybershadow.net/9a0c88c3720c55bcf7fee98630c2a8e1/00%3A02%3A17-upload.txt
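
For completeness, this is roughly how the rebuild onto a loop device was done. The device names, file size, and devid below are placeholders reconstructed from memory, not the exact commands I ran:

    # sparse file at least as large as the missing drive, attached as a loop device
    truncate -s 3T /tmp/replacement.img
    LOOP=$(losetup -f --show /tmp/replacement.img)

    # look up the devid reported as missing, then rebuild onto the loop device
    btrfs filesystem show /mnt/a
    btrfs replace start -B <missing-devid> "$LOOP" /mnt/a
    btrfs replace status /mnt/a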

> Depending on degraded operation for this task is the wrong strategy.
> You needed to 'btrfs device delete/remove' before physically
> disconnecting these drives.

> OK you definitely did this incorrectly if you're expecting to
> disconnect two devices at the same time, and then "btrfs device delete
> missing" instead of explicitly deleting drives by ID before you
> physically disconnect them.

I don't disagree in general; however, I did make sure that all data was accessible with only those two devices before proceeding with this endeavor.

> It sounds to me like you had a successful conversion from 4 disk
> raid10 to a 4 disk raid1. But then you're assuming there are
> sufficient copies of all data and metadata on each drive. That is not
> the case with Btrfs. The drives are not mirrored. The block groups are
> mirrored. Btrfs raid1 tolerates exactly 1 device loss. Not two.

Whether through dumb luck or simply due to the series of steps I happened to take, it doesn't look like I've lost any data so far. But thank you for the correction regarding how RAID1 works in btrfs; I had indeed been misunderstanding it.

> We need to see a list of commands issued in order, along with the
> physical connected state of each drive. I thought I understood what
> you did from the previous email, but this paragraph contradicts my
> understanding, especially when you say "correct approach would be
> first to convert all RAID 10 to RAID1 and then remove devices but that
> wasn't an option"

> OK so what did you do, in order, each command, interleaving the
> physical device removals.

Well, at this point I'm still quite confident that the btrfs kernel bug is unrelated to this entire RAID10 situation, but I'll list the steps if you like. Unfortunately I do not have an exact record of them, but I can do my best to reconstruct the sequence from memory.

The reason I'm doing this in the first place is that I'm trying to split a 4-drive RAID10 array that was getting full. The goal was to move some data off of it to a new array, then delete that data from its original location. I couldn't use rsync because most of the data was in snapshots, and I couldn't use btrfs send/receive because it fails with the old "chown oXXX-XXXXXXX-0 failed: No such file or directory" bug. So, my idea was:

1. Use device mapper to create a COW copy of all four devices, and operate on those (make the SATA devices read-only to ensure they're not touched); see the rough sketch after this list
2. Use btrfstune to change the UUID of the new filesystem
3. Delete 75%-ish of data off of the COW copy
4. Somehow convert the 4-disk RAID10 to 2-disk RAID1 without incurring a ton of writes to the COW copies
5. dd the contents of the COW copies to two new real disks
6. After ensuring the remaining data is safe on the new disks, delete it from the original array.
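
Roughly, step 1 looked like the following for each drive. The device names, paths, and COW file size here are illustrative, reconstructed from memory rather than copied from the actual script:

    # make the original read-only, then put a file-backed dm snapshot on top of it
    blockdev --setro /dev/sdd1
    truncate -s 64G /cow/sdd1.cow
    COW=$(losetup -f --show /cow/sdd1.cow)
    SECTORS=$(blockdev --getsz /dev/sdd1)
    dmsetup create cow-sdd1 --table "0 $SECTORS snapshot /dev/sdd1 $COW P 8"
    # ...repeated for the other three drives; all further work is done on /dev/mapper/cow-*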

For steps 2 and 3, I needed to specify the exact devices to work with. It's possible to specify the device list when mounting with -o device=, but for btrfstune, I had to bind-mount a fake partitions file over /proc/partitions. I can share the scripts I used for all this if you like.
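
In rough terms it was something like this (again with illustrative device names, not the exact scripts):

    # for step 2, hide the original partitions behind a trimmed-down /proc/partitions
    grep -vE ' sd[c-f]' /proc/partitions > /tmp/partitions
    mount --bind /tmp/partitions /proc/partitions
    btrfstune -u /dev/mapper/cow-sdc1    # generate a new filesystem UUID on the copies
    umount /proc/partitions

    # for step 3, mount only the copies by listing them explicitly
    mount -o device=/dev/mapper/cow-sdc1,device=/dev/mapper/cow-sdd1,device=/dev/mapper/cow-sde1,device=/dev/mapper/cow-sdf1 /dev/mapper/cow-sdc1 /mnt/a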

Anyway, step 4 is the one I was having trouble with. After a few failed approaches, what I did was the following (a rough command-level reconstruction follows after the list):

1. Find a pair of disks that held a copy of all data (there was no guarantee this would be possible, but it happened to be the case for me)
2. dd them to real disks
3. mount them as rw+degraded with a kernel patch
4. btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/a
5. sudo btrfs device delete missing /mnt/a
6. sudo btrfs device delete missing /mnt/a (again, since each invocation removes only one missing device)

and that's when the kernel bug happens.
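
Reconstructed as commands, with illustrative device names (the pair that happened to hold a full copy is shown as the first two cow devices), it was roughly:

    # 1-2: copy the chosen pair of COW devices onto two new real disks
    dd if=/dev/mapper/cow-sdc1 of=/dev/sdg1 bs=64M status=progress
    dd if=/dev/mapper/cow-sdd1 of=/dev/sdh1 bs=64M status=progress

    # 3: mount the pair read-write and degraded (needs the btrfs_check_rw_degradable patch)
    mount -o degraded,device=/dev/sdg1,device=/dev/sdh1 /dev/sdg1 /mnt/a

    # 4-6: convert to raid1, then drop the two missing devices one at a time
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/a
    btrfs device delete missing /mnt/a
    btrfs device delete missing /mnt/a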

> To put a really fine point on this, in my estimation the data on sdf
> and sdd are hanging by a thread. It's likely you have partial data
> loss because clearly some Btrfs metadata is missing and there are no
> other copies of it. For sure you do not want to try to repair the file
> system, or do anything that will cause any writes to happen. Any
> changes to the file system now will almost certainly make problems
> worse, and the chance of recovery will be reduced.

Thank you for the concern. As mentioned above, I'm pretty sure all of the data is safe and accessible. Also, I still have a copy of all of the filesystem's data.

Have you had a chance to look at the kernel stack trace yet? It looks like it's running out of temporary space while performing a relocation. I think that is what we should be concentrating on.

--
Best regards,
 Vladimir


