Re: timed out waiting for device dev-disk-by\x2duuid after disk failure on btrfs raid1

On Sat, Jan 4, 2020 at 3:46 PM Georg Großmann
<georg@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Dear btrfs community,
>
> I wanted to use a setup with Open Suse Tumbleweed together with a
> btrfs raid 1 on two disks in my virtual box. I want a system that can
> still boot if one of the disks fails so I installed a bootloader to each
> of the disks in /dev/sda1 and /dev/sdb1.
>
> I then used /dev/sda2 and /dev/sdb2 for the btrfs raid 1. After
> unplugging one disk, the boot process always fails with the message
> "timed out waiting for device dev-disk-by\x2duuid". I found a mailing
> list here
> https://lists.freedesktop.org/archives/systemd-devel/2014-May/019217.html
> which pretty well describes my problem. Unfortunately, I can't find an
> appropriate solution there. Since this mailing list is from 2014, has
> there been some progress in the meantime? Or is this the expected
> behaviour and the user has to help himself out manually?

It's the same situation.

Most distributions ship a udev rule that waits indefinitely for all
Btrfs member devices to appear. This is done because Btrfs doesn't
have automatic degraded mount. If a mount is attempted while any device
is missing, the mount fails, even when the device is missing only
because of a brief delay (somewhat common) rather than an actual
device failure. So instead, udev waits. Mount isn't even attempted.

Should the udev rule instead wait for 1-2 minutes, similar to the
dracut script for mdadm arrays? Even if it did, that only means the
mount is attempted after the wait, and it then fails because Btrfs
doesn't have automatic degraded mount. So what's the trouble with
deleting this udev rule and always using the degraded mount option,
either in fstab or as a kernel rootflags parameter? If any device is
even slightly late becoming available at mount time, you get a
degraded mount. And however briefly, the drives can be out of sync.
There is no automatic resync once all devices do become available, and
Btrfs has no concept of becoming "undegraded". All of this makes things
messy for the casual user, so the decision so far is to just wait
indefinitely, using this udev rule.
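
To make the trade-off concrete, here is a toy Python model of the three
policies discussed above. It is only a sketch: the function and its
parameters (next_action, timeout_s, allow_degraded) are hypothetical
names for illustration, not anything udev, systemd, or Btrfs provides.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    MOUNT = "mount"
    MOUNT_DEGRADED = "mount -o degraded"
    FAIL = "mount fails"


def next_action(present, expected, waited_s, timeout_s, allow_degraded):
    """Toy model of the policy choices (not real udev logic).

    timeout_s=None models the current rule: wait indefinitely.
    """
    if present >= expected:
        return Action.MOUNT            # all members showed up: normal mount
    if timeout_s is None or waited_s < timeout_s:
        return Action.WAIT             # keep waiting, mount is not attempted
    if allow_degraded:
        return Action.MOUNT_DEGRADED   # "degraded" set in fstab or rootflags
    return Action.FAIL                 # mount attempted, Btrfs refuses it


# Current behaviour: one of two devices missing, no timeout.
print(next_action(1, 2, waited_s=300, timeout_s=None, allow_degraded=False))
# Rule deleted, degraded always set: a merely slow device is left out.
print(next_action(1, 2, waited_s=0, timeout_s=0, allow_degraded=True))
```

The second call is the messy case: the model cannot tell a slow device
from a dead one, so it mounts degraded either way.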

And the open question is what should this look like in 5 or 10 years?
The btrfs on-disk format has enough information to figure out how to
do a partial resync to catch up a slow device, similar to the mdadm
write intent bitmap + resync. But does this need some enhancement so
it can be totally unattended? Like a partial scrub capability?
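
As a rough illustration of the "enough information" point, the sketch
below assumes each member's superblock records a generation
(transaction) number and that a device that missed writes carries an
older one. The helper name and the sample numbers are made up, and how
the generations are read from disk is deliberately left out.

```python
def stale_devices(generations):
    """Return devices whose generation is behind the newest one.

    `generations` maps a device path to the generation number recorded
    in its superblock. The result maps each stale device to how many
    generations it is behind, i.e. the range a partial resync would
    need to cover.
    """
    newest = max(generations.values())
    return {dev: newest - gen
            for dev, gen in generations.items() if gen < newest}


# Hypothetical values: sdb2 showed up late and missed five transactions.
print(stale_devices({"/dev/sda2": 1042, "/dev/sdb2": 1037}))
```

Detecting the stale device is the easy part. Deciding when to run the
catch-up without user involvement is the open question.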

There are more questions than answers so far, which is why it still
requires manual intervention.


-- 
Chris Murphy



