Hugo Mills posted on Tue, 20 May 2014 23:26:09 +0100 as excerpted:
> On Wed, May 21, 2014 at 12:00:24AM +0200, Goffredo Baroncelli wrote:
>> On 05/19/2014 02:54 AM, Chris Murphy wrote:
>>>
>>> It's insufficient to pass rootflags=degraded to get the system root
>>> to mount when a device is missing. It looks like when a device is
>>> missing, udev doesn't [...]
>>>
>>> This is the current udev rule:
>>>
>>> # cat /usr/lib/udev/rules.d/64-btrfs.rules
>>> # do not edit this file, it will be overwritten on update
>>>
>>> SUBSYSTEM!="block", GOTO="btrfs_end"
>>> ACTION=="remove", GOTO="btrfs_end"
>>> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>>>
>>> # let the kernel know about this btrfs filesystem, and check if it is
>>> # complete
>>> IMPORT{builtin}="btrfs ready $devnode"
>>>
>>> # mark the device as not ready to be used by the system
>>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>>
>>> LABEL="btrfs_end"
>>
>> The key is the line
>>
>> IMPORT{builtin}="btrfs ready $devnode"
>>
>> This line sets ID_BTRFS_READY=0 if the filesystem is not ready;
>> otherwise it sets ID_BTRFS_READY=1 [1].
>> The next line
>>
>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>
>> sets SYSTEMD_READY=0 if the filesystem is not ready, so the "plug" event
>> is not raised to systemd.
>>
>> This is my understanding.
Looks correct to me. =:^)
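
To make the gating concrete, here's a small Python model of how that rule evaluates for a single uevent. It's only an illustration, not how udev actually executes rules. apply_btrfs_rules is a hypothetical name, and the devices_ready flag stands in for whatever the "btrfs ready" builtin reports for the device.

```python
def apply_btrfs_rules(env, devices_ready):
    """Hypothetical model of 64-btrfs.rules for a single uevent.

    env: the uevent's properties (SUBSYSTEM, ACTION, ID_FS_TYPE, ...).
    devices_ready: stands in for the result of the "btrfs ready" builtin.
    Returns a new dict with the properties the rules would set.
    """
    env = dict(env)
    # The three GOTO="btrfs_end" guards: skip non-block, remove, non-btrfs.
    if env.get("SUBSYSTEM") != "block":
        return env
    if env.get("ACTION") == "remove":
        return env
    if env.get("ID_FS_TYPE") != "btrfs":
        return env
    # IMPORT{builtin}="btrfs ready $devnode"
    env["ID_BTRFS_READY"] = "1" if devices_ready else "0"
    # ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
    if env["ID_BTRFS_READY"] == "0":
        env["SYSTEMD_READY"] = "0"
    return env


event = {"SUBSYSTEM": "block", "ACTION": "add", "ID_FS_TYPE": "btrfs"}
print(apply_btrfs_rules(event, devices_ready=False))
print(apply_btrfs_rules(event, devices_ready=True))
```

With devices_ready=False the event carries SYSTEMD_READY=0. A degraded filesystem stays in that state, because the missing devices never turn up to flip it.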
>>> How this works with raid:
>>>
>>> RAID assembly is separate from filesystem mount. The volume UUID
>>> isn't available until the RAID is successfully assembled.
>>>
>>> On at least Fedora (dracut) systems with the system root on an md
>>> device, the initramfs contains 30-parse-md.sh [with a sleep loop and
>>> a timeout]
>>
>>> The approximate Btrfs equivalent down the road would be a similar
>>> initrd script, or maybe a user space daemon, that causes btrfs device
>>> ready to confirm/deny that all devices are present. After x failures,
>>> it would issue an equivalent of mdadm -R, which right now
>>> we don't seem to have.
>>
>> I suggest implementing a mount.btrfs command which waits for all the
>> needed disks until a timeout expires. After this timeout it could try a
>> "degraded" mount until a second timeout expires. Only then does it fail.
>>
>> Each time a device appears, the system may start mount.btrfs. Each
>> invocation has to check whether another instance of mount.btrfs for the
>> same filesystem is already running; if so it exits, otherwise it follows
>> the above behavior.
>
> Don't we already have something approaching this functionality with
> btrfs device ready? (i.e. this is exactly what it was designed for).
Well, sort of.

The udev rule quoted above already relies on it, via udev's "btrfs ready"
builtin, which makes the same check as btrfs device ready. And in the
non-degraded case it works as intended, checking whether the filesystem is
complete and only letting the udev plug event complete once all devices
are available.
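
From a script, the same check can be made by running btrfs device ready and looking at its exit status: 0 when every member device has been registered with the kernel, non-zero otherwise. This is a minimal sketch that assumes btrfs-progs is installed. fs_complete is a hypothetical helper, and the run parameter exists only so the logic can be exercised without real hardware.

```python
import subprocess


def fs_complete(device, run=subprocess.run):
    """Return True if every device of the btrfs filesystem on `device`
    has been registered with the kernel (hypothetical helper).

    Relies on `btrfs device ready DEVICE` exiting with status 0 when the
    filesystem is complete and non-zero otherwise.
    """
    result = run(["btrfs", "device", "ready", device],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, fs_complete("/dev/sdb2") returns False while any member of that filesystem is still missing. The device path is only a placeholder.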

But this thread is about a degraded-state mount, with devices missing.
In that case the missing devices never appear, so the plug event never
happens and systemd never mounts the filesystem, even though degraded was
explicitly passed as an option, indicating that the admin wants the mount
to happen anyway.

With dracut[1] (on Gentoo), the result is an eventual timeout waiting for
the rootfs to appear and a kick to the initr* rescue shell prompt, where an
admin can mount manually with the degraded option and continue from there.

I'd actually argue that's functioning as it should, since I see forced
manual intervention in order to mount degraded as a FEATURE, NOT A BUG.

Nevertheless, being able to effectively pass degraded, either as part of
rootflags or in the fstab that dracut (and systemd in dracut) uses, so
that a degraded mount could still be automated, could I suppose be seen
as a feature by some.

Doing that would require a script with a countdown and timeout. It would
first wait for the filesystem to become ready undegraded (and thus mount).
Then, if not all devices appear in time, it would bypass the ready test and
plug the device anyway, letting mount try it if the degraded option was
passed. Only if THAT fails would it fall back to the emergency shell prompt.

Note that such a script wouldn't actually have to check for degraded in
the mount options. It would only have to fall back to plugging without all
devices once the timeout triggered, since mount would then take care of
success/failure on its own based on whether the degraded option was
passed, just as it does when a mount is attempted on an incomplete btrfs
at any other time.
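
A rough sketch of that logic follows, written in Python for readability rather than the shell a real dracut hook would use. Everything in it is hypothetical. wait_for_complete and mount_root are illustrative names, not dracut or systemd interfaces. The 30-second default is an arbitrary assumption, and "plug the device and let systemd mount it" is collapsed into a single try_mount callable. As noted above, it never looks at the mount options. Whether the post-timeout attempt succeeds is left to mount and the degraded option.

```python
import time


def wait_for_complete(is_complete, timeout, poll=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_complete() until it returns True or `timeout` seconds pass.

    Returns True if the filesystem became complete in time, else False.
    """
    deadline = clock() + timeout
    while True:
        if is_complete():
            return True
        if clock() >= deadline:
            return False
        sleep(poll)


def mount_root(is_complete, try_mount, emergency_shell, timeout=30.0,
               **wait_kw):
    """Hypothetical countdown script: wait, then try the mount anyway.

    is_complete:     e.g. a `btrfs device ready` check (assumption).
    try_mount:       mounts with whatever options the admin configured
                     (rootflags/fstab) and returns True on success.
    emergency_shell: last-resort fallback when the mount fails.
    The mount options are never inspected here.
    """
    # Phase 1: wait for an undegraded, complete filesystem.
    complete = wait_for_complete(is_complete, timeout, **wait_kw)
    # Phase 2: attempt the mount either way. With devices still missing,
    # the kernel normally refuses it unless "degraded" was passed.
    if try_mount():
        return "mounted" if complete else "mounted-after-timeout"
    # Only if THAT fails: drop to the rescue shell.
    emergency_shell()
    return "emergency-shell"
```

is_complete could wrap a btrfs device ready check. The clock and sleep parameters are there only so the timeout can be exercised with a fake clock.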
---
[1] dracut: I use it here on Gentoo as well, because my rootfs is a
multi-device btrfs and a kernel rootflags=device= line won't parse
correctly, apparently due to splitting at the wrong =, so I must use an
initr* despite my preference for a direct initr*-less boot, and I use
dracut to generate it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html