On 2017-05-03 14:12, Andrei Borzenkov wrote:
03.05.2017 14:26, Austin S. Hemmelgarn wrote:
On 2017-05-02 15:50, Goffredo Baroncelli wrote:
On 2017-05-02 20:49, Adam Borowski wrote:
It could be some daemon that waits for btrfs to become complete. Do we
have something?
Such a daemon would also have to read the chunk tree.
I don't think that a daemon is necessary. As proof of concept, in the
past I developed a mount helper [1] which handled the mount of a btrfs
filesystem:
this handler first checks whether the filesystem is a multi-device one;
if so, it waits until all the devices have appeared, and finally mounts
the filesystem.
It's not so simple -- such a btrfs device would have THREE states:
1. not mountable yet (multi-device with not enough disks present)
2. mountable ro / rw-degraded
3. healthy
My mount.btrfs could be "programmed" to wait for a timeout and then mount
the filesystem as degraded if not all devices are present. This is a
very simple strategy, but it could be expanded.
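
For illustration, the timeout strategy described above could look roughly
like the following Python sketch. This is not the code of the mount helper
[1]; the function names (device_ready, choose_mount_options) and the
30-second default are assumptions made up for this example. Only
'btrfs device ready' and the 'degraded' mount option are real.

```python
import subprocess
import time


def device_ready(device):
    """Return True if `btrfs device ready` exits 0 for this device."""
    # Exit status 0 means the kernel has seen every member device.
    result = subprocess.run(["btrfs", "device", "ready", device])
    return result.returncode == 0


def choose_mount_options(device, timeout=30.0, interval=1.0,
                         ready=device_ready, clock=time.monotonic,
                         sleep=time.sleep):
    """Wait up to `timeout` seconds for all devices, else fall back.

    Returns [] for a normal mount, or ["degraded"] if the timeout
    expired before all devices showed up. `ready`, `clock` and `sleep`
    are injectable (hypothetical hooks) so the logic can be exercised
    without real block devices.
    """
    deadline = clock() + timeout
    while True:
        if ready(device):
            return []              # all devices present: mount normally
        if clock() >= deadline:
            return ["degraded"]    # timed out: request a degraded mount
        sleep(interval)
```

The returned list would then be joined with commas and passed to
'mount -o'. Whether falling back to degraded is acceptable at all is a
policy decision that this sketch leaves to the caller.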
I am inclined to think that the current approach doesn't fit the btrfs
requirements well. The roles and responsibilities are spread across too
many layers (udev, systemd, mount)... I hoped that my helper could be
adopted in order to concentrate all the responsibility in a single
binary; this would reduce the number of interfaces with the other
subsystems (e.g. systemd, udev).
The primary problem is that systemd treats BTRFS like a block-layer
instead of a filesystem (so it assumes all devices need to be present),
and that it doesn't trust the kernel's mount function to work correctly.
My understanding is that before a kernel mount can succeed for
multi-device btrfs, the kernel must be made aware of the devices that
comprise this filesystem. This is done by using (the equivalent of)
"btrfs device scan" or "btrfs device ready". Am I wrong here?
That is correct, the kernel needs to be notified about the devices via
'btrfs device scan' (or directly with the ioctl that it calls). Udev calls
this automatically on newly connected block devices though, so currently
there is no reason to run it manually on most systems. Ideally, this
should be in a mount helper and possibly triggered by 'btrfs filesystem
show'. Unless you're mounting a BTRFS volume or listing what the kernel
knows about, there is no reason the kernel needs to be tracking the FS,
so there is no point in regularly wasting time in udev processing by
scanning all newly connected devices.
As far as 'btrfs device ready', that only tells you if the kernel thinks
the filesystem is mountable _and_ not degraded. It's usually correct,
but watching that has the usual TOCTOU races present in any kind of
status checking system, and it's useless if you want to mount degraded.
As a result, it assumes that the mount operation will fail if it
doesn't see all the devices instead of just trying it like it should.
So do you suggest that mount will succeed even if the kernel is not made
aware of all devices? If not, could you elaborate on how btrfs should be
mounted on boot - we must give the mount command some device, right? How
should we choose this device?
See my above comment on kernel awareness.
If you have 'degraded' in the mount options, the mount can succeed even
if not all the devices are present. Systemd refuses to even try the
mount if it doesn't see all the devices, and then *unmounts* the FS if
it gets mounted manually and not all devices are present. Both of these
are undesired behaviors for many people (the second more than the first).
I think I've outlined my thoughts on all of this somewhere before, but I
can't find them, so I might as well do so here:
1. Device scanning should be done by a mount helper, not udev. This
closes a serious data safety/security issue present in the current
combined implementation (if you plug in a device that has the same UUID
as an existing BTRFS volume on the system and both volumes are marked as
multi-device, you can cause data loss in the existing volume), allows
for more concise tracking of devices, and also eliminates the need for
system-wide scanning in some cases (if you use 'device=' mount options
that cover all the devices in the filesystem). It also saves some time
in processing of uevents for hot-plugged devices.
2. Systemd should not default to unmounting filesystems it thinks aren't
ready yet when they've been manually mounted. This behavior is highly
counter-intuitive for most users ('The mount command didn't complain and
returned 0 and dmesg has no errors, why the hell is the filesystem I
just mounted not mounted?'), and more importantly in this context, makes
it impossible to manually repair a BTRFS filesystem that's listed in a
mount unit without dropping to emergency mode, which largely defeats the
purpose of using a multi-device filesystem that can be repaired online.
3. For BTRFS, and possibly under special circumstances with other
filesystems (partially present ZFS pool, partially assembled LVM or MD
array that can run degraded, etc), systemd should try to mount the FS
when it times out waiting for devices, and there should be an option to
control this behavior. While I don't advocate mounting filesystems
degraded then letting the system run, some people do, and I still expect
it to work, but currently it does not when using systemd.
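
To make the 'device=' case from point 1 concrete, here is a small Python
sketch that builds a mount command naming every member device
explicitly, so that no system-wide scan is needed. The helper name
build_mount_command and the device paths in the usage note are
hypothetical; 'device=' itself is the existing btrfs mount option.

```python
def build_mount_command(devices, mountpoint, extra_options=()):
    """Build a mount argv that lists all member devices explicitly.

    The first device is passed as the mount source; every other member
    is passed as a 'device=' option so the kernel learns about it at
    mount time.
    """
    if not devices:
        raise ValueError("at least one device is required")
    primary, *others = devices
    options = [f"device={dev}" for dev in others] + list(extra_options)
    cmd = ["mount", "-t", "btrfs"]
    if options:
        cmd += ["-o", ",".join(options)]
    return cmd + [primary, mountpoint]
```

For example, build_mount_command(["/dev/sda", "/dev/sdb"], "/mnt",
["degraded"]) yields the argument list for
'mount -t btrfs -o device=/dev/sdb,degraded /dev/sda /mnt'.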
Alternatively, it could do a polling loop with a delay to call mount
instead of using 'btrfs device ready'.
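
A rough Python sketch of such a polling loop is below. It simply retries
the mount itself rather than consulting 'btrfs device ready' first, which
avoids the check-then-act race. The names mount_with_retry and run_mount,
and the default of 10 attempts one second apart, are assumptions for
illustration only; this is not how systemd is implemented.

```python
import subprocess
import time


def run_mount(device, mountpoint):
    """Attempt the mount once; True if the mount command exits 0."""
    cmd = ["mount", "-t", "btrfs", device, mountpoint]
    return subprocess.run(cmd).returncode == 0


def mount_with_retry(device, mountpoint, attempts=10, delay=1.0,
                     try_mount=run_mount, sleep=time.sleep):
    """Retry the mount with a fixed delay between attempts.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. `try_mount` and `sleep` are injectable so
    the loop can be tested without touching real devices.
    """
    for attempt in range(1, attempts + 1):
        if try_mount(device, mountpoint):
            return attempt
        if attempt < attempts:
            sleep(delay)   # give late devices time to appear
    return None
```

On a None result the caller could retry with 'degraded' added to the
mount options, or give up, depending on policy.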