[PATCH] fix a race condition on udev worker shutdown

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi there,

A race condition in udev has turned up in Ubuntu's initramfs, causing
60-second delays in the boot sequence because udevd loses track of one of
its workers: it's waiting for a pending event to be processed, but the
worker it was dispatched to has already exited because the event and signal
were received by the worker at the same time and the signal took precedence.

There are a few ways we could deal with this.

 - Let udevd clear its list harder when a worker exits, since if the process
   hasn't already sent back its information it's not going to do so from
   beyond the grave and we might as well clean up the references to it.
   This seems to be safe because fd_worker is processed before fd_signal in
   the main event loop, so any last words will be duly recorded.

 - Reorder the event loop so that signals and the control socket are
   processed before the event queue.  This way, udev would never dispatch a
   new event to a worker after it's been told to exit, which is what happens
   now.  However, there may be reasons for the current order that I don't
   see offhand; and anyway, I don't think the reordering is a guarantee that
   the event and signal wouldn't arrive at the same time, it just makes it
   orders of magnitude less likely.

 - When a worker sees that an event has arrived, process it immediately.
   This ensures everything udevd has delegated is finished before the worker
   handles the signal (i.e., exits).  This means the worker will take longer
   to finish up in some cases, but probably not 60 seconds longer...

The below patch implements option three.  Maybe one of the other options
should also be implemented, but in my testing #3 seems to be sufficient in
its own right to solve the problem.

An implicit assumption in this patch is that, when an event and signal
arrive together, the event is returned *first* from epoll_wait().  This has
been the case in my tests, but I haven't spotted any reason that it's
guaranteed to always be the case.

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@xxxxxxxxxx                                     vorlon@xxxxxxxxxx


From 84ead317ea041efa066b5134d5df0fd6fc075aa9 Mon Sep 17 00:00:00 2001
From: Steve Langasek <steve.langasek@xxxxxxxxxxxxx>
Date: Sat, 8 Oct 2011 01:34:32 -0700
Subject: [PATCH] Process events before signals in worker

When a worker receives both a signal and a udev event in the same epoll_wait
run, the event must be processed first because the udev parent considers the
event already dispatched.  If we process the signal first and exit, udevd
times out after 60 seconds waiting for a response from an already-dead
worker.

Ref: https://bugs.launchpad.net/bugs/818177

Signed-off-by: Steve Langasek <steve.langasek@xxxxxxxxxxxxx>
---
 udev/udevd.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/udev/udevd.c b/udev/udevd.c
index 77aec9d..b65b53f 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -347,6 +347,7 @@ static void worker_new(struct event *event)
 				for (i = 0; i < fdcount; i++) {
 					if (ev[i].data.fd == fd_monitor && ev[i].events & EPOLLIN) {
 						dev = udev_monitor_receive_device(worker_monitor);
+						break;
 					} else if (ev[i].data.fd == fd_signal && ev[i].events & EPOLLIN) {
 						struct signalfd_siginfo fdsi;
 						ssize_t size;
-- 
1.7.5.4

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Kernel]     [Linux DVB]     [Asterisk Internet PBX]     [DCCP]     [Netdev]     [X.org]     [Util Linux NG]     [Fedora Women]     [ALSA Devel]     [Linux USB]

  Powered by Linux