On Tue, May 29, 2012 at 09:15:00AM +0800, Asias He wrote:
> After hot-unplugging a stressed disk, I found that rl->wait[] is not empty
> while rl->count[] is empty and there are threads still sleeping on
> get_request after the queue cleanup. With simple debug code, I found
> there are exactly nr_sleep - nr_wakeup threads in D state. So there
> are missed wakeups.
>   $ dmesg | grep nr_sleep
>   [   52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
>   $ vmstat 1
>   1 173  0 712640  24292  96172 0 0  0  0  419  757  0  0  0 100  0
> To quote Tejun:
>   Ah, okay, freed_request() wakes up single waiter with the assumption
>   that after the wakeup there will at least be one successful allocation
>   which in turn will continue the wakeup chain until the wait list is
>   empty - ie. waiter wakeup is dependent on successful request
>   allocation happening after each wakeup.  With queue marked dead, any
>   woken up waiter fails the allocation path, so the wakeup chaining is
>   lost and we're left with hung waiters. What we need is wake_up_all()
>   after drain completion.
> This patch fixes the missed wakeup by waking up all the threads which
> are sleeping on the wait queue after queue drain.
> Changes in v2: Drop waitqueue_active() optimization
> Signed-off-by: Asias He <asias@xxxxxxxxxx>

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Jens, this one wants Cc: stable.
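
For anyone following the thread, the lost wakeup chain quoted above can be
sketched as a small user-space model. This is only an illustration under
stated assumptions: struct model_queue, woken_waiter_runs(), wake_chain() and
wake_all() are hypothetical names, not the block layer code and not what the
patch touches. The model assumes each woken waiter either allocates (which
later leads to the next wakeup) or fails because the queue is dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_queue {
	int waiters;	/* tasks sleeping on the wait list */
	bool dead;	/* queue has been marked dead */
};

/*
 * A woken waiter leaves the wait list and retries its allocation.
 * On a live queue the allocation succeeds, and freeing that request
 * later wakes the next waiter, so the chain continues (returns true).
 * On a dead queue the allocation fails and nobody wakes the next one.
 */
static bool woken_waiter_runs(struct model_queue *q)
{
	assert(q->waiters > 0);
	q->waiters--;
	return !q->dead;
}

/* Wake one waiter at a time, relying on each success to wake the next. */
static int wake_chain(struct model_queue *q)
{
	int woken = 0;

	while (q->waiters > 0) {
		woken++;
		if (!woken_waiter_runs(q))
			break;	/* chain lost: remaining waiters stay asleep */
	}
	return woken;
}

/* Wake every waiter unconditionally, as done after the drain completes. */
static int wake_all(struct model_queue *q)
{
	int woken = q->waiters;

	q->waiters = 0;
	return woken;
}
```

In this model, starting with 173 sleepers on a dead queue, wake_chain() wakes
one task and leaves 172 asleep, while wake_all() empties the wait list
regardless of whether any allocation succeeds.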


