On 07/10/2010 12:41 PM, Mike Galbraith wrote:
On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:

The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates a scenario where a task can wake up without knowing it has been enqueued on an rtmutex. In order to detect this, the task would have to be able to take task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, without already holding one of these, the pi_blocked_on variable can change from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed to take a sleeping lock after wakeup, or it could end up trying to block on two locks, the second overwriting a valid pi_blocked_on value. This obviously breaks the pi mechanism.

copy/paste offline query/reply at Darren's request..

On Sat, 2010-07-10 at 10:26 -0700, Darren Hart wrote:
On 07/09/2010 09:32 PM, Mike Galbraith wrote:
On Fri, 2010-07-09 at 13:05 -0700, Darren Hart wrote:

The core of the problem is that the proxy_lock blocks a task on a lock the task knows nothing about. So when it wakes up inside of futex_wait_requeue_pi, it immediately tries to block on hb->lock to check why it woke up. This has the potential to block the task on two locks (thus overwriting the pi_blocked_on). Any attempt at preventing this involves a lock, and ultimately the hb->lock. The only solution I see is to make the hb->locks raw locks (thanks to Steven Rostedt for the original idea and for batting this around with me in IRC).

Hm, so wakee _was_ munging his own state after all. Out of curiosity, what's wrong with holding his pi_lock across the wakeup? He can _try_ to block, but can't until pi state is stable. I presume there's a big fat gotcha that's just not obvious to a futex locking newbie :)
Nor to some of us that have been engrossed in futexes for the last couple years! I discussed the pi_lock across the wakeup issue with Thomas. While this fixes the problem for this particular failure case, it doesn't protect against:
<tglx> assume the following:
<tglx> t1 is on the condvar
<tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
<tglx> t3 takes hb->lock for a futex in the same bucket
<tglx> t2 wakes due to signal/timeout
<tglx> t2 blocks on hb->lock

You are likely to have not hit the above scenario because you only had one condvar, so the hash_buckets were not heavily shared and you weren't likely to hit:
<tglx> t3 takes hb->lock for a futex in the same bucket

I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it through some tests. I'd still prefer a way to detect early wakeup without having to grab the hb->lock, but I haven't found it yet.
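
For readers following along, here is a rough userspace model of why spinning on a trylock sidesteps the overwrite, while blocking on a sleeping hb->lock does not. It is only a sketch: every model_* name is hypothetical, the "sleeping lock" path is reduced to the single act of recording the lock in pi_blocked_on, and hb->lock is simply assumed to be contended in that path. None of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct model_lock {
    atomic_flag held;
};

struct model_task {
    /* A single slot: a task is expected to block on at most one lock. */
    struct model_lock *pi_blocked_on;
};

/* Proxy locking: another task enqueues t on l without t's knowledge. */
static void model_proxy_enqueue(struct model_task *t, struct model_lock *l)
{
    assert(t->pi_blocked_on == NULL);
    t->pi_blocked_on = l;
}

/*
 * Slow path of a sleeping lock, reduced to its bookkeeping: record the
 * lock the task is about to block on. Returns false if this overwrote
 * an entry that was already there.
 */
static bool model_block_on(struct model_task *t, struct model_lock *l)
{
    bool had_entry = (t->pi_blocked_on != NULL);

    t->pi_blocked_on = l;
    return !had_entry;
}

/* Busy-wait acquire: never reads or writes pi_blocked_on. */
static void model_spin_acquire(struct model_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* the kernel patch calls cpu_relax() at this point */
}

static void model_release(struct model_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Model of the early-wakeup path. Returns true if the pi_blocked_on
 * value set by the proxy enqueue is still intact afterwards.
 */
static bool model_wakeup(bool use_trylock_spin)
{
    struct model_lock rtmutex = { ATOMIC_FLAG_INIT };
    struct model_lock hb_lock = { ATOMIC_FLAG_INIT };
    struct model_task t = { NULL };

    model_proxy_enqueue(&t, &rtmutex);

    if (use_trylock_spin) {
        model_spin_acquire(&hb_lock);
        model_release(&hb_lock);
    } else {
        /* Assumed contended: the task takes the blocking slow path. */
        (void)model_block_on(&t, &hb_lock);
    }

    return t.pi_blocked_on == &rtmutex;
}
```

In this model, model_wakeup(true) leaves pi_blocked_on pointing at the rtmutex, while model_wakeup(false) replaces it with the hb lock. The cost of the spinning variant is that the waiter burns CPU for as long as the lock is held.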
+       while(!spin_trylock(&hb->lock))
+               cpu_relax();
        ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
        spin_unlock(&hb->lock);

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team