On Fri, Apr 27, 2012 at 10:26:27AM +0800, Daniel J Blueman wrote:
> In 3.4-rc4, I've come across worker list corruption while scrubbing,
> leading to (in two separate cases) warning [1] and crashing [2]. The
> connection with scrubbing is likely the increased rate of worker
> threads starting and stopping.
>
> In btrfs_stop_workers, access to worker->worker_list is done without
> holding worker->lock (it is in all other callsites). We can't take
> worker->lock there due to lock inversion deadlock (as it is the outer
> lock), and if we drop the workers->lock to acquire worker->lock and
> then workers->lock, we can't guarantee worker is still valid.
>
> It feels like a global workers list pointer should be used and its
> lock should be the outer one to avoid this scenario, or maybe I'm
> missing something?
>

I think you are missing something. As I read it, we're always holding
workers->lock when we touch the worker_list, so we should be safe. I
wonder what could be going on here...

Josef
