On Wed, Jul 03, 2019 at 11:12:10PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 03, 2019 at 09:54:06AM -0400, Josef Bacik wrote:
> > Hello,
> >
> > I've been seeing a variation of the following splat recently and I have no
> > earthly idea what it's trying to tell me.
>
> That you have a lock cycle; aka. deadlock, obviously :-)
>
> > I either get this one, or I get one
> > that tells me the same thing except it's complaining about &cpuctx_mutex instead
> > of sb_pagefaults.
>
> Timing on where the cycle closes, most likely.
>
> > There is no place we take the reloc_mutex and then do the
> > pagefaults stuff,
>
> That's not needed. What the below tells us is that:
>
>   btrfs->bio/mq->cpuhotplug->perf->mmap_sem->btrfs
>
> is a cycle. The basic problem seems to be that btrfs, through blk_mq,
> has the cpuhotplug lock inside mmap_sem, while perf (and I imagine
> quite a few other places) allows pagefaults while holding the cpuhotplug
> lock.
>
> This then presents a deadlock potential. Some of the actual chains are
> clarified below; hope this helps.
>

Thanks Peter, that was immensely helpful. I _think_ I see what's happening
now. This box has btrfs as the root fs, so we get the normal lock
dependencies built up from that. But then we start a container that's a
loopback device of a btrfs image, and opening that device sucks in all of
the loopback dependencies, and then we get this splat.

I'm not sure how I'd fix this; the blk_mq_init_queue() has to happen under
the ctl_mutex, I think. I could probably figure out how to pull that part
out from under it, which would break the dependency chain between that
mutex and the hotplug lock. I'll stare at it some more and see what I can
come up with.

Thanks again for looking at this. I was very confused; usually my lockdep
splats are a much more direct "uh, you're an idiot" and less a complex
chain of dependencies like this.

Josef
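
P.S. For anyone following along at home, the shape of the cycle Peter
describes reduces to a plain AB-BA lock inversion. Below is a toy
user-space sketch of that shape, not the kernel code: the mutex names
and the blk_mq_side()/perf_side() functions are stand-ins I made up for
illustration, not kernel symbols. Built with -pthread and run under a
lock checker (e.g. ThreadSanitizer), the inversion gets flagged even on
runs where the timing never actually deadlocks, which is exactly what
lockdep is doing with the real chains.

/*
 * Toy reduction of the reported cycle.  Thread A models the
 * btrfs/blk_mq side: it holds "mmap_sem" and then takes "hotplug".
 * Thread B models the perf side: it holds "hotplug" and then faults,
 * taking "mmap_sem".  Opposite ordering => potential deadlock.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug  = PTHREAD_MUTEX_INITIALIZER;

static void *blk_mq_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mmap_sem);	/* fault path enters the fs */
	pthread_mutex_lock(&hotplug);	/* queue setup takes hotplug lock */
	pthread_mutex_unlock(&hotplug);
	pthread_mutex_unlock(&mmap_sem);
	return NULL;
}

static void *perf_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&hotplug);	/* event setup holds hotplug lock */
	pthread_mutex_lock(&mmap_sem);	/* then allows a pagefault */
	pthread_mutex_unlock(&mmap_sem);
	pthread_mutex_unlock(&hotplug);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, blk_mq_side, NULL);
	pthread_create(&b, NULL, perf_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("done (may deadlock under the wrong interleaving)");
	return 0;
}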
