[...]

> I was testing a kernel with this patch on one of my systems today and I
> saw a splat like this on boot:

Interesting and thanks for reporting. Although, could you be a bit more
detailed about what kernel and what system? Especially, I am interested
in looking at the code which initializes the genpds and adds the
subdomains.

> [ 1.893134]
> [ 1.893137] =============================================
> [ 1.893139] [ INFO: possible recursive locking detected ]
> [ 1.893143] 3.18.0 #531 Not tainted
> [ 1.893145] ---------------------------------------------
> [ 1.893148] kworker/u8:4/113 is trying to acquire lock:
> [ 1.893167] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893169]
> [ 1.893169] but task is already holding lock:
> [ 1.893179] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893182]
> [ 1.893182] other info that might help us debug this:
> [ 1.893184] Possible unsafe locking scenario:
> [ 1.893184]
> [ 1.893185]        CPU0
> [ 1.893187]        ----
> [ 1.893191]   lock(&genpd->lock);
> [ 1.893195]   lock(&genpd->lock);
> [ 1.893196]
> [ 1.893196] *** DEADLOCK ***
> [ 1.893196]
> [ 1.893198] May be due to missing lock nesting notation
> [ 1.893198]
> [ 1.893201] 4 locks held by kworker/u8:4/113:
> [ 1.893217] #0: ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [ 1.893229] #1: (deferred_probe_work){+.+.+.}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [ 1.893241] #2: (&dev->mutex){......}, at: [<ffffffc000560920>] __device_attach+0x40/0x12c
> [ 1.893251] #3: (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893253]
> [ 1.893253] stack backtrace:
> [ 1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531
> [ 1.893261] Hardware name: Mediatek Oak rev4 board (DT)
> [ 1.893269] Workqueue: deferwq deferred_probe_work_func
> [ 1.893271] Call trace:
> [ 1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140
> [ 1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28
> [ 1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4
> [ 1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8
> [ 1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164
> [ 1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4
> [ 1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70
> [ 1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc
> [ 1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70
> [ 1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c
> [ 1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c
> [ 1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4
> [ 1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8
> [ 1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90
> [ 1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4
> [ 1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c
> [ 1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30
> [ 1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c
> [ 1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0
> [ 1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c
> [ 1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470
> [ 1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc
>
> I have not been able to reproduce it, but by looking at how
> genpd_poweron() was modified by this patch, it looks like the problem
> might have been introduced by this hunk:
>
>> @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd)
>>  	 */
>>  	list_for_each_entry(link, &genpd->slave_links, slave_node) {
>>  		genpd_sd_counter_inc(link->master);
>> -		genpd->status = GPD_STATE_WAIT_MASTER;
>> -
>> -		mutex_unlock(&genpd->lock);
>>
>>  		ret = pm_genpd_poweron(link->master);
>> -
>> -		mutex_lock(&genpd->lock);
>> -
>> -		/*
>> -		 * The "wait for parent" status is guaranteed not to change
>> -		 * while the master is powering on.
>> -		 */
>> -		genpd->status = GPD_STATE_POWER_OFF;
>> -		wake_up_all(&genpd->status_wait_queue);
>
> AFAICT, the mutex_*() calls that were removed here unlocked
> (&genpd->lock) and allowed pm_genpd_poweron() to be called recursively
> without the above splat in the case where a pm_genpd_poweron() on the
> master would take a lock of the same lock class as the current (slave)
> genpd.
>
> Other places in domain.c use something like this to lock a second
> genpd while already holding one genpd's lock:
>
>   mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
>
> However, it seems like in this case we might actually need to walk up
> several layers of the hierarchy.
>
> So, AFAICT the best thing to do would be to restore the
> mutex_unlock()/mutex_lock() pair around the call to pm_genpd_poweron().
>
> It isn't clear to me whether this was removed by this patch on purpose
> or by accident.

The removal of the unlock/lock pair was done on purpose. The reason was
simply that there are no longer any intermediate power states to
consider.

Regarding re-entrancy with a genpd struct which is already locked, I am
wondering whether the configuration of the subdomains/masters is done
properly. To reach the deadlock described above, one subdomain's master
must be a subdomain of its own child. Is that really correct?

Perhaps I have already entered Christmas "mode", so I might be wrong in
my analysis and be simplifying things too much. :-) I will give this
some more thought after the holidays.

Kind regards
Uffe

>
> Thanks for your help!
> -Dan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel