On Fri, 2014-05-30 at 22:41 -0400, Neil Horman wrote:
> On Fri, May 30, 2014 at 01:58:33PM -0700, Michael Chan wrote:
> > On Fri, 2014-05-30 at 16:38 -0400, Neil Horman wrote:
> > > On Fri, May 30, 2014 at 01:13:40PM -0700, Michael Chan wrote:
> > > > On Fri, 2014-05-30 at 16:03 -0400, Neil Horman wrote:
> > > > > On Fri, May 30, 2014 at 10:58:11AM -0700, Michael Chan wrote:
> > > > > > On Fri, 2014-05-30 at 11:00 -0400, Neil Horman wrote:
> > > > > > > The Cnic driver handles lots of ulp operations in its netdevice event handler.
> > > > > > > To do this, it accesses the ulp_ops array, which is an rcu protected array.
> > > > > > > However, some ulp operations (like bnx2fc_indicate_netevent) try to lock
> > > > > > > mutexes, which might sleep (something that you can't do while holding rcu read
> > > > > > > side locks if you've configured non-preemptive rcu).
> > > > > > >
> > > > > > > Fix this by changing the dereference method. All accesses to the ulp_ops array
> > > > > > > for a cnic dev are modified under the protection of the rtnl lock, and so we can
> > > > > > > safely just use rcu_dereference_rtnl, and remove the rcu_read_lock here.
> > > > > >
> > > > > > Because the bnx2fc function can sleep, we need a more complete fix to
> > > > > > prevent the ulp_ops from going away when the device is unregistered.
> > > > > > synchronize_rcu() won't be able to protect it. I'll post the patch
> > > > > > later today. Thanks.
> > > > > >
> > > > > The device can't be unregistered while we hold rtnl, can it? Since we hold it
> > > > > in this path it seems safe to me, even if we sleep, or am I missing something?
> > > > > Neil
> > > > >
> > > > The netdev cannot be unregistered of course, but I am talking about
> > > > bnx2fc unregistering the cnic device. For example if someone does
> > > > fcoeadm -d or bnx2fc gets unloaded.
> > >
> > > I don't think the latter can happen, as creating an fcoe transport places a hold
> > > on the bnx2fc module (see bnx2fc_create), and the former operation (fcoeadm -d)
> > > will block in bnx2fc_destroy as it requires holding the rtnl_lock, which will
> > > already be held by the netevent notifier, and confirmed by the
> > > rcu_dereference_rtnl in my patch.
> > >
> > > I really think we're safe here
> >
> > Take a look at bnx2fc_mod_exit(). It doesn't look safe to me as it goes
> > through the adapter_list unregistering all cnic devices not under
> > rtnl_lock.
> >
> Right, but you can't get into the module removal code at all until all
> transports are unregistered. I suppose if you have no registered transports and
> remove the bnx2fc module while a netdevice event occurs, there might be a
> problem, but I think that problem is bigger than what we're talking about here,
> as you don't want to remove the module at all while running a netdevice
> notifier, as you'll wind up potentially executing garbage.

As long as we take care of the race conditions, I don't think there is a
bigger problem. During module removal, bnx2fc unregisters all cnic devices.
If a netdev event is being handled at that time, we synchronize: the
unregister call waits for all pending netdev event handling to finish
before it completes. The alternate patch that I sent out should take care
of this condition.

Thanks.
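One way to get the guarantee described in the last paragraph, sketched in userspace C under stated assumptions: a pthread mutex stands in for a kernel sleepable lock, and every name (`struct ulp_ops`, `register_ulp`, `handle_netdev_event`, `unregister_ulp`, `slow_indicate`) is hypothetical. This is not the cnic code or the alternate patch; it only illustrates why an unregister path that takes the same sleepable lock as the event path cannot return while a sleeping callback is still running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a ULP ops table; not the real cnic struct. */
struct ulp_ops {
	void (*indicate_netevent)(void *data, unsigned long event);
};

/* A sleepable lock guards the registered pointer. Unlike an RCU
 * read-side section, a callback may block while this is held. */
static pthread_mutex_t ulp_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ulp_ops *registered_ops;

static void register_ulp(struct ulp_ops *ops)
{
	pthread_mutex_lock(&ulp_lock);
	registered_ops = ops;
	pthread_mutex_unlock(&ulp_lock);
}

/* Event path: the lock is held across the callback, so the ops cannot
 * be unregistered while it sleeps. Returns true if it was invoked. */
static bool handle_netdev_event(void *data, unsigned long event)
{
	bool called = false;

	pthread_mutex_lock(&ulp_lock);
	if (registered_ops && registered_ops->indicate_netevent) {
		registered_ops->indicate_netevent(data, event);
		called = true;
	}
	pthread_mutex_unlock(&ulp_lock);
	return called;
}

/* Unregister path: taking the same lock means this returns only after
 * any in-flight event handling has finished. */
static void unregister_ulp(void)
{
	pthread_mutex_lock(&ulp_lock);
	registered_ops = NULL;
	pthread_mutex_unlock(&ulp_lock);
}

/* --- Demo: a slow, sleeping callback racing with unregister --- */
static atomic_bool cb_started, cb_finished;

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

static void slow_indicate(void *data, unsigned long event)
{
	(void)data;
	(void)event;
	atomic_store(&cb_started, true);
	sleep_ms(100);			/* e.g. blocking on a mutex */
	atomic_store(&cb_finished, true);
}

static void *event_thread(void *arg)
{
	(void)arg;
	handle_netdev_event(NULL, 0);
	return NULL;
}

/* Returns true if unregister_ulp() waited for the in-flight callback. */
static bool demo_unregister_waits(void)
{
	static struct ulp_ops ops = { .indicate_netevent = slow_indicate };
	pthread_t t;
	bool waited;
	int rc;

	register_ulp(&ops);
	rc = pthread_create(&t, NULL, event_thread, NULL);
	assert(rc == 0);
	(void)rc;			/* unused when NDEBUG is defined */
	while (!atomic_load(&cb_started))
		sleep_ms(1);		/* wait until the callback is running */

	unregister_ulp();		/* blocks until slow_indicate() returns */
	waited = atomic_load(&cb_finished);
	pthread_join(t, NULL);
	return waited;
}
```

Holding a sleepable lock across the callback is what lets the unregister path wait. The trade-off is that a callback must never call `register_ulp()` or `unregister_ulp()` itself, or it would self-deadlock on `ulp_lock` (default pthread mutexes are not recursive).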