Re: [RFC] Proposal to change Node Description naming scheme for HCA's


On Thu, Apr 05, 2012 at 01:46:19PM -0400, Doug Ledford wrote:

> > The purpose of the udev hook path is to set the node description on
> > initial device insertion, which may be before, or after the DHCP
> > process completes, in such cases.
> Well, a udev rule is guaranteed to be before dhcp completes *on that
> device*.  It might be that dhcp has completed on another device and that
> the other device is actually where the hostname is pulled, but the udev
> rule will always be before this given device in the rule has completed dhcp.

The case where the hostname comes from management ethernet DHCP is
what I was thinking of..

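
To make the ordering problem concrete: whichever hook fires (udev on
insertion, or a dhclient hook later), it would need to recompute the
description from whatever hostname is known at that moment. A minimal
sketch follows. The "<hostname> <device>" layout, the function name, and
the "localhost" fallback are assumptions for illustration, not the scheme
proposed in this thread; the 64-byte limit is taken to be the size of the
IB NodeDescription field.

```python
NODE_DESC_MAX = 64  # assumed limit: size in bytes of the IB NodeDescription field


def make_node_desc(hostname, device):
    """Build a hypothetical '<hostname> <device>' description that fits the field."""
    # Use the short host name; fall back if no hostname has been assigned yet.
    short = hostname.split(".")[0] or "localhost"
    raw = "{} {}".format(short, device).encode("utf-8")[:NODE_DESC_MAX]
    # Drop any multi-byte character that the byte truncation cut in half.
    return raw.decode("utf-8", errors="ignore")
```

Calling this once at device insertion and again when DHCP completes would
cover both orderings, at the cost of the description changing after boot.
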
> > IMHO we should have a udev rule that also loads the higher level IB
> > modules when any RDMA capable device is discovered, including the mlx4
> > IB layer, uverbs, umad, etc. This way systems that have RDMA will load
> > the right modules and systems that don't, won't. Fully supporting hot
> > plug, of course.
> One of the niceties of the rdma/openibd init script is the ability to
> completely reload the stack.  That goes away if we switch to a udev load
> of the stack (well, unless you now create a script in /sbin and have the
> udev rule call that script, but the script should no longer be in the
> initrddir if it's not an init script).

I'm not sure how valuable this really is; in practice we shouldn't be
unloading/reloading kernel modules. Every time I've run into problems,
an unload of the mlx4 drivers alone was enough to fix them. The verbs,
mad, srp, etc. modules don't often need to be unloaded in my experience,
and certainly never all as a unit. It is also difficult to make this
actually work, since you need to be aware of everything in the system
that uses RDMA. You may also need to restart MPI? DB2? RAC? etc?

Do other large scale kernel stacks in RHEL have a module restart script?
If so, an rdma-stack-restart script in /usr/sbin would seem appropriate
to me.

This would also fix the bug of trying to unload modules on shutdown
via the init script, which is unnecessary/sketchy, IMHO.

> >> I agree there might be better ways but I am not sure I follow your
> >> proposal.  Furthermore, I don't know that a start up script of some
> >> sort is all that evil.
> >>
> >> Finally, I think Michael brings up a good point about which package
> >> should own any such scripts.
> > 
> > udev is like if-up.d/, there is a rules directory packages can drop
> > hook scripts into that run at the appropriate time.
> Correct.  On Red Hat Enterprise Linux I could have the rdma package drop
> in an /sbin/rdma script that would bring the stack up (and possibly
> reload it, but I'm a bit sketchy on that idea given that this would no
> longer be an init script but something else), have it drop a small
> script in /etc/dhcp/dhclient.d/ to set the node description after dhcp
> completes (the script in /sbin would also have to set the node
> description from whatever information is available on load just in case
> the machine doesn't even use dhcp though), have it drop a rules file in
> /etc/udev.d/rules.d for bringing up the stack on device discovery (this
> one is a bit tricky though, you basically have to match against all
> possible RDMA devices and bring up the stack on the presence of any one
> of them, and your script that you call to bring the stack up needs to be

There are a few independent things here:

1) PCI auto probe will start mlx4_core (and other core drivers, that
   is getting common). This is already done..

2) When mlx4_core/etc is loaded, udev should auto load mlx4_en/mlx4_ib/etc.
   There is currently no auto-probe system at all for these layered
   drivers; it is annoying that it doesn't work automatically :)

3) When a driver is loaded that has the sysfs child attributes
   {node_guid,node_desc,node_type}, load the IB user space modules,
   which pull in the core kernel modules too.

4) 'some other sysfs test for iWarp/ROCEE' - load ib_uverbs. If there
   isn't a sysfs attribute for this already, we should add one.

5) Various ULPs can be loaded contingent on the ib_verbs/ib_cm module being
   loaded, or on-demand from other init scripts (e.g. load srp/iser in the
   same way the system loads iscsi)

udev already has modprobe calling rules, so I think we are good without
any external scripts or complex serialization.
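
The test in step 3 can be sketched as follows. This is Python purely to
illustrate the matching logic; the actual mechanism would be a udev rule
calling modprobe, not a script. The function name and the module list are
assumptions for illustration, and the sketch only returns module names
rather than loading anything.

```python
import os

# Attributes named in step 3; a device exposing all three is treated as IB.
IB_SYSFS_ATTRS = ("node_guid", "node_desc", "node_type")

# Hypothetical module list; the real set would be chosen by the distribution.
IB_USER_MODULES = ["ib_uverbs", "ib_umad"]


def modules_for_device(sysfs_dir):
    """Return the user space modules to request for one sysfs device directory."""
    has_all = all(
        os.path.isfile(os.path.join(sysfs_dir, attr)) for attr in IB_SYSFS_ATTRS
    )
    # Devices without the IB attributes (e.g. plain Ethernet) request nothing.
    return list(IB_USER_MODULES) if has_all else []
```

Keeping the decision a pure function of the device's sysfs directory is
what lets it run per device on hot plug, with no serialization between
devices.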

I don't have any rules that do this already; every time I've tangled
with udev I end up screaming at my monitor :(

> So, I'd say it's doable to change this over, but I'm not sure I would
> recommend it in a minor point release.  I'd probably save this sort of
> change for a major release update.

Sure, but it is the sort of thing I think is appropriate for OFED to
prototype so everyone can have something well tested for their next
major releases.
