Re: root_lock vs. device's TX lock


On Thu, Nov 17, 2011 at 12:19 PM, Dave Taht <dave.taht@xxxxxxxxx> wrote:
> On Thu, Nov 17, 2011 at 6:26 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>> On Thursday, 17 November 2011 at 17:34 +0100, Eric Dumazet wrote:
>>> On Thursday, 17 November 2011 at 08:10 -0800, Tom Herbert wrote:
>>> > From sch_direct_xmit:
>>> >
>>> >         /* And release qdisc */
>>> >         spin_unlock(root_lock);
>>> >
>>> >         HARD_TX_LOCK(dev, txq, smp_processor_id());
>>> >         if (!netif_tx_queue_frozen_or_stopped(txq))
>>> >                 ret = dev_hard_start_xmit(skb, dev, txq);
>>> >
>>> >         HARD_TX_UNLOCK(dev, txq);
>>> >
>>> >         spin_lock(root_lock);
>>> >
>>> > This is a lot of lock manipulation to basically switch from one lock
>>> > to another, and possibly thrashing, just to send a packet.  I am
>>> > thinking that if there is a 1-1 correspondence between qdisc and
>>> > device queue then we could actually use the device's lock as the root
>>> > lock for the qdisc.  So in that case, we would not need to touch any
>>> > locks from sch_direct_xmit (just hold root lock, which is already the
>>> > device lock, for the duration).
>>> >
>>> > Is there any reason why this couldn't work?
>>> But we have to dirty part of the Qdisc anyway?
>>> (state, bstats, q, ...)
>> Also we want to permit other cpus to enqueue packets to Qdisc while our
>> cpu is busy in network driver ndo_start_xmit()
>> For complex Qdisc / tc setups (potentially touching a lot of cache
>> lines), we could eventually add a small ring buffer so that the cpu
>> doing the ndo_start_xmit() also queues the packets into Qdisc.
>> This ringbuffer could use a lockless algo. (we currently use the
>> secondary 'busylock' to serialize other cpus, but each cpu calls qdisc
>> enqueue itself.)
> I was thinking ringbuffering might also help in adding a 'grouper'
> abstraction to the dequeuing side.

Actually, I'm interested in circumventing *both* locks. Our SoC has
some quite versatile queueing infrastructure, such that (for many
queueing setups) we can do all of the queueing in hardware, using
per-cpu access portals. By hacking around the qdisc lock, and using a
tx queue per core, we were able to achieve a significant speedup.
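
To make the per-core idea concrete, here is a minimal userspace sketch,
not our driver code. Every name in it (struct portal, portal_xmit,
NR_CPUS_SKETCH, PORTAL_DEPTH) is hypothetical and only stands in for a
per-cpu hardware portal. It assumes the caller cannot migrate to another
cpu while it is inside portal_xmit().

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* hypothetical core count */
#define PORTAL_DEPTH   8   /* hypothetical per-portal queue depth */

/* Hypothetical stand-in for one per-cpu hardware access portal. */
struct portal {
    const void *slots[PORTAL_DEPTH];
    size_t count;
};

/* One portal (one tx queue) per core. */
static struct portal portals[NR_CPUS_SKETCH];

/*
 * Hand a packet to the portal owned by `cpu`.
 * No lock is taken: the assumption is that each portal is only ever
 * touched by its owning cpu, so there is nothing to serialize against.
 * Returns 0 on success, -1 if the cpu is out of range or the portal is full.
 */
static int portal_xmit(size_t cpu, const void *pkt)
{
    struct portal *p;

    if (cpu >= NR_CPUS_SKETCH)
        return -1;

    p = &portals[cpu];
    if (p->count == PORTAL_DEPTH)
        return -1;

    p->slots[p->count++] = pkt;
    return 0;
}
```

The point of the sketch is only that the state touched on the send path
is private to the sending core, which is what lets both the qdisc lock
and the tx lock drop out.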

Right now, it's a big hack, but I'd like to find a way that we could
provide support for this. My initial thought was to craft a qdisc
implementation for our hardware, but the root_lock means doing so
would not yield any performance gain.
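
For reference, the lockless ring buffer Eric mentions above might look
roughly like the following. This is a userspace sketch with C11 atomics,
not kernel code, and it is deliberately the simplest variant: one
producer and one consumer per ring. With several cpus enqueuing, each
would need its own ring, or a multi-producer algorithm would be needed
instead. The names (struct pkt_ring, ring_put, ring_get, RING_SIZE) are
made up for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical; must be a power of two */

struct pkt_ring {
    void *slot[RING_SIZE];
    _Atomic size_t head;   /* next slot to write; advanced only by the producer */
    _Atomic size_t tail;   /* next slot to read; advanced only by the consumer */
};

/* Producer side (an enqueuing cpu). Returns 0 on success, -1 if full. */
static int ring_put(struct pkt_ring *r, void *pkt)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return -1;

    r->slot[head & (RING_SIZE - 1)] = pkt;
    /* Publish the slot contents before making the new head visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer side (the cpu doing the transmit). Returns NULL if empty. */
static void *ring_get(struct pkt_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *pkt;

    if (head == tail)
        return NULL;

    pkt = r->slot[tail & (RING_SIZE - 1)];
    /* Release the slot back to the producer only after reading it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return pkt;
}
```

In Eric's scheme the consumer would be the cpu already inside
ndo_start_xmit(), draining the ring into the Qdisc, so the enqueuing
cpus never take the root lock themselves.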
