Re: Heavy spin_lock contention in __udp4_lib_mcast_deliver increase


On Thu, 2012-04-26 at 10:15 -0500, Shawn Bohrer wrote:
> I've been doing some UDP multicast benchmarking and noticed that as we
> increase the number of sockets/multicast addresses the performance
> degrades.  The test I'm running has multiple machines sending packets
> on multiple multicast addresses.  A single receiving machine opens one
> socket per multicast address to receive all the packets.  The
> receiving process is bound to a core that is not processing
> interrupts.
> Running this test with 300 multicast addresses and sockets and
> profiling the receiving machine with 'perf -a -g' I can see the
> following:
> # Events: 45K cycles
> #
> # Overhead
> # ........  .....................................
> #
>     52.56%  [k] _raw_spin_lock
>             |
>             |--99.09%-- __udp4_lib_mcast_deliver
>     20.10%  [k] __udp4_lib_mcast_deliver
>             |
>             --- __udp4_lib_rcv
> So if I understand this correctly, 52.56% of the time is spent
> contending for the spin_lock in __udp4_lib_mcast_deliver.  If I
> understand the code correctly, it appears that for every packet
> received we walk the list of all UDP sockets while holding the
> spin_lock.  Therefore I believe the thing that hurts so much in this
> case is that we have a lot of UDP sockets.
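
A minimal userspace model of the walk described in the quote, not kernel code. It assumes that a single chain holds every socket bound to the destination port and uses a C11 `atomic_flag` as a stand-in for the kernel spinlock. `fake_sock`, `mcast_deliver` and the chain layout are hypothetical. The model shows why lock hold time grows with the socket count: each packet examines every entry before the lock is released.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket in the chain for one port. */
struct fake_sock {
    uint32_t bound_addr;  /* multicast group this socket is bound to */
    uint16_t port;        /* local port */
};

/* One lock protects the whole chain; a C11 flag stands in for spin_lock. */
static atomic_flag chain_lock = ATOMIC_FLAG_INIT;

static void chain_lock_acquire(void)
{
    /* Other CPUs delivering packets would spin here while a walk runs. */
    while (atomic_flag_test_and_set_explicit(&chain_lock,
                                             memory_order_acquire))
        ;
}

static void chain_lock_release(void)
{
    atomic_flag_clear_explicit(&chain_lock, memory_order_release);
}

/*
 * Deliver one packet: every socket in the chain is examined while the
 * lock is held. Returns the number of sockets examined and stores the
 * number that would receive a copy in *matches.
 */
static size_t mcast_deliver(const struct fake_sock *chain, size_t n,
                            uint32_t daddr, uint16_t dport,
                            size_t *matches)
{
    size_t examined = 0;

    *matches = 0;
    chain_lock_acquire();
    for (size_t i = 0; i < n; i++) {
        examined++;
        if (chain[i].port == dport && chain[i].bound_addr == daddr)
            (*matches)++;
    }
    chain_lock_release();
    return examined;
}
```

With 300 sockets, each bound to its own group (224.0.1.0 + i in this model) on one port, delivering a single packet examines all 300 entries under the lock to find the one match.
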
> Are there any ideas on how we can improve the performance in this
> case?  Honestly, I have two ideas, though my understanding of the
> network stack is limited and it is unclear to me how to implement
> either of them.
> The first idea is to use RCU instead of acquiring the spin_lock.  This
> is what the unicast path does; however, looking back at 271b72c7 "udp:
> RCU handling for Unicast packets.", Eric points out that the multicast
> path is difficult.  From that commit description, it appears the
> problem is that, since we have to find all sockets interested in
> receiving the packet instead of just one, restarting the scan of
> the hlist could lead us to deliver the packet twice to the same
> socket.  That commit is rather old, though, and I believe things may
> have changed.  Looking at commit 1240d137 "ipv4: udp: Optimise
> multicast reception", I can see that Eric has already done some work
> to reduce how long the spin_lock is held in __udp4_lib_mcast_deliver().
> That commit also says "It's also a base for a future RCU conversion of
> multicast reception".  Is the idea that you could remove duplicate
> sockets within flush_stack()?  Actually, I don't think that would work,
> since flush_stack() can be called multiple times if the stack gets
> full.
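
The concern about deduplicating inside flush_stack() can be illustrated with a small userspace simulation. It assumes that matching sockets are batched into a fixed-size array that is delivered whenever it fills, and that a lockless walk may occasionally restart from the head of the chain. `STACK_SIZE`, `copies`, `mcast_walk` and the restart trigger are all hypothetical. Deduplication only checks the current batch, so it cannot see sockets that an earlier flush already served.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSOCKS 10      /* sockets 0..NSOCKS-1 all want the packet */
#define STACK_SIZE 4   /* arbitrary small batch size for illustration */

/* copies[i] counts how many copies socket i has received. */
static int copies[NSOCKS];

/* Deliver a copy to every socket collected in the current batch. */
static void flush_stack(const int *stack, size_t count)
{
    for (size_t i = 0; i < count; i++)
        copies[stack[i]]++;
}

/* Is sk already in the current (not yet flushed) batch? */
static bool in_stack(const int *stack, size_t count, int sk)
{
    for (size_t i = 0; i < count; i++)
        if (stack[i] == sk)
            return true;
    return false;
}

/*
 * Walk the chain, batching interested sockets and flushing when the
 * batch fills. If restart_at >= 0, the walk restarts from the head once,
 * after visiting restart_at sockets, as a lockless lookup might if the
 * chain changes underneath it. Duplicates are filtered only against the
 * current batch.
 */
static void mcast_walk(int restart_at)
{
    int stack[STACK_SIZE] = {0};
    size_t count = 0;
    int visited = 0;
    bool restarted = false;

    for (int sk = 0; sk < NSOCKS; sk++) {
        if (!restarted && restart_at >= 0 && visited == restart_at) {
            restarted = true;
            sk = 0;                 /* begin the scan again from the head */
        }
        visited++;
        if (in_stack(stack, count, sk))
            continue;               /* dedup sees only the current batch */
        stack[count++] = sk;
        if (count == STACK_SIZE) {
            flush_stack(stack, count);
            count = 0;
        }
    }
    flush_stack(stack, count);
}
```

With ten interested sockets and a batch size of four, restarting after two sockets causes no duplicates, because both are still in the unflushed batch. Restarting after six sockets delivers a second copy to sockets 0 through 5. Even sockets 4 and 5, which were in the batch when the walk restarted, are duplicated, because that batch is flushed before the walk reaches them again.
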
> The second idea would be to hash the sockets to reduce the number of
> sockets to walk for each packet.  Once again, it looks like the unicast
> path already does this in commits 512615b6b "udp: secondary hash on
> (local port, local address)" and 5051ebd27 "ipv4: udp: optimize
> unicast RX path".  Perhaps these hash lists could be used; however, I
> don't think they can, since they currently use RCU, so this might
> depend on converting to RCU first.
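
A sketch of the second idea, loosely modeled on the (port, address) secondary hash named in the quoted commits but not taken from them. Sockets are chained into buckets keyed by a hypothetical `hash2()` over (local port, local address). A lookup walks only the bucket for (port, destination group) plus the bucket for (port, INADDR_ANY). The table size, hash function and all names are assumptions, and the sketch ignores locking and RCU entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64        /* hypothetical table size */
#define MAX_SOCKS 512
#define ANY_ADDR 0u        /* stands in for INADDR_ANY */

struct fake_sock {
    uint32_t bound_addr;   /* group address, or ANY_ADDR for wildcard */
    uint16_t port;         /* local port */
    int next;              /* next socket index in this bucket, -1 ends */
};

static struct fake_sock socks[MAX_SOCKS];
static int nsocks;
static int bucket_head[NBUCKETS];

/* Hypothetical mixing function over (local port, local address). */
static unsigned hash2(uint16_t port, uint32_t addr)
{
    uint32_t h = addr * 2654435761u ^ port;
    return (h ^ (h >> 16)) % NBUCKETS;
}

static void table_init(void)
{
    nsocks = 0;
    for (int i = 0; i < NBUCKETS; i++)
        bucket_head[i] = -1;
}

/* Insert a socket at the head of its bucket; returns -1 if full. */
static int add_sock(uint32_t addr, uint16_t port)
{
    if (nsocks >= MAX_SOCKS)
        return -1;
    unsigned b = hash2(port, addr);
    socks[nsocks] = (struct fake_sock){ addr, port, bucket_head[b] };
    bucket_head[b] = nsocks++;
    return 0;
}

/* Walk one bucket, counting examined sockets and receivers. */
static void walk_bucket(unsigned b, uint32_t daddr, uint16_t dport,
                        size_t *examined, size_t *matches)
{
    for (int i = bucket_head[b]; i != -1; i = socks[i].next) {
        (*examined)++;
        if (socks[i].port == dport &&
            (socks[i].bound_addr == daddr || socks[i].bound_addr == ANY_ADDR))
            (*matches)++;
    }
}

/*
 * Receivers for (daddr, dport): exact binds live in the (dport, daddr)
 * bucket and wildcard binds in the (dport, ANY_ADDR) bucket, so only
 * those one or two buckets are walked.
 */
static size_t lookup(uint32_t daddr, uint16_t dport, size_t *examined)
{
    size_t matches = 0;
    unsigned b1 = hash2(dport, daddr);
    unsigned b2 = hash2(dport, ANY_ADDR);

    *examined = 0;
    walk_bucket(b1, daddr, dport, examined, &matches);
    if (b2 != b1)
        walk_bucket(b2, daddr, dport, examined, &matches);
    return matches;
}
```

With 300 sockets, each bound to a distinct group on one port, a lookup examines only the sockets that share those buckets rather than all 300. A wildcard-bound socket on the same port is still found through the second bucket.
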

Let me understand:

You have 300 sockets bound to the same port, so a single message must be
copied 300 times and delivered to those sockets?
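
A simplified model can make the cost in that question concrete. It assumes, rather than reproducing kernel logic, that a socket bound to INADDR_ANY accepts a packet for any group on its port, while a socket bound to a group address accepts only that group. `total_copies`, `GROUP_BASE` and the address numbering are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WILDCARD 0u            /* stands in for INADDR_ANY */
#define GROUP_BASE 0xE0000100u /* 224.0.1.0, hypothetical numbering */

/* Simplified acceptance rule: wildcard binds accept any group on the port. */
static int receives_copy(uint32_t bound_addr, uint16_t bound_port,
                         uint32_t daddr, uint16_t dport)
{
    return bound_port == dport &&
           (bound_addr == WILDCARD || bound_addr == daddr);
}

/*
 * One packet is sent to each of n groups on `port`, and n sockets listen
 * on that port. If wildcard is nonzero, every socket binds INADDR_ANY;
 * otherwise socket s binds group s. Returns the total copies delivered.
 */
static size_t total_copies(size_t n, uint16_t port, int wildcard)
{
    size_t copies = 0;

    for (size_t g = 0; g < n; g++) {
        uint32_t daddr = GROUP_BASE + (uint32_t)g;
        for (size_t s = 0; s < n; s++) {
            uint32_t bound = wildcard ? WILDCARD : GROUP_BASE + (uint32_t)s;
            copies += (size_t)receives_copy(bound, port, daddr, port);
        }
    }
    return copies;
}
```

Under this model, sending one packet to each of 300 groups produces 300 × 300 = 90,000 copies when all 300 sockets are wildcard-bound to the same port. It produces 300 copies when each socket is bound to its own group.
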
