Re: [PATCH] net: fix for a race condition in the inet frag code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>
Date: Mon,  3 Mar 2014 15:05:20 +0100

> I stumbled upon this very serious bug while hunting for another one,
> it's a very subtle race condition between inet_frag_evictor,
> inet_frag_intern and the IPv4/6 frag_queue and expire functions (basically
> the users of inet_frag_kill/inet_frag_put).
> What happens is that after a fragment has been added to the hash chain but
> before it's been added to the lru_list (inet_frag_lru_add), it may get
> deleted (either by an expired timer if the system load is high or the
> timer sufficiently low, or by the fraq_queue function for different
> reasons) before it's added to the lru_list, then after it gets added
> it's a matter of time for the evictor to get to a piece of memory which
> has been freed leading to a number of different bugs depending on what's
> left there. I've been able to trigger this on both IPv4 and IPv6 (which
> is normal as the frag code is the same), but it's been much more
> difficult to trigger on IPv4 due to the protocol differences about how
> fragments are treated. The setup I used to reproduce this is:
> 2 machines with 4 x 10G bonded in a RR bond, so the same flow can be
> seen on multiple cards at the same time. Then I used multiple instances
> of ping/ping6 to generate fragmented packets and flood the machines with
> them while running other processes to load the attacked machine.
> *It is very important to have the _same flow_ coming in on multiple CPUs
> concurrently. Usually the attacked machine would die in less than 30
> minutes, if configured properly to have many evictor calls and timeouts
> it could happen in 10 minutes or so.
> 
> The fix is simple, just move the lru_add under the hash chain locked
> region so when a removing function is called it'll have to wait for the
> fragment to be added to the lru_list, and then it'll remove it (it works
> because the hash chain removal is done before the lru_list one and
> there's no window between the two list adds when the frag can get
> dropped). With this fix applied I couldn't kill the same machine in 24
> hours with the same setup.
> 
> Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of
> rwlock")
> 
> CC: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> CC: David S. Miller <davem@xxxxxxxxxxxxx>
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>

Nik, please beef up the commit message a bit and resubmit.

Some of your replies explained the situation a bit better.  Don't
be afraid of making the commit message too long or too verbose :-)

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Discussion]     [TCP Instrumentation]     [Ethernet Bridging]     [Linux Wireless Networking]     [Linux WPAN Networking]     [Linux Host AP]     [Linux WPAN Networking]     [Linux Bluetooth Networking]     [Linux ATH6KL Networking]     [Linux Networking Users]     [Linux Coverity]     [VLAN]     [Git]     [IETF Annouce]     [Linux Assembly]     [Security]     [Bugtraq]     [Yosemite Information]     [MIPS Linux]     [ARM Linux Kernel]     [ARM Linux]     [Linux Virtualization]     [Linux IDE]     [Linux RAID]     [Linux SCSI]