On Wed, 2012-08-08 at 22:59 +0200, Eric Dumazet wrote:
> On Wed, 2012-08-08 at 22:37 +0200, Jesper Dangaard Brouer wrote:
> > Hi NetDev
> >
> > I think I have found a problem/bug with IPv6-UDP address binding.
> >
> > I found this problem while playing with IPVS and IPv6-UDP, but its also
> > present in more basic/normal situations.
> >
> > If you have two IPv6 addresses, within the same IPv6 subnet, then one
> > of the IPv6 addrs takes precedence over the other (for UDP only).
> >
> > Meaning that, if connecting to the "secondary" IPv6 via UDP, will
> > result in userspace see/bind the connection as being created to the
> > "primary" IP, even-though tcpdump shows that the IPv6-UDP packets are
> > dest the "secondary".
> >
> > The result is; that only the first IPv6-UDP packet is delivered to
> > userspace, and the next packets are denied by the kernel as the UDP
> > socket is "established" with the "primary" IPv6 addr.
> >
> > I would appreciate some hints to where in the IPv6 code I should look
> > for this bug. If any one else wants to fix it, I'm also fine with
> > that ;-)
> >
> >
> > Its quite easy to reproduce, using netcat (nc).
> >
> > Add two addresses to the "server" e.g.:
> >  ip addr add fee0:cafe::102/64 dev eth0
> >  ip addr add fee0:cafe::bad/64 dev eth0
> >
> > Run a netcat listener on "server":
> >  nc -6 -u -l 2000
> >  (Notice restart the listener between runs, due to limitation in nc)
> >
> > On the client add an IPv6 addr e.g.:
> >  ip addr add fee0:cafe::101/64 dev eth0
> >
> > Run a netcat UDP-IPv6 producer on "client":
> >  nc -6 -u fee0:cafe::bad 2000
> >
> > Notice that first packet, will get through, but second packets will
> > not (nc: Write error: Connection refused). Running a tcpdump shows
> > that the kernel is sending back ICMP6, destination unreachable,
> > unreachable port.
> >
> > Its also possible to see the problem, simply running "netstat -uan" on
> > "server", which will show that the "established" UDP connection, is
> > bound to the wrong "Local Address".
> >
> > (Tested on both latest net-next kernel at commit 79cda75a1, and also
> > on RHEL6 approx 2.6.32)
> >
>
> Hi Jesper
>
> Thats because the "nc -6 -u -l 2000" on server does :
>
> bind(3, {sa_family=AF_INET6, sin6_port=htons(2000), inet_pton(AF_INET6,
> "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
>
> recvfrom(3, "\n", 1024, MSG_PEEK, {sa_family=AF_INET6,
> sin6_port=htons(53696), inet_pton(AF_INET6, "fee0:cafe::101",
> &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 1
>
> connect(3, {sa_family=AF_INET6, sin6_port=htons(53696),
> inet_pton(AF_INET6, "fee0:cafe::101", &sin6_addr), sin6_flowinfo=0,
> sin6_scope_id=0}, 28) = 0
>
> And the kernel automatically chooses a SOURCE address (fee0:cafe::102)
> that is not what you expected (fee0:cafe::bad)

Okay, I see. And this is also the case for IPv4.

I guess I should have read Stevens [1] first, as this problem with
multihomed hosts is described there (on page 219). He also states that
this is a problem/feature of Berkeley-derived implementations. Solaris,
for example, handles this the way I expected: the source IP address of
the server's reply is the destination IP of the client's request.

> So its a bug in the application.

Yes, I guess it's an application bug, because Berkeley-derived
implementations don't handle multihoming well for UDP.

Why are we keeping this counter-intuitive behavior? What about changing
the implementation to act like Solaris, which IMHO makes much more
sense?

(BTW, iperf also has this "bug".)

> UDP connect() is tricky : In this case, nc should learn on what IP
> address the client sent the frame. (using recvmsg() and appropriate
> ancillary message)

I have been reading up on how to use recvmsg() and how to parse the
ancillary messages. See [1], "Advanced UDP sockets", pages 531-538.
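
To make the recvmsg() idea concrete, here is a minimal sketch in Python
(chosen only for brevity; nc itself is C and the C version in [1] is a
good deal longer). It assumes Linux and the IPV6_RECVPKTINFO socket
option, which asks the kernel to attach an IPV6_PKTINFO ancillary
message (struct in6_pktinfo: destination address plus interface index)
to each received datagram. The function names are made up for
illustration and are not part of nc.

```python
import socket
import struct

# struct in6_pktinfo: 16-byte destination address + unsigned int ifindex.
IN6_PKTINFO = struct.Struct("@16sI")


def dest_addr_from_ancdata(ancdata):
    """Return (dest_ip, ifindex) from IPV6_PKTINFO ancillary data, or None."""
    for level, ctype, data in ancdata:
        if (level == socket.IPPROTO_IPV6
                and ctype == socket.IPV6_PKTINFO
                and len(data) >= IN6_PKTINFO.size):
            addr, ifindex = IN6_PKTINFO.unpack(data[:IN6_PKTINFO.size])
            return socket.inet_ntop(socket.AF_INET6, addr), ifindex
    return None


def make_listener(port=2000):
    """Wildcard-bound UDP socket that reports each datagram's dest address."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Without this option the kernel does not deliver IPV6_PKTINFO.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
    sock.bind(("::", port))
    return sock


def recv_with_dest(sock, bufsize=1024):
    """Receive one datagram; return (payload, peer, (dest_ip, ifindex) or None)."""
    data, ancdata, _flags, peer = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(IN6_PKTINFO.size))
    return data, peer, dest_addr_from_ancdata(ancdata)
```

With the reproduction setup above, calling recv_with_dest() on the
socket from make_listener(2000) should report fee0:cafe::bad as the
destination of the client's first packet, which is the address the
reply has to come from.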
It's quite an extensive task to extract the destination IP address. No
wonder netcat missed this part.

> Then nc should bind a new socket on this address, then do the connect()

Yes, after the difficult extraction of the dest IP of the UDP packet.

Now I better understand why the DNS server named/bind is so annoying in
that it requires a restart after adding IPs. I guess they didn't
implement this recvmsg() approach, and instead chose to bind to all
available IPs on init/start.

Hints for readers:

For IPv4 it is easy to see which is the "secondary" IP via the command
"ip addr" (look for the word "secondary").

For IPv6 I cannot tell which one is the secondary/primary from the
"ip addr" output. But you can instead do a route lookup, e.g.
"ip route get fee0:cafe::102", and look for the "src" field.

[1] UNIX Network Programming, Vol. 1 (Networking APIs) by W. Richard Stevens

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer