On Tue, 3 Dec 2013, Or Gerlitz wrote:

> On Tue, Dec 3, 2013 at 11:11 PM, Joseph Gasparakis
> <joseph.gasparakis@xxxxxxxxx> wrote:
>
> >>> lack of GRO : receiver seems to not be able to receive as fast as you want.
> >>>> TCPOFOQueue: 3167879
> >>> So many packets are received out of order (because of losses)
>
> >> I see that there's no GRO also for the non-veth tests which involve
> >> vxlan, and over there the receiving side is capable of consuming the
> >> packets, do you have a rough explanation why adding veth to the chain
> >> is such a game changer which makes things start falling apart?
>
> > I have seen this before. Here are my findings:
> >
> > The gso_type is different if the skb comes from veth or not. From veth,
> > you will see the SKB_GSO_DODGY set. This breaks things as when the
> > skb with DODGY set moves from vxlan to the driver through
> > dev_hard_start_xmit, the stack drops it silently. I never got the time
> > to find the root cause for this, but I know it causes re-transmissions
> > and big performance degradation.
> >
> > I went as far as just quickly hacking a one liner unsetting the DODGY bit
> > in vxlan.c and that bypassed the issue and recovered the performance
> > problem, but obviously this is not a real fix.
>
> thanks for the heads up, few quick questions/clarifications --
>
> -- you are talking about drops done @ the sender side, correct? Eric was
> saying we have evidence that the drops happen on the receiver.

I am *guessing* drops on the Rx are due to the drops at the Tx. See my
answer to your next question for more info.

>
> -- without the hack you did, still packets are sent/received, so what
> makes the stack drop only some of them?
>

What I had seen is GSOs getting dropped on the Tx side. Basically the GSOs
never made it to the driver, they were broken into smaller non-GSO skbs by
the stack.
I think the stack is not handling GSO skbs with the DODGY bit set well,
and that maybe causes the packet to be only partially emitted, causing the
re-transmits (and maybe the drops on your Rx end)? Of course all this is
speculation; the one fact I know is that as soon as I forced the gso type
I saw offloaded VXLAN encapsulated traffic at decent speeds.

> -- why would packets coming from veth have the SKB_GSO_DODGY bit set?

That is something I would love to know too. I am guessing this is a way
for the VM to say it is a non-trusted packet? And maybe all this can be
fixed by setting something on the VM through a userspace tool that will
stop the veth from setting the DODGY bit?

>
> -- so where is now (say net.git or 3.12.x) this one line you commented
> out? I don't see in vxlan.c or in ip_tunnel_core.c / ip_tunnel.c
> explicit setting of SKB_GSO_DODGY

I did not commit it, as this was just a workaround to prove to myself that
the problem I was seeing was due to the gso_type, and it would actually
just hide the problem and not give a proper solution to it.

>
> Also, I am pretty sure the problem exists also when sending/receiving
> guest traffic through tap/macvtap <--> vhost/virtio-net and friends, I
> just stuck to the veth flavour b/c it's one (== the hypervisor)
> network stack to debug and not two (+ the guest one).
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
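
The one-line workaround discussed in this thread was never posted, so it
cannot be reproduced here. As a rough userspace illustration of what
"unsetting the DODGY bit" in a gso_type bitmask amounts to, here is a
minimal sketch. Everything prefixed with mock_ or MOCK_ is hypothetical:
the flag values are made up for the example and are not the kernel's
SKB_GSO_* values, and struct mock_shinfo only stands in for the field that
carries gso_type. It is not the workaround itself and not a fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Made-up flag values, for illustration only. The real SKB_GSO_*
 * constants are defined by the kernel and vary between versions. */
enum mock_gso_type {
    MOCK_GSO_TCPV4      = 1 << 0,
    MOCK_GSO_DODGY      = 1 << 1,
    MOCK_GSO_UDP_TUNNEL = 1 << 2,
};

/* Hypothetical stand-in for the structure holding gso_type. */
struct mock_shinfo {
    unsigned int gso_type;
};

/* True if the "untrusted source" bit is set in the mask. */
bool mock_is_dodgy(const struct mock_shinfo *sh)
{
    return (sh->gso_type & MOCK_GSO_DODGY) != 0;
}

/* Clear only the DODGY bit and leave every other GSO flag untouched. */
void mock_clear_dodgy(struct mock_shinfo *sh)
{
    sh->gso_type &= ~(unsigned int)MOCK_GSO_DODGY;
}

/* Walk through the before/after state and return the final mask. */
unsigned int mock_demo(void)
{
    struct mock_shinfo sh = { .gso_type = MOCK_GSO_TCPV4 | MOCK_GSO_DODGY };

    assert(mock_is_dodgy(&sh));
    mock_clear_dodgy(&sh);
    assert(!mock_is_dodgy(&sh));

    printf("gso_type after clearing DODGY: 0x%x\n", sh.gso_type);
    return sh.gso_type;
}
```

Calling mock_demo() from a main() exercises the sketch. Note that clearing
the bit only changes how later code classifies the packet; as said above,
that hides the problem rather than explaining why such skbs end up being
segmented or dropped.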