On 9/24/2013 4:44 PM, Doug Ledford wrote:
> When using very large numbers of connections (10,000 was in use here),
> we ran into a problem where when we resolved a performance problem in
> the kernel cma.c code, we suddenly developed a new problem.  That new
> problem turned out to be the fact that with the underlying kernel issue
> resolved, 10,000 connect requests would flood the server side of the
> test and the cmtime application would respond as quickly as possible.
> However, the client side would not bother to check any of the returns
> until after having sent all 10,000 connect requests.  When the kernel
> had a serializing performance problem, this was OK.  When it was fixed,
> this caused a general slowdown in connect operations due to overruns in
> the event processing.  This patch causes the client side to fire off
> threads that will handle responses to connect requests as they come in
> instead of allowing them to backlog uncontrollably.  Times for a 10,000
> connect run changed from this:
>
> [root@rdma-dev-01 ~]# more
> 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
> ib1:
> step             total ms     max ms     min us  us / conn
> create id    :      46.64       0.10       1.00       4.66
> bind addr    :      89.61       0.04       7.00       8.96
> resolve addr :      50.63      26.18   23976.00       5.06
> resolve route:     565.44     538.77   26736.00      56.54
> create qp    :    4028.31       5.70     326.00     402.83
> connect      :   50077.42   49990.49   90734.00    5007.74
> disconnect   :    5277.25    4850.35  380017.00     527.72
> destroy      :      42.15       0.04       2.00       4.21
>
> ib0:
> step             total ms     max ms     min us  us / conn
> create id    :      34.82       0.04       1.00       3.48
> bind addr    :      25.94       0.02       1.00       2.59
> resolve addr :      48.18      25.01   22779.00       4.82
> resolve route:     501.28     476.26   25071.00      50.13
> create qp    :    3274.12       6.05     257.00     327.41
> connect      :   55549.64   55490.32   62150.00    5554.96
> disconnect   :    5263.64    4851.18  375628.00     526.36
> destroy      :      47.20       0.07       2.00       4.72
>
> to this:
>
> [root@rdma-dev-01 ~]# more
> 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
> ib1:
> step             total ms     max ms     min us  us / conn
> create id    :      34.45       0.08       1.00       3.44
> bind addr    :      88.41       0.04       7.00       8.84
> resolve addr :      33.59       4.65     612.00       3.36
> resolve route:     618.68       0.61      97.00      61.87
> create qp    :    4024.03       6.30     341.00     402.40
> connect      :    6983.35    6886.33    8509.00     698.33
> disconnect   :    5066.47     230.34     831.00     506.65
> destroy      :      37.02       0.03       2.00       3.70
>
> ib0:
> step             total ms     max ms     min us  us / conn
> create id    :      42.61       0.14       1.00       4.26
> bind addr    :      27.05       0.03       2.00       2.70
> resolve addr :      40.65      10.73     869.00       4.06
> resolve route:     626.75       0.60     103.00      62.68
> create qp    :    3334.50       6.48     273.00     333.45
> connect      :    6310.29    6251.59   13298.00     631.03
> disconnect   :    5111.12     365.87     867.00     511.11
> destroy      :      36.57       0.02       2.00       3.66
>
> with this patch.
>
> Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
> ---
>  examples/cmtime.c | 227 +++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 189 insertions(+), 38 deletions(-)

Ping.  I noticed this never got picked up.  Was there a problem, or just
overlooked?
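
For anyone reading this thread without the patch at hand, the idea in the
quoted description (handle connect responses in a separate thread while
requests are still being issued, rather than after all of them have been
sent) can be sketched roughly as below.  This is not the code from
examples/cmtime.c and it uses no librdmacm calls: struct event_queue,
response_handler() and run_connect_demo() are hypothetical names, and the
"response" for each request is simulated by the sending loop itself.

```c
#include <pthread.h>

/* Hypothetical stand-in for a CM event channel: just counters. */
struct event_queue {
	pthread_mutex_t lock;
	pthread_cond_t ready;
	int pending;	/* responses that arrived but are not yet handled */
	int expected;	/* total number of responses to process */
	int handled;	/* responses processed so far */
};

/* Runs concurrently with the sender and drains responses as they arrive. */
static void *response_handler(void *arg)
{
	struct event_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	while (q->handled < q->expected) {
		while (q->pending == 0)
			pthread_cond_wait(&q->ready, &q->lock);
		q->pending--;
		q->handled++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/*
 * Start the handler thread first, then issue the requests.  Each loop
 * iteration simulates one request whose response arrives immediately
 * (an assumption made only to keep the sketch self-contained).
 * Returns the number of responses handled, or -1 on thread creation error.
 */
int run_connect_demo(int connections)
{
	struct event_queue q;
	pthread_t tid;
	int i, handled;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.ready, NULL);
	q.pending = 0;
	q.expected = connections;
	q.handled = 0;

	if (pthread_create(&tid, NULL, response_handler, &q) != 0)
		return -1;

	for (i = 0; i < connections; i++) {
		pthread_mutex_lock(&q.lock);
		q.pending++;			/* simulated response arrival */
		pthread_cond_signal(&q.ready);
		pthread_mutex_unlock(&q.lock);
	}

	pthread_join(tid, NULL);
	handled = q.handled;
	pthread_cond_destroy(&q.ready);
	pthread_mutex_destroy(&q.lock);
	return handled;
}
```

Calling run_connect_demo(10000) mirrors the 10,000-connection run above in
shape only.  The point is the ordering: the handler exists before the first
request goes out, so the backlog of unprocessed responses stays small.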