
irq latency regression post af5ab277 - was Re: [patch] clockevents: Reinstate the per cpu tick skew

Greetings,

On Tue, 2012-01-03 at 07:20 +0100, Mike Galbraith wrote: 
> On Wed, 2011-12-28 at 16:10 +0100, Mike Galbraith wrote:
> > On Wed, 2011-12-28 at 14:32 +0100, Arjan van de Ven wrote:
> > > 
> > > I think we need to just say no to this, and kill the nohz=off option
> > > entirely.
> > > 
> > > Seriously, are people still running with ticks for any legitimate
> > > reasons? (and not just because they goofed their config file)
> > 
> > Yup.  Realtime loads sometimes need it.  Even without contention
> > problems, entering/leaving nohz is a latency source.  If every little
> > bit counts, you may have the choice of letting the electric meter spin
> > or not getting the job done at all.

There are other facets to tick skew removal that have turned up while
looking into an irq latency regression between 2.6.32 and 3.0.  Not only
does skew removal induce jitter woes for moderate-sized boxen running RT
kernels, it's a jitter source for large machines in general.

More interestingly, that skew removal also appears to be indirectly
responsible for a rather large irq latency regression.  I bisected the
source of same to..

0209f649 rcu: limit rcu_node leaf-level fanout

.._but_, the source of the lock contention it addressed appears to be
the very tick skew removal that caused my xtime_lock jitter woes in RT.
Revert 0209f649 in a CONFIG_MAXSMP, CONFIG_PREEMPT_NONE kernel and the
contention appears; restore the skew and it disappears virtually
entirely.  So it would appear that we induced a ~400% latency regression
to combat contention that was itself induced by tick skew removal.

In the enterprise kernel, I can revert 0209f649 and enable tick skew
across the board instead of selectively, and kill the regression at the
cost of losing whatever power savings killing skew brought us.  May have
to do that.  In another thread, Paul suggested limiting grace-period (GP)
initialization to CPUs that have been online, which indeed turned the
regression into a modest progression.  That's highly attractive long
term, but doing that in a stable kernel before it has baked in mainline
is not the least bit attractive.  Hohum, rock or hard spot, pick one.

Anyway, I thought I should summarize the linkage of RCU induced latency
regression to tick skew removal.  Seems likely I'm not the only sod who
will have this land in their bug list.

> Patch making tick skew a boot option below, and hard numbers below that.
> 
> Test setup:
> 60 isolated cores running a synchronized frame scheduler model for 1
> hour, scheduling worker-bees at three frequencies.  (The testcase is
> supposed to simulate a real frame rate scheduler "well enough", and did
> pretty well at showing the cost of these particular collisions.)
> 
> First set of numbers is without tick skew, and nohz enabled.  Second set
> is with the tick skewed, and with nohz and rt push/pull turned off for
> the isolated core set.  The tick skew alone is responsible for an order
> of magnitude improvement in jitter.  I have hard numbers for nohz and
> cpupri_set() as well, but the bottom line for me is that with nohz
> enabled, jitter comes close to double my 30us budget, so even with the
> tick skewed, nohz is just not a viable option ATM.
> 
> 
> From: Mike Galbraith <mgalbraith@xxxxxxx>
> 
> clockevents: Reinstate the per cpu tick skew
> 
> Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867
> Historically, Linux has tried to make the regular timer tick on the
> various CPUs not happen at the same time, to avoid contention on
> xtime_lock.
>     
> Nowadays, with the tickless kernel, this contention no longer happens
> since time keeping and updating are done differently. In addition,
> this skew is actually hurting power consumption in a measurable way on
> many-core systems.
> End quote
> 
> Contrary to the above, contention does still happen, and can be a
> problem for realtime loads whether nohz is active or not, so give
> the user the ability to decide whether power consumption or jitter
> is the more important consideration.
> 
> Signed-off-by: Mike Galbraith <mgalbraith@xxxxxxx>
> Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
> 
> ---
>  Documentation/kernel-parameters.txt |    3 +++
>  kernel/time/tick-sched.c            |   19 +++++++++++++++++++
>  2 files changed, 22 insertions(+)
> 
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2295,6 +2295,9 @@ bytes respectively. Such letter suffixes
>  	simeth=		[IA-64]
>  	simscsi=
>  
> +	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
> +			xtime_lock contention on larger systems.
> +
>  	slram=		[HW,MTD]
>  
>  	slub_debug[=options[,slabs]]	[MM, SLUB]
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -759,6 +759,8 @@ static enum hrtimer_restart tick_sched_t
>  	return HRTIMER_RESTART;
>  }
>  
> +static int sched_skew_tick;
> +
>  /**
>   * tick_setup_sched_timer - setup the tick emulation timer
>   */
> @@ -777,6 +779,14 @@ void tick_setup_sched_timer(void)
>  	/* Get the next period (per cpu) */
>  	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
>  
> +	/* Offset the tick to avert xtime_lock contention. */
> +	if (sched_skew_tick) {
> +		u64 offset = ktime_to_ns(tick_period) >> 1;
> +		do_div(offset, num_possible_cpus());
> +		offset *= smp_processor_id();
> +		hrtimer_add_expires_ns(&ts->sched_timer, offset);
> +	}
> +
>  	for (;;) {
>  		hrtimer_forward(&ts->sched_timer, now, tick_period);
>  		hrtimer_start_expires(&ts->sched_timer,
> @@ -858,3 +868,12 @@ int tick_check_oneshot_change(int allow_
>  	tick_nohz_switch_to_nohz();
>  	return 0;
>  }
> +
> +static int __init skew_tick(char *str)
> +{
> +	get_option(&str, &sched_skew_tick);
> +
> +	return 0;
> +}
> +early_param("skew_tick", skew_tick);
> +
> 
> No skewed tick, nohz active:
> FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
> FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
> FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   3456000   0.0159  51.51 (1751285) 1.0811  2.3215    0 (0)     940 (2496,2497,36625,36626,45649,..3438632)
> 5   3456000   0.0159  57.44 (1301949) 1.1164  2.3599    0 (0)     1010 (32353,32354,36625,36626,43681,..3434312)
> 6   3456000   0.0159  49.58 (546753)  1.0602  2.3222    0 (0)     1037 (32353,32354,36625,36626,41809,..3425240)
> 7   3456000   0.0159  52.20 (546753)  1.0681  2.3370    0 (0)     1035 (32353,32354,36625,36626,41809,..3432248)
> 8   3456000   0.0159  58.91 (1407504) 1.0592  2.0873    0 (0)     865 (11041,11042,15505,15506,25585,..3412208)
> 9   3456000   0.0159  54.61 (1407504) 1.0581  2.0775    0 (0)     850 (11041,11042,15505,15506,20234,..3411272)
> 10  3456000   0.0159  52.91 (1338694) 1.1259  2.0825    0 (0)     799 (11041,11042,15505,15506,16465,..3400640)
> 11  3456000   0.0159  50.56 (2470554) 1.1881  2.0364    0 (0)     334 (50714,113715,113716,166349,178780,..3421185)
> 12  3456000   0.0159  50.29 (2462200) 0.9961  2.0202    0 (0)     639 (9337,9338,11041,11042,15505,..3452529)
> 13  3456000   0.0159  56.52 (2470554) 1.1478  2.0602    0 (0)     400 (2545,2546,9121,9122,66434,..3440289)
> 14  3456000   0.0159  55.06 (34587)   1.2129  2.4890    0 (0)     444 (34587,34588,62571,62572,62619,..3440434)
> 15  3456000   0.0159  46.48 (583883)  1.2891  2.1824    0 (0)     306 (91563,95739,95740,141197,155741,..3406785)
> 16  3456000   0.0159  103.70 (2828662) 2.1077  4.0380    410 (2)   9435 (697,698,1105,1106,1153,..3455937)
> 17  3456000   0.0159  73.89 (2470553) 2.1598  3.7529    0 (0)     6180 (2473,2474,3985,3986,8569,..3438201)
> 18  3456000   0.0159  54.14 (1212190) 2.2391  3.7075    0 (0)     5485 (10274,10275,13970,13971,14379,..3455794)
> 19  3456000   0.0159  99.20 (810712)  2.3861  4.5793    0 (0)     19845 (674,675,2259,2260,3554,..3455915)
> 20  3456000   0.0159  71.30 (631597)  2.2565  4.3141    0 (0)     9365 (674,675,3555,7394,7395,..3455914)
> 21  3456000   0.0159  71.51 (1431073) 2.3127  4.4810    0 (0)     25073 (1154,2259,2260,4011,4012,..3455963)
> 22  3456000   0.0159  62.45 (215262)  2.1318  4.3088    0 (0)     23570 (2259,2260,4011,4012,4539,..3455963)
> 23  3456000   0.0159  61.50 (212190)  2.1307  4.3165    0 (0)     23605 (2259,2260,4539,4540,5019,..3455963)
> 24  2397600   0.0587  145.26 (2229318) 2.6808  6.2104    492 (14)  32977 (812,813,1145,1470,1471,..2397564)
> 25  2397600   0.0587  133.93 (250966) 2.6171  6.3300    492 (13)  35463 (812,813,1145,1146,1462,..2397564)
> 26  2397600   0.0587  140.25 (1405878) 2.7079  6.1603    492 (12)  32428 (806,812,813,1145,1146,..2397564)
> 27  2397600   0.0587  141.56 (1405879) 2.6893  6.1515    492 (14)  32089 (808,809,810,811,812,..2397564)
> 28  2397600   0.0587  146.57 (1405879) 2.7129  6.0797    492 (14)  31637 (800,801,812,813,827,..2397564)
> 29  2397600   0.0587  137.99 (2172039) 2.3360  5.9859    492 (14)  30551 (826,827,1157,1480,1481,..2397564)
> 30  2397600   0.0587  144.06 (948198) 2.2381  5.0413    496 (6)   19401 (826,827,832,833,1175,..2397566)
> 31  2397600   0.0587  141.92 (948198) 2.2509  5.0654    496 (4)   19353 (826,827,832,833,1175,..2397566)
> 32  2397600   0.0587  149.31 (2172038) 2.7842  6.8891    492 (10)  41301 (822,823,824,825,826,..2397564)
> 33  2397600   0.0587  142.99 (1975198) 2.6904  5.3538    181 (6)   21954 (511,512,846,847,1175,..2397582)
> 34  2397600   0.0587  167.07 (948199) 2.6350  5.6616    179 (4)   23602 (503,504,507,508,511,..2397582)
> 35  2397600   0.0587  79.81 (2152123) 2.5135  4.1781    0 (0)     5406 (1879,1881,1882,2876,2877,..2396956)
> 36  2397600   0.0587  112.24 (1184061) 2.7419  5.3774    0 (0)     21005 (1185,1186,1189,1190,1518,..2397263)
> 37  2397600   0.0587  78.86 (986867)  2.6678  5.1954    0 (0)     19350 (529,530,861,863,1189,..2397263)
> 38  2397600   0.0587  77.90 (1782680) 2.5881  4.8399    0 (0)     13516 (525,526,529,530,860,..2396938)
> 39  2397600   0.0587  78.02 (1642135) 2.4351  3.8095    0 (0)     3569 (898,2900,2901,3561,3566,..2397291)
> 40  2397600   0.0587  218.81 (891116) 2.7215  6.6456    392 (8)   38961 (714,715,726,727,1046,..2397450)
> 41  2397600   0.0587  141.56 (1975198) 2.6441  5.2995    181 (4)   22572 (846,847,1179,1180,1185,..2397249)
> 42  2397600   0.0587  77.07 (1782679) 2.3957  5.0119    0 (0)     17798 (529,530,860,861,862,..2397263)
> 43  2397600   0.0587  81.72 (1333323) 2.3469  4.5082    0 (0)     11172 (1205,1206,1207,1208,1865,..2396552)
> 44  1080000   0.0032  168.33 (988438) 2.7037  7.1729    381 (10)  20368 (650,651,662,663,809,..1056079)
> 45  1080000   0.0032  156.88 (935898) 2.6181  7.1047    0 (0)     19932 (767,768,809,810,866,..1022038)
> 46  1080000   0.0032  156.40 (935898) 2.2137  6.8080    0 (0)     18522 (684567,684568,695466,695467,699570,..975856)
> 47  1080000   0.0032  150.20 (905448) 2.6011  7.0525    0 (0)     19427 (2012,2013,510347,510348,617324,..980947)
> 48  1080000   0.0032  163.08 (1012102) 3.0856  8.6857    491 (49)  32197 (527,528,536,537,545,..1059883)
> 49  1080000   0.0032  151.87 (861738) 2.1150  6.2499    0 (0)     14993 (679920,679921,681762,681763,684567,..889561)
> 50  1080000   0.0032  143.53 (843639) 2.3864  6.2304    0 (0)     14372 (673311,673312,676716,676717,679680,..907048)
> 51  1080000   0.0032  148.53 (815289) 2.4022  6.1284    0 (0)     13945 (667971,667972,672835,673311,673312,..925077)
> 52  1080000   0.0032  149.49 (815289) 2.4059  6.0745    0 (0)     13932 (667971,667972,672834,672835,673311,..925077)
> 53  1080000   0.0032  149.49 (788680) 2.2976  5.4171    0 (0)     10821 (662766,662767,664794,664795,667971,..851374)
> 54  1080000   0.0032  146.63 (788680) 2.1600  5.5494    0 (0)     11435 (662766,662767,664794,664795,667971,..925077)
> 55  1080000   0.0032  145.91 (817180) 2.3747  5.9131    0 (0)     13198 (664794,664795,667971,667972,672834,..925077)
> 56  1080000   0.0032  140.91 (788680) 2.4499  5.8216    0 (0)     13403 (641917,658567,662767,664794,664795,..925077)
> 57  1080000   0.0032  141.38 (707776) 1.2948  3.8831    0 (0)     5041 (654816,654817,658320,658321,658566,..757666)
> 58  1080000   0.0032  149.73 (707776) 1.2131  3.6946    0 (0)     4076 (641916,641917,654136,654816,654817,..739225)
> 59  1080000   0.0032  51.02 (220341)  1.3073  3.1542    0 (0)     1869 (138187,145140,145141,147822,147823,..1021026)
> 60  1080000   0.0032  119.93 (313205) 1.6518  5.2116    0 (0)     9504 (3019,3020,12955,12956,25645,..1078275)
> 61  1080000   0.0032  149.25 (707776) 1.2933  3.5546    0 (0)     3393 (631761,631762,641916,641917,647521,..732562)
> 62  1080000   0.0032  126.60 (222973) 2.0194  5.6079    0 (0)     11357 (3019,3020,12955,12956,14420,..1078275)
> 63  1080000   0.0032  126.60 (222973) 2.0223  5.6224    0 (0)     11452 (3019,3020,12955,12956,14420,..1078275)
> 
> Same kernel, tick skew enabled, with nohz and push/pull (100% pinned
> load...) disabled for the isolated cpuset.  This is 10us or so better
> than 33-rt can do on this box with nohz=off, i.e. that's roughly the
> jitter that cpupri_set() induces (it _can_ double that very rarely, it seems).
> 
> So with a couple little tweaks, 3.0-rt performs better than 33-rt (and
> can dynamically become "green" again when not running picky rt load)
> despite being a little fatter.  'Course if I applied the same dinky
> tweaks to 33-rt, the weight gain would show.  Anyway, the numbers..
> 
> FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
> FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
> FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
> on your marks... get set... POW!
> Cpu Frames    Min     Max(Frame)      Avg     Sigma     LastTrans Fliers(Frames) 
> 4   3456000   0.0159  5.98 (1957035)  0.1275  0.2979    0 (0)     
> 5   3456000   0.0159  6.21 (2641598)  0.2173  0.3444    0 (0)     
> 6   3456000   0.0159  5.26 (1313825)  0.1599  0.2956    0 (0)     
> 7   3456000   0.0159  5.98 (346106)   0.1632  0.2877    0 (0)     
> 8   3456000   0.0159  5.50 (70893)    0.1437  0.3450    0 (0)     
> 9   3456000   0.0159  5.98 (1550901)  0.1381  0.3502    0 (0)     
> 10  3456000   0.0159  5.74 (106100)   0.1478  0.3313    0 (0)     
> 11  3456000   0.0159  5.71 (3174550)  0.1413  0.3090    0 (0)     
> 12  3456000   0.0159  5.02 (1506694)  0.1761  0.3098    0 (0)     
> 13  3456000   0.0159  5.71 (3054611)  0.1768  0.3546    0 (0)     
> 14  3456000   0.0159  5.02 (3148871)  0.1299  0.3062    0 (0)     
> 15  3456000   0.0159  4.99 (2122036)  0.1521  0.3132    0 (0)     
> 16  3456000   0.0159  6.42 (1728959)  0.1521  0.3905    0 (0)     
> 17  3456000   0.0159  6.21 (854434)   0.1618  0.3652    0 (0)     
> 18  3456000   0.0159  6.93 (2190440)  0.1418  0.3548    0 (0)     
> 19  3456000   0.0159  6.90 (1614252)  0.2075  0.4128    0 (0)     
> 20  3456000   0.0159  5.47 (136316)   0.2002  0.3977    0 (0)     
> 21  3456000   0.0159  6.69 (1057262)  0.1435  0.3475    0 (0)     
> 22  3456000   0.0159  6.66 (3123382)  0.1602  0.3585    0 (0)     
> 23  3456000   0.0159  5.94 (2297025)  0.2283  0.3616    0 (0)     
> 24  2397600   0.0587  6.38 (991357)   0.2580  0.3817    0 (0)     
> 25  2397600   0.0587  6.73 (1162518)  0.2380  0.3730    0 (0)     
> 26  2397600   0.0587  7.21 (733474)   0.2502  0.3590    0 (0)     
> 27  2397600   0.0587  6.86 (1873716)  0.2280  0.3768    0 (0)     
> 28  2397600   0.0587  7.21 (2296767)  0.2521  0.3884    0 (0)     
> 29  2397600   0.0587  7.21 (616888)   0.4165  0.4887    0 (0)     
> 30  2397600   0.0587  7.09 (458995)   0.4245  0.4577    0 (0)     
> 31  2397600   0.0587  6.14 (1674893)  0.3974  0.4544    0 (0)     
> 32  2397600   0.0587  7.45 (130233)   0.4440  0.5456    0 (0)     
> 33  2397600   0.0587  7.09 (1453350)  0.2482  0.3813    0 (0)     
> 34  2397600   0.0587  6.73 (2365066)  0.2886  0.3827    0 (0)     
> 35  2397600   0.0587  6.14 (35955)    0.2556  0.3841    0 (0)     
> 36  2397600   0.0587  6.62 (2145554)  0.2566  0.3933    0 (0)     
> 37  2397600   0.0587  7.81 (130234)   0.5375  0.5129    0 (0)     
> 38  2397600   0.0587  7.33 (130234)   0.4921  0.5255    0 (0)     
> 39  2397600   0.0587  7.57 (130234)   0.4200  0.4901    0 (0)     
> 40  2397600   0.0587  6.62 (2367859)  0.2962  0.4553    0 (0)     
> 41  2397600   0.0587  6.26 (206979)   0.5036  0.5491    0 (0)     
> 42  2397600   0.0587  6.38 (1302660)  0.5093  0.5469    0 (0)     
> 43  2397600   0.0587  6.73 (1825681)  0.5511  0.5734    0 (0)     
> 44  1079999   0.0032  7.39 (91927)    0.4603  0.5291    0 (0)     
> 45  1079999   0.0032  6.92 (977865)   0.3143  0.4378    0 (0)     
> 46  1079999   0.0032  5.96 (1002473)  0.2129  0.3999    0 (0)     
> 47  1079999   0.0032  6.44 (981423)   0.4193  0.5293    0 (0)     
> 48  1079999   0.0032  6.20 (375165)   0.2602  0.4201    0 (0)     
> 49  1079999   0.0032  5.73 (886536)   0.4002  0.5174    0 (0)     
> 50  1079999   0.0032  6.44 (547629)   0.3182  0.4507    0 (0)     
> 51  1079999   0.0032  5.73 (143994)   0.4736  0.5952    0 (0)     
> 52  1079999   0.0032  6.68 (1053525)  0.4753  0.5132    0 (0)     
> 53  1079999   0.0032  6.44 (378576)   0.3686  0.4691    0 (0)     
> 54  1079999   0.0032  6.92 (886639)   0.6017  0.5538    0 (0)     
> 55  1079999   0.0032  6.68 (1055655)  0.4917  0.5232    0 (0)     
> 56  1079999   0.0032  6.44 (293526)   0.2752  0.4340    0 (0)     
> 57  1079999   0.0032  8.59 (913209)   1.1433  0.8550    0 (0)     
> 58  1079999   0.0032  5.25 (259824)   0.2139  0.3702    0 (0)     
> 59  1079999   0.0032  6.68 (245211)   0.2031  0.3665    0 (0)     
> 60  1079999   0.0032  6.44 (895440)   0.4445  0.4867    0 (0)     
> 61  1079999   0.0032  5.96 (896382)   0.2541  0.3923    0 (0)     
> 62  1079999   0.0032  7.16 (895440)   0.5437  0.5162    0 (0)     
> 63  1079999   0.0032  6.44 (895371)   0.5707  0.5135    0 (0)
> 
> So IMHO there is a valid case for keeping NO_HZ a config option for
> folks who can never tolerate the price tag, but as for the nohz=off
> option, methinks that could indeed go away, given it's easy to make an
> on/off switch.  I made one for both nohz and push/pull, just need to
> move it into cpusets and make it pretty enough to live.
> 
> WRT $subject, it seems pretty clear that the RT kernel either wants tick
> skew back.. or collision avoidance radar.. or something.
> 
> 	-Mike
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

