Created attachment 188323 [details]
Patch for the ULE load balancer solving the issue of this bug report.
The load balancer of ULE has a bug which cause it to be executed only once at startup, and it will *never* be executed again.
The reason is simple, when the sched_balance function is called for the first time, the smp is not yet setup, and the function returns.
However the balance_tick variable is not resetted before the return and thus will remains 0 for ever.
Because the condition in sched_clock to trigger the load balancer is :
" balance_ticks && --balance_ticks == 0 ", once balance_ticks is 0, this condition will never be met, and thus the load balancer will never be executed again.
To convince yourself you can even put a panic right before the call to sched_balance_group in sched_balance, it will never panic ...
The fix is rather simple, just move the line resetting the balance_ticks value *before* the if branch "if (smp_started == 0 || rebalance == 0)". That way we are sure that the load balancer is re-scheduled.
I think the analysis is correct.
Might be, sched_initticks() should set the balance_ticks as well, after the balance_interval is initialized ?
sched_setup calls sched_setup_smp which in turns calls sched_balance that will set the balance_ticks.
Maybe, this version use less overhead charges for not SMP and given the sched_random() function.
--- sched_ule.c.orig 2017-11-29 09:23:11.503091000 +0200
+++ sched_ule.c.my1 2017-12-12 13:40:46.330506000 +0200
@@ -880,11 +880,15 @@
struct tdq *tdq;
- if (smp_started == 0 || rebalance == 0)
+ if (smp_started == 0)
balance_ticks = max(balance_interval / 2, 1) +
(sched_random() % balance_interval);
+ if (rebalance == 0)
tdq = TDQ_SELF();
(In reply to Ivan Klymenko from comment #3)
Sorry, I think I said foolishness.