Created attachment 188323 [details]
Patch for the ULE load balancer solving the issue of this bug report.
The load balancer of ULE has a bug which cause it to be executed only once at startup, and it will *never* be executed again.
The reason is simple, when the sched_balance function is called for the first time, the smp is not yet setup, and the function returns.
However the balance_tick variable is not resetted before the return and thus will remains 0 for ever.
Because the condition in sched_clock to trigger the load balancer is :
" balance_ticks && --balance_ticks == 0 ", once balance_ticks is 0, this condition will never be met, and thus the load balancer will never be executed again.
To convince yourself you can even put a panic right before the call to sched_balance_group in sched_balance, it will never panic ...
The fix is rather simple, just move the line resetting the balance_ticks value *before* the if branch "if (smp_started == 0 || rebalance == 0)". That way we are sure that the load balancer is re-scheduled.
I think the analysis is correct.
Might be, sched_initticks() should set the balance_ticks as well, after the balance_interval is initialized ?
sched_setup calls sched_setup_smp which in turns calls sched_balance that will set the balance_ticks.
Maybe, this version use less overhead charges for not SMP and given the sched_random() function.
--- sched_ule.c.orig 2017-11-29 09:23:11.503091000 +0200
+++ sched_ule.c.my1 2017-12-12 13:40:46.330506000 +0200
@@ -880,11 +880,15 @@
struct tdq *tdq;
- if (smp_started == 0 || rebalance == 0)
+ if (smp_started == 0)
balance_ticks = max(balance_interval / 2, 1) +
(sched_random() % balance_interval);
+ if (rebalance == 0)
tdq = TDQ_SELF();
(In reply to Ivan Klymenko from comment #3)
Sorry, I think I said foolishness.
This is getting attention due to the recent FreeBSD ULE vs Linux CFS research paper which is what caused this bug to be discovered. Is there further analysis needed here?
Cc’ing a few people who have made recent changes to the ULE scheduler, and therefore might feel confident reviewing this patch.