Suppose one creates a simple jail:

    jail -c name=foo command=/bin/sh

It will inherit the default cpuset from the parent jail. You can use cpuset -j to shrink this set:

    % cpuset -g -j 1
    jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7
    % cpuset -g -j 1 -r
    jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7
    % cpuset -j 1 -l 0-3
    % cpuset -g -j 1
    jail 1 mask: 0, 1, 2, 3

However, once you've shrunk the set, you can never expand it. The reason is that the jail set is its own root, so the check against the 'root' mask in cpuset_modify() fails with EINVAL.

I think this is perhaps not the intended behavior. I think that when setting the cpuset of a jail you want to apply the check against the parent jail's mask, not the jail's own mask. In particular, this prevents using cpuset -j to dynamically manage the CPUs available to jails. The alternative is to leave the jails unrestricted and manage the processes in the jail (or create dedicated, named cpusets for each jail and manage those), which is not as convenient for tools operating at the abstraction level of a jail.

One possibility might be to have cpuset_getroot() always skip over the passed-in set to its parent at least once before checking for the ROOT flag (or fix callers to pass set->cs_parent instead of set), but I haven't looked at what other implications that might have.
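To make the failure mode above concrete, here is a minimal toy model in Python. It is not the kernel code: the `CpuSet` class, its fields, and the helper names `getroot`, `getroot_skip_self`, and `modify` are hypothetical stand-ins that only mirror the check described in this comment. The real cpuset_modify() does more (privilege checks, propagation to child sets), none of which is modelled here.

```python
import errno


class CpuSet:
    """Toy stand-in for a kernel cpuset; field names are illustrative only."""

    def __init__(self, mask, parent=None, is_root=False):
        self.mask = frozenset(mask)
        self.parent = parent
        self.is_root = is_root


def getroot(cs):
    """Walk up until a set flagged as root is found (this may be cs itself)."""
    while not cs.is_root and cs.parent is not None:
        cs = cs.parent
    return cs


def getroot_skip_self(cs):
    """Suggested variant: step to the parent once before looking for a root."""
    if cs.parent is not None:
        cs = cs.parent
    return getroot(cs)


def modify(cs, new_mask, find_root=getroot):
    """Return 0 on success, or errno.EINVAL if new_mask exceeds the root's mask."""
    new_mask = frozenset(new_mask)
    root = find_root(cs)
    if not new_mask <= root.mask:
        return errno.EINVAL
    cs.mask = new_mask
    return 0


host = CpuSet(range(8), is_root=True)
jail = CpuSet(range(8), parent=host, is_root=True)

assert modify(jail, range(4)) == 0             # shrinking works
assert modify(jail, range(8)) == errno.EINVAL  # jail is its own root: cannot grow
assert modify(jail, range(8), getroot_skip_self) == 0  # checked against the parent
```

In this model the jail's set is flagged as a root, so `getroot` returns the jail's own (already shrunk) set and any wider mask is rejected. Skipping to the parent first validates the request against the host's mask instead, which is the behavior the report argues for.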
(In reply to John Baldwin from comment #0)

I am not sure I understand it correctly. Is it a problem with nested jails, or also from the host itself? I have been running jails with cpuset for many years and it has worked for me:

    # cpuset -j 3 -g
    jail 3 mask: 3, 4
    # cpuset -j 3 -l 3-5
    # cpuset -j 3 -g
    jail 3 mask: 3, 4, 5

This is on FreeBSD 11.3, with no nested jails.
(In reply to Miroslav Lachman from comment #1) It doesn't matter if nested or not, jails in 12.X or CURRENT cannot extend their cpuset. However, you're right, in FreeBSD 11.X it works quite well.
(In reply to Luca Pizzamiglio from comment #2) Ah, thank you for the clarification. I hope this regression will be fixed before we plan to upgrade to 12.x.
Created attachment 219867
git(1) diff against base

I think the attached is a recommendation that I'm happy with; you can't globally let cpuset_getbase() find the root's root, but you can fix the restrictions in cpuset_modify(). With this:

(viper = host, boo = jail, boo.foo = jail nested inside boo)

```
boo# cpuset -gi
pid -1 cpuset id: 4
boo# cpuset -g
pid -1 mask: 0, 1, 2, 3
pid -1 domain policy: first-touch mask: 0
boo# cpuset -l 0,1,2 -s 4
cpuset: setaffinity: Operation not permitted
root@viper:/usr/home/kevans# cpuset -l 0,1,2 -s 4
boo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0
root@viper:/usr/home/kevans# jail -c name=boo.foo path=/ command=/bin/sh
boo.foo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0
boo.foo# cpuset -c
boo.foo# cpuset -gi
pid -1 cpuset id: 5
boo# cpuset -l 0,1 -s 5
boo.foo# cpuset -g
pid -1 mask: 0, 1
pid -1 domain policy: first-touch mask: 0
```

So every jail can modify a subordinate jail's root, but not its own root, all the way up to prison0 root. root can restrict a jail to 1,2 or widen it back to 1,2,3 and that jail can delegate a subset of those to child jails.
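The rule summarized above can be sketched as a small Python model. This is an illustration under stated assumptions, not the attached patch: `Prison`, `is_ancestor`, and `set_jail_root_mask` are hypothetical names, the host is assumed to have CPUs 0-3, and returning EINVAL for a mask that exceeds the parent's is an assumption carried over from comment #0. Propagation of a shrunk mask down to child jails is not modelled.

```python
import errno


class Prison:
    """Toy jail: a name, a parent prison, and the CPU mask of its root cpuset."""

    def __init__(self, name, mask, parent=None):
        self.name = name
        self.parent = parent
        self.mask = frozenset(mask)


def is_ancestor(a, b):
    """True if prison a is a strict ancestor of prison b."""
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def set_jail_root_mask(caller, target, new_mask):
    """Return 0 on success, errno.EPERM or errno.EINVAL on failure."""
    new_mask = frozenset(new_mask)
    # A jail may change a subordinate jail's root set, never its own.
    if not is_ancestor(caller, target):
        return errno.EPERM
    # Assumption: the new mask is validated against the parent jail's mask.
    if not new_mask <= target.parent.mask:
        return errno.EINVAL
    target.mask = new_mask
    return 0


viper = Prison("viper", range(4))            # host (prison0), assumed CPUs 0-3
boo = Prison("boo", viper.mask, viper)       # jail
foo = Prison("boo.foo", boo.mask, boo)       # jail nested inside boo

assert set_jail_root_mask(boo, boo, {0, 1, 2}) == errno.EPERM   # own root
assert set_jail_root_mask(viper, boo, {0, 1, 2}) == 0           # host restricts boo
assert set_jail_root_mask(boo, foo, {0, 1}) == 0                # boo delegates a subset
assert set_jail_root_mask(viper, boo, range(4)) == 0            # host widens boo again
```

The first assertion corresponds to the "Operation not permitted" line in the transcript, and the next two to the `cpuset -l` calls issued from viper and from boo.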
base r368779 alleviates this; unfortunately I had forgotten to tag this PR in it.
I've just tested the expansion of a jail cpuset on CURRENT and it works. 11.x is not affected; the bug was introduced when cpuset was reworked to manage memory domains.

Have you already merged it to 12-STABLE as well? If yes, we can close this PR.
Ah, indeed, sorry; I seem to have merged it in 24a8ea4df3426dfce2896e265eb3e0206aa33a21. Thanks!