Bug 240687

Summary: jail cpuset masks cannot be expanded
Product: Base System Reporter: John Baldwin <jhb>
Component: kern Assignee: Kyle Evans <kevans>
Status: Closed FIXED    
Severity: Affects Some People CC: 000.fbsd, bugs, emaste, jeff, kevans, pizzamig
Priority: --- Flags: kevans: mfc-stable12+
kevans: mfc-stable11-
Version: CURRENT   
Hardware: Any   
OS: Any   
URL: https://reviews.freebsd.org/D21890
Attachments:
Description Flags
git(1) diff against base none

Description John Baldwin freebsd_committer freebsd_triage 2019-09-19 14:56:08 UTC
Suppose one creates a simple jail:

jail -c name=foo command=/bin/sh

It will inherit the default cpuset from the parent jail.  You can use cpuset -j to shrink this set:

% cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7

% cpuset -g -j 1 -r
jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7

% cpuset -j 1 -l 0-3

% cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3

However, once you've shrunk the set, you can never expand it again.  The reason is that the jail set is its own root, so the check against the 'root' mask in cpuset_modify() compares the new mask with the jail's current (already shrunk) mask and fails with EINVAL.  I don't think this is the intended behavior: when setting the cpuset of a jail, the check should be applied against the parent jail's mask, not the jail's own mask.

In particular, this prevents using cpuset -j to dynamically manage the CPUs available to jails.  The alternative is to leave the jails unrestricted and manage the processes in the jail (or create dedicated, named cpusets for each jail and manage those), which is not as convenient for tools operating at the abstraction level of a jail.

One possibility might be to have cpuset_getroot() always skip over the passed-in set to its parent at least once before checking for the ROOT flag (or to fix callers to pass set->cs_parent instead of set), but I haven't looked at what other implications that might have.
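
To make the failure mode and the suggested lookup change concrete, here is a minimal toy model in Python. It is not the kernel code: the CpuSet class, getroot(), getroot_skip_self() and modify() are hypothetical stand-ins, and the only rule modeled is "the new mask must be a subset of the mask of whichever set the lookup returns".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuSet:
    """Toy stand-in for a kernel cpuset (hypothetical, not struct cpuset)."""
    mask: frozenset
    is_root: bool = False
    parent: Optional["CpuSet"] = None


def getroot(cs):
    """Walk up until a set flagged as a root is found (may be cs itself)."""
    while not cs.is_root and cs.parent is not None:
        cs = cs.parent
    return cs


def getroot_skip_self(cs):
    """Suggested variant: step to the parent once before looking for a root."""
    if cs.parent is not None:
        cs = cs.parent
    return getroot(cs)


def modify(cs, new_mask, find_root=getroot):
    """Apply new_mask only if it fits inside the mask of the chosen root."""
    new_mask = frozenset(new_mask)
    if not new_mask <= find_root(cs).mask:
        return False  # the real code would return an error here
    cs.mask = new_mask
    return True


# Host with CPUs 0-7 and a jail whose set is flagged as its own root.
host = CpuSet(frozenset(range(8)), is_root=True)
jail = CpuSet(frozenset(range(8)), is_root=True, parent=host)

shrunk = modify(jail, range(4))   # 0-3 fits inside the jail's own 0-7
regrow = modify(jail, range(8))   # 0-7 is checked against the jail's own 0-3
regrow_fixed = modify(jail, range(8), find_root=getroot_skip_self)  # checked against the host

print(shrunk, regrow, regrow_fixed)
```

With the default lookup the jail is compared against itself, so any mask wider than the current one is rejected; with the skip-once lookup the comparison is made against the parent. The model ignores permissions, child sets, and memory domains.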
Comment 1 Miroslav Lachman 2019-10-04 13:48:16 UTC
(In reply to John Baldwin from comment #0)
I am not sure I understand this correctly. Is it a problem with nested jails only, or does it also happen from the host itself?

I have been running jails with cpuset for many years and it has worked for me:

# cpuset -j 3 -g
jail 3 mask: 3, 4

# cpuset -j 3 -l 3-5

# cpuset -j 3 -g
jail 3 mask: 3, 4, 5

It is on FreeBSD 11.3, no nested jails.
Comment 2 Luca Pizzamiglio freebsd_committer freebsd_triage 2019-10-04 16:06:22 UTC
(In reply to Miroslav Lachman from comment #1)

It doesn't matter if nested or not, jails in 12.X or CURRENT cannot extend their cpuset.
However, you're right, in FreeBSD 11.X it works quite well.
Comment 3 Miroslav Lachman 2019-10-04 17:11:24 UTC
(In reply to Luca Pizzamiglio from comment #2)
Ah, thank you for the clarification. I hope this regression will be fixed before we plan to upgrade to 12.x.
Comment 4 Kyle Evans freebsd_committer freebsd_triage 2020-11-21 21:20:13 UTC
Created attachment 219867
git(1) diff against base

I think the attached is a recommendation that I'm happy with; you can't globally let cpuset_getbase() find the root's root, but you can fix the restrictions in cpuset_modify(). With this:

(viper = host, boo = jail, boo.foo = jail nested inside boo)
```
boo# cpuset -gi
pid -1 cpuset id: 4
boo# cpuset -g
pid -1 mask: 0, 1, 2, 3
pid -1 domain policy: first-touch mask: 0
boo# cpuset -l 0,1,2 -s 4
cpuset: setaffinity: Operation not permitted

root@viper:/usr/home/kevans# cpuset -l 0,1,2 -s 4

boo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0

root@viper:/usr/home/kevans# jail -c name=boo.foo path=/ command=/bin/sh

boo.foo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0
boo.foo# cpuset -c
boo.foo# cpuset -gi
pid -1 cpuset id: 5

boo# cpuset -l 0,1 -s 5

boo.foo# cpuset -g
pid -1 mask: 0, 1
pid -1 domain policy: first-touch mask: 0
```

So every jail can modify a subordinate jail's root, but not its own root, all the way up to prison0's root. root can restrict a jail to 1,2 or widen it back to 1,2,3, and that jail can delegate a subset of those to child jails.
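
The delegation rule described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the patch: the Jail class and set_root_mask() are hypothetical, the result strings are placeholders rather than kernel error codes, and the host mask of 0-3 is assumed from the transcript. It does not model what happens to child jails when a parent is shrunk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Jail:
    """Toy jail carrying the mask of its root cpuset (hypothetical model)."""
    name: str
    mask: frozenset
    parent: Optional["Jail"] = None

    def is_ancestor_of(self, other):
        """True if self appears somewhere above other in the jail tree."""
        p = other.parent
        while p is not None:
            if p is self:
                return True
            p = p.parent
        return False


def set_root_mask(caller, target, new_mask):
    """Let caller change target's root mask only from above, within the parent's mask."""
    new_mask = frozenset(new_mask)
    if not caller.is_ancestor_of(target):
        return "not permitted"        # e.g. a jail touching its own root
    if not new_mask <= target.parent.mask:
        return "outside parent mask"  # cannot exceed what the parent has
    target.mask = new_mask
    return "ok"


viper = Jail("viper", frozenset({0, 1, 2, 3}))  # host; mask assumed
boo = Jail("boo", viper.mask, parent=viper)

results = []
results.append(set_root_mask(boo, boo, {0, 1, 2}))       # jail on its own root
results.append(set_root_mask(viper, boo, {0, 1, 2}))     # host shrinks boo
foo = Jail("boo.foo", boo.mask, parent=boo)              # nested jail inherits 0,1,2
results.append(set_root_mask(boo, foo, {0, 1}))          # boo restricts its child
results.append(set_root_mask(boo, foo, {0, 1, 2, 3}))    # wider than boo itself
results.append(set_root_mask(viper, boo, {0, 1, 2, 3}))  # host widens boo again

print(results)
```

The sequence mirrors the transcript: the jail's attempt on its own root is refused, the host's change succeeds, the nested jail can be narrowed by its parent, and the final call shows the widening that this PR asks for.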
Comment 5 Kyle Evans freebsd_committer freebsd_triage 2020-12-19 03:32:34 UTC
base r368779 alleviates this; unfortunately I had forgotten to tag this PR in it.
Comment 6 Luca Pizzamiglio freebsd_committer freebsd_triage 2021-01-14 20:51:09 UTC
I've just tested the expansion of a jail cpuset on CURRENT and it works.
11.x is not affected; the bug was introduced when cpuset was reworked to manage memory domains.

Has it already been merged to 12-STABLE as well?
If so, we can close this PR.
Comment 7 Kyle Evans freebsd_committer freebsd_triage 2021-01-14 20:55:18 UTC
Ah, indeed, sorry; I seem to have merged it in 24a8ea4df3426dfce2896e265eb3e0206aa33a21.

Thanks!