Bug 240687 - jail cpuset masks cannot be expanded
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Luca Pizzamiglio
URL: https://reviews.freebsd.org/D21890
Reported: 2019-09-19 14:56 UTC by John Baldwin
Modified: 2020-11-21 21:20 UTC (History)

git(1) diff against base (1.43 KB, patch)
2020-11-21 21:20 UTC, Kyle Evans

Description John Baldwin freebsd_committer freebsd_triage 2019-09-19 14:56:08 UTC
Suppose one creates a simple jail:

jail -c name=foo command=/bin/sh

It will inherit the default cpuset from the parent jail.  You can use cpuset -j to shrink this set:

% cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7

% cpuset -g -j 1 -r
jail 1 mask: 0, 1, 2, 3, 4, 5, 6, 7

% cpuset -j 1 -l 0-3

% cpuset -g -j 1
jail 1 mask: 0, 1, 2, 3

However, once you've shrunk the set, you can never expand it.  The reason is that the jail set is its own root, so the check against the 'root' mask in cpuset_modify() fails with EINVAL.  I suspect this is not the intended behavior.  When setting the cpuset of a jail, I think you want the check applied against the parent jail's mask, not the jail's own mask.
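
A toy model of the failure described above, written in Python rather than kernel C. The names CpuSet, getroot() and modify() are hypothetical stand-ins, not the kernel's structures or functions, and the logic assumes only what this report states: the jail's set is flagged as a root, and the new mask is checked against the nearest root's mask.

```python
import errno


class CpuSet:
    """Toy stand-in for a cpuset: a CPU mask, a parent link, and a root flag."""

    def __init__(self, mask, parent=None, is_root=False):
        self.mask = frozenset(mask)
        self.parent = parent
        self.is_root = is_root


def getroot(cs):
    """Walk up until a set flagged as root is found; a root set returns itself."""
    while not cs.is_root and cs.parent is not None:
        cs = cs.parent
    return cs


def modify(cs, new_mask):
    """Return 0 on success, or errno.EINVAL if new_mask exceeds the root's mask."""
    new_mask = frozenset(new_mask)
    root = getroot(cs)
    if not new_mask <= root.mask:
        return errno.EINVAL
    cs.mask = new_mask
    return 0


# Assumed setup: an 8-CPU host and one jail whose set is its own root.
host = CpuSet(range(8), is_root=True)
jail = CpuSet(range(8), parent=host, is_root=True)

print(modify(jail, range(4)))  # shrink to 0-3: checked against the jail's own mask
print(modify(jail, range(8)))  # expand to 0-7: checked against the already-shrunk mask
```

In this model the shrink succeeds and the later expansion is rejected, because getroot() returns the jail's own set and the mask being compared against has already been narrowed.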

In particular, this prevents using cpuset -j to dynamically manage the CPUs available to jails.  The alternative is to leave the jails unrestricted and manage the processes in the jail (or create dedicated, named cpusets for each jail and manage those), which is not as convenient for tools operating at the abstraction level of a jail.

One possibility might be to have cpuset_getroot() always skip over the passed-in set to its parent at least once before checking for the ROOT flag (or fix callers to pass set->cs_parent instead of set), but I haven't looked at what other implications that might have.
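
The idea floated above can be sketched the same way. This is only an illustration of the "skip to the parent once" lookup in a toy Python model; CpuSet, getroot_skip_self() and can_set_mask() are hypothetical names, and nothing here says whether such a change would be safe for other callers in the kernel.

```python
class CpuSet:
    """Toy stand-in for a cpuset: a CPU mask, a parent link, and a root flag."""

    def __init__(self, mask, parent=None, is_root=False):
        self.mask = frozenset(mask)
        self.parent = parent
        self.is_root = is_root


def getroot_skip_self(cs):
    """Step to the parent once (if there is one), then walk up to the nearest root."""
    if cs.parent is not None:
        cs = cs.parent
    while not cs.is_root and cs.parent is not None:
        cs = cs.parent
    return cs


def can_set_mask(cs, new_mask):
    """True if new_mask fits inside the mask of the root found above cs."""
    return frozenset(new_mask) <= getroot_skip_self(cs).mask


# Assumed setup: an 8-CPU host and a jail already shrunk to CPUs 0-3.
host = CpuSet(range(8), is_root=True)
jail = CpuSet(range(4), parent=host, is_root=True)

print(can_set_mask(jail, range(8)))   # compared against the host's mask
print(can_set_mask(jail, range(16)))  # asks for CPUs the host does not have
```

With the lookup starting one level up, the jail's mask is compared against its parent's mask, so widening back to 0-7 passes while a request beyond the host's CPUs does not.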
Comment 1 Miroslav Lachman 2019-10-04 13:48:16 UTC
(In reply to John Baldwin from comment #0)
I am not sure I understand this correctly. Is it a problem with nested jails, or with setting the mask from the host itself?

I have been running jails with cpuset for many years and it has worked for me:

# cpuset -j 3 -g
jail 3 mask: 3, 4

# cpuset -j 3 -l 3-5

# cpuset -j 3 -g
jail 3 mask: 3, 4, 5

This is on FreeBSD 11.3, with no nested jails.
Comment 2 Luca Pizzamiglio freebsd_committer 2019-10-04 16:06:22 UTC
(In reply to Miroslav Lachman from comment #1)

It doesn't matter if nested or not, jails in 12.X or CURRENT cannot extend their cpuset.
However, you're right, in FreeBSD 11.X it works quite well.
Comment 3 Miroslav Lachman 2019-10-04 17:11:24 UTC
(In reply to Luca Pizzamiglio from comment #2)
Ah, thank you for the clarification. I hope this regression will be fixed before we plan to upgrade to 12.x.
Comment 4 Kyle Evans freebsd_committer 2020-11-21 21:20:13 UTC
Created attachment 219867 [details]
git(1) diff against base

I think the attached patch is an approach that I'm happy with; you can't globally let cpuset_getbase() find the root's root, but you can fix the restrictions in cpuset_modify(). With this:

(viper = host, boo = jail, boo.foo = jail nested inside boo)
boo# cpuset -gi
pid -1 cpuset id: 4
boo# cpuset -g
pid -1 mask: 0, 1, 2, 3
pid -1 domain policy: first-touch mask: 0
boo# cpuset -l 0,1,2 -s 4
cpuset: setaffinity: Operation not permitted

root@viper:/usr/home/kevans# cpuset -l 0,1,2 -s 4

boo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0

root@viper:/usr/home/kevans# jail -c name=boo.foo path=/ command=/bin/sh

boo.foo# cpuset -g
pid -1 mask: 0, 1, 2
pid -1 domain policy: first-touch mask: 0
boo.foo# cpuset -c
boo.foo# cpuset -gi
pid -1 cpuset id: 5

boo# cpuset -l 0,1 -s 5

boo.foo# cpuset -g
pid -1 mask: 0, 1
pid -1 domain policy: first-touch mask: 0

So every jail can modify a subordinate jail's root set, but not its own root set, all the way up to prison0's root set. The host's root user can restrict a jail to 1,2 or widen it back to 1,2,3, and that jail can delegate a subset of those CPUs to its child jails.
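
The resulting rule can be summarised with a small Python model. This is a sketch of the behaviour shown in the transcript, not the attached patch: Jail, is_descendant() and set_jail_root_mask() are hypothetical names, treating "subordinate" as "any descendant" is an assumption, the exception used for an out-of-range mask is an assumption (only the "Operation not permitted" case appears above), and propagation of a shrunk mask to existing child jails is left out.

```python
class Jail:
    """Toy jail: a name, a parent jail (None for the host), and a root CPU mask."""

    def __init__(self, name, mask, parent=None):
        self.name = name
        self.parent = parent
        self.root_mask = frozenset(mask)


def is_descendant(jail, ancestor):
    """True if ancestor appears somewhere above jail in the jail hierarchy."""
    j = jail.parent
    while j is not None:
        if j is ancestor:
            return True
        j = j.parent
    return False


def set_jail_root_mask(caller, target, new_mask):
    """Apply new_mask to target's root set on behalf of a process in caller."""
    new_mask = frozenset(new_mask)
    # A jail may not touch its own root set, only those of jails beneath it.
    if not is_descendant(target, caller):
        raise PermissionError("Operation not permitted")
    # The new mask is bounded by the parent jail's mask, not the target's own.
    if not new_mask <= target.parent.root_mask:
        raise ValueError("mask exceeds the parent jail's CPUs")  # assumed error
    target.root_mask = new_mask


# Replay of the transcript: viper is the host, boo a jail, boo.foo nested in boo.
viper = Jail("viper", range(4))
boo = Jail("boo", range(4), parent=viper)

try:
    set_jail_root_mask(boo, boo, {0, 1, 2})      # boo acting on its own root set
except PermissionError as exc:
    print("boo:", exc)

set_jail_root_mask(viper, boo, {0, 1, 2})        # the host shrinks boo
foo = Jail("boo.foo", boo.root_mask, parent=boo)
set_jail_root_mask(boo, foo, {0, 1})             # boo delegates a subset to boo.foo
set_jail_root_mask(viper, boo, {0, 1, 2, 3})     # the host widens boo again
print(sorted(boo.root_mask), sorted(foo.root_mask))
```

The key difference from the original behaviour is the second check: the bound comes from the parent jail, so a set that was shrunk earlier can be widened again by whoever controls the parent.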