Bug 253724

Summary: FreeBSD 13.0-BETA3: jail: cpuset: setaffinity: Resource deadlock avoided
Product: Base System Reporter: rashey
Component: kernAssignee: Kyle Evans <kevans>
Status: Closed FIXED    
Severity: Affects Only Me CC: bugs, chris, kevans
Priority: ---    
Version: 13.0-STABLE   
Hardware: amd64   
OS: Any   
URL: https://reviews.freebsd.org/D28952

Description rashey 2021-02-20 17:47:21 UTC
Hi,
I have upgraded FreeBSD from 12.1-RELEASE to 13.0-BETA3 (fresh install) and I can't assign different CPU cores to host userland and jail at the same time anymore.

# cat /boot/loader.conf
init_script="/etc/init.sh"

# cat /etc/init.sh
#!/bin/sh
cpuset -C -c -l 0 -p 1

# cat /etc/jail.conf
path = "/usr/jail/${name}";
exec.clean;
exec.start      = "sh /etc/rc";
exec.poststart  = "cpuset -l 1 -j ${name}";
exec.stop       = "sh /etc/rc.shutdown jail";
host.hostname   = "${name}";
mount.devfs;
test {
}

# cat /usr/jail/test/etc/rc.conf
dumpdev="NO"
cron_flags="-J 30"
sendmail_enable="NONE"
syslogd_flags="-ss"

# service jail onestart
Starting jails:test: created
ELF ldconfig path: /lib /usr/lib /usr/lib/compat
32-bit compatibility ldconfig path: /usr/lib32
Updating motd:.
Creating and/or trimming log files.
Clearing /tmp (X related).
Updating /var/run/os-release done.
Starting syslogd.
/etc/rc: WARNING: failed to start syslogd
Starting cron.

Sat Feb 20 18:40:41 CET 2021
cpuset: setaffinity: Resource deadlock avoided
jail: test: cpuset -l 1 -j test: failed
test: removed
.
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2021-02-26 22:37:33 UTC
Sorry about that; I think this is my preferred method to solve it: https://reviews.freebsd.org/D28952

The main need that I have is preventing unprivileged users who are restricted to a subset of available CPUs from bypassing that restriction by attaching (allowed by MAC policy) to a jail with a wider mask. The patch above restores the system root's ability to administer such a setup as yours, and allows the previous behavior entirely (i.e. unprivileged users) with a MAC policy.
Comment 2 Kyle Evans freebsd_committer freebsd_triage 2021-02-27 18:38:14 UTC
You could workaround this initial problem by making your exec.poststart an exec.created, but you would need D28952 on top of that for jail creation to succeed as the next step won't be able to attach without widening its mask pre-attach. exec.created is debatably the more appropriate place for resource-control type stuff anyways, though it doesn't matter much at all for cpuset.

We're still discussing in that review and elsewhere whether this can be fixed quickly in a low-impact way besides reverting the original change; we should arrive at something before -rc1.
Comment 3 commit-hook freebsd_committer freebsd_triage 2021-03-01 18:38:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=60c4ec806dfd0f79edf8ca3abc04bbb69c0418f7

commit 60c4ec806dfd0f79edf8ca3abc04bbb69c0418f7
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-02-26 21:46:47 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-01 18:38:31 +0000

    jail: allow root to implicitly widen its cpuset to attach

    The default behavior for attaching processes to jails is that the jail's
    cpuset augments the attaching processes, so that it cannot be used to
    escalate a user's ability to take advantage of more CPUs than the
    administrator wanted them to.

    This is problematic when root needs to manage jails that have disjoint
    sets with whatever process is attaching, as this would otherwise result
    in a deadlock. Therefore, if we did not have an appropriate common
    subset of cpus/domains for our new policy, we now allow the process to
    simply take on the jail set *if* it has the privilege to widen its mask
    anyways.

    With the new logic, root can still usefully cpuset a process that
    attaches to a jail with the desire of maintaining the set it was given
    pre-attachment while still retaining the ability to manage child jails
    without jumping through hoops.

    A test has been added to demonstrate the issue; cpuset of a process
    down to just the first CPU and attempting to attach to a jail without
    access to any of the same CPUs previously resulted in EDEADLK and now
    results in taking on the jail's mask for privileged users.

    PR:             253724
    Reviewed by:    jamie (also discussed with)
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D28952

 lib/libc/tests/sys/cpuset_test.c | 203 ++++++++++++++++++++++++++++++++++++++-
 sys/kern/kern_cpuset.c           |   8 ++
 2 files changed, 210 insertions(+), 1 deletion(-)
Comment 4 Kyle Evans freebsd_committer freebsd_triage 2021-03-01 19:11:21 UTC
The above-referenced commit will fix it, with the caveat that you'll need to move the cpuset invocation from exec.poststart to exec.created so that it's in effect when /etc/rc gets invoked -- this should be portable across all releases, and we'll refine jail(8) further.

I'll MFC the above in 3 days and get it in to 13.0.
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-03-04 02:04:52 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c4ccb6d1be1f00ebcda9e83f06db55f9d6c152ac

commit c4ccb6d1be1f00ebcda9e83f06db55f9d6c152ac
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-02-26 21:46:47 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-04 02:04:10 +0000

    jail: allow root to implicitly widen its cpuset to attach

    The default behavior for attaching processes to jails is that the jail's
    cpuset augments the attaching processes, so that it cannot be used to
    escalate a user's ability to take advantage of more CPUs than the
    administrator wanted them to.

    This is problematic when root needs to manage jails that have disjoint
    sets with whatever process is attaching, as this would otherwise result
    in a deadlock. Therefore, if we did not have an appropriate common
    subset of cpus/domains for our new policy, we now allow the process to
    simply take on the jail set *if* it has the privilege to widen its mask
    anyways.

    With the new logic, root can still usefully cpuset a process that
    attaches to a jail with the desire of maintaining the set it was given
    pre-attachment while still retaining the ability to manage child jails
    without jumping through hoops.

    A test has been added to demonstrate the issue; cpuset of a process
    down to just the first CPU and attempting to attach to a jail without
    access to any of the same CPUs previously resulted in EDEADLK and now
    results in taking on the jail's mask for privileged users.

    PR:             253724

    (cherry picked from commit 60c4ec806dfd0f79edf8ca3abc04bbb69c0418f7)

 lib/libc/tests/sys/cpuset_test.c | 203 ++++++++++++++++++++++++++++++++++++++-
 sys/kern/kern_cpuset.c           |   8 ++
 2 files changed, 210 insertions(+), 1 deletion(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2021-03-04 02:42:59 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5fdc5c5dbac400a2ff05820ba8a63cdec8603c92

commit 5fdc5c5dbac400a2ff05820ba8a63cdec8603c92
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-02-26 21:46:47 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-04 02:41:48 +0000

    jail: allow root to implicitly widen its cpuset to attach

    The default behavior for attaching processes to jails is that the jail's
    cpuset augments the attaching processes, so that it cannot be used to
    escalate a user's ability to take advantage of more CPUs than the
    administrator wanted them to.

    This is problematic when root needs to manage jails that have disjoint
    sets with whatever process is attaching, as this would otherwise result
    in a deadlock. Therefore, if we did not have an appropriate common
    subset of cpus/domains for our new policy, we now allow the process to
    simply take on the jail set *if* it has the privilege to widen its mask
    anyways.

    With the new logic, root can still usefully cpuset a process that
    attaches to a jail with the desire of maintaining the set it was given
    pre-attachment while still retaining the ability to manage child jails
    without jumping through hoops.

    A test has been added to demonstrate the issue; cpuset of a process
    down to just the first CPU and attempting to attach to a jail without
    access to any of the same CPUs previously resulted in EDEADLK and now
    results in taking on the jail's mask for privileged users.

    PR:             253724
    Approved by:    re (gjb)

    (cherry picked from commit 60c4ec806dfd0f79edf8ca3abc04bbb69c0418f7)
    (cherry picked from commit c4ccb6d1be1f00ebcda9e83f06db55f9d6c152ac)

 lib/libc/tests/sys/cpuset_test.c | 203 ++++++++++++++++++++++++++++++++++++++-
 sys/kern/kern_cpuset.c           |   8 ++
 2 files changed, 210 insertions(+), 1 deletion(-)
Comment 7 Kyle Evans freebsd_committer freebsd_triage 2021-03-04 03:10:13 UTC
Leaving this open a little longer; the first half went in time for -rc1 that fixes this setup if you cpuset it before running rc (exec.created). There's still a second change in progress that restores it (mostly) by effectively using the jail's initial cpuset for spawning off rc: https://reviews.freebsd.org/D29008
Comment 8 commit-hook freebsd_committer freebsd_triage 2021-03-04 19:29:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=466df976babed65f8a8de9e36d7f016a444609af

commit 466df976babed65f8a8de9e36d7f016a444609af
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-03-04 19:28:53 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-04 19:28:53 +0000

    jail(8): reset to root cpuset before attaching to run commands

    Recent changes have made it such that attaching to a jail will augment
    the attaching process' cpu mask with the jail's cpuset. While this is
    convenient for allowing the administrator to cpuset arbitrary programs
    that will attach to a jail, this is decidedly not convenient for
    executing long-running daemons during jail creation.

    This change inserts a reset of the process cpuset to the root cpuset
    between the fork and attach to execute a command. This allows commands
    executed to have the widest mask possible, and the administrator can
    cpuset(1) it back down inside the jail as needed.

    With this applied, one should be able to change a jail's cpuset at
    exec.poststart in addition to exec.created.  The former was made
    difficult if jail(8) itself was running with a constrained set, as then
    some processes may have been spawned inside the jail with a non-root
    set.  The latter is the preferred option so that processes starting in
    the jail are constrained appropriately up front.

    Note that all system commands are still run with the process' initial
    cpuset applied.

    PR:             253724
    MFC after:      3 days
    Reviewed by:    jamie
    Differential Revision:  https://reviews.freebsd.org/D29008

 usr.sbin/jail/command.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
Comment 9 commit-hook freebsd_committer freebsd_triage 2021-03-07 21:40:09 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bdd61b6914f1f961b5f414b2d5cc623a5a829b89

commit bdd61b6914f1f961b5f414b2d5cc623a5a829b89
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-03-04 19:28:53 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-07 21:39:29 +0000

    jail(8): reset to root cpuset before attaching to run commands

    Recent changes have made it such that attaching to a jail will augment
    the attaching process' cpu mask with the jail's cpuset. While this is
    convenient for allowing the administrator to cpuset arbitrary programs
    that will attach to a jail, this is decidedly not convenient for
    executing long-running daemons during jail creation.

    This change inserts a reset of the process cpuset to the root cpuset
    between the fork and attach to execute a command. This allows commands
    executed to have the widest mask possible, and the administrator can
    cpuset(1) it back down inside the jail as needed.

    With this applied, one should be able to change a jail's cpuset at
    exec.poststart in addition to exec.created.  The former was made
    difficult if jail(8) itself was running with a constrained set, as then
    some processes may have been spawned inside the jail with a non-root
    set.  The latter is the preferred option so that processes starting in
    the jail are constrained appropriately up front.

    Note that all system commands are still run with the process' initial
    cpuset applied.

    PR:             253724

    (cherry picked from commit 466df976babed65f8a8de9e36d7f016a444609af)

 usr.sbin/jail/command.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
Comment 10 commit-hook freebsd_committer freebsd_triage 2021-03-11 19:25:24 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7a1f6858dabfdb24d9f58a52fd5e6e1fe0ceead1

commit 7a1f6858dabfdb24d9f58a52fd5e6e1fe0ceead1
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2021-03-04 19:28:53 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-03-11 19:24:28 +0000

    jail(8): reset to root cpuset before attaching to run commands

    Recent changes have made it such that attaching to a jail will augment
    the attaching process' cpu mask with the jail's cpuset. While this is
    convenient for allowing the administrator to cpuset arbitrary programs
    that will attach to a jail, this is decidedly not convenient for
    executing long-running daemons during jail creation.

    This change inserts a reset of the process cpuset to the root cpuset
    between the fork and attach to execute a command. This allows commands
    executed to have the widest mask possible, and the administrator can
    cpuset(1) it back down inside the jail as needed.

    With this applied, one should be able to change a jail's cpuset at
    exec.poststart in addition to exec.created.  The former was made
    difficult if jail(8) itself was running with a constrained set, as then
    some processes may have been spawned inside the jail with a non-root
    set.  The latter is the preferred option so that processes starting in
    the jail are constrained appropriately up front.

    Note that all system commands are still run with the process' initial
    cpuset applied.

    PR:             253724
    Approved by:    re (gjb)

    (cherry picked from commit 466df976babed65f8a8de9e36d7f016a444609af)
    (cherry picked from commit bdd61b6914f1f961b5f414b2d5cc623a5a829b89)

 usr.sbin/jail/command.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
Comment 11 Kyle Evans freebsd_committer freebsd_triage 2021-03-11 19:27:26 UTC
Thanks for the report!