Bug 250823 - linuxulator mkdir in jail breaks mount
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-emulation (Nobody)
URL:
Keywords:
Depends on:
Blocks: 247219
Reported: 2020-11-03 00:53 UTC by Russell Allen
Modified: 2023-08-23 05:15 UTC (History)

Description Russell Allen 2020-11-03 00:53:44 UTC
I'm running 

    FreeBSD 12.1-RELEASE-p10 FreeBSD 12.1-RELEASE-p10 GENERIC  amd64

which is hosting a jail called 'j1' running

    FreeBSD j1 12.1-RELEASE-p10 FreeBSD 12.1-RELEASE-p10 GENERIC  i386

Inside the jail, the CentOS Linux packages are installed.

A host directory is mounted into the jail:

    mount_nullfs /${BASE}/ourself-manager /${BASE}/j1/self

I can use a Linux binary to check the mounted directory:

    jexec j1 /compat/linux/bin/bash -c "ls /self"

    (shows big list of files)

However, if I attempt to mkdir /self from within a Linux app, it breaks the mount instead of returning EEXIST:

    jexec j1 /compat/linux/bin/bash -c "mkdir /self"
    jexec j1 /compat/linux/bin/bash -c "ls /self"

    (shows empty directory)

Once this has happened, no Linux app within the jail can see the contents of /self, nor can FreeBSD apps started by a Linux app. However, jailed FreeBSD apps started from outside the jail still work, i.e.

    jexec j1 /usr/local/bin/bash -c "ls /self"

    (shows big list of files)

If I set enforce_statfs = 0; in my jail.conf, calling mount within the jail shows /self as still being mounted.

This also occurs when /self is a mounted ZFS dataset instead of a nullfs mount, and it isn't bash-specific: any Linux app seems to have the same result.
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2020-11-03 01:26:03 UTC
I wonder if the same thing happens with a nullfs mount inside a non-linux jail?  It could be linuxemul, except the mkdir() syscall is pretty basic; so my suspicion is that this has more to do with the jail environment and/or nullfs rather than linuxemul.
Comment 2 Russell Allen 2020-11-03 02:04:41 UTC
When I try a FreeBSD binary within the jail:

    jexec j1 /usr/local/bin/bash -c "mkdir /self"

    (returns mkdir: /self: File exists)

as expected.

Also, when I swap out the nullfs mount for another kind of mount, such as a ZFS dataset mounted at /self, I get the same behavior, so I don't think it is a nullfs issue.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2020-11-03 17:55:00 UTC
Thanks!
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2020-11-03 18:07:57 UTC
OK, I wonder if what's happening here relates to the linuxemul automatic-mount-of-/compat/linux-at-/ path emulation goop.  In that case, your mount is still at ${BASE}/j1/self, but mkdir created "/compat/linux/self" in the linuxemul environment.  That second directory is shadowing the first for Linux binaries.  (That's my guess, anyway.)

This seems like a silly thing we do in linuxemul to avoid requiring a chroot to run linux binaries.
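The shadowing guessed at above can be illustrated with a toy model in plain sh. It is only a sketch, not the kernel code: the function names linux_lookup and linux_create_target are made up for illustration, and the rule that a create is redirected whenever the parent directory exists under /compat/linux is an assumption chosen to match the behavior in this report.

```shell
#!/bin/sh
# Toy model of linuxemul path translation. NOT the kernel implementation.
# ROOT is the jail's root directory; PATH_ARG is an absolute path as the
# Linux binary sees it.

# linux_lookup ROOT PATH_ARG
# Existing files: the copy under ROOT/compat/linux wins if it exists,
# otherwise the native path is used.
linux_lookup() {
    root=$1
    path=$2
    if [ -e "${root}/compat/linux${path}" ]; then
        printf '%s\n' "${root}/compat/linux${path}"
    else
        printf '%s\n' "${root}${path}"
    fi
}

# linux_create_target ROOT PATH_ARG
# New files (ASSUMPTION): if the parent directory exists under
# ROOT/compat/linux, the new entry is created there, otherwise natively.
linux_create_target() {
    root=$1
    path=$2
    parent=$(dirname "$path")
    if [ -d "${root}/compat/linux${parent}" ]; then
        printf '%s\n' "${root}/compat/linux${path}"
    else
        printf '%s\n' "${root}${path}"
    fi
}
```

Under this model, "mkdir /self" from a Linux binary targets ROOT/compat/linux/self because ROOT/compat/linux itself exists, and every later lookup of /self from a Linux binary then resolves to that empty directory rather than to the mount.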

Try disabling the compat.linux.use_emul_path sysctl and see if the shadowing disappears:

    sysctl compat.linux.use_emul_path=0
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2020-11-03 18:10:50 UTC
(Separately, yes, it seems like creating an overlay directory via this path emulation should not be possible, but hopefully the workaround resolves the immediate issue.)
Comment 6 Russell Allen 2020-11-03 23:44:42 UTC
That setting doesn't seem to work:

    sysctl compat.linux.use_emul_path=0
    --- sysctl: unknown oid 'compat.linux.use_emul_path'

But your diagnosis seems right, because if I create /compat/linux/self it shadows /self for the Linux binaries, i.e.

    jexec j1 bash -c "mkdir /compat/linux/self"
    --- (returns nothing)

    jexec j1 /compat/linux/bin/bash -c "ls /self"
    --- (returns empty dir)
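
A quick way to look for this kind of shadow directory, run from the host, might look like the sketch below. The function name list_shadowed is made up for illustration, and the jail root and mount points passed to it are whatever applies on the system in question.

```shell
#!/bin/sh
# list_shadowed ROOT PATH...
# Prints each PATH that also exists under ROOT/compat/linux, i.e. each
# path that Linux binaries in the jail may see a different copy of.
list_shadowed() {
    root=$1
    shift
    for p in "$@"; do
        if [ -e "${root}/compat/linux${p}" ]; then
            printf '%s\n' "$p"
        fi
    done
}
```

For example, list_shadowed "${BASE}/j1" /self would print /self after the stray directory has been created. It would also print /self if something has deliberately been mounted at compat/linux/self, so a hit needs a second look before anything is removed.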

For the moment I can mount twice, at ${BASE}/j1/self and at ${BASE}/j1/compat/linux/self. It's a bit kludgy but seems to solve the issue.
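
The double mount could be wrapped roughly as follows. This is a sketch under assumptions: the function name mount_for_both and the DRY_RUN switch are invented here, and by default the function only prints the mount_nullfs commands instead of running them.

```shell
#!/bin/sh
# mount_for_both SRC JAILROOT MOUNTPOINT
# Mounts SRC at JAILROOT/MOUNTPOINT and again at
# JAILROOT/compat/linux/MOUNTPOINT, so native and Linux binaries see the
# same contents. With DRY_RUN unset or set to 1 (the default, an
# assumption of this sketch) the commands are only printed.
mount_for_both() {
    src=$1
    jailroot=$2
    mnt=$3
    for target in "${jailroot}${mnt}" "${jailroot}/compat/linux${mnt}"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'mount_nullfs %s %s\n' "$src" "$target"
        else
            # The mount point has to exist before mount_nullfs is called.
            mkdir -p "$target" && mount_nullfs "$src" "$target"
        fi
    done
}
```

Calling mount_for_both "${BASE}/ourself-manager" "${BASE}/j1" /self prints the two commands; setting DRY_RUN=0 and running as root would perform the mounts.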
Comment 7 Conrad Meyer freebsd_committer freebsd_triage 2020-11-04 00:03:35 UTC
The sysctl might be new in CURRENT, but it sounds like we've narrowed down the root cause.  Thanks.
Comment 8 Mark Linimon freebsd_committer freebsd_triage 2023-08-23 05:15:36 UTC
Canonicalize assignment.