Bug 231773 - Nested jails: "IPv4 addresses clash"
Summary: Nested jails: "IPv4 addresses clash"
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Jamie Gritton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-28 07:50 UTC by Kristof Provost
Modified: 2023-01-30 01:41 UTC

See Also:


Attachments
Automated test (2.00 KB, text/plain)
2018-09-28 18:40 UTC, Kristof Provost
Patch to fix the original bug (i.e. to keep the reported "bug" broken) (1.06 KB, patch)
2018-09-30 03:23 UTC, Jamie Gritton

Description Kristof Provost freebsd_committer freebsd_triage 2018-09-28 07:50:45 UTC
I recently upgraded one of my stable/11 machines to head (so what will be stable/12 soon), and ran into an issue with my nested jails setup.

I run standard (i.e. non-vnet) jails, by assigning e.g. 172.16.4.2 to the parent jail, and starting a jail with that IP address from within that jail (basically, so I can delegate a jailed setup to other people).

That used to work on stable/11, but now it fails with ‘IPv4 addresses clash’.
To the extent that I understand the relevant code it looks like we try to verify that no other jail uses the address we’re trying to assign to the new jail.

I think this block tries to ensure we start at either the host system or the vnet jail that’s hosting us. I suspect that’s just wrong, because we don’t do that if VIMAGE is not enabled (and there’s no need either, because this check will already have been done for the parent jail).

    #ifdef VIMAGE
                for (; tppr != &prison0; tppr = tppr->pr_parent)
                        if (tppr->pr_flags & PR_VNET)
                                break;
    #endif

I can work around the problem by resetting 'tppr = ppr;' just before the FOREACH_PRISON_DESCENDANT() loop.
(The IPv6 code has exactly the same problem.)
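
A minimal userland sketch of the walk quoted above, to make the difference between the two starting points concrete. It is not the kernel code: struct prison is reduced to two fields, the PR_VNET value is illustrative, and clash_search_root() is a hypothetical helper name. With the VIMAGE walk the search starts at the host or at the nearest vnet ancestor, so (as I read it) the parent jail itself ends up among the descendants that get scanned; with 'tppr = ppr;' only the parent's own descendants are scanned.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct prison. */
struct prison {
	struct prison *pr_parent;	/* NULL only for the host (prison0) */
	unsigned int pr_flags;
};

#define PR_VNET 0x1u	/* illustrative value, not the kernel's */

/*
 * Walk up from the new jail's parent (ppr) until we reach either the host
 * or a jail that owns its own network stack.  The clash check would then
 * scan the descendants of the prison returned here.
 */
static struct prison *
clash_search_root(struct prison *ppr, struct prison *host)
{
	struct prison *tppr;

	for (tppr = ppr; tppr != host; tppr = tppr->pr_parent)
		if (tppr->pr_flags & PR_VNET)
			break;
	return (tppr);
}
```

For a non-vnet parent directly under the host, this returns the host, which is why a nested jail reusing its parent's address is compared against the parent.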

Presumably the trigger for this is the enabling of VIMAGE in CURRENT.
Comment 1 Lukasz Wasikowski 2018-09-28 17:58:06 UTC
I'm also using the same IP for many jails, it's convenient. I hope this feature won't be removed.
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2018-09-28 18:40:24 UTC
Created attachment 197577
Automated test

I wrote up a quick automated test for this problem.
It just checks that we can start (well, that jail doesn't return errors) a nested jail.
Comment 3 Kristof Provost freebsd_committer freebsd_triage 2018-09-28 18:42:29 UTC
While writing this test case I also noticed something else unexpected: trying to terminate the nested jail from the host (prison0) results in 'jail: "nestedjail" not found'.

Doing so from within the base jail (which started "nestedjail") does succeed.
Comment 4 Jamie Gritton freebsd_committer freebsd_triage 2018-09-28 22:42:59 UTC
(In reply to Kristof Provost from comment #3)
Were you trying to remove "basejail.nestedjail" or just "nestedjail"?  You need the fully qualified name to operate on the nested jail from the system level.
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2018-09-28 22:45:00 UTC
(In reply to Jamie Gritton from comment #4)
Just "nestedjail". That explains it, thanks.
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-09-29 21:11:33 UTC
(In reply to Kristof Provost from comment #0)

As per jail(8) man page:

  ip4.addr
             A list of IPv4 addresses assigned to the jail.  If this is set,
             the jail is restricted to using only these addresses.  Any
             attempts to use other addresses fail, and attempts to use
             wildcard addresses silently use the jailed address instead.  For
             IPv4 the first address given will be used as the source address
             when source address selection on unbound sockets cannot find a
             better match.

             !!!!!!!!!

             It is only possible to start multiple jails with
             the same IP address if none of the jails has more than this
             single overlapping IP address assigned to itself.

             !!!!!!!!!


Everything else is unfortunately a bug and cannot work deterministically
and might lead to all kinds of problems.   Robert and I spent a lot of time
on this before multi-IP jails went in and the conclusion was "you cannot get
that right".

It had also been "documented" in the original jail code:
https://svnweb.freebsd.org/base/head/sys/kern/kern_jail.c?annotate=185435#l534
https://svnweb.freebsd.org/base/head/sys/kern/kern_jail.c?annotate=185435#l209

Now for nested jails that brings a problem of its own, as I can see "delegate
10 addresses to customer A" and "customer A delegates 1 address to
customers {m,n,o,p,..}" himself by starting a nested jail; sounds great in theory,
in practice it's a recipe for disaster.   A vnet jail as the base jail and then
single-IP jails for the nested ones is probably the best idea?

No matter what people had done in the past in any overlapping services cases,
they got nondeterministic, random behaviour.

Given people also use jails as security compartments, fixing this to the documented
behaviour is the only solution I can see; sorry.
Comment 7 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-09-29 23:53:23 UTC
(In reply to Bjoern A. Zeeb from comment #6)

Hmm, I was thinking some more about this.

What definitely should not work is:

basejail:  192.0.2.2,192.0.2.3,192.0.2.4,192.0.2.5
basejail.foo:  192.0.2.3,192.0.2.4

Now both jails have more than one address.

I have to think some more about nested jails.

Let's start flat.

With just 1st level jails (as in the old days without nested jails):
jailA:  192.0.2.1
jailB:  192.0.2.1

is fine.

jailA:   192.0.2.1,192.0.2.2
jailB:   192.0.2.1,192.0.2.33
is not fine.

jailA:  192.0.2.1
jailB:  192.0.2.1,192.0.2.33
is not fine either I believe.

I think the conclusion is if jailA is a child of jailB it's equally not fine as for the PCB it's a flat space.

Jamie can you go back and read the old pre-nested jails code I previously cited and independently confirm that my memory serves me right?
Otherwise we might have to download any of 7.2/7.3/7.4-R and check the behaviour there to be sure.
Comment 8 Jamie Gritton freebsd_committer freebsd_triage 2018-09-30 03:20:04 UTC
(In reply to Bjoern A. Zeeb from comment #7)
That's right - duplicates are allowed only if both jails have exactly one IP address.  In particular, the code contained: ((ip4s > 0 && tpr->pr_ip4s > 1) || (ip4s > 1 && tpr->pr_ip4s > 0)).  While the specifics of the test changed with the hierarchical jail code, the fact remained that only single-IP jails could have the same address as other single-IP jails.
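
A minimal sketch of that rule as a standalone predicate, applied to the three flat cases from comment 7. It is only a model, not the kernel test: addresses are plain host-order integers, and IP4(), lists_overlap() and addrs_clash() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustration only). */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* True if the two address lists share at least one address. */
static bool
lists_overlap(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
	for (size_t i = 0; i < na; i++)
		for (size_t j = 0; j < nb; j++)
			if (a[i] == b[j])
				return (true);
	return (false);
}

/*
 * A shared address is tolerated only when both jails hold exactly one
 * address; any other overlap counts as a clash.
 */
static bool
addrs_clash(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
	if (na == 1 && nb == 1)
		return (false);
	return (lists_overlap(a, na, b, nb));
}
```

Under this model, jailA and jailB both holding only 192.0.2.1 do not clash, while either of them adding a second address makes the shared 192.0.2.1 a clash.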

Note that while we don't allow a child jail to have a partial match of IP addresses, we do allow it to have a full match (by not specifying ip4.addr).  Otherwise the only choice for a child jail would be to have no IP addresses.

Why are single-address jails allowed to be duplicates, and why only of other single-address jails?  I understand the general nature of the problem, but the discussion was before my time, and these subtleties are beyond me.
Comment 9 Jamie Gritton freebsd_committer freebsd_triage 2018-09-30 03:23:26 UTC
Created attachment 197613
Patch to fix the original bug (i.e. to keep the reported "bug" broken)

Here's the patch to make the non-VIMAGE case behave the same as the VIMAGE case.  This is how I should have made the original VIMAGE change way back when.
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-09-30 06:23:54 UTC
(In reply to Jamie Gritton from comment #8)

For single-IP jails, laddr operations on inaddr_any are automagically translated to the jail's single IP address; hence there can be no duplicates and thus no undefined behaviour.  Multi-IP jails will bind to inaddr_any, and if you have two jails with overlapping IP ranges your packets/connections will go to one or the other in nondeterministic ways.
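
A rough sketch of the remapping described here, assuming nothing beyond what this comment says. ANY_ADDR stands in for INADDR_ANY and effective_bind_addr() is a hypothetical name; the real kernel path does more, such as rejecting addresses the jail does not own.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ADDR 0u	/* stands in for INADDR_ANY */

/*
 * Model of the local address a jailed bind ends up with: a single-IP jail
 * asking for the wildcard gets its one address instead, while a multi-IP
 * jail keeps the wildcard.  Explicit addresses pass through unchanged.
 */
static uint32_t
effective_bind_addr(uint32_t requested, const uint32_t *jail_addrs, size_t n)
{
	if (requested == ANY_ADDR && n == 1)
		return (jail_addrs[0]);
	return (requested);
}
```

Two single-IP jails sharing an address therefore never hold a wildcard binding, whereas two multi-IP jails with overlapping lists can both end up bound to the wildcard.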
Comment 11 Kristof Provost freebsd_committer freebsd_triage 2018-09-30 08:58:19 UTC
(In reply to Bjoern A. Zeeb from comment #6)
Okay, thanks for the clarification. I can probably re-do my config around vnet jails.
Comment 12 Jamie Gritton freebsd_committer freebsd_triage 2018-09-30 14:40:05 UTC
(In reply to Bjoern A. Zeeb from comment #10)

The remapping of single-IP bindings from inaddr_any to the address does indeed make me wonder about your previous note:

> jailA:  192.0.2.1
> jailB:  192.0.2.1,192.0.2.33
> is not fine either I believe.
> 
> I think the conclusion is if jailA is a child of jailB it's equally not fine as for the PCB it's a flat space.

This makes me think it would be fine if jailA is a child of jailB, at least as far as inaddr_any binding goes.  It would be analogous to the case of jailA under the base system.

But it could still be a problem as far as localhost remapping goes.  Of course, any child jail is going to have such a problem when a subset of parent addresses is disallowed; the two jails will always try to bind localhost to the first address in both their lists.
Comment 13 Jamie Gritton freebsd_committer freebsd_triage 2018-10-07 03:07:56 UTC
I've committed r339211 as a result of this bug.  The commit doesn't mention the PR, because I didn't actually *fix* the problem, so much as I formalized the (to many people) new way of doing things as the in-fact-proper *old* way of doing things.

It does at least clear up the discrepancy between VIMAGE and non-VIMAGE jails, but the 12 release will give a few people an unwelcome surprise.  Hopefully a very few people.
Comment 14 Bryan Drewery freebsd_committer freebsd_triage 2019-03-04 22:04:45 UTC
I think https://github.com/freebsd/poudriere/issues/657 is the same problem.
All Poudriere is trying to do is pass 127.0.0.1 down to a nested jail.
I'm not sure where the regression is here but it used to work and now
doesn't.
Comment 15 Bryan Drewery freebsd_committer freebsd_triage 2019-03-04 22:59:11 UTC
What makes no sense here is that the restriction is not applied evenly with
nested jails.

The restriction quoted here would mean that *any jail must have the same IPs as the host/prison0*. If we properly exempt that then why do we not exempt nested jails in the same way? PRISON0->PRISON1 shouldn't be any more special than PRISON1->PRISON2.
Comment 16 Bryan Drewery freebsd_committer freebsd_triage 2019-03-04 22:59:45 UTC
(In reply to Bryan Drewery from comment #15)
> What makes no sense here is that the restriction is not applied evenly with
> nested jails.
> 
> The restriction quoted here would mean that *any jail must have the same IPs
> as the host/prison0*. If we properly exempt that then why do we not exempt
> nested jails in the same way? PRISON0->PRISON1 shouldn't be any more special
> than PRISON1->PRISON2.

Where PRISON2 is nested in PRISON1.