Bug 210813

Summary: vimage ALPHA6 [ipfw will not kldload]
Product: Base System Reporter: Joe Barbish <qjail1>
Component: kern Assignee: Bjoern A. Zeeb <bz>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: bz
Priority: --- Keywords: vimage
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Joe Barbish 2016-07-04 12:58:39 UTC
The host is running ipfilter. A vimage jail tries to kldload ipfw when the "service ipfw start" command is issued from inside the jail, and the load fails. An nd6_dad_timer error message also appears.



epair2a: Ethernet address: 02:c1:a8:00:06:0a
epair2b: Ethernet address: 02:c1:a8:00:07:0b
kldload: can't load ipfw: Operation not permitted
/etc/rc.d/ipfw: WARNING: Unable to load kernel module ipfw

/root >Jul  4 04:38:30 kernel: nd6_dad_timer: cancel DAD on epair2a because of ND6_IFF_IFDISABLED.
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-05 17:08:16 UTC
Can you please try to compile ipfilter into the kernel and see if all your problems go away?   It seems ipfilter (like multicast) is exhausting the per-VNET module data area.
Comment 2 Joe Barbish 2016-07-06 00:52:01 UTC
Let me state here what I think you are trying to say: the host ipfilter firewall is being kldloaded when the ipfilter statements are read from the host's rc.conf file. This is normal behavior.

You think that compiling ipfilter into the kernel along with vimage will allow the host to run ipfilter and enable vnet/vimage jails to kldload ipfw, or pf, when their statements are read from the rc.conf file in the vnet/vimage jail.

What's being tested here is whether a vnet/vimage jail can run a different firewall than the one the host is running. Testing has demonstrated that this is not possible so far, and that is the basis of this PR.

In the case of ipfw, testing has shown that compiling ipfw into the kernel allows the host and the vnet/vimage jails to run the ipfw firewall. What is desired is for ipfw to be kldloaded by the host and for the vnet/vimage jails to use that host-loaded ipfw module as well. This is the next test I will run.

Requiring the desired firewall to be compiled into the kernel along with vimage is not a solution.
Comment 3 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-06 09:09:47 UTC
No. What I am saying is that for anything compiled into the kernel, we know how much global space is needed to duplicate its state for each VNET instance, so that size is a "static constant at compile time".

For modules we need to reserve space, because at load time (either at boot if loaded by the loader, or via kldload after init has run) we have to determine the amount of memory needed for the state of each VNET. We cannot reserve an endless amount of memory for that, and for the last few years this has been set to two pages per VNET (plus a tiny bit of roundup memory depending on the static global state). If the amount of virtualized space needed by all modules for their global state exceeds this, they will (if properly written) fail to load, or they will be loaded but not function (ideally the load should fail).

Thus compiling modules into the kernel, e.g., firewalls or multicast routing or virtual network devices which tend to need quite a bit of virtualized global state space, will help to avoid that general problem.

Unfortunately there is no way to automatically increase that per-module region at run time.  I was pondering adding a loader tunable so people could enlarge it but that turned out to be not trivial either.  We could add a kernel option so people could override the default w/o patching vnet.c.

See https://svnweb.freebsd.org/base/head/sys/net/vnet.c?annotate=302054#l165

The problem with ipfilter seems to be that on amd64 without checking #ifdefs the size of ipf_main_softc_t alone is 2280.  While that should easily fit in combination with other modules it might not anymore, hence my question to try to compile it into the kernel and see if things just work then before we are going off into a long debug session for possible other causes of error.
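
As a rough illustration of the budget described in this comment, the arithmetic can be sketched in shell. The 4096-byte page size is an assumption (the usual amd64 value); the two-page reservation and the 2280-byte structure size are the figures quoted here, and the small roundup is ignored.

```sh
#!/bin/sh
# Rough per-VNET module data budget, using the figures quoted in this comment.
# PAGE_SIZE is an assumption (typical for amd64); check `sysctl hw.pagesize`.
PAGE_SIZE=4096
RESERVED_PAGES=2     # "two pages per VNET"
IPF_SOFTC=2280       # quoted size of ipf_main_softc_t on amd64

budget=$((PAGE_SIZE * RESERVED_PAGES))
remaining=$((budget - IPF_SOFTC))
percent=$((IPF_SOFTC * 100 / budget))   # integer division, rounds down

echo "budget=${budget} ipfilter=${IPF_SOFTC} remaining=${remaining} used=${percent}%"
```

Under these assumptions a single structure already takes more than a quarter of the region, which is why several modules loaded together could plausibly exhaust it.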
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-06 09:12:41 UTC
To address your other comment:

If the firewalls are compiled into the kernel (or the modules get properly loaded), then each VNET instance, like the base system, will be able to use one or more (any combination) of the provided firewalls.

One can then use pf in one VNET, ipfilter in another, ipfw in a third, and all three together in a fourth if one desires (as one could on a plain system if it was just a GENERIC kernel).
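
A minimal sketch of the "modules get properly loaded" case: the module is loaded from the base system before the jail starts, since a jail is not permitted to kldload (hence the "Operation not permitted" above). The KLDSTAT and KLDLOAD variables are hypothetical override hooks, not part of FreeBSD; they exist only so the logic can be exercised without root.

```sh
#!/bin/sh
# Make sure a firewall module is present on the base system before any
# vnet jail tries to use it. Run as root on the host, not inside the jail.
# KLDSTAT/KLDLOAD are hypothetical hooks for testing; by default the real
# kldstat(8) and kldload(8) are used.
ensure_module() {
    mod=$1
    if "${KLDSTAT:-kldstat}" -q -m "$mod"; then
        echo "${mod}: already present"
    elif "${KLDLOAD:-kldload}" "$mod"; then
        echo "${mod}: loaded"
    else
        echo "${mod}: could not be loaded" >&2
        return 1
    fi
}

# Example:
#   ensure_module ipfw
```

Note that ipfw denies all traffic by default unless configured otherwise, so loading it on a host that is only reachable over the network deserves some care.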
Comment 5 Joe Barbish 2016-07-07 21:52:38 UTC
Compiled the kernel with ipfilter compiled in. 

test 1.
Have the ipfilter statements in the host rc.conf commented out so host is not running any firewall at all.

Have ipfw statements in the vnet/vimage jail's rc.conf and when jail starts get the same messages as posted before except the nd6_dad_timer message does not happen.

kldload: can't load ipfw: Operation not permitted
/etc/rc.d/ipfw: WARNING: Unable to load kernel module ipfw


test 2.
Have ipfilter statements in the host rc.conf so host is running ipfilter firewall. 

Have ipfw statements in the vnet/vimage jail's rc.conf and when jail starts get the same messages as posted before except the nd6_dad_timer message does not happen.

kldload: can't load ipfw: Operation not permitted
/etc/rc.d/ipfw: WARNING: Unable to load kernel module ipfw
 

Compiling ipfilter in the kernel changed nothing.
Comment 6 Joe Barbish 2016-07-15 23:26:56 UTC
Compiled ipfirewall and vimage together.

options VIMAGE
options IPFIREWALL
options IPFIREWALL_NAT
options IPDIVERT
options LIBALIAS

My network is like this

Gateway host connected to public internet with LAN behind it.
On LAN is ALPHA6 box being used for testing vnet/vimage.

TEST #1
This ALPHA6 test box is running the generic kernel with ipfw statements in the host's rc.conf. Only have 2 rules in ipfw.
ipfw add 010 allow all from any to any via lo0
ipfw add 010 allow log all from any to any via rl0

At boot time get msg ipfw rules loaded & ipfw logging enabled
ipfw show command shows those 2 rules
Issue ping 8.8.8.8 returns results, meaning box has network connection to public internet. The ipfw log shows the logged packets from the ping command.
This verifies that host generic and ipfw are working.

TEST #2
Everything is the same except this time booted the vimage kernel and vnet jail has ipfw statements in its rc.conf. When vnet jail starts get msg that ipfw rules loaded & logging enabled.
Vnet jail console log has this msg.
Protect: Procctl: operation not permitted.
Logging into the started vnet jail and issuing ping 8.8.8.8 returns 0 packets received. The vnet jail ipfw log is empty. Host ipfw log shows ICMP packets out via epair1b which is the vnet jail.

TEST #3
Everything is the same except this time rebooted the ALPHA6 test box running kernel with vimage/ipfw compiled in. Host boot messages show msg that ipfw rules loaded & logging enabled. Host ping works and host ipfw log shows ICMP packets. When vnet jail starts get msg that ipfw rules loaded & logging enabled. Vnet jail console log has this msg.
Protect: Procctl: operation not permitted.
Logging into the started vnet jail and issuing ping 8.8.8.8 returns 0 packets received. The vnet jail ipfw log is empty. Host ipfw log shows ICMP packets out via epair1b which is the vnet jail.

I have no idea what "Protect: Procctl: operation not permitted." means or whether it has any bearing on what is happening here.

Now the host ipfw log is very interesting. I would think that when the vnet jail ping command is issued, the host ipfw firewall should receive the packet via "in", but we see it via "out" instead.

On another subject: I see BETA1 is out. Have you made changes to vimage or any of the 3 firewalls that are missing from ALPHA6? Do I need to install BETA1 on my test box to test new vimage changes?
Comment 7 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-22 14:23:44 UTC
The "Protect: Procctl: operation not permitted." is unrelated. I assume you start sshd inside the vnet jail. See man 1 protect and man 2 procctl.


With regards to ipfw, it is unclear to me how you connect the vnet to the outside. Is the epair bridged to rl0? How does it get its address?

What happens if you try to ping a host system IP address from the Vnet? Does that work? Can the Vnet ping its default gateway?

Can you use tcpdump on the various host interfaces to follow all incoming/outgoing packets related to the vnet? Start with the epair connected to the vnet and then try the physical interface (probably limit tcpdump to icmp to not log ssh traffic in case you are logged in remotely via the same interface).

Also, what happens if you start the base system the same way, and then start the vnet just without the ipfw firewall? Do things work then?

Just trying to narrow down where the problem in your setup comes from.
Comment 8 Joe Barbish 2016-07-22 15:12:08 UTC
I do not login to the host system remotely. I have host console in front of me. The same is true for logging into the vnet jail. I enter jexec command on host console to log into vnet jail. So problem with "Protect: Procctl: operation not permitted."  is still open.

Yes the epair is bridged to rl0. As stated in test #1 of my previous post bridge/epair works because I can ping the public internet.

The conclusion is vnet/vimage works ok as long as the vnet jail does not try to start any one of the 3 firewalls. 

If you're doing testing using the "service" script or the jail rc.d scripts, this may be why you are getting different results. The jail rc.d scripts have known problems with vnet jail(8) jails. I do not use that old script system. I only use the jail(8) command for starting/stopping my jails, with jail definitions in jail.conf format.

All your comment #7 questions have already been answered by my previous post #6.

I am the qjail maintainer and used qjail to perform all the tests posted in #6. I have users who use qjail for vnet jails, so it has proved that its function is valid. The single common item among all the qjail vnet users is that none of them can get a firewall to run in a vnet jail. This is the same thing I am seeing with the updated vimage available in ALPHA6.

Maybe you need another pair of eyes to review your setup for testing vnet/vimage jails. I would be open to doing so.
Comment 9 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-22 15:30:37 UTC
root@:/ # sysctl -a | grep jailed
security.jail.jailed: 1
root@:/ # ping 192.168.5.1
PING 192.168.5.1 (192.168.5.1): 56 data bytes
64 bytes from 192.168.5.1: icmp_seq=0 ttl=64 time=0.408 ms
64 bytes from 192.168.5.1: icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from 192.168.5.1: icmp_seq=2 ttl=64 time=0.312 ms
^C
--- 192.168.5.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.302/0.341/0.408/0.048 ms
root@:/ # ipfw show
00010  0    0 allow ip from any to any via lo0
00010  0    0 allow log ip from any to any via rl0
00010  0    0 allow log ip from any to any via epair0b
00010  7  600 allow log ip from any to any via epair0a
65535 90 8802 allow ip from any to any

Clearly my pings to my default gateway work when I do this.

Then also added your default rules to the base system (which had ipfw running with a default allow for quite a while already):
root@rabbit4:~ # ipfw show
00010     672     143534 allow log ip from any to any via igb0
00010       0          0 allow log ip from any to any via lo0
65535 5242947 1487787707 allow ip from any to any

bridged the epair to the physical interface in the base system.

root@rabbit4:~ # ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:41:30:09:bd:00
        nd6 options=9<PERFORMNUD,IFDISABLED>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 20000
        member: epair0b flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000
root@rabbit4:~ # ifconfig epair0b
epair0b: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:c0:00:07:0b
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair



Could you please go ahead and try to tcpdump all interfaces on the base system and inside the vnet (rl0, bridge0, both ends of the epair) and show where you can see each packet and a possible reply.  It'll be essential to figure out where we lose it in your setup.
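
One way to organize the captures requested here is sketched below. The interface names (rl0, bridge0, epair4a, epair4b) and the jail name v20 are placeholders matching the reporter's setup and should be adjusted; running tcpdump inside the jail also assumes that the jail's devfs ruleset exposes a bpf device, which the thread does not confirm.

```sh
#!/bin/sh
# Print one tcpdump command per capture point so each can be run in its
# own terminal. Names are placeholders; nothing is executed here.
capture_cmd() {
    iface=$1
    jail=${2:-}          # optional: run inside this jail via jexec
    cmd="tcpdump -n -i ${iface} icmp or arp"
    if [ -n "$jail" ]; then
        cmd="jexec ${jail} ${cmd}"
    fi
    echo "$cmd"
}

# Base system side: physical NIC, bridge, and host end of the epair.
for iface in rl0 bridge0 epair4a; do
    capture_cmd "$iface"
done

# Jail side: the other end of the epair.
capture_cmd epair4b v20
```

Comparing where an echo request is last seen, and where a reply first fails to appear, narrows down the interface at which the packet is lost.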
Comment 10 Joe Barbish 2016-07-23 17:03:29 UTC
Commands issued from the host

/root >ipfw show
00010   0     0 allow ip from any to any via lo0
00011   0     0 deny ip from 10.0.10.4 to any in via rl0
00012   0     0 allow log ip from any to any via rl0
00013   0     0 deny log ip from any to any
65535 276 27346 deny ip from any to any

/root >cat /var/log/security
host ipfw log file is empty

/root >ls /usr/jails
archive		sharedfs	v10		v30		v50
flavors		template	v20		v40


/root >qjail list

STA JID  NIC IP              Jailname
--- ---- --- --------------- --------------------------------------------------
DS  N/A  rl0 vnet|be|ipf     v10
DS  N/A  rl0 vnet|be|ipfw    v20
DS  N/A  rl0 vnet|be|pf      v30
DS  N/A  rl0 vnet|ng|none    v40
DS  N/A  rl0 vnet|be|none    v50
 
 
/root >cat /usr/local/etc/qjail.config/v20
v20 { 
host.hostname       =  "v20";
path                =  "/usr/jails/v20";
mount.fstab         =  "/usr/local/etc/qjail.fstab/v20";
exec.start          =  "/bin/sh /etc/rc";
exec.stop           =  "/bin/sh /etc/rc.shutdown";
exec.consolelog     =  "/var/log/qjail.v20.console.log";
mount.devfs;
devfs_ruleset       =  "4";
vnet;
exec.poststart="/usr/local/bin/qjail.vnet.be start v20 rl0 ipfw";
exec.prestop="/usr/local/bin/qjail.vnet.be stop v20 rl0 ipfw";
}


/root >cat /usr/local/etc/qjail.fstab/v20
/usr/jails/sharedfs /usr/jails/v20/sharedfs nullfs ro 0 0


/root >cat /usr/local/bin/qjail.vnet.be
#!/bin/sh
         
function=$1
jailname=$2
nicname=$3
firewall=$4
jaildir="/usr/jails"
           
           
start () {  
                    
jid=`jls -j ${jailname} jid` 
               
#if [ "${jid}" -gt "100" ]; then
#  echo " "         
#  echo "WARNING: The JID value is greater than 100."
#  echo "This may indicate many cycles of starting/stopping vnet jails"
#  echo "which results in lost memory pages. To recover the lost memory,"
#  echo "shutdown the host and reboot. This will zero out the JID counter"
#  echo "and make all the memory available again."
#  echo " "       
#fi               
              
if [ "${jid}" -gt "250" ]; then
  echo " "             
  echo "ERROR: No more vnet jail epair ip addresses can be created."
  echo "You MUST shutdown the host and reboot before vnet jails are"
  echo "startable again."
  echo " "             
  exit 2              
fi                  
                  
# Check the hosts network for existing bridge.
# If no bridge yet then create the bridge.
# Add real interface device name to one side of bridge.
#             
bridge=`ifconfig | grep -m 1 bridge | cut -f 1 -d :`
if [ -z "${bridge}" ]; then
  ifconfig bridge0 create 
  ifconfig bridge0 addm ${nicname}
  ifconfig bridge0 up
  # vnet jails will not work unless ip forwarding is enabled.
  sysctl net.inet.ip.forwarding=1 
fi           
            
# Do this logic for all vnet jails.
# Assign alias IP number to bridge using jid to make it unique per vnet jail.
# The alias IP number is the vnet jails default route ip address.
# Create epair assigning "a" to bridge and "b" to the vnet jail
#             
ifconfig bridge0 alias 10.${jid}.0.1
ifconfig epair${jid} create 
ifconfig bridge0 addm epair${jid}a
ifconfig epair${jid}a up
ifconfig epair${jid}b vnet ${jid}
             
# Assign ip address to epair "b" inside of the vnet jail.
#            
jexec ${jailname} ifconfig epair${jid}b 10.${jid}.0.2
jexec ${jailname} route add default 10.${jid}.0.1 
jexec ${jailname} ifconfig lo0 127.0.0.1
                    
                    
if [ ${firewall} = "none" ]; then
  # If no firewall was selected in config -v
  # Start services inside of jail needed for network.
  # Note: using service command because it's not nojail keyword aware.
  #                 
  jexec ${jailname} service netif start 
  jexec ${jailname} service routing start 
  exit 0            
fi                  
                  
                   
if [ ${firewall} = "ipfw" ]; then
                    
  # Check to see if selected firewall kernel modules have been loaded.
  #if ! kldstat -v | grep -qw ${firewall}; then
  #  echo "Error: ${firewall} was not compiled into the kernel."
  #  exit 2          
  #fi                
                    
  # If ipfw firewall was selected in config -v
  # Get the epairXb interface name of the vnet jail and
  # write the value to a file so the epairXb interface name can be
  # passed to the ipfw.rules file, then start ipfw.
  # Start services inside of jail needed by ipfw firewall.
  # Note: using service command because it's not nojail keyword aware.
  #                 
  jexec ${jailname} service netif start 
  jexec ${jailname} service routing start 
  ipfw_epair="${jaildir}/${jailname}/etc/epair"
  jexec ${jailname} ifconfig | grep -m 1 epair | cut -f 1 -d : > ${ipfw_epair}
  echo "ipfw_epair = ${ipfw_epair}"
  jexec ${jailname} service ipfw restart 
  exit 0
fi                  
                     
                                   
if [ ${firewall} = "pf" ]; then
                   
  # Check to see if selected firewall kernel modules have been loaded.
  #if ! kldstat -v | grep -qw ${firewall}; then
  #  echo "Error: ${firewall} was not compiled into the kernel."
  #  exit 2           
  #fi                
                   
  # If pf firewall was selected in config -v  
  # Get the epairXb interface name of the vnet jail and
  # write the value to a file so the epairXb interface name can be
  # passed to the pf.rules file, then start pf.
  # Start services inside of jail needed by pf firewall.
  # Note: using service command because it's not nojail keyword aware.
  #               
  #jexec ${jailname} service netif start > /dev/null 2> /dev/null
  #jexec ${jailname} service routing start > /dev/null 2> /dev/null
  pf_epair="${jaildir}/${jailname}/etc/epair"
  jexec ${jailname} ifconfig | grep -m 1 epair | cut -f 1 -d : > ${pf_epair}
#  jexec ${jailname} service pf start > /dev/null 2> /dev/null
  jexec ${jailname} service pf start 
# jexec ${jailname} pfctl -F all; pfctl -f /etc/pf.rules
fi                 
                     
if [ ${firewall} = "ipf" ]; then
  ####### This stub is not used. Coded for when ipfilter becomes vnet aware.
  # If ipfilter firewall was selected in config -v
  # Get the epairXb interface name of the vnet jail and
  # write the value to a file so the epairXb interface name can be
  # passed to the ipf.rules file, then start ipf.
  # Start services inside of jail needed by ipfilter firewall.
  # Note: using service command because it's not nojail keyword aware.
  #               
  jexec ${jailname} service netif start > /dev/null 2> /dev/null
  jexec ${jailname} service routing start > /dev/null 2> /dev/null
  ipf_epair="${jaildir}/${jailname}/etc/epair"
  jexec ${jailname} ifconfig | grep -m 1 epair | cut -f 1 -d : > ${ipf_epair}
#  jexec ${jailname} service ipfilter start > /dev/null 2> /dev/null
  jexec ${jailname} service ipfilter start
fi            
            
}           
               
          
stop () {       
               
# Disable vnet jails network configuration.
#          
jid=`jls -j ${jailname} jid`
ifconfig epair${jid}b -vnet ${jid}
ifconfig bridge0 -alias 10.${jid}.0.1
ifconfig epair${jid}a destroy
          
# If host has no more vnet jails then disable bridge.
#         
epair=`ifconfig | grep -m 1 epair | cut -f 1 -d :`
if [ -z "${epair}" ]; then
  ifconfig bridge0 destroy
  # sysctl net.inet.ip.forwarding=0 > /dev/null 2> /dev/null
fi            
                                
if [ ${firewall} = "ipfw" ]; then
  # If ipfw was started, now disable it.
  #                   
  jexec ${jailname} service ipfw stop > /dev/null 2> /dev/null
  jexec ${jailname} service routing stop > /dev/null 2> /dev/null
  jexec ${jailname} service netif stop > /dev/null 2> /dev/null
  sleep 2         
fi               
               
if [ ${firewall} = "pf" ]; then
  # If pf was started, now disable it.
  #               
  jexec ${jailname} service pf stop > /dev/null 2> /dev/null
  jexec ${jailname} service routing stop > /dev/null 2> /dev/null
  jexec ${jailname} service netif stop > /dev/null 2> /dev/null
  sleep 2        
fi                
                
#if [ ${firewall} = "ipf" ]; then
#  ######### This stub is not used right now.
#  # If ipfilter was started, now disable it.
#  #          
#  jexec ${jailname} service ipfilter stop > /dev/null 2> /dev/null
#  jexec ${jailname} service routing stop > /dev/null 2> /dev/null
#  jexec ${jailname} service netif stop > /dev/null 2> /dev/null
#  sleep 2              
#fi             
            
#if [ ${firewall} = "none" ]; then
  # If no firewall was started, disable network.
  #         
  #jexec ${jailname} service routing stop > /dev/null 2> /dev/null
  #jexec ${jailname} service netif stop > /dev/null 2> /dev/null
#  exit 0
#fi          
           
}            
                
[ "${function}" = "start" ]   && start   $*  && exit 0
[ "${function}" = "stop" ]    && stop    $*  && exit 0
              

/root >qjail start v20
net.inet.ip.forwarding: 1 -> 1
epair4a
add net default: gateway 10.4.0.1
Starting Network: lo0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
add host 127.0.0.1: gateway lo0 fib 0: route already in table
Additional inet routing options: gateway=YES.
add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1 fib 0: route already in table
add net ff02::: gateway ::1 fib 0: route already in table
add net ::ffff:0.0.0.0: gateway ::1 fib 0: route already in table
add net ::0.0.0.0: gateway ::1 fib 0: route already in table
ipfw_epair = /usr/jails/v20/etc/epair
net.inet.ip.fw.enable: 1 -> 0
net.inet6.ip6.fw.enable: 1 -> 0
jailed /etc/epair = epair4b
Firewall rules loaded.
Firewall logging enabled.
Jail successfully started  v20


/root >ifconfig -a
rl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2008<VLAN_MTU,WOL_MAGIC>
	ether 00:0c:6e:09:8b:74
	inet 10.0.10.9 netmask 0xfffffff0 broadcast 10.0.10.15 
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:46:d0:31:46:00
	inet 10.4.0.1 netmask 0xff000000 broadcast 10.255.255.255 
	nd6 options=9<PERFORMNUD,IFDISABLED>
	groups: bridge 
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair4a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 4 priority 128 path cost 2000
	member: rl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 200000
epair4a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:c0:00:00:04:0a
	inet6 fe80::c0:ff:fe00:40a%epair4a prefixlen 64 scopeid 0x4 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	groups: epair 


Start the jail and issue commands to jail from jail console 


/root >qjail console v20
Last login: Sat Jul 23 07:25:09 on pts/0
FreeBSD 11.0-ALPHA6 (ipfwVimage) #0: Sun Jul 10 09:10:17 EDT 2016

Welcome to your FreeBSD jail.
v20 /root >
v20 /root >ipfw show
00010 0 0 allow ip from any to any via lo0
00011 0 0 allow log ip from any to any via epair4b
00012 0 0 deny log ip from any to any
65535 0 0 deny ip from any to any


v20 /root >ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
epair4b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:c0:00:00:05:0b
	inet 10.4.0.2 netmask 0xff000000 broadcast 10.255.255.255 
	inet6 fe80::c0:ff:fe00:50b%epair4b prefixlen 64 scopeid 0x2 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	groups: epair 


v20 /root >ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss


v20 /root >ipfw show
00010 0   0 allow ip from any to any via lo0
00011 6 504 allow log ip from any to any via epair4b
00012 0   0 deny log ip from any to any
65535 0   0 deny ip from any to any


v20 /root >cat /var/log/security
Jul  2 20:36:27 v20 newsyslog[3010]: logfile first created
log is empty

v20 /root >exit
logout

Back to issuing commands to the host

/root >cat tcpdump.epair4a
07:29:14.031691 ARP, Request who-has 10.4.0.1 tell 10.4.0.2, length 28
07:29:14.031803 ARP, Reply 10.4.0.1 is-at 02:46:d0:31:46:00 (oui Unknown), length 28
07:29:14.031829 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 0, length 64
07:29:15.033222 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 1, length 64
07:29:16.034410 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 2, length 64
07:29:17.035091 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 3, length 64
07:29:18.036279 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 4, length 64
07:29:19.037469 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 5, length 64
07:34:44.631297 ARP, Request who-has 10.0.10.2 tell 10.0.10.3, length 46


/root >cat tcpdump.bridge0
07:29:14.031725 ARP, Request who-has 10.4.0.1 tell 10.4.0.2, length 28
07:29:14.031780 ARP, Reply 10.4.0.1 is-at 02:46:d0:31:46:00 (oui Unknown), length 28
07:29:14.031833 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 0, length 64
07:29:15.033231 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 1, length 64
07:29:16.034418 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 2, length 64
07:29:17.035099 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 3, length 64
07:29:18.036286 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 4, length 64
07:29:19.037477 IP 10.4.0.2 > 8.8.8.8: ICMP echo request, id 5909, seq 5, length 64
07:34:44.631308 ARP, Request who-has 10.0.10.2 tell 10.0.10.3, length 46

/root >cat tcpdump.rl0
07:29:14.031717 ARP, Request who-has 10.4.0.1 tell 10.4.0.2, length 46
07:34:44.631277 ARP, Request who-has 10.0.10.2 tell 10.0.10.3, length 46

/root/bin >cat /var/log/security
Jul 23 07:29:14 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:14 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0
Jul 23 07:29:15 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:15 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0
Jul 23 07:29:16 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:16 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0
Jul 23 07:29:17 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:17 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0
Jul 23 07:29:18 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:18 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0
Jul 23 07:29:19 fbsdjones kernel: ipfw: 11 Accept ICMP:8.0 10.4.0.2 8.8.8.8 out via epair4b
Jul 23 07:29:19 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0


/root/bin >ipfw show
00010   0     0 allow ip from any to any via lo0
00011   0     0 deny ip from 10.0.10.4 to any in via rl0
00012   0     0 allow log ip from any to any via rl0
00013  41  4004 deny log ip from any to any
65535 276 27346 deny ip from any to any


Conclusion:

From the results it looks like the vnet jail's ipfw log is being written
to the host's ipfw log, which is /var/log/security. The "ipfw show"
output for the host shows that no packets have been passed to the
rl0 interface.

1. There is a security problem with the vnet jailed ipfw firewall having
write access to the host's /var/log/security file. A jail, no matter what
kind it is, non-vnet or vnet, is by design not supposed to have access to
anything on the host. Here is hard evidence that it is happening.

2. The output to the host's ipfw log is missing in action. Host's ipfw 
firewall rules log all denied packets, but they are not in the host's 
security log interspersed with the vnet jail's log records. 

3. External evidence indicates the passing of packets from the vnet 
jail stack is NOT being handed off correctly to the host's stack.

4. In general everything points to ipfw not yet being totally integrated 
into vimage and in turn into the host at the kernel level.
Comment 11 Joe Barbish 2016-07-31 15:05:04 UTC
Installed BETA2 and got the same results. Firewalls do not work with vimage. BETA3 is available, but there is no use testing with it because no changes have been applied yet. It looks like 11.0 is on track to be published with only basic vimage working: no firewall of any kind is able to run on the host and/or in a vimage jail with vimage compiled into the host kernel.

I have reviewed your comment #9 in detail. You did not provide enough details to prove anything is working. Pinging the host's bridge from within a vnet jail is a long way from pinging the public internet.

Posting your test jail.conf file contents, your epair/bridge commands, and the commands you use to start/stop your vnet jail, including the login banner showing which FreeBSD version is running on the host, would be very helpful for me to reproduce your test environment.

Maybe another pair of eyes will help. Sometimes a developer is too close to the forest to see the trees, i.e. too involved with the project to see the problem staring them in the face. I have been there myself.
Comment 12 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-31 23:38:51 UTC
(In reply to Joe Barbish from comment #10)

To your #1, what is logged to /var/log/security is the kernel log that syslog gets.  It's not anything in a jail writing to that file, it's your syslogd on the all-seeing base system.  That's essentially "dmesg".  That's an unfortunate historic thing of the ipfw implementation;  using tcpdump on the ipfw0 interface like on pflog0 for pf will avoid you seeing the vnet-jail logging on the base system as well.
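
The ipfw0 approach mentioned here can be sketched as follows, based on what ipfw(8) documents for the base system: with net.inet.ip.fw.verbose set to 0, packets matched by "log" rules are handed to the ipfw0 pseudo-interface instead of syslog. Whether this behaves per-VNET on 11.0 is not confirmed in this thread, and the run helper with its DRY_RUN variable is a hypothetical convenience that only prints the commands by default.

```sh
#!/bin/sh
# Switch ipfw "log" rules from syslog to the ipfw0 bpf pseudo-interface.
# DRY_RUN (hypothetical helper variable) defaults to 1, so the commands are
# only printed; set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run sysctl net.inet.ip.fw.verbose=0   # stop sending log-rule hits to syslog
run ifconfig ipfw0 create             # create the logging pseudo-interface
run tcpdump -t -n -i ipfw0            # watch packets matched by "log" rules
```

Setting firewall_logif="YES" in rc.conf is the documented way to have ipfw0 created at boot instead of by hand.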

To your #2, that might be the case as a bit later your syslog might log a line that the last message was repeated another n times.  Hard to say from just the output.

To your #3 and your actual problem:

(a) if you are bridging you do not need ip forwarding, especially not inside the vnet-jail.  The fact that you are bridging and trying to forward is a weird setup in the first place.

(b) your current topology looks like:  (gateway system) --- |physical wire| --- (rl0) --- (bridge0) --- epairNa --- |jail| --- epairNb
All these interfaces are in the same L2 broadcast domain (hence, ideally, no need for ip forwarding).  However, you set up your L3 so that bridge0 is the default gw for the jail, so your host system suddenly has to "forward" these packets.  You can have IP aliases (or different subnets) on your gateway machine and then just point the vnet-jail at that (as in move the IP address from the bridge0 to the gateway), or you can remove the bridge0 and have your base system be a router forwarding the packets.  In that case you put the IP address on epairNa instead of the bridge.  In the latter case, however, your gateway machine needs to route that subnet to the IP of the rl0 interface of the base system, as otherwise return packets never make it.

(c) the base system firewall does what it's told to do and drops the packet on the bridge0 interface on the base system, as your log shows:

    Jul 23 07:29:14 fbsdjones kernel: ipfw: 13 Deny ICMP:8.0 10.4.0.2 8.8.8.8 in via bridge0

So everything does work as expected, but your base system rules do not allow the packet to pass.
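
A quick way to see which rule and interface a drop is attributed to is to filter the syslog output for ipfw Deny records. The following is only a sketch: the function name ipfw_denied is made up for illustration, and it assumes the syslog line layout seen in the log line quoted above (month, day, time, host, "kernel:", "ipfw:", rule number, action, protocol, source, destination, direction, "via", interface).

```sh
# Hypothetical helper (not part of ipfw): list denied packets from a
# syslog-format ipfw log as "rule src dst direction interface".
# Assumes the field layout of the log line quoted above.
ipfw_denied() {
    # $1: path to the log file, e.g. /var/log/security
    awk '$6 == "ipfw:" && $8 == "Deny" { print $7, $10, $11, $12, $NF }' "$1"
}
```

Run as "ipfw_denied /var/log/security" on the base system; each output line names the rule that would have to be changed, or preceded by an allow rule, for that packet to pass.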


To your #4 I think it's mostly a problem of not enough documentation and not enough samples yet.
Comment 13 Joe Barbish 2016-08-01 19:29:41 UTC
From Barbish’s comment #10
#1. There is a security problem with the vnet jailed ipfw firewall having write access to the host's /var/log/security file. A jail, no matter what kind it is, non-vnet or vnet, is by design not supposed to have access to anything on the host. Here is hard evidence that it is happening.

From Zeeb’s Comment #12 
To your #1, what is logged to /var/log/security is the kernel log that syslog gets.  It's not anything in a jail writing to that file, it's your syslogd on the all-seeing base system.  That's essentially "dmesg".  That's an unfortunate historic thing of the ipfw implementation;  using tcpdump on the ipfw0 interface like on pflog0 for pf will avoid you seeing the vnet-jail logging on the base system as well.
************* reply *******
You did a fine job of describing the logging problem as it currently exists. It cannot be left this way. The base system's /var/log/security file should only contain records logged by the base system's ipfw firewall. Log records from the ipfw firewall in a vnet jail must be written to the /var/log/security file in that vnet jail's directory tree. The ipfw logging subsystem needs to be made vnet aware to log to the correct vnet jail, as there may be more than a single vnet jail running on the base host system. This implies that each vnet jail would have its own ipfw log records written to the /var/log/security file in its own directory tree. This is what the vnet jail user community expects.

Ipfw may be writing syslog format records, but it’s tagging all its records with the “security” facility, and the base system's /etc/syslog.conf file has that facility written to the /var/log/security file, which is different from “dmesg” records, which go to the /var/log/messages file.

When it comes to your recommendation “using tcpdump on the ipfw0 interface like on pflog0 for pf will avoid you seeing the vnet-jail logging on the base system as well.” On one hand, this doesn’t seem to be doable, and on the other hand you can’t really think you are going to force vnet users to jump through hoops to set this up for each vnet jail. That’s crazy. This has to be fixed centrally. It’s my understanding that the ipfw0 interface is only enabled with “firewall_logging=yes” in the rc.conf of the host or the vnet jail. This currently activates logging to the host’s /var/log/security file. So the base system's security file will still be polluted with logging from each running vnet jail’s ipfw firewall, while the vnet jail admin grows his own userland task to tcpdump the ipfw0 raw data and write it to the correct vnet jail in real time. You need to rethink this approach.

From Zeeb’s Comment #12
To your #3 and your actual problem:

(a) if you are bridging you do not need ip forwarding, especially not inside the vnet-jail. The fact that you are bridging and trying to forward is a weird setup in the first place.

************* reply *******
It’s my understanding that non-vnet jails need gateway_enable=yes in the host’s (base system) rc.conf or “sysctl net.inet.ip.forwarding=1” to work. If the goal is to have a base system where both non-vnet jails and vnet jails can run at the same time, then my bridging setup is “normal”, or more the norm than a base system that can only run vnet jails, although a vnet-jail-only base system is also a normal setup.
In this light my setup is not a weird setup at all, but a more flexible one. The jail(8) man page does not state any restriction against running vnet jails and non-vnet jails on the same base system at the same time.


From Zeeb’s Comment #12
(b) Your current topology looks like:  
Snip…….
There is nothing wrong with the bridge/epair method I have employed. Without a dynamically loaded firewall on the host base system and without ipfw compiled into the kernel, i.e. using a kernel with just vimage, I can start a vnet jail and ping the public internet, getting replies without any problems. It’s when I enable ipfw on the host that problems arise. Even with a kernel compiled with vimage and ipfw, the problems also occur.
Comment 14 Joe Barbish 2016-08-01 19:44:19 UTC
Here is another trace of events documenting the ipfw problem on ALPHA6 and BETA2.


Script started on Mon Aug  1 12:19:58 2016
# Entering commands on the host console
# host is running vimage kernel
# with ipfw statements in the host's rc.conf

/root >cat /var/log/security

# host ipfw security log is empty
/root >ipfw show
00050 0 0 check-state
00060 0 0 allow ip from any to any via lo0
00070 0 0 deny ip from 10.0.10.4 to any
00080 0 0 allow log ip from any to any via rl0 keep-state
00090 0 0 allow log ip from any to any keep-state
65535 0 0 deny ip from any to any

#no activity on host ipfw firewall yet

/root >qjail start v50
Jail successfully started  v50

# Let's see what the host network looks like

/root >ifconfig -a
rl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2008<VLAN_MTU,WOL_MAGIC>
	ether 00:0c:6e:09:8b:74
	inet 10.0.10.9 netmask 0xfffffff0 broadcast 10.0.10.15 
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 02:8f:94:84:0c:00
	inet 10.5.0.1 netmask 0xff000000 broadcast 10.255.255.255 
	nd6 options=9<PERFORMNUD,IFDISABLED>
	groups: bridge 
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair5a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 4 priority 128 path cost 2000
	member: rl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 200000
epair5a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:c0:00:00:04:0a
	inet6 fe80::c0:ff:fe00:40a%epair5a prefixlen 64 scopeid 0x4 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	groups: epair 

/root >jexec v50 login -f root
Last login: Mon Aug  1 12:06:21 on ttyv0
FreeBSD 11.0-BETA2 (Vimage) #0: Tue Jul 26 07:48:38 EDT 2016

Welcome to your FreeBSD jail.
v50 /root >ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Permission denied
ping: sendto: Permission denied
ping: sendto: Permission denied
ping: sendto: Permission denied
ping: sendto: Permission denied
ping: sendto: Permission denied
^C
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
v50 /root >exit
logout
# Let's go see what the host's ipfw log has captured
/root >cat /var/log/security

Aug  1 12:23:52 fbsdjones kernel: ipfw: 90 Accept ICMPv6:143.0 [::] [ff02::16] out via epair5a
Aug  1 12:23:53 fbsdjones kernel: ipfw: 90 Accept ICMPv6:135.0 [::] [ff02::1:ff00:40a] out via epair5a
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:51100 209.18.47.61:53 out via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:51100 in via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:59748 209.18.47.61:53 out via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:59748 in via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:18948 209.18.47.61:53 out via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:18948 in via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:16357 209.18.47.61:53 out via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:16357 in via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:48864 209.18.47.61:53 out via rl0
Aug  1 12:23:53 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:48864 in via rl0
Aug  1 12:23:54 fbsdjones kernel: ipfw: 90 Accept ICMPv6:143.0 [fe80::c0:ff:fe00:40a] [ff02::16] out via epair5a
Aug  1 12:23:54 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:49985 209.18.47.61:53 out via rl0
Aug  1 12:23:54 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:49985 in via rl0
Aug  1 12:23:54 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:35004 209.18.47.61:53 out via rl0
Aug  1 12:23:54 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:35004 in via rl0
Aug  1 12:23:54 fbsdjones kernel: ipfw: 80 Accept UDP 10.0.10.9:53364 209.18.47.61:53 out via rl0
Aug  1 12:23:55 fbsdjones kernel: ipfw: 80 Accept UDP 209.18.47.61:53 10.0.10.9:53364 in via rl0

# the vnet jail is trying to do a DNS lookup for its domain name

/root >ipfw show
00050  0    0 check-state
00060  0    0 allow ip from any to any via lo0
00070  0    0 deny ip from 10.0.10.4 to any
00080 16 1859 allow log ip from any to any via rl0 keep-state
00090  3  304 allow log ip from any to any keep-state
65535  0    0 deny ip from any to any

# check-state & keep-state is the standard method: only a rule that lets
# traffic pass the firewall is needed, with no separate rule to allow it back in.
# An in-core keep-state rule is created to automatically allow the conversation back in.

Let's look at some tcpdump files to see what is really moving around
/root >/root >tcpdump -c50 -i epair5a > tcpdump.epair5a
/root >tcpdump: verbose output suppressed, use -v or -vv for full protocol
/root >listening on epair5a, link-type EN10MB (Ethernet), capture size
/root >26214 bytes
/root >^C
/root >0 packets captured
/root >0 packets received by filter
/root >0 packets dropped by kernel

/root >tcpdump: verbose output suppressed, use -v or -vv for full protocol
/root >listening on bridge0, link-type EN10MB (Ethernet), capture size 
/root >26214 bytes
/root >^C
/root >0 packets captured
/root >0 packets received by filter
/root >0 packets dropped by kernel

/root >/root >tcpdump -c50 -i rl0 > tcpdump.rl0
/root >tcpdump: verbose output suppressed, use -v or -vv for full protocol
/root >listening on rl0, link-type EN10MB (Ethernet), capture size
/root >26214 bytes
/root >^C
/root >23 packets captured
/root >23 packets received by filter
/root >0 packets dropped by kernel

# no dump data captured for epair5a & bridge0
# let's see what dump data was captured from rl0
/root >cat tcpdump.rl0
12:23:52.855962 ARP, Request who-has 10.5.0.1 tell 10.5.0.1, length 46
12:23:52.948717 ARP, Request who-has 10.5.0.2 tell 10.5.0.2, length 46
12:23:52.962511 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
12:23:53.328262 IP6 :: > ff02::1:ff00:40a: ICMP6, neighbor solicitation, who has fe80::c0:ff:fe00:40a, length 32
12:23:53.593534 ARP, Request who-has 10.0.10.2 tell 10.0.10.9, length 46
12:23:53.619012 ARP, Reply 10.0.10.2 is-at 00:10:b5:7b:1d:6f (oui Unknown), length 46
12:23:53.619063 IP 10.0.10.9.51100 > dns-cac-lb-01.rr.com.domain: 53852+ PTR? 1.0.5.10.in-addr.arpa. (39)
12:23:53.637185 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.51100: 53852 NXDomain* 0/1/0 (98)
12:23:53.638286 IP 10.0.10.9.59748 > dns-cac-lb-01.rr.com.domain: 12862+ PTR? 2.0.5.10.in-addr.arpa. (39)
12:23:53.656456 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.59748: 12862 NXDomain* 0/1/0 (98)
12:23:53.657428 IP 10.0.10.9.18948 > dns-cac-lb-01.rr.com.domain: 8843+ PTR? 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. (90)
12:23:53.676235 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.18948: 8843 NXDomain* 0/1/0 (149)
12:23:53.677175 IP 10.0.10.9.16357 > dns-cac-lb-01.rr.com.domain: 40286+ PTR? 6.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
12:23:53.694977 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.16357: 40286 NXDomain 0/1/0 (160)
12:23:53.695975 IP 10.0.10.9.48864 > dns-cac-lb-01.rr.com.domain: 22515+ PTR? a.0.4.0.0.0.f.f.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
12:23:53.923131 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.48864: 22515 NXDomain 0/1/0 (160)
12:23:54.562106 IP6 fe80::c0:ff:fe00:40a > ff02::16: HBH ICMP6, multicast listener report v2, 3 group record(s), length 68
12:23:54.946338 IP 10.0.10.9.49985 > dns-cac-lb-01.rr.com.domain: 47412+ PTR? 2.10.0.10.in-addr.arpa. (40)
12:23:54.964724 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.49985: 47412 NXDomain* 0/1/0 (99)
12:23:54.965525 IP 10.0.10.9.35004 > dns-cac-lb-01.rr.com.domain: 48228+ PTR? 9.10.0.10.in-addr.arpa. (40)
12:23:54.983620 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.35004: 48228 NXDomain* 0/1/0 (99)
12:23:54.984610 IP 10.0.10.9.53364 > dns-cac-lb-01.rr.com.domain: 24431+ PTR? 61.47.18.209.in-addr.arpa. (43)
12:23:55.002963 IP dns-cac-lb-01.rr.com.domain > 10.0.10.9.53364: 24431 1/0/0 PTR dns-cac-lb-01.rr.com. (77)

/root >exit
exit

Script done on Mon Aug  1 12:40:15 2016

Let's try to interpret what the evidence is telling us.

When the vnet jail is started it tries to do a DNS lookup for the vnet jail
hostname, which of course is bogus. But we see the traffic in the dump and
also in the host ipfw log. What is also shown is the ICMPv6 packets trying
to do the ping command issued from within the running vnet jail. They also
have keep-state so they should be allowed back in. These ICMPv6 packets are
on the epair5a interface, yet we see no traffic in the tcpdump for the
epair5a interface. What is really strange is that there are no IPv4 ping
packets in the epair5a, bridge0, or rl0 tcpdumps. So what is unique about the
stuff that did get out? The vnet jail's hostname lookup is an auto function of
starting the jail. The ICMPv6 packets are also part of some auto function
of the ping command. In my opinion this strange behavior is caused by
the ipfw firewall running on the host system not being correctly integrated
into vimage.
Comment 15 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-08-04 15:32:42 UTC
(In reply to Joe Barbish from comment #13)

The log you get in /var/log/security is done by ipfw using a log(9) statement, essentially a printf in the kernel.  There is one kernel running.  Apart from the network stack nothing has been virtualised using VIMAGE.  If you want to virtualise the kernel message buffer, patches will be welcome.
The base system is always able to see everything so this is not a security issue.  What I might consider an issue is that a jail seems to be able to call dmesg, but that's a different issue also relevant to non-vnet jails; I just opened a PR for this to track it.



Your assumption about gateway_enable=YES in the base system is not correct;  it might depend on the setup, but you can run (non-vnet) jails perfectly fine without enabling ip forwarding in the base system, and I have done so for years:
$ sysctl -a | grep forwarding
net.inet.ip.forwarding: 0
net.inet.ip.fastforwarding: 0
net.inet6.ip6.forwarding: 0
$ jls -av | wc -l
      49
$




I think what I am saying about your topology is that if you are routing in your base system (turn forwarding on) you will not need the bridge interface.
If you use the bridge interface there's no need for forwarding.
You are doing both at the same time by treating the bridge interface as a gateway interface as well, and that's just not how L2 and L3 are done normally.
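
As a sketch of the bridge-only alternative, the base system's /etc/rc.conf could look roughly like the fragment below. The interface name rl0 and its address are taken from the ifconfig output in comment #14; everything else is an assumption for illustration: that the jail gets an address in the same subnet as rl0, that it uses the upstream gateway as its default route, and that whatever starts the jail adds the epair to bridge0.

```sh
# /etc/rc.conf fragment for the base system (bridge-only variant, a sketch)
gateway_enable="NO"              # bridge0 only switches frames, so no IP forwarding
cloned_interfaces="bridge0"      # create bridge0 at boot
ifconfig_bridge0="addm rl0 up"   # rl0 is a member; no gateway address on bridge0
ifconfig_rl0="inet 10.0.10.9 netmask 255.255.255.240"   # host address as shown by ifconfig

# Routed variant instead: set gateway_enable="YES", drop bridge0, and put the
# jail-facing address on epairNa once the epair exists.
```

In the routed variant the upstream gateway additionally needs a route for the jail subnet pointing at the rl0 address, as described in comment #12.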
Comment 16 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-08-04 15:38:50 UTC
(In reply to Joe Barbish from comment #14)


All the evidence you show in this trace tells me that not a single packet is coming out of your vnet jail yet; sorry.  All other conclusions are not backed by the data you show.

Also your conclusions about ping 8.8.8.8 triggering IPv6 packets are wrong.  What you see is the epair in the base system doing DAD and joining a MC group, which gets bridged to your real interface.

Whatever DNS lookups are happening, e.g. the reverse lookup for the jail IP, seem to be triggered by something in the base system: the source address of those packets is 10.0.10.9, not 10.5.0.x.
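
One way to check this against a capture is to count the IPv4 packets whose source field starts with a given prefix. The helper below is a hypothetical sketch, not an existing tool; it assumes tcpdump's default one-line text output as in tcpdump.rl0 from comment #14, and that the jail's addresses start with "10.5.0." (inferred from the bridge0 address and the ARP lines in that capture).

```sh
# Hypothetical helper: count "IP" lines in tcpdump text output whose
# source field starts with the given prefix (e.g. "10.5.0." or "10.0.10.9.").
count_from_prefix() {
    # $1: tcpdump text file, $2: source address prefix
    awk -v p="$2" '$2 == "IP" && index($3, p) == 1 { n++ } END { print n + 0 }' "$1"
}
```

For example, "count_from_prefix tcpdump.rl0 10.5.0." returning 0 would mean that no IPv4 packet with a jail source address was captured on that interface.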

At this point I think asking on a mailing list for help to get your setup sorted might be more productive than discussing this in a PR.  It'll also help you to get the other pair of eyes.  You can always easily point people here and if I missed something someone will point it out, I am sure.