Created attachment 199362 [details] pflog from host symptoms= i386 vnet does not behave like the amd64 vnet version. The i386 version is flooding the host pflog with ipv4 MULTICAST requests and ipv6 Neighborhood requests. The amd64 version doesn't do that. On the i386 system with all the vnet jails stopped and then issuing the shutdown command the system takes a dump only if vnet jails had been started/stopped. This does not happen on a amd64 system. Configuration = I386 box running pf firewall with very simple rules that pass and log all traffic. This I386 box is on private lan so no nat being done. Has vnet jail running pf firewall with very simple rules that pass and log all traffic. Host config = rc.conf ifconfig_xl0="DHCP" pf_enable="YES" pflog_enable="YES" pf_rules="/etc/pf.rules.host" pflog_logfile="/var/log/pflog" pf.rules.host oif = "xl0" set block-policy drop set state-policy if-bound set loginterface $oif scrub out on $oif all random-id scrub reassemble tcp set skip on lo0 pass out log (all) quick pass in log (all) quick Vnet jail configuration rc.conf gateway_enable="YES" pf_enable="YES" pf_rules="/etc/pf.conf" pflog_enable="YES" pflog_logfile="/var/log/pflog" pf.conf oif=epair1b set block-policy drop set fail-policy drop set state-policy if-bound scrub in on $oif all set skip on lo0 block out log quick on $oif inet proto tcp from any to any port 43 pass out log (all) quick pass in log (all) quick After the vnet jail is started I see this on the host ipfconfig xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> options=82009<RXCSUM,VLAN_MTU,WOL_MAGIC,LINKSTATE> ether 00:01:02:2f:c3:00 inet 10.0.10.6 netmask 0xfffffff0 broadcast 10.0.10.15 media: Ethernet autoselect (100baseTX <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33184 groups: pflog bridge10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> ether 02:2a:47:08:71:0a id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200 root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0 member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP> ifmaxaddr 0 port 7 priority 128 path cost 2000 member: xl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP> ifmaxaddr 0 port 2 priority 128 path cost 200000 groups: bridge nd6 options=9<PERFORMNUD,IFDISABLED> epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> options=8<VLAN_MTU> ether 02:a0:73:db:2f:0a inet6 fe80::a0:73ff:fedb:2f0a%epair1a prefixlen 64 scopeid 0x7 groups: epair media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>) status: active nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> ps ax 692 - DL 0:06.87 [pf purge] 1105 - Is 0:00.00 pflogd: [priv] (pflogd) 1106 - S 0:00.29 pflogd: [running] -s 116 -i pflog0 1409 - IsJ 0:00.01 pflogd: [priv] (pflogd) 1413 - SJ 0:00.31 pflogd: [running] -s 116 -i pflog0 1465 - SsJ 0:00.02 /usr/sbin/syslogd -ss 1521 - IsJ 0:00.03 /usr/sbin/cron -J 60 -s After the vnet jail is started I see this on the vnet console ipfconfig pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33184 groups: pflog epair1b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> options=8<VLAN_MTU> ether 02:a0:73:db:2f:0b inet 10.0.10.31 netmask 0xff000000 broadcast 10.255.255.255 inet6 fe80::a0:73ff:fedb:2f0b%epair1b prefixlen 64 scopeid 0x3 groups: epair media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>) status: active nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL> ****************************************************** From the vnet console I issue this command. ping -c 2 freebsd.org PING freebsd.org (96.47.72.84): 56 data bytes 64 bytes from 96.47.72.84: icmp_seq=0 ttl=46 time=39.367 ms 64 bytes from 96.47.72.84: icmp_seq=1 ttl=46 time=39.096 ms --- freebsd.org ping statistics --- 2 packets transmitted, 2 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 39.096/39.231/39.367/0.135 ms Then I looked at the pflog on the host and in the vnet jail to see the ping packets and what I see is a flood of other ipv4 and ipv6 packets. The ipv6 packet flood was there in 11.x i386 and now in 12.0 there is a flood of ipv4 packets. There is a bug report about the ipv6 packet flood in 11.x. A lot of network resources are being consumed making this background noise. Looks like originating from vimage. The pflog host report is attached as separate file. pflog.txt.bug1.host
Created attachment 199363 [details] pflog from vnet jail
(In reply to Joe Barbish from comment #1) That traffic looks like you've got a device announcing itself using the Simple Service Discovery Protocol. I'm pretty sure that if you tcpdump on bridge0 on the host you'd see the traffic too. (And that's why the vnet jail sees it.) That's almost certainly not a bug. Do you have a backtrace for the panic on shutdown?
I don't disagree that the traffic looks like a device announcing itself using the Simple Service Discovery Protocol. Thanks for identifying what it is. The problem is it never ends. The log file is flooded until it fills up and then rotates. This is not happening on amd64 system and if I use ipfw on the host and vnet jail this also does't happen. That is the problem. Cut the following out of the vmcore dump. Hope its what you want. in6_purgeaddr: err=65, destination address delete failed Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x2f9528c8 fault code = supervisor read, page not present instruction pointer = 0x20:0x18c0f2c0 stack pointer = 0x28:0x16614b6c frame pointer = 0x28:0x16614b74 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (thread taskq) trap number = 12 panic: page fault cpuid = 0 time = 1542749869 KDB: stack backtrace: #0 0x11080cf at kdb_backtrace+0x4f #1 0x10bb097 at vpanic+0x147 #2 0x10baf4b at panic+0x1b #3 0x16910a5 at trap_fatal+0x395 #4 0x16910e3 at trap_pfault+0x33 #5 0x169072f at trap+0x3cf #6 0xffc0315d at PTDpde+0x4165 #7 0x11b9a06 at if_clone_destroyif+0x1a6 #8 0x11ba17c at if_clone_detach+0xbc #9 0x18c0f441 at vnet_pflog_uninit+0x21 #10 0x11dfaeb at vnet_destroy+0x10b #11 0x1083da4 at prison_deref+0x284 #12 0x1119c59 at taskqueue_run_locked+0x189 #13 0x111ae07 at taskqueue_thread_loop+0x97 #14 0x107c731 at fork_exit+0x71 #15 0xffc033ba at PTDpde+0x43c2
(In reply to Joe Barbish from comment #3) Please check with tcpdump on bridge0/xl0 to see if you actually see that traffic there too. I'll try to reproduce the panic on shutdown. That does look like a pflog issue.
(In reply to Kristof Provost from comment #4) To be clear: what I'm suggesting is that you confirm/deny that the traffic is coming in through xl0. If so, then it's entirely expected to see it in the log file. If it keeps coming you'll keep seeing it.
Created attachment 199436 [details] tcpdump of LAN i386 12.0-RC1 host
I got the tcpdump on xl0 and the vnet pflog and the host pflog. I was surprised at the results. None of those flood packets show up. So I had to think about what was different this time. My production config is the gateway host is 11.2 amd64 and the LAN development host where previous vnet jail testing was done was i386 11.0-RC1. To prepare for 12.0 testing I installed 12.0-RC1 on a empty HD which I swapped with the 11.2 hd on the gateway host. I also installed 12.0-RC1 on the LAN development i386 host. The previous posted test pflogs are from a config where the gateway host is 12.0-RC1 and the LAN development host is 12.0-RC1. This is the config where the pflogs on the development LAN host are flooded with all those packets This test I did not swap the gateway host 11.2 HD with the 12.0-RC1 HD, and kept the development LAN host at 12.0-RC1. This resulted in the pflogs on the development host to no longer have the flood packets. I attached the tcpdump, host pflog, and vnet jail pflog as separate files
Created attachment 199437 [details] pflog test2
Created attachment 199438 [details] pflog jail test2
A simple kldload pflog / kldunload pflog is sufficient to provoke this on i386, but not on amd64. The panic happens trying to access V_pflogifs in pflog_clone_destroy(): #15 0x224022c0 in pflog_clone_destroy (ifp=0x22fc7800) at /usr/src/sys/netpfil/pf/if_pflog.c:149 149 if (V_pflogifs[i] == ifp) (kgdb) info registers eax 0xffffffc0 -64 ecx 0x7597b08 123304712 edx 0x2 2 ebx 0x9aa7680 162166400 esp 0x0 0x0 ebp 0x1db26acc 0x1db26acc esi 0x22fc7800 586971136 edi 0x22fc7800 586971136 eip 0x224022c0 0x224022c0 eflags 0x210246 2163270 cs 0x20 32 ss 0x0 0 ds 0x28 40 es 0x28 40 fs 0x8 8 gs 0x0 0 (kgdb) disassemble Dump of assembler code for function pflog_clone_destroy: 0x224022a0 <pflog_clone_destroy+0>: push %ebp 0x224022a1 <pflog_clone_destroy+1>: mov %esp,%ebp 0x224022a3 <pflog_clone_destroy+3>: push %esi 0x224022a4 <pflog_clone_destroy+4>: push %eax 0x224022a5 <pflog_clone_destroy+5>: mov $0xffffffc0,%eax 0x224022aa <pflog_clone_destroy+10>: mov 0x8(%ebp),%esi 0x224022ad <pflog_clone_destroy+13>: nop 0x224022ae <pflog_clone_destroy+14>: nop 0x224022af <pflog_clone_destroy+15>: nop 0x224022b0 <pflog_clone_destroy+16>: mov %fs:0x0,%ecx 0x224022b7 <pflog_clone_destroy+23>: mov 0x31c(%ecx),%ecx 0x224022bd <pflog_clone_destroy+29>: mov 0x1c(%ecx),%ecx 0x224022c0 <pflog_clone_destroy+32>: cmp %esi,0x22403140(%ecx,%eax,1) <------------- 0x224022c7 <pflog_clone_destroy+39>: je 0x224022d0 <pflog_clone_destroy+48> 0x224022c9 <pflog_clone_destroy+41>: add $0x4,%eax Strangely, adding a printf("KP: %d\n", i); just before that prevents it from panicking. With that printf() the module unloads just fine. Disassembling that version shows: for (i = 0; i < PFLOGIFS_MAX; i++) { printf("KP %d\n", i); 12b0: 89 7c 24 04 mov %edi,0x4(%esp) 12b4: c7 04 24 71 01 00 00 movl $0x171,(%esp) 12bb: e8 fc ff ff ff call 12bc <pflog_clone_destroy+0x1c> 12c0: 64 a1 00 00 00 00 mov %fs:0x0,%eax if (V_pflogifs[i] == ifp) 12c6: 8b 80 1c 03 00 00 mov 0x31c(%eax),%eax 12cc: 8b 40 1c mov 0x1c(%eax),%eax 12cf: 39 b4 b8 00 21 00 00 cmp %esi,0x2100(%eax,%edi,4) 12d6: 75 0b jne 12e3 <pflog_clone_destroy+0x43> V_pflogifs[i] = NULL; 12d8: c7 84 b8 00 21 00 00 movl $0x0,0x2100(%eax,%edi,4) <------------------- 12df: 00 00 00 00 static void pflog_clone_destroy(struct ifnet *ifp) { int i; for (i = 0; i < PFLOGIFS_MAX; i++) { 12e3: 47 inc %edi 12e4: 83 ff 10 cmp $0x10,%edi 12e7: 75 c7 jne 12b0 <pflog_clone_destroy+0x10> printf("KP %d\n", i); if (V_pflogifs[i] == ifp) V_pflogifs[i] = NULL; As opposed to the panicking version: for (i = 0; i < PFLOGIFS_MAX; i++) if (V_pflogifs[i] == ifp) 12b7: 8b 89 1c 03 00 00 mov 0x31c(%ecx),%ecx 12bd: 8b 49 1c mov 0x1c(%ecx),%ecx 12c0: 39 b4 01 40 21 00 00 cmp %esi,0x2140(%ecx,%eax,1) <-------------- 12c7: 74 07 je 12d0 <pflog_clone_destroy+0x30> It's almost as if there's a compiler issue here. My x86 asm foo is a bit too weak to work out what's supposed to be happening here, and what might be wrong.
Building with gcc doesn't seem to make the problem go away. It seems unlikely that this is really a compiler issue then, but right now I have no idea.
On the subject of the flooding packets. After many different tcpdumps and device re-cabling have figured out where these packets are coming from. They are originating from android phones and tablets. Unable to determine if this action is normal by design or if those devices have been infected by malware. Those android devices connect by WiFi router that had been bridged to a switch that also has the LAN cable on it. At the switch the unwanted WiFi traffic short cuts to the LAN cable never showing up on the gateway host. Changed the WiFi router cabling layout and enabled the routers firewall adding rules to drop the unwanted traffic so it doesn't exit the WiFi router any more. What a learning experience!
^Triage: with all the changes since FreeBSD 12.0-RC1, this PR is surely OBE.