Bug 233341 - 12.0-RC1 i386 vnet does not behave like the amd64 vnet version.
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-19 21:50 UTC by Joe Barbish
Modified: 2018-12-02 20:55 UTC
CC: 1 user

See Also:


Attachments
pflog from host (8.57 KB, text/plain)
2018-11-19 21:50 UTC, Joe Barbish
pflog from vnet jail (20.24 KB, text/plain)
2018-11-19 21:52 UTC, Joe Barbish
tcpdump of LAN i386 12.0-RC1 host (736 bytes, text/plain)
2018-11-21 22:08 UTC, Joe Barbish
pflog test2 (3.01 KB, text/plain)
2018-11-21 22:11 UTC, Joe Barbish
pflog jail test2 (774 bytes, text/plain)
2018-11-21 22:13 UTC, Joe Barbish

Description Joe Barbish 2018-11-19 21:50:11 UTC
Created attachment 199362 [details]
pflog from host

Symptoms = i386 vnet does not behave like the amd64 vnet version. The i386 version is flooding the host pflog with IPv4 multicast requests and IPv6 Neighbor Solicitation requests; the amd64 version doesn't do that. On the i386 system, with all the vnet jails stopped, issuing the shutdown command panics the system and takes a crash dump, but only if vnet jails had been started and stopped. This does not happen on an amd64 system.

Configuration = i386 box running a pf firewall with very simple rules that pass
and log all traffic. This i386 box is on a private LAN, so no NAT is being done. It has a vnet jail running a pf firewall with very simple rules that pass and log all traffic.

Host config =
  rc.conf 
    ifconfig_xl0="DHCP"
    pf_enable="YES"                
    pflog_enable="YES"                
    pf_rules="/etc/pf.rules.host"    
    pflog_logfile="/var/log/pflog"   
          
  pf.rules.host
    oif = "xl0"
    set block-policy drop 
    set state-policy if-bound 
    set loginterface $oif
    scrub out on $oif all random-id
    scrub reassemble tcp
    set skip on lo0
    pass out log (all) quick
    pass in  log (all) quick

Vnet jail configuration
  rc.conf
    gateway_enable="YES"
    pf_enable="YES"
    pf_rules="/etc/pf.conf"
    pflog_enable="YES"
    pflog_logfile="/var/log/pflog"

  pf.conf
    oif=epair1b
    set block-policy drop
    set fail-policy drop
    set state-policy if-bound
    scrub in on $oif all
    set skip on lo0
    block out log quick on $oif inet proto tcp from any to any port 43
    pass out log (all) quick
    pass in  log (all) quick
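
For reference, the vnet jail itself is created along these lines (a hypothetical jail.conf sketch; the actual jail definition was not posted, so the jail name, path, and prestart/poststop commands here are assumptions):

  devel {
      vnet;
      vnet.interface  = "epair1b";
      path            = "/jails/devel";
      # create the epair and attach the host side to the bridge
      exec.prestart   = "ifconfig epair1 create up";
      exec.prestart  += "ifconfig bridge10 addm epair1a up";
      # destroying one side destroys the pair
      exec.poststop   = "ifconfig epair1a destroy";
      exec.start      = "/bin/sh /etc/rc";
      exec.stop       = "/bin/sh /etc/rc.shutdown";
      persist;
  }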
         
After the vnet jail is started I see this on the host:
ifconfig
xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> 
  options=82009<RXCSUM,VLAN_MTU,WOL_MAGIC,LINKSTATE>
  ether 00:01:02:2f:c3:00
  inet 10.0.10.6 netmask 0xfffffff0 broadcast 10.0.10.15
  media: Ethernet autoselect (100baseTX <full-duplex>)
  status: active
  nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33184
  groups: pflog
        
bridge10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
  ether 02:2a:47:08:71:0a
  id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
  maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
  root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
  member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
          ifmaxaddr 0 port 7 priority 128 path cost 2000
  member: xl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
          ifmaxaddr 0 port 2 priority 128 path cost 200000
      groups: bridge
      nd6 options=9<PERFORMNUD,IFDISABLED>
               
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> 
  options=8<VLAN_MTU>
  ether 02:a0:73:db:2f:0a
  inet6 fe80::a0:73ff:fedb:2f0a%epair1a prefixlen 64 scopeid 0x7
  groups: epair
  media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
  status: active
  nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
         
ps ax
692  -  DL     0:06.87 [pf purge]
1105  -  Is     0:00.00 pflogd: [priv] (pflogd)
1106  -  S      0:00.29 pflogd: [running] -s 116 -i pflog0 
1409  -  IsJ    0:00.01 pflogd: [priv] (pflogd)
1413  -  SJ     0:00.31 pflogd: [running] -s 116 -i pflog0  
1465  -  SsJ    0:00.02 /usr/sbin/syslogd -ss
1521  -  IsJ    0:00.03 /usr/sbin/cron -J 60 -s



After the vnet jail is started I see this on the vnet console:
ifconfig
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33184
  groups: pflog
epair1b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
  options=8<VLAN_MTU>
  ether 02:a0:73:db:2f:0b
  inet 10.0.10.31 netmask 0xff000000 broadcast 10.255.255.255
  inet6 fe80::a0:73ff:fedb:2f0b%epair1b prefixlen 64 scopeid 0x3
  groups: epair
  media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
  status: active
  nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
         
******************************************************

From the vnet console I issue this command:
ping -c 2 freebsd.org
PING freebsd.org (96.47.72.84): 56 data bytes
64 bytes from 96.47.72.84: icmp_seq=0 ttl=46 time=39.367 ms
64 bytes from 96.47.72.84: icmp_seq=1 ttl=46 time=39.096 ms

--- freebsd.org ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 39.096/39.231/39.367/0.135 ms

Then I looked at the pflog on the host and in the vnet jail
to see the ping packets, and what I see is a flood of other
IPv4 and IPv6 packets. The IPv6 packet flood was there in 11.x i386,
and now in 12.0 there is a flood of IPv4 packets as well. There is a
bug report about the IPv6 packet flood in 11.x. A lot of network
resources are being consumed by this background noise. It looks
like it originates from vimage.

The host pflog report is attached as a separate file:
   pflog.txt.bug1.host
Comment 1 Joe Barbish 2018-11-19 21:52:15 UTC
Created attachment 199363 [details]
pflog from vnet jail
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2018-11-20 20:21:35 UTC
(In reply to Joe Barbish from comment #1)
That traffic looks like you've got a device announcing itself using the Simple Service Discovery Protocol.
I'm pretty sure that if you tcpdump on bridge10 on the host you'd see the traffic too. (And that's why the vnet jail sees it.)
That's almost certainly not a bug.

Do you have a backtrace for the panic on shutdown?
Comment 3 Joe Barbish 2018-11-20 22:25:30 UTC
I don't disagree that the traffic looks like a device announcing itself using the Simple Service Discovery Protocol; thanks for identifying what it is. The problem is that it never ends: the log file is flooded until it fills up and then rotates. This does not happen on the amd64 system, and if I use ipfw on the host and vnet jail it also doesn't happen. That is the problem.
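
If this really is SSDP (which is UDP to 239.255.255.250 port 1900), one way to keep it out of the log would be a non-logging block ahead of the pass-all rules in pf.rules.host (a sketch, untested, and my assumption rather than anything verified in this thread):

  # drop SSDP chatter without logging it (sketch, untested)
  block in quick on $oif inet proto udp from any to 239.255.255.250 port 1900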

I cut the following out of the vmcore dump. Hope it's what you want.

in6_purgeaddr: err=65, destination address delete failed


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x2f9528c8
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x18c0f2c0
stack pointer           = 0x28:0x16614b6c
frame pointer           = 0x28:0x16614b74
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (thread taskq)
trap number             = 12
panic: page fault
cpuid = 0
time = 1542749869
KDB: stack backtrace:
#0 0x11080cf at kdb_backtrace+0x4f
#1 0x10bb097 at vpanic+0x147
#2 0x10baf4b at panic+0x1b
#3 0x16910a5 at trap_fatal+0x395
#4 0x16910e3 at trap_pfault+0x33
#5 0x169072f at trap+0x3cf
#6 0xffc0315d at PTDpde+0x4165
#7 0x11b9a06 at if_clone_destroyif+0x1a6
#8 0x11ba17c at if_clone_detach+0xbc
#9 0x18c0f441 at vnet_pflog_uninit+0x21
#10 0x11dfaeb at vnet_destroy+0x10b
#11 0x1083da4 at prison_deref+0x284
#12 0x1119c59 at taskqueue_run_locked+0x189
#13 0x111ae07 at taskqueue_thread_loop+0x97
#14 0x107c731 at fork_exit+0x71
#15 0xffc033ba at PTDpde+0x43c2
Comment 4 Kristof Provost freebsd_committer freebsd_triage 2018-11-21 07:14:25 UTC
(In reply to Joe Barbish from comment #3)
Please check with tcpdump on bridge10/xl0 to see if you actually see that traffic there too.

I'll try to reproduce the panic on shutdown. That does look like a pflog issue.
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2018-11-21 09:01:11 UTC
(In reply to Kristof Provost from comment #4)
To be clear: what I'm suggesting is that you confirm/deny that the traffic is coming in through xl0. If so, then it's entirely expected to see it in the log file. If it keeps coming you'll keep seeing it.
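For example (just one way to capture it; the filter is only a suggestion):

  tcpdump -ni xl0 'ip multicast or icmp6'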
Comment 6 Joe Barbish 2018-11-21 22:08:19 UTC
Created attachment 199436 [details]
tcpdump of LAN i386 12.0-RC1 host
Comment 7 Joe Barbish 2018-11-21 22:10:09 UTC
I got the tcpdump on xl0, the vnet pflog, and the host pflog. I was surprised at the results: none of those flood packets show up. So I had to think about what was different this time.

In my production config, the gateway host is 11.2 amd64 and the LAN development host, where previous vnet jail testing was done, was i386 11.0-RC1.

To prepare for 12.0 testing I installed 12.0-RC1 on an empty HD, which I swapped with the 11.2 HD in the gateway host. I also installed 12.0-RC1 on the LAN development i386 host.

The previously posted test pflogs are from a config where the gateway host is 12.0-RC1 and the LAN development host is 12.0-RC1. This is the config where the pflogs on the development LAN host are flooded with all those packets.

For this test I did not swap the gateway host's 11.2 HD with the 12.0-RC1 HD, and I kept the development LAN host at 12.0-RC1. This resulted in the pflogs on the development host no longer having the flood packets.

I attached the tcpdump, host pflog, and vnet jail pflog as separate files.
Comment 8 Joe Barbish 2018-11-21 22:11:55 UTC
Created attachment 199437 [details]
pflog test2
Comment 9 Joe Barbish 2018-11-21 22:13:08 UTC
Created attachment 199438 [details]
pflog jail test2
Comment 10 Kristof Provost freebsd_committer freebsd_triage 2018-12-01 13:22:09 UTC
A simple kldload pflog / kldunload pflog is sufficient to provoke this on i386, but not on amd64.
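
That is:

  kldload pflog
  kldunload pflog    <- panic happens here on i386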

The panic happens trying to access V_pflogifs in pflog_clone_destroy():
#15 0x224022c0 in pflog_clone_destroy (ifp=0x22fc7800) at /usr/src/sys/netpfil/pf/if_pflog.c:149
149                     if (V_pflogifs[i] == ifp)
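
For context, the function is essentially just that loop plus the interface teardown (paraphrased from sys/netpfil/pf/if_pflog.c; take the exact cleanup calls as approximate):

  static void
  pflog_clone_destroy(struct ifnet *ifp)
  {
          int i;

          /* forget any V_pflogifs slot pointing at this ifnet */
          for (i = 0; i < PFLOGIFS_MAX; i++)
                  if (V_pflogifs[i] == ifp)
                          V_pflogifs[i] = NULL;

          bpfdetach(ifp);
          if_detach(ifp);
          if_free(ifp);
  }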

(kgdb) info registers
eax            0xffffffc0       -64
ecx            0x7597b08        123304712
edx            0x2      2
ebx            0x9aa7680        162166400
esp            0x0      0x0
ebp            0x1db26acc       0x1db26acc
esi            0x22fc7800       586971136
edi            0x22fc7800       586971136
eip            0x224022c0       0x224022c0
eflags         0x210246 2163270
cs             0x20     32
ss             0x0      0
ds             0x28     40
es             0x28     40
fs             0x8      8
gs             0x0      0
(kgdb) disassemble
Dump of assembler code for function pflog_clone_destroy:
0x224022a0 <pflog_clone_destroy+0>:     push   %ebp
0x224022a1 <pflog_clone_destroy+1>:     mov    %esp,%ebp
0x224022a3 <pflog_clone_destroy+3>:     push   %esi
0x224022a4 <pflog_clone_destroy+4>:     push   %eax
0x224022a5 <pflog_clone_destroy+5>:     mov    $0xffffffc0,%eax
0x224022aa <pflog_clone_destroy+10>:    mov    0x8(%ebp),%esi
0x224022ad <pflog_clone_destroy+13>:    nop
0x224022ae <pflog_clone_destroy+14>:    nop
0x224022af <pflog_clone_destroy+15>:    nop
0x224022b0 <pflog_clone_destroy+16>:    mov    %fs:0x0,%ecx
0x224022b7 <pflog_clone_destroy+23>:    mov    0x31c(%ecx),%ecx
0x224022bd <pflog_clone_destroy+29>:    mov    0x1c(%ecx),%ecx
0x224022c0 <pflog_clone_destroy+32>:    cmp    %esi,0x22403140(%ecx,%eax,1) <-------------
0x224022c7 <pflog_clone_destroy+39>:    je     0x224022d0 <pflog_clone_destroy+48>
0x224022c9 <pflog_clone_destroy+41>:    add    $0x4,%eax

Strangely, adding a printf("KP %d\n", i); just before that prevents it from panicking. With that printf() the module unloads just fine. Disassembling that version shows:

        for (i = 0; i < PFLOGIFS_MAX; i++) {
        printf("KP %d\n", i);
    12b0:       89 7c 24 04             mov    %edi,0x4(%esp)
    12b4:       c7 04 24 71 01 00 00    movl   $0x171,(%esp)
    12bb:       e8 fc ff ff ff          call   12bc <pflog_clone_destroy+0x1c>
    12c0:       64 a1 00 00 00 00       mov    %fs:0x0,%eax
                if (V_pflogifs[i] == ifp)
    12c6:       8b 80 1c 03 00 00       mov    0x31c(%eax),%eax
    12cc:       8b 40 1c                mov    0x1c(%eax),%eax
    12cf:       39 b4 b8 00 21 00 00    cmp    %esi,0x2100(%eax,%edi,4)
    12d6:       75 0b                   jne    12e3 <pflog_clone_destroy+0x43>
                        V_pflogifs[i] = NULL;
    12d8:       c7 84 b8 00 21 00 00    movl   $0x0,0x2100(%eax,%edi,4) <-------------------
    12df:       00 00 00 00
static void
pflog_clone_destroy(struct ifnet *ifp)
{
        int i;

        for (i = 0; i < PFLOGIFS_MAX; i++) {
    12e3:       47                      inc    %edi
    12e4:       83 ff 10                cmp    $0x10,%edi
    12e7:       75 c7                   jne    12b0 <pflog_clone_destroy+0x10>
        printf("KP %d\n", i);
                if (V_pflogifs[i] == ifp)
                        V_pflogifs[i] = NULL;

As opposed to the panicking version:
        for (i = 0; i < PFLOGIFS_MAX; i++)
                if (V_pflogifs[i] == ifp)
    12b7:       8b 89 1c 03 00 00       mov    0x31c(%ecx),%ecx
    12bd:       8b 49 1c                mov    0x1c(%ecx),%ecx
    12c0:       39 b4 01 40 21 00 00    cmp    %esi,0x2140(%ecx,%eax,1) <--------------
    12c7:       74 07                   je     12d0 <pflog_clone_destroy+0x30>

It's almost as if there's a compiler issue here. My x86 asm foo is a bit too weak to work out what's supposed to be happening here, and what might be wrong.
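
For anyone following along: V_pflogifs is a virtualized (VNET) variable, so every access goes through the current vnet's per-instance data area. Roughly (a simplified sketch of the sys/net/vnet.h machinery, not the verbatim macros):

  /* declaration in if_pflog.c: */
  static VNET_DEFINE(struct ifnet *, pflogifs[PFLOGIFS_MAX]);
  #define V_pflogifs      VNET(pflogifs)

  /* VNET(n) expands, approximately, to:
   *   (*(__typeof(vnet_entry_##n) *)
   *       ((uintptr_t)curvnet->vnet_data_base +
   *        (uintptr_t)&vnet_entry_##n))
   */

That matches the chain in the disassembly: mov %fs:0x0 fetches (presumably) curthread, the 0x31c offset looks like curthread->td_vnet, the 0x1c offset like vnet->vnet_data_base, and the final cmp indexes off that base.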
Comment 11 Kristof Provost freebsd_committer freebsd_triage 2018-12-02 15:23:14 UTC
Building with gcc doesn't seem to make the problem go away. It seems unlikely that this is really a compiler issue then, but right now I have no idea.
Comment 12 Joe Barbish 2018-12-02 20:55:43 UTC
On the subject of the flooding packets: after many different tcpdumps and some device re-cabling, I have figured out where these packets are coming from. They are originating from Android phones and tablets. I am unable to determine whether this behavior is normal by design or whether those devices have been infected by malware. The Android devices connect via a WiFi router that had been bridged to a switch that also has the LAN cable on it. At the switch, the unwanted WiFi traffic shortcuts to the LAN cable, never showing up on the gateway host.

I changed the WiFi router cabling layout and enabled the router's firewall, adding rules to drop the unwanted traffic so it doesn't exit the WiFi router any more.

What a learning experience!