Bug 243096 - netgraph ng_nat example causes panic on CURRENT
Summary: netgraph ng_nat example causes panic on CURRENT
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-01-04 20:54 UTC by Robert James Hernandez
Modified: 2020-01-14 21:28 UTC (History)
8 users (show)

See Also:


Attachments
core.txt from panic (107.91 KB, text/plain)
2020-01-04 20:54 UTC, Robert James Hernandez
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert James Hernandez 2020-01-04 20:54:48 UTC
Created attachment 210453 [details]
core.txt from panic

Overview:

Im seeing a consistent panic on CURRENT r356261 when following an example taken from `man ng_nat`:

 The ng_nat node can also be attached directly to the physical interface
 via ng_ether(4) node in the graph.  In the following example, we perform
 masquerading on a Ethernet interface connected to a public network.

           ifconfig igb0 inet x.y.8.35 netmask 0xfffff000
           route add default x.y.0.1
           /usr/sbin/ngctl -f- <<-SEQ
                   mkpeer igb0: nat lower in
                   name igb0:lower igb0_NAT
                   connect igb0: igb0_NAT: upper out
                   msg igb0_NAT: setdlt 1
                   msg igb0_NAT: setaliasaddr x.y.8.35
           SEQ

Im not very familar with netgraph so its been a little difficult for me to investigate and understand what might be wrong with the above example. /usr/share/examples/netgraph didnt seem to have a ng_nat example either. Essentially Im looking to create a NAT with the "wan" side being a physical interface and the "lan" being a bridge.

Steps to Reproduce:

Im testing on a x230 with em0 instead of igb0 and using DHCP:
$ ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=481249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
        ether 3c:97:0e:21:cf:52
        inet 192.168.88.85 netmask 0xffffff00 broadcast 192.168.88.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

$ netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.88.1       UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.88.0/24    link#1             U           em0
192.168.88.85      link#1             UHS         lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
::1                               link#2                        UH          lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%lo0/64                     link#2                        U           lo0
fe80::1%lo0                       link#2                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0

With the physical interface up the following snippet causes the system to panic (again borrowed from the ng_nat manpage above):

$ /usr/sbin/ngctl -f- <<-SEQ
        mkpeer em0: nat lower in
        name em0:lower em0_NAT
        connect em0: em0_NAT: upper out
        msg em0_NAT: setdlt 1
        msg em0_NAT: setaliasaddr 192.168.88.85
SEQ

Actual Results:

After a few seconds the system then panics:

Unread portion of the kernel message buffer:
panic: ng_nat: ip_len != m_pkthdr.len
cpuid = 3
time = 1578139602
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004b4bd690
vpanic() at vpanic+0x17e/frame 0xfffffe004b4bd6f0
panic() at panic+0x43/frame 0xfffffe004b4bd750
ng_nat_rcvdata() at ng_nat_rcvdata+0x3d5/frame 0xfffffe004b4bd7a0
ng_apply_item() at ng_apply_item+0xa3/frame 0xfffffe004b4bd820
ng_snd_item() at ng_snd_item+0x2b0/frame 0xfffffe004b4bd860
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe004b4bd890
ether_nh_input() at ether_nh_input+0x24a/frame 0xfffffe004b4bd8f0
netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe004b4bd970
ether_input() at ether_input+0x9d/frame 0xfffffe004b4bd9d0
iflib_rxeof() at iflib_rxeof+0xbcd/frame 0xfffffe004b4bdae0
_task_fn_rx() at _task_fn_rx+0x7d/frame 0xfffffe004b4bdb20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x155/frame 0xfffffe004b4bdb80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe004b4bdbb0
fork_exit() at fork_exit+0x80/frame 0xfffffe004b4bdbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004b4bdbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

See core.txt attached from panic for more info

In my testing it seems to be the last command thats the problematic one: msg em0_NAT: setaliasaddr 192.168.88.85

Expected Results:

em0_NAT interface to be available


If theres anything else that would be helpful for me to include just let me know.

System Info:
$ uname -a
FreeBSD test 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r356261: Thu Jan  2 04:59:38 UTC 2020     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
Comment 1 Kyle Evans freebsd_committer 2020-01-05 02:33:47 UTC
CC'ing Lutz in case he's interested in investigating this one, as he's been doing netgraph stuff lately.
Comment 2 Mark Johnston freebsd_committer 2020-01-08 00:26:44 UTC
I haven't been able to reproduce this on head in a VM with a vtnet interface or a machine using igb, both acting as NFS clients.  Do you know what kind of network traffic your system is seeing?
Comment 3 Mark Johnston freebsd_committer 2020-01-08 00:28:18 UTC
I haven't been able to reproduce this on head in a VM with a vtnet interface or a machine using igb, both acting as NFS clients.  Do you know what kind of network traffic your system is seeing?
Comment 4 Mark Johnston freebsd_committer 2020-01-08 00:47:25 UTC
(In reply to Mark Johnston from comment #3)
Hmm never mind, it just took a little while. :)
Comment 5 Mark Johnston freebsd_committer 2020-01-08 01:16:37 UTC
https://reviews.freebsd.org/D23080
Comment 6 Robert James Hernandez 2020-01-08 03:07:33 UTC
Thanks Mark for the help. Im glad you were able to reproduce this and pinpoint the issue. Ill look into building and testing the patch tonight.
Comment 7 lutz 2020-01-08 11:51:58 UTC
Thank you for joining me in.

It's interesting to see how to derive such an issue from the coredump. IPv6 is not mentioned in the bug report itself and the assert only point to a length mismatch. So I assumed a short (i.e. transmission error) paket from the network.

Good job, Mark!

OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device.
Comment 8 Mark Johnston freebsd_committer 2020-01-08 15:59:51 UTC
(In reply to lutz from comment #7)
> It's interesting to see how to derive such an issue from the coredump.

Are you asking how I discovered this problem?  In kgdb I did:

(kgdb) p/x *(struct ether_header *)m->data

from the frame where the assertion failure occurred.  This showed an ethertype of 0x86dd after correcting for network byte order.

> OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device.

Indeed.  The rest of ng_nat_rcvdata() assumes that the packet was validated by the IP stack, but in this configuration that is not happening.  So my patch fixes an obvious bug but there is a larger problem. :(

It looks like r342168 (which introduced the ability to receive packets from ng_ether) is quite incomplete.  Fortunately it was not merged to any stable branches yet.  It should of course be possible to fix this by duplicating some of the validation done in ip_input().  Gleb, Maxim, any thoughts?
Comment 9 Gleb Smirnoff freebsd_committer 2020-01-08 16:56:55 UTC
Originally ng_nat was supposed to be attached only via ipfw. So expecting a cleansed IPv4 traffic. Later mpd learned how to hook it with ng_ppp, again providing cleansed traffic.

Attachment through ng_ether doesn't do that, thus the panic. Either filtering needs to be added to ng_nat, or a ethertype filtering node needs to be written.
Comment 10 lutz 2020-01-08 17:40:12 UTC
I'd suggest to weaken the panics to a silent path-trough of unprocessable packets and update the man page to add the requirements of checking. Do not try to duplicate code.

https://reviews.freebsd.org/D23091
Comment 11 lutz 2020-01-08 17:51:13 UTC
(In reply to lutz from comment #10)
Updating the man page is not necessary. It does not even suggest the use case of this bug report.
Comment 12 Steve Wills freebsd_committer 2020-01-08 21:19:45 UTC
(In reply to lutz from comment #11)
Are you referring to the ng_ether man page or the ng_nat man page?
Comment 13 lutz 2020-01-09 07:52:57 UTC
(In reply to Steve Wills from comment #12)
Only ng_nat, which is the culprit anyway.
Comment 14 Robert James Hernandez 2020-01-09 08:45:57 UTC
I applied both patches and recompiled. I can confirm that I'm no longer seeing a panic for the following configuration:

# /usr/sbin/ngctl -f- <<-SEQ
        mkpeer em0: nat lower in
        name em0:lower em0_NAT
        connect em0: em0_NAT: upper out
        msg em0_NAT: setdlt 1
        msg em0_NAT: setaliasaddr 192.168.88.85
SEQ

# ngctl list -l                                                                                                                                                                                          
There are 10 total nodes:                                                
  Name: em0             Type: ether           ID: 00000001   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  upper           em0_NAT         nat          00000014        out  
  lower           em0_NAT         nat          00000014        in
                                                                         
  Name: ubt0            Type: ubt             ID: 00000002   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  hook            ubt0hci         hci          00000008        drv

  Name: btsock_hci_raw  Type: btsock_hci_raw  ID: 00000003   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0raw         ubt0hci         hci          00000008        raw

  Name: btsock_l2c_raw  Type: btsock_l2c_raw  ID: 00000004   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0ctl         ubt0l2cap       l2cap        0000000c        ctl

  Name: btsock_l2c      Type: btsock_l2c      ID: 00000005   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0l2c         ubt0l2cap       l2cap        0000000c        l2c

  Name: btsock_sco      Type: btsock_sco      ID: 00000006   Num hooks: 0

  Name: ubt0hci         Type: hci             ID: 00000008   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  raw             btsock_hci_raw  btsock_hci_raw 00000003        ubt0raw
  acl             ubt0l2cap       l2cap        0000000c        hci
  drv             ubt0            ubt          00000002        hook

  Name: ubt0l2cap       Type: l2cap           ID: 0000000c   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  l2c             btsock_l2c      btsock_l2c   00000005        ubt0l2c
  ctl             btsock_l2c_raw  btsock_l2c_raw 00000004        ubt0ctl
  hci             ubt0hci         hci          00000008        acl

  Name: em0_NAT         Type: nat             ID: 00000014   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  out             em0             ether        00000001        upper
  in              em0             ether        00000001        lower

  Name: ngctl1064       Type: socket          ID: 00000019   Num hooks: 0

Thanks everyone for the help, this will allow me to start digging into the ng_nat functionality.

Going forward I think we should add an example or two for ng_nat to /usr/share/examples/netgraph since there doesnt seem to be many other references besides the examples from man ng_nat.
Comment 15 Steve Wills freebsd_committer 2020-01-09 12:34:30 UTC
(In reply to lutz from comment #13)
I thought the man page suggested exactly the usage that triggered this bug report:

https://svnweb.freebsd.org/base/head/share/man/man4/ng_nat.4?revision=344482&view=markup#l372
Comment 16 lutz 2020-01-10 17:12:13 UTC
(In reply to Steve Wills from comment #15)
Oh, I used an outdated version of the man page (obviously).

Any idea what to write into the man page?
Malicious packets might occur even in the other use cases.
Comment 17 Robert James Hernandez 2020-01-13 07:18:40 UTC
(In reply to lutz from comment #16)

Taking inspiration from Gleb's earlier comment :

"Originally ng_nat was expected to be attached only via ipfw and later ng_ppp which both cleanse IPv4 traffic. In its current form ng_ether doesn't do that, thus ng_nat will panic if it recieves non IPv4 packets."
Comment 18 lutz 2020-01-13 11:00:06 UTC
Updated  https://reviews.freebsd.org/D23091 with a BUGS section in the man page.
Comment 19 lutz 2020-01-14 21:28:09 UTC
(In reply to lutz from comment #18)
And the addition to the man page is gone again (see Phabricator).