Bug 243096 - netgraph ng_nat example causes panic on CURRENT
Summary: netgraph ng_nat example causes panic on CURRENT
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-01-04 20:54 UTC by Robert James Hernandez
Modified: 2020-06-08 14:13 UTC (History)
8 users (show)

See Also:
markj: mfc-stable12+


Attachments
core.txt from panic (107.91 KB, text/plain)
2020-01-04 20:54 UTC, Robert James Hernandez
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert James Hernandez 2020-01-04 20:54:48 UTC
Created attachment 210453 [details]
core.txt from panic

Overview:

Im seeing a consistent panic on CURRENT r356261 when following an example taken from `man ng_nat`:

 The ng_nat node can also be attached directly to the physical interface
 via ng_ether(4) node in the graph.  In the following example, we perform
 masquerading on a Ethernet interface connected to a public network.

           ifconfig igb0 inet x.y.8.35 netmask 0xfffff000
           route add default x.y.0.1
           /usr/sbin/ngctl -f- <<-SEQ
                   mkpeer igb0: nat lower in
                   name igb0:lower igb0_NAT
                   connect igb0: igb0_NAT: upper out
                   msg igb0_NAT: setdlt 1
                   msg igb0_NAT: setaliasaddr x.y.8.35
           SEQ

Im not very familar with netgraph so its been a little difficult for me to investigate and understand what might be wrong with the above example. /usr/share/examples/netgraph didnt seem to have a ng_nat example either. Essentially Im looking to create a NAT with the "wan" side being a physical interface and the "lan" being a bridge.

Steps to Reproduce:

Im testing on a x230 with em0 instead of igb0 and using DHCP:
$ ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=481249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
        ether 3c:97:0e:21:cf:52
        inet 192.168.88.85 netmask 0xffffff00 broadcast 192.168.88.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

$ netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.88.1       UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.88.0/24    link#1             U           em0
192.168.88.85      link#1             UHS         lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
::1                               link#2                        UH          lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%lo0/64                     link#2                        U           lo0
fe80::1%lo0                       link#2                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0

With the physical interface up the following snippet causes the system to panic (again borrowed from the ng_nat manpage above):

$ /usr/sbin/ngctl -f- <<-SEQ
        mkpeer em0: nat lower in
        name em0:lower em0_NAT
        connect em0: em0_NAT: upper out
        msg em0_NAT: setdlt 1
        msg em0_NAT: setaliasaddr 192.168.88.85
SEQ

Actual Results:

After a few seconds the system then panics:

Unread portion of the kernel message buffer:
panic: ng_nat: ip_len != m_pkthdr.len
cpuid = 3
time = 1578139602
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004b4bd690
vpanic() at vpanic+0x17e/frame 0xfffffe004b4bd6f0
panic() at panic+0x43/frame 0xfffffe004b4bd750
ng_nat_rcvdata() at ng_nat_rcvdata+0x3d5/frame 0xfffffe004b4bd7a0
ng_apply_item() at ng_apply_item+0xa3/frame 0xfffffe004b4bd820
ng_snd_item() at ng_snd_item+0x2b0/frame 0xfffffe004b4bd860
ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe004b4bd890
ether_nh_input() at ether_nh_input+0x24a/frame 0xfffffe004b4bd8f0
netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe004b4bd970
ether_input() at ether_input+0x9d/frame 0xfffffe004b4bd9d0
iflib_rxeof() at iflib_rxeof+0xbcd/frame 0xfffffe004b4bdae0
_task_fn_rx() at _task_fn_rx+0x7d/frame 0xfffffe004b4bdb20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x155/frame 0xfffffe004b4bdb80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe004b4bdbb0
fork_exit() at fork_exit+0x80/frame 0xfffffe004b4bdbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004b4bdbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

See core.txt attached from panic for more info

In my testing it seems to be the last command thats the problematic one: msg em0_NAT: setaliasaddr 192.168.88.85

Expected Results:

em0_NAT interface to be available


If theres anything else that would be helpful for me to include just let me know.

System Info:
$ uname -a
FreeBSD test 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r356261: Thu Jan  2 04:59:38 UTC 2020     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
Comment 1 Kyle Evans freebsd_committer 2020-01-05 02:33:47 UTC
CC'ing Lutz in case he's interested in investigating this one, as he's been doing netgraph stuff lately.
Comment 2 Mark Johnston freebsd_committer 2020-01-08 00:26:44 UTC
I haven't been able to reproduce this on head in a VM with a vtnet interface or a machine using igb, both acting as NFS clients.  Do you know what kind of network traffic your system is seeing?
Comment 3 Mark Johnston freebsd_committer 2020-01-08 00:28:18 UTC
I haven't been able to reproduce this on head in a VM with a vtnet interface or a machine using igb, both acting as NFS clients.  Do you know what kind of network traffic your system is seeing?
Comment 4 Mark Johnston freebsd_committer 2020-01-08 00:47:25 UTC
(In reply to Mark Johnston from comment #3)
Hmm never mind, it just took a little while. :)
Comment 5 Mark Johnston freebsd_committer 2020-01-08 01:16:37 UTC
https://reviews.freebsd.org/D23080
Comment 6 Robert James Hernandez 2020-01-08 03:07:33 UTC
Thanks Mark for the help. Im glad you were able to reproduce this and pinpoint the issue. Ill look into building and testing the patch tonight.
Comment 7 lutz 2020-01-08 11:51:58 UTC
Thank you for joining me in.

It's interesting to see how to derive such an issue from the coredump. IPv6 is not mentioned in the bug report itself and the assert only point to a length mismatch. So I assumed a short (i.e. transmission error) paket from the network.

Good job, Mark!

OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device.
Comment 8 Mark Johnston freebsd_committer 2020-01-08 15:59:51 UTC
(In reply to lutz from comment #7)
> It's interesting to see how to derive such an issue from the coredump.

Are you asking how I discovered this problem?  In kgdb I did:

(kgdb) p/x *(struct ether_header *)m->data

from the frame where the assertion failure occurred.  This showed an ethertype of 0x86dd after correcting for network byte order.

> OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device.

Indeed.  The rest of ng_nat_rcvdata() assumes that the packet was validated by the IP stack, but in this configuration that is not happening.  So my patch fixes an obvious bug but there is a larger problem. :(

It looks like r342168 (which introduced the ability to receive packets from ng_ether) is quite incomplete.  Fortunately it was not merged to any stable branches yet.  It should of course be possible to fix this by duplicating some of the validation done in ip_input().  Gleb, Maxim, any thoughts?
Comment 9 Gleb Smirnoff freebsd_committer 2020-01-08 16:56:55 UTC
Originally ng_nat was supposed to be attached only via ipfw. So expecting a cleansed IPv4 traffic. Later mpd learned how to hook it with ng_ppp, again providing cleansed traffic.

Attachment through ng_ether doesn't do that, thus the panic. Either filtering needs to be added to ng_nat, or a ethertype filtering node needs to be written.
Comment 10 lutz 2020-01-08 17:40:12 UTC
I'd suggest to weaken the panics to a silent path-trough of unprocessable packets and update the man page to add the requirements of checking. Do not try to duplicate code.

https://reviews.freebsd.org/D23091
Comment 11 lutz 2020-01-08 17:51:13 UTC
(In reply to lutz from comment #10)
Updating the man page is not necessary. It does not even suggest the use case of this bug report.
Comment 12 Steve Wills freebsd_committer 2020-01-08 21:19:45 UTC
(In reply to lutz from comment #11)
Are you referring to the ng_ether man page or the ng_nat man page?
Comment 13 lutz 2020-01-09 07:52:57 UTC
(In reply to Steve Wills from comment #12)
Only ng_nat, which is the culprit anyway.
Comment 14 Robert James Hernandez 2020-01-09 08:45:57 UTC
I applied both patches and recompiled. I can confirm that I'm no longer seeing a panic for the following configuration:

# /usr/sbin/ngctl -f- <<-SEQ
        mkpeer em0: nat lower in
        name em0:lower em0_NAT
        connect em0: em0_NAT: upper out
        msg em0_NAT: setdlt 1
        msg em0_NAT: setaliasaddr 192.168.88.85
SEQ

# ngctl list -l                                                                                                                                                                                          
There are 10 total nodes:                                                
  Name: em0             Type: ether           ID: 00000001   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  upper           em0_NAT         nat          00000014        out  
  lower           em0_NAT         nat          00000014        in
                                                                         
  Name: ubt0            Type: ubt             ID: 00000002   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  hook            ubt0hci         hci          00000008        drv

  Name: btsock_hci_raw  Type: btsock_hci_raw  ID: 00000003   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0raw         ubt0hci         hci          00000008        raw

  Name: btsock_l2c_raw  Type: btsock_l2c_raw  ID: 00000004   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0ctl         ubt0l2cap       l2cap        0000000c        ctl

  Name: btsock_l2c      Type: btsock_l2c      ID: 00000005   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ubt0l2c         ubt0l2cap       l2cap        0000000c        l2c

  Name: btsock_sco      Type: btsock_sco      ID: 00000006   Num hooks: 0

  Name: ubt0hci         Type: hci             ID: 00000008   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  raw             btsock_hci_raw  btsock_hci_raw 00000003        ubt0raw
  acl             ubt0l2cap       l2cap        0000000c        hci
  drv             ubt0            ubt          00000002        hook

  Name: ubt0l2cap       Type: l2cap           ID: 0000000c   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  l2c             btsock_l2c      btsock_l2c   00000005        ubt0l2c
  ctl             btsock_l2c_raw  btsock_l2c_raw 00000004        ubt0ctl
  hci             ubt0hci         hci          00000008        acl

  Name: em0_NAT         Type: nat             ID: 00000014   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  out             em0             ether        00000001        upper
  in              em0             ether        00000001        lower

  Name: ngctl1064       Type: socket          ID: 00000019   Num hooks: 0

Thanks everyone for the help, this will allow me to start digging into the ng_nat functionality.

Going forward I think we should add an example or two for ng_nat to /usr/share/examples/netgraph since there doesnt seem to be many other references besides the examples from man ng_nat.
Comment 15 Steve Wills freebsd_committer 2020-01-09 12:34:30 UTC
(In reply to lutz from comment #13)
I thought the man page suggested exactly the usage that triggered this bug report:

https://svnweb.freebsd.org/base/head/share/man/man4/ng_nat.4?revision=344482&view=markup#l372
Comment 16 lutz 2020-01-10 17:12:13 UTC
(In reply to Steve Wills from comment #15)
Oh, I used an outdated version of the man page (obviously).

Any idea what to write into the man page?
Malicious packets might occur even in the other use cases.
Comment 17 Robert James Hernandez 2020-01-13 07:18:40 UTC
(In reply to lutz from comment #16)

Taking inspiration from Gleb's earlier comment :

"Originally ng_nat was expected to be attached only via ipfw and later ng_ppp which both cleanse IPv4 traffic. In its current form ng_ether doesn't do that, thus ng_nat will panic if it recieves non IPv4 packets."
Comment 18 lutz 2020-01-13 11:00:06 UTC
Updated  https://reviews.freebsd.org/D23091 with a BUGS section in the man page.
Comment 19 lutz 2020-01-14 21:28:09 UTC
(In reply to lutz from comment #18)
And the addition to the man page is gone again (see Phabricator).
Comment 20 Robert James Hernandez 2020-01-23 03:07:20 UTC
(In reply to lutz from comment #19)

Its unfortunate that the manpage update was reverted but I can understand why it might not be a good idea to include it.

Regardless, is there anything else that needs to be done on my end? I know the Phabricator changes are still pending review but I can create a follow up bug report for cleansed IPv4 traffic when using ng_ether if that'd be better vs hold up this existing fixes to ng_nat.
Comment 21 lutz 2020-01-23 08:13:40 UTC
(In reply to Robert James Hernandez from comment #20)
You can review the change in Phabricator and then we need to find someone which is willing to commit the change. I think all relevant persons are already named as reviewers of this change.
Comment 22 commit-hook freebsd_committer 2020-01-23 16:45:52 UTC
A commit references this bug:

Author: markj
Date: Thu Jan 23 16:45:49 UTC 2020
New revision: 357053
URL: https://svnweb.freebsd.org/changeset/base/357053

Log:
  ng_nat: Pass IPv6 packets through.

  ng_nat implements NAT for IPv4 traffic only.  When connected to an
  ng_ether node it erroneously handled IPv6 packets as well.

  This change is not sufficient: ng_nat does not do any validation of IP
  packets in this mode, even though they have not yet passed through
  ip_input().

  PR:		243096
  Reported by:	Robert James Hernandez <rob@sarcasticadmin.com>
  Reviewed by:	julian
  Differential Revision:	https://reviews.freebsd.org/D23080

Changes:
  head/share/man/man4/ng_nat.4
  head/sys/netgraph/ng_nat.c
Comment 23 Mark Johnston freebsd_committer 2020-01-23 16:48:06 UTC
(In reply to Robert James Hernandez from comment #20)
I committed the original diff.

The issue of unvalidated IPv4 packets still exists.  https://reviews.freebsd.org/D23091 makes some progress, but as I noted in the review it is not sufficient as-is: it does not validate checksums or IP fragments, etc..
Comment 24 Robert James Hernandez 2020-01-24 02:46:53 UTC
(In reply to lutz from comment #21)

Sounds good, Ill review the change in Phabricator once my account is approved (I hadnt created on until now).
Comment 25 Robert James Hernandez 2020-01-24 02:48:43 UTC
(In reply to Mark Johnston from comment #23)

Ok, understood. I just wasnt sure if this was something that could be done as a followup to this bug or needed to be handled in this revision.
Comment 26 Kurt Jaeger freebsd_committer 2020-01-24 06:18:45 UTC
(In reply to Robert James Hernandez from comment #24)
phab account approved.
Comment 27 Robert James Hernandez 2020-01-25 17:40:43 UTC
(In reply to Kurt Jaeger from comment #26)

Thanks Kurt! I just commented in Phab.
Comment 28 commit-hook freebsd_committer 2020-02-12 00:31:15 UTC
A commit references this bug:

Author: eugen
Date: Wed Feb 12 00:31:01 UTC 2020
New revision: 357786
URL: https://svnweb.freebsd.org/changeset/base/357786

Log:
  ng_nat: avoid panic if attached directly to ng_ether and got short packet

  From the beginning, ng_nat safely assumed cleansed traffic
  because of limited ways it could be attached to NETGRAPH:
  ng_ipfw or ng_ppp only.

  Now as it may be attached with ng_ether too, the assumption proven wrong.
  Add needed check to the ng_nat. Thanks for markj for debugging this.

  PR:		243096
  Submitted by:	Lutz Donnerhacke <lutz@donnerhacke.de>
  Reported by:	Robert James Hernandez <rob@sarcasticadmin.com>
  Reviewed by:	markj and others
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D23091

Changes:
  head/sys/netgraph/ng_nat.c
Comment 29 lutz 2020-02-12 10:30:41 UTC
This bug ticket can be closed now.
I don't know how to do it myself.
Comment 30 Kurt Jaeger freebsd_committer 2020-02-12 11:03:38 UTC
(In reply to lutz from comment #29)
mark probably wants to wait for the MFC in about a week.
Comment 31 commit-hook freebsd_committer 2020-04-07 17:49:46 UTC
A commit references this bug:

Author: eugen
Date: Tue Apr  7 17:49:37 UTC 2020
New revision: 359707
URL: https://svnweb.freebsd.org/changeset/base/359707

Log:
  MFC r357053 by markj: ng_nat: Pass IPv6 packets through.

  ng_nat implements NAT for IPv4 traffic only.  When connected to an
  ng_ether node it erroneously handled IPv6 packets as well.

  This change is not sufficient: ng_nat does not do any validation of IP
  packets in this mode, even though they have not yet passed through
  ip_input().

  PR:		243096
  Reported by:	Robert James Hernandez <rob@sarcasticadmin.com>
  Reviewed by:	julian
  Differential Revision:	https://reviews.freebsd.org/D23080

Changes:
_U  stable/12/
  stable/12/share/man/man4/ng_nat.4
  stable/12/sys/netgraph/ng_nat.c
Comment 32 commit-hook freebsd_committer 2020-04-07 17:50:49 UTC
A commit references this bug:

Author: eugen
Date: Tue Apr  7 17:50:18 UTC 2020
New revision: 359708
URL: https://svnweb.freebsd.org/changeset/base/359708

Log:
  MFC r357053 by markj: ng_nat: Pass IPv6 packets through.

  ng_nat implements NAT for IPv4 traffic only.  When connected to an
  ng_ether node it erroneously handled IPv6 packets as well.

  This change is not sufficient: ng_nat does not do any validation of IP
  packets in this mode, even though they have not yet passed through
  ip_input().

  PR:		243096
  Reported by:	Robert James Hernandez <rob@sarcasticadmin.com>
  Reviewed by:	julian
  Differential Revision:	https://reviews.freebsd.org/D23080

Changes:
_U  stable/11/
  stable/11/share/man/man4/ng_nat.4
  stable/11/sys/netgraph/ng_nat.c
Comment 33 Mark Johnston freebsd_committer 2020-06-08 14:13:42 UTC
The changes were merged to stable/12.  Thanks for the report!