Created attachment 210453 [details] core.txt from panic Overview: Im seeing a consistent panic on CURRENT r356261 when following an example taken from `man ng_nat`: The ng_nat node can also be attached directly to the physical interface via ng_ether(4) node in the graph. In the following example, we perform masquerading on a Ethernet interface connected to a public network. ifconfig igb0 inet x.y.8.35 netmask 0xfffff000 route add default x.y.0.1 /usr/sbin/ngctl -f- <<-SEQ mkpeer igb0: nat lower in name igb0:lower igb0_NAT connect igb0: igb0_NAT: upper out msg igb0_NAT: setdlt 1 msg igb0_NAT: setaliasaddr x.y.8.35 SEQ Im not very familar with netgraph so its been a little difficult for me to investigate and understand what might be wrong with the above example. /usr/share/examples/netgraph didnt seem to have a ng_nat example either. Essentially Im looking to create a NAT with the "wan" side being a physical interface and the "lan" being a bridge. Steps to Reproduce: Im testing on a x230 with em0 instead of igb0 and using DHCP: $ ifconfig em0 em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=481249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,NOMAP> ether 3c:97:0e:21:cf:52 inet 192.168.88.85 netmask 0xffffff00 broadcast 192.168.88.255 media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> $ netstat -rn Routing tables Internet: Destination Gateway Flags Netif Expire default 192.168.88.1 UGS em0 127.0.0.1 link#2 UH lo0 192.168.88.0/24 link#1 U em0 192.168.88.85 link#1 UHS lo0 Internet6: Destination Gateway Flags Netif Expire ::/96 ::1 UGRS lo0 ::1 link#2 UH lo0 ::ffff:0.0.0.0/96 ::1 UGRS lo0 fe80::/10 ::1 UGRS lo0 fe80::%lo0/64 link#2 U lo0 fe80::1%lo0 link#2 UHS lo0 ff02::/16 ::1 UGRS lo0 With the physical interface up the following snippet causes the system to panic (again borrowed from the ng_nat manpage above): $ /usr/sbin/ngctl -f- <<-SEQ mkpeer em0: nat lower in name em0:lower em0_NAT connect em0: em0_NAT: upper out msg em0_NAT: setdlt 1 msg em0_NAT: setaliasaddr 192.168.88.85 SEQ Actual Results: After a few seconds the system then panics: Unread portion of the kernel message buffer: panic: ng_nat: ip_len != m_pkthdr.len cpuid = 3 time = 1578139602 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004b4bd690 vpanic() at vpanic+0x17e/frame 0xfffffe004b4bd6f0 panic() at panic+0x43/frame 0xfffffe004b4bd750 ng_nat_rcvdata() at ng_nat_rcvdata+0x3d5/frame 0xfffffe004b4bd7a0 ng_apply_item() at ng_apply_item+0xa3/frame 0xfffffe004b4bd820 ng_snd_item() at ng_snd_item+0x2b0/frame 0xfffffe004b4bd860 ng_ether_input() at ng_ether_input+0x4c/frame 0xfffffe004b4bd890 ether_nh_input() at ether_nh_input+0x24a/frame 0xfffffe004b4bd8f0 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe004b4bd970 ether_input() at ether_input+0x9d/frame 0xfffffe004b4bd9d0 iflib_rxeof() at iflib_rxeof+0xbcd/frame 0xfffffe004b4bdae0 _task_fn_rx() at _task_fn_rx+0x7d/frame 0xfffffe004b4bdb20 gtaskqueue_run_locked() at gtaskqueue_run_locked+0x155/frame 0xfffffe004b4bdb80 gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe004b4bdbb0 fork_exit() at fork_exit+0x80/frame 0xfffffe004b4bdbf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004b4bdbf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic See core.txt attached from panic for more info In my testing it seems to be the last command thats the problematic one: msg em0_NAT: setaliasaddr 192.168.88.85 Expected Results: em0_NAT interface to be available If theres anything else that would be helpful for me to include just let me know. System Info: $ uname -a FreeBSD test 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r356261: Thu Jan 2 04:59:38 UTC 2020 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
CC'ing Lutz in case he's interested in investigating this one, as he's been doing netgraph stuff lately.
I haven't been able to reproduce this on head in a VM with a vtnet interface or a machine using igb, both acting as NFS clients. Do you know what kind of network traffic your system is seeing?
(In reply to Mark Johnston from comment #3) Hmm never mind, it just took a little while. :)
https://reviews.freebsd.org/D23080
Thanks Mark for the help. Im glad you were able to reproduce this and pinpoint the issue. Ill look into building and testing the patch tonight.
Thank you for joining me in. It's interesting to see how to derive such an issue from the coredump. IPv6 is not mentioned in the bug report itself and the assert only point to a length mismatch. So I assumed a short (i.e. transmission error) paket from the network. Good job, Mark! OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device.
(In reply to lutz from comment #7) > It's interesting to see how to derive such an issue from the coredump. Are you asking how I discovered this problem? In kgdb I did: (kgdb) p/x *(struct ether_header *)m->data from the frame where the assertion failure occurred. This showed an ethertype of 0x86dd after correcting for network byte order. > OTOH, I'd would not panic in this case, because malformed packages can be easily sent to the host from outside, which in turn allows a remote crash of the device. Indeed. The rest of ng_nat_rcvdata() assumes that the packet was validated by the IP stack, but in this configuration that is not happening. So my patch fixes an obvious bug but there is a larger problem. :( It looks like r342168 (which introduced the ability to receive packets from ng_ether) is quite incomplete. Fortunately it was not merged to any stable branches yet. It should of course be possible to fix this by duplicating some of the validation done in ip_input(). Gleb, Maxim, any thoughts?
Originally ng_nat was supposed to be attached only via ipfw. So expecting a cleansed IPv4 traffic. Later mpd learned how to hook it with ng_ppp, again providing cleansed traffic. Attachment through ng_ether doesn't do that, thus the panic. Either filtering needs to be added to ng_nat, or a ethertype filtering node needs to be written.
I'd suggest to weaken the panics to a silent path-trough of unprocessable packets and update the man page to add the requirements of checking. Do not try to duplicate code. https://reviews.freebsd.org/D23091
(In reply to lutz from comment #10) Updating the man page is not necessary. It does not even suggest the use case of this bug report.
(In reply to lutz from comment #11) Are you referring to the ng_ether man page or the ng_nat man page?
(In reply to Steve Wills from comment #12) Only ng_nat, which is the culprit anyway.
I applied both patches and recompiled. I can confirm that I'm no longer seeing a panic for the following configuration: # /usr/sbin/ngctl -f- <<-SEQ mkpeer em0: nat lower in name em0:lower em0_NAT connect em0: em0_NAT: upper out msg em0_NAT: setdlt 1 msg em0_NAT: setaliasaddr 192.168.88.85 SEQ # ngctl list -l There are 10 total nodes: Name: em0 Type: ether ID: 00000001 Num hooks: 2 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- upper em0_NAT nat 00000014 out lower em0_NAT nat 00000014 in Name: ubt0 Type: ubt ID: 00000002 Num hooks: 1 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- hook ubt0hci hci 00000008 drv Name: btsock_hci_raw Type: btsock_hci_raw ID: 00000003 Num hooks: 1 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- ubt0raw ubt0hci hci 00000008 raw Name: btsock_l2c_raw Type: btsock_l2c_raw ID: 00000004 Num hooks: 1 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- ubt0ctl ubt0l2cap l2cap 0000000c ctl Name: btsock_l2c Type: btsock_l2c ID: 00000005 Num hooks: 1 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- ubt0l2c ubt0l2cap l2cap 0000000c l2c Name: btsock_sco Type: btsock_sco ID: 00000006 Num hooks: 0 Name: ubt0hci Type: hci ID: 00000008 Num hooks: 3 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- raw btsock_hci_raw btsock_hci_raw 00000003 ubt0raw acl ubt0l2cap l2cap 0000000c hci drv ubt0 ubt 00000002 hook Name: ubt0l2cap Type: l2cap ID: 0000000c Num hooks: 3 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- l2c btsock_l2c btsock_l2c 00000005 ubt0l2c ctl btsock_l2c_raw btsock_l2c_raw 00000004 ubt0ctl hci ubt0hci hci 00000008 acl Name: em0_NAT Type: nat ID: 00000014 Num hooks: 2 Local hook Peer name Peer type Peer ID Peer hook ---------- --------- --------- ------- --------- out em0 ether 00000001 upper in em0 ether 00000001 lower Name: ngctl1064 Type: socket ID: 00000019 Num hooks: 0 Thanks everyone for the help, this will allow me to start digging into the ng_nat functionality. Going forward I think we should add an example or two for ng_nat to /usr/share/examples/netgraph since there doesnt seem to be many other references besides the examples from man ng_nat.
(In reply to lutz from comment #13) I thought the man page suggested exactly the usage that triggered this bug report: https://svnweb.freebsd.org/base/head/share/man/man4/ng_nat.4?revision=344482&view=markup#l372
(In reply to Steve Wills from comment #15) Oh, I used an outdated version of the man page (obviously). Any idea what to write into the man page? Malicious packets might occur even in the other use cases.
(In reply to lutz from comment #16) Taking inspiration from Gleb's earlier comment : "Originally ng_nat was expected to be attached only via ipfw and later ng_ppp which both cleanse IPv4 traffic. In its current form ng_ether doesn't do that, thus ng_nat will panic if it recieves non IPv4 packets."
Updated https://reviews.freebsd.org/D23091 with a BUGS section in the man page.
(In reply to lutz from comment #18) And the addition to the man page is gone again (see Phabricator).
(In reply to lutz from comment #19) Its unfortunate that the manpage update was reverted but I can understand why it might not be a good idea to include it. Regardless, is there anything else that needs to be done on my end? I know the Phabricator changes are still pending review but I can create a follow up bug report for cleansed IPv4 traffic when using ng_ether if that'd be better vs hold up this existing fixes to ng_nat.
(In reply to Robert James Hernandez from comment #20) You can review the change in Phabricator and then we need to find someone which is willing to commit the change. I think all relevant persons are already named as reviewers of this change.
A commit references this bug: Author: markj Date: Thu Jan 23 16:45:49 UTC 2020 New revision: 357053 URL: https://svnweb.freebsd.org/changeset/base/357053 Log: ng_nat: Pass IPv6 packets through. ng_nat implements NAT for IPv4 traffic only. When connected to an ng_ether node it erroneously handled IPv6 packets as well. This change is not sufficient: ng_nat does not do any validation of IP packets in this mode, even though they have not yet passed through ip_input(). PR: 243096 Reported by: Robert James Hernandez <rob@sarcasticadmin.com> Reviewed by: julian Differential Revision: https://reviews.freebsd.org/D23080 Changes: head/share/man/man4/ng_nat.4 head/sys/netgraph/ng_nat.c
(In reply to Robert James Hernandez from comment #20) I committed the original diff. The issue of unvalidated IPv4 packets still exists. https://reviews.freebsd.org/D23091 makes some progress, but as I noted in the review it is not sufficient as-is: it does not validate checksums or IP fragments, etc..
(In reply to lutz from comment #21) Sounds good, Ill review the change in Phabricator once my account is approved (I hadnt created on until now).
(In reply to Mark Johnston from comment #23) Ok, understood. I just wasnt sure if this was something that could be done as a followup to this bug or needed to be handled in this revision.
(In reply to Robert James Hernandez from comment #24) phab account approved.
(In reply to Kurt Jaeger from comment #26) Thanks Kurt! I just commented in Phab.
A commit references this bug: Author: eugen Date: Wed Feb 12 00:31:01 UTC 2020 New revision: 357786 URL: https://svnweb.freebsd.org/changeset/base/357786 Log: ng_nat: avoid panic if attached directly to ng_ether and got short packet From the beginning, ng_nat safely assumed cleansed traffic because of limited ways it could be attached to NETGRAPH: ng_ipfw or ng_ppp only. Now as it may be attached with ng_ether too, the assumption proven wrong. Add needed check to the ng_nat. Thanks for markj for debugging this. PR: 243096 Submitted by: Lutz Donnerhacke <lutz@donnerhacke.de> Reported by: Robert James Hernandez <rob@sarcasticadmin.com> Reviewed by: markj and others MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23091 Changes: head/sys/netgraph/ng_nat.c
This bug ticket can be closed now. I don't know how to do it myself.
(In reply to lutz from comment #29) mark probably wants to wait for the MFC in about a week.
A commit references this bug: Author: eugen Date: Tue Apr 7 17:49:37 UTC 2020 New revision: 359707 URL: https://svnweb.freebsd.org/changeset/base/359707 Log: MFC r357053 by markj: ng_nat: Pass IPv6 packets through. ng_nat implements NAT for IPv4 traffic only. When connected to an ng_ether node it erroneously handled IPv6 packets as well. This change is not sufficient: ng_nat does not do any validation of IP packets in this mode, even though they have not yet passed through ip_input(). PR: 243096 Reported by: Robert James Hernandez <rob@sarcasticadmin.com> Reviewed by: julian Differential Revision: https://reviews.freebsd.org/D23080 Changes: _U stable/12/ stable/12/share/man/man4/ng_nat.4 stable/12/sys/netgraph/ng_nat.c
A commit references this bug: Author: eugen Date: Tue Apr 7 17:50:18 UTC 2020 New revision: 359708 URL: https://svnweb.freebsd.org/changeset/base/359708 Log: MFC r357053 by markj: ng_nat: Pass IPv6 packets through. ng_nat implements NAT for IPv4 traffic only. When connected to an ng_ether node it erroneously handled IPv6 packets as well. This change is not sufficient: ng_nat does not do any validation of IP packets in this mode, even though they have not yet passed through ip_input(). PR: 243096 Reported by: Robert James Hernandez <rob@sarcasticadmin.com> Reviewed by: julian Differential Revision: https://reviews.freebsd.org/D23080 Changes: _U stable/11/ stable/11/share/man/man4/ng_nat.4 stable/11/sys/netgraph/ng_nat.c
The changes were merged to stable/12. Thanks for the report!