Bug 193681

Summary: kernel panic: dhclient NULL pointer in ether_output_frame() when using dummynet at layer2
Product: Base System Reporter: Kate <kate>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Some People CC: david.carlier, sbruno
Priority: ---    
Version: CURRENT   
Hardware: mips   
OS: Any   

Description Kate 2014-09-16 11:17:38 UTC
# ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 9e:e1:ad:03:c9:45
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: arge1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000000
        member: arge0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 200000
#
#
# sysctl net.link.ether.ipfw=1
net.link.ether.ipfw: 0 -> 1
# sysctl net.link.bridge.ipfw=1
net.link.bridge.ipfw: 0 -> 1
# ipfw add 100 pipe 1 ip from any to any layer2
00100 pipe 1 ip from any to any layer2
# ipfw pipe 1 config
#
#
# dhclient arge0
DHCPDISCOVER on arge0 to 255.255.255.255 port 67 interval 6
Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
[ thread pid 11 tid 100024 ]
Stopped at      ether_output_frame+0x50:        lw      v0,680(s0)
db>
db>
db>
db>
db> bt
Tracing pid 11 tid 100024 td 0x8077b000
db_trace_thread+30 (?,?,?,?) ra c232f58800000018 sp 0 sz 0
80085a94+114 (0,?,ffffffff,?) ra c232f5a000000020 sp 100000000 sz 1
80085170+388 (?,?,?,?) ra c232f5c0000000a8 sp 0 sz 0
db_command_loop+70 (?,?,?,?) ra c232f66800000018 sp 0 sz 0
80087cb0+f4 (?,?,?,?) ra c232f680000001a8 sp 0 sz 0
kdb_trap+110 (?,?,?,?) ra c232f82800000030 sp 0 sz 0
trap+13c0 (?,?,?,?) ra c232f858000000c0 sp 0 sz 0
MipsKernGenException+134 (805823b4,8077b000,40000000,0) ra c232f918000000c8 sp 100000001 sz 1
ether_output_frame+50 (?,80995900,?,?) ra c232f9e000000020 sp 1 sz 0
8039d150+14c (?,?,?,?) ra c232fa0000000030 sp 0 sz 0
dummynet_io+454 (?,?,?,?) ra c232fa3000000038 sp 0 sz 0
ipfw_check_frame+290 (?,?,?,?) ra c232fa6800000140 sp 0 sz 0
pfil_run_hooks+9c (?,?,?,?) ra c232fba800000060 sp 0 sz 0
ether_demux+50 (?,80995900,?,?) ra c232fc0800000028 sp 1 sz 0
802f9814+4e0 (80995900,?,?,?) ra c232fc3000000030 sp 100000000 sz 0
netisr_dispatch_src+f0 (?,?,?,?) ra c232fc6000000038 sp 0 sz 0
netisr_dispatch+14 (?,?,?,?) ra c232fc9800000018 sp 0 sz 0
802f941c+20 (?,?,?,?) ra c232fcb000000018 sp 0 sz 0
802f5d4c+3bc (?,80d74e00,?,?) ra c232fcc800000038 sp 1 sz 0
802f9814+40c (80d74e00,?,?,?) ra c232fd0000000030 sp 100000000 sz 0
netisr_dispatch_src+f0 (?,?,?,?) ra c232fd3000000038 sp 0 sz 0
netisr_dispatch+14 (?,?,?,?) ra c232fd6800000018 sp 0 sz 0
802f941c+20 (?,?,?,?) ra c232fd8000000018 sp 0 sz 0
80450f4c+348 (?,?,?,?) ra c232fd9800000048 sp 0 sz 0
intr_event_execute_handlers+1c0 (?,?,?,?) ra c232fde000000030 sp 0 sz 0
801e7344+b4 (?,?,?,?) ra c232fe1000000048 sp 0 sz 0
fork_exit+a0 (?,?,?,?) ra c232fe5800000028 sp 0 sz 0
fork_trampoline+10 (?,?,?,?) ra c232fe8000000000 sp 0 sz 0
pid 11
db>
db> show reg
at                   0
v0          0x80995900
v1                   0
a0          0x805823b4  pfil_lock
a1          0x8077b000
a2          0x40000000
a3                   0
t0                   0
t1                   0
t2                   0
t3          0x807081e0
t4                   0
t5                   0
t6                   0
t7                   0
s0                   0
s1          0x80995900
s2                   0
s3          0x4b3cc420
s4          0x804913a0  tcprexmtthresh+0x43c
s5                 0x4  _DYNAMIC_LINKING+0x3
s6          0x805820a0  link_pfil_hook
s7          0xc232fc34
t8                0x1c  _DYNAMIC_LINKING+0x1b
t9          0x4055ce10
k0          0xc232fbc0
k1          0x80560120  pcpu_space+0x120
gp                   0
sp          0xc232f9e0
s8          0x808c3074
ra          0x8039d29c  dn_enqueue+0x6bc
sr              0xcc03  _DYNAMIC_LINKING+0xcc02
lo          0x7c389600
hi             0x2916c
bad              0x2a8  _DYNAMIC_LINKING+0x2a7
cs                 0x8  _DYNAMIC_LINKING+0x7
pc          0x802f94e8  ether_output_frame+0x50
ether_output_frame+0x50:        lw      v0,680(s0)
db>

According to kgdb, ether_output_frame+0x50 is:

(kgdb) list *ether_output_frame+0x50
0x802f94e8 is in ether_output_frame (/usr/home/kate/src/freebsd/sys/net/if_ethersubr.c:374).

which is:

        return ((ifp->if_transmit)(ifp, m));

I am using a Carambola 2, with 11.0-CURRENT r269949 built using Adrian's freebsd-wifi-build. I've no reason to believe this is specific to mips, but I don't have any other architectures around to test.
Comment 1 Kate 2014-09-16 11:20:33 UTC
In the absence of bridge0, I have a different symptom:

(from a fresh boot)
# ifconfig bridge0 destroy
arge1: promiscuous mode disabled
arge0: promiscuous mode disabled
bridge0: link state changed to DOWN
# ipfw add 100 pipe 1 ip from any to any layer2
00100 pipe 1 ip from any to any layer2
# ipfw pipe 1 config
# dhclient arge0
DHCPREQUEST on arge0 to 255.255.255.255 port 67
Trap cause = 4 (address error (load or I-fetch) - kernel mode)
[ thread pid 394 tid 100054 ]
Stopped at      ipfw_chk+0x840: lw      a0,0(a1)
db>
db>
db>
db> show reg
at          0xffffffe0
v0               0x800  _DYNAMIC_LINKING+0x7ff
v1               0x800  _DYNAMIC_LINKING+0x7ff
a0               0x148  _DYNAMIC_LINKING+0x147
a1          0x80dd800e
a2               0x156  _DYNAMIC_LINKING+0x155
a3          0x80d75400
t0          0x81d08990
t1          0x81d08990
t2                 0x4  _DYNAMIC_LINKING+0x3
t3                0x62  _DYNAMIC_LINKING+0x61
t4          0x4080012c
t5              0x2000  _DYNAMIC_LINKING+0x1fff
t6          0x40c5c000
t7            0x40da28
s0          0x80dd8000
s1          0xc0473bf4
s2                 0x2  _DYNAMIC_LINKING+0x1
s3          0x80790400
s4                   0
s5          0x80790400
s6          0x80d75400
s7          0xc0473aa4
t8                0x1c  _DYNAMIC_LINKING+0x1b
t9          0x405ad580
k0                   0
k1          0x54175305
gp          0x8056f200  _gp
sp          0xc04739c0
s8            0x429064
ra          0x803a61b4  ipfw_check_frame+0x14c
sr              0xfc03  _DYNAMIC_LINKING+0xfc02
lo            0x771ec0
hi                 0x6  _DYNAMIC_LINKING+0x5
bad         0x80dd800e
cs                0x10  _DYNAMIC_LINKING+0xf
pc          0x803a0208  ipfw_chk+0x840
ipfw_chk+0x840: lw      a0,0(a1)
db>

(kgdb) list *ipfw_chk+0x840
0x803a0208 is in ipfw_chk (/usr/home/kate/src/freebsd/sys/netpfil/ipfw/ip_fw2.c:1186).

which is:

1186:        } else if (pktlen >= sizeof(struct ip) &&
1187:            (args->eh == NULL || etype == ETHERTYPE_IP) && ip->ip_v == 4) {
Comment 2 Kate 2014-09-16 11:22:50 UTC
In both cases, I can confirm the presence of dummynet's pipe at layer2 is requisite for producing the symptoms:

#
# sysctl net.link.ether.ipfw=1
net.link.ether.ipfw: 0 -> 1
# sysctl net.link.bridge.ipfw=1
net.link.bridge.ipfw: 0 -> 1
#
# dhclient arge0
DHCPDISCOVER on arge0 to 255.255.255.255 port 67 interval 3
DHCPOFFER from 10.0.0.1
DHCPREQUEST on arge0 to 255.255.255.255 port 67
DHCPACK from 10.0.0.1
bound to 10.0.0.12 -- renewal in 300 seconds.
#
#
# ps auxww|grep dhclient
root  281  1.0  3.1 10644 2004 u0  S     7:17PM 0:00.04 grep dhclient
root  268  0.0  1.8 10428 1180  -  Is    7:16PM 0:00.02 dhclient: arge0 [priv] (dhclient)
_dhcp 276  0.0  1.6 10428 1032  -  Is    7:16PM 0:00.00 dhclient: arge0 (dhclient)
# kill 268
# kill 276
#
# ipfw add 100 pipe 1 ip from any to any layer2
00100 pipe 1 ip from any to any layer2
# ipfw pipe 1 config
# dhclient arge0
DHCPREQUEST on arge0 to 255.255.255.255 port 67
Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
[ thread pid 11 tid 100024 ]
Stopped at      ether_output_frame+0x50:        lw      v0,680(s0)
db>
Comment 3 Kate 2014-09-16 11:27:17 UTC
Without layer2, but with bridge0 present, this symptom also occurs:

# sysctl net.link.ether.ipfw=1
net.link.ether.ipfw: 0 -> 1
# sysctl net.link.bridge.ipfw=1
net.link.bridge.ipfw: 0 -> 1
#
# ipfw add 100 pipe 1 ip from any to any     # note no layer2
00100 pipe 1 ip from any to any
# ipfw pipe 1 config
#
# dhclient arge0
DHCPDISCOVER on arge0 to 255.255.255.255 port 67 interval 4
Trap cause = 2 (TLB miss (load or instr. fetch) - kernel mode)
[ thread pid 11 tid 100024 ]
Stopped at      ether_output_frame+0x50:        lw      v0,680(s0)
db>
Comment 4 Sean Bruno freebsd_committer freebsd_triage 2014-09-16 12:57:27 UTC
This is only for mips right?
Comment 5 Kate 2014-09-16 14:58:55 UTC
(In reply to Sean Bruno from comment #4)
> This is only for mips right?

I've no reason to believe this is specific to mips, but I don't have any other architectures around to test.
Comment 6 David CARLIER 2014-09-20 13:15:08 UTC
I was able to reproduce it too, on amd64.

This patch might solve it: https://github.com/HardenedBSD/hardenedBSD/commit/4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7
Comment 7 Kate 2014-09-24 18:51:45 UTC
#
#
# ifconfig wlan0 create wlandev ath0
wlan0: Ethernet address: c4:93:00:00:3c:c9
# sysctl net.link.ether.ipfw=1
net.link.ether.ipfw: 0 -> 1
# sysctl net.link.bridge.ipfw=1
net.link.bridge.ipfw: 0 -> 1
# ipfw add 100 pipe 1 ip from any to any layer2
00100 pipe 1 ip from any to any layer2
# ipfw pipe 1 config
# dhclient arge0
DHCPDISCOVER on arge0 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on arge0 to 255.255.255.255 port 67 interval 21
...

Thanks! The HardenedBSD patch seems good to me. I have no idea if it's the right thing to do or not, but with the patch I'm unable to produce the panic.
Comment 8 commit-hook freebsd_committer freebsd_triage 2014-09-25 02:26:47 UTC
A commit references this bug:

Author: sbruno
Date: Thu Sep 25 02:26:06 UTC 2014
New revision: 272089
URL: http://svnweb.freebsd.org/changeset/base/272089

Log:
  Fix NULL pointer deref in ipfw when using dummynet at layer 2.
  Drop packet if pkg->ifp is NULL, which is the case here.

  ref. https://github.com/HardenedBSD/hardenedBSD
  commit 4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7

  PR 193861 --  DUMMYNET LAYER2: kernel panic

  in this case a kernel panic occurs. Hence, when we do not get an interface,
  we just drop the packet in question.

  PR:		193681
  Submitted by:	David Carlier <david.carlier@hardenedbsd.org>
  Obtained from:	Hardened BSD
  MFC after:	2 weeks
  Relnotes:	yes

Changes:
  head/sys/netpfil/ipfw/ip_dn_io.c