Bug 210379

Summary: [panic] in6_lltable_dump_entry bcopy page fault
Product: Base System Reporter: Allan Jude <allanjude>
Component: kern Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED    
Severity: Affects Many People CC: ae, amd64, hrs, markj, melifaro, sbruno
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Allan Jude freebsd_committer 2016-06-18 20:41:34 UTC
My router running 11.0-ALPHA1 r301090 has crashed repeatedly:


It is running net-snmp 5.7.3_11

It routes native v4 and v6 to a lagg(4) with some vlan(4) interfaces.

The crash appears to be caused by snmpd calling sysctl to get some data.

I cannot consistently cause the crash, but it has happened 5 times in the last week, sometimes almost immediately upon boot. Anecdotally, it seems to sometimes correlate to when one of the machines behind the router reboots.
Comment 1 Allan Jude freebsd_committer 2016-06-19 18:01:55 UTC
This may be related to the change discussed in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208067, which was committed as r297403.
Comment 2 Mark Johnston freebsd_committer 2016-06-19 20:08:35 UTC
(In reply to Allan Jude from comment #1)
It's related in that it touched the line of code at which we're crashing, but without that change the bcopy was bogus: lle->ll_addr is a char *, so
bcopy(&lle->ll_addr, LLADDR(sdl), ifp->if_addrlen) just copies the address
of the MAC address into sdl, rather than the MAC address itself. The type
of lle->ll_addr changed in r292978.
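
To make the pointer-versus-pointee distinction concrete, here is a minimal userland sketch. It is not the kernel code: `struct fake_lle`, `ADDRLEN`, and both helper functions are hypothetical stand-ins, `memcpy` replaces the kernel's `bcopy` (note the swapped argument order), and 64-bit pointers are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDRLEN 6	/* Ethernet-sized address; stand-in for ifp->if_addrlen */

/* Assumption: pointers are at least ADDRLEN bytes wide (true on amd64). */
_Static_assert(sizeof(char *) >= ADDRLEN, "sketch assumes 64-bit pointers");

/* Hypothetical stand-in for struct llentry; only the field discussed here. */
struct fake_lle {
	char *ll_addr;	/* pointer to the link-layer address */
};

/* Old shape, bcopy(&lle->ll_addr, ...): copies the bytes of the pointer itself. */
static void
copy_pointer_bytes(const struct fake_lle *lle, unsigned char *dst)
{
	memcpy(dst, &lle->ll_addr, ADDRLEN);
}

/* New shape, bcopy(lle->ll_addr, ...): copies the bytes the pointer refers to. */
static void
copy_pointee_bytes(const struct fake_lle *lle, unsigned char *dst)
{
	/* A NULL ll_addr at this point corresponds to the page fault in this PR. */
	assert(lle->ll_addr != NULL);
	memcpy(dst, lle->ll_addr, ADDRLEN);
}
```

With `ll_addr` pointing at a real MAC address, `copy_pointee_bytes()` yields the MAC, while `copy_pointer_bytes()` yields the low bytes of the pointer value. The old shape never faults on a NULL `ll_addr`, which is why the missing NULL handling only became visible after the bcopy was corrected.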

Based on the panic, lle->ll_addr is NULL in your case.

Did this start occurring after an update? What was the from-revision?
Comment 3 Allan Jude freebsd_committer 2016-06-19 20:25:53 UTC
Sadly, the 'from' version is: r283896: Mon Jun  1 22:08:43 UTC 2015
So it literally represents a year's worth of changes (the 'to' kernel was from Jun 1, 2016).
Comment 4 Mark Johnston freebsd_committer 2016-06-19 20:53:18 UTC
Could you paste the output of ndp -a?

If you're able to create a crash dump, it would be nice to see the output of
"p *lle" from in6_lltable_dump_entry()'s frame.
Comment 5 Andrey V. Elsukov freebsd_committer 2016-06-20 06:53:35 UTC
Recently I have the same panic when I did `ndp -c`.
This is not fresh CURRENT:

commit 3a7d342befa3ff4d0e3ecd5baf88e128a41b636f
Author: pfg <pfg@FreeBSD.org>
Date:   Tue Apr 12 17:23:03 2016 +0000

    Replace 0 with NULL for pointers in misc. device drivers.
    Found with devel/coccinelle.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x0
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80ae80d4
stack pointer	        = 0x28:0xfffffe0233953440
frame pointer	        = 0x28:0xfffffe0233953450
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 93382 (ndp)

(kgdb) bt
#0  doadump (textdump=865414752) at pcpu.h:221
#1  0xffffffff803473b6 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, 
    dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:568
#2  0xffffffff80346e59 in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:440
#3  0xffffffff80346bb4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4  0xffffffff8034968b in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:251
#5  0xffffffff8078e453 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80aea591 in trap_fatal (frame=0xfffffe0233953390, eva=0) at /usr/src/sys/amd64/amd64/trap.c:836
#7  0xffffffff80aea7c3 in trap_pfault (frame=0xfffffe0233953390, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:691
#8  0xffffffff80ae9d6c in trap (frame=0xfffffe0233953390) at /usr/src/sys/amd64/amd64/trap.c:442
#9  0xffffffff80acd411 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80ae80d4 in bcopy () at /usr/src/sys/amd64/amd64/support.S:122
#11 0xffffffff809666fe in in6_lltable_dump_entry (llt=<value optimized out>, lle=0xfffff80173bb2200, wr=0xfffffe0233953858)
    at /usr/src/sys/netinet6/in6.c:2370
#12 0xffffffff80848103 in htable_foreach_lle (llt=<value optimized out>, f=<value optimized out>, farg=<value optimized out>)
    at /usr/src/sys/net/if_llatbl.c:143
#13 0xffffffff80846bad in lltable_sysctl_dumparp (af=<value optimized out>, wr=<value optimized out>) at /usr/src/sys/net/if_llatbl.c:658
#14 0xffffffff808580cb in sysctl_rtsock (oidp=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, req=0xfffffe0233953858)
    at /usr/src/sys/net/rtsock.c:1864
#15 0xffffffff80756301 in sysctl_root_handler_locked (oid=0xffffffff81170638, arg1=0xfffffe0233953928, arg2=4, req=0xfffffe0233953858, 
    tracker=0xfffffe02339537d0) at /usr/src/sys/kern/kern_sysctl.c:165
#16 0xffffffff80755ad6 in sysctl_root (arg1=<value optimized out>, arg2=<value optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1841
#17 0xffffffff80756076 in userland_sysctl (td=<value optimized out>, name=0xfffffe0233953920, namelen=6, old=<value optimized out>, 
    oldlenp=<value optimized out>, inkernel=<value optimized out>, new=<value optimized out>, newlen=<value optimized out>, 
    retval=0xfffffe0233953520, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1944
#18 0xffffffff80755e84 in sys___sysctl (td=0xfffff801c81539a0, uap=0xfffffe0233953a40) at /usr/src/sys/kern/kern_sysctl.c:1871
#19 0xffffffff80aeaf68 in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135

(kgdb) f 11
#11 0xffffffff809666fe in in6_lltable_dump_entry (llt=<value optimized out>, lle=0xfffff80173bb2200, wr=0xfffffe0233953858)
    at /usr/src/sys/netinet6/in6.c:2370
2370				bcopy(lle->ll_addr, LLADDR(sdl), ifp->if_addrlen);
(kgdb) p *lle
$1 = {lle_next = {le_next = 0x0, le_prev = 0xfffff800039bab08}, r_l3addr = {addr4 = {s_addr = 2917007613}, addr6 = {__u6_addr = {
        __u6_addr8 = 0xfffff80173bb2210 "�", __u6_addr16 = 0xfffff80173bb2210, __u6_addr32 = 0xfffff80173bb2210}}}, 
  r_linkdata = 0xfffff80173bb2220 "", r_hdrlen = 0 '\0', spare0 = 0xfffff80173bb2239 "", r_flags = 0, r_skip_req = 0, lle_tbl = 0xfffff800039bac00, 
  lle_head = 0xfffff800039bab08, lle_free = 0xffffffff80966920 <in6_lltable_destroy_lle>, la_hold = 0xfffff801d1c0ed00, la_numheld = 0, 
  la_expire = 793804, la_flags = 64, la_asked = 2, la_preempt = 0, ln_state = 0, ln_router = 0, ln_ntick = 0, lle_remtime = 0, lle_hittime = 0, 
  lle_refcnt = 2, ll_addr = 0x0, lle_chain = {le_next = 0x0, le_prev = 0x0}, lle_timer = {c_links = {le = {le_next = 0x0, 
        le_prev = 0xfffffe0000c9d030}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe0000c9d030}}, c_time = 3409362326052764, 
    c_precision = 268435450, c_arg = 0xfffff80173bb2200, c_func = 0xffffffff80982620 <nd6_llinfo_timer>, c_lock = 0x0, c_flags = 2, c_iflags = 20, 
    c_cpu = 0}, lle_lock = {lock_object = {lo_name = 0xffffffff80e9b1a0 "lle", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, 
  req_mtx = {lock_object = {lo_name = 0xffffffff80e9b1a4 "lle req", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}}
(kgdb) p lle->ll_addr
$2 = 0x0
Comment 6 Andrey V. Elsukov freebsd_committer 2016-06-20 06:56:01 UTC
(In reply to Andrey V. Elsukov from comment #5)
> Recently I have the same panic when I did `ndp -c`.

It was probably not `ndp -c` exactly, but just `ndp -a`.
Comment 7 Mark Johnston freebsd_committer 2016-06-20 11:38:02 UTC
(In reply to Andrey V. Elsukov from comment #6)

I think this is a regression from r292978. Before that change, entries created
by nd6_cache_lladdr() without a specified lladdr would have reported an address
consisting of zeroes; now we leave ll_addr set to NULL, which causes the
problem.
Comment 8 Andrey V. Elsukov freebsd_committer 2016-06-22 06:37:10 UTC
(In reply to Mark Johnston from comment #7)
> I think this is a regression from r292978. Before that change, entries
> created
> by nd6_cache_lladdr() without a specified lladdr would have reported an
> address
> consisting of zeroes; now we leave ll_addr set to NULL, which causes the
> problem.

It looks like the inet6 code also needs to check the LLE_VALID flag, as is done here: https://svnweb.freebsd.org/base/head/sys/netinet/in.c?view=markup#l1412
Comment 9 Andrey V. Elsukov freebsd_committer 2016-06-22 06:42:56 UTC
In my core dump, the lle is for a remote address and has only the LLE_LINKED flag set. So, to reliably trigger this panic, we just need an incomplete LLE entry and then run `ndp -a`.
Comment 10 commit-hook freebsd_committer 2016-06-22 11:29:39 UTC
A commit references this bug:

Author: ae
Date: Wed Jun 22 11:29:22 UTC 2016
New revision: 302081
URL: https://svnweb.freebsd.org/changeset/base/302081

  Fix the NULL pointer dereference for unresolved link layer entries in
  the netinet6 code. Copy link layer address only when corresponding entry
  has LLE_VALID flag.

  PR:		210379
  Approved by:	re (kib)
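
The guard described in the commit message can be sketched in userland C as follows. This is an illustration under stated assumptions, not the committed diff: `struct fake_lle`, `dump_lladdr()`, and the `FAKE_` constants are hypothetical, the flag values are assumed to mirror LLE_VALID and LLE_LINKED (the latter consistent with `la_flags = 64` in the dump above), and the extra NULL check is added defensiveness that the commit message does not mention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_LLE_VALID	0x0008	/* assumed to mirror LLE_VALID */
#define FAKE_LLE_LINKED	0x0040	/* assumed to mirror LLE_LINKED */
#define ADDRLEN		6	/* stand-in for ifp->if_addrlen */

/* Hypothetical stand-in for struct llentry; only the fields used here. */
struct fake_lle {
	unsigned int la_flags;
	char *ll_addr;
};

/*
 * Fill dst with the entry's link-layer address if the entry is resolved.
 * Unresolved entries leave dst zeroed.
 * Returns the number of address bytes copied.
 */
static size_t
dump_lladdr(const struct fake_lle *lle, unsigned char *dst)
{
	assert(dst != NULL);
	memset(dst, 0, ADDRLEN);

	/* Copy only when the entry claims a valid address. */
	if ((lle->la_flags & FAKE_LLE_VALID) == 0 || lle->ll_addr == NULL)
		return (0);

	memcpy(dst, lle->ll_addr, ADDRLEN);
	return (ADDRLEN);
}
```

An incomplete entry like the one in comment 5 (only the linked flag set, `ll_addr` NULL) now produces an all-zero address instead of a fault, which is the same output comment 7 describes for the code before r292978.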