Bug 246706 - [netgraph] kernel panic due to corrupted memory
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.3-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Keywords: panic
Depends on:
Reported: 2020-05-24 21:22 UTC by Eugene Grosbein
Modified: 2021-05-15 10:02 UTC

Description Eugene Grosbein freebsd_committer 2020-05-24 21:22:10 UTC
I run multiple routers on FreeBSD 11.3-STABLE/amd64 r355108 with the net/mpd5 daemon, which dynamically creates and destroys ngXXX interfaces for multiple PPPoE clients. The routers have ECC memory.

Since 11.1-RELEASE, this setup had been rock stable for over 2 years, until yesterday one of the routers panicked inside the NETGRAPH code. It produced a usable crash dump, and I have kernel.debug.

The server sends its logs to a remote syslog collector, and the last line sent before the panic was "Accepting PPPoE connection", produced by the PppoeListenEvent() function of the mpd5 code: https://sourceforge.net/p/mpd/svn/2239/tree/trunk/src/pppoe.c#l1356

mpd5 then continued executing PppoeListenEvent(), but its attempt to create an ng_tee(4) node and connect it to ng_pppoe(4) by sending an NGM_MKPEER message resulted in a kernel panic. Note that the stock gdb 6.1.1 shows the backtrace incorrectly, so I used gdb 9.1:

Reading symbols from /data/crash/PPPOE11/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x40
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80624dc0
stack pointer           = 0x28:0xfffffe012499f6d0
frame pointer           = 0x28:0xfffffe012499f700
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2576 (mpd5)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802fda6b = db_trace_self_wrapper+0x2b/frame 0xfffffe012499f380
vpanic() at 0xffffffff804f5e2e = vpanic+0x17e/frame 0xfffffe012499f3e0
panic() at 0xffffffff804f5ca3 = panic+0x43/frame 0xfffffe012499f440
trap_pfault() at 0xffffffff80778540 = trap_pfault/frame 0xfffffe012499f490
trap_pfault() at 0xffffffff80778589 = trap_pfault+0x49/frame 0xfffffe012499f4f0
trap() at 0xffffffff80777c1d = trap+0x29d/frame 0xfffffe012499f600
calltrap() at 0xffffffff80758983 = calltrap+0x8/frame 0xfffffe012499f600
--- trap 0xc, rip = 0xffffffff80624dc0, rsp = 0xfffffe012499f6d0, rbp = 0xfffffe012499f700 ---
ng_add_hook() at 0xffffffff80624dc0 = ng_add_hook+0x20/frame 0xfffffe012499f700
ng_mkpeer() at 0xffffffff80624a0c = ng_mkpeer+0x6c/frame 0xfffffe012499f750
ng_apply_item() at 0xffffffff80622d7f = ng_apply_item+0x3ef/frame 0xfffffe012499f7d0
ng_snd_item() at 0xffffffff8062278e = ng_snd_item+0x17e/frame 0xfffffe012499f800
ngc_send() at 0xffffffff806329b3 = ngc_send+0x1a3/frame 0xfffffe012499f8a0
sosend_generic() at 0xffffffff805868ea = sosend_generic+0x4fa/frame 0xfffffe012499f950
kern_sendit() at 0xffffffff8058d246 = kern_sendit+0x286/frame 0xfffffe012499fa10
sendit() at 0xffffffff8058d591 = sendit+0x191/frame 0xfffffe012499fa70
sys_sendto() at 0xffffffff8058d3ed = sys_sendto+0x4d/frame 0xfffffe012499fac0
amd64_syscall() at 0xffffffff80778f18 = amd64_syscall+0x378/frame 0xfffffe012499fbf0
fast_syscall_common() at 0xffffffff80759290 = fast_syscall_common+0x101/frame 0xfffffe012499fbf0
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x80279378a, rsp = 0x7fffdfffda08, rbp = 0x7fffdfffda50 ---
Uptime: 64d17h37m40s
Dumping 457 out of 4073 MB:..4%..11%..22%..32%..43%..53%..64%..71%..81%..92%

__curthread () at ./machine/pcpu.h:234
234             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:234
#1  doadump (textdump=1) at /home/src/sys/kern/kern_shutdown.c:320
#2  0xffffffff804f5a1d in kern_reboot (howto=260) at /home/src/sys/kern/kern_shutdown.c:388
#3  0xffffffff804f5e68 in vpanic (fmt=<optimized out>, ap=0xfffffe012499f420)
    at /home/src/sys/kern/kern_shutdown.c:784
#4  0xffffffff804f5ca3 in panic (fmt=<unavailable>) at /home/src/sys/kern/kern_shutdown.c:715
#5  0xffffffff80778540 in trap_fatal (frame=0xfffffe012499f610, eva=64)
    at /home/src/sys/amd64/amd64/trap.c:899
#6  0xffffffff80778589 in trap_pfault (frame=0xfffffe012499f610, usermode=0)
    at /home/src/sys/amd64/amd64/trap.c:744
#7  0xffffffff80777c1d in trap (frame=0xfffffe012499f610) at /home/src/sys/amd64/amd64/trap.c:438
#8  <signal handler called>
#9  0xffffffff80624dc0 in ng_findhook (node=0xfffff80092840600,
    name=0xfffff800921e9978 "left2right") at /home/src/sys/netgraph/ng_base.c:1128
#10 ng_add_hook (node=0xfffff80092840600, name=0xfffff800921e9978 "left2right",
    hookp=0xfffffe012499f728) at /home/src/sys/netgraph/ng_base.c:1073
#11 0xffffffff80624a0c in ng_mkpeer (node=0xfffff8004f15fe00, name=<optimized out>,
    name2=0xfffff800921e9978 "left2right", type=<optimized out>)
    at /home/src/sys/netgraph/ng_base.c:1555
#12 0xffffffff80622d7f in ng_generic_msg (here=0xfffff8004f15fe00, item=<optimized out>,
    lasthook=<optimized out>) at /home/src/sys/netgraph/ng_base.c:2537
#13 ng_apply_item (node=0xfffff8004f15fe00, item=0xfffff800423b5c00, rw=1)
    at /home/src/sys/netgraph/ng_base.c:2437
#14 0xffffffff8062278e in ng_snd_item (item=0xfffff800423b5c00, flags=0)
    at /home/src/sys/netgraph/ng_base.c:2320
#15 0xffffffff806329b3 in ngc_send (so=<optimized out>, flags=<optimized out>,
    m=0xfffff80006d01000, addr=<optimized out>, control=<optimized out>, td=<optimized out>)
    at /home/src/sys/netgraph/ng_socket.c:338
#16 0xffffffff805868ea in sosend_generic (so=0xfffff80006c0da38, addr=0xfffff8004f6da9f0,
    uio=0xfffffe012499f980, top=0xfffff80006d01000, control=<optimized out>,
    flags=<optimized out>, td=0xfffff8004f560000) at /home/src/sys/kern/uipc_socket.c:1360
#17 0xffffffff8058d246 in kern_sendit (td=<optimized out>, s=2, mp=<optimized out>, flags=0,
    control=0x0, segflg=UIO_USERSPACE) at /home/src/sys/kern/uipc_syscalls.c:884
#18 0xffffffff8058d591 in sendit (td=0xfffff8004f560000, s=2, mp=0xfffffe012499fa80, flags=-1)
    at /home/src/sys/kern/uipc_syscalls.c:804
#19 0xffffffff8058d3ed in sys_sendto (td=0xfffff80092840600, uap=<optimized out>)
    at /home/src/sys/kern/uipc_syscalls.c:935
#20 0xffffffff80778f18 in syscallenter (td=0xfffff8004f560000)
    at /home/src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#21 amd64_syscall (td=0xfffff8004f560000, traced=0) at /home/src/sys/amd64/amd64/trap.c:1014
#22 <signal handler called>
#23 0x000000080279378a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffda08

Note that the "node" structure appears to have been corrupted (zeroed out) by the time of the panic:

(kgdb) frame 12
#12 0xffffffff80622d7f in ng_generic_msg (here=0xfffff8004f15fe00, item=<optimized out>,
    lasthook=<optimized out>) at /home/src/sys/netgraph/ng_base.c:2537
2537                    error = ng_mkpeer(here, mkp->ourhook, mkp->peerhook, mkp->type);
(kgdb) p *mkp
$1 = {type = "l858", '\000' <repeats 27 times>,
  ourhook = "Ю-$O\000ЬЪЪ\000\000\000\000\000\000\000\000\000кj\222\000ЬЪЪ\000Ч\025O\000ЬЪЪ",
  peerhook = "\200]\a\222\000ЬЪЪюё\n\222\000ЬЪЪ", '\000' <repeats 15 times>}
(kgdb) frame 10
#10 ng_add_hook (node=0xfffff80092840600, name=0xfffff800921e9978 "left2right",
    hookp=0xfffffe012499f728) at /home/src/sys/netgraph/ng_base.c:1073
1073            if (ng_findhook(node, name) != NULL) {
(kgdb) p *node
$2 = {nd_name = '\000' <repeats 31 times>, nd_type = 0x0, nd_flags = 0, nd_numhooks = 0,
  nd_private = 0xfffff80092840600, nd_ID = 0, nd_hooks = {lh_first = 0x0}, nd_nodes = {
    le_next = 0x0, le_prev = 0x0}, nd_idnodes = {le_next = 0x0, le_prev = 0x0}, nd_input_queue = {
    q_flags = 0, q_flags2 = 0, q_mtx = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
        lo_witness = 0x0}, mtx_lock = 0}, q_work = {stqe_next = 0x0}, queue = {stqh_first = 0x0,
      stqh_last = 0x0}}, nd_refs = 0, nd_vnet = 0x0}
(kgdb) frame 9
#9  0xffffffff80624dc0 in ng_findhook (node=0xfffff80092840600,
    name=0xfffff800921e9978 "left2right") at /home/src/sys/netgraph/ng_base.c:1128
1128            if (node->nd_type->findhook != NULL)
(kgdb) p node->nd_type
$3 = (struct ng_type *) 0x0
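
The fault virtual address 0x40 reported in the trap message is consistent with this NULL nd_type: line 1128 reads node->nd_type->findhook, so the faulting address is the offset of findhook within struct ng_type. The following is only a sketch to check that arithmetic. It uses a simplified mirror of the leading fields of struct ng_type (the name ng_type_mirror, the fnptr_t placeholder type, and the helper function are illustrative and not from the kernel; the field order is written from memory of sys/netgraph/netgraph.h and should be verified against the actual source tree), and it assumes an LP64 target such as amd64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the kernel's various method pointer types (assumption). */
typedef void (*fnptr_t)(void);

/*
 * Simplified mirror of the leading fields of struct ng_type.
 * Field order is an assumption taken from memory of netgraph.h;
 * only the sizes and order matter for the offset calculation.
 */
struct ng_type_mirror {
    uint32_t    version;      /* followed by 4 bytes of padding on LP64 */
    const char *name;
    fnptr_t     mod_event;
    fnptr_t     constructor;
    fnptr_t     rcvmsg;
    fnptr_t     close;
    fnptr_t     shutdown;
    fnptr_t     newhook;
    fnptr_t     findhook;     /* the member read at ng_base.c:1128 */
};

/* Address the CPU would touch when loading nd_type->findhook. */
static uintptr_t
fault_address_for_findhook(uintptr_t nd_type)
{
    return nd_type + offsetof(struct ng_type_mirror, findhook);
}

/* On a 64-bit target the offset should match the reported fault address. */
static_assert(sizeof(void *) != 8 ||
    offsetof(struct ng_type_mirror, findhook) == 0x40,
    "findhook offset does not match fault address 0x40");
```

Under these assumptions, fault_address_for_findhook(0) evaluates to 0x40 on amd64, which matches "fault virtual address = 0x40" and supports reading the panic as a plain NULL dereference of nd_type on a zeroed node, rather than a wild pointer.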

Compressed crashdump and kernel.debug files are available here (101MB in total):
Comment 1 Lutz Donnerhacke freebsd_committer 2021-05-15 09:52:30 UTC
Do not let mpd5 delete the interfaces directly.

Doing so causes a lot of trouble, so I decided to patch mpd to delay the destruction of interfaces for as long as possible (or to reuse them, if needed). This patch made my mpd installation rock stable.
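
The general idea of delaying destruction and reusing interfaces can be sketched as a small idle pool. This is not the actual mpd patch; every name here (iface_pool, iface_acquire, iface_release, POOL_MAX) is hypothetical, and the "create" and "destroy" steps are reduced to counters that stand in for the real netgraph operations.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_MAX 8              /* hypothetical cap on parked interfaces */

struct iface_pool {
    int    idle[POOL_MAX];      /* indexes of parked ngN interfaces */
    size_t n_idle;              /* number of parked entries */
    int    next_new;            /* index for the next freshly created one */
    int    destroyed;           /* how many were actually torn down */
};

/* New session: reuse a parked interface if one exists, else create one. */
static int
iface_acquire(struct iface_pool *p)
{
    assert(p != NULL);
    if (p->n_idle > 0)
        return p->idle[--p->n_idle];   /* reuse: no node create/destroy */
    return p->next_new++;              /* stands in for creating ngN */
}

/* Session ended: park the interface; destroy only if the pool is full. */
static void
iface_release(struct iface_pool *p, int idx)
{
    assert(p != NULL);
    if (p->n_idle < POOL_MAX) {
        p->idle[p->n_idle++] = idx;    /* destruction is deferred */
        return;
    }
    p->destroyed++;                    /* stands in for the real shutdown */
}
```

With a zero-initialized pool, acquiring, releasing, and acquiring again hands back the same index without ever incrementing destroyed, which is the behavior that avoids frequent node teardown while sessions come and go.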