Bug 227720 - Kernel panic in PPP server
Summary: Kernel panic in PPP server
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
: 230498
Depends on:
Blocks:
 
Reported: 2018-04-23 14:22 UTC by Matt Allanson
Modified: 2023-07-02 04:40 UTC
CC List: 7 users

See Also:
grahamperrin: mfc-stable13?
grahamperrin: mfc-stable12?


Attachments
KCONF and nanobsd config (8.50 KB, application/x-tar)
2018-04-23 14:22 UTC, Matt Allanson
no flags
Client and server config (4.00 KB, application/x-tar)
2018-04-25 14:55 UTC, Matt Allanson
no flags
stunnel client and server config (8.00 KB, application/x-tar)
2018-09-18 16:19 UTC, Matt Allanson
no flags

Description Matt Allanson 2018-04-23 14:22:03 UTC
Created attachment 192755
KCONF and nanobsd config

I'm running FreeBSD 11.1-RELEASE, built using nanobsd.sh. The hardware is amd64, running in a virtual machine (VMware). Attached are the kernel config and the nanobsd config.

The system is rebooting every 30 minutes to 8 hours. Sample dmesg output is:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805bbe57
stack pointer           = 0x28:0xfffffe0000242360
frame pointer           = 0x28:0xfffffe00002424b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 37669 (ppp)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 8h6m0s


If any additional information is needed, please let me know.
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2018-04-23 16:53:32 UTC
Do you have a stack trace?
Comment 2 Matt Allanson 2018-04-23 17:49:49 UTC
(In reply to Conrad Meyer from comment #1)
No I don't, what would be the best way to do that?
Comment 3 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-24 15:35:51 UTC
(In reply to Matt Allanson from comment #0)

Can you attach ppp configs?
Does it run as a server or a client, or does it form a p2p tunnel?
If it's a server, how many clients does it have at most?

For a crash dump you will need another (virtual) disk, or an additional partition in your nanobsd build big enough to hold it; make it equal to the RAM size to be sure.

Add it to rc.conf:

dumpdev="/dev/ada1" # or /dev/ada0s1b

See https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain for details.
Comment 4 Matt Allanson 2018-04-25 14:55:45 UTC
Created attachment 192809
Client and server config

Waiting for the next crash to get the stack trace; attached are the client and server ppp configs. The setup is really simple: just pushing traffic over TCP.
Comment 5 Matt Allanson 2018-04-25 14:56:50 UTC
(In reply to Eugene Grosbein from comment #3)

The machine that's failing is the server. We have 18 tunnels at max, currently.
Comment 6 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-25 16:09:45 UTC
(In reply to Matt Allanson from comment #4)

Kernel crashdump will be much more useful if kernel config file has:

options KDB
options KDB_TRACE
options KDB_UNATTENDED
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
Comment 7 Matt Allanson 2018-04-26 18:19:39 UTC
#1  0xffffffff804c06b5 in mi_switch ()
#2  0xffffffff804ff2da in sleepq_wait ()
#3  0xffffffff804c0231 in _sleep ()
#4  0xffffffff80504211 in taskqueue_thread_loop ()
#5  0xffffffff804844e5 in fork_exit ()
#6  <signal handler called>

Here is the stack trace from the most recent reboot. I have additional information (from crashinfo) if it'd be helpful. Unfortunately, this kernel doesn't have debuginfo built in, but I can rebuild if absolutely necessary. Note that since this is a production server, putting a debug kernel on it would require change control and Wednesday (5/2) would be the earliest I could do it.
Comment 8 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-26 19:24:56 UTC
(In reply to Matt Allanson from comment #7)

This is not very useful. You should have kernel.debug in the kernel build directory; do you? It is used to obtain a kgdb backtrace.
Comment 9 Matt Allanson 2018-04-30 13:18:27 UTC
(In reply to Eugene Grosbein from comment #8)

Sorry for the late response, kernel.debug does not exist on any of our machines.
Comment 10 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-30 13:24:57 UTC
The GENERIC kernel config has the following line:

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

Add it to your kernel configuration file. It makes the kernel build process produce two kernel binaries instead of one: the ordinary kernel that gets installed by installkernel, and an additional kernel.debug that is not installed but is kept in the kernel build directory and used later by kgdb.
Comment 11 Matt Allanson 2018-05-04 13:56:51 UTC
(In reply to Eugene Grosbein from comment #10)

Hoping this is more what you were looking for...

(kgdb) #0  __curthread () at ./machine/pcpu.h:222
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:298
#2  0xffffffff804bf360 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#3  0xffffffff804bf920 in vpanic (fmt=<optimized out>, ap=0xfffffe00968ee120)
    at /usr/src/sys/kern/kern_shutdown.c:759
#4  0xffffffff804bf963 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:690
#5  0xffffffff806e56a2 in trap_fatal (frame=0xfffffe00968ee2b0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:801
#6  0xffffffff806e4d15 in trap (frame=0xfffffe00968ee2b0)
    at /usr/src/sys/amd64/amd64/trap.c:197
#7  <signal handler called>
#8  0xffffffff805d4297 in sysctl_dumpentry (rn=0xfffff80012b12c30,
    vw=0xfffffe00968ee648) at /usr/src/sys/net/rtsock.c:1543
#9  0xffffffff805ce5e0 in rn_walktree (h=<optimized out>,
    f=0xffffffff805d41a0 <sysctl_dumpentry>, w=0xfffffe00968ee648)
    at /usr/src/sys/net/radix.c:1094
#10 0xffffffff805d393b in sysctl_rtsock (oidp=<optimized out>,
    arg1=<optimized out>, arg2=<optimized out>, req=<optimized out>)
    at /usr/src/sys/net/rtsock.c:1900
#11 0xffffffff804cbe00 in sysctl_root_handler_locked (
    oid=0xffffffff80a5e818 <sysctl___net_routetable>, arg1=0xfffffe00968ee8b8,
    arg2=4, req=0xfffffe00968ee7f0, tracker=0xfffffe00968ee770)
    at /usr/src/sys/kern/kern_sysctl.c:165
#12 0xffffffff804cb622 in sysctl_root (oidp=<optimized out>, arg1=0x80,
    arg2=4, req=<optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1877
#13 0xffffffff804cbb8d in userland_sysctl (td=<optimized out>,
    name=0xfffffe00968ee8b0, namelen=6, old=<optimized out>,
    oldlenp=<optimized out>, inkernel=<optimized out>, new=0x0,
    newlen=<optimized out>, retval=0xfffff80001e6b800, flags=0)
    at /usr/src/sys/kern/kern_sysctl.c:1980
#14 0xffffffff804cba2f in sys___sysctl (td=0xfffff800524e9000,
    uap=0xfffffe00968eea30) at /usr/src/sys/kern/kern_sysctl.c:1907
#15 0xffffffff806e5c60 in syscallenter (td=0xfffff800524e9000,
    sa=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#16 amd64_syscall (td=0xfffff800524e9000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:902
#17 <signal handler called>
#18 0x0000000801dd22aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffdf08
(kgdb)

Please let me know if you need anything else.
Comment 12 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-04 18:36:14 UTC
(In reply to Matt Allanson from comment #11)

Yes, that's it.

Can you please make files kernel.debug and vmcore.0 available for download? Preferably compressed.
Comment 13 Matt Allanson 2018-05-14 13:55:51 UTC
(In reply to Eugene Grosbein from comment #12)
Sorry for the late reply, I was ill last week. Here is the info you requested.

http://rdmpackage.trimedx.com/crashinfo.tar.bz2 

Please let me know when you have retrieved it.
Comment 14 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-14 14:46:29 UTC
I've downloaded it, thanks.

(kgdb) p *((struct rtentry *)rn)->rt_ifp
$7 = {if_link = {tqe_next = 0xdeadc0dedeadc0de, tqe_prev = 0xdeadc0dedeadc0de}, if_clones = {
    le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, if_groups = {
    tqh_first = 0xdeadc0dedeadc0de, tqh_last = 0xdeadc0dedeadc0de}, if_alloctype = 222 'ч',
  if_softc = 0xdeadc0dedeadc0de, if_llsoftc = 0xdeadc0dedeadc0de, if_l2com = 0xdeadc0dedeadc0de,
  if_dname = 0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>,
etc.

This means there is a race condition in the kernel between the interface removal procedure, when some tunnel is being disconnected, and the sysctl handler for "net.routetable" that ppp calls (or some subroutine this handler uses).
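The 0xdeadc0de values above are the junk pattern that a kernel built with INVARIANTS writes over freed malloc(9) memory, so a pointer field reading back as 0xdeadc0dedeadc0de indicates the ifnet had already been freed when sysctl_dumpentry() touched it. A small userland sketch of that check, assuming a 64-bit target; `trash_fill()` and `looks_trashed()` are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit junk pattern that an INVARIANTS kernel writes over freed memory. */
#define JUNK32 UINT32_C(0xdeadc0de)

/* Hypothetical helper: emulate trash-on-free by tiling the pattern over buf. */
static void trash_fill(void *buf, size_t len)
{
    const uint32_t junk = JUNK32;
    unsigned char *p = buf;

    /* Whole 32-bit words only; a tail shorter than 4 bytes is left as is. */
    for (size_t off = 0; off + sizeof(junk) <= len; off += sizeof(junk))
        memcpy(p + off, &junk, sizeof(junk));
}

/*
 * Hypothetical helper: true if a pointer-sized value consists only of the
 * junk pattern, i.e. it was most likely read from freed memory.
 */
static bool looks_trashed(const void *value)
{
    const uint64_t v = (uint64_t)(uintptr_t)value;

    if ((uint32_t)v != JUNK32)
        return false;
    /* On 64-bit targets the upper half must carry the pattern too. */
    return sizeof(uintptr_t) < 8 || (uint32_t)(v >> 32) == JUNK32;
}
```

On a 64-bit build, filling a two-pointer struct with trash_fill() makes looks_trashed() return true for both members, just like the if_link and if_softc fields in the kgdb output.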

Perhaps the culprit is the sysctl_rtsock() function, which takes RIB_RLOCK() before calling rnh->rnh_walktree(&rnh->head, sysctl_dumpentry, &w), but that lock does not protect against interface destruction:

https://svnweb.freebsd.org/base/release/11.1.0/sys/net/rtsock.c?annotate=321354#l1898

We need more eyes from networking people here.
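The trigger side is ordinary: any program that dumps the routing table through sysctl(3) ends up in sysctl_rtsock() and sysctl_dumpentry(). Below is a sketch of such a reader, roughly the way ppp(8) or netstat(1) query the table; the six-element MIB matches namelen=6 in the backtrace of comment 11. It assumes FreeBSD headers and compiles to a stub that returns NULL elsewhere; `dump_routes()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <net/route.h>
#include <errno.h>
#endif

/*
 * Hypothetical helper: return a malloc(3)ed copy of the routing table as a
 * sequence of routing messages and store its size in *lenp.  Returns NULL
 * with *lenp == 0 on failure or on systems other than FreeBSD.
 */
static void *dump_routes(size_t *lenp)
{
    *lenp = 0;
#ifdef __FreeBSD__
    /* net.routetable: protocol 0, address family 0 (all), NET_RT_DUMP, 0. */
    int mib[6] = { CTL_NET, PF_ROUTE, 0, 0, NET_RT_DUMP, 0 };
    void *buf = NULL;

    /* The table can grow between the two calls, so retry a few times. */
    for (int tries = 0; tries < 5; tries++) {
        size_t needed = 0;

        if (sysctl(mib, 6, NULL, &needed, NULL, 0) == -1 || needed == 0)
            break;
        void *nbuf = realloc(buf, needed);
        if (nbuf == NULL)
            break;
        buf = nbuf;
        if (sysctl(mib, 6, buf, &needed, NULL, 0) == 0) {
            *lenp = needed;
            return buf;
        }
        if (errno != ENOMEM)
            break;
    }
    free(buf);
#endif
    return NULL;
}
```

Each successful call walks the whole radix tree in the kernel, so a daemon that does this repeatedly while tunnels come and go keeps reopening the race window.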
Comment 15 Matt Allanson 2018-05-18 14:13:55 UTC
(In reply to Eugene Grosbein from comment #14)

Is there something that I need to do in the meantime?
Comment 16 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-18 14:52:36 UTC
(In reply to Matt Allanson from comment #15)

You could try using the net/mpd5 package instead of the built-in ppp(8) utility for the same task. It may prove more stable in your environment.
Comment 17 Matt Allanson 2018-05-21 14:15:02 UTC
(In reply to Eugene Grosbein from comment #16)

Unfortunately, that would be quite a significant change to our production environment, and we aren't able to make it as an "in the meantime" fix.
Comment 18 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-21 16:19:17 UTC
(In reply to Matt Allanson from comment #17)

It would be nice if you could describe in detail how to reproduce the problem.
Comment 19 Matt Allanson 2018-05-21 16:26:00 UTC
(In reply to Eugene Grosbein from comment #18)

1. Install FreeBSD 11.1 (nanobsd built with provided KCONF)
2. Install stunnel
3. Configure ppp + stunnel to act as server on one machine and client on another
4. Connect client(s) to server
5. Communicate via network as normal
6. Wait for system to crash

We are not doing anything special in any way re: networking -- we're just pushing network comms over PPP over TLS. This solution has been 100% solid in FreeBSD 10.3.
Comment 20 Matt Allanson 2018-05-21 16:43:44 UTC
(In reply to Eugene Grosbein from comment #18)
Apologies, the last stable version was 10.2-RELEASE, not 10.3-RELEASE.
Comment 21 Matt Allanson 2018-05-21 17:25:27 UTC
(In reply to Matt Allanson from comment #20)
Just to clarify: we went from 10.2-RELEASE to 11.1-RELEASE.
Comment 22 Brian Murrey 2018-05-31 17:37:34 UTC
(In reply to Eugene Grosbein from comment #18)

Matt is a co-worker and he is on leave right now. I was wondering if you could give me an update on bug 227720.

Thank you, 

Brian Murrey
Comment 23 Eugene Grosbein freebsd_committer freebsd_triage 2018-05-31 18:01:34 UTC
(In reply to Brian Murrey from comment #22)

I cannot. Hopefully some other networking people will take a look at this racy panic.
Comment 24 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-07-19 13:39:33 UTC
You can try to apply this patch https://people.freebsd.org/~ae/netgc.diff

It defers freeing of deleted ifnet structures, so when an interface is destroyed while some code is accessing its ifnet structure, that access can complete without a page fault.
Comment 25 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-07-19 13:46:01 UTC
> You can try to apply this patch https://people.freebsd.org/~ae/netgc.diff
> 
> It does deferred free for deleted ifnet structure. So, when interface
> destroyed and some code does access to the ifnet stucture in the same time,
> it can be done without page fault.

Note: you need to set the sysctl net.gc.enable=1 for it to take effect.
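The deferred-free idea can be modelled in userland: a detached object is parked on a list instead of being freed, and only a later collection pass releases it, so a reader holding a stale pointer still sees valid memory. This is a simplified sketch, not the contents of netgc.diff: `gc_enable` merely stands in for the net.gc.enable sysctl, and `fake_ifnet`, `ifnet_alloc()`, `ifnet_detach()` and `gc_collect()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the net.gc.enable knob. */
static bool gc_enable = false;

struct fake_ifnet {
    int index;
    bool detached;
    struct fake_ifnet *gc_next;   /* link on the graveyard list */
};

static struct fake_ifnet *graveyard;   /* detached but not yet freed */
static int nfreed;                     /* how many objects were released */

static struct fake_ifnet *ifnet_alloc(int index)
{
    struct fake_ifnet *ifp = calloc(1, sizeof(*ifp));

    if (ifp != NULL)
        ifp->index = index;
    return ifp;
}

/* Detach: free at once, or park on the graveyard when GC is enabled. */
static void ifnet_detach(struct fake_ifnet *ifp)
{
    ifp->detached = true;
    if (gc_enable) {
        ifp->gc_next = graveyard;
        graveyard = ifp;
    } else {
        free(ifp);
        nfreed++;
    }
}

/*
 * Collection pass: release everything on the graveyard.  A real
 * implementation must first be sure no reader can still hold one of these
 * pointers; this sketch leaves that guarantee to the caller.
 */
static int gc_collect(void)
{
    int n = 0;

    while (graveyard != NULL) {
        struct fake_ifnet *next = graveyard->gc_next;
        free(graveyard);
        graveyard = next;
        n++;
    }
    nfreed += n;
    return n;
}
```

With gc_enable set, a stale pointer to a detached interface still reads valid data until gc_collect() runs; with it cleared, the same stale read is a use-after-free, which is the situation in the kgdb dumps in this report.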
Comment 26 Matt Allanson 2018-08-06 13:09:50 UTC
(In reply to Andrey V. Elsukov from comment #25)

Just a heads up, replied in email forgot to comment it here. Sorry about that...

We applied the patch to enable garbage collection, rebuilt the image and installed it, then set net.gc.enabled=1 (also in /etc/sysctl.conf), and it continued to fail. Here is the most recent backtrace:

(kgdb) bt
#0  0xffffffff804e7866 in sched_switch ()
#1  0xffffffff804c8d38 in mi_switch ()
#2  0xffffffff8050acfe in sleepq_switch ()
#3  0xffffffff8050abb3 in sleepq_wait ()
#4  0xffffffff804c8684 in _sleep ()
#5  0xffffffff80510401 in taskqueue_thread_loop ()
#6  0xffffffff804878d4 in fork_exit ()
#7  <signal handler called>
(kgdb)

Let me know what additional information you need.
Comment 27 Matt Allanson 2018-08-24 18:15:11 UTC
(In reply to Matt Allanson from comment #26)

Just wanted to check in on this and see if there is any movement or anything you guys need from me.
Comment 28 Matt Allanson 2018-09-17 17:33:06 UTC
(In reply to Matt Allanson from comment #27)
Just checking in again, this is currently affecting production... We would really like to continue to use FreeBSD.
Comment 29 Eugene Grosbein freebsd_committer freebsd_triage 2018-09-17 18:18:12 UTC
(In reply to Matt Allanson from comment #27)

If you still use 11.1-RELEASE, please upgrade to 11.2-RELEASE first and re-try.
Comment 30 Eugene Grosbein freebsd_committer freebsd_triage 2018-09-17 18:22:14 UTC
(In reply to Matt Allanson from comment #28)

If 11.2-RELEASE still panics for you, I need kernel.debug and a crash dump obtained the same way as you provided in comment 11.
Comment 31 Eugene Grosbein freebsd_committer freebsd_triage 2018-09-17 19:49:58 UTC
(In reply to Matt Allanson from comment #19)

I'm going to try to reproduce the panic. Can you share your configs for the system interfaces, stunnel (server and client) and ppp (server and client)? That would speed things up greatly.
Comment 32 Matt Allanson 2018-09-18 16:19:07 UTC
Created attachment 197207
stunnel client and server config
Comment 33 Matt Allanson 2018-09-18 16:20:35 UTC
(In reply to Eugene Grosbein from comment #31)
The client and server config for ppp is already uploaded and has not changed. Just uploaded the stunnel configs. Getting ready to build 11.2 to test. Thank you!
Comment 34 Eugene Grosbein freebsd_committer freebsd_triage 2018-09-18 19:08:51 UTC
(In reply to Matt Allanson from comment #32)

I also need to know which version of stunnel you use and how you run stunnel and ppp: do you just configure them in /etc/rc.conf? If so, I need the corresponding lines from /etc/rc.conf. If you use custom startup scripts, I need to know which arguments they pass to the stunnel and ppp commands.
Comment 35 Franck Rousseau 2018-11-07 18:02:56 UTC
Hi everyone, same kind of crash here on 11.2-RELEASE-p4. It also crashes on 11.2-RELEASE, and it seems to be a long-standing bug: we had similar issues on previous versions. I only have the crash logs for this latest release.

# uname -a
FreeBSD testpc 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
# dmidecode -s system-product-name
OptiPlex 7010
# sysctl hw.model
hw.model: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

The crash occurs when we play with forwarding, a PPP server and proxy ARP. Unfortunately we do not have a very precise procedure to reproduce it, although it crashes very consistently. We use FreeBSD for educational purposes, doing practical work in computer networking. It is a simple setup with three computers: PC1 is configured as a PPP client that connects to PC2, which is a PPP server and ARP proxy for PC1; finally we check IP connectivity from PC3, which is on the same Ethernet LAN as PC2, and observe ARP activity.

If we activate forwarding, start the PPP server, connect the client and add a published ARP entry, it works OK as long as no mistakes are made (although I cannot guarantee that it would run for hours). But when students make several wrong attempts (stop ppp, play with arp, restart ppp), at some point it crashes (we have had dozens of crashes). It is rather easy to provoke this crash, as I did to get the crash log, although the symptoms vary: the ppp server no longer accepts connections until we reboot, arp fails with a socket memory error, etc., until it eventually crashes. Quite hard to track this one down...

This bug seems to be somewhat similar to bug #230498 with the same location and cause.

The ppp client and server configs are trivial, over a serial line, plus proxy arp (arp -s 192.168.0.1 <eth@> pub)

default:
 set log Phase Chat LCP IPCP CCP tun command
 set device /dev/cuau0
 set speed 9600
 set accmap 000a0000
 set ctsrts off
 set cd off
 set timeout 0
server:
 set ifaddr 192.168.0.2 192.168.0.1 255.255.255.255

Here is the backtrace, I can provide the crash image and kernel.debug if needed.

(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:229
#1  0xffffffff80af673b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#2  0xffffffff80af6b61 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:776
#3  0xffffffff80af69a3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:707
#4  0xffffffff80f77fdf in trap_fatal (frame=0xfffffe04688ae320, eva=0) at /usr/src/sys/amd64/amd64/trap.c:875
#5  0xffffffff80f78039 in trap_pfault (frame=0xfffffe04688ae320, usermode=0) at pcpu.h:229
#6  0xffffffff80f77807 in trap (frame=0xfffffe04688ae320) at /usr/src/sys/amd64/amd64/trap.c:415
#7  0xffffffff80f57fbc in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff80c0ce96 in sysctl_dumpentry (rn=0xfffff8000bc48410, vw=0xfffffe04688ae690) at /usr/src/sys/net/rtsock.c:1559
#9  0xffffffff80c07aa0 in rn_walktree (h=<value optimized out>, f=<value optimized out>, w=<value optimized out>)
    at /usr/src/sys/net/radix.c:1094
#10 0xffffffff80c0c7ff in sysctl_rtsock (oidp=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, 
    req=<value optimized out>) at /usr/src/sys/net/rtsock.c:1916
#11 0xffffffff80b03ccb in sysctl_root_handler_locked (oid=0xffffffff81a33f38, arg1=0xfffffe04688ae908, arg2=4, req=0xfffffe04688ae840, 
    tracker=0xfffffe04688ae7b8) at /usr/src/sys/kern/kern_sysctl.c:165
#12 0xffffffff80b03521 in sysctl_root (arg1=0xfffffe04688ae908, arg2=4) at /usr/src/sys/kern/kern_sysctl.c:1915
#13 0xffffffff80b03a46 in userland_sysctl (td=<value optimized out>, name=0xfffffe04688ae900, namelen=6, old=0x0, 
    oldlenp=<value optimized out>, inkernel=<value optimized out>, new=0x0, newlen=0, retval=0xfffffe04688ae968, flags=0)
    at /usr/src/sys/kern/kern_sysctl.c:2011
#14 0xffffffff80b038cf in sys___sysctl (td=0xfffff80098410000, uap=0xfffff80098410538) at /usr/src/sys/kern/kern_sysctl.c:1945
#15 0xffffffff80f79068 in amd64_syscall (td=0xfffff80098410000, traced=0) at subr_syscall.c:132
#16 0xffffffff80f5880d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:479
#17 0x0000000801de047a in ?? ()
(kgdb) f 8   
#8  0xffffffff80c0ce96 in sysctl_dumpentry (rn=0xfffff8000bc48410, vw=0xfffffe04688ae690) at /usr/src/sys/net/rtsock.c:1559
1559			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
(kgdb) p rt->rt_ifp->if_addr
$1 = (struct ifaddr *) 0x0
Comment 36 Eugene Grosbein freebsd_committer freebsd_triage 2018-11-07 22:28:48 UTC
Please try the patch from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230498
Comment 37 Franck Rousseau 2018-11-08 17:58:27 UTC
Thanks for the fast reply! Not sure if I continue here or in bug #230498 but since this is still related to PPP, I put it here.

I only had 15 min to test, but it crashed right away on the first try. Here is the procedure:
- set up PC3: configure an address on the Ethernet interface;
- set up PC2: configure an address on the Ethernet interface, add an ARP pub entry, activate forwarding, start the ppp server and wait for a connection;
- set up PC1: start pinging PC3 (which obviously fails), start the ppp client and open the connection, add a default route; everything works correctly.
Leave everything running as it is, then quit ppp on both sides, restart the server waiting for the connection, and connect from the client -> crash on PC2.

Here is the trace. After the patch, it crashes one call deeper, still at rtsock.c line 1559:

 	info.rti_info[RTAX_GENMASK] = 0;
 	if (rt->rt_ifp) {
-		info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
+		IF_ADDR_RLOCK(rt->rt_ifp);
+		if (rt->rt_ifp->if_addr != NULL)
+			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
 		info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;

I have also included a somewhat tidied-up dump of the (struct ifnet *).

(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:229
#1  0xffffffff80af673b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#2  0xffffffff80af6b61 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:776
#3  0xffffffff80af69a3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:707
#4  0xffffffff80f77fdf in trap_fatal (frame=0xfffffe0468486290, eva=1208) at /usr/src/sys/amd64/amd64/trap.c:875
#5  0xffffffff80f78039 in trap_pfault (frame=0xfffffe0468486290, usermode=0) at pcpu.h:229
#6  0xffffffff80f77807 in trap (frame=0xfffffe0468486290) at /usr/src/sys/amd64/amd64/trap.c:415
#7  0xffffffff80f57fdc in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff80af2893 in __rw_rlock_hard (rw=0xfffff800be4bc990, td=0xfffff80105056620, v=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:493
#9  0xffffffff80c0ce9b in sysctl_dumpentry (rn=0xfffff80008e74270, vw=0xfffffe0468486690) at /usr/src/sys/net/rtsock.c:1559
#10 0xffffffff80c07aa0 in rn_walktree (h=<value optimized out>, f=<value optimized out>, w=<value optimized out>) at /usr/src/sys/net/radix.c:1094
#11 0xffffffff80c0c7ff in sysctl_rtsock (oidp=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, req=<value optimized out>) at /usr/src/sys/net/rtsock.c:1919
#12 0xffffffff80b03ccb in sysctl_root_handler_locked (oid=0xffffffff81a33f38, arg1=0xfffffe0468486908, arg2=4, req=0xfffffe0468486840, tracker=0xfffffe04684867b8) at /usr/src/sys/kern/kern_sysctl.c:165
#13 0xffffffff80b03521 in sysctl_root (arg1=0xfffffe0468486908, arg2=4) at /usr/src/sys/kern/kern_sysctl.c:1915
#14 0xffffffff80b03a46 in userland_sysctl (td=<value optimized out>, name=0xfffffe0468486900, namelen=6, old=0x0, oldlenp=<value optimized out>, inkernel=<value optimized out>, new=0x0, newlen=0, retval=0xfffffe0468486968, flags=0) at /usr/src/sys/kern/kern_sysctl.c:2011
#15 0xffffffff80b038cf in sys___sysctl (td=0xfffff80105056620, uap=0xfffff80105056b58) at /usr/src/sys/kern/kern_sysctl.c:1945
#16 0xffffffff80f79068 in amd64_syscall (td=0xfffff80105056620, traced=0) at subr_syscall.c:132
#17 0xffffffff80f5882d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:479
#18 0x0000000801de047a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) f 8
#8  0xffffffff80af2893 in __rw_rlock_hard (rw=0xfffff800be4bc990, td=0xfffff80105056620, v=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:493
493				owner = (struct thread *)RW_OWNER(v);
Current language:  auto; currently minimal
(kgdb) f 9
#9  0xffffffff80c0ce9b in sysctl_dumpentry (rn=0xfffff80008e74270, vw=0xfffffe0468486690) at /usr/src/sys/net/rtsock.c:1559
1559			IF_ADDR_RLOCK(rt->rt_ifp);
(kgdb) p rt->rt_ifp->if_addr_lock
$1 = {lock_object = {lo_name = 0xfffff800be4bc9f0 "P?K?", lo_flags = 3192637744, lo_data = 4294965248, lo_witness = 0xfffff80007085848}, rw_lock = 256}
(kgdb) p rt->rt_ifp->if_addr->ifa_addr
Cannot access memory at address 0x3700000018
(kgdb) p *rt->rt_ifp
$2 = {
    if_link = { tqe_next = 0xfffff800be9c9210, tqe_prev = 0xfffff800be9c9000 },
    if_clones = { le_next = 0xfffff800be4bc870, le_prev = 0xfffff800be4bcb70 }, 
    if_groups = { tqh_first = 0xfffff800be9c9048, tqh_last = 0x100 },
    if_alloctype = 0 '\0',
    if_softc = 0xfffff800be9c9000,
    if_llsoftc = 0x3e50000, 
    if_l2com = 0x400000004,
    if_dname = 0x0,
    if_dunit = 51,
    if_index = 36,
    if_index_reserved = 0,
    if_xname = 0xfffff800be4bc860 "\020>y\b", 
    if_description = 0xfffff800be4bc8d0 "0?K?",
    if_flags = -1102329840,
    if_drv_flags = -2048,
    if_capabilities = 142163016,
    if_capenable = -2048,
    if_linkmib = 0x100, 
    if_linkmiblen = 0,
    if_refcount = 142162944,
    if_type = 0 '\0',
    if_addrlen = 248 '?',
    if_hdrlen = 255 '?',
    if_link_state = 255 '?',
    if_mtu = 1078468608, 
    if_metric = 0,
    if_baudrate = 2,
    if_hwassist = 0,
    if_epoch = 90194313239,
    if_lastchange = { tv_sec = -8796001543664, tv_usec = -8796001544192 },
    if_snd = {  ifq_head = 0xfffff800be4bc930,
                ifq_tail = 0xfffff800be4bc870,
                ifq_len = 91478088, ifq_maxlen = -2048,
                ifq_mtx = { lock_object = { lo_name = 0x100 <Address 0x100 out of bounds>,
                                            lo_flags = 0,
                                            lo_data = 0,
                                            lo_witness = 0xfffff8000573d800},
                            mtx_lock = 1079562240
                          },
                ifq_drv_head = 0x2, 
                ifq_drv_tail = 0x0,
                ifq_drv_len = 149,
                ifq_drv_maxlen = 21,
                altq_type = 141323792,
                altq_flags = -2048,
                altq_disc = 0xfffff800086c6c00, 
                altq_ifp = 0xfffff800be4bc990,
                altq_enqueue = 0xfffff800be4bc8d0,
                altq_dequeue = 0xfffff800086c6c48,
                altq_request = 0x100, altq_clfier = 0x0, 
                altq_classify = 0xfffff800086c6c00,
                altq_tbr = 0x84a000,
                altq_cdnr = 0x4
             },
    if_linktask = { ta_link = { stqe_next = 0x0},
                                ta_pending = 6,
                                ta_priority = 0, 
                                ta_func = 0xfffff80007085a10,
                                ta_context = 0xfffff80007085800
                  },
    if_addr_lock = {    lock_object = { lo_name = 0xfffff800be4bc9f0 "P?K?",
                                        lo_flags = 3192637744,
                                        lo_data = 4294965248,
                                        lo_witness = 0xfffff80007085848
                                      },
                        rw_lock = 256
                   },
    if_addrhead = { tqh_first = 0x0, tqh_last = 0xfffff80007085800 },
    if_multiaddrs = { tqh_first = 0xf7d000, tqh_last = 0x4 },
    if_amcount = 0,
    if_addr = 0x3700000018,
    if_broadcastaddr = 0xfffff80007090a10 "\001",
    if_afdata_lock = {  lock_object = { lo_name = 0xfffff80007090800 "",
                                        lo_flags = 3192638032,
                                        lo_data = 4294965248,
                                        lo_witness = 0xfffff800be4bc990
                                      },
                        rw_lock = 18446735277734561864
                      }, 
    if_afdata = 0xfffff800be4bca08,
    if_afdata_initialized = 63,
    if_fib = 55,
    if_vnet = 0xfffff800be3dd610,
    if_home_vnet = 0xfffff800be3dd400, 
    if_vlantrunk = 0xfffff800be4bc810,
    if_bpf = 0xfffff800be4bccf0,
    if_pcount = -1103244216,
    if_bridge = 0x100,
    if_lagg = 0x0,
    if_pf_kif = 0xfffff800be3dd400, 
    if_carp = 0x220a000,
    if_label = 0x400000004,
    if_netmap = 0x0,
    if_output = 0x2400000039,
    if_input = 0xfffff80007075a10,
    if_start = 0xfffff80007075800, 
    if_ioctl = 0xfffff800be4bcc30,
    if_init = 0xfffff800be4bcb10,
    if_resolvemulti = 0xfffff80007075848,
    if_qflush = 0x100, if_transmit = 0, 
    if_reassign = 0xfffff80007075800,
    if_get_counter = 0x40460000,
    if_requestencap = 0x2,
    if_counters = 0xfffff800be4bcc10,
    if_hw_tsomax = 0, 
    if_hw_tsomaxsegcount = 0,
    if_hw_tsomaxsegsize = 17,
    if_pspare = 0xfffff800be4bcc80,
    if_hw_addr = 0xfffff800be4bcc30,
    if_pcp = 72 'H', 
    if_bspare = 0xfffff800be4bcca1 "?\b\a",
    if_ispare = 0xfffff800be4bcca4
}
Comment 38 Franck Rousseau 2018-11-19 15:16:12 UTC
Hi all, some additional information on this crash.

The procedure that I described in my previous post crashes consistently, which is a good starting point for debugging. I suspect the crash comes from internal structures that are left corrupted at some point, after which several symptoms appear, such as "cannot intuit interface" and "no memory allocated" errors, and ultimately a kernel crash.

I have compiled a new kernel with DDB support enabled, hoping to be able to inspect memory at runtime, but the address of the (struct rtentry *) is different each time. Does anyone have an idea how I can get the address I need to watch, to track down who is modifying the routing table?
Comment 39 commit-hook freebsd_committer freebsd_triage 2018-11-27 09:04:44 UTC
A commit references this bug:

Author: ae
Date: Tue Nov 27 09:04:06 UTC 2018
New revision: 341008
URL: https://svnweb.freebsd.org/changeset/base/341008

Log:
  Fix possible panic during ifnet detach in rtsock.

  The panic can happen, when some application does dump of routing table
  using sysctl interface. To prevent this, set IFF_DYING flag in
  if_detach_internal() function, when ifnet under lock is removed from
  the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
  ifnet detach during routes enumeration. In case, if some interface was
  detached in the time before we take the lock, add the check, that ifnet
  is not DYING. This prevents access to memory that could be freed after
  ifnet is unlinked.

  PR:		227720, 230498, 233306
  Reviewed by:	bz, eugen
  MFC after:	1 week
  Sponsored by:	Yandex LLC
  Differential Revision:	https://reviews.freebsd.org/D18338

Changes:
  head/sys/net/if.c
  head/sys/net/rtsock.c
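The locking protocol described in this log can be modelled in userland: the detach side marks the interface dying while holding the ifnet list lock for writing, and the dump side holds the same lock for reading during the whole walk and ignores dying interfaces. In this sketch a POSIX rwlock stands in for the IFNET lock, `M_IFF_DYING` for IFF_DYING, and `model_ifnet`, `model_rtentry`, `model_detach()` and `model_dump()` are hypothetical names. The log does not say whether a route on a dying interface is dropped entirely or only loses its interface details; the sketch assumes the latter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define M_IFF_DYING 0x1              /* sketch-local stand-in for IFF_DYING */

struct model_ifnet {
    int flags;
    int index;
};

struct model_rtentry {
    struct model_ifnet *ifp;         /* may point at an interface going away */
};

/* Stand-in for the global ifnet list lock. */
static pthread_rwlock_t ifnet_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Detach side: flag the interface as dying under the write lock. */
static void model_detach(struct model_ifnet *ifp)
{
    pthread_rwlock_wrlock(&ifnet_lock);
    ifp->flags |= M_IFF_DYING;
    /* ...unlink ifp from the interface list here... */
    pthread_rwlock_unlock(&ifnet_lock);
}

/*
 * Dump side: hold the read lock for the whole walk so no detach can run in
 * the middle of it.  ifindex_out[i] gets the interface index, or -1 when the
 * interface details are omitted.  Returns how many routes kept them.
 */
static size_t model_dump(const struct model_rtentry *rts, size_t n,
                         int *ifindex_out)
{
    size_t with_if = 0;

    pthread_rwlock_rdlock(&ifnet_lock);
    for (size_t i = 0; i < n; i++) {
        const struct model_ifnet *ifp = rts[i].ifp;

        ifindex_out[i] = -1;
        if (ifp != NULL && !(ifp->flags & M_IFF_DYING)) {
            ifindex_out[i] = ifp->index;
            with_if++;
        }
    }
    pthread_rwlock_unlock(&ifnet_lock);
    return with_if;
}
```

Holding the read lock for the whole walk is what closes the window: a detach either completes before the walk starts, in which case its flag is visible, or waits until the walk ends. Detaching a tunnel with index 36 between two dumps drops the count of routes reported with interface details from 3 to 2, and that route's slot comes back as -1.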
Comment 40 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-11-28 09:29:36 UTC
(In reply to Franck Rousseau from comment #37)
> Thanks for the fast reply! Not sure if I continue here or in bug #230498 but
> since this is still related to PPP, I put it here.
> 
> I only had 15 min to test, but it crashed right away on the first try. Here
> is the procedure:
> - setup PC3: configure address on Ethernet interface;
> - setup PC2: configure address on Ethernet interface, add ARP pub entry,
> activate forwarding, start ppp server and wait for connection;
> - setup PC3: start pinging PC3, obviously it fails, start ppp client and
> open connection, add default route, everything works correctly.
> Leave everything running as it is, then quit ppp on both sides, restart the
> server waiting for the connection, connect from client -> crash on PC2.
> 
> Here is the trace, it crashes one call further line rtsock.c:1559 after the
> patch
> 
>  	info.rti_info[RTAX_GENMASK] = 0;
>  	if (rt->rt_ifp) {
> -		info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
> +		IF_ADDR_RLOCK(rt->rt_ifp);
> +		if (rt->rt_ifp->if_addr != NULL)
> +			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
>  		info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;

If this patch is the full version that you used, you are missing IF_ADDR_RUNLOCK() here, and that is why it panics.

> #8  0xffffffff80af2893 in __rw_rlock_hard (rw=0xfffff800be4bc990,
> td=0xfffff80105056620, v=<value optimized out>) at
> /usr/src/sys/kern/kern_rwlock.c:493
> #9  0xffffffff80c0ce9b in sysctl_dumpentry (rn=0xfffff80008e74270,
> vw=0xfffffe0468486690) at /usr/src/sys/net/rtsock.c:1559
Comment 41 Franck Rousseau 2018-11-28 10:00:17 UTC
(In reply to Andrey V. Elsukov from comment #40)

The patch used was attachment #199064.

I have tried all the proposed patches on 11.2 and 12, as well as the latest one on the svn development branch; none of them works.
Comment 42 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-11-28 10:34:52 UTC
(In reply to Franck Rousseau from comment #41)
> (In reply to Andrey V. Elsukov from comment #40)
> 
> The patch used was attachment #199064
> 
> I have tried all proposed patches on 11.2 and 12, and the latest on the svn
> devel, none works.

OK, can you do the following and report back? I assume you are using 11.2.
1. cd /usr/src # (or wherever your source tree is)
2. svnlite revert -R .
3. svnlite up
4. fetch -o rtsock.diff  "https://bz-attachments.freebsd.org/attachment.cgi?id=199450&action=diff&format=raw&headers=1"
5. svnlite patch rtsock.diff
6. svnlite diff # (to be sure that all looks good)
7. make buildkernel installkernel
8. shutdown -r now
9. Run your test again; if it panics, post the panic message and the backtrace from kgdb.
Comment 43 Franck Rousseau 2018-11-30 09:48:35 UTC
(In reply to Andrey V. Elsukov from comment #42)

This is what I reported in bug #230498, comment #20, and in comment #37 of this thread. I repeated it from a clean SVN checkout, as you asked, to be sure of the conclusion.

How to reproduce the crash:
- boot with the new kernel
- ifconfig bge0 192.168.0.2
- start ppp as the server, run term, and wait for ppp open from the client, with the local server address set to the same 192.168.0.2
- the connection works and pings succeed; then quit
- restart the ppp server, run term, and wait for ppp open from the client; after the prompt shows PPp (IP configuration is starting, I guess), the system crashes trying to access a NULL pointer

In the dump:
(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:229
#1  0xffffffff80b072a0 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#2  0xffffffff80b076e1 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:776
#3  0xffffffff80b07523 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:707
#4  0xffffffff803aefc7 in db_panic (addr=<value optimized out>, have_addr=<value optimized out>, 
    count=<value optimized out>, modif=<value optimized out>) at /usr/src/sys/ddb/db_command.c:499
#5  0xffffffff803ae539 in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:466
#6  0xffffffff803ae2b4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:519
#7  0xffffffff803b14ff in db_trap (type=<value optimized out>, code=<value optimized out>)
    at /usr/src/sys/ddb/db_main.c:248
#8  0xffffffff80b4ed63 in kdb_trap (type=12, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:689
#9  0xffffffff80f99501 in trap_fatal (frame=0xfffffe0467edd320, eva=0) at /usr/src/sys/amd64/amd64/trap.c:867
#10 0xffffffff80f99609 in trap_pfault (frame=0xfffffe0467edd320, usermode=0) at pcpu.h:229
#11 0xffffffff80f98dd7 in trap (frame=0xfffffe0467edd320) at /usr/src/sys/amd64/amd64/trap.c:415
#12 0xffffffff80f78e6c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#13 0xffffffff80c24da4 in sysctl_dumpentry (rn=0xfffff80008954410, vw=0xfffffe0467edd690)
    at /usr/src/sys/net/rtsock.c:1559
#14 0xffffffff80c1f990 in rn_walktree (h=<value optimized out>, f=<value optimized out>, w=<value optimized out>)
    at /usr/src/sys/net/radix.c:1094
#15 0xffffffff80c246fb in sysctl_rtsock (oidp=<value optimized out>, arg1=<value optimized out>, 
    arg2=<value optimized out>, req=<value optimized out>) at /usr/src/sys/net/rtsock.c:1917
#16 0xffffffff80b14a6b in sysctl_root_handler_locked (oid=0xffffffff81a690d8, arg1=0xfffffe0467edd908, arg2=4, 
    req=0xfffffe0467edd840, tracker=0xfffffe0467edd7b8) at /usr/src/sys/kern/kern_sysctl.c:165
#17 0xffffffff80b142c1 in sysctl_root (arg1=0xfffffe0467edd908, arg2=4) at /usr/src/sys/kern/kern_sysctl.c:1915
#18 0xffffffff80b147e6 in userland_sysctl (td=<value optimized out>, name=0xfffffe0467edd900, namelen=6, old=0x0, 
    oldlenp=<value optimized out>, inkernel=<value optimized out>, new=0x0, newlen=0, retval=0xfffffe0467edd968, 
    flags=0) at /usr/src/sys/kern/kern_sysctl.c:2011
#19 0xffffffff80b1466f in sys___sysctl (td=0xfffff80008837000, uap=0xfffff80008837538)
    at /usr/src/sys/kern/kern_sysctl.c:1945
#20 0xffffffff80f9a638 in amd64_syscall (td=0xfffff80008837000, traced=0) at subr_syscall.c:132
#21 0xffffffff80f796bd in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:479
#22 0x0000000801de047a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) f 13
#13 0xffffffff80c24da4 in sysctl_dumpentry (rn=0xfffff80008954410, vw=0xfffffe0467edd690)
    at /usr/src/sys/net/rtsock.c:1559
1559			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
(kgdb) print rt->rt_ifp->if_addr 
$1 = (struct ifaddr *) 0x0
(kgdb) print rt->rt_ifp->if_flags
$2 = 0
(kgdb) print rt->rt_ifp->if_index
$3 = 0
(kgdb) print rt->rt_ifp          
$4 = (struct ifnet *) 0xfffff8002be6c800
(kgdb) print *rt->rt_ifp
$5 = {if_link = {tqe_next = 0xfffff800b0cfe050, tqe_prev = 0xfffff800b0cfe0a0}, if_clones = {le_next = 0x0, 
    le_prev = 0x0}, if_groups = {tqh_first = 0x0, tqh_last = 0x0}, if_alloctype = 0 '\0', if_softc = 0x0, 
  if_llsoftc = 0x0, if_l2com = 0x0, if_dname = 0x0, if_dunit = 0, if_index = 0, if_index_reserved = 0, 
  if_xname = 0xfffff8002be6c860 "", if_description = 0x0, if_flags = 0, if_drv_flags = 0, 
  if_capabilities = -1325336224, if_capenable = -2048, if_linkmib = 0xfffff800b100f9b0, 
  if_linkmiblen = 18446735280583750992, if_refcount = 2967221664, if_type = 0 '\0', if_addrlen = 248 '\370', 
  if_hdrlen = 255 '\377', if_link_state = 255 '\377', if_mtu = 2967221744, if_metric = 4294965248, 
  if_baudrate = 18446735280583751232, if_hwassist = 18446735280582943280, if_epoch = -8793126608256, if_lastchange = {
    tv_sec = -8793126608176, tv_usec = 0}, if_snd = {ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 0, 
    ifq_mtx = {lock_object = {lo_name = 0x0, lo_flags = 503152064, lo_data = 4294965252, 
        lo_witness = 0xfffff800053ee3c0}, mtx_lock = 18446735277704537104}, ifq_drv_head = 0xfffff800053ee460, 
    ifq_drv_tail = 0x0, ifq_drv_len = -1326900496, ifq_drv_maxlen = -2048, altq_type = -1326900416, 
    altq_flags = -2048, altq_disc = 0xfffff800b0cfe320, altq_ifp = 0xfffff800b0cfe370, 
    altq_enqueue = 0xfffff800b0cfe3c0, altq_dequeue = 0xfffff800b0cfe410, altq_request = 0xfffff800b0dc3870, 
    altq_clfier = 0xfffff800b100f8c0, altq_classify = 0xfffff800b100f910, altq_tbr = 0x0, altq_cdnr = 0x0}, 
  if_linktask = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0xfffff800b100fa00, 
    ta_context = 0x0}, if_addr_lock = {lock_object = {lo_name = 0xfffff800b0b8a1e0 "\200}�\035\004���\220���", 
      lo_flags = 2964890160, lo_data = 4294965248, lo_witness = 0xfffff800b0b8a280}, rw_lock = 18446735280581419728}, 
  if_addrhead = {tqh_first = 0x0, tqh_last = 0xfffff800b1044960}, if_multiaddrs = {tqh_first = 0x0, tqh_last = 0x0}, 
  if_amcount = 0, if_addr = 0x0, if_broadcastaddr = 0xfffff800b0e91d70 "\200}�\035\004����\033��", if_afdata_lock = {
    lock_object = {lo_name = 0xfffff800b0e91dc0 "\200}�\035\004���p\035��", lo_flags = 2967222464, 
      lo_data = 4294965248, lo_witness = 0xfffff800b0dc3910}, rw_lock = 18446735280583752032}, 
  if_afdata = 0xfffff8002be6ca08, if_afdata_initialized = -1330076256, if_fib = 4294965248, 
  if_vnet = 0xfffff800b0b8a5f0, if_home_vnet = 0xfffff800b0b8a640, if_vlantrunk = 0xfffff800b100fe60, 
  if_bpf = 0xfffff800b100feb0, if_pcount = -1325334784, if_bridge = 0xfffff800b100ff50, if_lagg = 0x0, 
  if_pf_kif = 0xfffff800b1072000, if_carp = 0xfffff800b1072050, if_label = 0xfffff800b10720a0, 
  if_netmap = 0xfffff800b0b8a690, if_output = 0xfffff800b0b8a6e0, if_input = 0xfffff800b0b8a730, 
  if_start = 0xfffff800b0f5c280, if_ioctl = 0xfffff800b0f5c2d0, if_init = 0, if_resolvemulti = 0, 
  if_qflush = 0xfffff800b0cfea00, if_transmit = 0xfffff800b0cfea50, if_reassign = 0xfffff800b0cfeaa0, 
  if_get_counter = 0xfffff800b0dc3f50, if_requestencap = 0xfffff800b1072320, if_counters = 0xfffff8002be6cc10, 
  if_hw_tsomax = 2968896528, if_hw_tsomaxsegcount = 4294965248, if_hw_tsomaxsegsize = 2970036096, 
  if_pspare = 0xfffff8002be6cc80, if_hw_addr = 0xfffff800b0cfebe0, if_pcp = 160 '\240', 
  if_bspare = 0xfffff8002be6cca1 "\020��", if_ispare = 0xfffff8002be6cca4}
Comment 44 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-11-30 09:57:43 UTC
(In reply to Franck Rousseau from comment #43)
> (In reply to Andrey V. Elsukov from comment #42)
> 
> This is what I report in bug #230498 at comment #20 and at comment #37 in
> this thread. I did it again from a clean SVN repo as you asked to be sure of
> the conclusion.
> 
> How to crash :
> - boot with the new kernel
> - ifconfig bge0 192.168.0.2
> - ppp server then term, wait for ppp open from client, with local server
> address set to the same 192.168.0.2
> - connection ok, it pings, then quit
> - restart ppp server then term, wait for ppp open from client, after getting
> PPp at the prompt, IP config is starting I guess, I get the crash, trying to
> access a NULL pointer

Can you show the output of these commands:
# cd /usr/src
# svnlite info
# svnlite diff
Comment 45 Franck Rousseau 2018-11-30 10:06:41 UTC
(In reply to Andrey V. Elsukov from comment #44)

[/usr/src]# svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn.freebsd.org/base/releng/11.2
Relative URL: ^/releng/11.2
Repository Root: https://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341162
Node Kind: directory
Schedule: normal
Last Changed Author: gordon
Last Changed Rev: 341093
Last Changed Date: 2018-11-27 20:45:25 +0100 (Tue, 27 Nov 2018)

[/usr/src]# svnlite diff
Index: sys/amd64/conf/GENERIC
===================================================================
--- sys/amd64/conf/GENERIC	(revision 341162)
+++ sys/amd64/conf/GENERIC	(working copy)
@@ -82,6 +82,8 @@
 # Debugging support.  Always need this:
 options 	KDB			# Enable kernel debugger support.
 options 	KDB_TRACE		# Print a stack trace for a panic.
+options         DDB                     # Support DDB.
+options         GDB                     # Support remote GDB.
 
 # Make an SMP-capable kernel by default
 options 	SMP			# Symmetric MultiProcessor Kernel
Index: sys/net/if.c
===================================================================
--- sys/net/if.c	(revision 341162)
+++ sys/net/if.c	(working copy)
@@ -1032,6 +1032,8 @@
 		if (iter == ifp) {
 			TAILQ_REMOVE(&V_ifnet, ifp, if_link);
 			found = 1;
+			if (!vmove)
+				ifp->if_flags |= IFF_DYING;
 			break;
 		}
 	IFNET_WUNLOCK();
Index: sys/net/rtsock.c
===================================================================
--- sys/net/rtsock.c	(revision 341162)
+++ sys/net/rtsock.c	(working copy)
@@ -1555,7 +1555,7 @@
 	info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask(rt_key(rt),
 	    rt_mask(rt), &ss);
 	info.rti_info[RTAX_GENMASK] = 0;
-	if (rt->rt_ifp) {
+	if (rt->rt_ifp && !(rt->rt_ifp->if_flags & IFF_DYING)) {
 		info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
 		info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
 		if (rt->rt_ifp->if_flags & IFF_POINTOPOINT)
@@ -1913,8 +1913,10 @@
 			rnh = rt_tables_get_rnh(fib, i);
 			if (rnh != NULL) {
 				RIB_RLOCK(rnh); 
+				IFNET_RLOCK_NOSLEEP();
 			    	error = rnh->rnh_walktree(&rnh->head,
 				    sysctl_dumpentry, &w);
+				IFNET_RUNLOCK_NOSLEEP();
 				RIB_RUNLOCK(rnh);
 			} else if (af != 0)
 				error = EAFNOSUPPORT;
Comment 46 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-11-30 11:06:24 UTC
(In reply to Franck Rousseau from comment #43)
> #13 0xffffffff80c24da4 in sysctl_dumpentry (rn=0xfffff80008954410,
> vw=0xfffffe0467edd690)
>     at /usr/src/sys/net/rtsock.c:1559
> 1559			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
> (kgdb) print rt->rt_ifp->if_addr 
> $1 = (struct ifaddr *) 0x0
> (kgdb) print rt->rt_ifp->if_flags
> $2 = 0
> (kgdb) print rt->rt_ifp->if_index
> $3 = 0
> (kgdb) print rt->rt_ifp          
> $4 = (struct ifnet *) 0xfffff8002be6c800
> (kgdb) print *rt->rt_ifp
> $5 = {if_link = {tqe_next = 0xfffff800b0cfe050, tqe_prev =
> 0xfffff800b0cfe0a0}, if_clones = {le_next = 0x0, 
>     le_prev = 0x0}, if_groups = {tqh_first = 0x0, tqh_last = 0x0},
> if_alloctype = 0 '\0', if_softc = 0x0, 
>   if_llsoftc = 0x0, if_l2com = 0x0, if_dname = 0x0, if_dunit = 0, if_index =
> 0, if_index_reserved = 0, 
>   if_xname = 0xfffff8002be6c860 "", if_description = 0x0, if_flags = 0,
> if_drv_flags = 0, 

OK, it seems everything was patched correctly; the problem is that the ifnet pointer refers to garbage.
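This conclusion explains why the flag check alone did not help. In the dump, `if_flags` is 0 and `if_addr` is NULL, so the memory behind `rt_ifp` no longer holds a live ifnet; the `IFF_DYING` test passes on that garbage and the following dereference of `if_addr` faults at address 0. A general way to rule this out is to keep the object alive while anything still points to it. The sketch below shows plain reference counting in user space, loosely in the spirit of FreeBSD's `if_ref()`/`if_rele()`. It is a generic illustration with hypothetical `model_*` names, not the change that eventually closed this bug, and it is single-threaded (a kernel would need atomic updates, e.g. refcount(9)).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of a reference-counted interface object. */
#define MODEL_IFF_DYING 0x1

struct model_ifnet {
    unsigned refcount;
    int if_flags;
};

static unsigned model_frees;   /* counts completed frees so the lifetime can be observed */

static struct model_ifnet *model_if_alloc(void)
{
    struct model_ifnet *ifp = calloc(1, sizeof(*ifp));
    assert(ifp != NULL);
    ifp->refcount = 1;                 /* reference owned by the interface list */
    return ifp;
}

/* A route (or any other holder) takes a reference before storing the pointer. */
static void model_if_ref(struct model_ifnet *ifp)
{
    ifp->refcount++;
}

/* Drop a reference; the memory is released only when the last holder lets go. */
static void model_if_rele(struct model_ifnet *ifp)
{
    assert(ifp->refcount > 0);
    if (--ifp->refcount == 0) {
        free(ifp);
        model_frees++;
    }
}

/* Detach: mark dying and drop the list's reference; routes may still hold theirs. */
static void model_if_detach(struct model_ifnet *ifp)
{
    ifp->if_flags |= MODEL_IFF_DYING;
    model_if_rele(ifp);
}
```

With this scheme a route that still references a detached interface sees valid memory with the dying flag set, instead of freed and possibly reused memory.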
Comment 47 Sergey M. Uralov 2019-10-01 10:52:56 UTC
Please take a look; this may be a similar bug.

#uname -a
FreeBSD vpn01 11.3-RELEASE FreeBSD 11.3-RELEASE #0: Thu Sep  5 10:41:12 MSK 2019     root@vpn01:/usr/obj/usr/src/sys/VPN01  amd64

# diff /root/kernels/VPN01 /usr/src/sys/amd64/conf/GENERIC
< ident         VPN01
> ident         GENERIC
---
< #options      INET6                   # IPv6 communications protocols
> options       INET6                   # IPv6 communications protocols
---
< #device               lpt                     # Printer
> device                lpt                     # Printer
---
< #device               snd_cmi                 # CMedia CMI8338/CMI8738
< #device               snd_csa                 # Crystal Semiconductor CS461x/428x
< #device               snd_emu10kx             # Creative SoundBlaster Live! and Audigy
< #device               snd_es137x              # Ensoniq AudioPCI ES137x
< #device               snd_hda                 # Intel High Definition Audio
< #device               snd_ich                 # Intel, NVidia and other ICH AC'97 Audio
< #device               snd_via8233             # VIA VT8233x Audio
---
> device                snd_cmi                 # CMedia CMI8338/CMI8738
> device                snd_csa                 # Crystal Semiconductor CS461x/428x
> device                snd_emu10kx             # Creative SoundBlaster Live! and Audigy
> device                snd_es137x              # Ensoniq AudioPCI ES137x
> device                snd_hda                 # Intel High Definition Audio
> device                snd_ich                 # Intel, NVidia and other ICH AC'97 Audio
> device                snd_via8233             # VIA VT8233x Audio
---
< options         INCLUDE_CONFIG_FILE     # Include this file in kernel
< options         KDB         # Kernel debugger related code
< options         KDB_TRACE       # Print a stack trace for a panic
---
< options         IPFIREWALL
< options         IPFIREWALL_VERBOSE
< options         IPFIREWALL_VERBOSE_LIMIT=1000
< options         IPFIREWALL_DEFAULT_TO_ACCEPT
< options         DUMMYNET
< options         IPDIVERT
< options         IPFILTER
< options         IPFILTER_LOG
< options         IPFILTER_LOOKUP
< options         IPSTEALTH
---
< options         NETGRAPH
< options         NETGRAPH_SOCKET
< options         NETGRAPH_IPFW
< options         NETGRAPH_ETHER
< options         NETGRAPH_BPF
< options         NETGRAPH_PPPOE


# kgdb kernel.debug /var/crash/vmcore.2
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 25; apic id = 21
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c21212
stack pointer           = 0x28:0xfffffe085c59e3e0
frame pointer           = 0x28:0xfffffe085c59e520
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 18740 (ppp)
trap number             = 12
panic: page fault
cpuid = 25
KDB: stack backtrace:
#0 0xffffffff80b51a07 at kdb_backtrace+0x67
#1 0xffffffff80b0aa1e at vpanic+0x17e
#2 0xffffffff80b0a893 at panic+0x43
#3 0xffffffff80f854f9 at trap_fatal+0x369
#4 0xffffffff80f85559 at trap_pfault+0x49
#5 0xffffffff80f84bdd at trap+0x29d
#6 0xffffffff80f649cc at calltrap+0x8
#7 0xffffffff80c1bf80 at rn_walktree+0x80
#8 0xffffffff80c20b83 at sysctl_rtsock+0x1f3
#9 0xffffffff80b17e8b at sysctl_root_handler_locked+0x8b
#10 0xffffffff80b176e2 at sysctl_root+0x1f2
#11 0xffffffff80b17c06 at userland_sysctl+0x136
#12 0xffffffff80b17a8f at sys___sysctl+0x5f
#13 0xffffffff80f865f6 at amd64_syscall+0xa86
#14 0xffffffff80f652ad at fast_syscall_common+0x101
Uptime: 7m55s
Dumping 1351 out of 32604 MB:..2%..11%..21%..31%..41%..51%..61%..72%..81%..92%

#0  doadump () at pcpu.h:234
234             __asm("movq %%gs:%1,%0" : "=r" (td)


(kgdb) list *0xffffffff80c21212
0xffffffff80c21212 is in sysctl_dumpentry (/usr/src/sys/net/rtsock.c:1566).
1561            info.rti_info[RTAX_GATEWAY] = rt->rt_gateway;
1562            info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask(rt_key(rt),
1563                rt_mask(rt), &ss);
1564            info.rti_info[RTAX_GENMASK] = 0;
1565            if (rt->rt_ifp && !(rt->rt_ifp->if_flags & IFF_DYING)) {
1566                    info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
1567                    info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
1568                    if (rt->rt_ifp->if_flags & IFF_POINTOPOINT)
1569                            info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr;
1570            }
Current language:  auto; currently minimal

(kgdb) backtrace
#0  doadump () at pcpu.h:234
#1  0xffffffff80b0a638 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:388
#2  0xffffffff80b0aa58 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:781
#3  0xffffffff80b0a893 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:712
#4  0xffffffff80f854f9 in trap_fatal (frame=0xfffffe085c59e320, eva=0) at /usr/src/sys/amd64/amd64/trap.c:904
#5  0xffffffff80f85559 in trap_pfault (frame=0xfffffe085c59e320, usermode=0) at pcpu.h:234
#6  0xffffffff80f84bdd in trap (frame=0xfffffe085c59e320) at /usr/src/sys/amd64/amd64/trap.c:438
#7  0xffffffff80f649cc in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff80c21212 in sysctl_dumpentry (rn=0xfffff80188cef000, vw=0xfffffe085c59e690) at /usr/src/sys/net/rtsock.c:1566
#9  0xffffffff80c1bf80 in rn_walktree (h=<value optimized out>, f=0xffffffff80c21110 <sysctl_dumpentry>, w=0xfffffe085c59e690) at /usr/src/sys/net/radix.c:1094
#10 0xffffffff80c20b83 in sysctl_rtsock (oidp=<value optimized out>, arg1=<value optimized out>, arg2=<value optimized out>, req=<value optimized out>) at /usr/src/sys/net/rtsock.c:1931
#11 0xffffffff80b17e8b in sysctl_root_handler_locked (oid=0xffffffff81a51b28, arg1=0xfffffe085c59e908, arg2=4, req=0xfffffe085c59e840, tracker=0xfffffe085c59e7b8)
    at /usr/src/sys/kern/kern_sysctl.c:165
#12 0xffffffff80b176e2 in sysctl_root (arg1=0xfffffe085c59e908, arg2=4, req=0xfffffe085c59e840) at /usr/src/sys/kern/kern_sysctl.c:1915
#13 0xffffffff80b17c06 in userland_sysctl (td=<value optimized out>, name=0xfffffe085c59e900, namelen=6, old=0x0, oldlenp=<value optimized out>, inkernel=<value optimized out>, new=0x0,
    newlen=0, retval=0xfffffe085c59e968, flags=0) at /usr/src/sys/kern/kern_sysctl.c:2011
#14 0xffffffff80b17a8f in sys___sysctl (td=0xfffff80233776620, uap=0xfffff80233776b58) at /usr/src/sys/kern/kern_sysctl.c:1945
#15 0xffffffff80f865f6 in amd64_syscall (td=0xfffff80233776620, traced=0) at src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#16 0xffffffff80f652ad in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:494
#17 0x0000000801de0bba in ?? ()
Previous frame inner to this frame (corrupt stack?)
Comment 48 Eugene Grosbein freebsd_committer freebsd_triage 2023-07-02 04:38:36 UTC
Fixed with https://svnweb.freebsd.org/base?view=revision&revision=341677 and merged into 11.3-RELEASE and 12.1-RELEASE.
Comment 49 Eugene Grosbein freebsd_committer freebsd_triage 2023-07-02 04:40:42 UTC
*** Bug 230498 has been marked as a duplicate of this bug. ***