Bug 242406

Summary: mpd & 3G USB modem: Fatal trap 12: page fault while in kernel mode
Product: Base System
Component: kern
Status: Closed FIXED
Severity: Affects Only Me
Priority: ---
Version: 12.1-RELEASE
Hardware: Any
OS: Any
Reporter: Andrey Khlebutin <gadskypapa>
Assignee: Mark Johnston <markj>
CC: bsd, franco, m.muenz, markj, ozkan.kirik, sigsys
Attachments:
Description            Flags
core dump info         none
mpd config             none
mpd script             none
mpd log                none
proposed patch         none
patch for stable/12    none

Description Andrey Khlebutin 2019-12-03 13:57:30 UTC
Created attachment 209654 [details]
core dump info

I need a backup internet channel and would like to use a 3G USB modem for that purpose. I've configured mpd5, and it stays up and running until the first packet goes through the ng0 interface. At that point the kernel panics.

FreeBSD nucbox 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC  amd64

u3g0: <HUAWEI Technology HUAWEI Mobile, class 0/0, rev 2.00/0.00, addr 4> on usbus0

ng0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: TELE2 3G
inet 10.163.59.45 --> 1.1.1.1 netmask 0xffffffff
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Comment 1 Andrey Khlebutin 2019-12-03 13:58:40 UTC
Created attachment 209655 [details]
mpd config
Comment 2 Andrey Khlebutin 2019-12-03 13:59:00 UTC
Created attachment 209656 [details]
mpd script
Comment 3 Andrey Khlebutin 2019-12-03 13:59:19 UTC
Created attachment 209657 [details]
mpd log
Comment 4 Andrey Khlebutin 2019-12-03 14:39:35 UTC
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0x28
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80cf2f16
stack pointer	        = 0x28:0xfffffe00004481f0
frame pointer	        = 0x28:0xfffffe0000448230
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 15 (usbus0)
trap number		= 12

panic: page fault
cpuid = 1
time = 1575375614
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7dcc at trap_fatal+0x39c
#4 0xffffffff810a7e19 at trap_pfault+0x49
#5 0xffffffff810a740f at trap+0x29f
#6 0xffffffff81081a0c at calltrap+0x8
#7 0xffffffff8350c739 at ng_iface_rcvdata+0x129
#8 0xffffffff834eb98d at ng_apply_item+0x2bd
#9 0xffffffff834eb4d6 at ng_snd_item+0x186
#10 0xffffffff834eb98d at ng_apply_item+0x2bd
#11 0xffffffff834eb4d6 at ng_snd_item+0x186
#12 0xffffffff834eb98d at ng_apply_item+0x2bd
#13 0xffffffff834eb4d6 at ng_snd_item+0x186
#14 0xffffffff834eb98d at ng_apply_item+0x2bd
#15 0xffffffff834eb4d6 at ng_snd_item+0x186
#16 0xffffffff834eb98d at ng_apply_item+0x2bd
#17 0xffffffff834eb4d6 at ng_snd_item+0x186

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
warning: Source file is more recent than executable.
234		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bd01c8 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bd0629 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bd0423 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff810a7dcc in trap_fatal (frame=0xfffffe0000448130, eva=40)
    at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810a7e19 in trap_pfault (frame=0xfffffe0000448130, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:767
#7  0xffffffff810a740f in trap (frame=0xfffffe0000448130)
    at /usr/src/sys/amd64/amd64/trap.c:443
#8  <signal handler called>
#9  0xffffffff80cf2f16 in netisr_dispatch_src (proto=1, source=0, 
    m=0xfffff8027496cd00) at /usr/src/sys/net/netisr.c:1100
#10 0xffffffff8350c739 in ng_iface_rcvdata (hook=<optimized out>, 
    item=<optimized out>) at /usr/src/sys/netgraph/ng_iface.c:734
#11 0xffffffff834eb98d in ng_apply_item (node=0xfffff80286dbcd00, 
    item=0xfffff802c69d8d00, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#12 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#13 0xffffffff834eb98d in ng_apply_item (node=0xfffff802a7fa2500, 
    item=0xfffff802c69d8d00, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#14 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#15 0xffffffff834eb98d in ng_apply_item (node=0xfffff802a7f9bd00, 
    item=0xfffff802c69d8d00, rw=1) at /usr/src/sys/netgraph/ng_base.c:2403
#16 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#17 0xffffffff834eb98d in ng_apply_item (node=0xfffff802a79d3900, 
    item=0xfffff802c69d8d00, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#18 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#19 0xffffffff834eb98d in ng_apply_item (node=0xfffff80286d67800, 
    item=0xfffff802c69d8d00, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#20 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#21 0xffffffff83517702 in nga_rcv_async (sc=<optimized out>, 
    item=<optimized out>) at /usr/src/sys/netgraph/ng_async.c:545
#22 nga_rcvdata (hook=<optimized out>, item=<optimized out>)
    at /usr/src/sys/netgraph/ng_async.c:248
#23 0xffffffff834eb98d in ng_apply_item (node=0xfffff80286dabe00, 
    item=0xfffff802c69d8d00, rw=1) at /usr/src/sys/netgraph/ng_base.c:2403
#24 0xffffffff834eb4d6 in ng_snd_item (item=0xfffff802c69d8d00, flags=0)
    at /usr/src/sys/netgraph/ng_base.c:2320
#25 0xffffffff83515bcb in ngt_rint_bypass (tp=<optimized out>, 
    buf=0xfffff802bfd93800, len=<optimized out>)
    at /usr/src/sys/netgraph/ng_tty.c:446
#26 0xffffffff82ab9aba in ucom_put_data (sc=0xfffffe00ac1d2088, 
    pc=<optimized out>, offset=0, len=<optimized out>)
    at /usr/src/sys/dev/usb/serial/usb_serial.c:1540
#27 0xffffffff82aafda5 in u3g_read_callback (xfer=0xfffffe00abb4e278, 
    error=USB_ERR_NORMAL_COMPLETION) at /usr/src/sys/dev/usb/serial/u3g.c:1137
#28 0xffffffff80a0b87b in usbd_callback_wrapper (pq=<optimized out>)
    at /usr/src/sys/dev/usb/usb_transfer.c:2437
#29 0xffffffff80a0cf93 in usb_command_wrapper (pq=0xfffffe00abb4e060, 
    xfer=<optimized out>) at /usr/src/sys/dev/usb/usb_transfer.c:3091
#30 0xffffffff80a0bb4b in usb_callback_proc (_pm=<optimized out>)
    at /usr/src/sys/dev/usb/usb_transfer.c:2298
#31 0xffffffff80a064a5 in usb_process (arg=0xfffffe00007e4500)
    at /usr/src/sys/dev/usb/usb_process.c:178
#32 0xffffffff80b90c23 in fork_exit (
    callout=0xffffffff80a063b0 <usb_process>, arg=0xfffffe00007e4500, 
    frame=0xfffffe00004489c0) at /usr/src/sys/kern/kern_fork.c:1065
#33 <signal handler called>
(kgdb)
Comment 5 Ozkan KIRIK 2020-06-06 17:01:51 UTC
The same problem, with the same panic message, still exists in FreeBSD stable/12 r361862.

The modem is a Huawei K3765.

Is there any solution for this problem?
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2020-06-08 16:18:01 UTC
We are crashing here:

#ifdef VIMAGE
	if (V_netisr_enable[proto] == 0) {
		m_freem(m);
		return (ENOPROTOOPT);
	}
#endif

I guess we are missing a CURVNET_SET() somewhere in netgraph?
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2020-06-08 16:32:15 UTC
Created attachment 215365 [details]
proposed patch

Please give this patch a try.  I believe it will apply cleanly to head or stable/12.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2020-06-08 16:43:35 UTC
*** Bug 218252 has been marked as a duplicate of this bug. ***
Comment 9 Andrey Khlebutin 2020-06-17 06:08:09 UTC
Sorry for the delay.
The patch doesn't apply to stable/12, and I couldn't find the code that hunk #1 expects:

[root@nucbox /usr/src]# patch < diff
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/sys/netgraph/ng_tty.c b/sys/netgraph/ng_tty.c
|index 6dcd262a47e2..be8e9ab0308d 100644
|--- a/sys/netgraph/ng_tty.c
|+++ b/sys/netgraph/ng_tty.c
--------------------------
Patching file sys/netgraph/ng_tty.c using Plan A...
Hunk #1 failed at 439.
Hunk #2 succeeded at 497 with fuzz 2 (offset 2 lines).
1 out of 2 hunks failed--saving rejects to sys/netgraph/ng_tty.c.rej
done


[root@nucbox /usr/src]# svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/stable/12
Relative URL: ^/stable/12
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 362257
Node Kind: directory
Schedule: normal
Last Changed Author: manu
Last Changed Rev: 362245
Last Changed Date: 2020-06-17 01:44:51 +0500 (Wed, 17 Jun 2020)
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2020-06-17 13:42:32 UTC
Created attachment 215657 [details]
patch for stable/12

Here is a patch that applies to stable/12.
Comment 11 Michael Muenz 2020-07-23 05:49:37 UTC
We had the same issue in the OPNsense project: since the update to 20.7, which moves from 11.2 to 12.1, LTE modems have been crashing.
I applied your patch and built a new image, which is reported to work great:

https://github.com/opnsense/core/issues/4218
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2020-07-23 14:53:01 UTC
(In reply to Michael Muenz from comment #11)
Thanks.  Do you happen to use VNET jails at all?  If so it would be useful to know that ng_tty still works as expected if you pass the associated interface into a jail.
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2020-07-23 14:57:36 UTC
https://reviews.freebsd.org/D25788
Comment 14 Michael Muenz 2020-07-24 05:09:25 UTC
(In reply to Mark Johnston from comment #12)
Sorry, VNET jails are not in use. 
Thanks for all your efforts, truly appreciated :)
Comment 15 Mark Johnston freebsd_committer freebsd_triage 2020-07-28 14:47:34 UTC
I have a different patch in https://reviews.freebsd.org/D25788 , please give it a try.
Comment 16 Michael Muenz 2020-07-29 20:13:45 UTC
Does the new patch only change ng_iface.c and not ng_tty.c?
I reverted the tty patch and applied the two changes, but I get an error when compiling the kernel:


/usr/src/sys/netgraph/ng_iface.c:735:18: error: too many arguments provided to function-like macro invocation
        NET_EPOCH_ENTER(et);
                        ^
/usr/src/sys/net/if_var.h:409:9: note: macro 'NET_EPOCH_ENTER' defined here
#define NET_EPOCH_ENTER() struct epoch_tracker nep_et; epoch_enter_preempt(net_epoch_preempt, &nep_et)
        ^
/usr/src/sys/netgraph/ng_iface.c:735:2: error: use of undeclared identifier 'NET_EPOCH_ENTER'
        NET_EPOCH_ENTER(et);
        ^
/usr/src/sys/netgraph/ng_iface.c:737:17: error: too many arguments provided to function-like macro invocation
        NET_EPOCH_EXIT(et);
                       ^
/usr/src/sys/net/if_var.h:411:9: note: macro 'NET_EPOCH_EXIT' defined here
#define NET_EPOCH_EXIT() epoch_exit_preempt(net_epoch_preempt, &nep_et)
        ^
/usr/src/sys/netgraph/ng_iface.c:737:2: error: use of undeclared identifier 'NET_EPOCH_EXIT'
        NET_EPOCH_EXIT(et);
        ^
4 errors generated.
--- ng_iface.o ---
*** [ng_iface.o] Error code 1

make[5]: stopped in /usr/src/sys/modules/netgraph/iface
1 error

make[5]: stopped in /usr/src/sys/modules/netgraph/iface
--- all_subdir_netgraph/iface ---
*** [all_subdir_netgraph/iface] Error code 2

make[4]: stopped in /usr/src/sys/modules/netgraph
1 error

make[4]: stopped in /usr/src/sys/modules/netgraph
--- all_subdir_netgraph ---
*** [all_subdir_netgraph] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: r92c_attach.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/rtwn
--- all_subdir_rtwn ---
*** [all_subdir_rtwn] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: r92ce_init.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/rtwn_pci
--- all_subdir_rtwn_pci ---
*** [all_subdir_rtwn_pci] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: ql_boot.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/qlxgbe
--- all_subdir_qlxgbe ---
*** [all_subdir_qlxgbe] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: ocs_cam.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/ocs_fc
--- all_subdir_ocs_fc ---
*** [all_subdir_ocs_fc] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: ecore_l2.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[5]: stopped in /usr/src/sys/modules/qlnx/qlnxe
--- all_subdir_qlnx/qlnxe ---
*** [all_subdir_qlnx/qlnxe] Error code 2

make[4]: stopped in /usr/src/sys/modules/qlnx
1 error

make[4]: stopped in /usr/src/sys/modules/qlnx
--- all_subdir_qlnx ---
*** [all_subdir_qlnx] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: tdport.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/pms
--- all_subdir_pms ---
*** [all_subdir_pms] Error code 2

make[3]: stopped in /usr/src/sys/modules
ERROR: ctfconvert: if_re.o doesn't have type data to convert
A failure has been detected in another branch of the parallel make

make[4]: stopped in /usr/src/sys/modules/re
--- all_subdir_re ---
*** [all_subdir_re] Error code 2

make[3]: stopped in /usr/src/sys/modules
8 errors

make[3]: stopped in /usr/src/sys/modules
--- modules-all ---
*** [modules-all] Error code 2

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/SMP
1 error

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/SMP
--- buildkernel ---
*** [buildkernel] Error code 2

make[1]: stopped in /usr/src
1 error

make[1]: stopped in /usr/src
--- buildkernel ---
*** [buildkernel] Error code 2

make: stopped in /usr/src
1 error

make: stopped in /usr/src
*** Error code 2

Stop.
make: stopped in /usr/tools
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2020-07-29 20:15:59 UTC
(In reply to Michael Muenz from comment #16)
Yes, the ng_tty.c patch should be reverted.

I guess you are using an older kernel with different internal interfaces.  Try applying the patch by hand, i.e., without modifying any other lines.  Just add the CURVNET_SET/RESTORE() calls.
Comment 18 Michael Muenz 2020-07-29 20:58:54 UTC
(In reply to Mark Johnston from comment #17)
Very good guess :)

I reverted to the original file and just applied the two lines by hand. It seems to build now. When the image is ready I'll have the guys test it on the equipment.

Thanks for your efforts Mark!
Comment 19 Michael Muenz 2020-07-30 10:21:36 UTC
(In reply to Mark Johnston from comment #17)
I built an image with the new patch (really just the two lines, without the old one?) and a tester again reported a crash:
https://github.com/opnsense/src/issues/67#issuecomment-666272502
Comment 20 Mark Johnston freebsd_committer freebsd_triage 2020-07-30 13:49:12 UTC
(In reply to Michael Muenz from comment #19)
Can you show the diff that you applied?  It would be useful to see the panic message as well; I will be surprised if it is the same as before.  Thanks in advance.
Comment 21 Michael Muenz 2020-07-31 11:52:15 UTC
(In reply to Mark Johnston from comment #20)
Sorry, I screwed up the patch and built an image with the old file.
Now, with this patch:

--- ng_iface.c.orig     2020-07-31 13:50:23.074673000 +0200
+++ ng_iface.c  2020-07-30 16:18:19.927067000 +0200
@@ -731,7 +731,9 @@
        }
        random_harvest_queue(m, sizeof(*m), RANDOM_NET_NG);
        M_SETFIB(m, ifp->if_fib);
+       CURVNET_SET(ifp->if_vnet);
        netisr_dispatch(isr, m);
+       CURVNET_RESTORE();
        return (0);
 }


I got positive feedback:
https://github.com/opnsense/src/issues/67#issuecomment-667079572
Comment 22 Mark Johnston freebsd_committer freebsd_triage 2020-07-31 13:45:39 UTC
(In reply to Michael Muenz from comment #21)
Great, thank you very much.
Comment 23 commit-hook freebsd_committer freebsd_triage 2020-07-31 14:09:07 UTC
A commit references this bug:

Author: markj
Date: Fri Jul 31 14:08:33 UTC 2020
New revision: 363735
URL: https://svnweb.freebsd.org/changeset/base/363735

Log:
  ng_iface(4): Set the current VNET before calling netisr_dispatch().

  This is normally handled by a netgraph thread, but netgraph messages may
  be dispatched directly to a node, in which case no VNET is set before
  ng_iface calls into the network stack.  Netgraph could probably handle
  this more generally, but for now just be sure to set the current VNET in
  ng_iface.

  PR:		242406
  Tested by:	Michael Muenz <m.muenz@gmail.com>
  Reviewed by:	Lutz Donnerhacke
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D25788

Changes:
  head/sys/netgraph/ng_iface.c
Comment 24 commit-hook freebsd_committer freebsd_triage 2020-08-07 13:41:23 UTC
A commit references this bug:

Author: markj
Date: Fri Aug  7 13:40:50 UTC 2020
New revision: 364014
URL: https://svnweb.freebsd.org/changeset/base/364014

Log:
  MFC r363735:
  ng_iface(4): Set the current VNET before calling netisr_dispatch().

  PR:	242406

Changes:
_U  stable/12/
  stable/12/sys/netgraph/ng_iface.c
Comment 25 Mark Johnston freebsd_committer freebsd_triage 2020-08-07 13:46:26 UTC
Thanks everyone for the report and follow-up testing.