Bug 267704

Summary: net/realtek-re-kmod: frequent kernel panic
Product: Ports & Packages
Reporter: Henry Hu <henry.hu.sh>
Component: Individual Port(s)
Assignee: Alex Dupre <ale>
Status: Closed (Overcome By Events)
Severity: Affects Only Me
CC: ashafer, grahamperrin, jbolla, kib
Priority: ---
Keywords: crash
Version: Latest
Flags: bugzilla: maintainer-feedback? (ale)
       ale: maintainer-feedback? (kib)
Hardware: Any
OS: Any
URL: https://www.freshports.org/net/realtek-re-kmod/
Attachments:
Description                          Flags
realtek-re-kmod v1.97 panics         none
196.04_3 binary that also crashed    none

Description Henry Hu 2022-11-11 01:22:59 UTC
I'm getting non-deterministic kernel panics with this kmod.
I'm using FreeBSD 14-CURRENT (1400073):
FreeBSD goldpeak.local 14.0-CURRENT FreeBSD 14.0-CURRENT #3 main-n259127-689a9368eb60: Wed Nov  9 22:34:33 EST 2022     root@goldpeak.local:/usr/obj/usr/src/amd64.amd64/sys/MYKERNEL amd64

and I'm getting repeated kernel panics (every ~30 min to 1 h). It crashed more than 10 times today while I was away from home and the machine was idle.

(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff807a59d8 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:434
#4  kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:541
#5  0xffffffff807a5ef2 in vpanic (fmt=0xffffffff80bbdc7d "%s", ap=0xfffffe014dd95a50) at /usr/src/sys/kern/kern_shutdown.c:979
#6  0xffffffff807a5cf3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:903
#7  0xffffffff80b63f9a in trap_fatal (frame=0xfffffe014dd95c80, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:955
#8  0xffffffff80b63fef in trap_pfault (frame=frame@entry=0xfffffe014dd95c80, usermode=false, signo=signo@entry=0x0, ucode=ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:763
#9  0xffffffff80b63672 in trap (frame=frame@entry=0xfffffe014dd95c80) at /usr/src/sys/amd64/amd64/trap.c:445
#10 0xffffffff80b64339 in trap_check (frame=0xfffffe014dd95c80) at /usr/src/sys/amd64/amd64/trap.c:667
#11 <signal handler called>
#12 ether_input (ifp=<optimized out>, m=0xffff) at /usr/src/sys/net/if_ethersubr.c:822
#13 0xffffffff85387ea2 in re_rxeof () from /boot/modules/if_re.ko
#14 0x0000000000000000 in ?? ()

The common factor seems to be that ether_input is always called with m = 0xffff. I'm trying to figure out how to get debug info for the kernel module.
Comment 1 Henry Hu 2022-11-11 01:42:37 UTC
Okay, it just crashed again, and I got a better trace:

(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff807a59d8 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:434
#4  kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:541
#5  0xffffffff807a5ef2 in vpanic (fmt=0xffffffff80bbdc7d "%s", ap=0xfffffe014db56a50) at /usr/src/sys/kern/kern_shutdown.c:979
#6  0xffffffff807a5cf3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:903
#7  0xffffffff80b63f9a in trap_fatal (frame=0xfffffe014db56c80, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:955
#8  0xffffffff80b63fef in trap_pfault (frame=frame@entry=0xfffffe014db56c80, usermode=false, signo=signo@entry=0x0, ucode=ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:763
#9  0xffffffff80b63672 in trap (frame=frame@entry=0xfffffe014db56c80) at /usr/src/sys/amd64/amd64/trap.c:445
#10 0xffffffff80b64339 in trap_check (frame=0xfffffe014db56c80) at /usr/src/sys/amd64/amd64/trap.c:667
#11 <signal handler called>
#12 ether_input (ifp=<optimized out>, m=0xffff) at /usr/src/sys/net/if_ethersubr.c:822
#13 0xffffffff85387ea2 in re_rxeof (sc=0xfffffe015aaf0000) at if_re.c:7089
#14 0xffffffff85374ad7 in re_int_task_8125 (arg=0xfffffe015aaf0000, npending=1) at if_re.c:7315
#15 0xffffffff80808d50 in taskqueue_run_locked (queue=queue@entry=0xfffff8012b3f4600) at /usr/src/sys/kern/subr_taskqueue.c:514
#16 0xffffffff80809f93 in taskqueue_thread_loop (arg=arg@entry=0xfffffe015aaf2238) at /usr/src/sys/kern/subr_taskqueue.c:826
#17 0xffffffff80761ccd in fork_exit (callout=0xffffffff80809ed0 <taskqueue_thread_loop>, arg=0xfffffe015aaf2238, frame=0xfffffe014db56f40) at /usr/src/sys/kern/kern_fork.c:1102
#18 <signal handler called>
#19 0x000032d58f8d6eea in ?? ()
Backtrace stopped: Cannot access memory at address 0x32d5a7eb8f48

(kgdb) frame 13
#13 0xffffffff85387ea2 in re_rxeof (sc=0xfffffe015aaf0000) at if_re.c:7089
7089                    RE_UNLOCK(sc);
(kgdb) p *m
$12 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffffe019896400e "E", m_len = 216, m_type = 1, m_flags = 19, {{{m_pkthdr = {{snd_tag = 0xfffff8013dcef800, rcvif = 0xfffff8013dcef800,
            {rcvidx = 63488, rcvgen = 15822}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 216, flowid = 0, csum_flags = 251658240, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000',
              l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360},
            ptr = 0xffff00000000}, {PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 248 '\370', m_epg_hdrlen = 206 '\316', m_epg_trllen = 61 '=',
          m_epg_1st_off = 63489, m_epg_last_len = 65535, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0xd8, m_epg_seqno = 71776119312875520, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 1, ext_cnt = 0x38fff2eb00000001},
          ext_size = 9216, ext_type = 4, ext_flags = 1, {{ext_buf = 0xfffffe0198964000 "\377\377\377\377\377\377,\364\062%\032\v\b", ext_arg2 = 0x0}, {extpg_pa = {18446741881541246976, 0, 14418586273075800256, 8565010310643185937, 149001056032896},
              extpg_trail = "\001\001\b\n\302\353\301\270\244ˡ\330status\n*\177\373\031\006%", '\000' <repeats 15 times>, "(", '\000' <repeats 22 times>, extpg_hdr = "\000\000\000\000\000\000\000\000\071\002;\224\256踸f\000\020/\253s\273"}}, ext_free = 0x0, ext_arg1 = 0x0},
        m_pktdat = 0xfffff8013db9f660 "\001"}}, m_dat = 0xfffff8013db9f620 ""}}

The mbuf seems reasonable at the call site...
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2022-11-11 08:50:09 UTC
What is the exact content of the kernel panic message?
Provide a consistent panic message and kgdb backtrace, i.e. show both
from the same panic.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2022-11-11 08:57:13 UTC
Also, please show me the driver code around if_re.c:7089 in your sources.
Comment 4 Alex Dupre freebsd_committer freebsd_triage 2022-11-11 09:04:19 UTC
Here is the code: https://github.com/alexdupre/rtl_bsd_drv

Both lines (7089 and 7315) are plain RE_UNLOCK(sc).
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2022-11-11 09:08:30 UTC
(In reply to Alex Dupre from comment #4)
It cannot be that line, there is no call to either ether_input() or
ifp->if_input() right before it.  This is why I asked.
Comment 6 Alex Dupre freebsd_committer freebsd_triage 2022-11-11 09:16:06 UTC
True, it looks like there is a 4-5 line offset, but that means it's different code.
Comment 7 Henry Hu 2022-11-18 01:40:08 UTC
Message (from a brand new panic):

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 3a
fault virtual address   = 0x10007
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808c6e70
stack pointer           = 0x28:0xfffffe014dd8ad40
frame pointer           = 0x28:0xfffffe014dd8ad90
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (re0 taskq)
rdi: fffffe015e1fa090 rsi:                0 rdx:                0
rcx:                1  r8: fffffe015e1f9cd8  r9:                0
rax:                0 rbx:             ffff rbp: fffffe014dd8ad90
r10:                0 r11:                6 r12:             8803
r13:             ffff r14: fffffe014f7ecac0 r15:                0
trap number             = 12
panic: page fault
cpuid = 13
time = 1668735340
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014dd8a900
kdb_backtrace() at kdb_backtrace+0x37/frame 0xfffffe014dd8a9b0
vpanic() at vpanic+0x184/frame 0xfffffe014dd8aa10
panic() at panic+0x43/frame 0xfffffe014dd8aa70
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe014dd8aad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014dd8ab30
trap() at trap+0x262/frame 0xfffffe014dd8ac50
trap_check() at trap_check+0x29/frame 0xfffffe014dd8ac70
calltrap() at calltrap+0x8/frame 0xfffffe014dd8ac70
--- trap 0xc, rip = 0xffffffff808c6e70, rsp = 0xfffffe014dd8ad40, rbp = 0xfffffe014dd8ad90 ---
ether_input() at ether_input+0x50/frame 0xfffffe014dd8ad90
re_rxeof() at re_rxeof+0x442/frame 0xfffffe014dd8adf0
re_int_task_8125() at re_int_task_8125+0x137/frame 0xfffffe014dd8ae30
taskqueue_run_locked() at taskqueue_run_locked+0x1c0/frame 0xfffffe014dd8aec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfffffe014dd8aef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe014dd8af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014dd8af30
--- trap 0xc, rip = 0x32e2e6377eea, rsp = 0x32e2f5ae5f48, rbp = 0x32e2f5ae5f60 ---
Uptime: 3m32s
Dumping 2165 out of 32500 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

Regarding the code: I haven't modified it. The debug info may not be very accurate, but I think the faulting call is actually the ifp->if_input call at line 7097.
Comment 8 Henry Hu 2022-11-18 02:38:39 UTC
I added

		/* Compare as unsigned: amd64 kernel addresses are negative as int64_t. */
		if (m->m_nextpkt != NULL && (uintptr_t)m->m_nextpkt < 0x1000000) {
			printf("Wrong offset! m->m_nextpkt=%p\n", m->m_nextpkt);
			m->m_nextpkt = NULL;
		}

before the if_input call, and it did catch an error:

Wrong offset! m->m_nextpkt=0xffff

This means that m->m_nextpkt was somehow already 0xffff before the mbuf was passed to ether_input.
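The numbers in the panics fit this. The fault virtual address is 0x10007 (eva=65543 in the kgdb traces). That is exactly 0xffff + 8, and on amd64 m_nextpkt is the second pointer-sized member of struct mbuf, at offset 8. Assume ether_input() loops over the packet list via m_nextpkt, as the 14-CURRENT source appears to. The fault is then the read of m_nextpkt through the bogus 0xffff pointer. That would also explain why kgdb shows m=0xffff inside ether_input while the mbuf at the call site looks sane. Below is a minimal sketch of that arithmetic. It uses a hypothetical `mbuf_head` struct that mirrors only the first two members of struct mbuf; it is not the real kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the start of struct mbuf. On LP64 (amd64)
 * both leading members are pointer-sized, so m_nextpkt sits at offset 8.
 */
struct mbuf_head {
	struct mbuf_head *m_next;	/* next buffer in this chain */
	struct mbuf_head *m_nextpkt;	/* next packet in the list */
};

/* Address read when code loads bad->m_nextpkt through a bogus pointer. */
static uintptr_t
nextpkt_fault_addr(uintptr_t bad)
{
	return (bad + offsetof(struct mbuf_head, m_nextpkt));
}
```

On an LP64 target, `nextpkt_fault_addr(0xffff)` returns 0x10007, the fault address reported in the panics.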
Comment 9 Henry Hu 2022-11-18 05:14:52 UTC
I thought 0xffff might come from the checksum code just before the if_input call, but I was wrong. I've added a few debug statements (as early as just after line 7046), and m->m_nextpkt is already 0xffff there.
Given that m is initialized at line 7040, the corruption may happen even earlier than that. I'll add more debug statements.
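One way to bracket where the bad value first appears is a checkpoint macro that records the source line it runs at. Scattered through re_rxeof(), it would report the first line at which m_nextpkt is already bogus. This is a hypothetical helper, not something in the driver. The `NEXTPKT_CHECK` macro, the userspace `struct pkt` type and the global `first_bad_line` are illustrative assumptions. The 0x1000000 cutoff is taken from the comment 8 check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packet node; only the list link matters here. */
struct pkt {
	struct pkt *m_nextpkt;
};

/* Line number of the first checkpoint that saw a bogus link (0 = none yet). */
static int first_bad_line;

/* A non-NULL link below 16 MiB cannot be a kernel mbuf pointer on amd64. */
static bool
nextpkt_is_bogus(const struct pkt *m)
{
	return (m->m_nextpkt != NULL && (uintptr_t)m->m_nextpkt < 0x1000000);
}

/*
 * Record (and log once) the first line where the link is already corrupt.
 * Note: the argument is evaluated more than once, so pass a plain variable.
 */
#define NEXTPKT_CHECK(m) do {						\
	if (first_bad_line == 0 && nextpkt_is_bogus(m)) {		\
		first_bad_line = __LINE__;				\
		printf("%s:%d: m_nextpkt=%p\n", __func__, __LINE__,	\
		    (void *)(m)->m_nextpkt);				\
	}								\
} while (0)
```

Only the first hit is logged, so the printed line is the earliest checkpoint after the corruption. The real corrupting store happened somewhere between that checkpoint and the previous one.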
Comment 10 Austin Shafer 2023-01-06 18:26:58 UTC
Created attachment 239310 [details]
realtek-re-kmod v1.97 panics

I've seen something similar to this, although not the same. I've attached a couple of different core.txt.* files for reference. Version 1.97 of realtek-re-kmod is unstable.

Initially I didn't have any problems with v1.97; it ran fine for a couple of weeks straight. Then I went to open a folder in Thunderbird and my system hung. I found that odd, so I restarted, and from then on my machine hung roughly every 5 minutes. The panics were constant: even if I just let the machine reboot and didn't touch anything, it would still hang.

Two main panic reasons:
"panic: Memory modified after free 0xfffff8018fcd0a00(256) val=deadf5ee @ 0xfffff8018fcd0a08"
"Fatal trap 12: page fault while in kernel mode" in ether_input()

As hinted by the first panic and by the panic originally reported in this bug, I think there is a use-after-free somewhere. It is most likely in something allocated by UMA, since that looks like the code that fails validation on seeing the deadf5ee value.

The good news is that realtek-re-kmod version 196.04 works fine, so anyone else running into this can just roll back to that version like I did.
Comment 11 Alex Dupre freebsd_committer freebsd_triage 2023-01-08 11:36:01 UTC
I've just pushed version v1.98. I don't think it contains anything that will fix your issue, but can you give it a try? If it still panics, can you also try changing the port to use commit hash 2c11277, which is almost the stock driver, and tell me whether it still has the issue? Thanks.
Comment 12 Austin Shafer 2023-01-08 18:14:03 UTC
Thanks, I'll give it a try when I get the chance, although the computer I reproduce the issue on is my daily driver for work, so I'm not always able to experiment on it.
Comment 13 Austin Shafer 2023-01-11 18:15:30 UTC
Tested with 2c11277 and still saw the crashes. I didn't test with 1.98 though.

The exact 196.04 port hash that is "known good" for me is:
commit ad28aec4d627e45be486e052331eabbdf922cff7 (HEAD)
Author: John Baldwin <jhb@FreeBSD.org>
Date:   Fri May 20 10:12:21 2022 -0700

    net/realtek-re-kmod: Remove unused DRIVER_MODULE devclass on recent main.
    
    Reviewed by:    ale (maintainer)
    Differential Revision:  https://reviews.freebsd.org/D35203


And my system is:
FreeBSD mick 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n259955-45396fda8b73: Fri Jan  6 11:14:38 EST 2023     root@mick:/usr/obj/usr/freebsd-src/amd64.amd64/sys/GENERIC amd64


I did actually set the version to 196.04 while my ports tree was at 1.97, and I still saw the crashes, which is interesting. Unfortunately I didn't record which port commit I was on when I did that; my best guess is commit cd6de38cc36145f3d222623da1bc61060d1cdf16. I don't have time to verify that today.

From peeking at it in kgdb, it looks like corruption in something allocated from zone_mbuf in UMA:
#9  0xffffffff80bec683 in m_get (how=2, type=1) at /usr/freebsd-src/sys/sys/mbuf.h:1000
1000		m = uma_zalloc_arg(zone_mbuf, &args, how);
(kgdb) l
995		struct mbuf *m;
996		struct mb_args args;
997	
998		args.flags = 0;
999		args.type = type;
1000		m = uma_zalloc_arg(zone_mbuf, &args, how);
1001		MBUF_PROBE3(m__get, how, type, m);
1002		return (m);
1003	}

I think the validation that fails and panics is triggered by this uma_zalloc_arg call. I still think something is going wrong on if_re's end. I've tried 13.1-STABLE and other CURRENT versions, but only realtek-re-kmod v196.04 actually stops the panics.
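With INVARIANTS enabled, UMA fills freed items with the 32-bit pattern 0xdeadc0de and re-checks that pattern when the item is allocated again. That is why the "Memory modified after free" panic surfaces inside uma_zalloc_arg() called from m_get(). The sketch below models that check in userspace; it is modeled on, not copied from, the kernel's trash constructor. `junk_fill` and `junk_check` are hypothetical names.

Applied to the panic quoted in comment 10, the offending word is at 0xfffff8018fcd0a08 - 0xfffff8018fcd0a00 = 8 bytes into a 256-byte mbuf item. On amd64 that is where m_nextpkt lives. This suggests, without proving, that the same field seen corrupted elsewhere in this report is being written after free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JUNK_PATTERN	0xdeadc0deU	/* fill value UMA uses under INVARIANTS */

/* On free: overwrite the whole item with the junk pattern, word by word. */
static void
junk_fill(void *item, size_t size)
{
	uint32_t *p = item;

	for (size_t i = 0; i < size / sizeof(*p); i++)
		p[i] = JUNK_PATTERN;
}

/*
 * On allocation: return the byte offset of the first word that no longer
 * holds the pattern (a write after free), or -1 if the item is intact.
 * The unexpected value is stored in *badval.
 */
static long
junk_check(const void *item, size_t size, uint32_t *badval)
{
	const uint32_t *p = item;

	for (size_t i = 0; i < size / sizeof(*p); i++) {
		if (p[i] != JUNK_PATTERN) {
			*badval = p[i];
			return ((long)(i * sizeof(*p)));
		}
	}
	return (-1);
}
```

Filling a 256-byte buffer, writing 0xdeadf5ee into its third 32-bit word, and calling `junk_check` reports offset 8 with that value. This mirrors "val=deadf5ee @ ...a08" in the real panic.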
Comment 14 Alex Dupre freebsd_committer freebsd_triage 2023-01-11 18:28:15 UTC
Setting PORTVERSION doesn't actually change anything; what matters is the commit hash you are using. If you tried 2c11277, then you were already using version 1.98, whatever you set in the port's Makefile, and the kernel message at startup should have shown that. If this is confirmed, the issue is in the stock kernel driver and not in our patches.

@kib: any idea?
Comment 15 Alex Dupre freebsd_committer freebsd_triage 2023-01-14 07:52:04 UTC
Apparently a similar issue is also present in the 1.96 release, according to https://www.truenas.com/community/threads/new-truenas-core-build-crashing-and-rebooting.106862/

Perhaps later versions trigger it more frequently under certain conditions.
Comment 16 Austin Shafer 2023-01-14 22:07:22 UTC
I think it's something introduced between commit ad28aec4d627e45be486e052331eabbdf922cff7 and the bump to 1.97. As I said, I also reproduced it with 1.96 within that range. I'll see if I can get around to bisecting it eventually.
Comment 17 jbolla 2023-01-16 05:48:33 UTC
Created attachment 239499 [details]
196.04_3 binary that also crashed

I'm uploading this binary, which showed a similar crash, in case it helps determine which versions have this behavior. Here are some example call stacks:

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100045 td 0xfffffe00e09aa740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00e06da650
vpanic() at vpanic+0x1b0/frame 0xfffffe00e06da6a0
panic() at panic+0x43/frame 0xfffffe00e06da700
trap_fatal() at trap_fatal+0x385/frame 0xfffffe00e06da760
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e06da7c0
calltrap() at calltrap+0x8/frame 0xfffffe00e06da7c0
--- trap 0xc, rip = 0xffffffff80bbb558, rsp = 0xfffffe00e06da890, rbp = 0xfffffe00e06da8d0 ---
sbcut_internal() at sbcut_internal+0xa8/frame 0xfffffe00e06da8d0
tcp_do_segment() at tcp_do_segment+0x18c8/frame 0xfffffe00e06da9b0
tcp_input_with_port() at tcp_input_with_port+0xb61/frame 0xfffffe00e06daae0
tcp_input() at tcp_input+0xb/frame 0xfffffe00e06daaf0
ip_input() at ip_input+0x11f/frame 0xfffffe00e06dab80
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e06dabd0
ether_demux() at ether_demux+0x138/frame 0xfffffe00e06dac00
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe00e06dac60
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e06dacb0
ether_input() at ether_input+0x69/frame 0xfffffe00e06dad10
re_rxeof() at re_rxeof+0x2ad/frame 0xfffffe00e06dad80
re_int_task_8125() at re_int_task_8125+0xb4/frame 0xfffffe00e06dadc0
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e06dae40
taskqueue_run() at taskqueue_run+0x68/frame 0xfffffe00e06dae60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00e06daef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e06daf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e06daf30
--- trap 0x80b7b350, rip = 0xffffffff80aa32cf, rsp = 0, rbp = 0x3408000 ---
mi_startup() at mi_startup+0xdf/frame 0x3408000
db:0:kdb.enter.default>  show allpcpu
Current CPU: 4
...
curthread    = 0xfffffe00e09aa740: pid 12 tid 100045 critnest 1 "swi5: fast taskq"

------------------------------------------------------

db:0:kdb.enter.default>  bt
Tracing pid 0 tid 100206 td 0xfffffe010617c560
kdb_enter() at kdb_enter+0x37/frame 0xfffffe01064edab0
vpanic() at vpanic+0x1b0/frame 0xfffffe01064edb00
panic() at panic+0x43/frame 0xfffffe01064edb60
trap_fatal() at trap_fatal+0x385/frame 0xfffffe01064edbc0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe01064edc20
calltrap() at calltrap+0x8/frame 0xfffffe01064edc20
--- trap 0xc, rip = 0xffffffff80c55c6c, rsp = 0xfffffe01064edcf0, rbp = 0xfffffe01064edd40 ---
ether_nh_input() at ether_nh_input+0x1c/frame 0xfffffe01064edd40
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe01064edd90
ether_input() at ether_input+0x69/frame 0xfffffe01064eddf0
epair_tx_start_deferred() at epair_tx_start_deferred+0x177/frame 0xfffffe01064ede40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe01064edec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe01064edef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe01064edf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01064edf30
--- trap 0x3407000, rip = 0xffffffff80aa32cf, rsp = 0, rbp = 0x3407000 ---
mi_startup() at mi_startup+0xdf/frame 0x3407000
db:0:kdb.enter.default>  show allpcpu
Current CPU: 0
...
curthread    = 0xfffffe010617c560: pid 0 tid 100206 critnest 1 "epair_task"
Comment 18 Alex Dupre freebsd_committer freebsd_triage 2023-01-16 06:47:09 UTC
(In reply to Austin Shafer from comment #16)

There isn't any other code commit between those two versions: commit ad28aec4d627e45be486e052331eabbdf922cff7 is the last one for the 1.96 release.
Comment 19 Henry Hu 2023-03-19 22:40:19 UTC
I just upgraded to the latest 14-CURRENT (1400083) and the latest driver (1.98), and it crashed in the same way:

> sudo kgdb -n 4
...
Reading symbols from /usr/obj/usr/src/amd64.amd64/sys/MYKERNEL/kernel.full...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 3a
fault virtual address   = 0x10007
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808c8cd0
stack pointer           = 0x28:0xfffffe015a39dd40
frame pointer           = 0x28:0xfffffe015a39dd90
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (re0 taskq)
rdi: fffff8036513a6f0 rsi: ffffffff8112ea80 rdx:       d61f187b6b
rcx:                1  r8: ffffffff80982ba0  r9:               10
rax:                0 rbx:             ffff rbp: fffffe015a39dd90
r10:                0 r11:                a r12:             8803
r13:             ffff r14: fffffe0110b07900 r15:                0
trap number             = 12
panic: page fault
cpuid = 13
time = 1679265285
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015a39d900
kdb_backtrace() at kdb_backtrace+0x37/frame 0xfffffe015a39d9b0
vpanic() at vpanic+0x184/frame 0xfffffe015a39da10
panic() at panic+0x43/frame 0xfffffe015a39da70
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe015a39dad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015a39db30
trap() at trap+0x262/frame 0xfffffe015a39dc50
trap_check() at trap_check+0x29/frame 0xfffffe015a39dc70
calltrap() at calltrap+0x8/frame 0xfffffe015a39dc70
--- trap 0xc, rip = 0xffffffff808c8cd0, rsp = 0xfffffe015a39dd40, rbp = 0xfffffe015a39dd90 ---
ether_input() at ether_input+0x50/frame 0xfffffe015a39dd90
re_rxeof() at re_rxeof+0x442/frame 0xfffffe015a39ddf0
re_int_task_8125() at re_int_task_8125+0x137/frame 0xfffffe015a39de30
taskqueue_run_locked() at taskqueue_run_locked+0x1c0/frame 0xfffffe015a39dec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfffffe015a39def0
fork_exit() at fork_exit+0x7d/frame 0xfffffe015a39df30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015a39df30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 3m34s
Dumping 2218 out of 32492 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1  dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:403
#2  0xffffffff807a5948 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:432
#4  kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:539
#5  0xffffffff807a5eb2 in vpanic (fmt=0xffffffff80bc4280 "%s", ap=0xfffffe015a39da50) at /usr/src/sys/kern/kern_shutdown.c:983
#6  0xffffffff807a5cb3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:907
#7  0xffffffff80b6a1ca in trap_fatal (frame=0xfffffe015a39dc80, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:954
#8  0xffffffff80b6a21f in trap_pfault (frame=frame@entry=0xfffffe015a39dc80, usermode=false, signo=signo@entry=0x0, ucode=ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:762
#9  0xffffffff80b698a2 in trap (frame=frame@entry=0xfffffe015a39dc80) at /usr/src/sys/amd64/amd64/trap.c:444
#10 0xffffffff80b6a569 in trap_check (frame=0xfffffe015a39dc80) at /usr/src/sys/amd64/amd64/trap.c:666
#11 <signal handler called>
#12 ether_input (ifp=<optimized out>, m=0xffff) at /usr/src/sys/net/if_ethersubr.c:822
#13 0xffffffff85520ea2 in re_rxeof () from /boot/modules/if_re.ko
#14 0xffffffff8550dad7 in re_int_task_8125 () from /boot/modules/if_re.ko
#15 0xffffffff80808f70 in taskqueue_run_locked (queue=queue@entry=0xfffff8001875be00) at /usr/src/sys/kern/subr_taskqueue.c:514
#16 0xffffffff8080a1b3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0158926238) at /usr/src/sys/kern/subr_taskqueue.c:826
#17 0xffffffff80761b4d in fork_exit (callout=0xffffffff8080a0f0 <taskqueue_thread_loop>, arg=0xfffffe0158926238, frame=0xfffffe015a39df40) at /usr/src/sys/kern/kern_fork.c:1102
#18 <signal handler called>
Comment 20 Henry Hu 2023-05-08 03:20:33 UTC
Now I'm not really sure it's a Realtek issue.
Today I got another panic that seems to have nothing to do with Realtek:

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 15; apic id = 3e
fault virtual address   = 0x10007
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8084c423
stack pointer           = 0x28:0xfffffe015f87dbc0
frame pointer           = 0x28:0xfffffe015f87dbc0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2956 (MainThread)
rdi: fffff8011320d940 rsi: fffff8026d21eb00 rdx:           1fffff
rcx:             ffff  r8:                0  r9:                0
rax:             ffff rbx: fffff801135e7600 rbp: fffffe015f87dbc0
r10:                0 r11: fffffe015f773540 r12: fffffe015f773020
r13: fffff8011320d780 r14:                0 r15: fffffe015f773020
trap number             = 12
panic: page fault
cpuid = 15
time = 1683507753
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015f87d970
vpanic() at vpanic+0x185/frame 0xfffffe015f87d9d0
panic() at panic+0x43/frame 0xfffffe015f87da30
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe015f87da90
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015f87daf0
calltrap() at calltrap+0x8/frame 0xfffffe015f87daf0
--- trap 0xc, rip = 0xffffffff8084c423, rsp = 0xfffffe015f87dbc0, rbp = 0xfffffe015f87dbc0 ---
sbappend_locked() at sbappend_locked+0x43/frame 0xfffffe015f87dbc0
uipc_send() at uipc_send+0x302/frame 0xfffffe015f87dc30
sosend_generic() at sosend_generic+0x605/frame 0xfffffe015f87dd00
sousrsend() at sousrsend+0x62/frame 0xfffffe015f87dd60
dofilewrite() at dofilewrite+0x88/frame 0xfffffe015f87ddb0
sys_writev() at sys_writev+0x6e/frame 0xfffffe015f87de00
amd64_syscall() at amd64_syscall+0x117/frame 0xfffffe015f87df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015f87df30
--- syscall (121, FreeBSD ELF64, writev), rip = 0x82b38b1da, rsp = 0x820d37d88, rbp = 0x820d37dc0 ---
Uptime: 6h46m32s
Dumping 2502 out of 32488 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) up
#1  doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:407
407             dump_savectx();
(kgdb)
#2  0xffffffff807aa972 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:528
528                     doadump(TRUE);
(kgdb)
#3  0xffffffff807aaea3 in vpanic (fmt=0xffffffff80bef7d1 "%s", ap=0xfffffe015f87da10) at /usr/src/sys/kern/kern_shutdown.c:972
972             kern_reboot(bootopt);
(kgdb)
#4  0xffffffff807aaca3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:896
896             vpanic(fmt, ap);
(kgdb)
#5  0xffffffff80b8331a in trap_fatal (frame=0xfffffe015f87db00, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:954
954             panic("%s", type < nitems(trap_msg) ? trap_msg[type] :
(kgdb)
#6  0xffffffff80b8336f in trap_pfault (frame=0xfffffe015f87db00, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:762
762                     if (td->td_critnest != 0 ||
(kgdb)
#7  <signal handler called>
(kgdb)
#8  sbappend_locked (sb=0xfffff8011320d940, m=m@entry=0xfffff8026d21eb00, flags=flags@entry=0) at /usr/src/sys/kern/uipc_sockbuf.c:917
917                     while (n->m_nextpkt)
(kgdb) p n
$1 = (struct mbuf *) 0xffff
(kgdb) l
912             kmsan_check_mbuf(m, "sbappend");
913             sbm_clrprotoflags(m, flags);
914             SBLASTRECORDCHK(sb);
915             n = sb->sb_mb;
916             if (n) {
917                     while (n->m_nextpkt)
918                             n = n->m_nextpkt;
919                     do {
920                             if (n->m_flags & M_EOR) {
921                                     sbappendrecord_locked(sb, m); /* XXXXXX!!!! */
(kgdb) p sb->sb_mb
$2 = (struct mbuf *) 0xfffff8010d9d6a00
(kgdb) p sb->sb_mb->m_nextpkt
$3 = (struct mbuf *) 0xffff
(kgdb) p *sb->sb_mb
$4 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0xffff, m_slistpkt = {sle_next = 0xffff}, m_stailqpkt = {stqe_next = 0xffff}}, m_data = 0xfffff8010d9d6a20 "\"", m_len = 64, m_type = 1, m_flags = 0, {{{m_pkthdr = {{snd_tag = 0xf8080100110022,
            rcvif = 0xf8080100110022, {rcvidx = 34, rcvgen = 17}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 0, flowid = 0, csum_flags = 1114146, fibnum = 0, numa_domain = 0 '\000', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000',
              l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, {PH_loc = {
              eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, {m_epg_npgs = 34 '"', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 17 '\021', m_epg_trllen = 0 '\000', m_epg_1st_off = 2049, m_epg_last_len = 248,
          m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0x0, m_epg_seqno = 1114146, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 69206042, ext_cnt = 0x800000420001a}, ext_size = 0, ext_type = 3, ext_flags = 1, {{
              ext_buf = 0xfffff8026d5f4000 "", ext_arg2 = 0x0}, {extpg_pa = {18446735288041422848, 0, 3699290600556744276, 16098143292755828101, 56311366815872},
              extpg_trail = "\001\001\b\n\242\361\212G\2102\2037\000\000\000\a\000\000\000\000\000\000\000\005\000\000\000,\000\000\000D\000\000\000t\000\000\000\221\000\000\003\230", '\000' <repeats 11 times>, "\b\000 \000\000", <incomplete sequence \347>,
              extpg_hdr = "\232\204光芒》-田中", <incomplete sequence \346\240>}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff8010d9d6a60 "\032"}}, m_dat = 0xfffff8010d9d6a20 "\""}}
(kgdb)

So, again, something has overwritten m_nextpkt with 0xffff, the same symptom we observed above. Maybe it's still the Realtek driver doing it, maybe not.
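Both crash sites in this report, ether_input() and sbappend_locked(), fault while following m_nextpkt. As a purely diagnostic sketch, the tail walk from uipc_sockbuf.c:917 could be written to stop at an implausible link instead of dereferencing it. This is not the kernel's code and not a proposed fix: silently truncating the list would lose data. It would only help a debug build report which node holds the bad link. The `struct pkt` type, the `find_tail_checked` name and the `MIN_VALID_PTR` cutoff are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet node; only the list link matters here. */
struct pkt {
	struct pkt *m_nextpkt;
};

/* Smallest value accepted as a real pointer (hypothetical cutoff). */
#define MIN_VALID_PTR	((uintptr_t)0x1000000)

/*
 * Walk to the last record the way sbappend_locked() does, but never follow
 * a link that cannot be a valid pointer. Returns the last node reached;
 * *bad is set to the node holding the bogus link, or NULL if the list is sane.
 */
static struct pkt *
find_tail_checked(struct pkt *n, struct pkt **bad)
{
	*bad = NULL;
	while (n->m_nextpkt != NULL) {
		if ((uintptr_t)n->m_nextpkt < MIN_VALID_PTR) {
			*bad = n;
			break;
		}
		n = n->m_nextpkt;
	}
	return (n);
}
```

In the comment 20 dump, the list head (sb->sb_mb) itself held m_nextpkt = 0xffff. A walk like this would therefore have flagged the very first node.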
Comment 21 Henry Hu 2023-10-22 15:50:34 UTC
I got a new machine and moved FreeBSD to it. It has an Intel I226-V NIC, which is supported by igc and works well, so this problem no longer affects me.
Comment 22 Austin Shafer 2024-01-18 23:42:58 UTC
Very late update, but I finally got around to testing v1.98, and it resolved the panic I saw here. v1.99 panics for me, but that's tracked in another bug.