| Summary: | net/realtek-re-kmod: frequent kernel panic | | |
|---|---|---|---|
| Product: | Ports & Packages | Reporter: | Henry Hu <henry.hu.sh> |
| Component: | Individual Port(s) | Assignee: | Alex Dupre <ale> |
| Status: | Closed Overcome By Events | | |
| Severity: | Affects Only Me | CC: | ashafer, grahamperrin, jbolla, kib |
| Priority: | --- | Keywords: | crash |
| Version: | Latest | Flags: | bugzilla: maintainer-feedback? (ale); ale: maintainer-feedback? (kib) |
| Hardware: | Any | | |
| OS: | Any | | |
| URL: | https://www.freshports.org/net/realtek-re-kmod/ | | |
|
Description
Henry Hu
2022-11-11 01:22:59 UTC
okay, it just crashed again and I got a better trace:
(kgdb) where
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1 dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff807a59d8 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:434
#4 kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:541
#5 0xffffffff807a5ef2 in vpanic (fmt=0xffffffff80bbdc7d "%s", ap=0xfffffe014db56a50) at /usr/src/sys/kern/kern_shutdown.c:979
#6 0xffffffff807a5cf3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:903
#7 0xffffffff80b63f9a in trap_fatal (frame=0xfffffe014db56c80, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:955
#8 0xffffffff80b63fef in trap_pfault (frame=frame@entry=0xfffffe014db56c80, usermode=false, signo=signo@entry=0x0, ucode=ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:763
#9 0xffffffff80b63672 in trap (frame=frame@entry=0xfffffe014db56c80) at /usr/src/sys/amd64/amd64/trap.c:445
#10 0xffffffff80b64339 in trap_check (frame=0xfffffe014db56c80) at /usr/src/sys/amd64/amd64/trap.c:667
#11 <signal handler called>
#12 ether_input (ifp=<optimized out>, m=0xffff) at /usr/src/sys/net/if_ethersubr.c:822
#13 0xffffffff85387ea2 in re_rxeof (sc=0xfffffe015aaf0000) at if_re.c:7089
#14 0xffffffff85374ad7 in re_int_task_8125 (arg=0xfffffe015aaf0000, npending=1) at if_re.c:7315
#15 0xffffffff80808d50 in taskqueue_run_locked (queue=queue@entry=0xfffff8012b3f4600) at /usr/src/sys/kern/subr_taskqueue.c:514
#16 0xffffffff80809f93 in taskqueue_thread_loop (arg=arg@entry=0xfffffe015aaf2238) at /usr/src/sys/kern/subr_taskqueue.c:826
#17 0xffffffff80761ccd in fork_exit (callout=0xffffffff80809ed0 <taskqueue_thread_loop>, arg=0xfffffe015aaf2238, frame=0xfffffe014db56f40) at /usr/src/sys/kern/kern_fork.c:1102
#18 <signal handler called>
#19 0x000032d58f8d6eea in ?? ()
Backtrace stopped: Cannot access memory at address 0x32d5a7eb8f48
(kgdb) frame 13
#13 0xffffffff85387ea2 in re_rxeof (sc=0xfffffe015aaf0000) at if_re.c:7089
7089 RE_UNLOCK(sc);
(kgdb) p *m
$12 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffffe019896400e "E", m_len = 216, m_type = 1, m_flags = 19, {{{m_pkthdr = {{snd_tag = 0xfffff8013dcef800, rcvif = 0xfffff8013dcef800,
{rcvidx = 63488, rcvgen = 15822}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 216, flowid = 0, csum_flags = 251658240, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000',
l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\377\377\000", sixteen = {0, 0, 65535, 0}, thirtytwo = {0, 65535}, sixtyfour = {281470681743360}, unintptr = {281470681743360},
ptr = 0xffff00000000}, {PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 248 '\370', m_epg_hdrlen = 206 '\316', m_epg_trllen = 61 '=',
m_epg_1st_off = 63489, m_epg_last_len = 65535, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0xd8, m_epg_seqno = 71776119312875520, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 1, ext_cnt = 0x38fff2eb00000001},
ext_size = 9216, ext_type = 4, ext_flags = 1, {{ext_buf = 0xfffffe0198964000 "\377\377\377\377\377\377,\364\062%\032\v\b", ext_arg2 = 0x0}, {extpg_pa = {18446741881541246976, 0, 14418586273075800256, 8565010310643185937, 149001056032896},
extpg_trail = "\001\001\b\n\302\353\301\270\244ˡ\330status\n*\177\373\031\006%", '\000' <repeats 15 times>, "(", '\000' <repeats 22 times>, extpg_hdr = "\000\000\000\000\000\000\000\000\071\002;\224\256踸f\000\020/\253s\273"}}, ext_free = 0x0, ext_arg1 = 0x0},
m_pktdat = 0xfffff8013db9f660 "\001"}}, m_dat = 0xfffff8013db9f620 ""}}
The mbuf seems to be reasonable at callsite...
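One detail in the trace above can be checked by hand: frame #7 reports eva=65543 (0x10007), while frame #12 shows m=0xffff. That is consistent with a read of m->m_nextpkt through the bogus pointer. A minimal sketch of the arithmetic, assuming an LP64 layout in which m_nextpkt is the second pointer-sized field of the mbuf header (as the field order in the kgdb dump suggests); `mbuf_head` and `nextpkt_read_address` are hypothetical stand-ins, not kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first two fields of struct mbuf on LP64. */
struct mbuf_head {
	uint64_t m_next;    /* pointer-sized, offset 0 */
	uint64_t m_nextpkt; /* pointer-sized, offset 8 */
};

/* The sketch only holds if m_nextpkt really sits at byte offset 8. */
static_assert(offsetof(struct mbuf_head, m_nextpkt) == 8,
    "LP64 layout assumed");

/* Address the CPU touches when it reads m->m_nextpkt through pointer m. */
static uint64_t
nextpkt_read_address(uint64_t m)
{
	return (m + offsetof(struct mbuf_head, m_nextpkt));
}
```

Calling `nextpkt_read_address(0xffff)` gives 0x10007, the fault address in the trace, which suggests the 0xffff value was itself followed as a packet link.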
What is the exact content of the kernel panic messages? Provide a consistent panic message + kgdb backtrace, i.e. show them both from the same panic. Also, please show me the driver code around if_re.c:7089 in your sources.

Here is the code: https://github.com/alexdupre/rtl_bsd_drv
Both rows (7089 and 7315) are plain RE_UNLOCK(sc)

(In reply to Alex Dupre from comment #4)
It cannot be that line, there is no call to either ether_input() or ifp->if_input() right before it. This is why I asked.

True, it looks like there is a 4-5 line offset, but that means it's different code.

Message (from a brand new panic):
Unread portion of the kernel message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 3a
fault virtual address = 0x10007
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808c6e70
stack pointer = 0x28:0xfffffe014dd8ad40
frame pointer = 0x28:0xfffffe014dd8ad90
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (re0 taskq)
rdi: fffffe015e1fa090 rsi: 0 rdx: 0
rcx: 1 r8: fffffe015e1f9cd8 r9: 0
rax: 0 rbx: ffff rbp: fffffe014dd8ad90
r10: 0 r11: 6 r12: 8803
r13: ffff r14: fffffe014f7ecac0 r15: 0
trap number = 12
panic: page fault
cpuid = 13
time = 1668735340
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014dd8a900
kdb_backtrace() at kdb_backtrace+0x37/frame 0xfffffe014dd8a9b0
vpanic() at vpanic+0x184/frame 0xfffffe014dd8aa10
panic() at panic+0x43/frame 0xfffffe014dd8aa70
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe014dd8aad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014dd8ab30
trap() at trap+0x262/frame 0xfffffe014dd8ac50
trap_check() at trap_check+0x29/frame 0xfffffe014dd8ac70
calltrap() at calltrap+0x8/frame 0xfffffe014dd8ac70
--- trap 0xc, rip = 0xffffffff808c6e70, rsp = 0xfffffe014dd8ad40, rbp = 0xfffffe014dd8ad90 ---
ether_input() at ether_input+0x50/frame 0xfffffe014dd8ad90
re_rxeof() at re_rxeof+0x442/frame 0xfffffe014dd8adf0
re_int_task_8125() at re_int_task_8125+0x137/frame 0xfffffe014dd8ae30
taskqueue_run_locked() at taskqueue_run_locked+0x1c0/frame 0xfffffe014dd8aec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfffffe014dd8aef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe014dd8af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014dd8af30
--- trap 0xc, rip = 0x32e2e6377eea, rsp = 0x32e2f5ae5f48, rbp = 0x32e2f5ae5f60 ---
Uptime: 3m32s
Dumping 2165 out of 32500 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

Regarding the code: I've not modified the code. The debug info may not be very accurate, but I think it's actually the ifp->if_input call at line 7097. I added
if (m->m_nextpkt && (int64_t)m->m_nextpkt < 0x1000000) {
printf("Wrong offset! m->m_nextpkt=%p\n", m->m_nextpkt);
m->m_nextpkt = 0;
}
before the if_input call, and it really captured an error:
Wrong offset! m->m_nextpkt=0xffff
Which means that somehow m->m_nextpkt was 0xffff, before the mbuf is sent to ether_input.
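The ad-hoc check above can be generalized a little. A minimal userspace sketch, assuming amd64, where kernel pointers live in the upper canonical half (at or above 0xffff800000000000); the names `KVA_FLOOR`, `link_is_suspect` and `link_selfcheck` are illustrative assumptions and not part of the driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed lower bound of the amd64 upper canonical half. */
#define KVA_FLOOR UINT64_C(0xffff800000000000)

/*
 * Return true when a link value is neither NULL nor a plausible
 * kernel virtual address, i.e. it is probably corrupted.
 */
static bool
link_is_suspect(uint64_t link)
{
	return (link != 0 && link < KVA_FLOOR);
}

/* Exercise the helper with values taken from the kgdb output in this report. */
static void
link_selfcheck(void)
{
	assert(link_is_suspect(UINT64_C(0xffff)));              /* the bad m_nextpkt */
	assert(!link_is_suspect(0));                            /* NULL ends a chain */
	assert(!link_is_suspect(UINT64_C(0xfffff8013dcef800))); /* rcvif pointer */
}
```

Compared with a fixed small threshold, a range test like this also flags garbage values that happen to be large but still fall outside kernel address space.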
I thought 0xffff might come from the checksum code just before the if_input call, but I was wrong. I've added a few debug statements (as early as just after line 7046), and m->m_nextpkt is already 0xffff there. Given that m is initialized at line 7040, the corruption might happen earlier than that. I'll add more debug statements.

Created attachment 239310
realtek-re-kmod v1.97 panics
I've seen something similar to this, although not the same. I've attached a couple different core.txt.* files for reference. Version 1.97 of realtek-re-kmod is unstable.
Initially I didn't have any problems with v1.97; it ran fine for a couple of weeks straight. Then I went to open a folder in Thunderbird and my system hung. I found that odd, so I restarted, and that's when my machine started hanging every 5 minutes (give or take). The panics were constant: even if I just let the machine reboot and didn't touch anything, it would still hang.
Two main panic reasons:
"panic: Memory modified after free 0xfffff8018fcd0a00(256) val=deadf5ee @ 0xfffff8018fcd0a08"
"Fatal trap 12: page fault while in kernel mode" in ether_input()
As hinted by the first panic and the panic originally mentioned in this bug, I think there is a use-after-free hanging around somewhere. Most likely it involves something allocated by UMA, since that looks like the part of the code that fails validation and sees the deadf5ee value.
The good news is that realtek-re-kmod version 196.04 works fine, so anyone else running into this can just roll back to that version like I did.
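The "Memory modified after free" panic quoted above is the kind of report a poison-on-free debugging allocator produces. A minimal userspace sketch of that idea, assuming a hypothetical poison word and helper names (this is not UMA's code): freed memory is filled with a known pattern, and the pattern is verified before the memory is handed out again, so a write through a stale pointer is caught at the next allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON UINT32_C(0xdeadc0de) /* hypothetical poison word */

/* Fill a freed item with the poison pattern. */
static void
poison_on_free(uint32_t *item, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		item[i] = POISON;
}

/*
 * Verify the pattern before reuse. Returns the index of the first
 * modified word, or -1 if the item is intact.
 */
static long
check_on_alloc(const uint32_t *item, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		if (item[i] != POISON)
			return ((long)i);
	return (-1);
}

/* Simulate a use-after-free write and report where it is detected. */
static long
demo_use_after_free(void)
{
	uint32_t item[64]; /* 256 bytes, the item size in the panic message */

	poison_on_free(item, 64);
	assert(check_on_alloc(item, 64) == -1); /* untouched item passes */
	item[2] = UINT32_C(0xdeadf5ee);         /* stale write at byte offset 8 */
	return (check_on_alloc(item, 64));
}
```

In this sketch the stale write lands at word index 2, i.e. byte offset 8, mirroring the panic message, where the modified address (0x...0a08) is 8 bytes past the start of the item (0x...0a00).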
I've just pushed version v1.98. I don't think there is anything in it that will fix your issue, but can you give it a try? If it continues to panic, can you also try changing the port to use the commit hash 2c11277, which is almost the stock driver, and tell me if it still has the issue? Thanks.

Thanks, I'll give it a try when I get the chance, although the computer I reproduce the issue on is my daily driver for work, so I'm not always able to experiment on it.

Tested with 2c11277 and still saw the crashes. I didn't test with 1.98 though. The exact 196.04 port hash that is "known good" for me is:
commit ad28aec4d627e45be486e052331eabbdf922cff7 (HEAD)
Author: John Baldwin <jhb@FreeBSD.org>
Date: Fri May 20 10:12:21 2022 -0700
net/realtek-re-kmod: Remove unused DRIVER_MODULE devclass on recent main.
Reviewed by: ale (maintainer)
Differential Revision: https://reviews.freebsd.org/D35203

And my system is:
FreeBSD mick 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n259955-45396fda8b73: Fri Jan 6 11:14:38 EST 2023 root@mick:/usr/obj/usr/freebsd-src/amd64.amd64/sys/GENERIC amd64

I did actually set the version to 196.04 when my ports tree was at 1.97, and saw the crashes. So that is interesting. I unfortunately didn't record the port commit I was on when I did that; my best guess is commit cd6de38cc36145f3d222623da1bc61060d1cdf16. I don't have time to validate that today.

From peeking at it in kgdb, it looks like corruption in something allocated out of zone_mbuf in UMA:
#9 0xffffffff80bec683 in m_get (how=2, type=1) at /usr/freebsd-src/sys/sys/mbuf.h:1000
1000 m = uma_zalloc_arg(zone_mbuf, &args, how);
(kgdb) l
995 struct mbuf *m;
996 struct mb_args args;
997
998 args.flags = 0;
999 args.type = type;
1000 m = uma_zalloc_arg(zone_mbuf, &args, how);
1001 MBUF_PROBE3(m__get, how, type, m);
1002 return (m);
1003 }
The validation that fails and panics is kicked off by this uma_zalloc_arg call, I think.
I do still think this is something going wrong on if_re's end, since I've tried with 13.1-STABLE and other CURRENT versions, but only realtek re v196.04 actually solves the panics.

Setting the PORTVERSION doesn't actually change anything; what's important is the commit hash you are using. If you tried 2c11277, then you were already using the 1.98 version (whatever you set in the port's Makefile), and you should have seen it in the kernel message at startup. If this is confirmed, it means that the issue is in the stock kernel driver and not in our patches. @kib: any idea?

Apparently a similar issue is also present in the 1.96 release, according to https://www.truenas.com/community/threads/new-truenas-core-build-crashing-and-rebooting.106862/
Perhaps later versions trigger it more frequently under certain conditions.

I think it's something introduced between commit ad28aec4d627e45be486e052331eabbdf922cff7 and when the version was bumped to 1.97. Like I said, I reproduced it with 1.96 too in that range. I'll see if I can get around to bisecting it eventually.

Created attachment 239499
196.04_3 binary that also crashed
I'm uploading this binary that showed a similar crash in case that helps determine versions with the behavior. Here are some example call stacks:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100045 td 0xfffffe00e09aa740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00e06da650
vpanic() at vpanic+0x1b0/frame 0xfffffe00e06da6a0
panic() at panic+0x43/frame 0xfffffe00e06da700
trap_fatal() at trap_fatal+0x385/frame 0xfffffe00e06da760
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e06da7c0
calltrap() at calltrap+0x8/frame 0xfffffe00e06da7c0
--- trap 0xc, rip = 0xffffffff80bbb558, rsp = 0xfffffe00e06da890, rbp = 0xfffffe00e06da8d0 ---
sbcut_internal() at sbcut_internal+0xa8/frame 0xfffffe00e06da8d0
tcp_do_segment() at tcp_do_segment+0x18c8/frame 0xfffffe00e06da9b0
tcp_input_with_port() at tcp_input_with_port+0xb61/frame 0xfffffe00e06daae0
tcp_input() at tcp_input+0xb/frame 0xfffffe00e06daaf0
ip_input() at ip_input+0x11f/frame 0xfffffe00e06dab80
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e06dabd0
ether_demux() at ether_demux+0x138/frame 0xfffffe00e06dac00
ether_nh_input() at ether_nh_input+0x355/frame 0xfffffe00e06dac60
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00e06dacb0
ether_input() at ether_input+0x69/frame 0xfffffe00e06dad10
re_rxeof() at re_rxeof+0x2ad/frame 0xfffffe00e06dad80
re_int_task_8125() at re_int_task_8125+0xb4/frame 0xfffffe00e06dadc0
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e06dae40
taskqueue_run() at taskqueue_run+0x68/frame 0xfffffe00e06dae60
ithread_loop() at ithread_loop+0x25a/frame 0xfffffe00e06daef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00e06daf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e06daf30
--- trap 0x80b7b350, rip = 0xffffffff80aa32cf, rsp = 0, rbp = 0x3408000 ---
mi_startup() at mi_startup+0xdf/frame 0x3408000
db:0:kdb.enter.default> show allpcpu
Current CPU: 4
...
curthread = 0xfffffe00e09aa740: pid 12 tid 100045 critnest 1 "swi5: fast taskq"
------------------------------------------------------
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100206 td 0xfffffe010617c560
kdb_enter() at kdb_enter+0x37/frame 0xfffffe01064edab0
vpanic() at vpanic+0x1b0/frame 0xfffffe01064edb00
panic() at panic+0x43/frame 0xfffffe01064edb60
trap_fatal() at trap_fatal+0x385/frame 0xfffffe01064edbc0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe01064edc20
calltrap() at calltrap+0x8/frame 0xfffffe01064edc20
--- trap 0xc, rip = 0xffffffff80c55c6c, rsp = 0xfffffe01064edcf0, rbp = 0xfffffe01064edd40 ---
ether_nh_input() at ether_nh_input+0x1c/frame 0xfffffe01064edd40
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe01064edd90
ether_input() at ether_input+0x69/frame 0xfffffe01064eddf0
epair_tx_start_deferred() at epair_tx_start_deferred+0x177/frame 0xfffffe01064ede40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe01064edec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe01064edef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe01064edf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01064edf30
--- trap 0x3407000, rip = 0xffffffff80aa32cf, rsp = 0, rbp = 0x3407000 ---
mi_startup() at mi_startup+0xdf/frame 0x3407000
db:0:kdb.enter.default> show allpcpu
Current CPU: 0
...
curthread = 0xfffffe010617c560: pid 0 tid 100206 critnest 1 "epair_task"
(In reply to Austin Shafer from comment #16)
There isn't any other code commit between those two versions; commit ad28aec4d627e45be486e052331eabbdf922cff7 is the last one for the 1.96 release.

Just upgraded to latest 14-CURRENT (1400083) and latest driver (1.98), and it just crashed in the same way:
> sudo kgdb -n 4
...
Reading symbols from /usr/obj/usr/src/amd64.amd64/sys/MYKERNEL/kernel.full...
Unread portion of the kernel message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 3a
fault virtual address = 0x10007
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808c8cd0
stack pointer = 0x28:0xfffffe015a39dd40
frame pointer = 0x28:0xfffffe015a39dd90
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (re0 taskq)
rdi: fffff8036513a6f0 rsi: ffffffff8112ea80 rdx: d61f187b6b
rcx: 1 r8: ffffffff80982ba0 r9: 10
rax: 0 rbx: ffff rbp: fffffe015a39dd90
r10: 0 r11: a r12: 8803
r13: ffff r14: fffffe0110b07900 r15: 0
trap number = 12
panic: page fault
cpuid = 13
time = 1679265285
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015a39d900
kdb_backtrace() at kdb_backtrace+0x37/frame 0xfffffe015a39d9b0
vpanic() at vpanic+0x184/frame 0xfffffe015a39da10
panic() at panic+0x43/frame 0xfffffe015a39da70
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe015a39dad0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015a39db30
trap() at trap+0x262/frame 0xfffffe015a39dc50
trap_check() at trap_check+0x29/frame 0xfffffe015a39dc70
calltrap() at calltrap+0x8/frame 0xfffffe015a39dc70
--- trap 0xc, rip = 0xffffffff808c8cd0, rsp = 0xfffffe015a39dd40, rbp = 0xfffffe015a39dd90 ---
ether_input() at ether_input+0x50/frame 0xfffffe015a39dd90
re_rxeof() at re_rxeof+0x442/frame 0xfffffe015a39ddf0
re_int_task_8125() at re_int_task_8125+0x137/frame 0xfffffe015a39de30
taskqueue_run_locked() at taskqueue_run_locked+0x1c0/frame 0xfffffe015a39dec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfffffe015a39def0
fork_exit() at fork_exit+0x7d/frame 0xfffffe015a39df30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015a39df30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 3m34s
Dumping 2218 out of 32492 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) where
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
#1 dump_savectx () at /usr/src/sys/kern/kern_shutdown.c:403
#2 0xffffffff807a5948 in dumpsys (di=0x0) at /usr/src/sys/x86/include/dump.h:87
#3 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:432
#4 kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:539
#5 0xffffffff807a5eb2 in vpanic (fmt=0xffffffff80bc4280 "%s", ap=0xfffffe015a39da50) at /usr/src/sys/kern/kern_shutdown.c:983
#6 0xffffffff807a5cb3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:907
#7 0xffffffff80b6a1ca in trap_fatal (frame=0xfffffe015a39dc80, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:954
#8 0xffffffff80b6a21f in trap_pfault (frame=frame@entry=0xfffffe015a39dc80, usermode=false, signo=signo@entry=0x0, ucode=ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:762
#9 0xffffffff80b698a2 in trap (frame=frame@entry=0xfffffe015a39dc80) at /usr/src/sys/amd64/amd64/trap.c:444
#10 0xffffffff80b6a569 in trap_check (frame=0xfffffe015a39dc80) at /usr/src/sys/amd64/amd64/trap.c:666
#11 <signal handler called>
#12 ether_input (ifp=<optimized out>, m=0xffff) at /usr/src/sys/net/if_ethersubr.c:822
#13 0xffffffff85520ea2 in re_rxeof () from /boot/modules/if_re.ko
#14 0xffffffff8550dad7 in re_int_task_8125 () from /boot/modules/if_re.ko
#15 0xffffffff80808f70 in taskqueue_run_locked (queue=queue@entry=0xfffff8001875be00) at /usr/src/sys/kern/subr_taskqueue.c:514
#16 0xffffffff8080a1b3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe0158926238) at /usr/src/sys/kern/subr_taskqueue.c:826
#17 0xffffffff80761b4d in fork_exit (callout=0xffffffff8080a0f0 <taskqueue_thread_loop>, arg=0xfffffe0158926238, frame=0xfffffe015a39df40) at /usr/src/sys/kern/kern_fork.c:1102
#18 <signal handler called>

Now, I'm not really sure if it's a realtek issue. Today I've got another panic which, it seems, has nothing to do with realtek:
Unread portion of the kernel message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 15; apic id = 3e
fault virtual address = 0x10007
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8084c423
stack pointer = 0x28:0xfffffe015f87dbc0
frame pointer = 0x28:0xfffffe015f87dbc0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2956 (MainThread)
rdi: fffff8011320d940 rsi: fffff8026d21eb00 rdx: 1fffff
rcx: ffff r8: 0 r9: 0
rax: ffff rbx: fffff801135e7600 rbp: fffffe015f87dbc0
r10: 0 r11: fffffe015f773540 r12: fffffe015f773020
r13: fffff8011320d780 r14: 0 r15: fffffe015f773020
trap number = 12
panic: page fault
cpuid = 15
time = 1683507753
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015f87d970
vpanic() at vpanic+0x185/frame 0xfffffe015f87d9d0
panic() at panic+0x43/frame 0xfffffe015f87da30
trap_fatal() at trap_fatal+0x3fa/frame 0xfffffe015f87da90
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe015f87daf0
calltrap() at calltrap+0x8/frame 0xfffffe015f87daf0
--- trap 0xc, rip = 0xffffffff8084c423, rsp = 0xfffffe015f87dbc0, rbp = 0xfffffe015f87dbc0 ---
sbappend_locked() at sbappend_locked+0x43/frame 0xfffffe015f87dbc0
uipc_send() at uipc_send+0x302/frame 0xfffffe015f87dc30
sosend_generic() at sosend_generic+0x605/frame 0xfffffe015f87dd00
sousrsend() at sousrsend+0x62/frame 0xfffffe015f87dd60
dofilewrite() at dofilewrite+0x88/frame 0xfffffe015f87ddb0
sys_writev() at sys_writev+0x6e/frame 0xfffffe015f87de00
amd64_syscall() at amd64_syscall+0x117/frame 0xfffffe015f87df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015f87df30
--- syscall (121, FreeBSD ELF64, writev), rip = 0x82b38b1da, rsp = 0x820d37d88, rbp = 0x820d37dc0 ---
Uptime: 6h46m32s
Dumping 2502 out of 32488 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
59 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) up
#1 doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:407
407 dump_savectx();
(kgdb)
#2 0xffffffff807aa972 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:528
528 doadump(TRUE);
(kgdb)
#3 0xffffffff807aaea3 in vpanic (fmt=0xffffffff80bef7d1 "%s", ap=0xfffffe015f87da10) at /usr/src/sys/kern/kern_shutdown.c:972
972 kern_reboot(bootopt);
(kgdb)
#4 0xffffffff807aaca3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:896
896 vpanic(fmt, ap);
(kgdb)
#5 0xffffffff80b8331a in trap_fatal (frame=0xfffffe015f87db00, eva=65543) at /usr/src/sys/amd64/amd64/trap.c:954
954 panic("%s", type < nitems(trap_msg) ? trap_msg[type] :
(kgdb)
#6 0xffffffff80b8336f in trap_pfault (frame=0xfffffe015f87db00, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:762
762 if (td->td_critnest != 0 ||
(kgdb)
#7 <signal handler called>
(kgdb)
#8 sbappend_locked (sb=0xfffff8011320d940, m=m@entry=0xfffff8026d21eb00, flags=flags@entry=0) at /usr/src/sys/kern/uipc_sockbuf.c:917
917 while (n->m_nextpkt)
(kgdb) p n
$1 = (struct mbuf *) 0xffff
(kgdb) l
912 kmsan_check_mbuf(m, "sbappend");
913 sbm_clrprotoflags(m, flags);
914 SBLASTRECORDCHK(sb);
915 n = sb->sb_mb;
916 if (n) {
917 while (n->m_nextpkt)
918 n = n->m_nextpkt;
919 do {
920 if (n->m_flags & M_EOR) {
921 sbappendrecord_locked(sb, m); /* XXXXXX!!!! */
(kgdb) p sb->sb_mb
$2 = (struct mbuf *) 0xfffff8010d9d6a00
(kgdb) p sb->sb_mb->m_nextpkt
$3 = (struct mbuf *) 0xffff
(kgdb) p *sb->sb_mb
$4 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0xffff, m_slistpkt = {sle_next = 0xffff}, m_stailqpkt = {stqe_next = 0xffff}}, m_data = 0xfffff8010d9d6a20 "\"", m_len = 64, m_type = 1, m_flags = 0, {{{m_pkthdr = {{snd_tag = 0xf8080100110022, rcvif = 0xf8080100110022, {rcvidx = 34, rcvgen = 17}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 0, flowid = 0, csum_flags = 1114146, fibnum = 0, numa_domain = 0 '\000', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, {PH_loc = { eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, {m_epg_npgs = 34 '"', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 17 '\021', m_epg_trllen = 0 '\000', m_epg_1st_off = 2049, m_epg_last_len = 248, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0x0, m_epg_seqno = 1114146, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 69206042, ext_cnt = 0x800000420001a}, ext_size = 0, ext_type = 3, ext_flags = 1, {{ ext_buf = 0xfffff8026d5f4000 "", ext_arg2 = 0x0}, {extpg_pa = {18446735288041422848, 0, 3699290600556744276, 16098143292755828101, 56311366815872}, extpg_trail = "\001\001\b\n\242\361\212G\2102\2037\000\000\000\a\000\000\000\000\000\000\000\005\000\000\000,\000\000\000D\000\000\000t\000\000\000\221\000\000\003\230", '\000' <repeats 11 times>, "\b\000 \000\000", <incomplete sequence \347>, extpg_hdr =
"\232\204光芒》-田中", <incomplete sequence \346\240>}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff8010d9d6a60 "\032"}}, m_dat = 0xfffff8010d9d6a20 "\""}}
(kgdb)
So, again, something has overwritten m->m_nextpkt with 0xffff, the same symptom as what we've observed above. Maybe it's still done by the realtek driver, maybe not.

I've got a new machine and moved FreeBSD to it. It has an Intel I226-V NIC (supported by igc and works well), so this problem no longer bothers me.

Very late update, but I finally got around to testing v1.98 and it resolved the panic I saw here. v1.99 panics for me, but that's tracked by another bug. |