Created attachment 243309: re-driver-crash

Hello all,

I'm not sure whether this should be filed against the base system or the port, given that my system needs "net/realtek-re-kmod" for a working Ethernet connection. I'm using 13.2-RELEASE with a mostly stock kernel; the only change is a DELAY(5000) that I added while testing a separate crash in my AMD Raven HDA Controller (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268393#c48). Since these are different subsystems, hopefully it isn't interfering.

I believe I've experienced this crash before, but it happened only rarely. I wanted to report it so that we can start collecting data on the problem. I've attached a crash dump as well.

re0@pci0:12:0:0: class=0x020000 rev=0x05 hdr=0x00 vendor=0x10ec device=0x8125 subvendor=0x1043 subdevice=0x87d7
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8125 2.5GbE Controller'
    class      = network
    subclass   = ethernet

/etc/rc.conf

hostname="weshly"
dumpdev="AUTO"
kld_list="amdgpu vboxdrv fusefs"
clear_tmp_enable="YES"

# Disable Mail
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"

# Networking
ifconfig_re0="DHCP"
ifconfig_re0_ipv6="inet6 accept_rtadv"

# Services
zfs_enable="YES"
dbus_enable="YES"
seatd_enable="YES"
mixer_enable="YES"
syncthing_enable="YES"
syncthing_user="jon"
syncthing_group="jon"

# NFS
nfs_client_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

When the crash happened, I had VirtualBox running in the background (although it was mostly idle). The VM was a Windows 10 guest with its network adapter set to NAT.

Thank you,
Jonathan
I forgot to post this:

root@weshly:~ # pkg info net/realtek-re-kmod
realtek-re-kmod-198.00_2
Name           : realtek-re-kmod
Version        : 198.00_2
Installed on   : Mon Jul 3 10:19:36 2023 EDT
Origin         : net/realtek-re-kmod
Architecture   : FreeBSD:13:amd64
Prefix         : /usr/local
Categories     : net kld
Licenses       : BSD4CLAUSE
Maintainer     : ale@FreeBSD.org
WWW            : https://github.com/alexdupre/rtl_bsd_drv
Comment        : Kernel driver for Realtek PCIe Ethernet Controllers
Annotations    :
    FreeBSD_version: 1302001
    repo_type      : binary
    repository     : Weshly
Flat size      : 1023KiB
Description    :
Realtek PCIe FE / GBE / 2.5G / Gaming Ethernet Family Controller kernel driver.

This is the official driver from Realtek with a few patches to improve
stability and performance. It can be loaded instead of the FreeBSD driver
built into the GENERIC kernel if you experience issues with it (eg. watchdog
timeouts), or your card is not supported.

Supported devices:
* 2.5G Gigabit Ethernet
  - RTL8125 / RTL8125B(S)(G)
* 10/100/1000M Gigabit Ethernet
  - RTL8111B / RTL8111C / RTL8111D / RTL8111E / RTL8111F / RTL8111G(S)
    RTL8111H(S) / RTL8118(A)(S) / RTL8119i / RTL8111L / RTL8111K
  - RTL8168B / RTL8168E / RTL8168H
  - RTL8111DP / RTL8111EP / RTL8111FP
  - RTL8411 / RTL8411B
* 10/100M Fast Ethernet
  - RTL8101E / RTL8102E / RTL8103E / RTL8105E / RTL8106E / RTL8107E
  - RTL8401 / RTL8402

See also: https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

WWW: https://github.com/alexdupre/rtl_bsd_drv
Hey, I think I know how to fix this; you can assign it to me if you want.
Created attachment 246965: crash on fbsd 14-release-p2

I'm uploading a new kernel crash dump from FreeBSD 14.0-RELEASE-p2, compiled from the latest releng/14.0 commit (06497fbd52e2f138b7d590c8499d9cebad182850).

I can now reproduce the error almost deterministically. I basically just need to stress the card and it will hang. Sometimes it hangs right after I log into the system and start my session. Opening "transmission-gtk" to help seed all of the AMD64 torrent links at https://wiki.freebsd.org/Torrents is enough to crash the system.

Please let me know if there is anything I can do to help debug this further.
I've set the attachment type to text/plain, and as far as I can see, it looks the same as the previous crash, so let's extract the most important part:

Fatal trap 12: page fault while in kernel mode
cpuid = 16; apic id = 10
fault virtual address   = 0x10007
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c686e0
stack pointer           = 0x28:0xfffffe015f114d30
frame pointer           = 0x28:0xfffffe015f114d80
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (re0 taskq)
rdi: 0000000000000000 rsi: 00000000040003f4 rdx: 00000000ffffffff
rcx: 0000000000000001  r8: 0000000000000403  r9: 0000000000000403
rax: 0000000000000000 rbx: 000000000000ffff rbp: fffffe015f114d80
r10: 00000000000100a4 r11: 00000000000072c3 r12: 0000000000008803
r13: 000000000000ffff r14: fffffe015ecb9c80 r15: 0000000000000000
trap number             = 12
panic: page fault
cpuid = 16
time = 1702262468
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe39c8 at calltrap+0x8
#6 0xffffffff8253edd0 at re_rxeof+0x2c0
#7 0xffffffff8252f87a at re_int_task_8125+0xba
#8 0xffffffff80ba5922 at taskqueue_run_locked+0x182
#9 0xffffffff80ba6bb2 at taskqueue_thread_loop+0xc2
#10 0xffffffff80afdb0f at fork_exit+0x7f
#11 0xffffffff80fe4a2e at fork_trampoline+0xe
Uptime: 38s
Dumping 2707 out of 65221 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57      __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b42cc7 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4319f in vpanic (fmt=0xffffffff81136b3b "%s",
    ap=ap@entry=0xfffffe015f114b80) at /usr/src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b42ff3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff8100c85c in trap_fatal (frame=0xfffffe015f114c70, eva=65543)
    at /usr/src/sys/amd64/amd64/trap.c:952
#6  0xffffffff8100c8af in trap_pfault (frame=0xfffffe015f114c70,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:760
#7  <signal handler called>
#8  ether_input (ifp=<optimized out>, m=0xffff)
    at /usr/src/sys/net/if_ethersubr.c:849
#9  0xffffffff8253edd0 in re_rxeof () from /boot/modules/if_re.ko
#10 0xffffffff8252f87a in re_int_task_8125 () from /boot/modules/if_re.ko
#11 0xffffffff80ba5922 in taskqueue_run_locked (queue=0xfffffe006ba84168,
    queue@entry=0xfffff800025df700) at /usr/src/sys/kern/subr_taskqueue.c:512
#12 0xffffffff80ba6bb2 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe006ba84238)
    at /usr/src/sys/kern/subr_taskqueue.c:824
#13 0xffffffff80afdb0f in fork_exit (
    callout=0xffffffff80ba6af0 <taskqueue_thread_loop>,
    arg=0xfffffe006ba84238, frame=0xfffffe015f114f40)
    at /usr/src/sys/kern/kern_fork.c:1160
#14 <signal handler called>
#15 0x3b1daa5d375daa59 in ?? ()
Backtrace stopped: Cannot access memory at address 0x9c2a3111906a3115
(kgdb)
Thanks, Mina. That's correct, it's mostly the same. I posted it so that we have the most recent data possible (this dump is from a 14.X base; my previous one was from a while ago on 13.X).
This is the output of 'list *[addr]' and 'backtrace' from kgdb for the dump I provided yesterday. Let me know if you need me to recompile the kernel with a debugging configuration. Perhaps that can provide further information if the current info isn't enough. Thank you!

(kgdb) list *0xffffffff80c686e0
0xffffffff80c686e0 is in ether_input (/usr/src/sys/net/if_ethersubr.c:849).
844      */
845     CURVNET_SET_QUIET(ifp->if_vnet);
846     if (__predict_false(needs_epoch))
847         NET_EPOCH_ENTER(et);
848     while (m) {
849         mn = m->m_nextpkt;
850         m->m_nextpkt = NULL;
851
852         /*
853          * We will rely on rcvif being set properly in the deferred

(kgdb) backtrace
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b42cc7 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4319f in vpanic (fmt=0xffffffff81136b3b "%s",
    ap=ap@entry=0xfffffe015f114b80) at /usr/src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b42ff3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff8100c85c in trap_fatal (frame=0xfffffe015f114c70, eva=65543)
    at /usr/src/sys/amd64/amd64/trap.c:952
#6  0xffffffff8100c8af in trap_pfault (frame=0xfffffe015f114c70,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:760
#7  <signal handler called>
#8  ether_input (ifp=<optimized out>, m=0xffff)
    at /usr/src/sys/net/if_ethersubr.c:849
#9  0xffffffff8253edd0 in re_rxeof () from /boot/modules/if_re.ko
#10 0xffffffff8252f87a in re_int_task_8125 () from /boot/modules/if_re.ko
#11 0xffffffff80ba5922 in taskqueue_run_locked (queue=0xfffffe006ba84168,
    queue@entry=0xfffff800025df700) at /usr/src/sys/kern/subr_taskqueue.c:512
#12 0xffffffff80ba6bb2 in taskqueue_thread_loop (
    arg=arg@entry=0xfffffe006ba84238)
    at /usr/src/sys/kern/subr_taskqueue.c:824
#13 0xffffffff80afdb0f in fork_exit (
    callout=0xffffffff80ba6af0 <taskqueue_thread_loop>,
    arg=0xfffffe006ba84238, frame=0xfffffe015f114f40)
    at /usr/src/sys/kern/kern_fork.c:1160
#14 <signal handler called>
#15 0x3b1daa5d375daa59 in ?? ()
Backtrace stopped: Cannot access memory at address 0x9c2a3111906a3115
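
A note on reading the numbers in this dump: the fault address 0x10007 (eva=65543) is what you would expect if ether_input dereferences m->m_nextpkt at line 849 with the bogus pointer m=0xffff, assuming m_nextpkt sits 8 bytes into struct mbuf on amd64, directly after the m_next pointer. Here is a minimal user-space sketch of that arithmetic. The struct is a hypothetical two-field stand-in, not the real struct mbuf, and its fields are fixed 8-byte integers so the layout does not depend on the build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first two fields of struct mbuf on amd64.
 * Assumption: both are 8-byte pointers and m_nextpkt follows m_next.
 */
struct fake_mbuf {
    uint64_t m_next;     /* stands in for struct mbuf *m_next */
    uint64_t m_nextpkt;  /* stands in for struct mbuf *m_nextpkt */
};

static_assert(offsetof(struct fake_mbuf, m_nextpkt) == 8,
    "sketch assumes m_nextpkt at offset 8");

/* Address the CPU would read when evaluating m->m_nextpkt. */
static uint64_t
nextpkt_read_addr(uint64_t m)
{
    return m + offsetof(struct fake_mbuf, m_nextpkt);
}
```

With m = 0xffff, nextpkt_read_addr() gives 0x10007, which matches the reported fault virtual address. That would point at the driver handing ether_input a garbage mbuf pointer rather than at a bug in ether_input itself.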
I had the same problem about a year ago; unfortunately, I have forgotten what workaround I used.

Anyhow, when I look at the source of /usr/ports/net/realtek-re-kmod/work/rtl_bsd_drv-d3a7a3d/if_re.c where the problem seems to happen:

7103 #if OS_VER < VERSION(4,9)
7104                 /* Remove header from mbuf and pass it on. */
7105                 m_adj(m, sizeof(struct ether_header));
7106                 ether_input(ifp, eh, m);
7107 #else
7108                 (*ifp->if_input)(ifp, m);
7109 #endif
7110                 RE_LOCK(sc);

From the crash dump stacks, it seems to end up in line 7106, which is between
#if OS_VER < VERSION(4,9)
and
#else

See also comment #4:
#8 ether_input (ifp=<optimized out>, m=0xffff)
    at /usr/src/sys/net/if_ethersubr.c:849

That seems wrong to me for FreeBSD 13. See if_rereg.h:
#define OS_VER __FreeBSD_version
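
To check which branch of that #if is compiled, you can evaluate the comparison by hand. The sketch below assumes VERSION(major, minor) encodes a version the same way old __FreeBSD_version values did (major * 100000 + minor * 10000); the driver's actual macro in if_rereg.h may differ, so treat the encoding and the helper name as hypothetical. The value 1302001 is the FreeBSD_version from the pkg info output earlier in this report.

```c
#include <assert.h>

/*
 * Assumed encoding, modeled on old __FreeBSD_version values.
 * The real VERSION() macro in if_rereg.h may be defined differently.
 */
#define VERSION(major, minor) ((major) * 100000 + (minor) * 10000)

static_assert(VERSION(4, 9) == 490000, "sketch assumes 4.9 encodes as 490000");

/*
 * Returns 1 if "#if OS_VER < VERSION(4,9)" would select the legacy
 * three-argument ether_input() call, 0 if the #else branch is used.
 */
static int
uses_legacy_ether_input(long os_ver)
{
    return os_ver < VERSION(4, 9);
}
```

Under that assumption, uses_legacy_ether_input(1302001) is 0, so on 13.2 the #else branch at line 7108 would be the one compiled, not line 7106.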
(In reply to Tino Engel from comment #7)
> From the crash dump stacks, it seems to end up in line 7106, which is between
> #if OS_VER < VERSION(4,9)
> and
> #else
> see also Comment #4:
> #8 ether_input (ifp=<optimized out>, m=0xffff)
> at /usr/src/sys/net/if_ethersubr.c:849
> That seems wrong to me for FreeBSD 13. See if_rereg.h:
> #define OS_VER __FreeBSD_version

See /usr/src/sys/net/if_ethersubr.c:

```
void
ether_ifattach(struct ifnet *ifp, const u_int8_t *lla)
{
	...
	ifp->if_input = ether_input;
	...
}
```
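
In other words, because ether_ifattach() stores ether_input in ifp->if_input, the indirect call at line 7108 also lands in ether_input, so seeing ether_input in the backtrace does not by itself mean line 7106 was compiled in. Here is a small user-space sketch of that dispatch. All fake_* names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

struct fake_mbuf;  /* opaque, hypothetical stand-in for struct mbuf */

/* Hypothetical stand-in for the part of struct ifnet that matters here. */
struct fake_ifnet {
    void (*if_input)(struct fake_ifnet *, struct fake_mbuf *);
    int ether_input_calls;  /* bookkeeping for the sketch only */
};

/* Stand-in for ether_input(): just records that it was reached. */
static void
fake_ether_input(struct fake_ifnet *ifp, struct fake_mbuf *m)
{
    (void)m;
    ifp->ether_input_calls++;
}

/* Mirrors the assignment quoted from ether_ifattach(). */
static void
fake_ether_ifattach(struct fake_ifnet *ifp)
{
    ifp->ether_input_calls = 0;
    ifp->if_input = fake_ether_input;
}

/* Mirrors the driver's "#else" branch: (*ifp->if_input)(ifp, m); */
static void
fake_driver_rx(struct fake_ifnet *ifp, struct fake_mbuf *m)
{
    assert(ifp->if_input != NULL);
    (*ifp->if_input)(ifp, m);
}
```

After fake_ether_ifattach(), a call to fake_driver_rx() increments ether_input_calls, showing that the indirect call reaches the same function a direct call would.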
I've installed a PCIe Intel(R) X520-1 82599EN SFP+ 10GbE NIC (and a TP-Link transceiver) in the server. It will replace the Realtek onboard NIC (and thus avoid this issue). The onboard NIC is still available, so if anyone wants me to try patches, let me know. I'll be happy to send traffic through it.
(In reply to Jonathan Vasquez from comment #9) Hi Jonathan Vasquez, I’m experiencing a similar problem with the if_re module crashing on my FreeBSD system when I load the module. I saw your comment and bug report about the Realtek onboard NIC issue. Would it be possible for you to share the patch you mentioned? I would greatly appreciate any guidance on how to apply and test it. Thanks for your help! Best regards, Mithun
Hey Mithun, I don't have a patch for this. I installed an Intel NIC on a spare PCIe slot that I have and it's been rock solid and fast now. So I'm no longer using Realtek for my networking.
(In reply to Jonathan Vasquez from comment #11)

Hi Jonathan,

Thanks for your response! Unfortunately, I don't have a spare slot to switch NICs, so I'll continue looking for a solution with the if_re module.