Bug 252485 - [re] [panic] Panic in if_re.c (Realtek RTL8111/8168/8411)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2021-01-07 07:29 UTC by Maciej Suszko
Modified: 2021-01-09 00:32 UTC

Description Maciej Suszko 2021-01-07 07:29:27 UTC
The panic occurs quite often, sometimes after 30 minutes, sometimes after a few hours. In a few days of running 12.2-RELEASE on this motherboard, the system has never reached 24 hours of uptime.
The motherboard is a Gigabyte GA-J3455N-D3H with 16GB of RAM. On the previous hardware (Intel D2500CC), which used if_em, the system was rock solid.

I tried a few combinations of loader tunables. With the default configuration, the panic seems to occur quickly under higher transfer rates. At the moment I have these set:

hw.re.msix_disable="1"
hw.re.prefer_iomap="1"

Here is some information from the most recent core.txt:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer     = 0x20:0xffffffff8071b2a2
stack pointer           = 0x28:0xfffffe000047d8d0
frame pointer           = 0x28:0xfffffe000047d920
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq266: re0)
trap number             = 9
panic: general protection fault
cpuid = 1
time = 1609991554
KDB: stack backtrace:
#0 0xffffffff80659775 at kdb_backtrace+0x65
#1 0xffffffff8060da2b at vpanic+0x17b
#2 0xffffffff8060d8a3 at panic+0x43
#3 0xffffffff8095e0d1 at trap_fatal+0x391
#4 0xffffffff8095d557 at trap+0x67
#5 0xffffffff809376a8 at calltrap+0x8
#6 0xffffffff8073755a at netisr_dispatch_src+0xca
#7 0xffffffff8071a6eb at ether_input+0x4b
#8 0xffffffff81740349 at re_rxeof+0x5f9
#9 0xffffffff8173db0f at re_intr_msi+0xef
#10 0xffffffff805d24ac at ithread_loop+0x23c
#11 0xffffffff805cf35e at fork_exit+0x7e
#12 0xffffffff809386de at fork_trampoline+0xe
Uptime: 3h27m16s
Dumping 2386 out of 16207 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/if_re.ko...Reading symbols from /usr/lib/debug//boot/kernel/if_re.ko.debug...done.
done.
Loaded symbols for /boot/kernel/if_re.ko
Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/fdescfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/fdescfs.ko
Reading symbols from /boot/kernel/pflog.ko...Reading symbols from /usr/lib/debug//boot/kernel/pflog.ko.debug...done.
done.
Loaded symbols for /boot/kernel/pflog.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...done.
done.
Loaded symbols for /boot/kernel/pf.ko
#0  doadump () at src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  doadump () at src/sys/amd64/include/pcpu_aux.h:55
#1  0xffffffff8060d645 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff8060da83 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#3  0xffffffff8060d8a3 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:807
#4  0xffffffff8095e0d1 in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:921
#5  0xffffffff8095d557 in trap (frame=0xfffffe000047d810)
    at src/sys/amd64/include/counter.h:86
#6  0xffffffff809376a8 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:289
#7  0xffffffff8071b2a2 in ether_nh_input (m=0xfffff802a7792000)
    at /usr/src/sys/net/if_ethersubr.c:520
#8  0xffffffff8073755a in netisr_dispatch_src (proto=5, 
    source=<value optimized out>, m=<value optimized out>)
    at /usr/src/sys/net/netisr.c:1124
#9  0xffffffff8071a6eb in ether_input (ifp=0xfffff80002aab000, 
    m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:787
#10 0xffffffff81740349 in re_rxeof (sc=<value optimized out>)
    at /usr/src/sys/dev/re/if_re.c:2385
#11 0xffffffff8173db0f in re_intr_msi (xsc=0xfffffe00003f2000)
    at /usr/src/sys/dev/re/if_re.c:2681
#12 0xffffffff805d24ac in ithread_loop (arg=0xfffff80002a21300)
    at /usr/src/sys/kern/kern_intr.c:1143
#13 0xffffffff805cf35e in fork_exit (
    callout=0xffffffff805d2270 <ithread_loop>, arg=0xfffff80002a21300, 
    frame=0xfffffe000047db00) at /usr/src/sys/kern/kern_fork.c:1080
#14 0xffffffff809386de in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:1078
#15 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb) 

------------------------------------------------------------------------
ps -axlww

UID  PID PPID  C PRI  NI     VSZ    RSS MWCHAN   STAT TT       TIME COMMAND
  0    0    0  2 -16   0       0      0 swapin   DLs   -    0:00.25 [kernel]
  0    1    0  3  20   0    9916   1032 wait     DLs   -    0:00.11 [init]
  0    2    0  1 -16   0       0      0 crypto_w DL    -    0:00.00 [crypto]
  0    3    0  1 -16   0       0      0 crypto_r DL    -    0:00.00 [crypto returns 0]
  0    4    0  1 -16   0       0      0 crypto_r DL    -    0:00.00 [crypto returns 1]
  0    5    0  0 -16   0       0      0 crypto_r DL    -    0:00.00 [crypto returns 2]
  0    6    0  1 -16   0       0      0 crypto_r DL    -    0:00.00 [crypto returns 3]
  0    7    0  2 -16   0       0      0 -        RL    -    0:00.07 [cam]
  0    8    0  3 -16   0       0      0 -        DL    -    0:00.00 [soaiod1]
  0    9    0  0 -16   0       0      0 -        DL    -    0:00.00 [soaiod2]
  0   10    0  1 -16   0       0      0 audit_wo DL    -    0:00.00 [audit]
  0   11    0  0 155   0       0      0 -        RL    -  809:05.84 [idle]
  0   12    0 -1 -56   0       0      0 -        WL    -    1:33.71 [intr]
  0   13    0  2  -8   0       0      0 -        DL    -    0:00.00 [geom]
  0   14    0  0 -68   0       0      0 -        DL    -    0:00.52 [usb]
  0   15    0  1 -16   0       0      0 -        DL    -    0:00.00 [soaiod3]
  0   16    0  2 -16   0       0      0 -        DL    -    0:00.00 [soaiod4]
  0   17    0  2  -8   0       0      0 t->zthr_ DL    -    0:03.98 [zfskern]
  0   18    0  3 -16   0       0      0 waiting_ DL    -    0:00.00 [sctp_iterator]
  0   19    0  2 -16   0       0      0 -        DL    -    0:10.92 [rand_harvestq]
  0   20    0  1 -16   0       0      0 tzpoll   DL    -    0:00.25 [acpi_thermal]
  0   21    0  2 -16   0       0      0 cooling  DL    -    0:00.07 [acpi_cooling0]
  0   22    0  1 -16   0       0      0 psleep   DL    -    0:01.79 [pagedaemon]
  0   23    0  3 -16   0       0      0 psleep   DL    -    0:00.00 [vmdaemon]
  0   24    0  2 -16   0       0      0 qsleep   DL    -    0:01.03 [bufdaemon]
  0   25    0  2  16   0       0      0 syncer   DL    -    0:01.71 [syncer]
  0   26    0  2 -16   0       0      0 vlruwt   DL    -    0:00.08 [vnlru]
  0   27    0  3 -16   0       0      0 spa->spa DL    -    1:17.42 [zpool-rpool]
  0  720    1  2  52   0   11396   2616 select   Ds    -    0:00.00 [dhclient]
  0  723    1  3  52   0   11676   2796 select   Ds    -    0:00.00 [dhclient]
 65  741    1  2  20   0   11784   2908 select   DCs   -    0:00.00 [dhclient]
  0  757    1  3  20   0   10500   1476 select   Ds    -    0:00.01 [devd]
  0  765    0  0 -16   0       0      0 pftm     DL    -    0:06.20 [pf purge]
  0  777    1  0  20   0   12524   3132 sbwait   Ds    -    0:00.00 [pflogd]
 64  779  777  3  20   0   12592   3156 bpf      D     -    0:01.29 [pflogd]
  0  866    1  3  20   0   11376   2756 select   Ds    -    0:00.18 [syslogd]
  0  875    1  1  20   0   11296   2592 select   Ds    -    0:00.03 [rpcbind]
 53  962    1  2  52   0  143168 101508 sigwait  Ds    -    0:02.75 [named]
  0 1061    1  0  20   0   17548   4376 select   Ds    -    0:01.70 [mountd]
  0 1067    1  2  52   0   17316   4364 accept   Ds    -    0:00.02 [nfsd]
  0 1070 1067  3  52   0   11140   2296 rpcsvc   D     -    0:00.03 [nfsd]
  0 1071    1  1  20   0  279456   4344 select   Ds    -    0:00.02 [rpc.statd]
  0 1073 1071  3  20   0  285828   6316 nanslp   D     -    0:00.04 [rpc.statd]
  0 1075    1  0  52   0   17336   4340 rpcsvc   Ds    -    0:00.03 [rpc.lockd]
  0 1084    1  3  20   0   11192   2420 nanslp   Ds    -    0:00.78 [uptimed]
  0 1087    1  3  52   0   11496   2556 select   Ds    -    0:00.00 [in.tftpd]
  0 1099    1  3  52   0   11176   2548 select   Ds    -    0:00.00 [lpd]
  0 1121    1  2  20   0   10884   2296 select   Ds    -    0:05.18 [powerd]
136 1149    1  2  20   0   20608   8672 select   Ds    -    0:00.07 [dhcpd]
  0 1156    1  2  20 -20   13440   3396 select   D<s   -    0:00.01 [ntpd]
123 1157 1156  3  20 -20   13828   3628 select   D<s   -    0:00.20 [ntpd]
123 1161 1157  0  20   0   13788   3428 select   Ds    -    0:00.01 [ntpd]
  0 1166    1  3  20   0   29140  12200 select   D     -    0:16.41 [snmpd]
  0 1239    1  1  20   0   51532  10084 kqread   Ds    -    0:00.12 [master]
125 1241 1239  1  20   0   51580  10108 kqread   D     -    0:00.18 [qmgr]
933 1253    1  3  52   0 2044424 156804 uwait    D     -    4:11.86 [java]
  0 1260    1  1  36   0   17156   6000 select   Ds    -    0:00.00 [rsync]
535 1273    1  3  20   0   15508   5084 kqread   Ds    -    0:05.86 [redis-server]
  0 1279    1  0  20   0   15536   5736 select   Ds    -    0:00.04 [racoon]
  0 1317    1  3  20   0   42484  19236 kqread   Ds    -    0:00.55 [php-fpm]
 80 1318 1317  2  52   0   42532  19256 accept   D     -    0:00.00 [php-fpm]
 80 1319 1317  1  52   0   42536  19260 accept   D     -    0:00.00 [php-fpm]
181 1327    1  2  20   0   15952   6068 select   Ds    -    0:00.21 [nrpe3]
  0 1331    1  1  42   0   22096   9936 pause    Ds    -    0:00.00 [nginx]
 80 1332 1331  3  20   0   22712  10676 kqread   D     -    0:00.36 [nginx]
181 1345    1  2  20   0   12304   3428 nanslp   Ds    -    0:05.55 [nagios]
 80 1358    1  0  52   0   10844   2276 piperd   Ds    -    0:00.00 [daemon]
 80 1359 1358  3  52   0   10760   2252 accept   D     -    0:00.00 [fcgiwrap]
  0 1410    1  3  21   0   19892   8684 select   Ds    -    0:00.11 [sshd]
  0 1413    1  1  20   0   11352   2620 nanslp   Ds    -    0:00.14 [cron]
 62 1453    1  1  20   0   11132   2496 kqread   Ds    -    0:00.07 [ftp-proxy]
  0 1463    1  3  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1464    1  2  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1465    1  1  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1466    1  2  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1467    1  0  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1468    1  0  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1469    1  3  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
  0 1470    1  1  52   0   10884   2332 ttyin    Ds+   -    0:00.00 [getty]
125 1479 1239  3  20   0   51700  10744 kqread   D     -    0:00.02 [tlsmgr]
125 4863 1239  2  20   0   51484  10768 kqread   D     -    0:00.05 [pickup]

------------------------------------------------------------------------
vmstat -s

 15697354 cpu context switches
  5030830 device interrupts
   216815 software interrupts
  1701900 traps
 23191341 system calls
       28 kernel threads created
     3766  fork() calls
     1206 vfork() calls
       14 rfork() calls
        0 swap pager pageins
        0 swap pager pages paged in
        0 swap pager pageouts
        0 swap pager pages paged out
     3615 vnode pager pageins
    31698 vnode pager pages paged in
      496 vnode pager pageouts
     1263 vnode pager pages paged out
        0 page daemon wakeups
        0 pages examined by the page daemon
        0 clean page reclamation shortfalls
        0 pages reactivated by the page daemon
   402489 copy-on-write faults
       26 copy-on-write optimized faults
  1062366 zero fill pages zeroed
        0 zero fill pages prezeroed
       40 intransit blocking page faults
  1789880 total VM faults taken
     3318 page faults requiring I/O
        0 pages affected by kernel thread creation
   183419 pages affected by  fork()
    68418 pages affected by vfork()
      808 pages affected by rfork()
  1581468 pages freed
        0 pages freed by daemon
        0 pages freed by exiting processes
        0 pages active
        0 pages inactive
        0 pages in the laundry queue
        0 pages wired down
        0 virtual user pages wired down
        0 pages free
        0 bytes per page
        0 total name lookups
          cache hits (0% pos + 0% neg) system 0% per-directory
          deletions 0%, falsehits 0%, toolong 0%
Comment 1 Maciej Suszko 2021-01-07 14:12:19 UTC
Using the official Realtek drivers from ports (net/realtek-re-kmod) and removing all if_re-related tunables looks promising: the system has been up for a few hours without a panic...
Comment 2 Maciej Suszko 2021-01-08 07:15:15 UTC
Unfortunately, stability is better with the kernel module from ports, but the machine still panicked after ~16h. This time it does not look strictly if_re-related, I suppose:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x100000010
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80627b34
stack pointer           = 0x28:0xfffffe0074516780
frame pointer           = 0x28:0xfffffe00745167e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu0)
trap number             = 12
panic: page fault
cpuid = 0
time = 1610073011
KDB: stack backtrace:
#0 0xffffffff80659775 at kdb_backtrace+0x65
#1 0xffffffff8060da2b at vpanic+0x17b
#2 0xffffffff8060d8a3 at panic+0x43
#3 0xffffffff8095e0d1 at trap_fatal+0x391
#4 0xffffffff8095e12f at trap_pfault+0x4f
#5 0xffffffff8095d776 at trap+0x286
#6 0xffffffff809376a8 at calltrap+0x8
#7 0xffffffff809809f8 at handleevents+0x188
#8 0xffffffff809811ef at timercb+0x25f
#9 0xffffffff809beceb at lapic_handle_timer+0x9b
#10 0xffffffff809393a1 at Xtimerint+0xb1
#11 0xffffffff809b54fe at cpu_idle_acpi+0x3e
#12 0xffffffff809b55af at cpu_idle+0x9f
#13 0xffffffff806415e6 at sched_idletd+0x326
#14 0xffffffff805cf35e at fork_exit+0x7e
#15 0xffffffff809386de at fork_trampoline+0xe
Uptime: 15h54m12s

#0  doadump () at src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  doadump () at src/sys/amd64/include/pcpu_aux.h:55
#1  0xffffffff8060d645 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff8060da83 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#3  0xffffffff8060d8a3 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:807
#4  0xffffffff8095e0d1 in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:921
#5  0xffffffff8095e12f in trap_pfault (frame=0xfffffe00745166c0, 
    usermode=<value optimized out>, signo=<value optimized out>, 
    ucode=<value optimized out>) at src/sys/amd64/include/pcpu_aux.h:55
#6  0xffffffff8095d776 in trap (frame=0xfffffe00745166c0)
    at /usr/src/sys/amd64/amd64/trap.c:405
#7  0xffffffff809376a8 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:289
#8  0xffffffff80627b34 in callout_process (now=245898303208611)
    at /usr/src/sys/kern/kern_timeout.c:489
#9  0xffffffff809809f8 in handleevents (now=245898303208611, fake=0)
    at /usr/src/sys/kern/kern_clocksource.c:213
#10 0xffffffff809811ef in timercb (et=0xffffffff80f769e0, 
    arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:357
#11 0xffffffff809beceb in lapic_handle_timer (frame=0xfffffe00745168b0)
    at /usr/src/sys/x86/x86/local_apic.c:1339
#12 0xffffffff809393a1 in Xtimerint () at apic_vector.S:132
#13 0xffffffff803c12ab in acpi_cpu_idle (sbt=<value optimized out>)
    at /usr/src/sys/dev/acpica/acpi_cpu.c:1194
#14 0xffffffff809b54fe in cpu_idle_acpi (sbt=68154160)
    at /usr/src/sys/x86/x86/cpu_machdep.c:506
#15 0xffffffff809b55af in cpu_idle (busy=0)
    at /usr/src/sys/x86/x86/cpu_machdep.c:654
#16 0xffffffff806415e6 in sched_idletd (dummy=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:2860
#17 0xffffffff805cf35e in fork_exit (
    callout=0xffffffff806412c0 <sched_idletd>, arg=0x0, 
    frame=0xfffffe0074516b00) at /usr/src/sys/kern/kern_fork.c:1080
#18 0xffffffff809386de in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:1078
#19 0x0000000000000000 in ?? ()
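
For reference, the "time =" and "Uptime:" fields of the two panics above can be turned into UTC wall-clock times, which makes it easier to line the crashes up with the comment timestamps. This is only a minimal Python sketch: the helper names (parse_uptime, panic_and_boot) are made up for illustration and are not part of any FreeBSD tool, and the handling of a leading "Nd" days field is an assumption, since neither log here contains one.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_uptime(text):
    """Parse an 'Uptime:' value such as '3h27m16s' into a timedelta.

    An optional leading days field ('1d2h3m4s') is accepted as an
    assumption; the logs in this report only show h/m/s.
    """
    m = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", text)
    if m is None:
        raise ValueError(f"unrecognized uptime string: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def panic_and_boot(epoch, uptime):
    """Return (panic time, boot time) in UTC for one panic message."""
    panic = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return panic, panic - parse_uptime(uptime)


# Values copied from the two panic messages in this report.
PANICS = [
    ("trap 9 in ether_nh_input", 1609991554, "3h27m16s"),
    ("trap 12 in callout_process", 1610073011, "15h54m12s"),
]

for label, epoch, uptime in PANICS:
    panic, boot = panic_and_boot(epoch, uptime)
    print(f"{label}: panic {panic:%Y-%m-%d %H:%M:%S} UTC, "
          f"boot {boot:%Y-%m-%d %H:%M:%S} UTC")
```

With these inputs the first panic works out to 2021-01-07 03:52:34 UTC and the second to 2021-01-08 02:30:11 UTC, with the second boot at 2021-01-07 10:35:59 UTC.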

Any ideas?