Bug 275523 - Kernel PANIC in do_osd_del() in LIST_REMOVE macro in 15.0-CURRENT
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Keywords: crash
Reported: 2023-12-04 14:15 UTC by David Gilbert
Modified: 2023-12-08 03:25 UTC
Description David Gilbert 2023-12-04 14:15:36 UTC
This panic was on RISC-V hardware (real hardware), but the panic may not be RISC-V only.  Hardware does pass 48h of memtester (so far).

Kernel core file and symbol table are available on request. The full core.txt is available at https://termbin.com/y9g4

[2:6:306]dgilbert@ump:/var/crash> uname -a
FreeBSD ump.daveg.ca 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n266101-608da65de955-dirty: Wed Oct 25 02:49:32 EDT 2023     root@ump.daveg.ca:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC riscv

Unread portion of the kernel message buffer:
panic: Bad link elm 0xffffffd050ee24f8 next->prev != elm
cpuid = 0
time = 1699626334
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x36
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x116
panic() at panic+0x26
do_osd_del() at do_osd_del+0x378
osd_del() at osd_del+0x5c
khelp_destroy_osd() at khelp_destroy_osd+0x78
tcp_discardcb() at tcp_discardcb+0x96
tcp_usr_detach() at tcp_usr_detach+0x4e
sorele_locked() at sorele_locked+0xce
tcp_close() at tcp_close+0x1d0
tcp_timer_2msl() at tcp_timer_2msl+0x132
tcp_timer_enter() at tcp_timer_enter+0x11e
softclock_call_cc() at softclock_call_cc+0x112
softclock_thread() at softclock_thread+0x9e
fork_exit() at fork_exit+0x68
fork_trampoline() at fork_trampoline+0xa
KDB: enter: panic

get_curthread () at /usr/src/sys/riscv/include/pcpu.h:71
71		__asm __volatile("ld %0, 0(tp)" : "=&r"(td));
(kgdb) #0  get_curthread () at /usr/src/sys/riscv/include/pcpu.h:71
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffc0000e2dda in db_dump (dummy=<optimized out>, 
    dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src/sys/ddb/db_command.c:591
#3  0xffffffc0000e2bea in db_command (last_cmdp=<optimized out>, 
    cmd_table=<optimized out>, dopager=true)
    at /usr/src/sys/ddb/db_command.c:504
#4  0xffffffc0000e296a in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:551
#5  0xffffffc0000e5c32 in db_trap (type=<optimized out>, 
    type@entry=<error reading variable: value is not available>, 
    code=<optimized out>, 
    code@entry=<error reading variable: value is not available>)
    at /usr/src/sys/ddb/db_main.c:268
#6  0xffffffc00033825c in kdb_trap (type=3, code=0, tf=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:790
#7  0xffffffc0005b1d88 in do_trap_supervisor (frame=0xffffffc0043bc690)
    at /usr/src/sys/riscv/riscv/trap.c:359
#8  <signal handler called>
#9  kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:556
#10 0xffffffc0002f44d2 in vpanic (
    fmt=0xffffffc00062846a "Bad link elm %p next->prev != elm", 
    ap=0xffffffc0043bc848) at /usr/src/sys/kern/kern_shutdown.c:958
#11 0xffffffc0002f42ce in panic (
    fmt=0x12 <error: Cannot access memory at address 0x12>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#12 0xffffffc0002d5b56 in do_osd_del (type=2, osd=0xffffffd050ee24f8, slot=1, 
    list_locked=0) at /usr/src/sys/kern/kern_osd.c:346
#13 0xffffffc0002d61f2 in osd_del (type=2, osd=0x1, slot=37)
    at /usr/src/sys/kern/kern_osd.c:311
#14 0xffffffc0002be3da in khelp_remove_osd (
    h=0xffffffc0007c4b30 <ertt_helper>, hosd=0xffffffd050ee24f8)
    at /usr/src/sys/kern/kern_khelp.c:223
#15 khelp_destroy_osd (hosd=0xffffffd050ee24f8)
    at /usr/src/sys/kern/kern_khelp.c:203
#16 0xffffffc00045878a in tcp_discardcb (tp=0xffffffd050ee2000)
    at /usr/src/sys/netinet/tcp_subr.c:2416
#17 0xffffffc000463d44 in tcp_usr_detach (so=0xffffffd33d624000)
    at /usr/src/sys/netinet/tcp_usrreq.c:217
#18 0xffffffc000386060 in sofree (so=0xffffffd33d624000)
    at /usr/src/sys/kern/uipc_socket.c:1211
#19 sorele_locked (so=0xffffffd33d624000)
    at /usr/src/sys/kern/uipc_socket.c:1238
#20 0xffffffc0004586ca in tcp_close (tp=<optimized out>)
    at /usr/src/sys/netinet/tcp_subr.c:2541
#21 0xffffffc0004627ae in tcp_timer_2msl (tp=0xffffffd050ee2000)
    at /usr/src/sys/netinet/tcp_timer.c:373
#22 0xffffffc00046130c in tcp_timer_enter (xtp=0xffffffd050ee2000)
    at /usr/src/sys/netinet/tcp_timer.c:880
#23 0xffffffc00030f58e in softclock_call_cc (c=0xffffffd050ee2198, 
    cc=0xffffffc003e6a0c0, direct=0) at /usr/src/sys/kern/kern_timeout.c:719
#24 0xffffffc000310b4c in softclock_thread (arg=0xffffffc003e6a0c0)
    at /usr/src/sys/kern/kern_timeout.c:858
#25 0xffffffc0002b1c9c in fork_exit (
    callout=0xffffffc000310aaa <softclock_thread>, arg=0xffffffc003e6a0c0, 
    frame=0xffffffc0043bcc50) at /usr/src/sys/kern/kern_fork.c:1160
#26 0xffffffc0005b1aee in fork_trampoline ()
    at /usr/src/sys/riscv/riscv/swtch.S:370
Backtrace stopped: frame did not save the PC
(kgdb)
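
For readers unfamiliar with the check that fired: the message "Bad link elm %p next->prev != elm" comes from the sanity check that LIST_REMOVE runs on kernels built with queue debugging. Before unlinking, it verifies that the next element's back-pointer still points at the element being removed. The following is only a minimal userland sketch of that invariant, not the kernel code. The names struct node, node_insert_head, node_next_link_ok and node_remove_checked are hypothetical, and the sketch returns false where the kernel would panic.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an element linked the way sys/queue.h LIST
 * entries are: le_prev holds the address of the previous element's
 * le_next pointer (or of the list head), not a pointer to the element.
 */
struct node {
	struct node *le_next;
	struct node **le_prev;
};

/* Insert elm at the head of the list. */
static void
node_insert_head(struct node **head, struct node *elm)
{
	elm->le_next = *head;
	if (*head != NULL)
		(*head)->le_prev = &elm->le_next;
	*head = elm;
	elm->le_prev = head;
}

/*
 * The invariant behind "next->prev != elm": if elm has a successor,
 * the successor's le_prev must be the address of elm's le_next.
 */
static bool
node_next_link_ok(struct node *elm)
{
	return (elm->le_next == NULL ||
	    elm->le_next->le_prev == &elm->le_next);
}

/*
 * Unlink elm only if the invariant holds. Returning false here is an
 * assumption made for the sketch; the kernel check panics instead.
 */
static bool
node_remove_checked(struct node *elm)
{
	if (!node_next_link_ok(elm))
		return (false);
	if (elm->le_next != NULL)
		elm->le_next->le_prev = elm->le_prev;
	*elm->le_prev = elm->le_next;
	return (true);
}
```

In this sketch the check fails whenever the successor's back-pointer was left stale or overwritten, for example by a removal that raced with another list update, a double removal, or memory corruption. The sketch does not say which of these, if any, happened here.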
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2023-12-05 09:15:11 UTC
^Triage: someone from net@ should check if this is theirs.
Comment 2 Zhenlei Huang freebsd_committer freebsd_triage 2023-12-08 03:21:24 UTC
(In reply to dgilbert from comment #0)
> This panic was on RISC-V hardware (real hardware), but the panic may not be
> RISC-V only.  Hardware does pass 48h of memtester (so far).

From reading the source code, I have no clue how this could happen.

Could you please share your setup and the steps to reproduce this problem?
Comment 3 David Gilbert 2023-12-08 03:25:20 UTC
The device is an Unmatched.  It has a 2 TB NVMe disk with ZFS on it.  AFAICR, it was largely idle when this panic happened.  I can provide the core and symbol files to anyone who wants them, if that helps.