Bug 250413 - kernel 13.0 panic sleepq_add with net/syncthing
Summary: kernel 13.0 panic sleepq_add with net/syncthing
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Alexander V. Chernikov
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-10-17 07:32 UTC by Christopher Hall
Modified: 2020-10-19 11:28 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Christopher Hall 2020-10-17 07:32:07 UTC
This panic happens after syncthing has been running for a while, syncing 20GB of data from an Android device.
Usually takes a few hours to panic. Ifinaly managed to capture a core dump and obtain the following back trace
using the command:

  kgdb -n last /usr/obj/usr/src/amd64.amd64/sys/GENERIC/kernel.debug

kernel:  FreeBSD 13.0-CURRENT #20 r366667
syncthing 1.10.0
hardware: Lenovo E495 Ryzen/Vega system


panic message:

WARNING !drm_modeset_is_locked(&plane->mutex) failed at /wrkdirs/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.4.62_3/drivers/gpu/drm/drm_atomic_helper.c:871
#0 0xffffffff836ae6e3 at linux_dump_stack+0x23
#1 0xffffffff8362c610 at drm_atomic_helper_check_planes+0xb0
#2 0xffffffff8341c42b at amdgpu_dm_atomic_check+0x46b
#3 0xffffffff8362a35a at drm_atomic_check_only+0x3fa
#4 0xffffffff8362a793 at drm_atomic_commit+0x13
#5 0xffffffff836373d8 at drm_client_modeset_commit_atomic+0x148
#6 0xffffffff83637139 at drm_client_modeset_commit_force+0x69
#7 0xffffffff83677cea at drm_fb_helper_restore_fbdev_mode_unlocked+0x7a
#8 0xffffffff83671cd7 at vt_kms_postswitch+0x167
#9 0xffffffff80a360c2 at vt_window_switch+0x122
#10 0xffffffff80a3358f at vtterm_cngrab+0x1f
#11 0xffffffff80b724b6 at cngrab+0x16
#12 0xffffffff80bd98bc at vpanic+0xec
#13 0xffffffff80bd9713 at panic+0x43
#14 0xffffffff80c34c29 at sleepq_add+0x359
#15 0xffffffff80be4004 at _sx_xlock_hard+0x414
#16 0xffffffff80be3b8a at _sx_xlock+0xba
#17 0xffffffff80d7fe7c at inp_setmoptions+0xd5c
kernel trap 22 with interrupts disabled
                            kernel trap 22 with interrupts disabled
 panic: sleepq_add: td 0xfffffe005eac6500 to sleep on wchan 0xffffffff81d976e8 with sleeping prohibited
cpuid = 5
time = 1602901125
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005df085d0
vpanic() at vpanic+0x182/frame 0xfffffe005df08620
panic() at panic+0x43/frame 0xfffffe005df08680
sleepq_add() at sleepq_add+0x359/frame 0xfffffe005df086d0
_sx_xlock_hard() at _sx_xlock_hard+0x414/frame 0xfffffe005df08770
_sx_xlock() at _sx_xlock+0xba/frame 0xfffffe005df087b0
inp_setmoptions() at inp_setmoptions+0xd5c/frame 0xfffffe005df08960
ip_ctloutput() at ip_ctloutput+0xfe/frame 0xfffffe005df089b0
udp_ctloutput() at udp_ctloutput+0x100/frame 0xfffffe005df089f0
sosetopt() at sosetopt+0xed/frame 0xfffffe005df08a50
kern_setsockopt() at kern_setsockopt+0xa8/frame 0xfffffe005df08ab0
sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe005df08ad0
amd64_syscall() at amd64_syscall+0x135/frame 0xfffffe005df08bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe005df08bf0
--- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x4c220a, rsp = 0xc0005738d8, rbp = 0xc000573938 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394
#2  0xffffffff804a0a8a in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<unavailable>, dummy4=<unavailable>)
    at /usr/src/sys/ddb/db_command.c:575
#3  0xffffffff804a0850 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482
#4  0xffffffff804a05ad in db_command_loop () at /usr/src/sys/ddb/db_command.c:535
#5  0xffffffff804a38c6 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:270
#6  0xffffffff80c255d4 in kdb_trap (type=3, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:699
#7  0xffffffff81021dde in trap (frame=0xfffffe005df08500) at /usr/src/sys/amd64/amd64/trap.c:576
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff8120c497 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:486
#10 0xffffffff80bd996e in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:901
#11 0xffffffff80bd9713 in panic (fmt=0xffffffff81c89468 <cnputs_mtx> "\270\331\034\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:838
#12 0xffffffff80c34c29 in sleepq_add (wchan=0xffffffff81d976e8 <in_multi_sx>, lock=0x0, wmesg=0xffffffff8123f1a9 "in_multi_sx", flags=3, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:328
#13 0xffffffff80be4004 in _sx_xlock_hard (sx=0xffffffff81d976e8 <in_multi_sx>, x=<optimized out>, opts=<optimized out>, file=<optimized out>, 
    line=<optimized out>) at /usr/src/sys/kern/kern_sx.c:854
#14 0xffffffff80be3b8a in _sx_xlock (sx=0xffffffff81d976e8 <in_multi_sx>, opts=0, file=0xffffffff811552ae "/usr/src/sys/netinet/in_mcast.c", line=2081)
    at /usr/src/sys/kern/kern_sx.c:332
#15 0xffffffff80d7fe7c in inp_join_group (inp=0xfffff800241d5000, sopt=<optimized out>) at /usr/src/sys/netinet/in_mcast.c:2081
#16 inp_setmoptions (inp=0xfffff800241d5000, sopt=<optimized out>) at /usr/src/sys/netinet/in_mcast.c:2851
#17 0xffffffff80d8f4ce in ip_ctloutput (so=<optimized out>, sopt=0xfffffe005df08a68) at /usr/src/sys/netinet/ip_output.c:1366
#18 0xffffffff80dbd360 in udp_ctloutput (so=0xfffff8004ddc6b40, sopt=0xfffffe005df08a68) at /usr/src/sys/netinet/udp_usrreq.c:992
#19 0xffffffff80c7e11d in sosetopt (so=0xfffff8004ddc6b40, sopt=0xfffffe005df08a68) at /usr/src/sys/kern/uipc_socket.c:3003
#20 0xffffffff80c83628 in kern_setsockopt (td=0xfffffe005eac6500, s=<optimized out>, level=<optimized out>, name=<optimized out>, val=<optimized out>, 
    valseg=<optimized out>, valsize=8) at /usr/src/sys/kern/uipc_syscalls.c:1270
#21 0xffffffff80c83574 in sys_setsockopt (td=0xffffffff81c89468 <cnputs_mtx>, uap=<optimized out>) at /usr/src/sys/kern/uipc_syscalls.c:1231
#22 0xffffffff81022c05 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:165
#23 amd64_syscall (td=0xfffffe005eac6500, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#24 <signal handler called>
#25 0x00000000004c220a in ?? ()
Backtrace stopped: Cannot access memory at address 0xc0005738d8
(kgdb) frame 12
#12 0xffffffff80c34c29 in sleepq_add (wchan=0xffffffff81d976e8 <in_multi_sx>, lock=0x0, wmesg=0xffffffff8123f1a9 "in_multi_sx", flags=3, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:328
328			KASSERT(0,
(kgdb) list
323		/* If this thread is not allowed to sleep, die a horrible death. */
324		if (__predict_false(!THREAD_CAN_SLEEP())) {
325	#ifdef EPOCH_TRACE
326			epoch_trace_list(curthread);
327	#endif
328			KASSERT(0,
329			    ("%s: td %p to sleep on wchan %p with sleeping prohibited",
330			    __func__, td, wchan));
331		}
332	
(kgdb) 


Please advise what further commands could be used to obtain more information, or any
changes I migh make to help track this down.
Comment 1 Konstantin Belousov freebsd_committer 2020-10-17 08:18:46 UTC
https://reviews.freebsd.org/D25445#inline-164722 for description of what
happens there, I believe.
Comment 2 Christopher Hall 2020-10-17 09:13:28 UTC
I checked my source does correspond to version with the epoch enter/exit code.
Comment 3 Alexander V. Chernikov freebsd_committer 2020-10-17 11:34:01 UTC
Hi Christopher, I raised the https://reviews.freebsd.org/D25445 review and will commit it tomorrow. Meanwhile, if you could consider testing it, that would be perfect.
Comment 4 commit-hook freebsd_committer 2020-10-17 20:33:49 UTC
A commit references this bug:

Author: melifaro
Date: Sat Oct 17 20:33:10 UTC 2020
New revision: 366795
URL: https://svnweb.freebsd.org/changeset/base/366795

Log:
  Fix sleepq_add panic happening with too wide net epoch in mcast control.

  PR:		250413
  Reported by:	Christopher Hall <hsw at bitmark.com>
  Reviewed by:	ae
  Differential Revision:	https://reviews.freebsd.org/D26827

Changes:
  head/sys/netinet/in_mcast.c
Comment 5 Christopher Hall 2020-10-18 02:55:30 UTC
Sorry, timezone difference, means that the fix was already committed.
I will test rev 366804.Just started buildworld, so may be a while before I can check.
Comment 6 Christopher Hall 2020-10-19 11:28:43 UTC
syncthing still running after 11 hours, no panics