Bug 243711

Summary: dtrace immediately panics the system if compiled without SMP
Product: Base System Reporter: alex_y_xu
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: markj
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description alex_y_xu 2020-01-29 19:13:11 UTC
on amd64, with 12.1-STABLE or 13.0-CURRENT, building without SMP then running "dtrace -n 'tick-5s { exit(1); }'" immediately panics the system. this doesn't occur with SMP enabled.

#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff803ad059 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff803ad3f3 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff803ad2a3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff805e6cca in trap_fatal (frame=0xfffffe00004783e0, eva=168) at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff805e6d18 in trap_pfault (frame=0xfffffe00004783e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:767
#7  0xffffffff805e636a in trap (frame=0xfffffe00004783e0) at /usr/src/sys/amd64/amd64/trap.c:443
#8  <signal handler called>
#9  dtrace_buffer_activate (state=0x0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:12068
#10 0xffffffff80400d46 in smp_rendezvous_cpus (map=..., setup_func=0xffffffff80e3e740 <dtrace_buffer_activate+32>, action_func=0xffffffff80400dc0 <smp_no_rendezvous_barrier>, teardown_func=0xfffffe0000c46000, arg=0x0)
    at /usr/src/sys/kern/subr_smp.c:799
#11 0xffffffff80e41b53 in dtrace_xcall (cpu=-1, func=<optimized out>, arg=0x0) at /usr/src/sys/cddl/dev/dtrace/amd64/dtrace_subr.c:125
#12 dtrace_sync () at /usr/src/sys/cddl/dev/dtrace/amd64/dtrace_subr.c:141
#13 0xffffffff80e3d47b in dtrace_buffer_activate_cpu (state=<optimized out>, cpu=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:12084
#14 dtrace_state_go (state=<optimized out>, cpu=0xfffffe00004788c0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:15195
#15 0xffffffff80e39065 in __sx_xunlock (sx=<optimized out>, td=<optimized out>, file=<optimized out>, line=<optimized out>) at /usr/src/sys/sys/sx.h:179
#16 dtrace_ioctl (dev=<optimized out>, cmd=<optimized out>, addr=<optimized out>, flags=<optimized out>, td=<optimized out>) at /usr/src/sys/cddl/dev/dtrace/dtrace_ioctl.c:762
#17 0xffffffff8032a7ad in devfs_ioctl (ap=0xfffffe0000478728) at /usr/src/sys/fs/devfs/devfs_vnops.c:808
#18 0xffffffff8062c3ec in VOP_IOCTL_APV (vop=0xffffffff80816080 <devfs_specops>, a=0xfffffe0000478728) at vnode_if.c:1067
#19 0xffffffff80479bac in vn_ioctl (fp=0xfffff8001acc3af0, com=<optimized out>, data=0xfffffe00004788c0, active_cred=0xfffff80008b72400, td=0x0) at /usr/src/sys/kern/vfs_vnops.c:1473
#20 0xffffffff8032ad9f in devfs_ioctl_f (fp=0x0, com=18446744071577003840, data=0xffffffff80400dc0 <smp_no_rendezvous_barrier>, cred=0x246, td=0xfffff8001e363000) at /usr/src/sys/fs/devfs/devfs_vnops.c:766
#21 0xffffffff8040c633 in fo_ioctl (fp=0x0, com=<optimized out>, data=0xffffffff80400dc0 <smp_no_rendezvous_barrier>, active_cred=0x246, td=<optimized out>) at /usr/src/sys/sys/file.h:337
#22 kern_ioctl (td=0xfffff8001e363000, fd=3, com=1074034700, data=0xffffffff80400dc0 <smp_no_rendezvous_barrier> "UH\211\345]\303f.\017\037\204") at /usr/src/sys/kern/sys_generic.c:804
#23 0xffffffff8040c3dd in sys_ioctl (td=0xfffff8001e363000, uap=0xfffff8001e3633c0) at /usr/src/sys/kern/sys_generic.c:712
#24 0xffffffff805e76f4 in syscallenter (td=0xfffff8001e363000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#25 amd64_syscall (td=0xfffff8001e363000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1186
#26 <signal handler called>
#27 0x0000000803560e9a in ?? ()
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2020-01-29 23:34:13 UTC
Hmm, I can't even boot a non-SMP amd64 kernel on head; we panic when adding a thread.  I guess this is some fallout from the recent scheduler locking optimizations.  That said, I can't quite see the issue from looking at the DTrace code.  Which revision are you running?  Out of curiosity, why are you using a non-SMP kernel?
Comment 2 alex_y_xu 2020-01-30 00:58:56 UTC
(In reply to Mark Johnston from comment #1)

> Which revision are you running?

I used https://download.freebsd.org/ftp/snapshots/VM-IMAGES/13.0-CURRENT/amd64/Latest/FreeBSD-13.0-CURRENT-amd64.qcow2.xz (date 2020-Jan-23 09:41) and https://download.freebsd.org/ftp/snapshots/amd64/13.0-CURRENT/src.txz (date 2020-Jan-23 08:56). My understanding is that these are kept up-to-date, but probably your source copy is newer.

> I can't quite see the issue from looking at the DTrace code.

I also tried some debugging, but I am not familiar with the FreeBSD kernel. I suspect that the backtrace is not accurate. With FreeBSD 12.1-STABLE, I acquired a backtrace that had some dtrace state=non-zero, then magically become arg=0x0 when it reached dtrace_xcall or thereabouts. Maybe something strange is going on with the registers during the trap and panic though. I didn't get around to trying DDB though.

> why are you using a non-SMP kernel?

I want to use a non-SMP kernel because I am on a single-CPU VM and I assume that no-SMP kernels are more efficient. On Linux, spinlocks in interrupt-disabled context can be compiled out of the kernel in no-SMP mode. If it's really poorly tested though, there's no particular reason I *can't* use an SMP kernel.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2020-01-30 16:19:59 UTC
(In reply to alex_y_xu from comment #2)
Ah, I think the snapshots have invariants-checking disabled, which would explain why you're not seeing the problem during boot.

> I want to use a non-SMP kernel because I am on a single-CPU VM and I assume that no-SMP kernels are more efficient.

Fair enough.  Indeed, mtx_lock_spin() does nothing but disable interrupts in non-SMP FreeBSD kernels (and nothing if they are already disabled).  Non-SMP kernels definitely aren't as well tested though.  It looks like DTrace has been broken there basically forever: MAXCPU is 256 in kernel modules even when compiled without SMP:

#if defined(SMP) || defined(KLD_MODULE)                                                                               
#ifndef MAXCPU                                                                                                        
#define MAXCPU          256                                                                                           
#endif                                                                                                                
#else                                                                                                                 
#define MAXCPU          1                                                                                             
#endif 

... and smp_rendezvous() and other kernel APIs use pass-by-value for cpuset_t's for a reason that is not clear to me.
Comment 4 alex_y_xu 2020-01-30 17:05:05 UTC
(In reply to Mark Johnston from comment #3)

That would definitely explain why arg suddenly becomes zero, and probably also why dtrace_buffer_activate shifts from action_func to setup_func.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2020-02-05 00:36:42 UTC
The boot-time panic is fixed by r357473, I believe this will fix the DTrace panic, though I haven't yet tested: https://reviews.freebsd.org/D23512
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-02-05 19:08:56 UTC
A commit references this bug:

Author: markj
Date: Wed Feb  5 19:08:22 UTC 2020
New revision: 357585
URL: https://svnweb.freebsd.org/changeset/base/357585

Log:
  Define MAXCPU consistently between the kernel and KLDs.

  This reverts r177661.  The change is no longer very useful since
  out-of-tree KLDs will be built to target SMP kernels anyway.  Moveover
  it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
  value of MAXCPU, and several kernel interfaces, notably
  smp_rendezvous_cpus(), take a cpuset_t as a parameter.

  PR:		243711
  Reviewed by:	jhb, kib
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D23512

Changes:
  head/sys/amd64/include/param.h
  head/sys/arm/include/param.h
  head/sys/arm64/include/param.h
  head/sys/i386/include/param.h
  head/sys/powerpc/include/param.h
  head/sys/riscv/include/param.h
Comment 7 commit-hook freebsd_committer freebsd_triage 2020-02-05 19:08:57 UTC
A commit references this bug:

Author: markj
Date: Wed Feb  5 19:08:46 UTC 2020
New revision: 357586
URL: https://svnweb.freebsd.org/changeset/base/357586

Log:
  Stop compiling dtrace modules with -DSMP.

  I believe this is left over from when dtrace was being ported and
  developed out-of-tree.  Now it just ensures that dtrace.ko and a non-SMP
  kernel have incompatible KBIs.

  PR:		243711
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/sys/modules/dtrace/dtrace/Makefile
  head/sys/modules/dtrace/fasttrap/Makefile
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2020-02-05 19:10:40 UTC
Thanks for the report.  I think the param.h change will live in head only and not be backported to stable/12, since they change the KBI.