Bug 210800

Summary: hung process using ktrace with cloudabi
Product: Base System
Reporter: Michael Plass <mfp49_freebsd>
Component: kern
Assignee: Ed Schouten <ed>
Status: Closed FIXED
Severity: Affects Some People
CC: amd64, ed
Priority: ---
Version: CURRENT
Hardware: amd64
OS: Any
Attachments:
Properly set sa->narg (flags: none)

Description Michael Plass 2016-07-03 23:05:49 UTC
Using ktrace on a cloudabi executable sometimes hangs in such a way that it cannot be killed.

FreeBSD xx 11.0-ALPHA5 FreeBSD 11.0-ALPHA5 #0 r302164: Fri Jun 24 02:51:52 UTC 2016     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

# kldload cloudabi
# kldload cloudabi64

$ pkg info | grep cloud
cloudabi-0.6                   Constants, types and data structures used by CloudABI
cloudabi-toolchain-1.4         C and C++ toolchain for CloudABI
cloudabi-utils-0.11            Utilities for running CloudABI programs
x86_64-unknown-cloudabi-cloudabi-0.6_1 cloudabi for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-cloudlibc-0.40_1 cloudlibc for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-compiler-rt-3.8.0_4 compiler-rt for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-curl-7.49.1_2 curl for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-cxx-runtime-1.0_2 cxx-runtime for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-libcxx-3.8.0_9 libcxx for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-libcxxabi-3.8.0_6 libcxxabi for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-libressl-2.4.1_1 libressl for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-libunwind-3.8.0_5 libunwind for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-lua-5.3.3_2 lua for x86_64-unknown-cloudabi
x86_64-unknown-cloudabi-zlib-1.2.8_11 zlib for x86_64-unknown-cloudabi

$ : | ktrace /usr/local/x86_64-unknown-cloudabi/bin/lua

Here is a kernel stack trace of the hung process:
(kgdb) where
#0  sched_switch (td=0xfffff8006a217000, newtd=0xfffff80007380a00, 
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xffffffff80a52a87 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:455
#2  0xffffffff80a95d27 in sleepq_switch (wchan=<value optimized out>, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:557
#3  0xffffffff80a95bf3 in sleepq_wait (wchan=0xffffffff81c34400, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:637
#4  0xffffffff809e8cc4 in _cv_wait (cvp=<value optimized out>, 
    lock=<value optimized out>) at /usr/src/sys/kern/kern_condvar.c:144
#5  0xffffffff80aa3132 in vmem_xalloc (vm=<value optimized out>, 
    size0=<value optimized out>, align=<value optimized out>, phase=0, 
    nocross=<value optimized out>, minaddr=0, maxaddr=<value optimized out>, 
    flags=8194, addrp=<value optimized out>)
    at /usr/src/sys/kern/subr_vmem.c:1209
#6  0xffffffff80aa2e72 in vmem_alloc (vm=0xffffffff81c34380, size=14244610048, 
    flags=8194, addrp=0xfffffe01212959f0) at /usr/src/sys/kern/subr_vmem.c:1095
#7  0xffffffff80d2c193 in kmem_malloc (vmem=0xffffffff81c34380, 
    size=14244610048, flags=2) at /usr/src/sys/vm/vm_kern.c:313
#8  0xffffffff80d24d46 in uma_large_malloc (size=14244610048, wait=2)
    at /usr/src/sys/vm/uma_core.c:1106
#9  0xffffffff80a25833 in malloc (size=<value optimized out>, 
    mtp=0xffffffff818f0780, flags=2) at /usr/src/sys/kern/kern_malloc.c:510
#10 0xffffffff80a189ad in ktrsyscall (code=35, narg=1780576256, 
    args=0xfffffe0121295b80) at /usr/src/sys/kern/kern_ktrace.c:451
#11 0xffffffff80eb893e in amd64_syscall (td=0xfffff8006a217000, traced=0)
    at subr_syscall.c:77
#12 0xffffffff80e9897b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:396
#13 0x000000000103f42b in ?? ()

Clearly the narg passed to ktrsyscall (frame #10) is garbage. It looks like
cloudabi64_fetch_syscall_args() is not filling in sa->narg.
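
The numbers in the trace are consistent with that reading. As a minimal sketch, assuming ktrsyscall sizes its argument buffer as narg times the width of a register (8 bytes on amd64), the garbage narg from frame #10 works out to exactly the allocation size seen in frames #6 to #8. The names reg_t and args_buflen are hypothetical and used only for illustration; they are not kernel identifiers.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's register_t on amd64 (64 bits wide). */
typedef int64_t reg_t;

/*
 * Bytes needed to hold narg system call arguments, under the assumption
 * that the buffer is sized as narg * sizeof(register_t).
 */
static uint64_t
args_buflen(int narg)
{
	return ((uint64_t)narg * sizeof(reg_t));
}
```

With narg=1780576256 this gives 14244610048 bytes (about 13 GiB), matching the size argument of malloc, uma_large_malloc and kmem_malloc above. The flags=2 in those frames would be M_WAITOK, so the request sleeps in vmem_xalloc waiting for address space that never becomes available instead of failing.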
Comment 1 Ed Schouten freebsd_committer freebsd_triage 2016-07-05 14:27:58 UTC
Created attachment 172143
Properly set sa->narg

Hi Michael,

Thanks for reporting this bug. Can you let me know whether the attached patch fixes the problem for you? If so, I'll make sure to commit it before 11.0.

Thanks,
Ed
Comment 2 Michael Plass 2016-07-05 21:56:11 UTC
Ed,

With your patch, all seems well. No hangs in 1000 tries, kdump output looks reasonable, and the size of ktrace.out is consistent from run to run.

Thanks,
- Michael
Comment 3 Ed Schouten freebsd_committer freebsd_triage 2016-07-06 08:15:12 UTC
Perfect! Thanks for testing!

As we've already entered the freeze for 11.0, I've sent out a commit approval request to re@. Will commit the patch as soon as I get the approval.
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-07-08 20:10:27 UTC
A commit references this bug:

Author: ed
Date: Fri Jul  8 20:09:22 UTC 2016
New revision: 302448
URL: https://svnweb.freebsd.org/changeset/base/302448

Log:
  Don't forget to set sa->narg for CloudABI system calls.

  It turns out that this value is not used within the system call code
  under normal conditions, except when using tracing tools like ktrace.
  If we forget to set this value, it is set to random garbage. This may
  cause ktrace to hang indefinitely, making it impossible to kill.

  Reported by: Michael Plass
  PR: 210800
  MFC before: 11.0-RELEASE

Changes:
  head/sys/amd64/cloudabi64/cloudabi64_sysvec.c
  head/sys/arm64/cloudabi64/cloudabi64_sysvec.c
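
The shape of such a fix can be sketched in a simplified user-space model. This is not the committed patch, whose contents are not shown in this report; the struct and function names below are hypothetical stand-ins for the kernel's system call table entry and syscall_args, and the only point illustrated is that the argument-fetching routine copies the argument count from the table entry into narg rather than leaving it uninitialized.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a system call table entry. */
struct sysent_model {
	int sy_narg;		/* number of arguments the call takes */
};

/* Hypothetical, simplified model of the per-call argument state. */
struct syscall_args_model {
	unsigned int code;			/* system call number */
	const struct sysent_model *callp;	/* matching table entry */
	int narg;				/* argument count used by tracing */
};

/*
 * Look up the table entry for 'code' and fill in every field of 'sa',
 * including narg. Returns 0 on success, -1 for an out-of-range code.
 */
static int
fetch_syscall_args_model(struct syscall_args_model *sa, unsigned int code,
    const struct sysent_model *table, size_t nentries)
{
	if (code >= nentries)
		return (-1);
	sa->code = code;
	sa->callp = &table[code];
	sa->narg = sa->callp->sy_narg;	/* the step that must not be skipped */
	return (0);
}
```

In this model, a caller that passes in a structure whose narg holds leftover garbage gets it overwritten with the table's value, so a consumer such as a tracing hook never sees the stale number.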
Comment 5 commit-hook freebsd_committer freebsd_triage 2016-07-12 08:12:49 UTC
A commit references this bug:

Author: ed
Date: Tue Jul 12 06:25:28 UTC 2016
New revision: 302627
URL: https://svnweb.freebsd.org/changeset/base/302627

Log:
  MFC r302448:

    Don't forget to set sa->narg for CloudABI system calls.

    It turns out that this value is not used within the system call code
    under normal conditions, except when using tracing tools like ktrace.
    If we forget to set this value, it is set to random garbage. This may
    cause ktrace to hang indefinitely, making it impossible to kill.

  Approved by: re@
  Reported by: Michael Plass
  PR: 210800

Changes:
_U  stable/11/
  stable/11/sys/amd64/cloudabi64/cloudabi64_sysvec.c
  stable/11/sys/arm64/cloudabi64/cloudabi64_sysvec.c
Comment 6 Ed Schouten freebsd_committer freebsd_triage 2016-07-12 08:14:08 UTC
Looks like this is fully fixed now. 11.0-BETA2 should be the first version to include this fix.

Thanks again for reporting this issue and enjoy using CloudABI!