How-to reproduce:

sysctl kern.randompid=2000
cd /usr/tests/lib/libutil
repeat 5000 kyua test pidfile_test

--8<--
root@f10:~ # uname -a
FreeBSD f10.3r 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
root@f10:~ # sysctl kern.randompid
kern.randompid: 0
root@f10:~ # cd /usr/tests/lib/libutil/
root@f10:/usr/tests/lib/libutil #
root@f10:/usr/tests/lib/libutil # repeat 5000 kyua test pidfile_test
...
Results file id is usr_tests_lib_libutil.20160910-170441-426322
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-426322.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170441-439743
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-439743.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170441-453027
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-453027.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170441-466528
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-466528.db
...
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170441-479882
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-479882.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170441-493082
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-493082.db
1/1 passed (0 failed)
root@f10:/usr/tests/lib/libutil # sysctl kern.randompid=2000
kern.randompid: 0 -> 2000
root@f10:/usr/tests/lib/libutil # repeat 5000 kyua test pidfile_test
...
Results file id is usr_tests_lib_libutil.20160910-170636-767664
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-767664.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170636-780812
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-780812.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170636-794136
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-794136.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170636-807206
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-807206.db
1/1 passed (0 failed)
pidfile_test:main -> passed [0.003s]
Results file id is usr_tests_lib_libutil.20160910-170636-820402
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-820402.db
1/1 passed (0 failed)
pidfile_test:main -> # the kyua test can get stuck here indefinitely (a day or so...)
--8<--

--8<--
1/1 passed (0 failed)
pidfile_test:main -> ^C[-- Signal caught; please wait for cleanup --]
load: 0.08 cmd: kyua 87586 [wait] 132.82r 0.00u 0.00s 0% 5832k
^C[-- Signal caught; please wait for cleanup --]
broken: Caught unexpected exception: Tester received signal 6; this is a bug [134.589s]
kyua: E: Interrupted by signal 2.
--8<--
root@f10:~ # ps aux | grep pidfile
root 23493 0.0 0.1 14496 1920 - I  5:06PM 0:00.00 /usr/tests/lib/libutil/pidfile_test
root 88443 0.0 0.1 14496 1920 - I  5:06PM 0:00.00 /usr/tests/lib/libutil/pidfile_test
root 90967 0.0 0.1 14496 1916 - Is 5:09PM 0:00.00 /usr/tests/lib/libutil/pidfile_test
root 91273 0.0 0.1 14496 1920 - I  5:09PM 0:00.00 /usr/tests/lib/libutil/pidfile_test
root 87915 0.0 0.3 31876 5848 0 I+ 5:09PM 0:00.01 kyua test pidfile_test
root  3196 0.0 0.1 18832 2232 1 S+ 5:11PM 0:00.00 grep pidfile
root@f10:~ # pgrep pidfile_test
91273
90967
88443
23493
root@f10:~ # foreach i ( `pgrep pidfile_test` )
foreach? procstat -kk $i
foreach? end
PID TID COMM TDNAME KSTACK
91273 100094 pidfile_test - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb
PID TID COMM TDNAME KSTACK
90967 100092 pidfile_test - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_wait6+0x5de sys_wait4+0x72 amd64_syscall+0x40f Xfast_syscall+0xfb
PID TID COMM TDNAME KSTACK
88443 100091 pidfile_test - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb
PID TID COMM TDNAME KSTACK
23493 100051 pidfile_test - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb
root@f10:~ #
root@f10:/usr/tests/lib/libutil # uname -a FreeBSD f10.3r 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
Additional info: my setup is a bhyve VM with 2 CPUs, 2 GB of RAM, and a 20 GB disk image. I ran it on two hosts, a Haswell-based i5-4670 and a Skylake-based i7-6700. I use the vmrun.sh script from the installed examples on the Skylake machine and iohyve on the Haswell machine.
The higher the kern.randompid value, the more likely the hang becomes.
I can reproduce this bug on head r305285M, also in bhyve. Attaching gdb shows the parent stuck in waitpid() on line 215 and the child in select() on line 172. The bug seems to be in the test itself: sometimes the child takes very long to reach the select() call, the signal has already been delivered by then, and the child blocks indefinitely. A possible fix would be to block SIGINT and wait for it with sigwait(), instead of installing an empty handler.
Fixed in head by br@FreeBSD.org in SVN r306098.
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
A commit references this bug:

Author: emaste
Date: Wed Jul 4 18:01:53 UTC 2018
New revision: 335964
URL: https://svnweb.freebsd.org/changeset/base/335964

Log:
  MFC r306098 (br):

  Use kqueue(2) instead of select(2).

  This helps to ensure we will not lose SIGINT sent by parent to child.

  PR: 212562, 228492

Changes:
_U  stable/11/
  stable/11/lib/libutil/tests/pidfile_test.c
A commit references this bug:

Author: emaste
Date: Wed Jul 4 18:03:20 UTC 2018
New revision: 335965
URL: https://svnweb.freebsd.org/changeset/base/335965

Log:
  MFC r306098 (br):

  Use kqueue(2) instead of select(2).

  This helps to ensure we will not lose SIGINT sent by parent to child.

  PR: 212562, 228492

Changes:
_U  stable/10/
  stable/10/lib/libutil/tests/pidfile_test.c