Bug 212562 - stucking kyua test (/usr/tests/lib/libutil) on 10.3-RELEASE
Summary: stucking kyua test (/usr/tests/lib/libutil) on 10.3-RELEASE
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2016-09-10 15:17 UTC by op
Modified: 2018-07-04 18:03 UTC
op: mfc-stable10?


Description op 2016-09-10 15:17:51 UTC
How to reproduce:

sysctl kern.randompid=2000
cd /usr/tests/lib/libutil
repeat 5000 kyua test pidfile_test

--8<--
root@f10:~ # uname -a
FreeBSD f10.3r 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
root@f10:~ # sysctl kern.randompid
kern.randompid: 0
root@f10:~ # cd /usr/tests/lib/libutil/
root@f10:/usr/tests/lib/libutil # 
root@f10:/usr/tests/lib/libutil # repeat 5000 kyua test pidfile_test

...

Results file id is usr_tests_lib_libutil.20160910-170441-426322
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-426322.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170441-439743
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-439743.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170441-453027
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-453027.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170441-466528
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-466528.db

...

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170441-479882
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-479882.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170441-493082
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170441-493082.db

1/1 passed (0 failed)

root@f10:/usr/tests/lib/libutil # sysctl kern.randompid=2000
kern.randompid: 0 -> 2000

root@f10:/usr/tests/lib/libutil # repeat 5000 kyua test pidfile_test

...

Results file id is usr_tests_lib_libutil.20160910-170636-767664
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-767664.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170636-780812
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-780812.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170636-794136
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-794136.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170636-807206
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-807206.db

1/1 passed (0 failed)
pidfile_test:main  ->  passed  [0.003s]

Results file id is usr_tests_lib_libutil.20160910-170636-820402
Results saved to /root/.kyua/store/results.usr_tests_lib_libutil.20160910-170636-820402.db

1/1 passed (0 failed)
pidfile_test:main  ->  

# the kyua test can hang here indefinitely (for a day or so...)
--8<--

--8<--
1/1 passed (0 failed)
pidfile_test:main  ->  ^C[-- Signal caught; please wait for cleanup --]
load: 0.08  cmd: kyua 87586 [wait] 132.82r 0.00u 0.00s 0% 5832k
^C[-- Signal caught; please wait for cleanup --]
broken: Caught unexpected exception: Tester received signal 6; this is a bug  [134.589s]
kyua: E: Interrupted by signal 2.

--8<--
Comment 1 op 2016-09-10 15:20:58 UTC
root@f10:~ # ps aux | grep pidfile
root  23493   0.0  0.1 14496 1920  -  I     5:06PM  0:00.00 /usr/tests/lib/libutil/pidfile_test
root  88443   0.0  0.1 14496 1920  -  I     5:06PM  0:00.00 /usr/tests/lib/libutil/pidfile_test
root  90967   0.0  0.1 14496 1916  -  Is    5:09PM  0:00.00 /usr/tests/lib/libutil/pidfile_test
root  91273   0.0  0.1 14496 1920  -  I     5:09PM  0:00.00 /usr/tests/lib/libutil/pidfile_test
root  87915   0.0  0.3 31876 5848  0  I+    5:09PM  0:00.01 kyua test pidfile_test
root   3196   0.0  0.1 18832 2232  1  S+    5:11PM  0:00.00 grep pidfile
root@f10:~ # pgrep pidfile_test
91273
90967
88443
23493
root@f10:~ # foreach  i ( `pgrep pidfile_test` )
foreach? procstat -kk $i
foreach? end
  PID    TID COMM             TDNAME           KSTACK                       
91273 100094 pidfile_test     -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb 
  PID    TID COMM             TDNAME           KSTACK                       
90967 100092 pidfile_test     -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_wait6+0x5de sys_wait4+0x72 amd64_syscall+0x40f Xfast_syscall+0xfb 
  PID    TID COMM             TDNAME           KSTACK                       
88443 100091 pidfile_test     -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb 
  PID    TID COMM             TDNAME           KSTACK                       
23493 100051 pidfile_test     -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x40f Xfast_syscall+0xfb 
root@f10:~ #
Comment 2 op 2016-09-10 15:25:04 UTC
root@f10:/usr/tests/lib/libutil # uname -a
FreeBSD f10.3r 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 3 op 2016-09-16 22:31:47 UTC
Additional info: my setup is a bhyve VM with 2 CPUs, 2 GB of RAM and a 20 GB
disk image, running on a Haswell-based i5-4670 and on a Skylake-based i7-6700.
I use the vmrun.sh script from the installed examples on the Skylake machine
and iohyve on the Haswell machine.
Comment 4 op 2016-09-16 22:33:08 UTC
With a higher kern.randompid value, the hang becomes more likely.
Comment 5 Jilles Tjoelker freebsd_committer freebsd_triage 2016-09-18 22:09:59 UTC
I can reproduce this bug on head r305285M, also in bhyve. Attaching gdb shows the parent to be stuck in the waitpid() on line 215 and the child in the select() on line 172.

The bug seems to be in the test. Sometimes the child takes so long to reach the select() call that the parent's signal has already been delivered before that. Nothing is left to interrupt select(), so the child blocks indefinitely.

A possible fix would be to block SIGINT instead of installing an empty handler, and then use sigwait() to wait for the signal.
Comment 6 Jilles Tjoelker freebsd_committer freebsd_triage 2016-09-27 22:01:54 UTC
Fixed in head by br@FreeBSD.org in SVN r306098.
Comment 7 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:50:20 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 8 commit-hook freebsd_committer freebsd_triage 2018-07-04 18:02:43 UTC
A commit references this bug:

Author: emaste
Date: Wed Jul  4 18:01:53 UTC 2018
New revision: 335964
URL: https://svnweb.freebsd.org/changeset/base/335964

Log:
  MFC r306098 (br): Use kqueue(2) instead of select(2).

  This helps to ensure we will not lose SIGINT sent by parent to child.

  PR:		212562, 228492

Changes:
_U  stable/11/
  stable/11/lib/libutil/tests/pidfile_test.c
Comment 9 commit-hook freebsd_committer freebsd_triage 2018-07-04 18:03:50 UTC
A commit references this bug:

Author: emaste
Date: Wed Jul  4 18:03:20 UTC 2018
New revision: 335965
URL: https://svnweb.freebsd.org/changeset/base/335965

Log:
  MFC r306098 (br): Use kqueue(2) instead of select(2).

  This helps to ensure we will not lose SIGINT sent by parent to child.

  PR:		212562, 228492

Changes:
_U  stable/10/
  stable/10/lib/libutil/tests/pidfile_test.c