We are observing the latest version of pkg hang when a process spawned by a package script keeps running after the script itself exits.
Inspecting it with GDB shows the pkg-static process stuck in poll() on FD #7:
#0 0x000000000084ba1a in _poll ()
#1 0x00000000007fad26 in __thr_poll ()
#2 0x00000000005e7303 in pkg_script_run (pkg=0x801079000, type=PKG_SCRIPT_POST_INSTALL, upgrade=false) at scripts.c:248
#3 0x0000000000469574 in pkg_add_port (db=0x801078700, pkg=0x801079000, input_path=0x7fffffffe7b7 "/tmp/media/ssp_port/work/stage", reloc=0x0, testing=false)
#4 0x0000000000416535 in exec_register (argc=7, argv=0x7fffffffe3e8) at register.c:195
#5 0x0000000000411202 in main (argc=7, argv=0x7fffffffe3e8) at main.c:886
Checking the file descriptor table with lsof then shows the following. The stray denyhosts process is the culprit, but it's unclear why it should hold a socket (lsof reports it as "unix") connected all the way back to pkg-static:
[sobomax@builder ~]$ ps -xalww | grep -w 33903
0 33903 33901 0 52 0 27020 22276 select I - 0:02.54 /usr/local/sbin/pkg-static register -i /tmp/media/ssp_port/work/stage -m /tmp/media/ssp_port/work/.metadir -f /tmp/media/ssp_port/work/.PLIST.mktmp
0 33923 33903 0 73 0 0 0 - Z - 0:02.16 <defunct>
0 34324 33903 0 75 0 0 0 - Z - 0:00.00 <defunct>
0 34325 33903 0 20 0 21096 17760 select S - 0:00.01 /usr/local/bin/python2.7 /usr/local/bin/denyhosts.py --config /usr/local/etc/denyhosts.conf --noemail --daemon
70 34361 33903 0 73 0 0 0 - Z - 0:05.03 <defunct>
[sobomax@builder ~]$ sudo lsof | grep -w 0xfffff8017cc32000
pkg-stati 33903 root 7u unix 0xfffff8017cc32000 0t0 ->0xfffff80480a06000
python2.7 34325 root 4u unix 0xfffff80480a06000 0t0 ->0xfffff8017cc32000
I believe this is a direct result of the single change here:
That commit also left some comments on how this could possibly be fixed.
Created attachment 213235
Patch to fix the issue
The following patch seems to solve the problem for us by returning pkg to its previous behavior, which was to complete as soon as the child exits.
I hit this problem today. Is there any progress, or was this an intentional change?
Also, I see the same code in libpkg/lua_scripts.c. Maybe it needs the same patch?
(In reply to Jung-uk Kim from comment #2)
Jung-uk, yes, I think so. I overlooked that.
Sorry I missed this issue. Do you have an example script that reproduces it? I would like to add a regression test.
A commit references this bug:
Date: Wed Apr 29 07:32:01 UTC 2020
New revision: 533325
Update to 1.14.4
- fix a hang in pkg scripts
Submitted by: sobomax
Reported by: sobomax, jkim
I still see the same problem with 1.14.4. The committed patch is a little different from the original one.
Basically, sobomax's patch made poll(2) time out, but the committed version does not. Was that intentional?
A new pull request has been filed.