poudriere jail -k doesn't seem to be doing anything. A few months ago, it would not start new builds but would never actually kill the jails. More recently it simply doesn't do anything.

Example invocation: 'poudriere jail -k -j 11-0-i386 -p local'

$ poudriere version
3.1.19

11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
~/git/poudriere # ./poudriere -e /usr/local/etc jail -j exp-10amd64 -k
[00:00:00] ====>> Jail exp-10amd64-default not running, but cleaning up anyway
[00:00:00] ====>> Unmounting file systems

What output do you get?
Nothing. After starting jail -k I manually kill all bulk.sh scripts and then poudriere cleans up the mounts and jails.
Can you please run with -x? 'poudriere -x jail ...' and host the output somewhere?
Without manual intervention (i.e. poudriere -x jail -k still hangs): http://stereo.hq.phicoh.net/~philip/freebsd-bugs/220162/poudriere-x-jail-k.txt
Hangs? That's the first you've mentioned a hang...
As in poudriere waits forever for pwait to finish.
What's the output of:

cat /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/01.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/02.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/03.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/04.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/05.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/06.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/07.pid \
    /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/08.pid
(In reply to Bryan Drewery from comment #7)
> What's the output of:
>
> cat /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/01.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/02.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/03.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/04.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/05.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/06.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/07.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/08.pid

And are any of the pids in them running? Check ps. Don't kill them though, I'm betting there is a pid-reuse problem here.
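For anyone repeating this check, the cat and ps steps can be combined. The following is only a minimal sketch: check_pidfiles is a hypothetical helper name, not part of poudriere, and it assumes a ps that accepts '-p PID -o command=' (FreeBSD's does). It only reports and never sends a signal.

```sh
#!/bin/sh
# Hypothetical helper (not part of poudriere): for each pid file given as an
# argument, report whether the recorded PID is alive and what it is running.
check_pidfiles() {
    for pidfile in "$@"; do
        if [ ! -r "$pidfile" ]; then
            echo "$pidfile: unreadable"
            continue
        fi
        # Take the first line of the file as the PID.
        read -r pid < "$pidfile"
        # ps exits non-zero and prints nothing when the PID does not exist.
        if cmd=$(ps -p "$pid" -o command= 2>/dev/null) && [ -n "$cmd" ]; then
            echo "$pidfile: $pid running: $cmd"
        else
            echo "$pidfile: $pid not running"
        fi
    done
}
```

With the paths from this report it could be called as 'check_pidfiles /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/*.pid'. A PID that is running something other than bulk.sh would be a hint of the pid-reuse problem mentioned above.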
I guess I never waited long enough. poudriere jail -k does terminate. It just took about 1.5 hours.

I started a new one.

Here is the output of the cat command:
43278
43357
43381
43405
43421
43436
43465
57613

# ps ax | egrep '43278|43357|43381|43405|43421|43436|43465|57613'
45461  2  I+   0:00.00 pwait 43278 43357 43381 45384 43405 43421 43436 434
77393  3  S+   0:00.00 egrep 43278|43357|43381|43405|43421|43436|43465|576
43278 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43357 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43381 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43405 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43421 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43436 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43465 26  S+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
57613 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
So it seems that jail -k just waits for the builds that were running when jail -k was invoked to finish on their own. While this is going on, new builds get started. Then, when the last of those existing ones has finished, it actually cleans up and kills the jails.

For my use I need jail -k to actually kill the current builds and clean up as soon as possible.
(In reply to Philip Homburg from comment #9)
> I guess I never waited long enough. poudriere jail -k does terminate. It
> just took about 1.5 hours.
>
> I started a new one.
>
> Here is the output of the cat command:
> 43278
> 43357
> 43381
> 43405
> 43421
> 43436
> 43465
> 57613
>
> # ps ax | egrep '43278|43357|43381|43405|43421|43436|43465|57613'
> 45461  2  I+   0:00.00 pwait 43278 43357 43381 45384 43405 43421 43436 434
> 77393  3  S+   0:00.00 egrep 43278|43357|43381|43405|43421|43436|43465|576
> 43278 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43357 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43381 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43405 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43421 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43436 26  I+   0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43465 26  S+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 57613 26  I+   0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3

Perfect, thanks. The 'jail -k' case is missing sending a 'pkill' to all of the processes, so it just ends up waiting for the current build to finish. This is done properly in the bulk cleanup case though. I'll get it fixed.
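To illustrate the difference between "wait only" and "signal, then wait", here is a rough sketch of the second pattern in plain sh. It is not poudriere's code and not the eventual fix; term_and_wait is a hypothetical name, and it polls with 'kill -0' rather than using pwait so that it does not depend on FreeBSD.

```sh
#!/bin/sh
# Hypothetical sketch, not poudriere's code: signal first, then wait.
# Usage: term_and_wait TIMEOUT_SECONDS PID...
term_and_wait() {
    timeout=$1
    shift
    # Ask every listed process to exit; ignore PIDs that are already gone.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || :
    done
    # Poll once per second until all are gone or the timeout expires.
    while [ "$timeout" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Escalate for anything that ignored SIGTERM.
    for pid in "$@"; do
        kill -KILL "$pid" 2>/dev/null || :
    done
    return 1
}
```

Without the first loop, the function would behave like the hang reported here: it would simply wait until the listed processes finish on their own.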
(In reply to Philip Homburg from comment #10)
> So it seems that jail -k just waits for the builds that were running when
> jail -k was invoked to finish on their own. While this is going on, new
> builds get started. Then, when the last of those existing ones has finished,
> it actually cleans up and kills the jails.
>
> For my use I need jail -k to actually kill the current builds and clean up
> as soon as possible.

Yup that is the intention - that -k kills everything right away.
This patch should be a workaround until I get something committed. I have not tested it. Apply to /usr/local/share/poudriere/jail.sh: https://people.freebsd.org/~bdrewery/patches/poudriere-jail-k.diff
After applying the patch I now have a few instances of bulk.sh left.

0 47196 47195 0 52 0 73708 11888 select I+ 26 0:00.02 /usr/local/bin/python2.7 /home/deploy/.ansible/tmp/ansible-tmp-1497994106.81-9221376935427/command.py
0 47197 47196 0 20 0 84864 14764 select S+ 26 0:00.11 /usr/local/bin/python2.7 /tmp/ansible_pTOw7o/ansible_module_command.py
0 47198 47197 0 20 0  8452  3320 select I+ 26 0:01.29 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
0 47408 47198 0 52 0  8452  3100 nanslp S+ 26 0:03.46 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
0 47413 47198 0 52 0  8452  3096 piperd I+ 26 0:00.00 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list

No idea though what 47408 and 47413 are waiting for.
(In reply to Philip Homburg from comment #14)
> After applying the patch I now have a few instances for bulk.sh left.
>
> 0 47196 47195 0 52 0 73708 11888 select I+ 26 0:00.02 /usr/local/bin/python2.7 /home/deploy/.ansible/tmp/ansible-tmp-1497994106.81-9221376935427/command.py
> 0 47197 47196 0 20 0 84864 14764 select S+ 26 0:00.11 /usr/local/bin/python2.7 /tmp/ansible_pTOw7o/ansible_module_command.py
> 0 47198 47197 0 20 0  8452  3320 select I+ 26 0:01.29 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
> 0 47408 47198 0 52 0  8452  3100 nanslp S+ 26 0:03.46 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
> 0 47413 47198 0 52 0  8452  3096 piperd I+ 26 0:00.00 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
>
> No idea though what 47408 and 47413 are waiting for.

Well, 'jail -k' is not really intended to kill an active 'poudriere bulk', only an orphaned one that has crashed, or really its jail/mounts. So with the patch I've given it does clean up the jail/mounts, but it leaves behind processes that are out of scope for jail -k.

Why do you want to kill an active bulk like this? If you're running it from another script you'll have the main poudriere bulk PID from there and can kill it directly.
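As an illustration of the "kill it directly" suggestion, a calling script could record the PID itself. This is a minimal sketch under assumptions: start_recorded and stop_recorded are hypothetical helper names, the pid file location is up to the caller, and how poudriere bulk reacts to SIGTERM is not something this sketch verifies.

```sh
#!/bin/sh
# Hypothetical wrapper helpers; names and pid file location are illustrative.

# Start a command in the background and record its PID in PIDFILE.
# Usage: start_recorded PIDFILE COMMAND [ARGS...]
start_recorded() {
    pidfile=$1
    shift
    "$@" &
    echo "$!" > "$pidfile"
}

# Send SIGTERM to the PID recorded in PIDFILE, then remove the pid file.
# Returns non-zero if the file is missing or the signal could not be sent.
stop_recorded() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    read -r pid < "$pidfile" || return 1
    kill -TERM "$pid" 2>/dev/null
    status=$?
    rm -f "$pidfile"
    return "$status"
}
```

For the run shown above this might look like 'start_recorded /var/run/poudriere-bulk.pid poudriere bulk -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list', followed later by 'stop_recorded /var/run/poudriere-bulk.pid'. The pid file path in that example is made up.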
And thinking about it more, I'm not sure I want to commit the patch I've provided. If you reboot between a 'bulk' and running 'jail -k', poudriere would suddenly be killing random PIDs from the jail that are no longer relevant. That could end very badly.
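One common way to reduce this kind of risk is to confirm what a recorded PID is running before signalling it. The sketch below shows that guard only; it is not the patch above and not poudriere's approach, kill_if_matches is a hypothetical name, and a small window between the check and the kill remains, so it narrows the problem rather than removing it.

```sh
#!/bin/sh
# Hypothetical guard against stale PIDs: only signal PID if its current
# command line contains PATTERN.
# Usage: kill_if_matches PID PATTERN
kill_if_matches() {
    pid=$1
    pattern=$2
    # ps fails when the PID no longer exists; nothing to kill in that case.
    cmd=$(ps -p "$pid" -o command= 2>/dev/null) || return 1
    case $cmd in
        *"$pattern"*)
            kill -TERM "$pid"
            ;;
        *)
            echo "skipping $pid: command does not match '$pattern'" >&2
            return 1
            ;;
    esac
}
```

With the pid files from this report, a call such as 'kill_if_matches 43278 bulk.sh' would leave an unrelated process that happened to inherit PID 43278 alone.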
Recently, as in somewhere in the last 6 months or so, poudriere started using insane amounts of I/O bandwidth. Somehow build dependencies got a lot more expensive. I haven't looked into why. In the past, a poudriere run was fast enough that I could start it during the night and it would be finished in the morning. Now it takes forever.

Sometimes I need the machine for something else, so I need to kill the poudriere run. Which is fine, because at the next run it will continue just fine.

I start poudriere through ansible, which doesn't seem to propagate ^C properly. And in the past jail -k worked fine.
Do I understand right, and this is solved? Could I close it?
(In reply to w.schwarzenfeld from comment #18)
> Do I understand right, and this is solved? Could I close it?

No and no.
A commit references this bug:

Author: bdrewery
Date: Wed Jan 24 20:58:21 UTC 2018
New revision: 459889
URL: https://svnweb.freebsd.org/changeset/ports/459889

Log:
  - Provide a compatibility cppunit-config. Upstream intends scripts to use pkg-config now, but there are plenty of old cppunit.m4 files that expect to find cppunit-config still, including several ports.

  PR: 220162
  Reported by: Greg V <greg@unrelenting.technology>

Changes:
  head/devel/cppunit/Makefile
  head/devel/cppunit/files/cppunit-config.in
  head/devel/cppunit/pkg-plist
Anyway, -k for jail is to stop the jail started with -s, isn't it? For that matter, the bug is that SYNOPSIS in the man poudriere-jail seems to be wrong :)