Bug 220162 - ports-mgmt/poudriere: jail -k doesn't work
Summary: ports-mgmt/poudriere: jail -k doesn't work
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Bryan Drewery
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-20 13:24 UTC by Philip Homburg
Modified: 2018-01-24 20:59 UTC
CC List: 1 user

See Also:
bugzilla: maintainer-feedback? (bdrewery)


Description Philip Homburg 2017-06-20 13:24:58 UTC
poudriere jail -k doesn't seem to do anything. A few months ago it would stop new builds from starting but never actually killed the jails. More recently it simply does nothing.

Example invocation: 'poudriere jail -k -j 11-0-i386 -p local'

$ poudriere version
3.1.19

11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 1 Bryan Drewery freebsd_committer 2017-06-20 16:30:44 UTC
~/git/poudriere # ./poudriere -e /usr/local/etc jail -j exp-10amd64 -k
[00:00:00] ====>> Jail exp-10amd64-default not running, but cleaning up anyway
[00:00:00] ====>> Unmounting file systems


What output do you get?
Comment 2 Philip Homburg 2017-06-20 16:40:57 UTC
Nothing. 

After starting jail -k I manually kill all bulk.sh scripts and then poudriere cleans up the mounts and jails.
Comment 3 Bryan Drewery freebsd_committer 2017-06-20 17:02:12 UTC
Can you please run with -x? 'poudriere -x jail ...' and host the output somewhere?
Comment 4 Philip Homburg 2017-06-20 17:40:20 UTC
Without manual intervention (i.e. poudriere -x jail -k still hangs):

http://stereo.hq.phicoh.net/~philip/freebsd-bugs/220162/poudriere-x-jail-k.txt
Comment 5 Bryan Drewery freebsd_committer 2017-06-20 19:27:56 UTC
Hangs? That's the first you've mentioned a hang...
Comment 6 Philip Homburg 2017-06-20 19:31:22 UTC
As in, poudriere waits forever for pwait to finish.
Comment 7 Bryan Drewery freebsd_committer 2017-06-20 19:44:15 UTC
What's the output of:

cat /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/01.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/02.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/03.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/04.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/05.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/06.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/07.pid /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/08.pid
Comment 8 Bryan Drewery freebsd_committer 2017-06-20 19:59:56 UTC
(In reply to Bryan Drewery from comment #7)
> What's the output of:
> 
> cat /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/01.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/02.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/03.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/04.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/05.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/06.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/07.pid
> /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/08.pid

And are any of the pids in them running? Check ps.

Don't kill them though, I'm betting there is a pid-reuse problem here.
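
A minimal sketch of that check, for anyone following along: it reads each pidfile and reports whether a process with that PID currently exists, without signalling it. `check_pidfiles` is a hypothetical helper name, not part of poudriere; the pidfile paths are the ones listed in comment 7. Given the pid-reuse concern, a "running" result only shows that the PID exists, not that it still belongs to a builder.

```shell
#!/bin/sh
# Hypothetical helper, not part of poudriere.
# For every readable pidfile given, print the recorded PID and whether
# a process with that PID currently exists.
check_pidfiles() {
    for pidfile in "$@"; do
        [ -r "$pidfile" ] || continue
        pid=$(cat "$pidfile")
        # kill -0 delivers no signal; it only tests whether the PID can
        # be signalled. Run it as root so that a permission error is not
        # misread as "not running".
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid running"
        else
            echo "$pid not running"
        fi
    done
}

# Example (paths from comment 7):
# check_pidfiles /usr/local/poudriere/data/.m/11-0-i386-local/ref/.p/var/run/*.pid
```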
Comment 9 Philip Homburg 2017-06-20 20:04:09 UTC
I guess I never waited long enough. poudriere jail -k does terminate. It just took about 1.5 hours.

I started a new one.

Here is the output of the cat command:
43278
43357
43381
43405
43421
43436
43465
57613

# ps ax | egrep '43278|43357|43381|43405|43421|43436|43465|57613'
45461  2  I+        0:00.00 pwait 43278 43357 43381 45384 43405 43421 43436 434
77393  3  S+        0:00.00 egrep 43278|43357|43381|43405|43421|43436|43465|576
43278 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43357 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43381 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43405 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43421 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43436 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
43465 26  S+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
57613 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
Comment 10 Philip Homburg 2017-06-20 20:12:00 UTC
So it seems that jail -k just waits for the builds that were running when jail -k was invoked to finish on their own. While this is going on, new builds get started. Then, when the last of those existing ones is finished, it actually cleans up and kills the jails.

For my use I need jail -k to actually kill the current builds and clean up as soon as possible.
Comment 11 Bryan Drewery freebsd_committer 2017-06-20 20:14:20 UTC
(In reply to Philip Homburg from comment #9)
> I guess I never waited long enough. poudriere jail -k does terminate. It
> just took about 1.5 hours.
> 
> I started a new one.
> 
> Here is the output of the cat command:
> 43278
> 43357
> 43381
> 43405
> 43421
> 43436
> 43465
> 57613
> 
> # ps ax | egrep '43278|43357|43381|43405|43421|43436|43465|57613'
> 45461  2  I+        0:00.00 pwait 43278 43357 43381 45384 43405 43421 43436 434
> 77393  3  S+        0:00.00 egrep 43278|43357|43381|43405|43421|43436|43465|576
> 43278 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43357 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43381 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43405 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43421 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43436 26  I+        0:00.01 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 43465 26  S+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3
> 57613 26  I+        0:00.02 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-i3

Perfect, thanks.

The 'jail -k' case never sends a 'pkill' to the processes, so it just ends up waiting for the current build to finish.
The bulk cleanup case does this properly.

I'll get it fixed.
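
The difference between waiting and killing can be sketched as follows. This is an illustration only, not the poudriere code or the eventual fix: `term_and_wait` is a hypothetical helper, and it uses plain `kill` with a polling loop instead of `pkill` and `pwait` so that it stays portable.

```shell
#!/bin/sh
# Hypothetical helper, not poudriere code.
# Usage: term_and_wait TIMEOUT_SECONDS PID...
# Sends SIGTERM to every PID first, then polls until all of them are
# gone or the timeout expires. Returns 0 if all exited, 1 otherwise.
term_and_wait() {
    timeout=$1
    shift
    # Step 1: signal. Without this step the loop below merely waits
    # for the processes to finish on their own.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || :
    done
    # Step 2: wait, checking once per second.
    while :; do
        alive=0
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        [ "$alive" -eq 0 ] && return 0
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
}
```

The timeout is a design choice of the sketch: it keeps the caller from blocking indefinitely if a process ignores SIGTERM.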
Comment 12 Bryan Drewery freebsd_committer 2017-06-20 20:14:46 UTC
(In reply to Philip Homburg from comment #10)
> So it seems that jail -k just waits for the builds that were running when
> jail -k was invoked to finish on their own. While this is going on, new
> builds get started. Then, when the last of those existing ones is finished,
> it actually cleans up and kills the jails.
> 
> For my use I need jail -k to actually kill the current builds and clean up
> as soon as possible.

Yup that is the intention - that -k kills everything right away.
Comment 13 Bryan Drewery freebsd_committer 2017-06-20 20:16:31 UTC
This patch should be a workaround until I get something committed. I have not tested it. Apply to /usr/local/share/poudriere/jail.sh:

https://people.freebsd.org/~bdrewery/patches/poudriere-jail-k.diff
Comment 14 Philip Homburg 2017-06-20 21:51:35 UTC
After applying the patch I now have a few instances of bulk.sh left.

   0 47196 47195   0  52  0   73708 11888 select   I+   26       0:00.02 /usr/local/bin/python2.7 /home/deploy/.ansible/tmp/ansible-tmp-1497994106.81-9221376935427/command.py
   0 47197 47196   0  20  0   84864 14764 select   S+   26       0:00.11 /usr/local/bin/python2.7 /tmp/ansible_pTOw7o/ansible_module_command.py
   0 47198 47197   0  20  0    8452  3320 select   I+   26       0:01.29 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
   0 47408 47198   0  52  0    8452  3100 nanslp   S+   26       0:03.46 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
   0 47413 47198   0  52  0    8452  3096 piperd   I+   26       0:00.00 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list

No idea though what 47408 and 47413 are waiting for.
Comment 15 Bryan Drewery freebsd_committer 2017-06-20 22:54:46 UTC
(In reply to Philip Homburg from comment #14)
> After applying the patch I now have a few instances of bulk.sh left.
> 
>    0 47196 47195   0  52  0   73708 11888 select   I+   26       0:00.02 /usr/local/bin/python2.7 /home/deploy/.ansible/tmp/ansible-tmp-1497994106.81-9221376935427/command.py
>    0 47197 47196   0  20  0   84864 14764 select   S+   26       0:00.11 /usr/local/bin/python2.7 /tmp/ansible_pTOw7o/ansible_module_command.py
>    0 47198 47197   0  20  0    8452  3320 select   I+   26       0:01.29 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
>    0 47408 47198   0  52  0    8452  3100 nanslp   S+   26       0:03.46 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
>    0 47413 47198   0  52  0    8452  3096 piperd   I+   26       0:00.00 sh -e /usr/local/share/poudriere/bulk.sh -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
> 
> No idea though what 47408 and 47413 are waiting for.

Well, 'jail -k' is not really intended to kill an active 'poudriere bulk', only to clean up an orphaned one that has crashed, or really just its jail and mounts. So with
the patch I've given, it does clean up the jail and mounts, but it leaves behind
processes that are out of scope for jail -k.

Why do you want to kill an active bulk like this?  If you're running it
from another script you'll have the main poudriere bulk PID from there and
can kill it directly.
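
If the caller starts the build itself, keeping the PID is straightforward. A minimal sketch, assuming the main process cleans up after itself on SIGTERM (something this thread does not verify); `start_recorded`, `stop_recorded` and the pidfile path are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper functions, not part of poudriere.

# Usage: start_recorded PIDFILE COMMAND [ARGS...]
# Start COMMAND in the background and record its PID in PIDFILE.
start_recorded() {
    pidfile=$1
    shift
    "$@" &
    echo $! > "$pidfile"
}

# Usage: stop_recorded PIDFILE
# Send SIGTERM to the recorded PID and remove the pidfile on success.
stop_recorded() {
    pidfile=$1
    kill -TERM "$(cat "$pidfile")" && rm -f "$pidfile"
}

# Example (the pidfile path is an assumption; the bulk arguments are
# the ones visible in the ps output in comment 14):
# start_recorded /var/run/poudriere-bulk.pid \
#     poudriere bulk -j 11-0-amd64 -p local -f /usr/local/etc/poudriere.d/port-list
# stop_recorded /var/run/poudriere-bulk.pid
```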
Comment 16 Bryan Drewery freebsd_committer 2017-06-20 22:55:45 UTC
And thinking about it more, I'm not sure I want to commit the patch I've provided.
If you reboot between a 'bulk' and running 'jail -k', poudriere would suddenly
be killing random PIDs from the jail that are no longer relevant.  That
could end very badly.
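
One way to reduce that risk is to confirm what a recorded PID is actually running before signalling it. The following is only a sketch of that idea, not something the patch does: `kill_if_matches` is a hypothetical helper, and matching on a substring of the command line is an assumption that narrows the window for pid reuse without closing it.

```shell
#!/bin/sh
# Hypothetical guard, not poudriere code.
# Usage: kill_if_matches PID EXPECTED_SUBSTRING
# Sends SIGTERM to PID only if its current command line contains
# EXPECTED_SUBSTRING. Returns 1 if the PID is gone or does not match.
kill_if_matches() {
    pid=$1
    expected=$2
    # -o args= prints only the command line, with no header;
    # -ww asks ps not to truncate it to the terminal width.
    args=$(ps -ww -o args= -p "$pid" 2>/dev/null) || return 1
    case "$args" in
        *"$expected"*)
            kill -TERM "$pid"
            ;;
        *)
            echo "PID $pid is not running '$expected'; skipping" >&2
            return 1
            ;;
    esac
}

# Example: only signal PID 43278 if it is still a bulk.sh process.
# kill_if_matches 43278 /usr/local/share/poudriere/bulk.sh
```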
Comment 17 Philip Homburg 2017-06-21 09:02:23 UTC
Recently, as in somewhere in the last 6 months or so, poudriere started using insane amounts of I/O bandwidth. Somehow build dependencies got a lot more expensive. I haven't looked into why. 

In the past, a poudriere run was fast enough that I could start it during the night and it would be finished in the morning. Now it takes forever.

Sometimes I need the machine for something else, so I need to kill the poudriere run. Which is fine, because at the next run it will continue just fine.

I start poudriere through ansible, which doesn't seem to propagate ^C properly.
And in the past jail -k worked fine.
Comment 18 Walter Schwarzenfeld freebsd_triage 2018-01-19 03:27:36 UTC
Do I understand right, and this is solved? Could I close it?
Comment 19 Bryan Drewery freebsd_committer 2018-01-19 22:24:51 UTC
(In reply to w.schwarzenfeld from comment #18)
> Do I understand right, and this is solved? Could I close it?

No and no.
Comment 20 commit-hook freebsd_committer 2018-01-24 20:59:08 UTC
A commit references this bug:

Author: bdrewery
Date: Wed Jan 24 20:58:21 UTC 2018
New revision: 459889
URL: https://svnweb.freebsd.org/changeset/ports/459889

Log:
  - Provide a compatibility cppunit-config.
    Upstream intends scripts to use pkg-config now, but there are plenty of old
    cppunit.m4 files that expect to find cppunit-config still, including
    several ports.

  PR:		220162
  Reported by:	Greg V <greg@unrelenting.technology>

Changes:
  head/devel/cppunit/Makefile
  head/devel/cppunit/files/cppunit-config.in
  head/devel/cppunit/pkg-plist