When building devel/ice 3.6.0 on amd64 (10.1-RELEASE) via poudriere, the build hangs at:

*** test started: 07/21/15 14:06:12
*** using Ice source dist (64bit)
starting icegrid registry... ok
starting icegrid node... ok
starting glacier2... ok
testing login with username/password... ok
testing commands... ok
stopping glacier2...
make: Working in: /usr/ports/devel/ice

Disabling the test is a workaround, but clearly not ideal.
Hi Steven,

As I couldn't reproduce the bug yet, could you try running the unit test manually (e.g. by building with poudriere testport -I) to see whether it fails every time? After building you would run:

cd .../ice/work/ice-3.6.0/cpp/test/IceGrid/admin
while true; do /usr/local/bin/python2.7 run.py; done

I hope the failures are caused by some sort of resource shortage (a few other builds failed at different places). Since I can't reproduce these failures yet, I will probably have to make Ice's unit tests more verbose when something fails (in this case glacier2 seems to get stuck on shutdown for unknown reasons). Unfortunately this could be a lot of work and I don't have time to address it in the next couple of days, so I'm tempted to disable the unit tests for the time being so that packages get built. If the underlying issue turns out to be severe, this would be a bad idea, though.
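The while true loop above reruns the test forever and, if a run hangs, simply sits there without reporting anything. A minimal Python 3 sketch that stops at the first failing run and treats a run exceeding a time limit as a hang might look like this; the function name run_until_failure and the 600-second default are assumptions, not part of Ice's test suite.

```python
import subprocess
import time


def run_until_failure(cmd, max_runs=1000, timeout=600.0, cwd=None):
    """Run cmd repeatedly until it fails, hangs, or max_runs is reached.

    Returns (run_number, outcome) where outcome is "ok" (every run exited 0),
    "failed:<exit code>", or "hung" (the run exceeded `timeout` seconds).
    The 600 s default is an assumption; choose a value well above the
    test's normal duration.
    """
    for run in range(1, max_runs + 1):
        start = time.monotonic()
        try:
            result = subprocess.run(cmd, cwd=cwd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the direct child before re-raising, but
            # grandchildren (e.g. servers started by run.py) may keep running.
            return run, "hung"
        elapsed = time.monotonic() - start
        print("run %d: exit status %d after %.1f s" % (run, result.returncode, elapsed))
        if result.returncode != 0:
            return run, "failed:%d" % result.returncode
    return max_runs, "ok"
```

For example, from cpp/test/IceGrid/admin: run_until_failure(["/usr/local/bin/python2.7", "run.py"]). Killing run.py on a timeout may leave the icegrid and glacier2 processes it started still running, so they may need to be cleaned up by hand before another attempt.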
Added Roger in case he's interested.
I'll see if I can reproduce and let you know if I can.
Unable to reproduce on bare metal in a 10.1-RELEASE jail. I'll try in various other VMs to see if I can make it happen.
No success reproducing in a 1 CPU VM either, I'm afraid.
There are two more tests that occasionally break or hang on package builders:

cpp/test/Ice/hold (most of the time)
cpp/test/Glacier2/override (sometimes)

In both cases the failing step operates on oneway proxies (which might have issues with packet loss). Test 85 (Glacier2/override) also involves callbacks and sleeps, so it might be timing sensitive (but I couldn't reproduce it even when changing those sleep statements). Test 85 also involves calls to hold and activate, which it has in common with the Ice/hold test. I can't reproduce any of these issues, though, so my guess is that it's some sort of resource exhaustion and/or networking issue (maybe the ports are used by some other process: all tests use ports around 12000, and AFAIK poudriere build jails share their IP addresses with other processes and build jails).

*** running tests 33/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold
*** configuration: Default
*** test started: 07/14/15 02:04:21
*** using Ice source dist (64bit)
starting server... ok
starting client... ok
testing stringToProxy... ok
testing checked cast... ok
changing state between active and hold rapidly... ok
testing without serialize mode... ok
testing with serialize mode...
-! 07/14/15 02:05:21.741 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/client: warning: connection exception:
ConnectionI.cpp:2048: Ice::TimeoutException:
timeout while sending or receiving data
local address = 127.0.0.1:49519
remote address = 127.0.0.1:12011
failed!
AllTests.cpp:169: assertion `cond->value()' failed
unexpected exit status: expected: 0, got -6
-! 07/14/15 02:05:21.880 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/server: warning: connection exception:
StreamSocket.cpp:202: Ice::ConnectionLostException:
connection lost: recv() returned zero
local address = 127.0.0.1:12011
remote address = 127.0.0.1:49872
-! 07/14/15 02:05:21.880 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/server: warning: connection exception:
StreamSocket.cpp:202: Ice::ConnectionLostException:
connection lost: recv() returned zero
local address = 127.0.0.1:12010
remote address = 127.0.0.1:49518
('test in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold failed with exit status', 1)
*** Error code 1

Stop.
make: stopped in /usr/ports/devel/ice

--------------------

*** running tests 85/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/override
*** configuration: Default
*** test started: 07/14/15 07:40:04
*** using Ice source dist (64bit)
starting router in buffered mode... ok
starting server... ok
starting client... ok
testing client request override... ok
testing server request override...
====>> Killing runaway build after 7200 seconds with no output
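To check the port-collision theory, one could verify inside the build jail, before the tests start, whether anything already occupies the ports the tests use. A minimal Python 3 sketch follows; it assumes the tests bind 127.0.0.1 and that trying to bind each port without SO_REUSEADDR is an adequate probe. The function names are hypothetical, and the exact port range to check (the logs above show 12010 and 12011) is an assumption.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails because the address is taken."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # any other error (e.g. EADDRNOTAVAIL) is not a port collision
    finally:
        s.close()
    return False


def busy_ports(ports, host="127.0.0.1"):
    """List the ports from `ports` that are already bound on host."""
    return [p for p in ports if port_in_use(p, host)]
```

For example, busy_ports(range(12000, 12030)) lists the ports in that (assumed) range that are already taken. Binding is used instead of connecting because it also catches sockets that are bound but not listening; a port lingering in TIME_WAIT from a previous run may also be reported as busy.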
A commit references this bug:

Author: grembo
Date: Sun Aug 2 13:47:43 UTC 2015
New revision: 393423
URL: https://svnweb.freebsd.org/changeset/ports/393423

Log:
  Add debug output in an attempt to figure out why certain tests fail on package builders.

  PR: 201743
  Approved by: mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
Failing here consistently using poudriere on 10.2-RELEASE-p2 amd64:

*** running tests 89/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/staticFiltering
*** configuration: Default
*** test started: 09/02/15 07:48:00
*** using Ice source dist (64bit)
testing category filter... ok
testing adapter id filter... helloF1 @ "an adapter with spaces"
Outgoing.cpp:535: Ice::UnknownLocalException:
unknown local exception:
Network.cpp:2369: Ice::ConnectionRefusedException:
connection refused: Connection refused
-! 09/02/15 07:48:01.739 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/bin/glacier2router: warning: dispatch exception: Network.cpp:2369: Ice::ConnectionRefusedException:
connection refused: Connection refused
identity: helloF1
facet:
operation: ice_ping
remote host: 127.0.0.1
remote port: 33301
failed!
Client.cpp:122: assertion `"Unexpected local exception" == 0' failed
unexpected exit status: expected: 0, got -6
('test in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/staticFiltering failed with exit status', 1)
*** Error code 1

Stop.
make: stopped in /usr/ports/devel/ice
@Vikash: Could you please provide the complete poudriere build log and add some more details about the system you're building on, especially whether it's bare metal or a VM, how many CPUs it has, its network configuration, etc.? I would like to reproduce the problem.
A commit references this bug:

Author: grembo
Date: Sun Sep 6 13:59:03 UTC 2015
New revision: 396218
URL: https://svnweb.freebsd.org/changeset/ports/396218

Log:
  Add timing information to debug output to understand if timeouts on package builders are real.

  PR: 201743
  Approved by: mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
(In reply to Michael Gmelin from comment #9)
Logs and dmesg are located at: http://enrage.where-ever.za.net/~vikashb/FreeBSD/BUG_201743/

Host: VM in OpenStack Kilo
RAM: 64 GB
CPUs: 8
Network: vtnet
Disks: vtbd[0-2]
(In reply to Vikash Badal from comment #11) poudriere-3.1.7
A commit references this bug:

Author: grembo
Date: Tue Sep 8 10:16:02 UTC 2015
New revision: 396360
URL: https://svnweb.freebsd.org/changeset/ports/396360

Log:
  Remove C++11 specific construct.

  PR: 201743
  Approved by: mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
@Vikash: Unfortunately I can't reproduce the issue here. Is there any chance you could give me temporary access to an environment that allows me to investigate this?
(In reply to Michael Gmelin from comment #14)
Please provide me with an SSH key and the IP address you will be connecting from.
@Vikash: Sent you a key and IP information via E-Mail.
@Vikash: Resent the email.
A commit references this bug:

Author: grembo
Date: Tue Sep 22 15:03:00 UTC 2015
New revision: 397542
URL: https://svnweb.freebsd.org/changeset/ports/397542

Log:
  Fix unit test in case hostname is not on a local interface

  PR: 201743
  Approved by: mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-staticFiltering-run.py
@Vikash: I was able to reproduce and fix the problem. Thank you for providing access to your host; you may revoke it now.

The issue was that this unit test resolves the local hostname to determine whether a more in-depth test should be done. Most poudriere builders use a local hostname, so the test won't try to execute those extra checks. In your case, the package builder has an FQDN configured in poudriere.conf that equals the server's hostname and resolves to an IP address that is not available in the build jail. When the test tried to access that IP address in the package build jail (in which the server processes only bind to 127.0.0.1), the connection was refused and the build failed. Ironically, the IP address is available in the jail you get dropped into after the build (-i), which is why it took a while to figure out what was going on.

The fix adds an extra check to see whether the IP addresses returned by gethostbyname are actually configured on a local interface (by calling ifconfig), so the test still runs as many checks as possible without breaking on setups like yours. Thanks again for your help.
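The idea behind the fix can be sketched in Python roughly as follows. This is not the actual patch (head/devel/ice/files/patch-cpp-test-Glacier2-staticFiltering-run.py); the function names are hypothetical, the parsing assumes FreeBSD-style "inet a.b.c.d" lines in the ifconfig output, and only IPv4 addresses are considered.

```python
import re
import socket
import subprocess


def local_ipv4_addresses(ifconfig_output):
    """Extract IPv4 addresses from `ifconfig` output ("inet a.b.c.d ..." lines).

    "inet6" lines are ignored because "inet" must be followed by whitespace.
    """
    return set(re.findall(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)",
                          ifconfig_output, re.MULTILINE))


def hostname_is_local(hostname=None, ifconfig_output=None):
    """True if every IPv4 address the hostname resolves to is on a local interface."""
    hostname = hostname or socket.gethostname()
    if ifconfig_output is None:
        # Assumption: ifconfig is on PATH, as inside a FreeBSD build jail.
        ifconfig_output = subprocess.check_output(["ifconfig"]).decode()
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # socket.herror / socket.gaierror: name does not resolve
        return False
    return set(addresses) <= local_ipv4_addresses(ifconfig_output)
```

On a builder like the one described above, where the FQDN resolves to an address that is not configured inside the build jail, hostname_is_local() returns False, so the extended checks would be skipped instead of failing with "connection refused".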
@Michael: I can confirm all is well. Thank you very much!