Bug 201743 - devel/ice 3.6.0 tests hang
Status: Closed Unable to Reproduce
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Michael Gmelin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-21 14:31 UTC by Steven Hartland
Modified: 2015-12-27 00:10 UTC

See Also:
bugzilla: maintainer-feedback? (freebsd)


Description Steven Hartland freebsd_committer freebsd_triage 2015-07-21 14:31:21 UTC
When building devel/ice 3.6.0 on amd64 (10.1-RELEASE) via poudriere the build hangs at:

*** test started: 07/21/15 14:06:12
*** using Ice source dist (64bit) 
starting icegrid registry... ok
starting icegrid node... ok
starting glacier2... ok
testing login with username/password... ok
testing commands... ok
stopping glacier2... make: Working in: /usr/ports/devel/ice

Disabling the test is a workaround, but clearly not ideal.
Comment 1 Michael Gmelin freebsd_committer freebsd_triage 2015-07-24 12:43:23 UTC
Hi Steven,

As I haven't been able to reproduce the bug yet, could you try running the unit test manually (e.g. by building with poudriere testport -I) to see whether it fails every time?

After building, change into the test directory and run the test in a loop:

  cd .../ice/work/ice-3.6.0/cpp/test/IceGrid/admin
  while true; do /usr/local/bin/python2.7 run.py; done
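
The shell loop above keeps going until interrupted and never returns if a run hangs. A minimal Python sketch of the same idea with a per-run timeout follows. `run_repeatedly` and its parameters are hypothetical names that are not part of Ice's test suite, the timeout value is an assumption, and the sketch itself needs Python 3.5 or later even though it can launch run.py with python2.7.

```python
import subprocess


def run_repeatedly(cmd, runs=10, timeout=600):
    """Run cmd up to `runs` times; stop at the first failure or hang.

    Returns a list with one entry per run: "ok", "failed" or "hang".
    """
    results = []
    for _ in range(runs):
        try:
            proc = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the direct child on timeout; processes
            # that the child spawned itself may be left running.
            results.append("hang")
            break
        if proc.returncode != 0:
            results.append("failed")
            break
        results.append("ok")
    return results
```

From the test directory this could be called as `run_repeatedly(["/usr/local/bin/python2.7", "run.py"])`; a final entry of "hang" or "failed" then tells the two failure modes apart.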

I hope that the issues are caused by some sort of resource shortage (a few other builds failed in other places as well). As I can't reproduce these failures yet, I will probably have to improve Ice's unit tests to be more vocal when something fails (in this case it seems that glacier2 gets stuck on shutdown for unknown reasons).

Unfortunately this could be a lot of work and I don't have time to address it in the next couple of days, so I'm tempted to disable the unit tests for the time being so that packages get built. If the underlying issue turns out to be severe, this would be a bad idea, though.
Comment 2 Michael Gmelin freebsd_committer freebsd_triage 2015-07-24 12:46:04 UTC
Added Roger in case he's interested
Comment 3 Roger Leigh 2015-07-24 16:05:37 UTC
I'll see if I can reproduce and let you know if I can.
Comment 4 Roger Leigh 2015-07-24 22:50:06 UTC
Unable to reproduce on bare metal in a 10.1-RELEASE jail.  I'll try in various other VMs to see if I can make it happen.
Comment 5 Roger Leigh 2015-07-24 23:41:40 UTC
No success reproducing in a 1 CPU VM either, I'm afraid.
Comment 6 Michael Gmelin freebsd_committer freebsd_triage 2015-07-25 13:34:35 UTC
Two more tests occasionally break or hang on package builders:

cpp/test/Ice/hold (most of the time)
cpp/test/Glacier2/override (sometimes)

In both cases the failing step operates on oneway proxies (which might have issues with packet loss). In the case of test 85 (Glacier2/override) it also involves callbacks and sleeps, so it might be timing sensitive (but I couldn't reproduce it even when changing those sleep statements). Test 85 also involves calls to hold and activate, which it has in common with the Ice/hold test.

I can't reproduce any of these issues though, so my guess is that it's some sort of resource exhaustion and/or networking issue. Maybe the ports are in use by some other process: the tests all use ports around 12000, and afaik poudriere uses the same IPs as other processes/build jails.
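
One way to check the port theory on a builder is to try binding the ports before the tests start. This is a minimal sketch, not something taken from the port or from Ice; `busy_ports` is a hypothetical helper and the port range in the usage note is an assumption based on the 12010 and 12011 seen in the log below.

```python
import socket


def busy_ports(host, ports):
    """Return the ports in `ports` that cannot be bound on `host`."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Typically EADDRINUSE; any other bind error is also
                # reported as "busy".
                busy.append(port)
    return busy
```

For example, `busy_ports("127.0.0.1", range(12000, 12100))` run inside the build jail would list any port in that range that another process already holds.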

*** running tests 33/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold
*** configuration: Default 
*** test started: 07/14/15 02:04:21
*** using Ice source dist (64bit) 
starting server... ok
starting client... ok
testing stringToProxy... ok
testing checked cast... ok
changing state between active and hold rapidly... ok
testing without serialize mode... ok
testing with serialize mode... -! 07/14/15 02:05:21.741 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/client: warning: connection exception:
   ConnectionI.cpp:2048: Ice::TimeoutException:
   timeout while sending or receiving data
   local address = 127.0.0.1:49519
   remote address = 127.0.0.1:12011
failed!
AllTests.cpp:169: assertion `cond->value()' failed
unexpected exit status: expected: 0, got -6
-! 07/14/15 02:05:21.880 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/server: warning: connection exception:
   StreamSocket.cpp:202: Ice::ConnectionLostException:
   connection lost: recv() returned zero
   local address = 127.0.0.1:12011
   remote address = 127.0.0.1:49872
-! 07/14/15 02:05:21.880 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold/server: warning: connection exception:
   StreamSocket.cpp:202: Ice::ConnectionLostException:
   connection lost: recv() returned zero
   local address = 127.0.0.1:12010
   remote address = 127.0.0.1:49518
('test in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Ice/hold failed with exit status', 1)
*** Error code 1

Stop.
make: stopped in /usr/ports/devel/ice

--------------------

*** running tests 85/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/override
*** configuration: Default 
*** test started: 07/14/15 07:40:04
*** using Ice source dist (64bit) 
starting router in buffered mode... ok
starting server... ok
starting client... ok
testing client request override... ok
testing server request override... ====>> Killing runaway build after 7200 seconds with no output
Comment 7 commit-hook freebsd_committer freebsd_triage 2015-08-02 13:48:06 UTC
A commit references this bug:

Author: grembo
Date: Sun Aug  2 13:47:43 UTC 2015
New revision: 393423
URL: https://svnweb.freebsd.org/changeset/ports/393423

Log:
  Add debug output in an attempt to figure out why certain tests fail on
  package builders.

  PR:		201743
  Approved by:	mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
Comment 8 Vikash Badal 2015-09-02 07:56:11 UTC
Failing here consistently using poudriere on 10.2-RELEASE-p2 amd64:

*** running tests 89/93 in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/staticFiltering
*** configuration: Default 
*** test started: 09/02/15 07:48:00
*** using Ice source dist (64bit) 
testing category filter... ok
testing adapter id filter... helloF1 @ "an adapter with spaces"
Outgoing.cpp:535: Ice::UnknownLocalExce-! 09/02/15 07:48:01.739 /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/bin/glacier2router: warning: dispatch exception: Network.cpp:2369: Ice::ConnectionRefusedException:
ption:
unknown loc   connection refused: Connection refused
al exception   identity: helloF1
:
Netw   facet: 
ork.cpp:2369   operation: ice_ping
: Ice::ConnectionRefused   remote host: 127.0.0.1 remote port: 33301
Exception:
connection refused: Connection refused
failed!
Client.cpp:122: assertion `"Unexpected local exception" == 0' failed
unexpected exit status: expected: 0, got -6
('test in /wrkdirs/usr/ports/devel/ice/work/ice-3.6.0/cpp/test/Glacier2/staticFiltering failed with exit status', 1)
*** Error code 1

Stop.
make: stopped in /usr/ports/devel/ice
Comment 9 Michael Gmelin freebsd_committer freebsd_triage 2015-09-06 11:19:35 UTC
@Vikash: Could you please provide the complete poudriere build log and add some more details about the system you're building on? In particular: whether it's bare metal or a VM, how many CPUs it has, and how the network is configured. I would like to reproduce the problem.
Comment 10 commit-hook freebsd_committer freebsd_triage 2015-09-06 13:59:34 UTC
A commit references this bug:

Author: grembo
Date: Sun Sep  6 13:59:03 UTC 2015
New revision: 396218
URL: https://svnweb.freebsd.org/changeset/ports/396218

Log:
  Add timing information to debug output to understand if
  timeouts on package builders are real.

  PR:		201743
  Approved by:	mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
Comment 11 Vikash Badal 2015-09-07 08:11:00 UTC
(In reply to Michael Gmelin from comment #9)

Logs and dmesg are located at:

http://enrage.where-ever.za.net/~vikashb/FreeBSD/BUG_201743/


Host is a VM in OpenStack kilo

64GB Ram
8 CPU

network : vtnet
disk:   : vtbd[0-2]
Comment 12 Vikash Badal 2015-09-07 08:12:51 UTC
(In reply to Vikash Badal from comment #11)


poudriere-3.1.7
Comment 13 commit-hook freebsd_committer freebsd_triage 2015-09-08 10:17:01 UTC
A commit references this bug:

Author: grembo
Date: Tue Sep  8 10:16:02 UTC 2015
New revision: 396360
URL: https://svnweb.freebsd.org/changeset/ports/396360

Log:
  Remove C++11 specific construct.

  PR:		201743
  Approved by:	mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-override-Client.cpp
  head/devel/ice/files/patch-cpp-test-Ice-hold-AllTests.cpp
Comment 14 Michael Gmelin freebsd_committer freebsd_triage 2015-09-16 23:11:21 UTC
@Vikash: Unfortunately I can't reproduce the issue here.

Is there any chance you could give me temporary access to an environment that allows me to investigate this?
Comment 15 Vikash Badal 2015-09-17 08:24:48 UTC
(In reply to Michael Gmelin from comment #14)

please provide me with an ssh key and the ip that you will be connecting from
Comment 16 Michael Gmelin freebsd_committer freebsd_triage 2015-09-18 16:48:28 UTC
@Vikash: Sent you a key and IP information via E-Mail.
Comment 17 Michael Gmelin freebsd_committer freebsd_triage 2015-09-21 11:41:12 UTC
@Vikash: Resent the email
Comment 18 commit-hook freebsd_committer freebsd_triage 2015-09-22 15:03:48 UTC
A commit references this bug:

Author: grembo
Date: Tue Sep 22 15:03:00 UTC 2015
New revision: 397542
URL: https://svnweb.freebsd.org/changeset/ports/397542

Log:
  Fix unit test in case hostname is not on a local interface

  PR:		201743
  Approved by:	mentors (implicit)

Changes:
  head/devel/ice/files/patch-cpp-test-Glacier2-staticFiltering-run.py
Comment 19 Michael Gmelin freebsd_committer freebsd_triage 2015-09-22 15:11:51 UTC
@Vikash: I was able to reproduce and fix the problem, thank you for providing access to your host, you may revoke it now.

The issue was that this unit test resolves the local hostname to determine whether a more in-depth test should be run. Most poudriere builders use something local, so the test won't try to execute the in-depth tests. In your case, the package builder has an FQDN configured in poudriere.conf that equals the server hostname and resolves to an IP address that is not available in the build jail.

When the test tried to access that IP address in the package build jail (in which server processes only bind to 127.0.0.1), the build failed with "connection refused". Ironically, the IP address is available in the jail you get dropped into after the build (-i), which is why it took a while to figure out what was going on.

The fix was to add an extra check to see whether the IP addresses determined by gethostbyname are actually configured on a local interface (by calling ifconfig). That way it will still run as many tests as possible without breaking on setups like yours.
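
The kind of check described above can be sketched as follows. This is an illustration only, not the committed patch to run.py: the function names are hypothetical, the regular expression assumes FreeBSD-style `inet <address>` lines in the ifconfig output, and requiring every resolved address to be local is an assumption.

```python
import re
import socket
import subprocess


def local_ipv4_addresses(ifconfig_output):
    """Extract the IPv4 addresses listed in `ifconfig` output."""
    pattern = r"^\s*inet (\d+\.\d+\.\d+\.\d+)"
    return set(re.findall(pattern, ifconfig_output, re.MULTILINE))


def hostname_is_local(hostname, ifconfig_output,
                      resolve=socket.gethostbyname_ex):
    """True if every address `hostname` resolves to is on a local interface."""
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        # Resolution failed, so treat the hostname as not usable locally.
        return False
    local = local_ipv4_addresses(ifconfig_output)
    return bool(addresses) and all(addr in local for addr in addresses)


def read_ifconfig():
    """Return the output of `ifconfig` as text."""
    return subprocess.check_output(["ifconfig"], universal_newlines=True)
```

A test script could then guard the in-depth part with `hostname_is_local(socket.gethostname(), read_ifconfig())` and skip it when the result is false.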

Thanks again for your help.
Comment 20 Vikash Badal 2015-09-23 07:26:06 UTC
@Michael 

can confirm all is well

thank you very much