Summary: | net/dante fails to build on i386 | |
---|---|---|---
Product: | Ports & Packages | Reporter: | Carlos J Puga Medina <cjpm>
Component: | Individual Port(s) | Assignee: | Kurt Jaeger <pi>
Status: | Closed FIXED | |
Severity: | Affects Only Me | CC: | anders, mp39590, peter, pi, r00t, thomas.scholten, yoshin-t
Priority: | --- | |
Version: | Latest | |
Hardware: | Any | |
OS: | Any | |
Description
Carlos J Puga Medina
2014-07-31 19:06:19 UTC
I have to assume you are talking about net/dante 1.4.0, committed today. Under that assumption, I'm assigning this to wg@, who committed it.

I'm also notifying mp39590, who submitted bug 191232 that updated it.

(In reply to John Marino from comment #1)
> I have to assume you are talking about net/dante 1.4.0 committed today.
> Under that assumption, I'm assigning this to wg@ who committed it.
>
> I'm also notifying mp39590 who submitted bug 191232 that updated it.

Right, it fails with version 1.4.0.

Digging further, I discovered that my wireless device repeatedly throws the following message:

urtwn0: could not create RX mbuf

After increasing kern.ipc.nmbclusters to 200000 and kern.ipc.nmbufs to 280000, I finally installed dante successfully.

Note that I had to increase the kern.ipc.nmbufs value twice before the configure checks passed.

wg@ (CC'd), Carlos wrote me with redport results, this definitely doesn't build on i386:

https://redports.org/buildarchive/20140801172516-99642/

You assigned this back to the heap without even saying anything. You might want to at least mark it broken on i386. It's not in a good state after your commit.

Actually, Carlos, you tell me. Is it broken on i386 or what? Having to put in non-standard sysctl settings doesn't sound normal to me; maybe it at least needs to be IGNORED by default (overridable)? What's your recommendation?

Hi John,

Yes, it's weird to need to tweak sysctls to make it compile. That's the reason why I asked for help. I think we should mark it as BROKEN for i386 while I wait for a reply from dante's developers.

A commit references this bug:

Author: marino
Date: Sat Aug 2 15:33:21 UTC 2014
New revision: 363820
URL: http://svnweb.freebsd.org/changeset/ports/363820

Log:
Mark net/dante BROKEN on i386

Dante gets stuck building on i386. Redports did not complete on any of the four platforms. Marking this broken, and upstream is being notified.
PR: 192295

Changes:
head/net/dante/Makefile

I would say if you don't hear anything in 2 weeks from upstream or the net/dante maintainer, then just close the PR.

I don't think there's anything else to do for now.

(In reply to John Marino from comment #9)
> I would say if you don't hear anything in 2 weeks from upstream or the
> net/dante maintainer, then just close the PR.
>
> I don't think there's anything else to do for now.

Ok, thanks John :)

(In reply to John Marino from comment #5)
> wg@ (CC'd), Carlos wrote me with redport results, this definitely doesn't
> build on i386:
>
> https://redports.org/buildarchive/20140801172516-99642/
>
> You assigned this back to the heap without even saying anything. You might
> want to at least mark it broken on i386. It's not in a good state after
> your commit.

Where can I find build logs from redports for i386? I've checked building on my VirtualBox with i386 10.0-RELEASE and it's successful.

(In reply to mp39590 from comment #11)
> (In reply to John Marino from comment #5)
> > wg@ (CC'd), Carlos wrote me with redport results, this definitely doesn't
> > build on i386:
> >
> > https://redports.org/buildarchive/20140801172516-99642/
> >
> > You assigned this back to the heap without even saying anything. You might
> > want to at least mark it broken on i386. It's not in a good state after
> > your commit.
>
> Where I can find buildlogs from redports for i386? I've checked building on
> my VirtualBox with i386 10.0-RELEASE and it's successful.

I have canceled all build processes related to i386 because they were stuck. Here are the logs:

https://redports.org/buildarchive/20140801172516-99642/

(In reply to Carlos Jacobo Puga Medina from comment #12)
> > Where I can find buildlogs from redports for i386? I've checked building on
> > my VirtualBox with i386 10.0-RELEASE and it's successful.
>
> I have canceled all build process related to i386 because they were stuck.
> Here are the logs:
>
> https://redports.org/buildarchive/20140801172516-99642/

I can't see logs for i386 there, only for amd64. I'm interested in where i386 10-RELEASE got stuck, because on my VirtualBox (10/i386) it compiles without any errors or freezes.

You said that it compiles fine on 10.0-RELEASE/i386. Can you show me the values of the following sysctls:

kern.ipc.nmbclusters
kern.ipc.nmbufs

I want to compare both values to make a better diagnosis and rule out options. I could install dante after temporarily increasing both values to build it, but that makes no sense because I'm using the default values for desktop use and I never had a problem with them. Regarding the build logs, I stopped them.

kern.ipc.nmbclusters: 10294
kern.ipc.nmbufs: 65895

FreeBSD vb2 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Fri Jan 17 01:46:25 UTC 2014 root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC i386

It's an absolutely new 10.0-RELEASE without any tweaks.

Mine are these:

kern.ipc.nmbclusters: 26368
kern.ipc.nmbufs: 168765

but dante complains with the following messages:

[zone: mbuf_cluster] kern.ipc.nmbclusters limit reached

and/or

[zone: mbuf_cluster] kern.ipc.nmbufs limit reached

It gets stuck while 'checking read-side pipe system...'

I will send you my config.log so you can take a look.

I received a reply from dante's devs, so I copy it here for your general knowledge:

=======================================================================
Hello,

Thank you for the bug report.

There is a configure bug in Dante 1.4.0 that affects some FreeBSD versions and it might be it that causes the problem that you are seeing. We hope to have this fixed in the next version of Dante (1.4.1), and a snapshot with a possible solution can be found at this location:

ftp://ftp.inet.no/pub/socks/private/dante-20140802.tar.gz

Please let us know if this fixes the problem that you are seeing if you are able to test it.
Note that this snapshot contains code that has not been fully tested yet and should not be used in production environments.

With kind regards,

Karl-Andre' Skevik
Inferno Nettverk A/S
=======================================================================

I investigated the configure script further and found that the hack to determine the pipe buffer type is the culprit. With this little hack commented out, the port builds fine.

Maybe we can unset this check or work out a patch that lets it pass.

FWIW, this is all platforms, not just i386. The test program has the same effect even as a non-privileged user on 10.0+ - 9.x isn't affected.

FWIW, the culprit appears in the output of netstat -f unix - one unix domain socket will have a massive Recv-Q that has permanently consumed all mbufs.

Created attachment 145320 [details]
test case #1
Yes, the problem is that the write() call does not return ENOBUFS and just cycles infinitely inside the loop. I attach a simple program to reproduce it; it's a stripped-down version of the conftest.c from dante's configure script that causes this.
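
For readers without the attachment, the behaviour described above can be illustrated with a minimal Python sketch. This is not the attached test case and not dante's conftest.c. The function name `fill_dgram_socket`, the payload size and the `max_writes` cap are all hypothetical and chosen only for illustration. The sketch keeps writing to one end of a Unix datagram socket pair that nobody reads from, and unlike the loop described in the comment it gives up after a fixed number of writes, so it cannot spin forever if the kernel never reports a full buffer.

```python
import errno
import socket


def fill_dgram_socket(max_writes=10_000, payload=b"x" * 1024):
    """Write datagrams to an unread Unix socket pair until the kernel refuses.

    Returns (number_of_successful_writes, stop_reason). The cap on writes is
    an assumption added for safety; it is not part of the original test case.
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    writer.setblocking(False)  # a full buffer must raise instead of blocking
    sent = 0
    stop_reason = "cap reached"
    try:
        while sent < max_writes:
            try:
                writer.send(payload)
            except BlockingIOError:
                # EAGAIN/EWOULDBLOCK: the socket buffer limit was enforced.
                stop_reason = "EAGAIN"
                break
            except OSError as exc:
                if exc.errno == errno.ENOBUFS:
                    # ENOBUFS: the kernel ran out of buffer space for us.
                    stop_reason = "ENOBUFS"
                    break
                raise
            sent += 1
    finally:
        writer.close()
        reader.close()
    return sent, stop_reason


if __name__ == "__main__":
    count, reason = fill_dgram_socket()
    print(f"stopped after {count} writes: {reason}")
```

On a kernel that enforces socket buffer limits, the loop is expected to stop with EAGAIN or ENOBUFS well before the cap. A result of "cap reached" would suggest that the limits are not being applied, which is the symptom reported in this thread.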
(In reply to Carlos Jacobo Puga Medina from comment #18)
> I investigated further the configure script and I found that the hack to
> determine the pipe buffer type is the culprit. Commenting out this little
> hack is able to build fine.
>
> Maybe we can unset this check or elaborate a patch to pass it.

My build log: http://pastebin.com/Qcn52aEu

Redports build logs: https://redports.org/buildarchive/20140803200200-4555/

(In reply to Peter Wemm from comment #20)
> FWIW, the culprit appears in the output of netstat -f unix - one unix domain
> socket will have a massive Recv-Q that has permanently consumed all mbufs.

Peter, can you show the 'netstat -f unix' output?

root@redbuild04.nyi:~ # netstat -f unix -n
Active UNIX domain sockets
Address          Type   Recv-Q     Send-Q Inode Conn             Refs             Nextref Addr
fffff80066587a50 dgram  1671128432 0      0     fffff805800b1c30 fffff805800b1c30 0

root@redbuild04.nyi:~ # fstat | grep fffff80066587a50
root conftest 11909 4* local dgram fffff805800b1c30 <-> fffff80066587a50
root conftest 11909 8* local dgram fffff80066587a50 <-> fffff805800b1c30

There's 16GB of mbufs consumed in the Recv-Q.

After reverting this commit[1] I can successfully build dante on amd64-current and run the test case program.

[1] - http://svnweb.freebsd.org/base?view=revision&revision=262867

It seems there are multiple things going on here.

1) the conftest program in net/dante is actually broken. It is using setsockopt() with the sizes as size_t (64 bit on amd64) when they are socklen_t (32 bit). This is causing some interesting diagnostic output when conftest tries to set insane sizes.

2) there was a kernel bug introduced with 262867 (and MFC'ed to 10-stable) that ignored the socket buffer limits entirely.

I've partially backed out the apparently offending line in 262867 (which I think was an edit / patch error) and the test program no longer breaks the machine. It does generate all the expected noise due to the errors in #1, but the machines are no longer harmed.
The test program makes the same noise due to #1 on a 9.x amd64 machine. This was not in 10.0-RELEASE, only in 10-stable and head.

I will try to build it again on -CURRENT after r269489 was committed.

To bring this to its logical conclusion: how should we deal with this port?

Leave it forbidden for anything >10 - I don't know a way of checking a revision number inside a Makefile (suggestions are welcome). Dante's devs said that they've fixed #1 (from Peter's comment) and the fix will be in 1.4.1. Actually, as they say, even in 1.4.0, where it was buggy, it won't cause any trouble besides strange output.

Alternatively, we can just wait for a proper 1.4.1; this will give the build machines time to upgrade, and when it is out, remove 'broken/forbidden' together with a port version update.

I confirm that r269489 fixes the problem.

Build log: http://pastebin.com/xzq7RJd5

Yes, we can wait for the bugfix release 1.4.1 before committing any new change.

A new version of Dante has been released:

http://www.inet.no/dante/announce-1.4.1

Created attachment 146768 [details]
dante-1.4.1.diff

- Fix LICENSE
- Use INSTALL_TARGET=install-strip

Build logs via redports: https://redports.org/buildarchive/20140904080301-22095/

Hello,

Could somebody please commit https://bugs.freebsd.org/bugzilla/attachment.cgi?id=146768&action=diff and remove the forbidden flag? 1.4.1 builds/works fine on 10+ (amd64).

Thanks.

A commit references this bug:

Author: pi
Date: Fri Nov 7 18:47:38 UTC 2014
New revision: 372290
URL: https://svnweb.freebsd.org/changeset/ports/372290

Log:
net/dante: 1.4.0 -> 1.4.1

ChangeLog: http://www.inet.no/dante/announce-1.4.1

PR: 192295
Submitted by: Carlos Jacobo Puga Medina <cpm@fbsd.es>

Changes:
head/net/dante/Makefile
head/net/dante/distinfo
head/net/dante/pkg-plist

Tested with poudriere on 10.0-amd64, 10.0-i386, 9.1-amd64, 8.4-i386.

Committed, thanks for the excellent analysis and fix!
With dante 1.4.1, I saw a similar problem on

% sysctl kern.version
kern.version: FreeBSD 9.3-RELEASE-p5 #0: Mon Nov 3 22:38:58 UTC 2014 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

It seems the r269489 fix is not applied to releng/9.3 ???

configure hangs here:

> checking read/send-side pipe system...

I could not kill the conftest process.

# netstat -m
15749840/9790/15759630 mbufs in use (current/cache/total)
7891285/9265/7900550/8134718 mbuf clusters in use (current/cache/total/max)
7891285/5291 mbuf+clusters out of packet secondary zone in use (current/cache)
0/222/222/4067358 4k (page size) jumbo clusters in use (current/cache/total/max)
8191/136/8327/1205143 9k jumbo clusters in use (current/cache/total/max)
0/0/0/677893 16k jumbo clusters in use (current/cache/total/max)
19793749K/23089K/19816838K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Even worse, there was a message

igb4: Could not setup receive structure

and the network stopped working. This required a reboot.
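
To put the netstat -m figures above in perspective, here is a small Python sketch that reads the "mbuf clusters in use" line and reports how close the system is to its cluster limit. The function name `mbuf_cluster_usage` is hypothetical, and the parsing assumes the current/cache/total/max layout shown in this report; other FreeBSD versions may print the line differently.

```python
import re


def mbuf_cluster_usage(netstat_m_output):
    """Return (current, maximum, fraction_in_use) for mbuf clusters.

    Assumes the 'current/cache/total/max mbuf clusters in use' line format
    seen in the netstat -m output quoted in this report.
    """
    pattern = re.compile(
        r"^\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
    )
    match = pattern.search(netstat_m_output)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, _cache, _total, maximum = (int(group) for group in match.groups())
    return current, maximum, current / maximum


# Two lines copied from the netstat -m output in the report.
SAMPLE = """\
15749840/9790/15759630 mbufs in use (current/cache/total)
7891285/9265/7900550/8134718 mbuf clusters in use (current/cache/total/max)
"""

if __name__ == "__main__":
    current, maximum, fraction = mbuf_cluster_usage(SAMPLE)
    print(f"{current} of {maximum} clusters in use ({fraction:.1%})")
```

With the numbers from the report, roughly 97% of the permitted mbuf clusters are in use, which fits the igb4 failure to set up its receive structures.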