|Summary:||net-mgmt/collectd5:network plugin: getnameinfo failed: Non-recoverable failure in name resolution|
|Product:||Ports & Packages||Reporter:||Fabian Wenk <fabian>|
|Component:||Individual Port(s)||Assignee:||freebsd-ports-bugs (Nobody) <ports-bugs>|
|Severity:||Affects Only Me||CC:||ports|
Description Fabian Wenk 2020-06-06 20:47:25 UTC
When collectd is also set up to listen for stats from other servers and clients send their data, the following line is logged and data is no longer collected:

network plugin: getnameinfo failed: Non-recoverable failure in name resolution

The same configuration works just fine with collectd5-5.10.0_1, but fails with collectd5-5.11.0 / collectd5-5.11.0_1. I traced it down to this change in 5.11.0:

Network plugin: New metadata "network:ip_address" has been added. Thanks to Takuro Ashie. #3191   https://github.com/collectd/collectd/pull/3191

The error message suggests that name resolution does not work, but all the systems which send data to my servers have proper PTR entries in DNS and resolve from IP to hostname and from hostname to IP. I reverted that change in src/network.c, and collectd 5.11.0 is again collecting data from other servers.

See also bug report #3477 upstream: https://github.com/collectd/collectd/issues/3477
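For context, turning a peer's socket address into an IP address string does not need DNS when getnameinfo is asked for numeric output. The sketch below shows this with Python's socket.getnameinfo wrapper; the helper name numeric_peer is made up for illustration, and whether the 5.11.0 change calls getnameinfo with these flags is an assumption that this report does not confirm.

```python
import socket


def numeric_peer(sockaddr):
    """Return (host, port) as numeric strings for a socket address.

    NI_NUMERICHOST and NI_NUMERICSERV tell getnameinfo to format the
    address and port directly, so no PTR lookup is performed.
    """
    flags = socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    return socket.getnameinfo(sockaddr, flags)


if __name__ == "__main__":
    # 25826 is the collectd network plugin port used in this report.
    print(numeric_peer(("127.0.0.1", 25826)))
```

If a call like this fails on an affected host, the problem is unlikely to be in the DNS data itself.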
Comment 1 Krzysztof 2020-06-09 19:15:04 UTC
Thanks a lot for your problem report. I don't know how to help you, because I'm using the network plugin and there are no problems at all. I have a configuration with IP addresses (not DNS names):

# server side
<Plugin network>
  <Listen "192.168.128.16" "25826">
    SecurityLevel Sign
    AuthFile "/usr/local/etc/collectd/passwd"
    Interface "re0"
  </Listen>
</Plugin>

# client side
<Plugin network>
  Server "192.168.128.16" "25826"
  <Server "192.168.128.16" "25826">
    SecurityLevel "sign"
    Username "******"
    Password "******"
  </Server>
  ReportStats true
  Forward false
  MaxPacketSize 1024
</Plugin>

To tell the truth, I have my own pkg repository and my collectd5 is built without DEBUG. So maybe that's why I do not see any error messages. I'll try changing the IP address to an FQDN and will write back if I get the identical error.
Comment 2 Krzysztof 2020-06-09 19:36:36 UTC
Unfortunately it is working on my server. I've changed the configuration on the client side and there are no errors and statistics are collected. So I have no idea what's wrong in your configuration/situation :-(((
Comment 3 Fabian Wenk 2020-06-09 21:29:24 UTC
I have built and run it on FreeBSD 11.3, maybe this is relevant. I also tried with IP addresses only (IPv4 and IPv6 independently), but forgot to mention it. I also have Server and Listen configured on the same host, but Server points to another system. I am using two systems to collect the data, see the configs below.

I had / have activated "LOGGING - Enable debug logging" in make config, but I just rebuilt with it deactivated and also removed the patch revert, and still get the same errors. The other build options I have set are these:

OPTIONS_FILE_SET+=CGI
OPTIONS_FILE_SET+=GCRYPT
OPTIONS_FILE_SET+=LOGGING
OPTIONS_FILE_SET+=CURL
OPTIONS_FILE_SET+=MYSQL
OPTIONS_FILE_SET+=PGSQL
OPTIONS_FILE_SET+=PING
OPTIONS_FILE_SET+=SNMP
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_SET+=RRDTOOL

"network" config on server batman.home4u.ch (collectd.home4u.ch as service hostname):

<Plugin network>
  <Server "collectd.wenks.ch" "25826">
    SecurityLevel None
  </Server>
  <Listen "collectd.home4u.ch" "25826">
    SecurityLevel None
  </Listen>
  TimeToLive 128
  MaxPacketSize 1452
  Forward false
  ReportStats true
</Plugin>

And on the other server superman.wenks.ch (collectd.wenks.ch as service hostname):

<Plugin network>
  <Server "collectd.home4u.ch" "25826">
    SecurityLevel None
  </Server>
  <Listen "collectd.wenks.ch" "25826">
    SecurityLevel None
  </Listen>
  TimeToLive 128
  MaxPacketSize 1452
  Forward false
  ReportStats true
</Plugin>

And on one of the clients (Gentoo Linux, still with collectd 5.10.0):

<Plugin network>
  TimeToLive 128
  MaxPacketSize 1452
  Forward false
  ReportStats true
  <Server "collectd.home4u.ch" "25826">
    SecurityLevel None
    BindAddress "2001:8a8:1005:1::8"
  </Server>
  <Server "collectd.wenks.ch" "25826">
    SecurityLevel None
    # using the same BindAddress as above did not work with collectd-5.10.0
    BindAddress "2001:8a8:1005:2::185"
  </Server>
</Plugin>

I also tried with the non-LOGGING build and stopped the clients which are still on 5.10.0, but I still get the errors.
Comment 4 Krzysztof 2020-06-13 05:20:47 UTC
OK. So to me it seems that you get these errors because you don't use authentication (please see my configuration). AFAIK this new feature was introduced as a "replacement" for the previous authentication method. Yes, I'm using FreeBSD 12.x and I currently don't have any 11.x server, so it is difficult for me to test your environment. I'll try to reconfigure my servers to not use gcrypt and I'll be back with information.
Comment 5 Krzysztof 2020-06-13 06:17:57 UTC
So I've made a very quick configuration change on my servers to turn off authentication. I've also changed from private IP addresses to DNS names (on the server side). I only received many errors from rrdcache, but I was not able to reproduce your errors :-((( I'll have to build a small lab and try your configuration.
Comment 6 Fabian Wenk 2020-07-05 15:25:19 UTC
(In reply to Krzysztof from comment #4) Sorry for the delay in getting back. I did have authentication in place and got the same errors, so I changed my configuration to run without authentication. But as we know, this did not help either.

One other thing which I forgot to mention, and which may be a problem: the host names used in Server and Listen (on both collectors) resolve to an IPv6 address, e.g. collectd.home4u.ch to 2001:8a8:1005:1::2. But in return, 2001:8a8:1005:1::2 resolves to batman.home4u.ch. And batman.home4u.ch in turn resolves to 2x IPv4 (22.214.171.124 and 126.96.36.199) and 2x IPv6 (2001:8a8:1005:1::2 and 2001:8a8:1005:2::178) addresses.

I will also update the ticket upstream.
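The forward and reverse mapping described above can be checked outside collectd. The following is a minimal sketch, not part of collectd: the function names are made up for illustration, and the resolvers are passed in as parameters so the logic can also be exercised with static data instead of live DNS.

```python
import socket


def forward_lookup(name):
    """Return the sorted set of addresses a hostname resolves to."""
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})


def reverse_lookup(address):
    """Return the PTR name of an address.

    NI_NAMEREQD turns a missing PTR record into an error instead of
    silently returning the numeric address.
    """
    host, _port = socket.getnameinfo((address, 0), socket.NI_NAMEREQD)
    return host


def round_trip(name, forward=forward_lookup, reverse=reverse_lookup):
    """Map each address of `name` to (ptr_name, ptr_resolves_back)."""
    report = {}
    for address in forward(name):
        ptr = reverse(address)
        # True when the PTR name resolves back to the same address.
        report[address] = (ptr, address in forward(ptr))
    return report


if __name__ == "__main__":
    # Service hostname taken from this report; needs working DNS.
    print(round_trip("collectd.home4u.ch"))
```

With the names from this comment, the expected result is that 2001:8a8:1005:1::2 maps to batman.home4u.ch, which in turn lists that address among its own, so the service alias differing from the PTR name is not by itself a broken setup.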
Comment 7 Krzysztof 2020-09-10 06:05:55 UTC
(In reply to Fabian Wenk from comment #6) So: did you find a problem with your configuration? I've just read the upstream bug report and found no more comments from you... Maybe it is time to close this bug report?
Comment 8 Fabian Wenk 2020-09-12 14:11:54 UTC
(In reply to Krzysztof from comment #7) As I understand it, this is not a configuration problem with my collectd setup. Could it be that something goes wrong with getnameinfo(3), as mentioned upstream? As I said, my systems have 2 interfaces in two different networks, with both IPv4 and IPv6 addresses. All of the addresses resolve to the same hostname, and of course the hostname resolves to all IPv4+IPv6 addresses.
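One way to narrow this down is to look at which resolver error code is returned. The logged text "Non-recoverable failure in name resolution" is the standard message for EAI_FAIL, which is a different code from the one for a name or address that is simply not found (EAI_NONAME). A small sketch, assuming a Unix-like Python build where the socket module exposes the EAI_* constants; describe_gai_error is a hypothetical helper name:

```python
import socket

# Map the numeric EAI_* codes defined on this platform to their names.
EAI_NAMES = {
    getattr(socket, name): name
    for name in dir(socket)
    if name.startswith("EAI_")
}


def describe_gai_error(exc):
    """Return (symbolic_code, message) for a socket.gaierror."""
    return EAI_NAMES.get(exc.errno, "unknown"), exc.strerror


if __name__ == "__main__":
    try:
        # Address taken from this report; NI_NAMEREQD requires a PTR record.
        print(socket.getnameinfo(("2001:8a8:1005:1::2", 25826), socket.NI_NAMEREQD))
    except socket.gaierror as exc:
        print(describe_gai_error(exc))
```

Running the same lookup on the 11.3 hosts and on a 12.x host would show whether the resolver itself behaves differently between the two releases.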
Comment 9 Krzysztof 2020-09-23 15:26:59 UTC
I've just updated the upstream ticket (on GitHub), so we will see what the collectd team answers. Hint: https://github.com/freebsd/freebsd/blob/releng/11.3/lib/libc/net/getnameinfo.c
Comment 10 Krzysztof 2020-09-24 20:50:29 UTC
There is no answer on the upstream bug (ticket). To me it seems that the problem is related to the FreeBSD implementation of getnameinfo on 11-RELENG. As I wrote earlier, I'm not able to reproduce this bug on my 12-RELENG boxes, so I suggest closing this bug report.
Comment 11 Krzysztof 2020-12-22 19:13:17 UTC
(In reply to Krzysztof from comment #10) Can we close this bug? As you can see, there has been no answer for 3 months. As I understand it, the problem exists only on 11-RELENG. I was not able to reproduce it.