Bug 247036

Summary: net-mgmt/collectd5: network plugin: getnameinfo failed: Non-recoverable failure in name resolution
Product: Ports & Packages
Reporter: Fabian Wenk <fabian>
Component: Individual Port(s)
Assignee: freebsd-ports-bugs (Nobody) <ports-bugs>
Status: Closed Overcome By Events
Severity: Affects Only Me
CC: ports
Priority: ---
Flags: ports: maintainer-feedback-
Version: Latest
Hardware: Any
OS: Any

Description Fabian Wenk 2020-06-06 20:47:25 UTC
When collectd is also set up to listen for stats from other servers and clients send their data, the following line is logged and data is no longer collected:
network plugin: getnameinfo failed: Non-recoverable failure in name resolution

Same configuration works just fine with collectd5-5.10.0_1, but fails with collectd5-5.11.0 / collectd5-5.11.0_1.

I traced it down to this change in 5.11.0:
Network plugin: New metadata "network:ip_address" has been added. Thanks to Takuro Ashie. #3191 [1]

 [1] https://github.com/collectd/collectd/pull/3191

The error message suggests that name resolution is failing, but all the systems that send data to my servers have proper PTR entries in DNS and resolve both from IP to hostname and from hostname to IP.

I reverted that change in src/network.c, and collectd 5.11.0 collects data from other servers again.

See also bug report #3477 [2] upstream.

  [2] https://github.com/collectd/collectd/issues/3477
Comment 1 Krzysztof 2020-06-09 19:15:04 UTC
Thanks a lot for your problem report.

I don't know how to help you, because I'm using the network plugin and there are no problems at all. My configuration uses IP addresses (not DNS names):

# server side
<Plugin network>
    <Listen "192.168.128.16" "25826">
        SecurityLevel Sign
        AuthFile "/usr/local/etc/collectd/passwd"
        Interface "re0"
    </Listen>
</Plugin>

# client side
<Plugin network>
    Server "192.168.128.16" "25826"
    <Server "192.168.128.16" "25826">
        SecurityLevel "sign"
        Username "******"
        Password "******"
    </Server>
    ReportStats true
    Forward false
    MaxPacketSize 1024
</Plugin>

To tell the truth, I have my own pkg repository and my collectd5 is built without DEBUG, so maybe that's why I don't get any error messages.

I'll try changing the IP address to an FQDN and report back if I get the same error.
Comment 2 Krzysztof 2020-06-09 19:36:36 UTC
Unfortunately it is working on my server. I've changed the configuration on the client side and there are no errors and statistics are collected.

So I have no idea what's wrong in your configuration/situation :-(((
Comment 3 Fabian Wenk 2020-06-09 21:29:24 UTC
I built and ran it on FreeBSD 11.3; maybe this is relevant.
I also tried with IP addresses only (IPv4 and IPv6 independently), but forgot to mention it.
I also have Server and Listen configured on the same host, but Server points to another system. I am using two systems to collect the data, see the configs below.

I had "LOGGING - Enable debug logging" activated in make config. I have just rebuilt with it deactivated and also removed the patch revert, but I still get the same errors.

The other build options I have set are:
OPTIONS_FILE_SET+=CGI
OPTIONS_FILE_SET+=GCRYPT
OPTIONS_FILE_SET+=LOGGING
OPTIONS_FILE_SET+=CURL
OPTIONS_FILE_SET+=MYSQL
OPTIONS_FILE_SET+=PGSQL
OPTIONS_FILE_SET+=PING
OPTIONS_FILE_SET+=SNMP
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_SET+=RRDTOOL

"network" config on server batman.home4u.ch (collectd.home4u.ch as service hostname):
<Plugin network>
        <Server "collectd.wenks.ch" "25826">
                SecurityLevel None
        </Server>
        <Listen "collectd.home4u.ch" "25826">
                SecurityLevel None
        </Listen>
        TimeToLive 128
        MaxPacketSize 1452
        Forward false
        ReportStats true
</Plugin>

And on the other server superman.wenks.ch (collectd.wenks.ch as service hostname):
<Plugin network>
        <Server "collectd.home4u.ch" "25826">
                SecurityLevel None
        </Server>
        <Listen "collectd.wenks.ch" "25826">
                SecurityLevel None
        </Listen>
        TimeToLive 128
        MaxPacketSize 1452
        Forward false
        ReportStats true
</Plugin>

And on one of the clients (Gentoo Linux still with collectd 5.10.0):
<Plugin network>
        TimeToLive 128
        MaxPacketSize 1452
        Forward false
        ReportStats true
        <Server "collectd.home4u.ch" "25826">
                SecurityLevel None
                BindAddress "2001:8a8:1005:1::8"
        </Server>
        <Server "collectd.wenks.ch" "25826">
                SecurityLevel None
                # using the same BindAddress as above did not work with collectd-5.10.0
                BindAddress "2001:8a8:1005:2::185"
        </Server>
</Plugin>

I also tried the non-LOGGING build with the clients that are still on 5.10.0 stopped, but I still get the errors.
Comment 4 Krzysztof 2020-06-13 05:20:47 UTC
OK.

So it seems to me that you get these errors because you don't use authentication (please see my configuration).

AFAIK this new feature was introduced as a replacement for the previous authentication method.

Yes, I'm using FreeBSD 12.x and I don't actually have any 11.x server, so it is difficult for me to test your environment.

I'll try to reconfigure my servers to not use gcrypt and I'll be back with information.
Comment 5 Krzysztof 2020-06-13 06:17:57 UTC
So I've made a very quick configuration change on my servers to turn off authentication. I've also changed from private IPs to DNS names (on the server side).

I've only received many errors from rrdcache but I was not able to reproduce your errors :-(((

I'll have to build a small lab and try your configuration.
Comment 6 Fabian Wenk 2020-07-05 15:25:19 UTC
(In reply to Krzysztof from comment #4)

Sorry for the delay in getting back. I did have authentication in place and got the same errors, so I changed my configuration to run without authentication. But as we know, this did not help either.

There is one other thing I forgot to mention that may be a problem. The host names used in Server and Listen (on both collectors) resolve to an IPv6 address, e.g. collectd.home4u.ch to 2001:8a8:1005:1::2. But in return 2001:8a8:1005:1::2 resolves to batman.home4u.ch, and batman.home4u.ch in turn resolves to two IPv4 addresses (62.12.173.2 and 62.2.85.178) and two IPv6 addresses (2001:8a8:1005:1::2 and 2001:8a8:1005:2::178).
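
The resolution chain described above can be checked for consistency without touching DNS. The following is only a minimal sketch: the `FORWARD` and `REVERSE` tables are copied by hand from the records quoted in this comment (assuming collectd.home4u.ch has no other address records), and the function names `forward_confirmed` and `round_trip_name` are made up for illustration. None of this is part of collectd.

```python
import ipaddress

# Forward (name -> addresses) and reverse (address -> name) records,
# copied by hand from this comment. Nothing is looked up live, and the
# tables are assumed to be complete for the purpose of the sketch.
FORWARD = {
    "collectd.home4u.ch": ["2001:8a8:1005:1::2"],
    "batman.home4u.ch": [
        "62.12.173.2",
        "62.2.85.178",
        "2001:8a8:1005:1::2",
        "2001:8a8:1005:2::178",
    ],
}
REVERSE = {
    "2001:8a8:1005:1::2": "batman.home4u.ch",
}


def forward_confirmed(address, forward=FORWARD, reverse=REVERSE):
    """Return True if the PTR name of `address` resolves back to `address`."""
    name = reverse.get(address)
    if name is None:
        return False
    wanted = ipaddress.ip_address(address)
    return any(ipaddress.ip_address(a) == wanted for a in forward.get(name, []))


def round_trip_name(hostname, forward=FORWARD, reverse=REVERSE):
    """Resolve `hostname` forward, then map each address back to its PTR name."""
    return sorted({reverse[a] for a in forward.get(hostname, []) if a in reverse})


print(forward_confirmed("2001:8a8:1005:1::2"))  # True
print(round_trip_name("collectd.home4u.ch"))    # ['batman.home4u.ch']
```

With these records the address itself is forward-confirmed, but the service name collectd.home4u.ch does not round-trip to itself: it comes back as batman.home4u.ch.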

I will also update the ticket upstream.
Comment 7 Krzysztof 2020-09-10 06:05:55 UTC
(In reply to Fabian Wenk from comment #6)

So, did you find a problem with your configuration? I've just read the upstream bug report and found no more comments from you...

Maybe it is time to close this bug report?
Comment 8 Fabian Wenk 2020-09-12 14:11:54 UTC
(In reply to Krzysztof from comment #7)

As I understand it, this is not a configuration problem with my collectd setup. Could it be that something goes wrong with getnameinfo(3), as mentioned upstream?

As I said, my systems have 2 interfaces in two different networks, with both IPv4 and IPv6 addresses. All of them resolve to the same hostname, and of course the hostname resolves to all IPv4+IPv6 addresses.
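
To look at getnameinfo(3) in isolation, outside collectd, a small script along these lines could be run on the affected host. This is a sketch and not the code path collectd uses: it reaches the libc function through Python's `socket.getnameinfo`, the helper names `numeric_host` and `reverse_name` are invented here, and the sample address is the listener IP from comment 1. With `NI_NUMERICHOST` no DNS lookup should be involved, so a failure in `numeric_host` would point at the resolver library rather than at the PTR records.

```python
import socket

# Numeric service output is requested in both helpers so that no
# /etc/services lookup is mixed into the result.
NUMERIC_SERVICE = socket.NI_NUMERICSERV


def numeric_host(address, port=25826):
    """Format `address` via getnameinfo(3) without asking DNS."""
    host, _service = socket.getnameinfo(
        (address, port), socket.NI_NUMERICHOST | NUMERIC_SERVICE
    )
    return host


def reverse_name(address, port=25826):
    """Ask getnameinfo(3) for the PTR name; return the error text on failure."""
    try:
        host, _service = socket.getnameinfo(
            (address, port), socket.NI_NAMEREQD | NUMERIC_SERVICE
        )
        return host
    except socket.gaierror as exc:
        return f"getnameinfo failed: {exc}"


print(numeric_host("192.168.128.16"))  # 192.168.128.16
```

Calling `reverse_name()` with one of the sender addresses performs a real reverse lookup, so its result depends on the host's resolver configuration.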
Comment 9 Krzysztof 2020-09-23 15:26:59 UTC
I've just updated the upstream ticket (on GitHub). Let's see what the collectd team answers.
hint: https://github.com/freebsd/freebsd/blob/releng/11.3/lib/libc/net/getnameinfo.c
Comment 10 Krzysztof 2020-09-24 20:50:29 UTC
There is no answer on the upstream bug (ticket). To me it seems that the problem is related to the FreeBSD implementation of getnameinfo on 11-RELENG.

As I wrote earlier, I'm not able to reproduce this bug on my 12-RELENG boxes.

So I suggest closing this bug report.
Comment 11 Krzysztof 2020-12-22 19:13:17 UTC
(In reply to Krzysztof from comment #10)
Can we close this bug? As you can see, there has been no answer for 3 months. As I understand it, the problem exists only on 11-RELENG. I was not able to reproduce it.
Comment 12 Fabian Wenk 2021-02-16 21:03:59 UTC
(In reply to Krzysztof from comment #11)

The upstream bug report is still open, and I have a workaround with the reverted patch. I am fine with closing this bug, and I will probably test again after upgrading to 12.x sometime in the future.
Comment 13 Krzysztof 2022-11-20 21:29:01 UTC
Fabian, can you close this bug, as you wrote in your last comment?
Comment 14 Fabian Wenk 2022-11-20 22:36:51 UTC
(In reply to Krzysztof from comment #13)

Hello Krzysztof

Thanks for the reminder, I totally forgot about this bug. I have upgraded my systems to FreeBSD 12.x. In the meantime I also build packages on another system where my revert patch is no longer in place, and it is still working without problems. I will close this bug now.


Best regards,
Fabian