I have run into this bug several times over the last few years, and today I can reproduce it on a machine. I can always restart nscd or run "nscd -i hosts" to clear the nscd cache, but I think it is better to fix the bug.

Config:

cat /etc/nsswitch.conf
    group: compat
    group_compat: nis
    hosts: cache files dns
    networks: files
    passwd: compat
    passwd_compat: nis
    shells: files
    services: compat
    services_compat: nis
    protocols: files
    rpc: files

cat /etc/nscd.conf
    threads 1
    enable-cache passwd yes
    enable-cache group yes
    enable-cache hosts yes
    enable-cache services yes
    enable-cache protocols yes
    enable-cache rpc yes
    enable-cache networks yes
    positive-time-to-live hosts 30
    negative-time-to-live hosts 1

On the first console:

    > pkg install bash postgresql95-client
    > bash
    > while true; do psql -p 80 -h www.google.com; sleep 50; done

You will see:

    psql: received invalid response to SSL...

On the second console, edit /etc/resolv.conf and comment out all the nameserver lines with "#".

Now, on the first console, you will see:

    psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known
    psql: could not translate host name "www.google.com" to address: Non-recoverable failure in name resolution
    ... (the last message repeats) ...

You can ping www.google.com, but psql does not recover from the error. I read the psql code and found that it uses getaddrinfo to resolve the address.

Even after you uncomment the nameserver lines in /etc/resolv.conf, the "Non-recoverable failure in name resolution" error persists until you restart nscd or run "nscd -i hosts" for that user.

uname -a
    FreeBSD xx 10.1-RELEASE-p26 FreeBSD 10.1-RELEASE-p26 #0: Wed Jan 13 20:59:29 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
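Since the report traces the failure to getaddrinfo, a small probe that calls getaddrinfo directly can take psql out of the picture. The following is a minimal sketch, not part of the original report. It assumes that Python's socket.getaddrinfo goes through the same libc and nsswitch path as psql does; the function names probe and watch are hypothetical.

```python
import socket
import time


def probe(host, port):
    """Resolve host:port once with getaddrinfo and classify the outcome."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # args[0] is the EAI_* error code, args[1] the message text
        return ("fail", exc.args[0], exc.args[1])
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    addrs = sorted({info[4][0] for info in infos})
    return ("ok", addrs)


def watch(host, port, interval=5.0, rounds=10):
    """Print one probe result per interval, mirroring the shell loop above."""
    for _ in range(rounds):
        print(time.strftime("%H:%M:%S"), probe(host, port))
        time.sleep(interval)
```

Calling watch("www.google.com", 80) on the first console while editing /etc/resolv.conf on the second should show whether the failure persists after the nameserver lines are restored.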
(In reply to Jov from comment #0)

To clarify: after you restore the contents of /etc/resolv.conf you can ping www.google.com, but psql still does not recover from the error. I read the psql code and found that it uses getaddrinfo to resolve the address.

This bug has happened several times on my home router and once on a VPS. I have a core dump of nscd, and I can attach gdb to the psql or nscd process to collect more information if someone needs it.
Created attachment 167901
fix nscd negative ttl bug

I found PR 181586 and tried its patch; it fixes the bug.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181586

The original patch there is truncated, so I reworked it.
I'm not able to reproduce this problem.

    slippy$ while :; do psql -p 80 -h www.google.com; sleep 50; done
    psql: received invalid response to SSL negotiation: H

I commented out all nameserver statements in resolv.conf:

    psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known
    psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known

I restored all nameserver statements in resolv.conf:

    psql: received invalid response to SSL negotiation: H
    psql: received invalid response to SSL negotiation: H

At no time was nscd restarted or the cache flushed.

The test was performed on:

    FreeBSD slippy 12.0-CURRENT FreeBSD 12.0-CURRENT #41 r318400M: Wed May 17 06:08:52 PDT 2017 root@slippy:/export/obj/opt/src/svn-current/sys/BREAK amd64
(In reply to Cy Schubert from comment #3)

I can reproduce this bug on REL-11.0p8.

On tty1:

    while true; do nc -z -w 3 www.google.com 80; sleep 5; done
    Connection to www.google.com 80 port [tcp/http] succeeded!
    Connection to www.google.com 80 port [tcp/http] succeeded!
    Connection to www.google.com 80 port [tcp/http] succeeded!
    Connection to www.google.com 80 port [tcp/http] succeeded!
    nc: getaddrinfo: hostname nor servname provided, or not known  ----> after [act1]
    nc: getaddrinfo: Non-recoverable failure in name resolution
    nc: getaddrinfo: Non-recoverable failure in name resolution
    nc: getaddrinfo: Non-recoverable failure in name resolution
    nc: getaddrinfo: Non-recoverable failure in name resolution  ----> after [act2]
    nc: getaddrinfo: Non-recoverable failure in name resolution  ----> should not happen, because the negative entry should time out; the patch fixes this
    nc: getaddrinfo: Non-recoverable failure in name resolution
    nc: getaddrinfo: Non-recoverable failure in name resolution
    Connection to www.google.com 80 port [tcp/http] succeeded!  ----> after [act3], restarting nscd to force the negative cache to be cleared
    Connection to www.google.com 80 port [tcp/http] succeeded!

On tty2:

    [act1]: remove the content of /etc/resolv.conf
    [act2]: restore the content of /etc/resolv.conf
    [act3]: restart nscd

Please make sure the configuration has taken effect, because nsswitch does not use the cache by default, and the default positive and negative TTLs are too long.

/etc/nsswitch.conf has:

    hosts: cache files dns

and /etc/nscd.conf (restart nscd after changing it) has:

    positive-time-to-live hosts 3
    negative-time-to-live hosts 1
I can also reproduce this bug with the 12-CURRENT nscd from FreeBSD-12.0-CURRENT-amd64-20170510-r318137-mini-memstick.img. I replaced the REL 11.0 nscd binary with the nscd from the 12-CURRENT image and ran the same steps.
Do I understand you correctly? Did you run the 12.0 binary on an 11.0 system?
(In reply to Cy Schubert from comment #6)

Yes, my last post (comment #4) ran the 12 binary on 11.
That's wrong. A binary that works on a 12 system but not on an 11 system proves the binary is not at fault. The fault lies in a library patch that has not been MFCed from HEAD to stable/11.
(In reply to Cy Schubert from comment #8)

Hi, I can also reproduce this bug on 12-CURRENT with the same method. I set up a 12-CURRENT system using FreeBSD-12.0-CURRENT-amd64-20170510-r318137-disc1.iso.

uname -a
    FreeBSD fb12 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r318137: Wed May 10 15:09:31 UTC 2017 root@releng3.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
I'll test again, on a different machine.
I've tested on my testbed this time (laptop last time). The bug has been confirmed. I'll commit it after some additional tests I usually do before committing. Thank you for finding this bug and for the patch.
A commit references this bug:

Author: cy
Date: Sat May 20 16:58:49 UTC 2017
New revision: 318578
URL: https://svnweb.freebsd.org/changeset/base/318578

Log:
  Fix non-recoverable name resolution failures due to negative cache
  entries never expiring. This patch honours the negative cache timeout.

  To test/experience the failure do the following:

  1. Edit /etc/nscd.conf to adjust the cache timeouts as follows:

       positive-time-to-live hosts 30
       negative-time-to-live hosts 1

  2. Ensure that nsswitch.conf hosts line contains something like:

       hosts: files cache dns

     Note that cache must be specified before dns.

  3. Start nscd.

  4. Run the following command:

       while true; do nc -z -w 3 www.google.com 80; sleep 5; done

  5. While running the command, remove or comment out all nameserver
     statements in /etc/resolv.conf. After a short while you will notice
     non-recoverable name resolution failures.

  6. Uncomment or replace all nameserver statements back into
     /etc/resolv.conf. Take note that name resolution never recovers.
     To recover nscd must be restarted. This patch fixes this.

  PR: 207804
  Submitted by: Jov <amutu@amutu.com>
  MFC after: 1 week

Changes:
  head/usr.sbin/nscd/query.c
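The behaviour described in the commit log can be illustrated with a toy model. This is a sketch under stated assumptions, not the code in usr.sbin/nscd/query.c: the class ToyCache, the flag honour_negative_ttl, and the helper simulate are all hypothetical, and the thread does not describe the actual defect beyond "negative cache entries never expiring". The model uses the TTLs from the test recipe (30 seconds positive, 1 second negative) and a resolver that fails between t=40 and t=60.

```python
class ToyCache:
    """Toy lookup cache with separate positive and negative TTLs (hypothetical)."""

    def __init__(self, positive_ttl, negative_ttl, honour_negative_ttl=True):
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        # False models the reported behaviour: negative entries never expire.
        self.honour_negative_ttl = honour_negative_ttl
        self.entries = {}  # name -> (address or None, time stored)

    def lookup(self, name, resolver, now):
        entry = self.entries.get(name)
        if entry is not None:
            value, stored = entry
            age = now - stored
            if value is not None:
                expired = age >= self.positive_ttl
            else:
                expired = self.honour_negative_ttl and age >= self.negative_ttl
            if not expired:
                return value
            del self.entries[name]
        # Cache miss or expired entry: ask the resolver and cache the answer,
        # including a failure (None), which becomes a negative entry.
        value = resolver(name)
        self.entries[name] = (value, now)
        return value


def simulate(honour_negative_ttl):
    """Replay the outage: DNS up, then down at t=40, then up again at t=60."""
    dns = {"up": True}

    def resolver(name):
        return "192.0.2.1" if dns["up"] else None

    cache = ToyCache(30, 1, honour_negative_ttl)
    results = []
    for now, up in [(0, True), (40, False), (45, False), (60, True), (120, True)]:
        dns["up"] = up
        results.append(cache.lookup("www.example.com", resolver, now))
    return results
```

In this model, simulate(False) keeps returning the cached failure after the resolver comes back, which matches the reported symptom, while simulate(True) recovers on the first lookup after the negative TTL has elapsed.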
Thank you for the patch. I will MFC in a week.
A commit references this bug:

Author: cy
Date: Tue May 30 03:28:00 UTC 2017
New revision: 319177
URL: https://svnweb.freebsd.org/changeset/base/319177

Log:
  MFC r318578:

  Fix non-recoverable name resolution failures due to negative cache
  entries never expiring. This patch honours the negative cache timeout.

  To test/experience the failure do the following:

  1. Edit /etc/nscd.conf to adjust the cache timeouts as follows:

       positive-time-to-live hosts 30
       negative-time-to-live hosts 1

  2. Ensure that nsswitch.conf hosts line contains something like:

       hosts: files cache dns

     Note that cache must be specified before dns.

  3. Start nscd.

  4. Run the following command:

       while true; do nc -z -w 3 www.google.com 80; sleep 5; done

  5. While running the command, remove or comment out all nameserver
     statements in /etc/resolv.conf. After a short while you will notice
     non-recoverable name resolution failures.

  6. Uncomment or replace all nameserver statements back into
     /etc/resolv.conf. Take note that name resolution never recovers.
     To recover nscd must be restarted. This patch fixes this.

  PR: 207804
  Submitted by: Jov <amutu@amutu.com>

Changes:
_U  stable/10/
  stable/10/usr.sbin/nscd/query.c
_U  stable/11/
  stable/11/usr.sbin/nscd/query.c
MFCed. Thanks for the patch.
*** Bug 181586 has been marked as a duplicate of this bug. ***