Bug 207804 - nscd negative cache does not time out for getaddrinfo: Non-recoverable failure in name resolution
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 10.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Cy Schubert
URL:
Keywords: patch
Duplicates: 181586
Depends on:
Blocks:
 
Reported: 2016-03-08 12:00 UTC by Jov
Modified: 2020-07-11 18:41 UTC (History)
CC List: 3 users

See Also:


Attachments
fix nscd negtive ttl bug (724 bytes, patch)
2016-03-09 07:13 UTC, Jov

Description Jov 2016-03-08 12:00:55 UTC
I have hit this bug several times over the last few years, and today I can reproduce it on a machine. I can always restart nscd or use nscd -i hosts to clear the nscd cache, but I think it is better to fix the bug.

config:

cat /etc/nsswitch.conf 
group: compat
group_compat: nis
hosts: cache files dns
networks: files
passwd: compat
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files

cat /etc/nscd.conf
threads 1
enable-cache passwd yes
enable-cache group yes
enable-cache hosts yes
enable-cache services yes
enable-cache protocols yes
enable-cache rpc yes
enable-cache networks yes

positive-time-to-live hosts 30
negative-time-to-live hosts 1

On the first console:
>pkg install bash postgresql95-client
>bash
>while true; do psql -p 80 -h www.google.com; sleep 50; done

You will see:
psql: received invalid response to SSL...

On the second console:
Edit /etc/resolv.conf and comment out all the nameserver lines using "#".

Now on the first console you will see:
psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known
psql: could not translate host name "www.google.com" to address: Non-recoverable failure in name resolution
(the last message repeats)
You can ping www.google.com, but psql does not recover from the error. I read the psql code and found that it uses getaddrinfo to resolve the address.

Even if you now uncomment the nameserver lines in /etc/resolv.conf, the "Non-recoverable failure in name resolution" error persists until you restart nscd or run nscd -i hosts for that user.
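
The failing call can also be watched without psql. The following is a minimal sketch, assuming Python's standard socket module, whose getaddrinfo wrapper is expected to call the C library's getaddrinfo(3) and therefore to take the same nsswitch/nscd path. The helper name try_resolve is hypothetical; it is not part of psql or nscd.

```python
import socket


def try_resolve(host, port=80):
    """Call getaddrinfo once and report the addresses or the EAI_* error.

    Returns (ok, code, value): on success value is a sorted list of
    addresses; on failure code is the EAI_* number and value is the
    gai_strerror() text that tools such as psql and nc print.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # exc.errno holds the EAI_* code, exc.strerror the message text.
        return False, exc.errno, exc.strerror
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return True, 0, addresses
```

Calling try_resolve("www.google.com") every few seconds while editing /etc/resolv.conf should show the returned code change from success to a failure code and, on an affected system, stay at the failure code after the nameserver lines are restored.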

uname -a
FreeBSD xx 10.1-RELEASE-p26 FreeBSD 10.1-RELEASE-p26 #0: Wed Jan 13 20:59:29 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 1 Jov 2016-03-08 12:08:04 UTC
(In reply to Jov from comment #0)

You can ping www.google.com after you restore the contents of /etc/resolv.conf, but psql does not recover from the error. I read the psql code and found that it uses getaddrinfo to resolve the address.

This bug has happened several times on my home router and once on a VPS.

I have a core dump of nscd, and I can attach gdb to the psql or nscd process to get more information if someone needs it.
Comment 2 Jov 2016-03-09 07:13:31 UTC
Created attachment 167901 [details]
fix nscd negtive ttl bug

I found PR 181586 and tried the patch; it fixes the bug.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181586

The original patch there was trimmed, so I reworked it.
Comment 3 Cy Schubert freebsd_committer freebsd_triage 2017-05-18 04:03:37 UTC
I'm not able to reproduce this problem.

slippy$ while :; do psql -p 80 -h www.google.com; sleep 50; done
psql: received invalid response to SSL negotiation: H

I commented out all nameserver statements in resolv.conf

psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known
psql: could not translate host name "www.google.com" to address: hostname nor servname provided, or not known

I restored all nameserver statements in resolv.conf

psql: received invalid response to SSL negotiation: H
psql: received invalid response to SSL negotiation: H

At no time was nscd restarted or cache flushed.

The test was performed on:

FreeBSD slippy 12.0-CURRENT FreeBSD 12.0-CURRENT #41 r318400M: Wed May 17 06:08:52 PDT 2017     root@slippy:/export/obj/opt/src/svn-current/sys/BREAK  amd64
Comment 4 Jov 2017-05-18 04:45:08 UTC
(In reply to Cy Schubert from comment #3)
I can reproduce this bug on REL-11.0p8

on tty1:
while true; do nc -z -w 3 www.google.com 80; sleep 5; done
Connection to www.google.com 80 port [tcp/http] succeeded!
Connection to www.google.com 80 port [tcp/http] succeeded!
Connection to www.google.com 80 port [tcp/http] succeeded!
Connection to www.google.com 80 port [tcp/http] succeeded!
nc: getaddrinfo: hostname nor servname provided, or not known ----> after [act1]
nc: getaddrinfo: Non-recoverable failure in name resolution
nc: getaddrinfo: Non-recoverable failure in name resolution
nc: getaddrinfo: Non-recoverable failure in name resolution
nc: getaddrinfo: Non-recoverable failure in name resolution ----> after [act2]
nc: getaddrinfo: Non-recoverable failure in name resolution ----> should not happen, because the negative entry should time out; the patch fixes this.
nc: getaddrinfo: Non-recoverable failure in name resolution
nc: getaddrinfo: Non-recoverable failure in name resolution
Connection to www.google.com 80 port [tcp/http] succeeded! ----> after [act3], restarting nscd to force the negative cache to be cleared.
Connection to www.google.com 80 port [tcp/http] succeeded!

on tty2:
[act1]: remove the contents of /etc/resolv.conf
[act2]: restore the contents of /etc/resolv.conf
[act3]: restart nscd
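
[act1] and [act2] can be made repeatable by toggling the nameserver lines instead of deleting the file contents, as the original description does with "#". The following is a minimal sketch that works on the text of a resolv.conf-style file; the function names are hypothetical, and reading or writing /etc/resolv.conf itself is deliberately left to the caller.

```python
def comment_out_nameservers(text):
    """Prefix every 'nameserver' line with '#', leaving other lines alone."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("nameserver"):
            line = "#" + line
        out.append(line)
    return "".join(out)


def restore_nameservers(text):
    """Strip one leading '#' from every commented-out 'nameserver' line."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith("#") and line[1:].lstrip().startswith("nameserver"):
            line = line[1:]
        out.append(line)
    return "".join(out)
```

Note that restore_nameservers also re-enables nameserver lines that were already commented out before the test, so it is best run on a file that had none.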


Please make sure the configuration has taken effect, because nsswitch does not use the cache by default, and the default positive and negative TTLs are too long:

/etc/nsswitch.conf has:
hosts: cache files dns

and /etc/nscd.conf (restart nscd after changing it) has:
positive-time-to-live hosts 3
negative-time-to-live hosts 1
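
The two preconditions above can be checked mechanically before a test run. The following is a minimal sketch that inspects the text of the two files; the function names are hypothetical, and the parsing handles only the simple line forms quoted in this report, not the full syntax of either file.

```python
def hosts_sources(nsswitch_text):
    """Return the sources listed on the 'hosts:' line, or [] if it is absent."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def cache_before_dns(nsswitch_text):
    """True if both 'cache' and 'dns' are listed and 'cache' comes first."""
    sources = hosts_sources(nsswitch_text)
    return ("cache" in sources and "dns" in sources
            and sources.index("cache") < sources.index("dns"))


def hosts_ttls(nscd_text):
    """Return the TTLs set for 'hosts', e.g. {'positive': 3, 'negative': 1}."""
    ttls = {}
    for raw in nscd_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if (len(fields) == 3 and fields[1] == "hosts"
                and fields[0].endswith("-time-to-live")):
            # 'positive-time-to-live' -> 'positive'
            ttls[fields[0].split("-", 1)[0]] = int(fields[2])
    return ttls
```

A TTL missing from the returned dictionary means nscd falls back to its default for that entry type, which is the situation this comment warns about.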
Comment 5 Jov 2017-05-18 05:58:34 UTC
I can also reproduce this bug with the 12-CURRENT nscd from FreeBSD-12.0-CURRENT-amd64-20170510-r318137-mini-memstick.img. I replaced the REL 11.0 nscd binary with the nscd from the 12-CURRENT image and followed the same steps.
Comment 6 Cy Schubert freebsd_committer freebsd_triage 2017-05-18 06:03:21 UTC
Do I understand you correctly? Did you run the 12.0 binary on a 11.0 system?
Comment 7 Jov 2017-05-18 06:13:37 UTC
(In reply to Cy Schubert from comment #6)
Yes, my last post (comment #4) ran the 12 binary on 11.
Comment 8 Cy Schubert freebsd_committer freebsd_triage 2017-05-18 13:30:47 UTC
That's wrong. A binary that works on a 12 system but not on an 11 system proves the binary is not at fault. The fault lies in a library patch that has not been MFCed from HEAD to stable/11.
Comment 9 Jov 2017-05-20 01:05:28 UTC
(In reply to Cy Schubert from comment #8)
Hi, I can also reproduce this bug on 12-CURRENT with the same method. I set up a 12-CURRENT system using FreeBSD-12.0-CURRENT-amd64-20170510-r318137-disc1.iso

uname -a
FreeBSD fb12 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r318137: Wed May 10 15:09:31 UTC 2017     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 10 Cy Schubert freebsd_committer freebsd_triage 2017-05-20 03:12:18 UTC
I'll test again, on a different machine.
Comment 11 Cy Schubert freebsd_committer freebsd_triage 2017-05-20 03:36:05 UTC
I've tested on my testbed this time (laptop last time). The bug has been confirmed. I'll commit it after some additional tests I usually do before committing.

Thank you for finding this bug and for the patch.
Comment 12 commit-hook freebsd_committer freebsd_triage 2017-05-20 16:59:51 UTC
A commit references this bug:

Author: cy
Date: Sat May 20 16:58:49 UTC 2017
New revision: 318578
URL: https://svnweb.freebsd.org/changeset/base/318578

Log:
  Fix non-recoverable name resolution failures due to negative cache
  entries never expiring. This patch honours the negative cache timeout.

  To test/experience the failure do the following:

  1. Edit /etc/nscd.conf to adjust the cache timeouts as follows:

  	positive-time-to-live hosts 30
  	negative-time-to-live hosts 1

  2. Ensure that nsswitch.conf hosts line contains something like:

  	hosts: files cache dns

  	Note that cache must be specified before dns.

  3. Start nscd.

  4. Run the following command:

  	while true; do nc -z -w 3 www.google.com 80; sleep 5; done

  5. While running the command, remove or comment out all nameserver
     statements in /etc/resolv.conf. After a short while you will notice
     non-recoverable name resolution failures.

  6. Uncomment or replace all nameserver statements back into
     /etc/resolv.conf. Take note that name resolution never recovers.
     To recover nscd must be restarted. This patch fixes this.

  PR:		207804
  Submitted by:	Jov <amutu@amutu.com>
  MFC after:	1 week

Changes:
  head/usr.sbin/nscd/query.c
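
The behaviour described in the log can be illustrated with a toy model. The following sketch is not the code from query.c, which is not shown in this report; it only assumes what the log states, namely that negative entries never expired before the change and honour the negative timeout after it. The class name, the expire_negative switch, and the injectable clock are all hypothetical.

```python
import time

MISS, POSITIVE, NEGATIVE = "miss", "positive", "negative"


class TtlCache:
    """Toy lookup cache with separate positive and negative TTLs (seconds)."""

    def __init__(self, positive_ttl, negative_ttl,
                 clock=time.monotonic, expire_negative=True):
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self.expire_negative = expire_negative  # False models the reported bug
        self._clock = clock
        self._entries = {}  # key -> (kind, value, stored_at)

    def store(self, key, value):
        """Record a lookup result; value=None records a failed lookup."""
        kind = NEGATIVE if value is None else POSITIVE
        self._entries[key] = (kind, value, self._clock())

    def lookup(self, key):
        """Return (kind, value); expired entries are dropped and reported as MISS."""
        entry = self._entries.get(key)
        if entry is None:
            return MISS, None
        kind, value, stored_at = entry
        if kind == POSITIVE:
            ttl = self.positive_ttl
        elif self.expire_negative:
            ttl = self.negative_ttl
        else:
            ttl = None  # negative entry is never aged out
        if ttl is not None and self._clock() - stored_at >= ttl:
            del self._entries[key]
            return MISS, None
        return kind, value
```

With expire_negative=False a failed lookup stored while the nameservers were unavailable keeps being served indefinitely, which matches step 6 above; with the default, the entry turns into a MISS once negative_ttl has elapsed and the next query goes back to the resolver.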
Comment 13 Cy Schubert freebsd_committer freebsd_triage 2017-05-20 17:00:39 UTC
Thank you for the patch. I will MFC in a week.
Comment 14 commit-hook freebsd_committer freebsd_triage 2017-05-30 03:28:49 UTC
A commit references this bug:

Author: cy
Date: Tue May 30 03:28:00 UTC 2017
New revision: 319177
URL: https://svnweb.freebsd.org/changeset/base/319177

Log:
  MFC r318578:

  Fix non-recoverable name resolution failures due to negative cache
  entries never expiring. This patch honours the negative cache timeout.

  To test/experience the failure do the following:

  1. Edit /etc/nscd.conf to adjust the cache timeouts as follows:

  	positive-time-to-live hosts 30
  	negative-time-to-live hosts 1

  2. Ensure that nsswitch.conf hosts line contains something like:

  	hosts: files cache dns

  	Note that cache must be specified before dns.

  3. Start nscd.

  4. Run the following command:

  	while true; do nc -z -w 3 www.google.com 80; sleep 5; done

  5. While running the command, remove or comment out all nameserver
     statements in /etc/resolv.conf. After a short while you will notice
     non-recoverable name resolution failures.

  6. Uncomment or replace all nameserver statements back into
     /etc/resolv.conf. Take note that name resolution never recovers.
     To recover nscd must be restarted. This patch fixes this.

  PR:		207804
  Submitted by:	Jov <amutu@amutu.com>

Changes:
_U  stable/10/
  stable/10/usr.sbin/nscd/query.c
_U  stable/11/
  stable/11/usr.sbin/nscd/query.c
Comment 15 Cy Schubert freebsd_committer freebsd_triage 2017-05-30 03:40:14 UTC
MFCed. Thanks for the patch.
Comment 16 Allan Jude freebsd_committer freebsd_triage 2020-07-11 18:41:09 UTC
*** Bug 181586 has been marked as a duplicate of this bug. ***