Bug 231193 - ntpd peers stuck in INIT status when using local named
Summary: ntpd peers stuck in INIT status when using local named
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: conf (show other bugs)
Version: 11.2-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: Ian Lepore
URL:
Keywords:
Depends on: 144400
Blocks:
  Show dependency treegraph
 
Reported: 2018-09-06 10:36 UTC by ml
Modified: 2018-09-06 20:08 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ml 2018-09-06 10:36:00 UTC
On several of my servers, which resolve through a locally running named, after boot ntp won't work.

# ntpq
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp1.inrim.it   .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ntp2.inrim.it   .INIT.          16 u    -   64    0    0.000    0.000   0.000




RC order is fine:
> # rcorder /etc/rc.d/* /usr/local/etc/rc.d/* | grep -E "(ntp|named)"
> /usr/local/etc/rc.d/named
> /etc/rc.d/ntpdate
> /etc/rc.d/ntpd
So named starts before ntpd. 



However, named starts asynchronously, so it's not ready yet when ntpd queries for its peer.
See the following log:
# grep -E "ntp|named" /var/log/all.log

> Aug 27 16:03:19 xxxxxx named[1158]: starting BIND 9.11.4-P1 (Extended Support Version) <id:2b060b2>
> Aug 27 16:03:19 xxxxxx kernel: Starting named.
> Aug 27 16:03:19 xxxxxx named[1158]: running on FreeBSD amd64 11.2-RELEASE-p2 FreeBSD 11.2-RELEASE-p2 #8 r338333: Mon Aug 27 09:33:23 CEST 2018     root@xxxxxx.xxxxxxxxxxxxxxxxxxxxx.it:/usr/obj/usr/src/sys/XXXXXX
> Aug 27 16:03:19 xxxxxx named[1158]: built with '--localstatedir=/var' '--disable-linux-caps' '--disable-symtable' '--with-randomdev=/dev/random' '--with-libxml2=/usr/local' '--with-readline=-L/usr/local/lib -ledit' '--with-dlopen=yes' '--sysconfdir=/usr/local/etc/namedb' '--disable-dnstap' '--disable-filter-aaaa' '--disable-fixed-rrset' '--without-geoip' '--with-idn=/usr/local' '--disable-ipv6' '--without-libjson' '--disable-largefile' '--without-lmdb' '--without-python' '--disable-querytrace' '--disable-rpz-nsdname' '--disable-rpz-nsip' '--enable-threads' '--with-tuning=default' '--without-gssapi' '--disable-openssl-version-check' '--without-openssl' '--disable-native-pkcs11' '--with-dlz-filesystem=yes' '--without-gost' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd11.2' 'build_alias=amd64-portbld-freebsd11.2' 'CC=cc' 'CFLAGS=-O2 -pipe -DLIBICONV_PLUG -fstack-protector -isystem /usr/local/include -fno-strict-aliasing' 'LDFLAGS= -fstack-protector' 'LIB
> Aug 27 16:03:19 xxxxxx named[1158]: running as: named -u bind -c /usr/local/etc/namedb/named.conf
> Aug 27 16:03:19 xxxxxx named[1158]: compiled by CLANG 4.2.1 Compatible FreeBSD Clang 6.0.0 (tags/RELEASE_600/final 326565)
> Aug 27 16:03:19 xxxxxx named[1158]: compiled with libxml2 version: 2.9.7
> Aug 27 16:03:19 xxxxxx named[1158]: linked to libxml2 version: 20907
> Aug 27 16:03:19 xxxxxx named[1158]: compiled with zlib version: 1.2.11
> Aug 27 16:03:19 xxxxxx named[1158]: linked to zlib version: 1.2.11
> Aug 27 16:03:19 xxxxxx named[1158]: threads support is enabled
> Aug 27 16:03:19 xxxxxx named[1158]: ----------------------------------------------------
> Aug 27 16:03:19 xxxxxx named[1158]: BIND 9 is maintained by Internet Systems Consortium,
> Aug 27 16:03:19 xxxxxx named[1158]: Inc. (ISC), a non-profit 501(c)(3) public-benefit Aug 27 16:03:19 xxxxxx named[1158]: corporation.  Support and training for BIND 9 are Aug 27 16:03:19 xxxxxx named[1158]: available at https://www.isc.org/support
> Aug 27 16:03:19 xxxxxx named[1158]: ----------------------------------------------------
> Aug 27 16:03:19 xxxxxx named[1158]: found 4 CPUs, using 4 worker threads
> Aug 27 16:03:19 xxxxxx named[1158]: using 3 UDP listeners per interface
> Aug 27 16:03:19 xxxxxx named[1158]: using up to 4096 sockets
> Aug 27 16:03:19 xxxxxx named[1158]: loading configuration from '/usr/local/etc/namedb/named.conf'
> Aug 27 16:03:19 xxxxxx named[1158]: reading built-in trust anchors from file '/usr/local/etc/namedb/bind.keys'
> Aug 27 16:03:19 xxxxxx named[1158]: using default UDP/IPv4 port range: [49152, 65535]
> Aug 27 16:03:19 xxxxxx named[1158]: using default UDP/IPv6 port range: [49152, 65535]
> Aug 27 16:03:19 xxxxxx named[1158]: listening on IPv4 interface lo0, 127.0.0.1#53
> Aug 27 16:03:19 xxxxxx named[1158]: listening on IPv4 interface vlan1, 192.168.xxx.1#53
> Aug 27 16:03:19 xxxxxx named[1158]: listening on IPv4 interface vlan2, 192.168.xxx.1#53
> Aug 27 16:03:19 xxxxxx named[1158]: listening on IPv4 interface vlan3, 192.168.xxx.1#53
> Aug 27 16:03:19 xxxxxx named[1158]: generating session key for dynamic DNS
> Aug 27 16:03:19 xxxxxx named[1158]: sizing zone task pool based on 167 zones
> Aug 27 16:03:19 xxxxxx named[1158]: none:100: 'max-cache-size 90%' - setting to 7109MB (out of 7898MB)
> Aug 27 16:03:19 xxxxxx named[1158]: set up managed keys zone for view _default, file 'managed-keys.bind'
> Aug 27 16:03:19 xxxxxx named[1158]: automatic empty zone: EMPTY.AS112.ARPA
> Aug 27 16:03:19 xxxxxx named[1158]: automatic empty zone: HOME.ARPA
> Aug 27 16:03:19 xxxxxx named[1158]: none:100: 'max-cache-size 90%' - setting to 7109MB (out of 7898MB)
> Aug 27 16:03:19 xxxxxx named[1158]: configuring command channel from '/usr/local/etc/namedb/rndc.key'
> Aug 27 16:03:19 xxxxxx named[1158]: command channel listening on 127.0.0.1#953
> Aug 27 16:03:19 xxxxxx named[1158]: managed-keys-zone: loaded serial 0
> ...
> Aug 27 16:03:19 xxxxxx named[1158]: all zones loaded
> Aug 27 16:03:19 xxxxxx named[1158]: running
> Aug 27 16:03:19 xxxxxx kernel: Setting date via ntp.
> Aug 27 16:03:19 xxxxxx kernel: Error resolving ntp1.inrim.it: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: 27 Aug 16:03:19 ntpdate[1175]: Can't find host ntp1.inrim.it: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: Error resolving ntp2.inrim.it: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: 27 Aug 16:03:19 ntpdate[1175]: Can't find host ntp2.inrim.it: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: Error resolving 0.it.pool.ntp.org: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: 27 Aug 16:03:19 ntpdate[1175]: Can't find host 0.it.pool.ntp.org: hostname nor servname provided, or not known (8)
> Aug 27 16:03:19 xxxxxx kernel: 27 Aug 16:03:19 ntpdate[1175]: no servers can be used, exiting
> Aug 27 16:03:20 xxxxxx ntpd[1252]: ntpd 4.2.8p11-a (1): Starting
> Aug 27 16:03:20 xxxxxx kernel: Starting ntpd.
> Aug 27 16:03:20 xxxxxx ntpd[1252]: Command line: /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
> Aug 27 16:03:20 xxxxxx ntpd[1254]: proto: precision = 0.062 usec (-24)
> Aug 27 16:03:20 xxxxxx ntpd[1254]: restrict: ignoring line 12, address/host 'ntp1.inrim.it' unusable.
> Aug 27 16:03:20 xxxxxx ntpd[1254]: restrict: ignoring line 14, address/host 'ntp2.inrim.it' unusable.
> Aug 27 16:03:20 xxxxxx ntpd[1254]: restrict: ignoring line 16, address/host '0.it.pool.ntp.org' unusable.
> Aug 27 16:03:20 xxxxxx ntpd[1254]: Listen and drop on 0 v4wildcard 0.0.0.0:123
> Aug 27 16:03:20 xxxxxx ntpd[1254]: Listen normally on 2 lo0 127.0.0.1:123
> Aug 27 16:03:20 xxxxxx ntpd[1254]: Listen normally on 7 vlan1 192.168.xxx.1:123
> Aug 27 16:03:20 xxxxxx ntpd[1254]: Listening on routing socket on fd #28 for interface updates
> Aug 27 16:03:26 xxxxxx named[1158]: resolver priming query complete



I found no way to have ntpd recheck DNS later, so I think a workaround is needed; perhaps inserting some delay either in /usr/local/etc/rc.d/named or /etc/rc.d/ntpd might work.

I see named's RC script has named_wait, but it might be a bigger trouble than it solves, as the server could hang indefinitely if started without an Internet connection working (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144400).
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2018-09-06 19:44:22 UTC
Notify committer who has been working in this area recently.
Comment 2 Ian Lepore freebsd_committer 2018-09-06 20:00:56 UTC
(In reply to ml from comment #0)
It's not clear to me that it's ntpd having trouble resolving those names.  The name resolution errors in the log are from ntpdate, not ntpd.  When the peers are stuck in INIT state, what is the output from "ntpq -np"?  That will show for sure whether the addresses are resolved.  That ntpq output looks to me more like the names are resolved but UDP packets to or from port 123 are being blocked somehow.

Oh, hmm... after looking again it looks like the names may not be resolved when ntpd processes the config file (I just noticed the errors about restrict), but then they do resolve by time the peer configuration happens, so you end up with resolved peers who can't communicate because the restrict statements that would have allowed communications with them never got processed.

I think waiting for named to be ready is the only viable fix, and I agree with you that the ability to have an optional timeout for that wait is also important.  IMO, dougb was wrong to reject the changes in bug 144400, and I think I'll look into re-opening that and fixing it.  But I can't make any promises that it'll get into 12.0-RELEASE, I'm really busy these days.