Bug 224126

Summary: rc.d/ntpd outputs fetch usage syntax (stable/11)
Product: Base System Reporter: Jeremy Chadwick <jdc>
Component: binAssignee: Cy Schubert <cy>
Status: Closed FIXED    
Severity: Affects Some People CC: asomers, cy, feld
Priority: ---    
Version: 11.0-STABLE   
Hardware: Any   
OS: Any   

Description Jeremy Chadwick 2017-12-05 20:16:25 UTC
Note: this DOES NOT appear to be related to Bug 213448.

This morning I found the following in my daily run output (periodic daily) on my stable/11 box:

===
...

Security check:
    (output mailed separately)
usage: fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] URL ...
       fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] -h host -f file [-c dir]

Checking userland and kernel versions:
Userland and kernel are in sync.
...
===

Digging through old cron mails, this problem began November 28th (November 27th had no problem).  The system has not been rebuilt (world, mergemaster, etc. -- system is running stable/11 r325712) or rebooted since November 24th.

I modified rc.conf to contain ntp_leapfile_fetch_verbose="yes" and then ran service ntpd onefetch to achieve the following output:

===
root@mambo:~ # service ntpd onefetch
ntp_src_leapfile version is 3676924800
ntp_db_leapfile version is 3676924800
not replacing /var/db/ntpd.leap-seconds.list with /etc/ntp/leap-seconds
Within ntp leapfile expiry limit, initiating fetch
fetching https://www.ietf.org/timezones/data/leap-seconds.list
usage: fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] URL ...
       fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] -h host -f file [-c dir]

{very VERY long stall here}
===

pstree output showed the following:

===
 | \-+= 71172 root sshd: jdc [priv] (sshd)
 |   \-+- 71174 jdc sshd: jdc@pts/1 (sshd)
 |     \-+= 71175 jdc -bash (bash)
 |       \-+= 71176 root sudo su -
 |         \-+- 71177 root su -
 |           \-+= 71178 root -su (csh)
 |             \-+= 71181 root /bin/sh /etc/rc.d/ntpd onefetch
 |               \-+- 71200 root /bin/sh /etc/rc.d/ntpd onefetch
 |                 \--- 71201 root awk $1 == "#$" { print $2 }
===

I then ^C'd the service ntpd onefetch and re-ran it under truss -af and found the following:

===
root@mambo:~ # truss -af service ntpd onefetch
...
71342: wait4(-1,{ EXITED,val=0 },0x0,0x0)        = 71358 (0x116be)
Within ntp leapfile expiry limit, initiating fetch
71342: write(1,"Within ntp leapfile expiry limit"...,51) = 51 (0x33)
fetching https://www.ietf.org/timezones/data/leap-seconds.list
71342: write(1,"fetching https://www.ietf.org/ti"...,63) = 63 (0x3f)
71342: stat("/sbin/fetch",0x7fffffffdbe8)        ERR#2 'No such file or directory'
71342: stat("/bin/fetch",0x7fffffffdbe8)         ERR#2 'No such file or directory'
71342: stat("/usr/sbin/fetch",0x7fffffffdbe8)    ERR#2 'No such file or directory'
71342: stat("/usr/bin/fetch",{ mode=-r-xr-xr-x ,inode=81231,size=36400,blksize=32768 }) = 0 (0x0)
71360: <new process>
71342: vfork()                                   = 71360 (0x116c0)
71360: execve("/usr/bin/fetch",[ "fetch", "-mq", "-o", "https://www.ietf.org/timezones/data/leap-seconds.list" ],0x801418790) = 0 (0x0)
71360: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366189568 (0x800627000)
...
usage: fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] URL ...
       fetch [-146AadFlMmnPpqRrsUv] [-B bytes] [--bind-address=host]
       [--ca-cert=file] [--ca-path=dir] [--cert=file] [--crl=file]
       [-i file] [--key=file] [-N file] [--no-passive] [--no-proxy=list]
       [--no-sslv3] [--no-tlsv1] [--no-verify-hostname] [--no-verify-peer]
       [-o file] [--referer=URL] [-S bytes] [-T seconds]
       [--user-agent=agent-string] [-w seconds] -h host -f file [-c dir]
71360: write(2,"usage: fetch [-146AadFlMmnPpqRrs"...,811) = 811 (0x32b)
...
===

Look closely at the execve() call arguments: there output file argument is missing, which is why fetch is spewing usage syntax:

71360: execve("/usr/bin/fetch",[ "fetch", "-mq", "-o", "https://www.ietf.org/timezones/data/leap-seconds.list" ],0x801418790) = 0 (0x0)

This call essentially correlates with ntpd_fetch_leapfile() in /etc/rc.d/ntpd:

===
...
        if ntpd_needfetch_leapfile ; then
                for url in $ntp_leapfile_sources ; do
                        $verbose fetching $url
                        fetch $ntp_leapfile_fetch_opts -o $ntp_tmp_leapfile $url && break
                done
...
===

This means that $ntp_tmp_leapfile is somehow empty/undefined when the fetch is being attempted -- and that's the bug.

The long delay is an after-effect of get_ntp_leapfile_ver()'s use of awk.  AFAICT it's sitting around waiting for data it never gets because of the failed fetch.

ntp-related stuff in my rc.conf and periodic.conf:

rc.conf
=========
ntpd_enable="yes"
ntpd_config="/conf/ME/ntp.conf"
ntpd_sync_on_start="yes"
ntpd_flags="-4 -p /var/run/ntpd.pid -f /var/db/ntpd.drift"
ntp_leapfile_fetch_verbose="yes"

periodic.conf
===============
daily_ntpd_avoid_congestion="no"

ntp-related files and both file and leap second timestamps:

root@mambo:~ # ls -lT /etc/ntp/leap-seconds /var/db/ntpd.leap-seconds.list
-rw-r--r--  1 root  wheel  10408 Nov  9 08:37:13 2017 /etc/ntp/leap-seconds
-rw-r--r--  1 root  wheel  10408 Nov  9 08:37:13 2017 /var/db/ntpd.leap-seconds.list

root@mambo:~ # grep '^#\$' /etc/ntp/leap-seconds /var/db/ntpd.leap-seconds.list
/etc/ntp/leap-seconds:#$         3676924800
/var/db/ntpd.leap-seconds.list:#$        3676924800

/etc/rc.d/ntpd is:

# $FreeBSD: stable/11/etc/rc.d/ntpd 325256 2017-11-01 01:03:44Z cy $

/etc/periodic/daily/480.leapfile-ntpd is:

# $FreeBSD: stable/11/etc/periodic/daily/480.leapfile-ntpd 325256 2017-11-01 01:03:44Z cy $
Comment 1 Jeremy Chadwick 2017-12-05 20:28:47 UTC
Few things to add for clarification:

1. The very long stall IS NOT caused by sleep(1).  It's caused by the model of piping used with awk, best I can tell,

2. I've since modified /etc/periodic.conf, removing daily_ntpd_avoid_congestion="no" since it's been deprecated and replaced with the more universal anticongestion_sleeptime (which I now set to 10), and the long delay during this problem still happens.

3. I can't tell what the full impact is here.  periodic daily certainly gets delayed as a result, but I only noticed that in my most recent run from last night -- an additional 30 minutes added -- which is very strange.

So for now let's not focus on the "long delay", but instead figure out why $ntp_tmp_leapfile is undefined/empty.
Comment 2 Jeremy Chadwick 2017-12-05 20:33:49 UTC
I believe this problem may be related to the following commit, but unsure; gut feeling says very likely:

http://www.freshbsd.org/commit/freebsd/r325256
Comment 3 Jeremy Chadwick 2017-12-05 20:34:54 UTC
Adding committers involved with r325256
Comment 4 Cy Schubert freebsd_committer freebsd_triage 2017-12-05 20:41:54 UTC
It appears it hasn't been mfc'd yet.
Comment 5 Jeremy Chadwick 2017-12-05 20:47:01 UTC
(In reply to Cy Schubert from comment #4)
Not sure I understand correctly -- r325256, which is an MFC of head r324681 and head r324738, did in fact reach stable/11 (that's how I have it):

https://svnweb.freebsd.org/base?view=revision&revision=325256
Comment 6 Cy Schubert freebsd_committer freebsd_triage 2017-12-05 20:50:58 UTC
What we want is an MFC of r326343. I will MFC as soon as my checkouts of stable/10 & 11 complete.
Comment 7 Jeremy Chadwick 2017-12-05 20:57:36 UTC
(In reply to Cy Schubert from comment #6)
Ahh!  Yes, that looks good.  :-) Thank you!
Comment 8 Cy Schubert freebsd_committer freebsd_triage 2017-12-06 00:02:13 UTC
MFC committed.