Bug 215540

Summary: ntpd fails to start when run via ssh with a pseudo-terminal and the terminal/connection is closed before the child opens new fds
Product: Base System
Reporter: Derek Schrock <dereks>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Some People
Priority: ---
Version: 11.0-RELEASE
Hardware: Any
OS: Any

Description Derek Schrock 2016-12-24 19:55:36 UTC
ntpd fails to start if the terminal is closed before ntpd's child process can open new fds. A build with HAVE_CLOSEFROM defined in config.h does not have this issue.

Before restarting ntpd, on host A:

  $ ntpq -pn
       remote           refid      st t when poll reach   delay   offset  jitter
   north-america.p .POOL.          16 p    -   64    0    0.000    0.000   0.002
  *   2 u    1   64    1   87.964   -2.998   2.572

ntpd is then restarted on a set of machines with something like the following:

    for m in A B C; do ssh -t "$m" 'sudo service ntpd restart' ; done

Later from host A:

  Dec 23 18:40:57 <ntp.notice> A ntpd[80966]: ntpd 4.2.8p9-a (1): Starting

  $ ntpq -pn
  ntpq: read: Connection refused

ntpd never finishes starting up and fails silently. The same happens on B and C.

Running the same loop without ssh's -t (which assumes the user can run sudo without a password), or sleeping before the connection closes, lets the ntpd child start up:

  for m in A B C; do ssh "$m" 'sudo service ntpd restart' ; done


  for m in A B C; do ssh -t "$m" 'sudo service ntpd restart; sleep 1' ; done

Testing ntpd from the releng/11.0 branch with '#define HAVE_CLOSEFROM 1' in config.h appears to solve the problem.
This makes close_all_beyond() in libntp/ntp_worker.c use closefrom(2), which does not appear to suffer from the same issue as the other two #ifdef branches.
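A minimal sketch of the two strategies, for illustration only. close_fds_loop() and close_fds_above() are hypothetical names, not the functions in libntp/ntp_worker.c; the sketch shows only a plain close() loop as the fallback (not the other #ifdef branch), and the 1024 default for an indeterminate limit is an arbitrary assumption. What it shows is that the loop issues one close() per descriptor-table slot above keep_fd, so its cost grows with the table size, whereas the HAVE_CLOSEFROM path is a single closefrom(2) call. Whether that difference in timing is what triggers this bug is an assumption, not something established here.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper (not ntpd code): fallback strategy that issues one
 * close() per descriptor-table slot above keep_fd. Returns the number of
 * close() calls made; the count depends on table_size, not on how many
 * descriptors are actually open.
 */
long close_fds_loop(int keep_fd, long table_size)
{
    long calls = 0;

    assert(keep_fd >= -1);
    for (long fd = (long)keep_fd + 1; fd < table_size; fd++) {
        (void)close((int)fd);   /* EBADF on unused slots is harmless here */
        calls++;
    }
    return calls;
}

/*
 * Hypothetical dispatcher: use closefrom(2) when config.h defines
 * HAVE_CLOSEFROM, otherwise scan the whole table with close().
 */
void close_fds_above(int keep_fd)
{
#ifdef HAVE_CLOSEFROM
    closefrom(keep_fd + 1);     /* one call, no per-slot loop */
#else
    long table_size = sysconf(_SC_OPEN_MAX);

    if (table_size < 0)
        table_size = 1024;      /* assumption: default when the limit is indeterminate */
    (void)close_fds_loop(keep_fd, table_size);
#endif
}
```

In a daemonizing child, a call such as close_fds_above(2) after fork() would keep stdin, stdout, and stderr and close everything else.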

However, contrib/ntp/ does not appear to have a test that checks whether the system provides closefrom(2), so it seems HAVE_CLOSEFROM has to be added to config.h manually?

This was initially found when restarting ntpd on a large number of machines via Ansible. I suspect other configuration management tools that use ssh with pseudo-terminals would suffer from the same issue. The loop above is a bare-bones version of what a tool like Ansible might run.

This is also a problem in 9.3 and 10.x.