Bug 215540 - ntpd fails to start when run via ssh with pseudo-terminal and terminals/connection are closed before child opens new fds
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-24 19:55 UTC by Derek Schrock
Modified: 2016-12-24 19:55 UTC (History)
0 users

See Also:


Attachments

Description Derek Schrock 2016-12-24 19:55:36 UTC
ntpd fails to start if the terminal is closed before ntpd's child process can open new fds.  A build with HAVE_CLOSEFROM defined in config.h does not have this issue.

Before restarting ntpd from A:

  $ ntpq -pn
       remote           refid      st t when poll reach   delay   offset  jitter
  ==============================================================================
   north-america.p .POOL.          16 p    -   64    0    0.000    0.000   0.002
  *50.22.155.163   209.51.161.238   2 u    1   64    1   87.964   -2.998   2.572
  ...

Restart ntpd on several machines with something like the following:

    for m in A B C; do ssh -t "$m" 'sudo service ntpd restart' ; done

Later from host A:

  ...
  Dec 23 18:40:57 <ntp.notice> A ntpd[80966]: ntpd 4.2.8p9-a (1): Starting
  ...

  $ ntpq -pn
  ntpq: read: Connection refused
  ...

ntpd never finishes starting up and fails silently.  The same happens on B and C.

Running the same command without ssh's -t, or sleeping before the connection closes, allows the ntpd child to start up.  This assumes the user can run sudo without a password:

  for m in A B C; do ssh "$m" 'sudo service ntpd restart' ; done

or

  for m in A B C; do ssh -t "$m" 'sudo service ntpd restart; sleep 1' ; done

Testing ntpd from the releng/11.0 branch with '#define HAVE_CLOSEFROM 1' in config.h appears to solve the problem.
This makes close_all_beyond() in libntp/ntp_worker.c use closefrom(2), which does not appear to suffer from the same issue as the other two #ifdef branches.
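
For reference, a minimal sketch of the kind of compile-time selection described above. This is not the ntp source: only HAVE_CLOSEFROM and closefrom(2) come from this report, the function names are hypothetical, and just one loop fallback is shown rather than both of the other branches. It only illustrates the difference in shape between a single closefrom(2) call and a per-descriptor close(2) loop, and makes no claim about why the loop misbehaves here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Return 1 if fd refers to an open descriptor, 0 otherwise. */
int
fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Fallback path: close every descriptor in [first, limit), one close(2) each. */
void
close_fd_range(int first, int limit)
{
	int fd;

	assert(first >= 0);
	for (fd = first; fd < limit; fd++)
		(void)close(fd);	/* EBADF on unused slots is expected */
}

/*
 * Close every descriptor above keep_fd (hypothetical helper).
 * HAVE_CLOSEFROM is the config.h macro named in this report.
 */
void
close_fds_beyond(int keep_fd)
{
#ifdef HAVE_CLOSEFROM
	closefrom(keep_fd + 1);		/* single call, no loop */
#else
	long limit = sysconf(_SC_OPEN_MAX);

	if (limit < 0)
		limit = 1024;		/* assumed default if the limit is indeterminate */
	if (limit > INT_MAX)
		limit = INT_MAX;
	close_fd_range(keep_fd + 1, (int)limit);
#endif
}
```

A daemon child would typically call something like close_fds_beyond(2) to keep only stdin, stdout and stderr; building with -DHAVE_CLOSEFROM on a system that provides closefrom(2) selects the first branch.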

However, contrib/ntp/ does not appear to test whether the system has closefrom(2), so it seems the define has to be added manually?

This was initially found when restarting ntpd on a large number of machines via ansible.  I suspect other configuration management tools that use ssh with pseudo-terminals would hit the same issue.  The loop above is a raw example of what something like ansible might run.

This is also a problem in 9.3 and 10.x.