Summary: | net-mgmt/telegraf: service (re)start hangs when scripted: missing option -f with daemon(8) | ||||||
---|---|---|---|---|---|---|---|
Product: | Ports & Packages | Reporter: | Mark.Martinec | ||||
Component: | Individual Port(s) | Assignee: | Palle Girgensohn <girgen> | ||||
Status: | Closed DUPLICATE | ||||||
Severity: | Affects Some People | CC: | cheffo | ||||
Priority: | --- | Flags: | bugzilla:
maintainer-feedback?
(girgen) |
||||
Version: | Latest | ||||||
Hardware: | Any | ||||||
OS: | Any | ||||||
Attachments: |
|
Please see bug 214738 - it contains fix and also update to latest version of telegraf. Thanks! Hi Stefan, Great, well spotted. Adding the -f option does render the redirects meaningless though, just redirecting /dev/null >> /var/log/telegraf Now, one could argue that given the stupid log level of telegraf, this is a good thing... :) But then why not just not redirect? Isn't the fix suggested in 213412 the correct one, though? Redirecting both stdin and stderr closes the file descriptors before forking. Then we will not miss the logs from telegraf. Adding -quiet to telegraf_flags seems reasonable as well to shut up the endless tirades of meaingless logs. *** This bug has been marked as a duplicate of bug 213412 *** A commit references this bug: Author: girgen Date: Wed Nov 23 00:23:49 UTC 2016 New revision: 426880 URL: https://svnweb.freebsd.org/changeset/ports/426880 Log: Update telegraf to 1.1.1 Make sure both stdin and stderr are redirected since daemon(8) forks and this would otherwise leave scripted restarts haning. See #213412 and #214643 for a more complete discussion. PR: 214738, 213412, 214643 Submitted by: Stefan Lambrev Changes: head/net-mgmt/telegraf/Makefile head/net-mgmt/telegraf/distinfo head/net-mgmt/telegraf/files/telegraf.in |
Created attachment 177161 [details] A patch to add the -f daemon(8) flag to net-mgmt/telegraf/files/telegraf.in I came across this issue when trying to manage a service 'telegraf' with SaltStack (i.e. sysutils/py-salt). I would expect the same problem can occur under other management tools like Puppet/Chef/Ansible. What happens is that when a salt-minion tries to start or restart a telegraf process by spawning a command: 'service telegraf restart', the process just hangs, waiting for a restart to happen, even though the telegraf process does restart successfully and is again running underneath. Investigating the issue boils down to this simple and repeatable test case, which mimics the environment under salt-minion (a Python process), feeding a stdout to a pipe: # service telegraf restart | cat Stopping telegraf. Waiting for PIDS: 60067. Starting telegraf. (and hangs, even though the telegraf daemon has indeed successfully restarted meanwhile) With rc_debug="YES" the same thing looks like: (wrapped for clarity) # service telegraf restart | cat /usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is set to YES. Stopping telegraf. /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit: kill -TERM 919 Waiting for PIDS: 919. /usr/local/etc/rc.d/telegraf: DEBUG: pid file (/var/run/telegraf.pid): not readable. /usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is set to YES. /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: start_precmd: telegraf_prestart Starting telegraf. /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit: /usr/sbin/daemon -crP /var/run/telegraf.pid /usr/local/bin/telegraf - config=/usr/local/etc/telegraf.conf 2>> /var/log/telegraf.log (and hangs) The problem is that the rc.d script for a telegraf package uses the daemon(8) utility to run the telegraf process, but forgets to specify the '-f' option to /usr/sbin/daemon. $ man daemon daemon — run detached from the controlling terminal -f Redirect standard input, standard output and standard error to /dev/null. [...] If the -p, -P or -r option is specified the program is executed in a spawned child process. So what happens is: - The process executing the daemon(8) program (as started by the 'service' command) has its stdout directed to a pipe; - The daemon(8) program disassociates from a controlling terminal but DOES NOT close its stdout, and then forks; - The forked daemon(8) subprocess inherits the open fd to a pipe; - The parent daemon(8) then exits, but the child daemon(8) still has a pipe open, connected to cat(1) (or to salt-minion); - the child daemon(8) process then exec's the telegraf program, which again inherits the pipe as its stdout. So even though the parent daemon(8) process no longer exists and the 'service' command could now be expected to finish, the sending side of the pipe is still connected to a (re)started telegraf process, and the 'cat' keeps hanging indefinitely, waiting for the pipe on its stdin to close, which never happens. The fix is trivial, just add the option -f to the daemon(8) command on the rc.d script, forcing it to close stdout and stderr before forking: --- /usr/local/etc/rc.d/telegraf~ 2016-11-18 23:53:57.046298000 +0100 +++ /usr/local/etc/rc.d/telegraf 2016-11-18 23:54:17.675132000 +0100 @@ -31,3 +31,3 @@ command=/usr/sbin/daemon -command_args="-crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log" +command_args="-f -crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log" (the patch is attached) Considering that there are at least a dozen of ports using a daemon(8) utility but forgetting to specify the -f option, I wonder if it would not be wiser to change the daemon(8) utility to close by default the stdin/stdout/stderr after disassociating from a controlling terminal but before forking, as this is something that one expects as an essential step in daemonizing a process. IMO this may be preferred to filing a dozen of bug reports similar to this one for various other ports, even at the expense of incompatibility.