I discovered my CPU at 0% idle this morning. top reported systat using 95% of the CPU time. I didn't have systat running anywhere, then remembered I had it running in an xterm last night and killed the xterm without stopping systat. This seems to be 100% reproducable, but interestingly enough only happens when systat is run as root. Running systat as a normal user doesn't trigger this behavior.....killing its xterm kills systat in that case. It's annoying that systat doesn't die in this situation, the fact that it maxes the CPU is what really makes this a nasty behavior. Fix: I don't know of a fix but the workaround is: 1) Don't run systat as root 2) Don't kill an xterm with a running systat in it How-To-Repeat: startx open 2 xterms su to root in one of them start top in the other systat -if in the root xterm kill the xterm running systat watch systat run away to 100% cpu usage
For what it's worth I've done some more testing and the problem doesn't seem to be restricted to xterms. The same behaviour can be reproduced using pterms, aterms, and eterms. I've also confirmed that other people can reproduce the issue, although all on 6.x i386 boxes. -- Thanks, Josh Paetzel
The bug is in /usr/src/usr.bin/systat/keyboard.c, where an error return by getch() is ignored. When you close an xterm, it sends SIGHUP to all processes running on that terminal, and all further terminal I/O by those processes fails with ENXIO. But a process running as a different user (e.g. root) can't receive the SIGHUP, and systat ignores the ENXIO and retries endlessly. I presume the retry is to cope with the case where getch() was interrupted by a signal. So probably it should only retry if errno = EINTR. This is essentially what top does. Also, ferror(stdin) is a bogus test; getch() does not go through stdio and so ferror(stdin) will never be set, and the clearerr() is bogus too. EINTR is probably moot for FreeBSD anyway since it appears our ncurses handles that transparently, but it doesn't hurt to check. Here is a patch which fixes the problem. It still assumes the constant ERR returned by getch() is -1 which is not great style, but I'll touch as little as possible here. --- /usr/src/usr.bin/systat/keyboard.c.old Thu Mar 8 02:00:35 2007 +++ /usr/src/usr.bin/systat/keyboard.c Thu Mar 8 02:00:40 2007 @@ -42,6 +42,7 @@ #include <ctype.h> #include <signal.h> #include <termios.h> +#include <errno.h> #include "systat.h" #include "extern.h" @@ -58,9 +59,11 @@ do { refresh(); ch = getch() & 0177; - if (ch == 0177 && ferror(stdin)) { - clearerr(stdin); - continue; + if (ch == 0177) { + if (errno == EINTR) + continue; + else + die(0); } if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; -- Nate Eldredge nge@cs.hmc.edu
On Thu, 8 Mar 2007, Nate Eldredge wrote: > I presume the retry is to cope with the case where getch() was interrupted > by a signal. So probably it should only retry if errno = EINTR. This is > essentially what top does. Also, ferror(stdin) is a bogus test; getch() > does not go through stdio and so ferror(stdin) will never be set, and the > clearerr() is bogus too. > > EINTR is probably moot for FreeBSD anyway since it appears our ncurses > handles that transparently, but it doesn't hurt to check. ncurses has code to actively hide it, but this doesn't seem to be configured in FreeBSD (option HIDE_EINTR). Thus getch()'s man page may be wrong in claiming that the ncurses implementation never returns EINTR. In fact, it is a bug for getch() to never return EINTR, and a worse bug to loop in an input function when read() returns EINTR. Most unthreaded applications that use signal handlers need to change the BSD default of restarting syscalls after a signal, so that their signal handlers can just set a flag and depend on their main loop checking the flag promptly. Restarting syscalls prevents the flag being checked promptly by looping calling read() (or another syscall) in the kernel, and HIDE_EINTR would prevent the flag being checked promptly by looping calling read() in the library. However, EINTR is probably irrelevant for the current bug. systat doesn't catch SIGHUP in order to clean up up unsafely like it does for SIGINT, so any SIGHUP that gives EOF on the terminal would either: - kill the process uncleanly, or - not be delivered to the process because it is masked (no EINTR since it is not caught), or - not be delivered to the process because of job control stuff. So systat just needs a simple check for EOF from getch() and the remainly problem is that getch(), like getc(), unimproves on read() by not having any way to distinguish the non-error of EOF from errors. Bruce
Bruce Evans wrote: > However, EINTR is probably irrelevant for the current bug. systat > doesn't catch SIGHUP in order to clean up up unsafely like it does > for SIGINT, so any SIGHUP that gives EOF on the terminal would either: > - kill the process uncleanly, or > - not be delivered to the process because it is masked (no EINTR since > it is not caught), or > - not be delivered to the process because of job control stuff. I included the EINTR handling because systat does use SIGALRM. In fact it actually does all the display updating inside the SIGALRM handler (nice huh?). If we are sure that restarting syscalls are in effect (systat does not explicitly select this behavior), then I agree that systat can die on EINTR like any other error. But if not then EINTR could legitimately appear every time the alarm rings, and the correct behavior is to retry. > So systat just needs a simple check for EOF from getch() and the > remainly problem is that getch(), like getc(), unimproves on read() > by not having any way to distinguish the non-error of EOF from errors. Note getch() returns ERR, not EOF. (In the current implementation they are both equal to -1.) getc() is actually okay; when getc() returns EOF you can use ferror(f) and feof(f) to decide if it was an error or end-of-file. But yes, with getch() you're stuck. I guess because it's normally used on a terminal device in raw mode, where you wouldn't normally receive an end-of-file (since ^D wouldn't work). -- Nate Eldredge nge@cs.hmc.edu
All I have stumbled upon this PR after finding a systat process running at 99% cpu for the umpteenth time. I don't know enough about the subject matter to comment on the solution but I would be very happy for the problem to go away. The patch has not been committed, was there an issue with it to prevent this? Happy new year, Dominic Marks
Here's an updated patch against HEAD. I took the keyboard.c snippet from OpenBSD and added a signal handling for SIGHUP in main.c.
Updated patch to add a missing include for stdlib.h in keyboard.c. (For exit(3) prototype.)
The same problem also appears to affect many other full-screen ncurses apps: I've seen it occur with ee, sysinstall, sade and tzsetup too. -- Bruce
Responsible Changed From-To: freebsd-bugs->jh Take.
Author: jh Date: Sun Oct 11 12:32:25 2009 New Revision: 197956 URL: http://svn.freebsd.org/changeset/base/197956 Log: - Catch SIGHUP to perform cleanup before exiting. - Exit if getch() returns with an error other than EINTR. Otherwise systat(1) may get stuck in an infinite loop if it doesn't receive SIGHUP when terminal closes. [1] - Remove attempt to clear stdio error indicators. getch() doesn't use stdio, making it useless. [2] - Remove unneeded masking of getch() return value. [2] PR: bin/107171 Reviewed by: bde Approved by: trasz (mentor) Obtained from: OpenBSD [1] Suggested by: bde [2] MFC after: 1 month Modified: head/usr.bin/systat/keyboard.c head/usr.bin/systat/main.c Modified: head/usr.bin/systat/keyboard.c ============================================================================== --- head/usr.bin/systat/keyboard.c Sun Oct 11 12:23:56 2009 (r197955) +++ head/usr.bin/systat/keyboard.c Sun Oct 11 12:32:25 2009 (r197956) @@ -39,8 +39,10 @@ __FBSDID("$FreeBSD$"); static const char sccsid[] = "@(#)keyboard.c 8.1 (Berkeley) 6/6/93"; #endif +#include <errno.h> #include <ctype.h> #include <signal.h> +#include <stdlib.h> #include <termios.h> #include "systat.h" @@ -57,10 +59,11 @@ keyboard(void) move(CMDLINE, 0); do { refresh(); - ch = getch() & 0177; - if (ch == 0177 && ferror(stdin)) { - clearerr(stdin); - continue; + ch = getch(); + if (ch == ERR) { + if (errno == EINTR) + continue; + exit(1); } if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; Modified: head/usr.bin/systat/main.c ============================================================================== --- head/usr.bin/systat/main.c Sun Oct 11 12:23:56 2009 (r197955) +++ head/usr.bin/systat/main.c Sun Oct 11 12:32:25 2009 (r197956) @@ -133,6 +133,7 @@ main(int argc, char **argv) exit(1); } } + signal(SIGHUP, die); signal(SIGINT, die); signal(SIGQUIT, die); signal(SIGTERM, die); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: open->patched Patched in head (r197956).
Author: jh Date: Sun Nov 15 14:11:26 2009 New Revision: 199290 URL: http://svn.freebsd.org/changeset/base/199290 Log: MFC r197956: - Catch SIGHUP to perform cleanup before exiting. - Exit if getch() returns with an error other than EINTR. Otherwise systat(1) may get stuck in an infinite loop if it doesn't receive SIGHUP when terminal closes. - Remove attempt to clear stdio error indicators. getch() doesn't use stdio, making it useless. - Remove unneeded masking of getch() return value. PR: bin/107171 Approved by: trasz (mentor) Modified: stable/8/usr.bin/systat/keyboard.c stable/8/usr.bin/systat/main.c Directory Properties: stable/8/usr.bin/systat/ (props changed) Modified: stable/8/usr.bin/systat/keyboard.c ============================================================================== --- stable/8/usr.bin/systat/keyboard.c Sun Nov 15 11:43:28 2009 (r199289) +++ stable/8/usr.bin/systat/keyboard.c Sun Nov 15 14:11:26 2009 (r199290) @@ -39,8 +39,10 @@ __FBSDID("$FreeBSD$"); static const char sccsid[] = "@(#)keyboard.c 8.1 (Berkeley) 6/6/93"; #endif +#include <errno.h> #include <ctype.h> #include <signal.h> +#include <stdlib.h> #include <termios.h> #include "systat.h" @@ -57,10 +59,11 @@ keyboard(void) move(CMDLINE, 0); do { refresh(); - ch = getch() & 0177; - if (ch == 0177 && ferror(stdin)) { - clearerr(stdin); - continue; + ch = getch(); + if (ch == ERR) { + if (errno == EINTR) + continue; + exit(1); } if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; Modified: stable/8/usr.bin/systat/main.c ============================================================================== --- stable/8/usr.bin/systat/main.c Sun Nov 15 11:43:28 2009 (r199289) +++ stable/8/usr.bin/systat/main.c Sun Nov 15 14:11:26 2009 (r199290) @@ -133,6 +133,7 @@ main(int argc, char **argv) exit(1); } } + signal(SIGHUP, die); signal(SIGINT, die); signal(SIGQUIT, die); signal(SIGTERM, die); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: jh Date: Fri May 21 16:03:57 2010 New Revision: 208381 URL: http://svn.freebsd.org/changeset/base/208381 Log: MFC r197956: - Catch SIGHUP to perform cleanup before exiting. - Exit if getch() returns with an error other than EINTR. Otherwise systat(1) may get stuck in an infinite loop if it doesn't receive SIGHUP when terminal closes. - Remove attempt to clear stdio error indicators. getch() doesn't use stdio, making it useless. - Remove unneeded masking of getch() return value. PR: bin/107171 Modified: stable/7/usr.bin/systat/keyboard.c stable/7/usr.bin/systat/main.c Directory Properties: stable/7/usr.bin/systat/ (props changed) Modified: stable/7/usr.bin/systat/keyboard.c ============================================================================== --- stable/7/usr.bin/systat/keyboard.c Fri May 21 16:01:57 2010 (r208380) +++ stable/7/usr.bin/systat/keyboard.c Fri May 21 16:03:57 2010 (r208381) @@ -39,8 +39,10 @@ __FBSDID("$FreeBSD$"); static const char sccsid[] = "@(#)keyboard.c 8.1 (Berkeley) 6/6/93"; #endif +#include <errno.h> #include <ctype.h> #include <signal.h> +#include <stdlib.h> #include <termios.h> #include "systat.h" @@ -57,10 +59,11 @@ keyboard() move(CMDLINE, 0); do { refresh(); - ch = getch() & 0177; - if (ch == 0177 && ferror(stdin)) { - clearerr(stdin); - continue; + ch = getch(); + if (ch == ERR) { + if (errno == EINTR) + continue; + exit(1); } if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; Modified: stable/7/usr.bin/systat/main.c ============================================================================== --- stable/7/usr.bin/systat/main.c Fri May 21 16:01:57 2010 (r208380) +++ stable/7/usr.bin/systat/main.c Fri May 21 16:03:57 2010 (r208381) @@ -133,6 +133,7 @@ main(int argc, char **argv) exit(1); } } + signal(SIGHUP, die); signal(SIGINT, die); signal(SIGQUIT, die); signal(SIGTERM, die); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: patched->closed Fixed in head, stable/8 and stable/7.
*** Bug 32667 has been marked as a duplicate of this bug. ***