Summary: | ps default to 79 characters can break ps|grep in shell scripts | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Deepak Nagaraj <n.deepak> | ||||
Component: | bin | Assignee: | Mike Karels <karels> | ||||
Status: | Closed Not Accepted | ||||||
Severity: | Affects Many People | CC: | cem, gad, haasca2112, n.deepak, neon19, pi | ||||
Priority: | --- | Keywords: | patch | ||||
Version: | 11.0-RELEASE | ||||||
Hardware: | Any | ||||||
OS: | Any | ||||||
URL: | https://reviews.freebsd.org/D14614 | ||||||
Attachments: |
|
Description
Deepak Nagaraj
2017-02-17 07:37:28 UTC
Some history on this one. This dates back to an improvement by "bloom@" (Jim Bloom?) in BSD in 1985 to use the terminal's size: > get screen width ioctl added and hold passwd file open https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=18105&r2=18106& Prior to that, it was hardcoded at 80. Kirk McKusick merged 5.4.1.1 (of what, I'm not sure), which changed the ioctl to check stdout instead of stdin (CSRG r27213). In 1990, marc@ added the fallback checking for stderr, stdout, and stdin, and dropped the width from 80 to 79, as you see today. The commit log was... terse: > new version https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=40675&r2=40674&pathrev=40675 It hasn't changed significantly since. Apple's Darwin fork of ps(1) has worked around this slightly differently (and is basically the same approach I would take): https://opensource.apple.com/source/adv_cmds/adv_cmds-168/ps/ps.c.auto.html #ifdef __APPLE__ /* 3862041 */ if (!isatty(STDOUT_FILENO)) termwidth = UNLIMITED; #endif /* __APPLE__ */ Thanks for reporting this! A commit references this bug: Author: cem Date: Sat Mar 4 22:38:11 UTC 2017 New revision: 314685 URL: https://svnweb.freebsd.org/changeset/base/314685 Log: ps(1): Only detect terminal width if stdout is a tty If stdout isn't a tty, use unlimited width output rather than truncating to 79 characters. This is helpful for shell scripts or e.g., 'ps | grep foo'. This hardcoded width has some history: In The Beginning of History[0], the width of ps was hardcoded as 80 bytes. In 1985, Bloom@ added detection using TIOCGWINSZ on stdin.[1] In 1986, Kirk merged a change to check stdout's window size instead. In 1990, the fallback checks to stderr and stdin's TIOCGWINSZ were added by Marc@, with the commit message "new version."[2] OS X Darwin has a very similar modification to ps(1), which simply sets UNLIMITED for all non-tty outputs.[3] I've chosen to respect COLUMNS instead of behaving identically to Darwin here, but I don't feel strongly about that. We could match OS X for parity if that is desired. [0]: https://svnweb.freebsd.org/csrg/bin/ps/ps.c?annotate=1065 [1]: https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=18105&r2=18106 [2]: https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=40675&r2=40674&pathrev=40675 [3]: https://opensource.apple.com/source/adv_cmds/adv_cmds-168/ps/ps.c.auto.html PR: 217159 Reported by: Deepak Nagaraj <n.deepak at gmail.com> Changes: head/bin/ps/ps.c (In reply to Conrad Meyer from comment #1) Hi Conrad, Thanks for making this change! Is there a valid reason for the existence of "else if" that follows the isatty() check? i.e.: 199 else if ((ioctl(STDOUT_FILENO, TIOCGWINSZ, (char *)&ws) == -1 && 200 ioctl(STDERR_FILENO, TIOCGWINSZ, (char *)&ws) == -1 && 201 ioctl(STDIN_FILENO, TIOCGWINSZ, (char *)&ws) == -1) || 202 ws.ws_col == 0) Is there a way this ioctl() can fail, if isatty() is false? If not, it is dead code and we could get rid of at least the ioctl() part of it. Thanks again. Deepak Hi Deepak,
> Thanks for making this change! Is there a valid reason for the existence of "else if" that follows the isatty() check?
Yes (note that the check is for !isatty()) — if the terminal *is* a tty, we want to size output correctly. It isn't dead code. IMO we could remove the ioctl's on STDIN and STDERR completely, but it isn't worth getting shouted at on the mailing list for.
Reopening and assigning to Karels, as he broke it. In general, the default output line length of ps if standard output is not a terminal is undefined for a portable script. FreeBSD doesn't document it, but has always been 79 if the job is not interactive, and the window width if it is interactive. MacOS X documents it as unlimited. Ubuntu says that it is undefined, and might be 80, set according to TERM, or unlmited (see the description of the "args" code); but as noted, it uses unlimited. All of them document that -w sets the width to 132, and -ww to unlimited. Therefore, a script that wants unlimited length should clearly use the w option twice. This is portable to the all the systems listed, and to all previous versions of FreeBSD. Changing the behavior for the benefit of script writers who omit -ww does not seem justified. I do not think the behavior described in this bug is a problem. I'll note that the submitted patch does not change behavior if stderr or stdin refer to a terminal, but the patch that was committed does. > Changing the behavior for the benefit of script writers
> who omit -ww does not seem justified.
Omitting "-ww" falls into the category of "unknown unknown". Scripts tend to do "ps aux | grep". They check for ps options, and do not anticipate "ps" to break when the output is not a terminal. If these statements occur in startup/shutdown scripts, then we can have errant behavior in the system: duplicate processes may launch, shutdown script may fail to kill the process correctly, and so on. This is what we saw with Hadoop scripts, which is a well-known, popular application (and it works on Linux, Mac, etc).
Debugging this problem is no easy task, especially if Unix systems are not your forte.
We can use my original patch if the one that was committed has had undesirable side effects.
Sorry to hear that this took time to debug. Did you install Hadoop from ports/pkg, or from upstream? The port should be fixed if it is the source of the problem. It would be reasonable to report to the Hadoop project as well. -ww should not be unknown to anyone writing an ostensibly-portable script using ps, as it is documented in all of the man pages I listed. "ps axu" (no -) is legacy BSD syntax, so reasonably should be tested on a BSD system. Using -ww is portable to all systems, including existing FreeBSD systems that would never see this change. This issue was discussed on the FreeBSD committers list, then on the architecture list. Although there were not a lot of responses, none were in favor of this change. (In reply to Mike Karels from comment #10) > Although there were not a lot of responses, none were in favor of this change. That certainly isn't true — I am and have been in favor of the portable behavior. Re: r314685 responses: Ed Schouten didn't respond negatively (and offered improvements consistent with keeping the approach -- a form of approval). Bruce responds negatively to everything and anything, so maybe discount him a little bit. Bryan wasn't responding negatively to the idea of the change, just the minor -w vs -ww regression which was subsequently fixed in r314832. Baldwin's response wasn't especially negative. Garance's response was that the change didn't affect him at all. I don't really buy the "everyone was opposed" narrative you're trying to sell, Karels. Conrad, you said there wasn't consensus, but didn't otherwise comment. I should have included you as supporting the change; I thought that was obvious. I didn't count those who didn't reply specifically one way or the other. John and Garance both said that they favored reversion. In any case, that discussion is over, and I stand by my conclusion that there is consensus. I'm not going to argue it any more in this bug. As an additional data point, the patch I submitted has been used in NetScaler load balancer line (based on FreeBSD) for the last year without any problems. Prior to it, we had to debug erratic startup/shutdown behavior because (third party) script writers had missed the "-ww" flag for some applications. I've seen this in Spark scripts as well. It's particularly problematic because they work flawlessly on Linux and Mac. Changing application scripts is not always possible: it amounts to an audit of 3rd party application scripts and constant monitoring of any new applications for this particular defect. Compared to the effort needed to debug this, if the patch makes the application developer's life easier, I'd suggest we keep it. It may also be worth checking out what discussions prompted the current behavior on Linux and especially Mac (given its BSD heritage). Behavior could still be erratic with your patch, as the output line length would vary when a script was used from a terminal vs not. The broken scripts would still not be portable to existing versions of FreeBSD. Yes, but it is the same assumption on the other platforms I mentioned. Use the terminal width if available, else assume unlimited width. The behavior is broken even if you assume 80 as well: it may not really be the width of the terminal in today's displays, so you're likely using two different values depending on terminal availability. It makes more sense to assume unlimited width than 80 when ps is piped to filters like "grep". OK, I overlooked the "still" in your post: you do mention that it's broken now. I didn't have to raise that point. Nevertheless, it seems more sensible to assume unlimited width than 80. OK, at this point I'm not really adding any new information to this discussion. I've already made my case. I'll let you and the FreeBSD committee decide which way to go on this bug. Thanks anyway for your time and consideration. It was said:
> Garance's response was that the change didn't affect him at all.
To clarify, I explicitly said that I felt the change should be reverted, and I have said that multiple times in multiple threads. The issue doesn't effect me personally, because I've always had to write scripts which are portable. (that's because we have a variety of OS's here at RPI, so even if I'm the only person using some script, I have to write that script in a portable way). The fact that I write portable scripts does not mean my vote should be ignored.
When reviewing some change in behavior, I tend to look first to OpenBSD/NetBSD before Linux. And has I mentioned in one of the threads, the behavior that FreeBSD has in 11.x matches the other BSD's, while the behavior in 12-current does not.
I don't mind too much if some change is made here, but it seems wrong that my statements have been "summarized" to mean something very different from what my opinion is. I think it's also unfair to dismiss Bruce's vote "because he's negative on everything".
I discussed this with some of the people who objected to the previously-committed change. The consensus was to put in your change. I created a review with that change, plus updates to the man page. Yay! Thanks a lot. Deepak A commit references this bug: Author: karels Date: Sat Mar 10 00:10:47 UTC 2018 New revision: 330712 URL: https://svnweb.freebsd.org/changeset/base/330712 Log: Change ps(1) output width to unlimited if not interactive Apply patch submitted with PR 217159 to make ps use unlimited width when not associated with a terminal (i.e., none of stdout, stdin, or stderr is a tty). Update comments and man page correspondingly. This change was requested to work around lack of -ww in scripts from third-party packages, including Hadoop, and adds a small measure of Linux compatibility. Hopefully few if any non-interactive scripts depend on the old default of 79. PR: 217159 Submitted by: n.deepak at gmail.com Reviewed by: vangyzen jhb Differential Revision: https://reviews.freebsd.org/D14614 Changes: head/bin/ps/ps.1 head/bin/ps/ps.c Closing with commit of submitted change. *** Bug 234975 has been marked as a duplicate of this bug. *** What version of FreeBSD is this issue fixed in? (In reply to Neon from comment #25) It was never fixed. Why was patch rejected? IMHO, it fixed an issue with ps on FreeBSD. I am with Deepak on this, ps should default to unlimited width when piped and redirected to a file or at the very least default to width of 132 if -ww is not specified. (In reply to Neon from comment #27) I fully agree with you. Unfortunately, it was rejected by karels@, with support from Garance, jhb, vangyzen, and dab: https://reviews.freebsd.org/D14530 . (In reply to Conrad Meyer from comment #28) I am very disappointed that the fix was reverted. It makes FreeBSD less usable and consistent with other UNIX systems. So ps | grep continues to be broken on FreeBSD... (Deepak, you might want to update your blog posting (http://deepix.github.io/2016/10/10/psww.html) to say that FreeBSD's ps is still broken since fix got accepted but then got reverted). Looks like in the revert diffs (https://reviews.freebsd.org/D14530), they missed reverting some of the changes to ps.1 (that was applied in https://reviews.freebsd.org/D14614), (mainly line 107 should be adjusted to say "79" vs "unlimited") 104: If the 105: .Nm 106: process is associated with a terminal, the default output width is that of the 107: terminal; otherwise the output width is unlimited. 108: See also the 109: .Fl w 110: option. (In reply to Neon from comment #27) I agree with Neon on this one. My assumption is that the grep is going to be on the full width (unlimited) string coming from ps. This is very unsettling as I will be thinking things are not running when they very will could be. What is the reasoning for not just defaulting to unlimited. It can't take that much more memory or processing. Behaving differently just because it was always that way causes much trouble and no gain. And it violates POLA. |