Bug 217159 - ps default to 79 characters can break ps|grep in shell scripts
Summary: ps default to 79 characters can break ps|grep in shell scripts
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 11.0-RELEASE
Hardware: Any Any
: --- Affects Many People
Assignee: Mike Karels
URL: https://reviews.freebsd.org/D14614
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-02-17 07:37 UTC by Deepak Nagaraj
Modified: 2018-03-10 00:13 UTC (History)
4 users (show)

See Also:


Attachments
Patch to set unlimited terminal size if we can't determine terminal characteristics (346 bytes, patch)
2017-02-17 07:37 UTC, Deepak Nagaraj
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Deepak Nagaraj 2017-02-17 07:37:28 UTC
Created attachment 180064 [details]
Patch to set unlimited terminal size if we can't determine terminal characteristics

FreeBSD ps defaults to output width of 79 characters if it cannot determine tty width.  This can hurt scripts that use "ps | grep" and run, for example, ssh on a non-interactive shell.

I've provided more context here:

http://deepix.github.io/2016/10/10/psww.html

I've also provided a patch as an attachment in unified diff format.

Note that this problem does not exist on Linux or OS X.
Comment 1 Conrad Meyer freebsd_committer 2017-03-04 21:02:24 UTC
Some history on this one.

This dates back to an improvement by "bloom@" (Jim Bloom?) in BSD in 1985 to use the terminal's size:

  > get screen width ioctl added and hold passwd file open
  https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=18105&r2=18106& 

Prior to that, it was hardcoded at 80.

Kirk McKusick merged 5.4.1.1 (of what, I'm not sure), which changed the ioctl to check stdout instead of stdin (CSRG r27213).

In 1990, marc@ added the fallback checking for stderr, stdout, and stdin, and dropped the width from 80 to 79, as you see today.  The commit log was... terse:

  > new version
  https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=40675&r2=40674&pathrev=40675

It hasn't changed significantly since.
Comment 2 Conrad Meyer freebsd_committer 2017-03-04 21:06:31 UTC
Apple's Darwin fork of ps(1) has worked around this slightly differently (and is basically the same approach I would take):

https://opensource.apple.com/source/adv_cmds/adv_cmds-168/ps/ps.c.auto.html

#ifdef __APPLE__
	/* 3862041 */
	if (!isatty(STDOUT_FILENO))
		termwidth = UNLIMITED;
#endif /* __APPLE__ */
Comment 3 Conrad Meyer freebsd_committer 2017-03-04 22:38:33 UTC
Thanks for reporting this!
Comment 4 commit-hook freebsd_committer 2017-03-04 22:38:54 UTC
A commit references this bug:

Author: cem
Date: Sat Mar  4 22:38:11 UTC 2017
New revision: 314685
URL: https://svnweb.freebsd.org/changeset/base/314685

Log:
  ps(1): Only detect terminal width if stdout is a tty

  If stdout isn't a tty, use unlimited width output rather than truncating to
  79 characters.  This is helpful for shell scripts or e.g., 'ps | grep foo'.

  This hardcoded width has some history: In The Beginning of History[0], the
  width of ps was hardcoded as 80 bytes.  In 1985, Bloom@ added detection
  using TIOCGWINSZ on stdin.[1]  In 1986, Kirk merged a change to check
  stdout's window size instead.  In 1990, the fallback checks to stderr and
  stdin's TIOCGWINSZ were added by Marc@, with the commit message "new
  version."[2]

  OS X Darwin has a very similar modification to ps(1), which simply sets
  UNLIMITED for all non-tty outputs.[3]  I've chosen to respect COLUMNS
  instead of behaving identically to Darwin here, but I don't feel strongly
  about that.  We could match OS X for parity if that is desired.

  [0]: https://svnweb.freebsd.org/csrg/bin/ps/ps.c?annotate=1065
  [1]: https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=18105&r2=18106
  [2]:
  https://svnweb.freebsd.org/csrg/bin/ps/ps.c?r1=40675&r2=40674&pathrev=40675
  [3]:
  https://opensource.apple.com/source/adv_cmds/adv_cmds-168/ps/ps.c.auto.html

  PR:		217159
  Reported by:	Deepak Nagaraj <n.deepak at gmail.com>

Changes:
  head/bin/ps/ps.c
Comment 5 Deepak Nagaraj 2017-03-21 21:26:17 UTC
(In reply to Conrad Meyer from comment #1)

Hi Conrad,

Thanks for making this change!  Is there a valid reason for the existence of "else if" that follows the isatty() check?

i.e.:

199	        else if ((ioctl(STDOUT_FILENO, TIOCGWINSZ, (char *)&ws) == -1 &&
200	             ioctl(STDERR_FILENO, TIOCGWINSZ, (char *)&ws) == -1 &&
201	             ioctl(STDIN_FILENO,  TIOCGWINSZ, (char *)&ws) == -1) ||
202	             ws.ws_col == 0)

Is there a way this ioctl() can fail, if isatty() is false?

If not, it is dead code and we could get rid of at least the ioctl() part of it.

Thanks again.
Deepak
Comment 6 Conrad Meyer freebsd_committer 2017-03-21 22:15:25 UTC
Hi Deepak,

> Thanks for making this change!  Is there a valid reason for the existence of "else if" that follows the isatty() check?

Yes (note that the check is for !isatty()) — if the terminal *is* a tty, we want to size output correctly.  It isn't dead code.  IMO we could remove the ioctl's on STDIN and STDERR completely, but it isn't worth getting shouted at on the mailing list for.
Comment 7 Conrad Meyer freebsd_committer 2018-02-28 00:32:54 UTC
Reopening and assigning to Karels, as he broke it.
Comment 8 Mike Karels freebsd_committer 2018-03-02 02:49:00 UTC
In general, the default output line length of ps if standard output is not a terminal is undefined for a portable script.  FreeBSD doesn't document it, but has always been 79 if the job is not interactive, and the window width if it is interactive. MacOS X documents it as unlimited.  Ubuntu says that it is undefined, and might be 80, set according to TERM, or unlmited (see the description of the "args" code); but as noted, it uses unlimited.  All of them document that -w sets the width to 132, and -ww to unlimited.  Therefore, a script that wants unlimited length should clearly use the w option twice.  This is portable to the all the systems listed, and to all previous versions of FreeBSD.  Changing the behavior for the benefit of script writers who omit -ww does not seem justified.  I do not think the behavior described in this bug is a problem.

I'll note that the submitted patch does not change behavior if stderr or stdin refer to a terminal, but the patch that was committed does.
Comment 9 Deepak Nagaraj 2018-03-02 06:02:00 UTC
> Changing the behavior for the benefit of script writers 
> who omit -ww does not seem justified.

Omitting "-ww" falls into the category of "unknown unknown". Scripts tend to do "ps aux | grep".  They check for ps options, and do not anticipate "ps" to break when the output is not a terminal.  If these statements occur in startup/shutdown scripts, then we can have errant behavior in the system: duplicate processes may launch, shutdown script may fail to kill the process correctly, and so on.  This is what we saw with Hadoop scripts, which is a well-known, popular application (and it works on Linux, Mac, etc).

Debugging this problem is no easy task, especially if Unix systems are not your forte.

We can use my original patch if the one that was committed has had undesirable side effects.
Comment 10 Mike Karels freebsd_committer 2018-03-03 02:21:44 UTC
Sorry to hear that this took time to debug.  Did you install Hadoop from ports/pkg, or from upstream?  The port should be fixed if it is the source of the problem.  It would be reasonable to report to the Hadoop project as well.

-ww should not be unknown to anyone writing an ostensibly-portable script using ps, as it is documented in all of the man pages I listed.  "ps axu"  (no -) is legacy BSD syntax, so reasonably should be tested on a BSD system.  Using -ww is portable to all systems, including existing FreeBSD systems that would never see this change.

This issue was discussed on the FreeBSD committers list, then on the architecture list.  Although there were not a lot of responses, none were in favor of this change.
Comment 11 Conrad Meyer freebsd_committer 2018-03-03 02:50:55 UTC
(In reply to Mike Karels from comment #10)
> Although there were not a lot of responses, none were in favor of this change.

That certainly isn't true — I am and have been in favor of the portable behavior.
Comment 12 Conrad Meyer freebsd_committer 2018-03-03 02:56:09 UTC
Re: r314685 responses:

Ed Schouten didn't respond negatively (and offered improvements consistent with keeping the approach -- a form of approval).

Bruce responds negatively to everything and anything, so maybe discount him a little bit.

Bryan wasn't responding negatively to the idea of the change, just the minor -w vs -ww regression which was subsequently fixed in r314832.

Baldwin's response wasn't especially negative.

Garance's response was that the change didn't affect him at all.

I don't really buy the "everyone was opposed" narrative you're trying to sell, Karels.
Comment 13 Mike Karels freebsd_committer 2018-03-03 03:25:24 UTC
Conrad, you said there wasn't consensus, but didn't otherwise comment.  I should have included you as supporting the change; I thought that was obvious.  I didn't count those who didn't reply specifically one way or the other.  John and Garance both said that they favored reversion.  In any case, that discussion is over, and I stand by my conclusion that there is consensus.  I'm not going to argue it any more in this bug.
Comment 14 Deepak Nagaraj 2018-03-03 05:00:59 UTC
As an additional data point, the patch I submitted has been used in NetScaler load balancer line (based on FreeBSD) for the last year without any problems.  Prior to it, we had to debug erratic startup/shutdown behavior because (third party) script writers had missed the "-ww" flag for some applications.  I've seen this in Spark scripts as well.

It's particularly problematic because they work flawlessly on Linux and Mac.  Changing application scripts is not always possible: it amounts to an audit of 3rd party application scripts and constant monitoring of any new applications for this particular defect.

Compared to the effort needed to debug this, if the patch makes the application developer's life easier, I'd suggest we keep it.  It may also be worth checking out what discussions prompted the current behavior on Linux and especially Mac (given its BSD heritage).
Comment 15 Mike Karels freebsd_committer 2018-03-03 19:36:25 UTC
Behavior could still be erratic with your patch, as the output line length would vary when a script was used from a terminal vs not.  The broken scripts would still not be portable to existing versions of FreeBSD.
Comment 16 Deepak Nagaraj 2018-03-04 01:48:13 UTC
Yes, but it is the same assumption on the other platforms I mentioned.  Use the terminal width if available, else assume unlimited width.

The behavior is broken even if you assume 80 as well: it may not really be the width of the terminal in today's displays, so you're likely using two different values depending on terminal availability.

It makes more sense to assume unlimited width than 80 when ps is piped to filters like "grep".
Comment 17 Deepak Nagaraj 2018-03-04 01:50:56 UTC
OK, I overlooked the "still" in your post: you do mention that it's broken now.  I didn't have to raise that point.

Nevertheless, it seems more sensible to assume unlimited width than 80.
Comment 18 Deepak Nagaraj 2018-03-04 02:00:17 UTC
OK, at this point I'm not really adding any new information to this discussion.  I've already made my case.  I'll let you and the FreeBSD committee decide which way to go on this bug.  Thanks anyway for your time and consideration.
Comment 19 Garance A Drosehn freebsd_committer 2018-03-06 02:53:18 UTC
It was said:
> Garance's response was that the change didn't affect him at all.

To clarify, I explicitly said that I felt the change should be reverted, and I have said that multiple times in multiple threads.  The issue doesn't effect me personally, because I've always had to write scripts which are portable.  (that's because we have a variety of OS's here at RPI, so even if I'm the only person using some script, I have to write that script in a portable way).  The fact that I write portable scripts does not mean my vote should be ignored.

   When reviewing some change in behavior, I tend to look first to OpenBSD/NetBSD before Linux.  And has I mentioned in one of the threads, the behavior that FreeBSD has in 11.x matches the other BSD's, while the behavior in 12-current does not.

I don't mind too much if some change is made here, but it seems wrong that my statements have been "summarized" to mean something very different from what my opinion is.  I think it's also unfair to dismiss Bruce's vote "because he's negative on everything".
Comment 20 Mike Karels freebsd_committer 2018-03-08 01:06:36 UTC
I discussed this with some of the people who objected to the previously-committed change.  The consensus was to put in your change.  I created a review with that change, plus updates to the man page.
Comment 21 Deepak Nagaraj 2018-03-08 01:18:42 UTC
Yay!  Thanks a lot.
Deepak
Comment 22 commit-hook freebsd_committer 2018-03-10 00:11:34 UTC
A commit references this bug:

Author: karels
Date: Sat Mar 10 00:10:47 UTC 2018
New revision: 330712
URL: https://svnweb.freebsd.org/changeset/base/330712

Log:
  Change ps(1) output width to unlimited if not interactive

  Apply patch submitted with PR 217159 to make ps use unlimited
  width when not associated with a terminal (i.e., none of stdout, stdin,
  or stderr is a tty). Update comments and man page correspondingly.
  This change was requested to work around lack of -ww in scripts from
  third-party packages, including Hadoop, and adds a small measure of
  Linux compatibility. Hopefully few if any non-interactive scripts
  depend on the old default of 79.

  PR:		217159
  Submitted by:	n.deepak at gmail.com
  Reviewed by:	vangyzen jhb
  Differential Revision:	https://reviews.freebsd.org/D14614

Changes:
  head/bin/ps/ps.1
  head/bin/ps/ps.c
Comment 23 Mike Karels freebsd_committer 2018-03-10 00:13:56 UTC
Closing with commit of submitted change.