Bug 151384 - rs(1) truncates lines longer than 1024 bytes (BUFSIZ)
Summary: rs(1) truncates lines longer than 1024 bytes (BUFSIZ)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: Jaakko Heinonen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-11 07:10 UTC by kamikaze
Modified: 2011-10-07 15:35 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description kamikaze 2010-10-11 07:10:01 UTC
Recently rs has adopted a habit of damaging data in long lines
of input. I've been relying on it to reformat dependency data,
I had to switch to sed, which has a slight performance impact
for my use case.

How-To-Repeat: # jot -s\  -b 01234567 1000 | rs 0 1 | grep -vxF 01234567
012345
67
012
34567
012345
67
012
34567
012345
67
012
34567

The jot command prints the string 01234567 a thousand times in
a single row.

The rs command is supposed to generate an automatic(0) number of
rows with 1 column per row. I.e. every word stands in its own line.

The grep filters all intact words, so everything that is printed,
was damaged by rs.

This has the look of a repetitive pattern to me, probably this
happens at a fixed buffer boundary.
Comment 1 Ulrich Spoerlein freebsd_committer 2010-10-12 12:53:22 UTC
Responsible Changed
From-To: freebsd-bugs->uqs

I'll have a look at this. 

Can you please specify when this broke, ie. for which release did this work? 

Also, you may want to use xargs(1) to deal with overly long lines. Your example 
works fine like this: 

jot -s  -b 01234567 1000 | xargs -n1 | grep -vxF 01234567
Comment 2 hahnbc 2010-10-13 21:18:58 UTC
Question was:
"Can you please specify when this broke, ie. for which release did this work?"

Out of curiousity, I checked a handful of systems and couldn't find a
working version.

It was broken on the following:

FreeBSD 4.11-RELEASE-p14
FreeBSD 5.4-RELEASE-p1
FreeBSD 6.3-RELEASE-p14
FreeBSD 7.2-RELEASE-p6
FreeBSD 8.1-RELEASE

And just for fun, I tried it on a REALLY old system where it was also broken:

BSDI BSD/OS 3.1 (from 1998)

(The Description comment:  "Recently rs has adopted...." may be misleading.)

==

   - B
Comment 3 kamikaze 2010-10-13 22:00:22 UTC
On 13/10/2010 22:18, Bryan Hahn wrote:
> Question was:
> "Can you please specify when this broke, ie. for which release did this work?"

I've been using rs for a long time in pkg_upgrade and I only recognized
this problem recently, figuring out how some dependencies disappeared.

> Out of curiousity, I checked a handful of systems and couldn't find a
> working version.
> 
> It was broken on the following:
> 
> FreeBSD 4.11-RELEASE-p14
> FreeBSD 5.4-RELEASE-p1
> FreeBSD 6.3-RELEASE-p14
> FreeBSD 7.2-RELEASE-p6
> FreeBSD 8.1-RELEASE
> 
> And just for fun, I tried it on a REALLY old system where it was also broken:
> 
> BSDI BSD/OS 3.1 (from 1998)
> 
> (The Description comment:  "Recently rs has adopted...." may be misleading.)

Considering the /massive/ amount of testing I perform I figured
I'd have recognized this if it had happened before. I guess I was
wrong.

Thank you for your exhaustive effort to verify this problem.

As far as I'm concerned rs is salted ground, now. I will not use
it in my scripts any more until support for all branches where
it's broken has run out.

Considering your findings I'd expect it to be broken on Net-, Open-
and DragonFlyBSD, too.
Comment 4 kamikaze 2010-10-13 22:29:10 UTC
In response to uqs, about "when this broke" Bryan is probably a
better source.

The runtime of your xargs example is really terrible:

# jot -b 12345678 -s\  1000000 | time -h rs 0 1 > /dev/null
	0.27s real		0.10s user		0.01s sys

# jot -b 12345678 -s\  1000000 | time -h awk -F\  '{gsub(" ", "\n"); print}' > /dev/null
	0.39s real		0.17s user		0.04s sys

# jot -b 12345678 -s\  1000000 | time -h awk -F\  'BEGIN{RS=" "} {print $0}' > /dev/null
	0.57s real		0.53s user		0.00s sys

# jot -b 12345678 -s\  1000000 | time -h sed 's/ /\
/g' > /dev/null
	1.06s real		0.80s user		0.06s sys

# jot -b 12345678 -s\  1000000 | time -h xargs -n1 > /dev/null
	10m3.00s real		2m26.65s user		7m32.41s sys

Note the runtime factor between rs and xargs is somewhere
around 2000.

At least now I know that I should use AWK for this task.

Regards
Comment 5 kamikaze 2010-10-29 19:37:39 UTC
On 13/10/2010 22:18, Bryan Hahn wrote:
> Question was:
> "Can you please specify when this broke, ie. for which release did this work?"
> 
> Out of curiousity, I checked a handful of systems and couldn't find a
> working version.
> ...

Line 333 of of src/usr.bin/rs/rs.c states the nature of the problem:
/* two screenfuls should do */

There's a 2048 byte buffer and if your lines are longer than that,
you're screwed.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Comment 6 Jaakko Heinonen freebsd_committer 2011-01-22 15:20:31 UTC
Responsible Changed
From-To: uqs->jh

Take.
Comment 7 dfilter service freebsd_committer 2011-02-07 18:10:24 UTC
Author: jh
Date: Mon Feb  7 18:10:18 2011
New Revision: 218411
URL: http://svn.freebsd.org/changeset/base/218411

Log:
  - Use LINE_MAX from limits.h as the maximum line length instead of
    BUFSIZ. Use LINE_MAX * 2 as the buffer size (BSIZE).
  - Error out if we encounter a line longer than LINE_MAX. The previous
    behavior was to silently split long lines and produce corrupted
    output.
  
  PR:		bin/151384

Modified:
  head/usr.bin/rs/rs.c

Modified: head/usr.bin/rs/rs.c
==============================================================================
--- head/usr.bin/rs/rs.c	Mon Feb  7 18:05:56 2011	(r218410)
+++ head/usr.bin/rs/rs.c	Mon Feb  7 18:10:18 2011	(r218411)
@@ -48,6 +48,7 @@ __FBSDID("$FreeBSD$");
 
 #include <err.h>
 #include <ctype.h>
+#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -328,8 +329,8 @@ prepfile(void)
 		warnx("%d is colwidths, nelem %d", colwidths[i], nelem);*/
 }
 
-#define	BSIZE	2048
-char	ibuf[BSIZE];		/* two screenfuls should do */
+#define	BSIZE	(LINE_MAX * 2)
+char	ibuf[BSIZE];
 
 int
 getline(void)	/* get line; maintain curline, curlen; manage storage */
@@ -350,7 +351,7 @@ getline(void)	/* get line; maintain curl
 			curline = ibuf;
 		}
 	}
-	if (!putlength && endblock - curline < BUFSIZ) {   /* need storage */
+	if (!putlength && endblock - curline < LINE_MAX + 1) { /* need storage */
 		/*ww = endblock-curline; tt += ww;*/
 		/*printf("#wasted %d total %d\n",ww,tt);*/
 		if (!(curline = (char *) malloc(BSIZE)))
@@ -358,11 +359,16 @@ getline(void)	/* get line; maintain curl
 		endblock = curline + BSIZE;
 		/*printf("#endb %d curline %d\n",endblock,curline);*/
 	}
-	for (p = curline, i = 1; i < BUFSIZ; *p++ = c, i++)
-		if ((c = getchar()) == EOF || c == '\n')
+	for (p = curline, i = 0;; *p++ = c, i++) {
+		if ((c = getchar()) == EOF)
 			break;
+		if (i >= LINE_MAX)
+			errx(1, "maximum line length (%d) exceeded", LINE_MAX);
+		if (c == '\n')
+			break;
+	}
 	*p = '\0';
-	curlen = i - 1;
+	curlen = i;
 	return(c);
 }
 
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 8 Jaakko Heinonen freebsd_committer 2011-02-07 18:59:59 UTC
State Changed
From-To: open->patched

Patched in head (r218411).
Comment 9 kamikaze 2011-02-10 13:46:07 UTC
I propose documenting the change:

diff -u usr.bin/rs/rs.1.orig usr.bin/rs/rs.1
--- usr.bin/rs/rs.1.orig	2011-02-10 14:10:45.000000000 +0100
+++ usr.bin/rs/rs.1	2011-02-10 14:42:09.000000000 +0100
@@ -168,6 +168,18 @@
 unless the first non-ignored line is longer than the display width.
 Option letters which take numerical arguments interpret a missing
 number as zero unless otherwise indicated.
+.Sh LIMITATIONS
+Lines exceeding
+.Dv LINE_MAX
+defined in
+.In limits.h
+are not processed by
+.Nm
+and result in immediate termination. This behaviour is new as of
+.Fx 9.0 ,
+previous versions of
+.Nm
+silently split lines exceeding 2048 bytes and produce corrupted output.
 .Sh EXAMPLES
 The
 .Nm
Comment 10 dfilter service freebsd_committer 2011-08-17 16:23:12 UTC
Author: jh
Date: Wed Aug 17 15:22:58 2011
New Revision: 224945
URL: http://svn.freebsd.org/changeset/base/224945

Log:
  MFC r218411:
  
  - Use LINE_MAX from limits.h as the maximum line length instead of
    BUFSIZ. Use LINE_MAX * 2 as the buffer size (BSIZE).
  - Error out if we encounter a line longer than LINE_MAX. The previous
    behavior was to silently split long lines and produce corrupted
    output.
  
  PR:		bin/151384

Modified:
  stable/8/usr.bin/rs/rs.c
Directory Properties:
  stable/8/usr.bin/rs/   (props changed)

Modified: stable/8/usr.bin/rs/rs.c
==============================================================================
--- stable/8/usr.bin/rs/rs.c	Wed Aug 17 15:19:25 2011	(r224944)
+++ stable/8/usr.bin/rs/rs.c	Wed Aug 17 15:22:58 2011	(r224945)
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 
 #include <err.h>
 #include <ctype.h>
+#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -332,8 +333,8 @@ prepfile(void)
 		warnx("%d is colwidths, nelem %d", colwidths[i], nelem);*/
 }
 
-#define	BSIZE	2048
-char	ibuf[BSIZE];		/* two screenfuls should do */
+#define	BSIZE	(LINE_MAX * 2)
+char	ibuf[BSIZE];
 
 int
 getline(void)	/* get line; maintain curline, curlen; manage storage */
@@ -354,7 +355,7 @@ getline(void)	/* get line; maintain curl
 			curline = ibuf;
 		}
 	}
-	if (!putlength && endblock - curline < BUFSIZ) {   /* need storage */
+	if (!putlength && endblock - curline < LINE_MAX + 1) { /* need storage */
 		/*ww = endblock-curline; tt += ww;*/
 		/*printf("#wasted %d total %d\n",ww,tt);*/
 		if (!(curline = (char *) malloc(BSIZE)))
@@ -362,11 +363,16 @@ getline(void)	/* get line; maintain curl
 		endblock = curline + BSIZE;
 		/*printf("#endb %d curline %d\n",endblock,curline);*/
 	}
-	for (p = curline, i = 1; i < BUFSIZ; *p++ = c, i++)
-		if ((c = getchar()) == EOF || c == '\n')
+	for (p = curline, i = 0;; *p++ = c, i++) {
+		if ((c = getchar()) == EOF)
 			break;
+		if (i >= LINE_MAX)
+			errx(1, "maximum line length (%d) exceeded", LINE_MAX);
+		if (c == '\n')
+			break;
+	}
 	*p = '\0';
-	curlen = i - 1;
+	curlen = i;
 	return(c);
 }
 
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 11 dfilter service freebsd_committer 2011-10-07 15:29:29 UTC
Author: jh
Date: Fri Oct  7 14:29:15 2011
New Revision: 226107
URL: http://svn.freebsd.org/changeset/base/226107

Log:
  MFC r218411:
  
  - Use LINE_MAX from limits.h as the maximum line length instead of
    BUFSIZ. Use LINE_MAX * 2 as the buffer size (BSIZE).
  - Error out if we encounter a line longer than LINE_MAX. The previous
    behavior was to silently split long lines and produce corrupted
    output.
  
  PR:		bin/151384

Modified:
  stable/7/usr.bin/rs/rs.c
Directory Properties:
  stable/7/usr.bin/rs/   (props changed)

Modified: stable/7/usr.bin/rs/rs.c
==============================================================================
--- stable/7/usr.bin/rs/rs.c	Fri Oct  7 14:27:20 2011	(r226106)
+++ stable/7/usr.bin/rs/rs.c	Fri Oct  7 14:29:15 2011	(r226107)
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 
 #include <err.h>
 #include <ctype.h>
+#include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -333,8 +334,8 @@ prepfile(void)
 		warnx("%d is colwidths, nelem %d", colwidths[i], nelem);*/
 }
 
-#define	BSIZE	2048
-char	ibuf[BSIZE];		/* two screenfuls should do */
+#define	BSIZE	(LINE_MAX * 2)
+char	ibuf[BSIZE];
 
 int
 getline(void)	/* get line; maintain curline, curlen; manage storage */
@@ -355,7 +356,7 @@ getline(void)	/* get line; maintain curl
 			curline = ibuf;
 		}
 	}
-	if (!putlength && endblock - curline < BUFSIZ) {   /* need storage */
+	if (!putlength && endblock - curline < LINE_MAX + 1) { /* need storage */
 		/*ww = endblock-curline; tt += ww;*/
 		/*printf("#wasted %d total %d\n",ww,tt);*/
 		if (!(curline = (char *) malloc(BSIZE)))
@@ -363,11 +364,16 @@ getline(void)	/* get line; maintain curl
 		endblock = curline + BSIZE;
 		/*printf("#endb %d curline %d\n",endblock,curline);*/
 	}
-	for (p = curline, i = 1; i < BUFSIZ; *p++ = c, i++)
-		if ((c = getchar()) == EOF || c == '\n')
+	for (p = curline, i = 0;; *p++ = c, i++) {
+		if ((c = getchar()) == EOF)
 			break;
+		if (i >= LINE_MAX)
+			errx(1, "maximum line length (%d) exceeded", LINE_MAX);
+		if (c == '\n')
+			break;
+	}
 	*p = '\0';
-	curlen = i - 1;
+	curlen = i;
 	return(c);
 }
 
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 12 Jaakko Heinonen freebsd_committer 2011-10-07 15:35:51 UTC
State Changed
From-To: patched->closed

Fixed in head, stable/8 and stable/7.