Bug 33852 - split(1) man page implies that input file is removed.
Summary: split(1) man page implies that input file is removed.
Status: Closed FIXED
Alias: None
Product: Documentation
Classification: Unclassified
Component: Books & Articles
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Giorgos Keramidas
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-01-14 01:00 UTC by Gary W. Swearingen
Modified: 2008-01-26 11:40 UTC (History)

Attachments
file.diff (1.79 KB, patch)
2002-01-14 01:00 UTC, Gary W. Swearingen

Description Gary W. Swearingen 2002-01-14 01:00:04 UTC
1) The split(1) man page can leave some readers wondering whether the
input file itself (or a copy of it) is split into the output files.
That is, does "split" remove the input file? (After splitting a log,
the whole log is gone.)

2) The synopsis line implies that the -b, -l, and -p options may be
used together and the -p option's description only partially clarifies
the fact that only one of those options may be used successfully.

3) The page's use of "file" for a filename and "name" for a prefix is
not as clear as it could be.
================

Fix: 1) Change wording.
2) Change first SYNOPSIS line from
    split [-b byte_count[k|m]] [-l line_count] [-p pattern]
to
    split [-b byte_count[k|m] | -l line_count | -p pattern]
3) Change words.

NOTE: I've put a one-line patch for split.c at the end to make "usage"
match "SYNOPSIS".  If nobody wants to use it now, I can write a PR for
it later.
How-To-Repeat: n/a
================
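
Item 2 above can be made concrete with a small sketch. The fragment below
is not taken from split.c: choose_mode() and enum split_mode are
hypothetical names, used only to illustrate the rule that the
"|"-separated SYNOPSIS is meant to convey, namely that at most one of
-b, -l, and -p may be given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not from split.c. */
enum split_mode {
	MODE_DEFAULT,	/* no option given: 1000 lines per file */
	MODE_BYTES,	/* -b */
	MODE_LINES,	/* -l */
	MODE_PATTERN,	/* -p */
	MODE_ERROR	/* conflicting or unknown options */
};

/*
 * Decide how to split from the option letters in the order they were
 * seen on the command line, e.g. "b", "l", or "bp".  A second splitting
 * option (including a repeat of the same one) is an error.
 */
enum split_mode
choose_mode(const char *opts)
{
	enum split_mode mode = MODE_DEFAULT;
	enum split_mode next;

	assert(opts != NULL);
	for (; *opts != '\0'; opts++) {
		switch (*opts) {
		case 'b':
			next = MODE_BYTES;
			break;
		case 'l':
			next = MODE_LINES;
			break;
		case 'p':
			next = MODE_PATTERN;
			break;
		default:
			return (MODE_ERROR);
		}
		if (mode != MODE_DEFAULT)
			return (MODE_ERROR);	/* already have one */
		mode = next;
	}
	return (mode);
}
```

With this sketch, choose_mode("l") selects line mode, while
choose_mode("bl") is rejected, which is the case where a real program
would print its usage message.
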
Comment 1 ru freebsd_committer freebsd_triage 2002-01-14 09:44:01 UTC
On Sun, Jan 13, 2002 at 05:01:54PM -0800, Gary W. Swearingen wrote:
> 
> 1) The split(1) man page can leave some readers wondering whether the
> input file itself (or a copy of it) is split into the output files.
> That is, does "split" remove the input file? (After splitting a log,
> the whole log is gone.)
> 
> 2) The synopsis line implies that the -b, -l, and -p options may be
> used together and the -p option's description only partially clarifies
> the fact that only one of those options may be used successfully.
> 
> 3) The page's use of "file" for a filename and "name" for a prefix is
> not as clear as it could be.
> 
[...]

> >Fix:
> 
> 1) Change wording.
> 2) Change first SYNOPSIS line from
>     split [-b byte_count[k|m]] [-l line_count] [-p pattern]
> to
>     split [-b byte_count[k|m] | -l line_count | -p pattern]
> 3) Change words.
> 
> NOTE: I've put a one-line patch for split.c at the end to make "usage"
> match "SYNOPSIS".  If nobody wants to use it now, I can write a PR for
> it later.
> 
> --- /tmp/split..orig.1	Sun Jan 13 13:59:09 2002
> +++ /tmp/split.1	Sun Jan 13 15:11:33 2002
> @@ -40,17 +40,21 @@
>  .Nd split a file into pieces
>  .Sh SYNOPSIS
>  .Nm
> -.Op Fl b Ar byte_count[k|m]
> -.Op Fl l Ar line_count
> -.Op Fl p Ar pattern
> -.Op Ar file Op Ar name
> +.Op Fl b Ar byte_count[k|m] | Fl l Ar line_count | Fl p Ar pattern
> +.Op Ar filename Op Ar prefix
> 
I don't like changing "file" to "filename", because "file" is a
standard value that's output if you don't give .Ar any arguments.

>  .Sh DESCRIPTION
>  The
>  .Nm
> -utility reads the given
> -.Ar file
> +utility reads file
> +.Ar filename
>  (or standard input if no file is specified)
> -and breaks it up into files of 1000 lines each.
> +and breaks it up into files of 1000 lines
> +(or an optionally specified size) each, leaving file
> +.Ar filename
> +unchanged.
> 
or an optionally specified pattern (-p).

IMO, this unnecessarily duplicates the description of the options.

> +No padding is added, so the last new file is normally smaller than the
> +others and proper catenation of the output files creates a copy of the
> +unsplit original.
> 
This clause is not true for the -p case, which is not size-constrained.

I'd be happy to commit this patch instead, if you like (based on
your version):

Index: split.1
===================================================================
RCS file: /home/ncvs/src/usr.bin/split/split.1,v
retrieving revision 1.6
diff -u -p -r1.6 split.1
--- split.1	2001/07/15 08:01:34	1.6
+++ split.1	2002/01/14 09:41:17
@@ -40,48 +40,44 @@
 .Nd split a file into pieces
 .Sh SYNOPSIS
 .Nm
-.Op Fl b Ar byte_count[k|m]
-.Op Fl l Ar line_count
-.Op Fl p Ar pattern
-.Op Ar file Op Ar name
+.Op Fl b Ar byte_count Ns Oo Cm k Ns | Ns Cm m Oc | Fl l Ar line_count | Fl p Ar pattern
+.Op Ar file Op Ar prefix
 .Sh DESCRIPTION
 The
 .Nm
 utility reads the given
 .Ar file
 (or standard input if no file is specified)
-and breaks it up into files of 1000 lines each.
+and breaks it up into files of 1000 lines each
+(if no options are specified), leaving the
+.Ar file
+unchanged.
 .Pp
 The options are as follows:
-.Bl -tag -width Ds
-.It Fl b
+.Bl -tag -width indent
+.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m
 Create smaller files
 .Ar byte_count
 bytes in length.
 If
-.Dq Li k
+.Cm k
 is appended to the number, the file is split into
 .Ar byte_count
 kilobyte pieces.
 If
-.Dq Li m
+.Cm m
 is appended to the number, the file is split into
 .Ar byte_count
 megabyte pieces.
-.It Fl l
+.It Fl l Ar line_count
 Create smaller files
-.Ar n
+.Ar line_count
 lines in length.
 .It Fl p Ar pattern
 The file is split whenever an input line matches
 .Ar pattern ,
 which is interpreted as an extended regular expression.
 The matching line will be the first line of the next output file.
-This option is incompatible with the
-.Fl b
-and
-.Fl l
-options.
 .El
 .Pp
 If additional arguments are specified, the first is used as the name
@@ -90,16 +86,16 @@ If a second additional argument is speci
 for the names of the files into which the file is split.
 In this case, each file into which the file is split is named by the
 prefix followed by a lexically ordered suffix in the range of
-.Dq Li aa-zz .
+.Dq Li aa Ns - Ns Li zz .
 .Pp
 If the
-.Ar name
+.Ar prefix
 argument is not specified, the file is split into lexically ordered
 files named in the range of
 .Dq Li xaa-zzz .
 .Sh BUGS
 For historical reasons, if you specify
-.Ar name ,
+.Ar prefix ,
 .Nm
 can only create 676 separate
 files.
Index: split.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/split/split.c,v
retrieving revision 1.8
diff -u -p -r1.8 split.c
--- split.c	2001/12/12 23:09:07	1.8
+++ split.c	2002/01/14 09:41:18
@@ -116,11 +116,6 @@ main(argc, argv)
 			else if (*ep == 'm')
 				bytecnt *= 1048576;
 			break;
-		case 'p' :      /* pattern matching. */
-			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
-				errx(EX_USAGE, "%s: illegal regexp", optarg);
-			pflag = 1;
-			break;
 		case 'l':		/* Line count. */
 			if (numlines != 0)
 				usage();
@@ -128,6 +123,11 @@ main(argc, argv)
 				errx(EX_USAGE,
 				    "%s: illegal line count", optarg);
 			break;
+		case 'p' :		/* Pattern matching. */
+			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
+				errx(EX_USAGE, "%s: illegal regexp", optarg);
+			pflag = 1;
+			break;
 		default:
 			usage();
 		}
@@ -311,6 +311,6 @@ static void
 usage()
 {
 	(void)fprintf(stderr,
-"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
+"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");
 	exit(EX_USAGE);
 }


-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
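
The 676-file limit in the BUGS text of the patch above follows from the
two-letter suffix range aa-zz: 26 * 26 = 676.  A minimal sketch, assuming
the suffix simply counts in base 26; suffix_for() is a hypothetical
helper, not a function from split.c.

```c
#include <assert.h>
#include <stddef.h>

#define NSUFFIXES	(26 * 26)	/* aa through zz: 676 names */

/*
 * Write the two-letter suffix for output file number n (0-based) into
 * buf, which must hold at least 3 bytes.
 * Returns 0 on success, -1 once the aa-zz range is exhausted.
 */
int
suffix_for(int n, char buf[3])
{
	assert(buf != NULL);
	if (n < 0 || n >= NSUFFIXES)
		return (-1);
	buf[0] = (char)('a' + n / 26);	/* first letter changes every 26 files */
	buf[1] = (char)('a' + n % 26);	/* second letter cycles a..z */
	buf[2] = '\0';
	return (0);
}
```

File 0 gets "aa", file 26 gets "ba", and file 675 gets "zz"; asking for
file 676 fails, which corresponds to the limit the BUGS section
describes.
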
Comment 2 Gary W. Swearingen 2002-01-14 19:29:54 UTC
Ruslan Ermilov <ru@FreeBSD.org> writes:

> I don't like changing "file" to "filename", because "file" is a
> standard value that's output if you don't give .Ar any arguments.

Bad conventions are better (in this case) than none. :-)

> +and breaks it up into files of 1000 lines each
> +(if no options are specified), leaving the
> +.Ar file
> +unchanged.

That should be either "leaving the file _file_ unchanged" or "leaving
_file_ unchanged", because "the _file_" refers to a filename or option
argument, and nobody will be concerned that a filename might change.

> -.Bl -tag -width Ds
> -.It Fl b
> +.Bl -tag -width indent
> +.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m

I think it's better to leave option arguments out of the option
description labels and leave them in the synopsis (at least for small
man pages where the synopsis is easily viewed).  It should result in
fewer man page bugs.  When an option has several forms of arguments or
is otherwise complex, it is probably best buried in the description and
still not in the description label.   But I've noticed it both ways.

>> +No padding is added, so the last new file is normally smaller than the
>> +others and proper catenation of the output files creates a copy of the
>> +unsplit original.

I'm glad you caught my -p (non-fixed-size chunks) oversight, but I
wonder if you would replace my sentence above with:

    +No padding is added, so the proper catenation of the output files
    +creates a copy of the unsplit original.

It could go at the end of the first paragraph of the DESCRIPTION, or
form a new last paragraph of the DESCRIPTION.  (I thought it best to
leave the shell out of it and not mention "cat prefix* >copy-of-original".)

Users shouldn't have to experiment to determine that padding is not
performed, especially since the first paragraph of the DESCRIPTION
will imply that it does pad out to 1000 lines (by default).
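
To illustrate the no-padding point, here is a sketch that computes where
a line-count split would cut an in-memory buffer.  next_chunk_len() is a
hypothetical helper, not code from split.c, and it assumes chunks are
measured in newline-terminated lines.  Each chunk is an exact slice of
the input, so the chunk lengths add up to the input length and the last
chunk is simply shorter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length in bytes of the next chunk of buf, which holds len
 * bytes: at most nlines lines, each ending in '\n'.  A final line
 * without a newline still belongs to the chunk.
 */
size_t
next_chunk_len(const char *buf, size_t len, size_t nlines)
{
	size_t i, seen = 0;

	assert(buf != NULL && nlines > 0);
	for (i = 0; i < len; i++) {
		if (buf[i] == '\n' && ++seen == nlines)
			return (i + 1);	/* cut just after the newline */
	}
	return (len);	/* short final chunk: nothing is padded */
}
```

Calling it repeatedly, advancing by the returned length each time, walks
the whole buffer with no bytes added or dropped, which is why
catenating the pieces in order yields a copy of the original.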

> -"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
> +"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");

You might want to break that into two shorter lines.  I wasn't 100% sure
how to do it.  What's the FreeBSD standard limit?
Comment 3 Giorgos Keramidas freebsd_committer freebsd_triage 2006-05-16 16:05:33 UTC
Hi Ruslan,
Any chance we can commit the last patch of this PR?

http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852

IMHO, it looks ok, but I don't think we can expect Gary to review
it any time soon now...
Comment 4 ru freebsd_committer freebsd_triage 2006-05-16 16:21:44 UTC
On Tue, May 16, 2006 at 06:05:33PM +0300, Giorgos Keramidas wrote:
> Hi Ruslan,
> Any chance we can commit the last patch of this PR?
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852
> 
> IMHO, it looks ok, but I don't think we can expect Gary to review
> it any time soon now...
> 
If you have time for this, go ahead and borrow the text
from POSIX.  I think it should fix all the issues that
are mentioned in the PR (except adding the -p option).
I mean, a good SYNOPSIS in my opinion would look like
this:

SYNOPSIS
     split [-l line_count] [-a suffix_length] [file [name]]
     split -b byte_count[k|m] [-a suffix_length] [file [name]]
     split -p pattern [-a suffix_length] [file [name]]

Feel free to also borrow any changes in option and
argument names, and any descriptional text if it makes
it look better.  Just make sure the SYNOPSIS and
usage() stay in sync.
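
One way to keep the two in sync is to give usage() one line of text per
SYNOPSIS form.  The sketch below copies the three forms proposed in this
comment; it is not the committed code, and write_usage() is a
hypothetical helper added so the text can be sent to any stream.
Adjacent string literals keep each source line short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sysexits.h>

/* One line per SYNOPSIS form, as proposed above (not the committed text). */
static const char usage_text[] =
"usage: split [-l line_count] [-a suffix_length] [file [name]]\n"
"       split -b byte_count[k|m] [-a suffix_length] [file [name]]\n"
"       split -p pattern [-a suffix_length] [file [name]]\n";

/* Hypothetical helper: write the usage text to fp; 0 on success. */
int
write_usage(FILE *fp)
{
	assert(fp != NULL);
	return (fputs(usage_text, fp) == EOF ? -1 : 0);
}

/* Print the usage message and exit, as split.c's usage() does. */
void
usage(void)
{
	(void)write_usage(stderr);
	exit(EX_USAGE);
}
```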


Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer
Comment 5 Giorgos Keramidas freebsd_committer freebsd_triage 2006-08-08 22:26:22 UTC
State Changed
From-To: open->patched

I've adapted Ruslan's patch to the current state of CURRENT 
and committed it. 


Comment 6 Giorgos Keramidas freebsd_committer freebsd_triage 2006-08-08 22:26:22 UTC
Responsible Changed
From-To: freebsd-doc->keramida

MFC reminder.
Comment 7 dfilter service freebsd_committer freebsd_triage 2008-01-26 11:37:59 UTC
keramida    2008-01-26 11:37:54 UTC

  FreeBSD src repository (doc committer)

  Modified files:        (Branch: RELENG_6)
    usr.bin/split        split.1 
  Log:
  MFC: 1.19
  
  Update usage & SYNOPSIS and clarify that input files are not removed.
  Sort getopt option handling of -p too, while here.
  
  The changes are adapted from a patch by Ruslan Ermilov, posted as
  followup to docs/33852.
  
  PR:             docs/33852
  Submitted by:   Gary W. Swearingen <swear@blarg.net>
  
  Revision  Changes    Path
  1.15.2.1  +22 -10    src/usr.bin/split/split.1
Comment 8 Giorgos Keramidas freebsd_committer freebsd_triage 2008-01-26 11:38:07 UTC
State Changed
From-To: patched->closed

Merged to RELENG_6 too, as revision 1.15.2.1