| Summary: | split(1) man page implies that input file is removed. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Documentation | Reporter: | Gary W. Swearingen <swear> | ||||
| Component: | Books & Articles | Assignee: | Giorgos Keramidas <keramida> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Only Me | ||||||
| Priority: | Normal | ||||||
| Version: | Latest | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
On Sun, Jan 13, 2002 at 05:01:54PM -0800, Gary W. Swearingen wrote:
>
> 1) The split(1) man page can leave some readers wondering whether the
> input file itself (or a copy of it) is split into the output files.
> That is, does "split" remove the input file? (After splitting a log,
> the whole log is gone.)
>
> 2) The synopsis line implies that the -b, -l, and -p options may be
> used together and the -p option's description only partially clarifies
> the fact that only one of those options may be used successfully.
>
> 3) The page's use of "file" for a filename and "name" for a prefix is
> not as clear as it could be.
> [...]
> >Fix:
>
> 1) Change wording.
> 2) Change first SYNOPSIS line from
>   split [-b byte_count[k|m]] [-l line_count] [-p pattern]
> to
>   split [-b byte_count[k|m] | -l line_count | -p pattern]
> 3) Change words.
>
> NOTE: I've put a one-line patch for split.c at the end to make "usage"
> match "SYNOPSIS". If nobody wants to use it now, I can write a PR for
> it later.
>
> --- /tmp/split..orig.1	Sun Jan 13 13:59:09 2002
> +++ /tmp/split.1	Sun Jan 13 15:11:33 2002
> @@ -40,17 +40,21 @@
>  .Nd split a file into pieces
>  .Sh SYNOPSIS
>  .Nm
> -.Op Fl b Ar byte_count[k|m]
> -.Op Fl l Ar line_count
> -.Op Fl p Ar pattern
> -.Op Ar file Op Ar name
> +.Op Fl b Ar byte_count[k|m] | Fl l Ar line_count | Fl p Ar pattern
> +.Op Ar filename Op Ar prefix
>
I don't like changing "file" to "filename", because "file" is a
standard value that's output if you don't give .Ar any arguments.

>  .Sh DESCRIPTION
>  The
>  .Nm
> -utility reads the given
> -.Ar file
> +utility reads file
> +.Ar filename
>  (or standard input if no file is specified)
> -and breaks it up into files of 1000 lines each.
> +and breaks it up into files of 1000 lines
> +(or an optionally specified size) each, leaving file
> +.Ar filename
> +unchanged.
>
or an optionally specified pattern (-p). This IMO unnecessarily
duplicates options description.
> +No padding is added, so the last new file is normally smaller than the
> +others and proper catenation of the output files creates a copy of the
> +unsplit original.
>
This clause is not true for the -p case, which is not size-constrained.

I'd be happy to commit this patch instead, if you like (based on your
version):

Index: split.1
===================================================================
RCS file: /home/ncvs/src/usr.bin/split/split.1,v
retrieving revision 1.6
diff -u -p -r1.6 split.1
--- split.1	2001/07/15 08:01:34	1.6
+++ split.1	2002/01/14 09:41:17
@@ -40,48 +40,44 @@
 .Nd split a file into pieces
 .Sh SYNOPSIS
 .Nm
-.Op Fl b Ar byte_count[k|m]
-.Op Fl l Ar line_count
-.Op Fl p Ar pattern
-.Op Ar file Op Ar name
+.Op Fl b Ar byte_count Ns Oo Cm k Ns | Ns Cm m Oc | Fl l Ar line_count | Fl p Ar pattern
+.Op Ar file Op Ar prefix
 .Sh DESCRIPTION
 The
 .Nm
 utility reads the given
 .Ar file
 (or standard input if no file is specified)
-and breaks it up into files of 1000 lines each.
+and breaks it up into files of 1000 lines each
+(if no options are specified), leaving the
+.Ar file
+unchanged.
 .Pp
 The options are as follows:
-.Bl -tag -width Ds
-.It Fl b
+.Bl -tag -width indent
+.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m
 Create smaller files
 .Ar byte_count
 bytes in length.
 If
-.Dq Li k
+.Cm k
 is appended to the number, the file is split into
 .Ar byte_count
 kilobyte pieces.
 If
-.Dq Li m
+.Cm m
 is appended to the number, the file is split into
 .Ar byte_count
 megabyte pieces.
-.It Fl l
+.It Fl l Ar line_count
 Create smaller files
-.Ar n
+.Ar line_count
 lines in length.
 .It Fl p Ar pattern
 The file is split whenever an input line matches
 .Ar pattern ,
 which is interpreted as an extended regular expression.
 The matching line will be the first line of the next output file.
-This option is incompatible with the
-.Fl b
-and
-.Fl l
-options.
 .El
 .Pp
 If additional arguments are specified, the first is used as the name
@@ -90,16 +86,16 @@ If a second additional argument is speci
 for the names of the files into which the file is split.
 In this case, each file into which the file is split is named by the
 prefix followed by a lexically ordered suffix in the range of
-.Dq Li aa-zz .
+.Dq Li aa Ns - Ns Li zz .
 .Pp
 If the
-.Ar name
+.Ar prefix
 argument is not specified, the file is split into lexically ordered
 files named in the range of
 .Dq Li xaa-zzz .
 .Sh BUGS
 For historical reasons, if you specify
-.Ar name ,
+.Ar prefix ,
 .Nm
 can only create 676 separate files.
Index: split.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/split/split.c,v
retrieving revision 1.8
diff -u -p -r1.8 split.c
--- split.c	2001/12/12 23:09:07	1.8
+++ split.c	2002/01/14 09:41:18
@@ -116,11 +116,6 @@ main(argc, argv)
 			else if (*ep == 'm')
 				bytecnt *= 1048576;
 			break;
-		case 'p' :	/* pattern matching. */
-			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
-				errx(EX_USAGE, "%s: illegal regexp", optarg);
-			pflag = 1;
-			break;
 		case 'l':		/* Line count. */
 			if (numlines != 0)
 				usage();
@@ -128,6 +123,11 @@ main(argc, argv)
 				errx(EX_USAGE,
 				    "%s: illegal line count", optarg);
 			break;
+		case 'p' :	/* Pattern matching. */
+			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
+				errx(EX_USAGE, "%s: illegal regexp", optarg);
+			pflag = 1;
+			break;
 		default:
 			usage();
 		}
@@ -311,6 +311,6 @@ static void
 usage()
 {
 	(void)fprintf(stderr,
-"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
+"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");
 	exit(EX_USAGE);
 }

-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age

Ruslan Ermilov <ru@FreeBSD.org> writes:
> I don't like changing "file" to "filename", because "file" is a
> standard value that's output if you don't give .Ar any arguments.

Bad conventions are better (in this case) than none. :-)

> +and breaks it up into files of 1000 lines each
> +(if no options are specified), leaving the
> +.Ar file
> +unchanged.

That should be either "leaving the file _file_ unchanged" or "leaving
_file_ unchanged" because "the _file_" refers to a filename or option
argument which nobody will be concerned might change.

> -.Bl -tag -width Ds
> -.It Fl b
> +.Bl -tag -width indent
> +.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m

I think it's better to leave option arguments out of the option
description labels and leave them in the synopsis (at least for small
man pages where the synopsis is easily viewed). It should result in
fewer man page bugs. When an option has several forms of arguments or
is otherwise complex, it is probably best buried in the description and
still not in the description label. But I've noticed it both ways.

>> +No padding is added, so the last new file is normally smaller than the
>> +others and proper catenation of the output files creates a copy of the
>> +unsplit original.
I'm glad you caught my -p (non-fixed-size chunks) oversight, but I
wonder if you would replace my sentence above with:

+No padding is added, so the proper catenation of the output files
+creates a copy of the unsplit original.

It could be at the end of the DESCRIPTION first paragraph, or as a new
last paragraph of the DESCRIPTION. (I thought it best to omit shell
interaction by mentioning "cat prefix* >copy-of-original".)

Users shouldn't have to experiment to determine that padding is not
performed, especially since the first paragraph of the DESCRIPTION will
imply that it does pad out to 1000 lines (by default).

> -"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
> +"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");

You might want to break that into two shorter lines. I wasn't 100%
sure how to do it. What's the FreeBSD standard limit?

Hi Ruslan,

Any chance we can commit the last patch of this PR?

http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852

IMHO, it looks ok, but I don't think we can expect Gary to review
it any time soon now...

On Tue, May 16, 2006 at 06:05:33PM +0300, Giorgos Keramidas wrote:
> Hi Ruslan,
> Any chance we can commit the last patch of this PR?
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852
>
> IMHO, it looks ok, but I don't think we can expect Gary to review
> it any time soon now...
>
If you have time for this, go ahead and borrow the text from POSIX.
I think it should fix all the issues that are mentioned in the PR
(except adding the -p option). I mean, a good SYNOPSIS in my opinion
would look like this:

SYNOPSIS
     split [-l line_count] [-a suffix_length] [file [name]]
     split -b byte_count[k|m] [-a suffix_length] [file [name]]
     split -p pattern [-a suffix_length] [file [name]]

Feel free to also borrow any changes in option and argument names,
and any descriptional text if it makes it look better. Just make
sure the SYNOPSIS and usage() stay in sync.
Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer

State Changed
From-To: open->patched

I've adapted Ruslan's patch to the current state of CURRENT and committed it.

Responsible Changed
From-To: freebsd-doc->keramida

MFC reminder.

keramida 2008-01-26 11:37:54 UTC
FreeBSD src repository (doc committer)
Modified files: (Branch: RELENG_6)
usr.bin/split split.1
Log:
MFC: 1.19
Update usage & SYNOPSIS and clarify that input files are not removed.
Sort getopt option handling of -p too, while here.
The changes are adapted from a patch by Ruslan Ermilov, posted as
followup to docs/33852.
PR: docs/33852
Submitted by: Gary W. Swearingen <swear@blarg.net>
Revision Changes Path
1.15.2.1 +22 -10 src/usr.bin/split/split.1
State Changed
From-To: patched->closed

Merged to RELENG_6 too, as revision 1.15.2.1
1) The split(1) man page can leave some readers wondering whether the input file itself (or a copy of it) is split into the output files. That is, does "split" remove the input file? (After splitting a log, the whole log is gone.)

2) The synopsis line implies that the -b, -l, and -p options may be used together and the -p option's description only partially clarifies the fact that only one of those options may be used successfully.

3) The page's use of "file" for a filename and "name" for a prefix is not as clear as it could be.

Fix:

1) Change wording.
2) Change first SYNOPSIS line from
   split [-b byte_count[k|m]] [-l line_count] [-p pattern]
   to
   split [-b byte_count[k|m] | -l line_count | -p pattern]
3) Change words.

NOTE: I've put a one-line patch for split.c at the end to make "usage" match "SYNOPSIS". If nobody wants to use it now, I can write a PR for it later.

How-To-Repeat:
n/a