Bug 219024

Summary: csplit leaves last matching pattern at end of file N-1 as well as in file N
Product: Base System Reporter: J.R. Oldroyd <fbsd>
Component: binAssignee: Conrad Meyer <cem>
Status: Closed FIXED    
Severity: Affects Many People CC: cem, jilles, scott
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
[patch] csplit.c none

Description J.R. Oldroyd 2017-05-02 23:48:23 UTC
Created attachment 182255 [details]
[patch] csplit.c

When using:
    csplit -k - '/patt/' '{count}'
with an input containing less than count lines matching patt, a logic error causes the final pattern that matches to be left both at the start of file N and also at the end of file N-1.  This is due to the fact that when there are less than count matches, csplit will error when looking for another match.  In that error condition, csplit simply prints a message and exits, rather than calling its toomuch() function which is what truncates the previous output file.

Repeat by:
    echo "one\ntwo\nxxx 1\nthree\nfour\nxxx 2\nfive\nsix" | csplit -k - '/xxx/' '{10}'

Expected output:
xx00:
    one
    two

xx01:
    xxx 1
    three
    four

xx02:
    xxx 2
    five
    six

Actual output:
xx01:
    one
    two

xx02:
    xxx 1
    three
    four
    xxx 2

xx03:
    xxx 2
    five
    six

The "xxx 2" at the end of file xx02 is incorrect.  The attached patch fixes this by making sure that toomuch() is called in the error case when a third match is not found.
Comment 1 J.R. Oldroyd 2017-05-02 23:50:54 UTC
Sorry, I messed up the filename numbers in the above example.  The actual filenames are as they should be: xx00, xx01, xx02 - the error line is left at the end of file xx01.
Comment 2 J.R. Oldroyd 2017-05-08 14:41:23 UTC
Comments from Jilles Tjoelker, originally posted in PR#213510:

I would like to see some automated tests before committing new bug fixes to csplit. See usr.bin/tail/tests/ for an example of tests using kyua/atf in the modern way (some other tests were originally written for a different framework and should not be used as a reference). You may need to adjust mtree files and buildworld and installworld with WITH_TESTS=YES in /etc/src.conf.

... and ...

Yes, usr.bin/tail/tests did not exist at the time of 11.0-release. It is available in head and in stable/11. Alternatively, usr.bin/truncate/tests has similar tests and is available in 11.0-release.
Comment 3 J.R. Oldroyd 2017-05-08 14:47:37 UTC
It's a great idea to have automated testing of utilities, but the atf framework is new to me and, right now, not something I have time to study.  A quick look at the tests you mentioned for truncate(1) show an 11k test script.  Sorry, but figuring something like that out for csplit is just not on my ToDo list at the moment!  I just don't have time.

I'd point out that this is a very obvious error, with a one-line patch to a part of the code that is used only with those given flags.  I.e., applying this patch should not break other usage.

Hopefully someone else can find time to produce some test scripts.
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2017-05-08 15:18:57 UTC
J.R.,

Don't worry, I'll take care of it.
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-05-08 15:52:22 UTC
A commit references this bug:

Author: cem
Date: Mon May  8 15:51:29 UTC 2017
New revision: 317942
URL: https://svnweb.freebsd.org/changeset/base/317942

Log:
  csplit(1): Fix extraneous output in edge case

  When the input to csplit contains fewer lines than the number of matches
  specified, extra output was mistakenly included in some output files.

  Fix the bug and add a simple ATF regression test.

  PR:		219024
  Submitted by:	J.R. Oldroyd <fbsd at opal.com>

Changes:
  head/usr.bin/csplit/Makefile
  head/usr.bin/csplit/csplit.c
  head/usr.bin/csplit/tests/
  head/usr.bin/csplit/tests/Makefile
  head/usr.bin/csplit/tests/csplit_test.sh
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2021-07-08 01:33:17 UTC
^Triage: Apologies for noise, closing a still open duplicate, finalising metadata while im here.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2021-07-08 01:33:23 UTC
*** Bug 163062 has been marked as a duplicate of this bug. ***