Bug 213510

Summary: csplit does not work any more
Product: Base System Reporter: Martin Birgmeier <d8zNeCFG>
Component: binAssignee: Jilles Tjoelker <jilles>
Status: Closed FIXED    
Severity: Affects Only Me CC: fbsd, jilles, nq1n407ba, ross
Priority: --- Keywords: regression
Version: 11.0-STABLEFlags: jilles: mfc-stable11+
jilles: mfc-stable10-
jilles: mfc-stable9-
Hardware: Any   
OS: Any   
Attachments:
Description Flags
[patch] csplit.c none

Description Martin Birgmeier 2016-10-15 18:51:30 UTC
I am generating diffs of multiple files into a single output stream which consists of multiple sections looking like this:

--- ./accessibility/jovie/Makefile.ORIG 2016-08-26 17:51:07.219458267 +0200
+++ ./accessibility/jovie/Makefile      2016-08-26 17:48:41.413457178 +0200
@@ -1,9 +1,16 @@
 # $FreeBSD: head/accessibility/jovie/Makefile 420774 2016-08-24 08:20:31Z tcberner $
...
--- ./accessibility/kaccessible/Makefile.ORIG   2016-08-26 17:51:07.268463953 +0200
+++ ./accessibility/kaccessible/Makefile        2016-08-26 17:48:41.505435723 +0200
@@ -1,9 +1,16 @@
 # $FreeBSD: head/accessibility/kaccessible/Makefile 420774 2016-08-24 08:20:31Z tcberner $
...

I now want to split this stream using csplit:

csplit -k -n 3 - '/^[-*][-*][-*] .*\.ORIG/' '{1000}'

This produces files xx000 to xx998 of size 0, and file xx999 with the following contents (one line only):

--- ./accessibility/jovie/Makefile.ORIG 2016-08-26 17:51:07.219458267 +0200

This worked correctly in 10.3.
Comment 1 Fernando Herrero CarrĂ³n 2016-11-16 16:47:52 UTC
This is only a "me too". I'm using FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016

A simple example is not working for me:

echo "uno\ndos\nxxx\ntres\ncuatro" | csplit - '/xxx/'

I would expect two files created, one with:

uno
dos

the other one with:

tres
cuatro

Instead, I have:

xx00 which is empty
xx01:
uno

Cheers,
Fernando
Comment 2 Ross McKelvie 2016-11-21 15:07:28 UTC
I have experienced the same issue and an additional issue with csplit on 11.0-RELEASE-p3.  To replicate create a file called 'bigfile' with the contents:

BEGINS
Section One
Text for section one.

Section Two
Text for section two.

Section Three
Text for section three.

ENDS

Run csplit:
csplit bigfile '/Section/' '{1}'

Expected result:
3 files: xx00, xx01 and xx02 each containing text for each section with a trailing newline; for example xx00:
BEGINS
Section One
Text for section one.

ENDS

Actual result:
2 files: xx00 and xx01.  xx00 is empty and xx01 contains the single line "Section One".

Splitting by line number is also broken.

Run csplit:
csplit bigfile 4 7

Expected result:
3 files: xx00, xx01 and xx02 each containing text for each section with a trailing newline.

Actual result:
3 files: xx00 containing the single line "Section One", xx01 containing the single line "Text here from section one" and xx02 containing a newline character.

A potential workaround is instead using the split utility, for example:
split -p 'Section' bigfile

However, split does not offer the complexity of csplit and is not suitable as a direct substitute.
Comment 3 J.R. Oldroyd 2017-04-27 16:38:36 UTC
Same failure confirmed here.

Fortunately, "split -p patt" can be used as a work-around in my case.
Comment 4 Jilles Tjoelker freebsd_committer freebsd_triage 2017-04-30 21:59:26 UTC
I can reproduce this and I suspect it has to do with changes to the return value of fputs().
Comment 5 J.R. Oldroyd 2017-04-30 23:41:03 UTC
Created attachment 182200 [details]
[patch] csplit.c

Patch fixes the fputs() return value checks, and also fixes a logic error in truncating the last file.  Thanks, Jilles, for pointing out the change to fputs().

There was also a logic error, repeatable like this:
    csplit -k - '/patt/' '{99}'
Then, give input containing < 99 lines matching /patt/.  You'll observe that the last line containing /patt/ was written both at the start of file N and also left at the end of file N-1.  This was due to the fact that toomuch() was not being called in the error case when a match N+1 was not found.  Also fixed.
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-05-02 21:56:44 UTC
A commit references this bug:

Author: jilles
Date: Tue May  2 21:56:20 UTC 2017
New revision: 317709
URL: https://svnweb.freebsd.org/changeset/base/317709

Log:
  csplit: Fix check of fputs() return value, making csplit work again.

  As of r295638, fputs() returns the number of bytes written (if not more than
  INT_MAX). This broke csplit completely, since csplit assumed only success
  only for the return value 0.

  PR:		213510
  Submitted by:	J.R. Oldroyd
  MFC after:	1 week
  Relnotes:	yes

Changes:
  head/usr.bin/csplit/csplit.c
Comment 7 Jilles Tjoelker freebsd_committer freebsd_triage 2017-05-02 22:13:33 UTC
(In reply to J.R. Oldroyd from comment #5)
I committed the fputs() return value check part, which should return csplit's functionality to what it was in 10.3. I plan to MFC this to stable/11 before release 11.1.

Thanks for the other bug fix, but I would like to see some automated tests before committing new bug fixes to csplit. See usr.bin/tail/tests/ for an example of tests using kyua/atf in the modern way (some other tests were originally written for a different framework and should not be used as a reference). You may need to adjust mtree files and buildworld and installworld with WITH_TESTS=YES in /etc/src.conf.

Regression tests for this bug can go into this bug report but other bugs should go into new bug reports or Phabricator reviews.

Another bug that could be fixed is the {LINE_MAX} limit. This should be easy using getline(3).
Comment 8 J.R. Oldroyd 2017-05-02 23:34:53 UTC
There is no src/usr.bin/tail/tests/ on my 11.0 system.  I will file a separate PR for that bug with a repeat-by and the patch.
Comment 9 Jilles Tjoelker freebsd_committer freebsd_triage 2017-05-07 19:11:16 UTC
(In reply to J.R. Oldroyd from comment #8)
Yes, usr.bin/tail/tests did not exist at the time of 11.0-release. It is available in head and in stable/11. Alternatively, usr.bin/truncate/tests has similar tests and is available in 11.0-release.
Comment 10 J.R. Oldroyd 2017-05-08 14:39:15 UTC
Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219024 regarding the left-over matching line at the end of file N-1.
Comment 11 commit-hook freebsd_committer freebsd_triage 2017-05-12 14:39:07 UTC
A commit references this bug:

Author: jilles
Date: Fri May 12 14:38:09 UTC 2017
New revision: 318238
URL: https://svnweb.freebsd.org/changeset/base/318238

Log:
  MFC r317709: csplit: Fix check of fputs() return value, making csplit work
  again.

  As of r295638, fputs() returns the number of bytes written (if not more than
  INT_MAX). This broke csplit completely, since csplit assumed success only
  for the return value 0.

  PR:		213510
  Relnotes:	yes

Changes:
_U  stable/11/
  stable/11/usr.bin/csplit/csplit.c
Comment 12 Jilles Tjoelker freebsd_committer freebsd_triage 2017-05-12 14:41:16 UTC
Fixed in head and stable/11, thanks!