Bug 197531

Summary: textproc/bsdgrep: segfaults using grep --color -f
Product: Ports & Packages Reporter: Jason McSweeney <loadzero.dev>
Component: Individual Port(s)Assignee: Kyle Evans <kevans>
Status: Closed FIXED    
Severity: Affects Only Me CC: emaste, kenorb, kevans
Priority: --- Flags: bugzilla: maintainer-feedback? (gabor)
Version: Latest   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195763
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197555
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181263
Attachments:
Description Flags
Test file which crashes with grep none

Description Jason McSweeney 2015-02-11 03:54:37 UTC
uname -a

FreeBSD vagrant-freebsd-10 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Name           : bsd-grep
Version        : 20111002_1
Installed on   : Wed Feb 11 02:37:39 UTC 2015
Origin         : textproc/bsdgrep
Architecture   : freebsd:10:x86:64
Prefix         : /usr/local

On my 10.1-RELEASE machine, bsdgrep (from ports) segfaults when I use it thusly

/usr/local/bin/grep --color -f pats.lrg listing.sml 
Segmentation fault (core dumped)

With color disabled, it gives the correct output

/usr/local/bin/grep -f pats.lrg listing.sml 
192	./i860

I don't have a patch to fix it, but I have narrowed down the bug somewhat, and have included a patch to aid in debugging.

I believe this bug also affects base /usr/bin/bsdgrep, and some downstream vendors (apple).

The contents of the files in xxd format (use xxd -r to decode) (xxd from vim-lite)

xxd pats.lrg

0000000: 2e2f 6736 3538 3136 0a2e 2f69 3836 0a2e  ./g65816../i86..
0000010: 2f65 3133 3278 730a 2e2f 7073 780a 2e2f  /e132xs../psx../
0000020: 7368 320a 2e2f 7368 340a 2e2f 6d33 3737  sh2../sh4../m377
0000030: 3130 0a2e 2f6d 3638 3039 0a2e 2f7a 3830  10../m6809../z80
0000040: 0a2e 2f6e 6563 0a2e 2f74 6d73 3332 3033  ../nec../tms3203
0000050: 310a 2e2f 7636 300a 2e2f 7431 310a 2e2f  1../v60../t11../
0000060: 7a38 3030 300a 2e2f 7a31 3830 0a2e 2f6d  z8000../z180../m
0000070: 3635 3032 0a2e 2f72 7370 0a2e 2f74 6d73  6502../rsp../tms
0000080: 3939 3030 0a2e 2f6d 6970 730a 2e2f 6172  9900../mips../ar
0000090: 6d37 0a2e 2f61 7263 6f6d 7061 6374 0a2e  m7../arcompact..
00000a0: 2f64 7370 3536 6b0a 2e2f 746c 6373 3930  /dsp56k../tlcs90
00000b0: 300a 2e2f 746d 7333 3430 3130 0a2e 2f68  0../tms34010../h
00000c0: 380a 2e2f 616c 746f 320a 2e2f 706f 7765  8../alto2../powe
00000d0: 7270 630a 2e2f 6d36 3830 3030 0a2e 2f69  rpc../m68000../i
00000e0: 3338 360a 2e2f 7570 6437 3831 300a       386../upd7810.

xxd listing.sml

0000000: 3139 3209 2e2f 6938 3630 0a              192../i860.

gdb gives this backtrace (on a debug version built with make WITH_DEBUG=yes)

gdb --args ./work/grep-20111002/grep --color -f pats.lrg listing.sml

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) r
Starting program: /usr/ports/textproc/bsdgrep/work/grep-20111002/grep --color -f pats.lrg listing.sml

Program received signal SIGSEGV, Segmentation fault.
0x000000000040d893 in fastcmp (fg=0x801c1f9c0, data=0x802c57000, type=STR_BYTE)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:1031
1031		  if (fg->icase ? (tolower(pat_byte[i]) == tolower(str_byte[i]))
Current language:  auto; currently minimal
(gdb) bt
#0  0x000000000040d893 in fastcmp (fg=0x801c1f9c0, data=0x802c57000, type=STR_BYTE)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:1031
#1  0x000000000040cc5b in tre_match_fast (fg=0x801c1f9c0, data=0x802c57000, len=5, type=STR_BYTE, nmatch=1, pmatch=0x7fffffffe7c8, eflags=4)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:940
#2  0x0000000000406afd in tre_fastnexec (preg=0x801c1f9c0, string=0x801c57000 "192\t./i860\n", len=18446744073709551615, nmatch=1, 
    pmatch=0x7fffffffe7c8, eflags=4) at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/fastmatch.c:135
#3  0x0000000000406bf2 in tre_fastexec (preg=0x801c1f9c0, string=0x801c57000 "192\t./i860\n", nmatch=1, pmatch=0x7fffffffe7c8, eflags=4)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/fastmatch.c:146
#4  0x00000000004057e7 in procline (l=0x7fffffffe938, nottext=0) at util.c:287
#5  0x0000000000405455 in procfile (fn=0x7fffffffedd8 "listing.sml") at util.c:231
#6  0x0000000000404152 in main (argc=5, argv=0x7fffffffeb50) at grep.c:719

valgrind shows this

valgrind --db-attach=yes --track-origins=yes ./work/grep-20111002/grep --color -f pats.lrg listing.sml 
==56492== Memcheck, a memory error detector
==56492== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==56492== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==56492== Command: ./work/grep-20111002/grep --color -f pats.lrg listing.sml
==56492== 
==56492== Conditional jump or move depends on uninitialised value(s)
==56492==    at 0x40D899: fastcmp (tre-fastmatch.c:1031)
==56492==    by 0x40CC5A: tre_match_fast (tre-fastmatch.c:940)
==56492==    by 0x406AFC: tre_fastnexec (fastmatch.c:135)
==56492==    by 0x406BF1: tre_fastexec (fastmatch.c:146)
==56492==    by 0x4057E6: procline (util.c:287)
==56492==    by 0x405454: procfile (util.c:231)
==56492==    by 0x404151: main (grep.c:719)
==56492==  Uninitialised value was created by a heap allocation
==56492==    at 0x10152B3: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==56492==    by 0x4056B4: grep_malloc (util.c:390)
==56492==    by 0x402CBA: grep_open (file.c:268)
==56492==    by 0x405216: procfile (util.c:193)
==56492==    by 0x404151: main (grep.c:719)
==56492== 
==56492== 

After digging around in the source, it looks like procline calls tre_fastexec multiple times under --color mode, and pushes the indices of pmatch.so and pmatch.eo past the end of input line.

The bug seems to be either somewhere in tre_fastnexec where it uses the CALL_WITH_OFFSET macro to increment rm_so and rm_eo

52       ret = fn;								\
53       for (unsigned i = 0; (!(eflags & REG_NOSUB) && (i < nmatch)); i++)\
54 	{								\
55 	  pmatch[i].rm_so += offset;					\
56 	  pmatch[i].rm_eo += offset;					\
57 	}								\
58       return ret;

or, possibly the bug is in the handling of rm_so and rm_eo inside procline after tre_fastexec is called.

Either way, this small patch for debugging purposes should help to pinpoint the problem.

--- util.c.orig	2015-02-11 03:25:00 UTC
+++ util.c
@@ -47,6 +47,7 @@ __FBSDID("$FreeBSD: user/gabor/grep/trun
 #include <string.h>
 #include <unistd.h>
 #include <wchar.h>
+#include <assert.h>
 #include <wctype.h>
 
 #include "fastmatch.h"
@@ -284,8 +285,12 @@ procline(struct str *l, int nottext)
 		/* Loop to compare with all the patterns */
 		for (i = 0; i < patterns; i++) {
 			if (fg_pattern[i].pattern)
+			{
+				assert(pmatch.rm_so <= l->len &&
+				       pmatch.rm_eo <= l->len);
 				r = fastexec(&fg_pattern[i],
 				    l->dat, 1, &pmatch, eflags);
+			}
 			else
 				r = regexec(&r_pattern[i], l->dat, 1,
 				    &pmatch, eflags);
Comment 1 Bugzilla Automation freebsd_committer freebsd_triage 2015-02-11 03:54:37 UTC
Auto-assigned to maintainer gabor@FreeBSD.org
Comment 2 Jason McSweeney 2015-02-12 03:37:44 UTC
This additional test case also trips it up - but it segfaults inside printline

echo i860 > /tmp/p
./work/grep-20111002/grep  --color -e i860 -e i86 /tmp/p

I believe it is related to this bug

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197555

(gdb) r
Starting program: /usr/ports/textproc/bsdgrep/work/grep-20111002/grep --color -e i860 -e i86 /tmp/p
i860
i860

Program received signal SIGSEGV, Segmentation fault.
0x00000008011d0a60 in memchr () from /lib/libc.so.7
(gdb) bt
#0  0x00000008011d0a60 in memchr () from /lib/libc.so.7
#1  0x00000008011d0599 in fwrite () from /lib/libc.so.7
#2  0x00000008011d043d in fwrite () from /lib/libc.so.7
#3  0x0000000000406190 in printline (line=0x7fffffffe8a8, sep=58, matches=0x7fffffffe740, m=2) at util.c:469
#4  0x0000000000405d81 in procline (l=0x7fffffffe8a8, nottext=0) at util.c:366
#5  0x0000000000405455 in procfile (fn=0x7fffffffed73 "/tmp/p") at util.c:231
#6  0x0000000000404152 in main (argc=7, argv=0x7fffffffeac0) at grep.c:719
Comment 3 kenorb 2016-04-18 15:55:36 UTC
Created attachment 169441 [details]
Test file which crashes with grep

This file crashes with: grep --color=auto -e test -e test test.info
Comment 4 kenorb 2016-04-18 15:55:55 UTC
Same is happening on OS X with grep (BSD grep) 2.5.1-FreeBSD

$ \grep --color=auto -e test -e test test.info
...
Segmentation fault: 11

Check the attached file to test.
Comment 5 kenorb 2016-04-19 09:06:05 UTC
Related blog post: http://blog.loadzero.com/blog/digging-deeper-into-the-grep-segfault/
Comment 6 Kyle Evans freebsd_committer freebsd_triage 2017-01-23 19:01:26 UTC
FWIW- The patch in this PR seems to clear up this issue as well: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195763
Comment 7 commit-hook freebsd_committer freebsd_triage 2017-04-03 23:17:32 UTC
A commit references this bug:

Author: emaste
Date: Mon Apr  3 23:16:51 UTC 2017
New revision: 316477
URL: https://svnweb.freebsd.org/changeset/base/316477

Log:
  bsdgrep: fix matching behaviour

  - Set REG_NOTBOL if we've already matched beginning of line and we're
    examining later parts

  - For each pattern we examine, apply it to the remaining bits of the
    line rather than (potentially) smaller subsets

  - Check for REG_NOSUB after we've looked at all patterns initially
    matching the line

  - Keep track of the last match we made to later determine if we're
    simply not matching any longer or if we need to proceed another byte
    because we hit a zero-length match

  - Match the earliest and longest bit of each line before moving the
    beginning of what we match to further in the line, past the end of the
    longest match; this generally matches how gnugrep(1) seems to behave,
    and seems like pretty good behavior to me

  - Finally, bail out of printing any matches if we were set to print all
    (empty pattern) but -o (output matches) was set

  PR:		195763, 180990, 197555, 197531, 181263, 209116
  Submitted by:	"Kyle Evans" <kevans91@ksu.edu>
  Reviewed by:	cem
  MFC after:	1 month
  Relnotes:	Yes
  Differential Revision:	https://reviews.freebsd.org/D10104

Changes:
  head/usr.bin/grep/util.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2017-04-05 18:42:13 UTC
A commit references this bug:

Author: emaste
Date: Wed Apr  5 18:41:47 UTC 2017
New revision: 316536
URL: https://svnweb.freebsd.org/changeset/base/316536

Log:
  bsdgrep: create additional tests for coverage on recent fixes

  Create additional tests to cover regressions that were discovered by
  PRs linked to reviews D10098, D10102, and D10104.

  It is worth noting that neither bsdgrep(1) nor gnugrep(1) in the base
  system currently pass all of these tests, and gnugrep(1) not quite being
  up to snuff was also noted in at least one of the PRs.

  PR:		175314 202022 195763 180990 197555 197531 181263 209116
  Submitted by:	Kyle Evans <kevans91@ksu.edu>
  Reviewed by:	cem, ngie, emaste
  MFC after:	1 month
  Differential Revision:	https://reviews.freebsd.org/D10112

Changes:
  head/contrib/netbsd-tests/usr.bin/grep/d_color_a.in
  head/contrib/netbsd-tests/usr.bin/grep/d_color_a.out
  head/contrib/netbsd-tests/usr.bin/grep/d_color_b.in
  head/contrib/netbsd-tests/usr.bin/grep/d_color_b.out
  head/contrib/netbsd-tests/usr.bin/grep/d_color_c.out
  head/contrib/netbsd-tests/usr.bin/grep/d_escmap.in
  head/contrib/netbsd-tests/usr.bin/grep/d_f_file_empty.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.out
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.out
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.out
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_d.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.in
  head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.out
  head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
  head/usr.bin/grep/tests/Makefile
Comment 9 commit-hook freebsd_committer freebsd_triage 2017-08-16 00:12:38 UTC
A commit references this bug:

Author: kevans
Date: Wed Aug 16 00:12:25 UTC 2017
New revision: 322555
URL: https://svnweb.freebsd.org/changeset/base/322555

Log:
  bsdgrep: Fix matching behavior and add regression tests

  MFC r316477: bsdgrep: fix matching behaviour

  - Set REG_NOTBOL if we've already matched beginning of line and we're
    examining later parts

  - For each pattern we examine, apply it to the remaining bits of the
    line rather than (potentially) smaller subsets

  - Check for REG_NOSUB after we've looked at all patterns initially
    matching the line

  - Keep track of the last match we made to later determine if we're
    simply not matching any longer or if we need to proceed another byte
    because we hit a zero-length match

  - Match the earliest and longest bit of each line before moving the
    beginning of what we match to further in the line, past the end of the
    longest match; this generally matches how gnugrep(1) seems to behave,
    and seems like pretty good behavior to me

  - Finally, bail out of printing any matches if we were set to print all
    (empty pattern) but -o (output matches) was set

  MFC r316489: bsdgrep: Initialize vars to avoid a false positive GCC warning

  MFC r316491: bsdgrep: revert color changes from r316477

  r316477 changed the color output to match exactly the in-tree GNU grep,
  but introduces unnecessary escape sequences.

  MFC r316536: bsdgrep: create additional tests for coverage on recent fixes

  Create additional tests to cover regressions that were discovered by
  PRs linked to reviews D10098, D10102, and D10104.

  It is worth noting that neither bsdgrep(1) nor gnugrep(1) in the base
  system currently pass all of these tests, and gnugrep(1) not quite being
  up to snuff was also noted in at least one of the PRs.

  MFC r317052: bsdgrep: fix zero-length matches without the -o flag

  r316477 broke zero-length matches when not using the -o flag, by
  skipping over them entirely.

  Add a regression test so that it doesn't break again in the future.

  PR:		175314, 180990, 181263, 195763, 197531, 197555, 202022, 209116
  Approved by:	emaste (mentor, blanket MFC)
  Relnotes:	yes

Changes:
_U  stable/11/
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_a.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_a.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_b.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_b.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_c.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_escmap.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_f_file_empty.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_d.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.in
  stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.out
  stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh
  stable/11/usr.bin/grep/tests/Makefile
  stable/11/usr.bin/grep/util.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2018-04-19 18:23:33 UTC
A commit references this bug:

Author: kevans
Date: Thu Apr 19 18:22:47 UTC 2018
New revision: 467795
URL: https://svnweb.freebsd.org/changeset/ports/467795

Log:
  [PATCH] textproc/bsdgrep: Update to 2.6.0 and set LICENSE

  The following list of changes are roughly what has occurred since the last
  update:

  Features:

   - With bsdgrep -r, the working directory is implied if no directory is
     specified
   - bsdgrep will now behave as bsdgrep -r does when it's named rgrep
   - bsdgrep now understands -z/--null-data to use \0 as EOL
   - GNU regex compatibility is now indicated with a "GNU compatible" in
     the version string

  Fixes:

   - --mmap no longer hangs when coming across an EOF without an
     accompanying EOL
   - -o/--color matching generally improved, now produces earliest /
     longest matches
   - Context output now more closely aligns with GNU grep
   - Zero-length matches no longer exhibit broken behavior
   - Every output line now honors -b/-H/-n flags

  Other fixes are also present.

  While here, move to the version as reported by `grep -V` -- I intend to bump
  the version in base sensible as bugfixes/features roll in.

  Tested with:	Poudriere (amd64, head)
  Tested with:	Poudriere (amd64, 11.1)
  Tested with:	Poudriere (amd64, 10.4)
  Glanced at by:	Portlint -AC
  PR:		227639, 197531
  Approved by:	portmgr (feld)

Changes:
  head/textproc/bsdgrep/Makefile
  head/textproc/bsdgrep/distinfo
  head/textproc/bsdgrep/files/patch-Makefile
  head/textproc/bsdgrep/pkg-plist
Comment 11 Kyle Evans freebsd_committer freebsd_triage 2018-04-19 18:25:21 UTC
This has now been addressed in both ports and base, so I am closing it. Unfortunately I don't think this can be MFH'd to 2018Q2, but it will certainly appear in 2018Q3.