Bug 197531 - textproc/bsdgrep: segfaults using grep --color -f
Summary: textproc/bsdgrep: segfaults using grep --color -f
Status: In Progress
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Only Me
Assignee: Gabor Kovesdan
Depends on:
Reported: 2015-02-11 03:54 UTC by Jason McSweeney
Modified: 2017-04-17 14:47 UTC (History)
3 users (show)

See Also:
bugzilla: maintainer-feedback? (gabor)

Test file which crashes with grep (5.59 KB, text/plain)
2016-04-18 15:55 UTC, kenorb
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jason McSweeney 2015-02-11 03:54:37 UTC
uname -a

FreeBSD vagrant-freebsd-10 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Name           : bsd-grep
Version        : 20111002_1
Installed on   : Wed Feb 11 02:37:39 UTC 2015
Origin         : textproc/bsdgrep
Architecture   : freebsd:10:x86:64
Prefix         : /usr/local

On my 10.1-RELEASE machine, bsdgrep (from ports) segfaults when I use it thusly

/usr/local/bin/grep --color -f pats.lrg listing.sml 
Segmentation fault (core dumped)

With color disabled, it gives the correct output

/usr/local/bin/grep -f pats.lrg listing.sml 
192	./i860

I don't have a patch to fix it, but I have narrowed down the bug somewhat, and have included a patch to aid in debugging.

I believe this bug also affects base /usr/bin/bsdgrep, and some downstream vendors (apple).

The contents of the files in xxd format (use xxd -r to decode) (xxd from vim-lite)

xxd pats.lrg

0000000: 2e2f 6736 3538 3136 0a2e 2f69 3836 0a2e  ./g65816../i86..
0000010: 2f65 3133 3278 730a 2e2f 7073 780a 2e2f  /e132xs../psx../
0000020: 7368 320a 2e2f 7368 340a 2e2f 6d33 3737  sh2../sh4../m377
0000030: 3130 0a2e 2f6d 3638 3039 0a2e 2f7a 3830  10../m6809../z80
0000040: 0a2e 2f6e 6563 0a2e 2f74 6d73 3332 3033  ../nec../tms3203
0000050: 310a 2e2f 7636 300a 2e2f 7431 310a 2e2f  1../v60../t11../
0000060: 7a38 3030 300a 2e2f 7a31 3830 0a2e 2f6d  z8000../z180../m
0000070: 3635 3032 0a2e 2f72 7370 0a2e 2f74 6d73  6502../rsp../tms
0000080: 3939 3030 0a2e 2f6d 6970 730a 2e2f 6172  9900../mips../ar
0000090: 6d37 0a2e 2f61 7263 6f6d 7061 6374 0a2e  m7../arcompact..
00000a0: 2f64 7370 3536 6b0a 2e2f 746c 6373 3930  /dsp56k../tlcs90
00000b0: 300a 2e2f 746d 7333 3430 3130 0a2e 2f68  0../tms34010../h
00000c0: 380a 2e2f 616c 746f 320a 2e2f 706f 7765  8../alto2../powe
00000d0: 7270 630a 2e2f 6d36 3830 3030 0a2e 2f69  rpc../m68000../i
00000e0: 3338 360a 2e2f 7570 6437 3831 300a       386../upd7810.

xxd listing.sml

0000000: 3139 3209 2e2f 6938 3630 0a              192../i860.

gdb gives this backtrace (on a debug version built with make WITH_DEBUG=yes)

gdb --args ./work/grep-20111002/grep --color -f pats.lrg listing.sml

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) r
Starting program: /usr/ports/textproc/bsdgrep/work/grep-20111002/grep --color -f pats.lrg listing.sml

Program received signal SIGSEGV, Segmentation fault.
0x000000000040d893 in fastcmp (fg=0x801c1f9c0, data=0x802c57000, type=STR_BYTE)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:1031
1031		  if (fg->icase ? (tolower(pat_byte[i]) == tolower(str_byte[i]))
Current language:  auto; currently minimal
(gdb) bt
#0  0x000000000040d893 in fastcmp (fg=0x801c1f9c0, data=0x802c57000, type=STR_BYTE)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:1031
#1  0x000000000040cc5b in tre_match_fast (fg=0x801c1f9c0, data=0x802c57000, len=5, type=STR_BYTE, nmatch=1, pmatch=0x7fffffffe7c8, eflags=4)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/tre-fastmatch.c:940
#2  0x0000000000406afd in tre_fastnexec (preg=0x801c1f9c0, string=0x801c57000 "192\t./i860\n", len=18446744073709551615, nmatch=1, 
    pmatch=0x7fffffffe7c8, eflags=4) at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/fastmatch.c:135
#3  0x0000000000406bf2 in tre_fastexec (preg=0x801c1f9c0, string=0x801c57000 "192\t./i860\n", nmatch=1, pmatch=0x7fffffffe7c8, eflags=4)
    at /usr/ports/textproc/bsdgrep/work/grep-20111002/regex/fastmatch.c:146
#4  0x00000000004057e7 in procline (l=0x7fffffffe938, nottext=0) at util.c:287
#5  0x0000000000405455 in procfile (fn=0x7fffffffedd8 "listing.sml") at util.c:231
#6  0x0000000000404152 in main (argc=5, argv=0x7fffffffeb50) at grep.c:719

valgrind shows this

valgrind --db-attach=yes --track-origins=yes ./work/grep-20111002/grep --color -f pats.lrg listing.sml 
==56492== Memcheck, a memory error detector
==56492== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==56492== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==56492== Command: ./work/grep-20111002/grep --color -f pats.lrg listing.sml
==56492== Conditional jump or move depends on uninitialised value(s)
==56492==    at 0x40D899: fastcmp (tre-fastmatch.c:1031)
==56492==    by 0x40CC5A: tre_match_fast (tre-fastmatch.c:940)
==56492==    by 0x406AFC: tre_fastnexec (fastmatch.c:135)
==56492==    by 0x406BF1: tre_fastexec (fastmatch.c:146)
==56492==    by 0x4057E6: procline (util.c:287)
==56492==    by 0x405454: procfile (util.c:231)
==56492==    by 0x404151: main (grep.c:719)
==56492==  Uninitialised value was created by a heap allocation
==56492==    at 0x10152B3: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==56492==    by 0x4056B4: grep_malloc (util.c:390)
==56492==    by 0x402CBA: grep_open (file.c:268)
==56492==    by 0x405216: procfile (util.c:193)
==56492==    by 0x404151: main (grep.c:719)

After digging around in the source, it looks like procline calls tre_fastexec multiple times under --color mode, and pushes the indices of pmatch.so and pmatch.eo past the end of input line.

The bug seems to be either somewhere in tre_fastnexec where it uses the CALL_WITH_OFFSET macro to increment rm_so and rm_eo

52       ret = fn;								\
53       for (unsigned i = 0; (!(eflags & REG_NOSUB) && (i < nmatch)); i++)\
54 	{								\
55 	  pmatch[i].rm_so += offset;					\
56 	  pmatch[i].rm_eo += offset;					\
57 	}								\
58       return ret;

or, possibly the bug is in the handling of rm_so and rm_eo inside procline after tre_fastexec is called.

Either way, this small patch for debugging purposes should help to pinpoint the problem.

--- util.c.orig	2015-02-11 03:25:00 UTC
+++ util.c
@@ -47,6 +47,7 @@ __FBSDID("$FreeBSD: user/gabor/grep/trun
 #include <string.h>
 #include <unistd.h>
 #include <wchar.h>
+#include <assert.h>
 #include <wctype.h>
 #include "fastmatch.h"
@@ -284,8 +285,12 @@ procline(struct str *l, int nottext)
 		/* Loop to compare with all the patterns */
 		for (i = 0; i < patterns; i++) {
 			if (fg_pattern[i].pattern)
+			{
+				assert(pmatch.rm_so <= l->len &&
+				       pmatch.rm_eo <= l->len);
 				r = fastexec(&fg_pattern[i],
 				    l->dat, 1, &pmatch, eflags);
+			}
 				r = regexec(&r_pattern[i], l->dat, 1,
 				    &pmatch, eflags);
Comment 1 Bugzilla Automation freebsd_committer 2015-02-11 03:54:37 UTC
Auto-assigned to maintainer gabor@FreeBSD.org
Comment 2 Jason McSweeney 2015-02-12 03:37:44 UTC
This additional test case also trips it up - but it segfaults inside printline

echo i860 > /tmp/p
./work/grep-20111002/grep  --color -e i860 -e i86 /tmp/p

I believe it is related to this bug


(gdb) r
Starting program: /usr/ports/textproc/bsdgrep/work/grep-20111002/grep --color -e i860 -e i86 /tmp/p

Program received signal SIGSEGV, Segmentation fault.
0x00000008011d0a60 in memchr () from /lib/libc.so.7
(gdb) bt
#0  0x00000008011d0a60 in memchr () from /lib/libc.so.7
#1  0x00000008011d0599 in fwrite () from /lib/libc.so.7
#2  0x00000008011d043d in fwrite () from /lib/libc.so.7
#3  0x0000000000406190 in printline (line=0x7fffffffe8a8, sep=58, matches=0x7fffffffe740, m=2) at util.c:469
#4  0x0000000000405d81 in procline (l=0x7fffffffe8a8, nottext=0) at util.c:366
#5  0x0000000000405455 in procfile (fn=0x7fffffffed73 "/tmp/p") at util.c:231
#6  0x0000000000404152 in main (argc=7, argv=0x7fffffffeac0) at grep.c:719
Comment 3 kenorb 2016-04-18 15:55:36 UTC
Created attachment 169441 [details]
Test file which crashes with grep

This file crashes with: grep --color=auto -e test -e test test.info
Comment 4 kenorb 2016-04-18 15:55:55 UTC
Same is happening on OS X with grep (BSD grep) 2.5.1-FreeBSD

$ \grep --color=auto -e test -e test test.info
Segmentation fault: 11

Check the attached file to test.
Comment 5 kenorb 2016-04-19 09:06:05 UTC
Related blog post: http://blog.loadzero.com/blog/digging-deeper-into-the-grep-segfault/
Comment 6 Kyle Evans 2017-01-23 19:01:26 UTC
FWIW- The patch in this PR seems to clear up this issue as well: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195763
Comment 7 commit-hook freebsd_committer 2017-04-03 23:17:32 UTC
A commit references this bug:

Author: emaste
Date: Mon Apr  3 23:16:51 UTC 2017
New revision: 316477
URL: https://svnweb.freebsd.org/changeset/base/316477

  bsdgrep: fix matching behaviour

  - Set REG_NOTBOL if we've already matched beginning of line and we're
    examining later parts

  - For each pattern we examine, apply it to the remaining bits of the
    line rather than (potentially) smaller subsets

  - Check for REG_NOSUB after we've looked at all patterns initially
    matching the line

  - Keep track of the last match we made to later determine if we're
    simply not matching any longer or if we need to proceed another byte
    because we hit a zero-length match

  - Match the earliest and longest bit of each line before moving the
    beginning of what we match to further in the line, past the end of the
    longest match; this generally matches how gnugrep(1) seems to behave,
    and seems like pretty good behavior to me

  - Finally, bail out of printing any matches if we were set to print all
    (empty pattern) but -o (output matches) was set

  PR:		195763, 180990, 197555, 197531, 181263, 209116
  Submitted by:	"Kyle Evans" <kevans91@ksu.edu>
  Reviewed by:	cem
  MFC after:	1 month
  Relnotes:	Yes
  Differential Revision:	https://reviews.freebsd.org/D10104

Comment 8 commit-hook freebsd_committer 2017-04-05 18:42:13 UTC
A commit references this bug:

Author: emaste
Date: Wed Apr  5 18:41:47 UTC 2017
New revision: 316536
URL: https://svnweb.freebsd.org/changeset/base/316536

  bsdgrep: create additional tests for coverage on recent fixes

  Create additional tests to cover regressions that were discovered by
  PRs linked to reviews D10098, D10102, and D10104.

  It is worth noting that neither bsdgrep(1) nor gnugrep(1) in the base
  system currently pass all of these tests, and gnugrep(1) not quite being
  up to snuff was also noted in at least one of the PRs.

  PR:		175314 202022 195763 180990 197555 197531 181263 209116
  Submitted by:	Kyle Evans <kevans91@ksu.edu>
  Reviewed by:	cem, ngie, emaste
  MFC after:	1 month
  Differential Revision:	https://reviews.freebsd.org/D10112