Bug 184733 - bsdgrep(1) doesn't match a regular expression containing "|" against UTF-16 file [regression]
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 9.2-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks: 230332
Reported: 2013-12-12 20:00 UTC by toomas.aas
Modified: 2020-12-05 03:41 UTC
Description toomas.aas 2013-12-12 20:00:00 UTC
$ egrep -V
egrep (BSD grep) 2.5.1-FreeBSD

$ echo abc > testfile
$ iconv -f ASCII -t UTF-16LE testfile > utftestfile

$ egrep -c "a.b" /tmp/utftestfile
1
$ egrep -c "a.b|d" /tmp/utftestfile
0


The expected result is that the second "egrep" command also returns 1.
This works as expected with GNU grep 2.15 installed from ports, and
also with "bsdgrep -E" on a FreeBSD 9.1 i386 system.
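
A self-contained sketch of the reproduction above, assuming a POSIX shell with iconv, od, and mktemp available. The scratch directory, the variable names, and the use of `grep -E` in place of `egrep` are choices made for this sketch and are not part of the report. The counts printed by the two grep commands depend on the grep implementation in use, so none are asserted here.

```shell
# Work in a scratch directory so the file paths are unambiguous.
tmpdir=$(mktemp -d)

# Same test data as in the report: the line "abc" converted to UTF-16LE.
printf 'abc\n' > "$tmpdir/testfile"
iconv -f ASCII -t UTF-16LE "$tmpdir/testfile" > "$tmpdir/utftestfile"

# Show the raw bytes: every ASCII character is followed by a 0x00 byte.
hexbytes=$(od -An -tx1 "$tmpdir/utftestfile" | tr -d ' \n')
echo "$hexbytes"    # 6100620063000a00

# The two patterns from the report. grep exits non-zero when the count
# is 0, so "|| true" keeps the script going either way.
grep -E -c 'a.b'   "$tmpdir/utftestfile" || true
grep -E -c 'a.b|d' "$tmpdir/utftestfile" || true

rm -rf "$tmpdir"
```

The hex dump shows why the encoding matters: a byte-oriented matcher sees `a`, NUL, `b`, so `a.b` can match only if `.` is allowed to match the NUL byte.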

How-To-Repeat: See "Full Description"
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2017-01-21 03:45:23 UTC
A couple of notes here, as of right now:

`egrep -c` and `bsdgrep -Ec` seem to be behaving consistently on this one now. Also, I've gotten as far as isolating it to a problem somewhere in the GNU compatibility bits. Enabling WITHOUT_GNU_GREP_COMPAT in /etc/src.conf and rebuilding bsdgrep makes it Just Work (TM).

At this point, I'm not sure how to proceed. I did verify that we're setting the cflags right (in accordance with /usr/include/gnu/regex.h); other than that, nothing sticks out as blatantly wrong.
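
The workaround described above could look roughly like the following /etc/src.conf fragment. This is a sketch: the `=yes` value follows the usual src.conf convention, and the rebuild command in the comments assumes a source tree at /usr/src; neither detail is stated in the comment itself.

```shell
# /etc/src.conf fragment: build bsdgrep without the GNU compatibility bits.
WITHOUT_GNU_GREP_COMPAT=yes

# Assumed rebuild step (run as root, source tree in /usr/src):
#   cd /usr/src/usr.bin/grep && make clean all install
```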
Comment 2 Kyle Evans freebsd_committer freebsd_triage 2017-01-21 04:00:23 UTC
(In reply to Kyle Evans from comment #1)

Also worth noting: this equivalent test on a relatively recent Debian machine:

> grep (GNU grep) 2.27

$ echo abc > testfile
$ iconv -f ASCII -t UTF-16LE testfile > utftestfile
$ egrep -c "a.b" /tmp/utftestfile
0
$ egrep -c "a.b|d" /tmp/utftestfile
0
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:51:57 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 4 Kyle Evans freebsd_committer freebsd_triage 2018-08-03 16:15:33 UTC
Adding this to tracking PR; will mark fixed/overcome by events once bsdgrep loses the bits that allow it to be linked against gnuregex.
Comment 5 Kyle Evans freebsd_committer freebsd_triage 2020-12-05 03:41:49 UTC
This is mostly OBE, as bsdgrep now uses libregex by default rather than libgnuregex. bsdgrep on 11.4 still links against libgnuregex, but I would tend to recommend not using bsdgrep on 11.x.