Bug 209352 - usr.bin/sed: Bug involving "\<".
Summary: usr.bin/sed: Bug involving "\<".
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-05-07 00:58 UTC by Pedro F. Giffuni
Modified: 2016-06-27 20:56 UTC

See Also:
pfg: mfc-stable10+


Attachments
Fix (from OpenBSD) (3.00 KB, patch)
2016-05-07 01:57 UTC, Pedro F. Giffuni

Description Pedro F. Giffuni freebsd_committer freebsd_triage 2016-05-07 00:58:43 UTC
This was noted on the openbsd-bugs list:

(Reply from Tim Chase 2016-04-24)...
Can be simplified to the test-case

 $ echo a,a,a,a,a | sed -r 's/\<.,/X&/g'
 Xa,a,Xa,a,a
 $ echo a,a,a,a,a | gsed -r 's/\<.,/X&/g'
 Xa,Xa,Xa,Xa,a

It appears the determination of "\<" can get thrown off when the
character preceding it is part of a replacement.
_____

I have reproduced it with the sed in FreeBSD.
_____

There is a solution candidate in openbsd-tech, but the patch doesn't apply directly:

(From Martijn van Duren - 2016-05-04)
For those interested: The problem arises because the string pointer is
advanced to the end of the previous match and regexec() is then called
with REG_NOTBOL. REG_NOTBOL combined with a match at the beginning of
the string causes our regex library to treat that position as not being
the beginning of a word.
The TRE implementation does the reverse and always treats this case as
the beginning of a word. This causes a similar bug under macOS:
$ echo 'foo foofoo' | sed -E 's/\<foo/bar/g'
bar barbar

I've solved this problem by converting sed to use REG_STARTEND more
explicitly. Although this isn't a POSIX specified flag, it is already
used by sed and shouldn't be a problem.
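
For illustration only, here is a minimal sketch of the REG_STARTEND calling
convention described above. It is not the patch from openbsd-tech and not the
code in process.c: the helper name prefix_matches() is hypothetical, and it
only emulates s/pattern/prefix&/g. The idea is that the full line is always
passed to regexec(), with pmatch[0] giving the offsets to search, so the
library can look at the character before the search start when evaluating
"\<". It assumes a regex library that defines REG_STARTEND and supports "\<";
handling of "^" after the first match and sed's exact empty-match rules are
left out.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from sed: insert `prefix` in front of every
 * match of the extended regex `pattern` in `input`, roughly what
 * s/pattern/prefix&/g does. Returns a malloc'd string, or NULL on error.
 */
char *
prefix_matches(const char *pattern, const char *input, const char *prefix)
{
	regex_t re;
	regmatch_t m[1];
	size_t len, plen, pos = 0, n = 0;
	char *out;

	assert(pattern != NULL && input != NULL && prefix != NULL);
	len = strlen(input);
	plen = strlen(prefix);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return NULL;

	/* At most len + 1 matches, each adding one copy of the prefix. */
	out = malloc(len + (len + 1) * plen + 1);
	if (out == NULL) {
		regfree(&re);
		return NULL;
	}

	while (pos <= len) {
		size_t so, eo;

		/*
		 * With REG_STARTEND the search window is [rm_so, rm_eo) of
		 * the *whole* line; the pointer itself is never advanced, so
		 * the text before `pos` stays visible to the library.
		 */
		m[0].rm_so = (regoff_t)pos;
		m[0].rm_eo = (regoff_t)len;
		if (regexec(&re, input, 1, m, REG_STARTEND) != 0)
			break;

		/* Returned offsets are relative to `input`, not to `pos`. */
		so = (size_t)m[0].rm_so;
		eo = (size_t)m[0].rm_eo;

		memcpy(out + n, input + pos, so - pos);	/* unmatched text */
		n += so - pos;
		memcpy(out + n, prefix, plen);		/* the prefix */
		n += plen;
		memcpy(out + n, input + so, eo - so);	/* the match ("&") */
		n += eo - so;

		if (eo == so) {
			/* Empty match: copy one character to make progress. */
			if (so < len)
				out[n++] = input[so];
			pos = so + 1;
		} else {
			pos = eo;
		}
	}

	/* Copy whatever follows the last match. */
	if (pos < len) {
		memcpy(out + n, input + pos, len - pos);
		n += len - pos;
	}
	out[n] = '\0';
	regfree(&re);
	return out;
}
```

Under these assumptions, prefix_matches("\\<.,", "a,a,a,a,a", "X") corresponds
to the test case s/\<.,/X&/g. With a regex library that checks the character
preceding the search start, the result should match the gsed output shown
above; a library with the behaviour described in this report may still differ.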
Comment 1 Pedro F. Giffuni freebsd_committer freebsd_triage 2016-05-07 01:57:14 UTC
Created attachment 170076
Fix (from OpenBSD)

The fix, based on the port posted to the OpenBSD list, applies on top of r299211.
Comment 2 Pedro F. Giffuni freebsd_committer freebsd_triage 2016-05-11 19:31:22 UTC
The complementary regex fix is here:

https://reviews.freebsd.org/D6257


And for the record this is the offending code:

https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl_format_layout_gen.bash
Comment 3 Pedro F. Giffuni freebsd_committer freebsd_triage 2016-05-17 15:36:04 UTC
This is all currently being revisited in OpenBSD (where it came from),
and they are finding other bugs in the process.
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-05-25 15:43:15 UTC
A commit references this bug:

Author: pfg
Date: Wed May 25 15:42:39 UTC 2016
New revision: 300684
URL: https://svnweb.freebsd.org/changeset/base/300684

Log:
  sed: convert sed to use REG_STARTEND more explicitly.

  Summarizing the findings in the OpenBSD list:

  This solves a reproduceable issue with very recent Mesa where REG_NOTBOL
  combined with a match at the begin of the string causes our regex library
  to treat the word as not begin of word.

  Thanks to Martijn van Duren and Ingo Schwarze for taking the time to
  solve this in the least invasive way.

  PR:		209352, 209387
  Taken from:     openbsd-tech (Martijn van Duren)
  MFC after:	1 month

Changes:
  head/usr.bin/sed/process.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2016-06-27 20:54:28 UTC
A commit references this bug:

Author: pfg
Date: Mon Jun 27 20:54:03 UTC 2016
New revision: 302228
URL: https://svnweb.freebsd.org/changeset/base/302228

Log:
  sed(1): convert sed to use REG_STARTEND more explicitly.

  Summarizing the findings in the OpenBSD list:

  This solves a reproduceable issue with very recent Mesa where REG_NOTBOL
  combined with a match at the begin of the string causes our regex library
  to treat the word as not begin of word.

  Bump __FreeBSD_version: JIC we hit the issue in recent Mesa ports.

  PR:		209352, 209387 (exp-run)
  Taken from:     openbsd-tech (Martijn van Duren)
  MFC after:	1 month

Changes:
_U  stable/10/
  stable/10/sys/sys/param.h
  stable/10/usr.bin/sed/process.c