Bug 223524 - whatis 'c++' fails with regex error
Summary: whatis 'c++' fails with regex error
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
Keywords: patch
Reported: 2017-11-08 11:02 UTC by Wolfram Schneider
Modified: 2018-09-10 17:40 UTC (History)

link mandoc against libgnuregex (832 bytes, patch)
2018-09-09 10:59 UTC, Yuri Pankov

Description Wolfram Schneider freebsd_committer 2017-11-08 11:02:25 UTC
The manual page of whatis(1) claims that the search is, by default, “case-insensitive substring matching”:

  By default, apropos searches for makewhatis(8) databases in the default
  paths stipulated by man(1) and uses case-insensitive substring matching
  (the = operator) over manual names and descriptions 

  whatis is a synonym for apropos -f.

     -f      Search for all words in expression in manual page names only.
             The search is case insensitive and matches whole words only.  In
             this mode, macro keys, comparison operators, and logical
             operators are not available.

But when I search for the term “c++”, I get a regcomp error:

$ whatis 'c++'
whatis: regcomp /[[:<:]]c++[[:>:]]/: repetition-operator operand invalid
whatis: ignoring trailing c++
usage: whatis [-afk] [-C file] [-M path] [-m path] [-O outkey] [-S arch]
	      [-s section] name ...

Quoting the plus character returns zero results:

$ whatis 'c\+\+'
whatis: nothing appropriate

Using a single plus gives some results:

$ whatis 'c+'
c++filt(1) - decode C++ symbols
Comment 1 Yuri Pankov 2018-09-09 09:05:44 UTC
There are several problems here:
- by default, regcomp() is called with REG_EXTENDED, so '+' is a special character and needs to be escaped
- once you quote '+', it becomes ordinary, but '+' is NOT a word character, so matching fails

`whatis c+` (single plus) returns the same results as `whatis c`, as '+' means "match 1 or more occurrences".
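
To illustrate the point about '+' under REG_EXTENDED, here is a minimal sketch. The helper name ere_matches is made up for this example, and the [[:<:]] / [[:>:]] word anchors that whatis wraps around the term are left out because they are not portable across regex libraries:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of mandoc): compile `pattern` as a
 * case-insensitive ERE and search for it anywhere in `text`.
 * Returns 1 on a match, 0 on no match, -1 if regcomp() rejects the pattern.
 */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With this helper, ere_matches("c+", "clang") and ere_matches("c", "clang") both return 1, since 'c+' only asks for one or more 'c' characters. An escaped pattern such as "c\\+\\+" (C string syntax) matches "c++filt" but not "clang". Whether the unescaped "c++" compiles at all depends on the regex library: the regcomp() in the report rejects it, while gnuregex accepts it.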

whatis could be changed to use REG_BASIC instead of REG_EXTENDED, but still, "c++" is not a word, and won't match.

Given the above, this isn't really a bug, as the man page says "words" and not "exact strings" :-)
Comment 2 Wolfram Schneider freebsd_committer 2018-09-09 09:48:00 UTC
On FreeBSD 10.4-STABLE I can run an apropos keyword search for c++:

$ whatis 'c\+\+'    
clang(1)                 - the Clang C, C++, and Objective-C compiler

I can also search without quoting the plus characters, and I don't get an error message:
$ whatis 'c++' 

If something changed and is no longer working, I consider it a bug and it needs to be fixed.
Comment 3 Yuri Pankov 2018-09-09 10:59:22 UTC
Created attachment 196984
link mandoc against libgnuregex

That's absolutely correct.

The difference from 10.4 comes from the fact that 10.4 used base's GNU grep, linked against libgnuregex, for whatis, hence the different handling of regexps.

Attaching a PoC patch that links mandoc against libgnuregex, which makes the whatis behavior (nearly) the same as in 10.4.
Comment 4 Kyle Evans freebsd_committer 2018-09-10 17:40:47 UTC

I'll take a look at fixing this in the coming weeks. The code at [1] is inherently wrong; it can't just take a string, stuff it into an ERE like that, and hope it works. Any special characters need to be escaped to make them ordinary, and the user shouldn't be expected to do this manually.
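
As a rough sketch of what escaping the special characters could look like (this is not the mandoc code or the eventual fix; the function name ere_escape is invented for illustration, and the character set is the POSIX list of ERE special characters outside bracket expressions):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc()ed copy of `word` in which every
 * ERE special character is preceded by a backslash, so that the result
 * matches `word` literally when compiled with REG_EXTENDED.
 * The caller frees the result. Returns NULL if allocation fails.
 */
char *ere_escape(const char *word)
{
    /* Characters that are special in an ERE outside a bracket expression. */
    static const char special[] = ".[\\()*+?{|^$";
    size_t len;
    char *out, *p;

    assert(word != NULL);

    len = strlen(word);
    out = malloc(2 * len + 1);      /* worst case: every byte is escaped */
    if (out == NULL)
        return NULL;

    for (p = out; *word != '\0'; word++) {
        if (strchr(special, *word) != NULL)
            *p++ = '\\';
        *p++ = *word;
    }
    *p = '\0';

    return out;
}
```

Here ere_escape("c++") yields the string c\+\+, which regcomp() accepts under REG_EXTENDED. This only addresses the regcomp error; the word-boundary problem from comment 1 ('+' is not a word character) would still need separate handling.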

The behavior from stable/10 is technically wrong, but yields the correct result by coincidence because gnuregex allows 'c++', though it has a different meaning than expected. I think we can come up with a reasonable compromise that still does exactly what you expect it to do.

[1] https://svnweb.freebsd.org/base/head/contrib/mdocml/mansearch.c?revision=324362&view=markup#l754