Bug 235456

Summary: [maintainer update] japanese/mh: Fix man pages breakage on FreeBSD 11.x and later
Product: Ports & Packages
Component: Individual Port(s)
Reporter: WATANABE Kazuhiro <CQG00620>
Assignee: Kurt Jaeger <pi>
Status: Closed FIXED
Severity: Affects Some People
Priority: ---
CC: pi, schwarze
Flags: pi: maintainer-feedback+, pi: merge-quarterly+
Version: Latest
Hardware: Any
OS: Any
Attachments (description, flags):
- A patch for japanese/mh (flags: none)
- The groff output of mh(1) man page without the patch (flags: none)
- The mandoc output of mh(1) man page without the patch (flags: none)
- The mandoc output of mh(1) man page with the patch (flags: none)
- poudriere testport log on FreeBSD 12.0 without the patch (flags: none)
- poudriere testport log on FreeBSD 12.0 with the patch (flags: none)
- Correct mandoc output under C locale. (flags: none)
- Correct mandoc output under ja_JP.UTF-8 locale. (flags: none)
- Incorrect mandoc output under ja_JP.eucJP locale on 11.2-RELEASE. (flags: none)
- Incorrect mandoc output under ja_JP.SJIS locale on 11.2-RELEASE. (flags: none)

Description WATANABE Kazuhiro 2019-02-03 08:56:23 UTC
On recent FreeBSD, the japanese/mh man pages are broken under mandoc,
the default man page formatter since 11.0-RELEASE, because they use
some features that mandoc does not support.

(1) The man pages use the .fc macro, which mandoc does not support.

(2) Some man pages use the .so macro with an absolute file path.
    Mandoc's .so macro only supports a file in the current directory.

(3) Local macro definitions have trailing whitespace and a comment.
    For example:

     .de De  \" Defaults section

    Mandoc ignores these definitions unless the trailing component is
    removed.  Mandoc's '.de' only accepts a macro name.

(4) The man pages use the \*(lq and \*(rq macros, which display nothing
    with mandoc on a tty in locales other than UTF-8.

(5) Much of the whitespace in quoted strings is escaped with a backslash.
    Mandoc does not display the escaped character.

    For example, "PROFILE\ COMPONENTS" is displayed as 'PROFILECOMPONENTS'.

How to fix:

(1) Remove the .fc macro and the delimiter character.
    Replace the padding character with a TAB character.
    (patch-conf_doc_tmac.h and patch-conf_doc_me2man.sed)

(2) Replace the .so macro with the contents of the specified file.
    (patch-conf_makefiles_doc and an AWK script in Makefile)

(3) Remove the trailing components from the macro name (patch-conf_doc_tmac.h).

(4) Define local macros such as '.ds lq \&"'.  This idea is taken from
    the top(1) man page. (patch-conf_doc_tmac.h)

(5) 's/\\[ 0]/ /g' (patch-conf_doc_me2man.sed)
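
Fixes (3) and (5) are plain text rewrites, so their effect can be sketched
outside the port.  The following Python is an illustration only: it is not
the content of patch-conf_doc_tmac.h or patch-conf_doc_me2man.sed, and the
function names strip_de_trailer and unescape_spaces are hypothetical.  The
second function uses the same pattern as the sed expression in (5), which
matches a backslash followed by either a space or the digit 0.

```python
import re


def strip_de_trailer(line: str) -> str:
    # Reduce ".de NAME <whitespace> \" comment" to ".de NAME".
    # Lines that do not have this shape are returned unchanged.
    m = re.match(r'^(\.de\s+\S+)\s+\\".*$', line)
    return m.group(1) if m else line


def unescape_spaces(line: str) -> str:
    # Same pattern as sed 's/\\[ 0]/ /g': a backslash followed by a
    # space or a zero becomes an ordinary (breakable) space.
    return re.sub(r'\\[ 0]', ' ', line)


if __name__ == "__main__":
    print(strip_de_trailer('.de De  \\" Defaults section'))
    print(unescape_spaces('.SH "PROFILE\\ COMPONENTS"'))
```

Because the replacement is an ordinary space, the formatter is free to break
a line at that position afterwards.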
Comment 1 WATANABE Kazuhiro 2019-02-03 09:01:32 UTC
Created attachment 201672 [details]
A patch for japanese/mh
Comment 2 WATANABE Kazuhiro 2019-02-03 09:04:24 UTC
Created attachment 201673 [details]
The groff output of mh(1) man page without the patch

It was created with the following command:

/usr/bin/zcat /usr/local/man/man1/mh.1.gz | tbl | groff -S -P-h -Wall -mtty-char -man -Tascii -P-c | col -b > mh.1.groff_without_patch.txt
Comment 3 WATANABE Kazuhiro 2019-02-03 09:05:53 UTC
Created attachment 201674 [details]
The mandoc output of mh(1) man page without the patch

It was created with the following command:

/usr/bin/zcat /usr/local/man/man1/mh.1.gz | mandoc | col -b > mh.1.mandoc_without_patch.txt
Comment 4 WATANABE Kazuhiro 2019-02-03 09:06:56 UTC
Created attachment 201675 [details]
The mandoc output of mh(1) man page with the patch
Comment 5 WATANABE Kazuhiro 2019-02-03 09:08:53 UTC
Created attachment 201676 [details]
poudriere testport log on FreeBSD 12.0 without the patch
Comment 6 WATANABE Kazuhiro 2019-02-03 09:09:59 UTC
Created attachment 201677 [details]
poudriere testport log on FreeBSD 12.0 with the patch
Comment 7 WATANABE Kazuhiro 2019-02-03 09:14:32 UTC
To the committer: Please also commit the other patch reported in Bug 233463
if it has not been committed yet.  It is badly needed on FreeBSD 12.x.
Comment 8 commit-hook freebsd_committer freebsd_triage 2019-02-03 10:51:34 UTC
A commit references this bug:

Author: pi
Date: Sun Feb  3 10:51:16 UTC 2019
New revision: 492045
URL: https://svnweb.freebsd.org/changeset/ports/492045

Log:
  japanese/mh: Fix man pages, runtime error

  - man pages breakage on FreeBSD 11.x and later
  - runtime error due to lld 6.0 issue, fixed by using lld 7.0

  PR:		233463, 235456
  Submitted by:	WATANABE Kazuhiro <CQG00620@nifty.ne.jp> (maintainer), nyan
  MFH:		2019Q1

Changes:
  head/japanese/mh/Makefile
  head/japanese/mh/files/patch-conf_doc_me2man.sed
  head/japanese/mh/files/patch-conf_doc_tmac.h
  head/japanese/mh/files/patch-conf_makefiles_doc
Comment 9 commit-hook freebsd_committer freebsd_triage 2019-02-03 10:54:41 UTC
A commit references this bug:

Author: pi
Date: Sun Feb  3 10:53:45 UTC 2019
New revision: 492046
URL: https://svnweb.freebsd.org/changeset/ports/492046

Log:
  MFH: r492045

  japanese/mh: Fix man pages, runtime error

  - man pages breakage on FreeBSD 11.x and later
  - runtime error due to lld 6.0 issue, fixed by using lld 7.0

  PR:		233463, 235456
  Submitted by:	WATANABE Kazuhiro <CQG00620@nifty.ne.jp> (maintainer), nyan
  Approved by:	portmgr (unbreak blanket)

Changes:
_U  branches/2019Q1/
  branches/2019Q1/japanese/mh/Makefile
  branches/2019Q1/japanese/mh/files/patch-conf_doc_me2man.sed
  branches/2019Q1/japanese/mh/files/patch-conf_doc_tmac.h
  branches/2019Q1/japanese/mh/files/patch-conf_makefiles_doc
Comment 10 Ingo Schwarze 2019-02-03 15:03:50 UTC
From a mandoc(1) perspective, these are five different issues, so i'm commenting on them one by one, rather than all together.

Let's start with issue (4).  "\*(lq" and "\*(rq" are not macros, but expansions of predefined strings.  I tried, but failed to reproduce any problem with them with -current mandoc.  They seem to just work out of the box with both -T utf8 and -T ascii, both in small test files and in the context of the nmh(1) manual page itself.

Can you provide a minimal test file exhibiting the problem, together with the exact command you type for formatting and the complete output you get from the formatter, and an explanation which output you would like to get instead?
Comment 11 Ingo Schwarze 2019-02-03 15:28:45 UTC
I can't reproduce issue (5) either.  The unpaddable non-breaking space character "\ " appears to work exactly as expected.  Can you provide a minimal test file exhibiting the problem, again with the exact formatting command used and the actual vs. the desired output?

Besides, your workaround for issue (5) is likely to cause misformatting of unrelated input lines because it removes the non-breaking property from "\ "
regardless of context.
Comment 12 Ingo Schwarze 2019-02-06 21:18:24 UTC
Issue (3) was a real bug in mandoc, thanks for reporting it.
Note that the whitespace after ".de De" is a tab character
rather than a bunch of space characters.

Fixed in mandoc.bsd.lv roff.c rev. 1.363:
http://mandoc.bsd.lv/cgi-bin/cvsweb/roff.c#rev1.363
Comment 13 Ingo Schwarze 2019-02-06 21:30:02 UTC
Issue (1) has been known as a missing feature for quite some time, see the second entry from the top in: http://mandoc.bsd.lv/cgi-bin/cvsweb/TODO?rev=HEAD

It is low priority because manual pages needing it are extremely rare, on the order of about 0.01% of pages in the OpenBSD ports tree.  Probably, i will eventually implement it, but i can't give an ETA.

Issue (2) is a WONTFIX, the restriction on permissible include paths is in place for security reasons.  Even if there weren't security issues with it, the .so request is too fragile for use in manual pages, so upstream should consider getting rid of it.
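
One way to get rid of .so at build time is to paste the referenced file into
the page, which is what fix (2) in the description does with an AWK script.
A minimal Python sketch of that idea follows; it is not the script from the
port's Makefile.  The name inline_so and the read parameter are hypothetical,
nested .so requests are not followed, and how an absolute path maps to a file
in the build tree is left to the caller.

```python
from pathlib import Path
from typing import Callable


def _read_file(path: str) -> str:
    # Default reader: load the referenced file from disk.
    return Path(path).read_text()


def inline_so(text: str, read: Callable[[str], str] = _read_file) -> str:
    # Replace every ".so FILE" request line with the contents of FILE.
    out = []
    for line in text.splitlines():
        if line.startswith(".so "):
            # Drop the trailing newline so no blank line is introduced.
            out.append(read(line[4:].strip()).rstrip("\n"))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Passing a custom read function makes it possible to redirect absolute paths
to wherever the included file lives during the build.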
Comment 14 WATANABE Kazuhiro 2019-02-11 09:06:22 UTC
(In reply to Ingo Schwarze from comment #10)
> From a mandoc(1) perspective, these are five different issues, so i'm
> commenting on them one by one, rather than all together.
> 
> Let's start with issue (4).  "\*(lq" and "\*(rq" are not macros, but
> expansions of predefined strings.  I tried, but failed to reproduce any
> problem with them with -current mandoc.  They seem to just work out of the
> box with both -T utf8 and -T ascii, both in small test files and in the
> context of the nmh(1) manual page itself.
> 
> Can you provide a minimal test file exhibiting the problem, together with
> the exact command you type for formatting and the complete output you get
> from the formatter, and an explanation which output you would like to get
> instead?

Thanks for your follow-up(s).

I tested with the following procedure on FreeBSD 11.2 and 12.0.
Here is a sample document (sample.1):

.TH EXAMPLE 1
.SH DESCRIPTION
\*(lqThis paragraph is quoted.\*(rq
.SH "SEE\ ALSO"
example\-example(1)

Then I ran the following commands:

$ LC_CTYPE=C              mandoc sample.1 > sample_C.catman
$ LC_CTYPE=UTF-8          mandoc sample.1 > sample_UTF-8.catman
$ LC_CTYPE=ja_JP.eucJP    mandoc sample.1 > sample_ja_JP.eucJP.catman  
$ LC_CTYPE=ja_JP.SJIS     mandoc sample.1 > sample_ja_JP.SJIS.catman
$ LC_CTYPE=ja_JP.UTF-8    mandoc sample.1 > sample_ja_JP.UTF-8.catman


First of all, issues (4) and (5) do not occur on FreeBSD 12.0, as you
suggested.  They seem to have been fixed.

I don't know whether the difference is caused by the different versions of
mandoc (1.14.3 for 11.2-RELEASE and -stable, 1.14.4 for 12.0-RELEASE and
-current).

I had only tested these commands on 11.2, and had not considered
the possibility of different results between 11.2 and 12.0...


* Correct output

All locales listed above on 12.0; the C, UTF-8 and ja_JP.UTF-8 locales on 11.2:

----------------------------------------------
DESCRIPTION
       "This paragraph is quoted."

SEE ALSO
       example-example(1)
----------------------------------------------
(Header and footer omitted)

The output is identical in all of these except the ja_JP.UTF-8 locale, which
uses LEFT DOUBLE QUOTATION MARK and RIGHT DOUBLE QUOTATION MARK
(U+201C and U+201D) instead of ASCII double quotes.  See the attached files.
 
 
* Incorrect output

The ja_JP.eucJP and ja_JP.SJIS locales on 11.2.  These lose the double quotes
and the escaped whitespace in the section header.  This is the ja_JP.eucJP output:

----------------------------------------------
DESCRIPTION
       This paragraph is quoted.

SEALSO
       example-example(1)
----------------------------------------------

Note that it is not SEEALSO but SEALSO.  The header is emitted as the following sequence:

S <BS> S E <BS> E E <BS> E <BS> A <BS> A L <BS> L S <BS> S O <BS> O

An extra <BS> follows the last "E".  The ja_JP.SJIS output is a bit
different.  See the attached files.
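
The effect of the extra <BS> can be checked by replaying the sequence the way
a terminal would.  The sketch below is an illustration, not part of mandoc or
the port; render_overstrike is a hypothetical helper that treats <BS> as
"move one cell left" and lets every other character overwrite the current cell.

```python
def render_overstrike(stream: str) -> str:
    # Replay a line of terminal output cell by cell.
    cells = []
    col = 0
    for ch in stream:
        if ch == "\b":
            # Backspace: move the cursor left, but never past column 0.
            col = max(col - 1, 0)
        else:
            # Any other character overwrites the current cell and advances.
            if col < len(cells):
                cells[col] = ch
            else:
                cells.append(ch)
            col += 1
    return "".join(cells)


if __name__ == "__main__":
    # The sequence reported for ja_JP.eucJP on 11.2.
    broken = "S\bSE\bEE\bE\bA\bAL\bLS\bSO\bO"
    print(render_overstrike(broken))
```

Replaying the reported sequence gives SEALSO because the extra <BS> moves the
cursor back onto the second E, which the following A then overwrites.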
Comment 15 WATANABE Kazuhiro 2019-02-11 09:08:33 UTC
Created attachment 201920 [details]
Correct mandoc output under C locale.

It was created with the following command:
$ LC_CTYPE=C              mandoc sample.1 > sample_C.catman
Comment 16 WATANABE Kazuhiro 2019-02-11 09:10:01 UTC
Created attachment 201921 [details]
Correct mandoc output under ja_JP.UTF-8 locale.

It was created with the following command:
$ LC_CTYPE=ja_JP.UTF-8    mandoc sample.1 > sample_ja_JP.UTF-8.catman
Comment 17 WATANABE Kazuhiro 2019-02-11 09:11:25 UTC
Created attachment 201922 [details]
Incorrect mandoc output under ja_JP.eucJP locale on 11.2-RELEASE.

It was created with the following command on FreeBSD 11.2-RELEASE (mandoc 1.14.3):
$ LC_CTYPE=ja_JP.eucJP    mandoc sample.1 > sample_ja_JP.eucJP.catman
Comment 18 WATANABE Kazuhiro 2019-02-11 09:12:19 UTC
Created attachment 201923 [details]
Incorrect mandoc output under ja_JP.SJIS locale on 11.2-RELEASE.

It was created with the following command on FreeBSD 11.2-RELEASE (mandoc 1.14.3):
$ LC_CTYPE=ja_JP.SJIS     mandoc sample.1 > sample_ja_JP.SJIS.catman
Comment 19 WATANABE Kazuhiro 2019-02-11 09:16:36 UTC
(In reply to Ingo Schwarze from comment #11)
> Besides, your workaround for issue (5) is likely to cause misformatting of
> unrelated input lines because it removes the non-breaking property from "\ "
> regardless of context.

Indeed.  Most of the effect appears in the SYNOPSIS section, like this:

----------------------------------------------
 SYNOPSIS
-       scan [+folder] [msgs] [-clear] [-noclear] [-form formatfile] [-format
-            string] [-header] [-noheader] [-width columns] [-reverse]
+       scan [+folder] [msgs] [-clear] [-noclear] [-form formatfile]
+            [-format string] [-header] [-noheader] [-width columns] [-reverse]
             [-noreverse] [-file filename] [-help]
----------------------------------------------
(- == 11.2 with substitution, + == 12.0 without substitution)

But on FreeBSD 11.2-RELEASE, it is better to lose the non-breaking property
than to lose the escaped whitespace.  I will keep the substitution
code until the FreeBSD releases that bundle the old mandoc (version < 1.14.4)
reach end-of-life.