Par 1.52 on FreeBSD does not work as expected by the upstream author.

On FreeBSD, the isspace() library function returns true for the
non-breaking space character 0xA0 in the ISO 8859-1 locale.  Par treating
that character as white space is an unintended side effect of the
setlocale() call added in 1.52.  Quoting a message from the upstream
author:

--------------------------------------------------------------------------------
From: "Adam M. Costello" <amc+0zjyiz+@nicemice.net>
Date: Tue, 2 Dec 2003 21:19:10 +0000
To: Jean-Baptiste Quenot <jb.quenot@caraldi.com>
User-Agent: Mutt/1.5.4i

> on FreeBSD, the locales definitions include non-breaking space in the
> list of spaces, thus isspace(160) is true, and as a result all my
> nbsps are filtered out, and lines are broken on them.
>
> I noticed that the GNU libc has removed 0xA0 from spaces on purpose.
> But the BSD guys seem to have another approach, as this kind of stuff
> is "implementation specific".

That's interesting.  This was not an issue in Par 1.51, because it
didn't call setlocale(), so only ASCII characters were recognized by
isspace(), isalnum(), islower(), etc.  In par 1.52, a call to
setlocale() was added so that non-ASCII letters and digits would be
recognized for the purpose of the g,B,P,Q options.  An unforeseen side
effect is that non-ASCII white-space characters are now recognized.
--------------------------------------------------------------------------------

Here is the fragment declaring SPACE and BLANK for the ISO Latin 1
locale on FreeBSD:

/*
 * Standard LOCALE_CTYPE for the ISO 8859-1 Locale
 *
 * $FreeBSD: src/share/mklocale/la_LN.ISO8859-1.src,v 1.3 2001/11/30 05:05:53 ache Exp $
 */
...
SPACE     0x09 - 0x0d ' ' 0xa0
UPPER     'A' - 'Z' 0xc0 - 0xd6 0xd8 - 0xde
XDIGIT    '0' - '9' 'a' - 'f' 'A' - 'F'
BLANK     ' ' '\t' 0xa0

Fix:

Apply the following patch:

--------------------------------------------------------------------------------
Thanks in advance,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/

Attachment: file.diff

--- par.c.orig	Sun Mar 28 16:00:15 2004
+++ par.c	Sun Mar 28 16:04:00 2004
@@ -403,7 +403,8 @@
         }
         continue;
       }
-      if (isspace(c)) ch = ' ';
+      // Exclude non-breaking space from the class of space chars
+      if (isspace(c) && c != 0xA0) ch = ' ';
       else blank = 0;
       additem(cbuf, &ch, errmsg);
       if (*errmsg) goto rlcleanup;
--------------------------------------------------------------------------------

How-To-Repeat:

Set your locale to an 8-bit character set such as ISO8859-1.  Insert
non-breaking spaces in a text and run it through par: par converts them
to ordinary spaces and even wraps lines at them.
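
The behaviour described in this report can be checked outside of par. The
following C fragment is only a sketch, not code taken from par: the helper
name is_breaking_space() and the NBSP_DEMO macro are hypothetical, and the
setlocale() arguments are an assumption about how par adopts the user's
locale. It applies the same test as the patch above.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper mirroring the patched condition in par.c:
 * c counts as white space unless it is the Latin-1 non-breaking space.
 * As with isspace(), c must be an unsigned char value or EOF. */
int is_breaking_space(int c)
{
    return isspace(c) && c != 0xA0;
}

#ifdef NBSP_DEMO
int main(void)
{
    /* Adopt the locale from the environment (assumed to be what par does). */
    setlocale(LC_ALL, "");

    /* The first value depends on the platform's locale definition;
     * the second is 0 whatever the locale says about 0xA0. */
    printf("isspace(0xA0)           = %d\n", isspace(0xA0) != 0);
    printf("is_breaking_space(0xA0) = %d\n", is_breaking_space(0xA0));
    return 0;
}
#endif
```

Compiled with -DNBSP_DEMO and run under an ISO8859-1 locale, the first line
shows whether the local C library classifies 0xA0 as a space.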
As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
statement would be closer to the original intention of the upstream
author:

      if (isspace(c) && isascii(c)) ch = ' ';

Regards,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/
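
To make the difference between the two proposals concrete, here is a small
side-by-side sketch. The function names are hypothetical and neither
function is code from par. The first variant excludes only 0xA0, while the
second rejects every non-ASCII character, including any other one that a
locale might classify as white space.

```c
#define _XOPEN_SOURCE 700   /* isascii() is an XSI extension, not ISO C */
#include <ctype.h>

/* Variant from the original patch: only 0xA0 is excluded. */
int space_except_nbsp(int c)
{
    return isspace(c) && c != 0xA0;
}

/* Variant from this follow-up: only ASCII white space is accepted.
 * isascii() is checked first, so isspace() sees only values 0..127. */
int ascii_space_only(int c)
{
    return isascii(c) && isspace(c);
}
```

Both variants agree on ASCII input. They can differ only for bytes above
0x7F other than 0xA0, and only in a locale that marks such a byte as a
space.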
Jean-Baptiste Quenot writes:
> As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
> statement would be closer to the original intention of the upstream
> author:
>
>       if (isspace(c) && isascii(c)) ch = ' ';

Aaah. So the author may be aware of this? Do you know if he's
incorporating this in his current offering?

M
-- 
Mark Murray
iumop ap!sdn w,I idlaH
* Mark Murray:

> Jean-Baptiste Quenot writes:
>
> > As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
> > statement would be closer to the original intention of the upstream
> > author:
> >
> >       if (isspace(c) && isascii(c)) ch = ' ';
>
> Aaah. So the author may be aware of this? Do you know if he's
> incorporating this in his current offering?

Yes, I quoted his words in the original PR.

http://www.nicemice.net/par/

« The latest version of Par, released on 2001-Apr-29 »

Par is rather stable and does not seem to be updated on a regular basis,
so I don't think an update will be available soon.  My discussion with
the author took place at the end of last year, and I haven't heard from
him since.

Furthermore, the problem only arises on FreeBSD, because it's the only
system I know of whose Standard C Library includes the non-breaking
space in the class of space characters.

Regards,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/
Mark, as the maintainer of textproc/par, do you approve the patch in
http://www.freebsd.org/cgi/query-pr.cgi?q=64845 ?

-- 
Pav Lucistnik <pav@oook.cz>
              <pav@FreeBSD.org>

Quantum physics was developed in the 1930's, as a result of a bet
between Albert Einstein and Niels Bohr, to see who could come up with
the most ridiculous theory and still have it published.
State Changed From-To: open->feedback Asked maintainer for approval.
Responsible Changed From-To: freebsd-ports-bugs->pav Handle
State Changed From-To: feedback->closed Committed, thanks!