Bug 64845 - Par must exclude non-breaking space from the class of space chars
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Pav Lucistnik
Reported: 2004-03-28 15:20 UTC by Jean-Baptiste Quenot
Modified: 2004-04-17 13:46 UTC

Description Jean-Baptiste Quenot 2004-03-28 15:20:21 UTC
Par 1.52 on FreeBSD does not behave as the upstream author intended.  On
FreeBSD, the isspace() library function returns true for the non-breaking
space character 0xA0, so par treats it as ordinary white space.  This is an
unintended side effect of the setlocale() call added in Par 1.52.

Quoting a message from the upstream author:
--------------------------------------------------------------------------------
From: "Adam M. Costello" <amc+0zjyiz+@nicemice.net>
Date: Tue, 2 Dec 2003 21:19:10 +0000
To: Jean-Baptiste Quenot <jb.quenot@caraldi.com>
User-Agent: Mutt/1.5.4i
                                                                                                                 
> on FreeBSD, the locales definitions include non-breaking space in the
> list of spaces, thus isspace(160) is true, and as a result all my
> nbsps are filtered out, and lines are broken on them.
>
> I noticed that the GNU libc has removed 0xA0 from spaces on purpose.
> But the BSD guys seem to have another approach, as this kind of stuff
> is "implementation specific".
                                                                                                                 
That's interesting.  This was not an issue in Par 1.51, because it
didn't call setlocale(), so only ASCII characters were recognized
by isspace(), isalnum(), islower(), etc.  In par 1.52, a call to
setlocale() was added so that non-ASCII letters and digits would be
recognized for the purpose of the g,B,P,Q options.
                                                                                                                 
An unforeseen side effect is that non-ASCII white-space characters are
now recognized.
--------------------------------------------------------------------------------
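
The effect described in the quoted message can be checked with a small
helper.  This is a minimal sketch, not code from Par; the helper name
space_in_locale and the locale name in the note below are illustrative
assumptions.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: switch LC_CTYPE to the named locale and report
 * whether c is classified as white space there.
 * Returns 1 or 0, or -1 if the locale could not be selected.
 * Side effect: the process keeps the newly selected LC_CTYPE locale.
 * c must be representable as an unsigned char (or be EOF).
 */
static int space_in_locale(const char *locale_name, int c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isspace(c) != 0;
}
```

In the "C" locale, which is what Par 1.51 effectively ran in, the call
space_in_locale("C", 0xA0) returns 0.  With an ISO 8859-1 locale name
(for example "en_US.ISO8859-1", if installed), the result would be
expected to differ between FreeBSD and GNU libc as discussed above.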

Here is the fragment declaring SPACE and BLANK for the ISO Latin 1 locale on
FreeBSD:
                                                                                                                 
/*
 * Standard LOCALE_CTYPE for the ISO 8859-1 Locale
 *
 * $FreeBSD: src/share/mklocale/la_LN.ISO8859-1.src,v 1.3 2001/11/30 05:05:53 ache Exp $
 */
                                                                                                                 
...
                                                                                                                 
SPACE           0x09 - 0x0d ' ' 0xa0
UPPER           'A' - 'Z' 0xc0 - 0xd6 0xd8 - 0xde
XDIGIT          '0' - '9' 'a' - 'f' 'A' - 'F'
BLANK           ' ' '\t' 0xa0

Fix: Apply the patch below (attached as file.diff).

Thanks in advance,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/

--------------------------------------------------------------------------------
--- par.c.orig	Sun Mar 28 16:00:15 2004
+++ par.c	Sun Mar 28 16:04:00 2004
@@ -403,7 +403,8 @@
         }
         continue;
       }
-      if (isspace(c)) ch = ' ';
+      /* Exclude non-breaking space from the class of space chars */
+      if (isspace(c) && c != 0xA0) ch = ' ';
       else blank = 0;
       additem(cbuf, &ch, errmsg);
       if (*errmsg) goto rlcleanup;
--------------------------------------------------------------------------------
How-To-Repeat: Set your locale to an 8-bit character set such as ISO8859-1.
Insert non-breaking spaces in a text, and notice how par converts them to
ordinary spaces and even wraps lines on them.
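
The behavior of the patched condition can be sketched outside of Par as
follows.  This is an illustration under stated assumptions, not Par's actual
input loop: locale_isspace is a hypothetical stand-in that imitates a locale
whose SPACE class contains 0xA0, and squash_spaces is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for isspace() in a locale whose SPACE class
 * includes 0xA0, so the sketch behaves the same on every system.
 */
static int locale_isspace(int c)
{
    return isspace(c) || c == 0xA0;
}

/*
 * Replace every white-space byte with a plain space, but leave the
 * non-breaking space (0xA0) untouched, mirroring the patched test.
 */
static void squash_spaces(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        int c = buf[i];

        if (locale_isspace(c) && c != 0xA0)
            buf[i] = ' ';
    }
}
```

Without the extra comparison, the 0xA0 byte would also be rewritten to ' '
and would become a legal line-break position.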
Comment 1 Jean-Baptiste Quenot 2004-03-28 22:03:17 UTC
As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
statement would be closer to the original intention of the upstream
author:

if (isspace(c) && isascii(c)) ch = ' ';
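
The two candidate conditions differ in scope, which the sketch below tries
to make explicit.  The function names are hypothetical, and is_ascii is a
stand-in written here because isascii() is a POSIX function rather than part
of ISO C.

```c
#include <ctype.h>

/* Hypothetical stand-in for isascii(): true for values 0x00 to 0x7F. */
static int is_ascii(int c)
{
    return c >= 0 && c <= 0x7F;
}

/* Original patch: any locale white space except the non-breaking space. */
static int breakable_except_nbsp(int c)
{
    return isspace(c) && c != 0xA0;
}

/* Suggested variant: only ASCII white space, as in Par 1.51. */
static int breakable_ascii_only(int c)
{
    return isspace(c) && is_ascii(c);
}
```

The first form excludes exactly one character, while the second excludes
every non-ASCII character a locale might add to the space class.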

Regards,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/
Comment 2 Mark Murray 2004-03-28 23:32:51 UTC
Jean-Baptiste Quenot writes:
> As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
> statement would be closer to the original intention of the upstream
> author:
> 
> if (isspace(c) && isascii(c)) ch = ' ';

Aaah. So the author may be aware of this? Do you know if he's incorporating
this in his current offering?

M
--
Mark Murray
iumop ap!sdn w,I idlaH
Comment 3 Jean-Baptiste Quenot 2004-03-29 09:41:31 UTC
* Mark Murray:

> Jean-Baptiste Quenot writes:
>
> > As suggested by Greg Shenaut <gkshenaut@ucdavis.edu>, the following
> > statement would be closer to the original intention of the upstream
> > author:
> >
> > if (isspace(c) && isascii(c)) ch = ' ';
>
> Aaah. So  the  author may  be  aware  of  this? Do  you know  if  he's
> incorporating this in his current offering?

Yes, I'm quoting his words above in the original PR.

http://www.nicemice.net/par/

« The latest version of Par, released on 2001-Apr-29 »

Par is not updated on a regular basis; it is rather stable, so I don't
think an update will be available soon.  I discussed this with the author
at the end of last year and haven't heard from him since.  Furthermore,
the problem only arises on FreeBSD, because it's the only system I know of
whose Standard C Library includes the non-breaking space in the class of
space characters.

Regards,
-- 
Jean-Baptiste Quenot
http://caraldi.com/jbq/
Comment 4 Pav Lucistnik freebsd_committer freebsd_triage 2004-04-01 18:20:32 UTC
Mark, as the maintainer of textproc/par, do you approve the patch in

http://www.freebsd.org/cgi/query-pr.cgi?q=64845

?

-- 
Pav Lucistnik <pav@oook.cz>
              <pav@FreeBSD.org>

Quantum physics was developed in the 1930's, as a result of a bet between
Albert Einstein and Niels Bohr, to see who could come up with the most
ridiculous theory and still have it published.
Comment 5 Pav Lucistnik freebsd_committer freebsd_triage 2004-04-01 18:20:34 UTC
State Changed
From-To: open->feedback

Asked maintainer for approval. 


Comment 6 Pav Lucistnik freebsd_committer freebsd_triage 2004-04-01 18:20:34 UTC
Responsible Changed
From-To: freebsd-ports-bugs->pav

Handle
Comment 7 Pav Lucistnik freebsd_committer freebsd_triage 2004-04-17 13:46:38 UTC
State Changed
From-To: feedback->closed

Committed, thanks!