Bug 183876 - [patch] fmt(1): /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 10.0-BETA3
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2013-11-12 01:20 UTC by jhs
Modified: 2022-10-17 12:40 UTC
Description jhs 2013-11-12 01:20:00 UTC
	

	In 2003 I looked at fmt.c to make it 8 bit clean for 4.8-RELEASE.
	I was conservative & my patches covered just a subset, as I recall.

	http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/

	I have maintained the patches since.

	In 2010 I posted to hackers@ on Tue May 25 11:29:20 UTC 2010:
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html

	& got 1 comment:
	Christian's from Wed May 26 18:05:53 UTC 2010,
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031927.html

	I'm still using & maintaining the patches, through to current & 10.0-BETA3.

	Tonight 2 BSD people (cc'd) asked whether I'd sent the patches,
	so this is also a send-pr.


	WRT Christian's comment from Wed May 26 18:05:53 UTC 2010,

	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
	national char set stuff as much as possible), but I want
	to be able to edit files that simultaneously contain eg all
	of English German & French etc, so setting some var to eg
	just German would be inappropriate.  8 bit clean would be ideal,
	next best would be my patches I suppose.

	We no longer use 7 bit teletypes, & no longer need parity,
	so fmt.c could be made pretty much 8 bit clean (apart from
	eg NUL etc, which would doubtless be too much hassle).  Or
	it can be tweaked to allow some chars, as I recall I did.
	Options presumably are still the 4 from Tue May 25
	11:29:20 UTC 2010
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html

	I assume adopting either Solution 1 (discard "& 0x7f") or
	Solution 2 (my patches) would not disrupt locale users,
	but would stop fmt failing on some 8 bit text.
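
To make the effect of the mask concrete, here is a minimal sketch in C. It is an illustration only and not the fmt.c source: the function names copy_masked and copy_clean are hypothetical, and the sketch assumes only what the report says, namely that bytes pass through an "& 0x7f" mask.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration only, NOT the actual fmt.c source: copy n bytes from src
 * to dst, clearing the top bit of each byte the way an "& 0x7f" mask does.
 */
static void
copy_masked(unsigned char *dst, const unsigned char *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i] & 0x7f;
}

/* The 8 bit clean alternative (Solution 1): copy every byte unchanged. */
static void
copy_clean(unsigned char *dst, const unsigned char *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}
```

With ISO 8859-1 input, the byte 0xFC (u-umlaut) masks to 0x7C, which is ASCII '|', and 0xE9 (e-acute) masks to 0x69, which is ASCII 'i'. UTF-8 input is damaged by the same mask, since every byte of a multibyte sequence has the top bit set. The unmasked copy leaves both encodings intact.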

Fix: 

Look at my posting 
	http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
	& my patches
	http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/
How-To-Repeat: 	
	Read the code
Comment 1 Jordan Hubbard 2013-11-12 01:51:56 UTC
On Nov 11, 2013, at 5:06 PM, Julian H. Stacey <jhs@berklix.com> wrote:

> 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> 	national char set stuff as much as possible), but I want


Well, nobody can ever accuse you of following the herd!   If there ever was a herd you were a member of, in fact, I'm sure the species has long since gone extinct. ;-)

Seriously though, this war is over and UTF-8 won.  There may be some small pockets of resistance, but they're demographically less than significant (insert standard analogy here of soldiers still fighting WWII on isolated islands in the Pacific).  The Linux crowd switched as early as 2002, and OS X has been using UTF-8 on the CLI as the default for at least 5 years now.

Required reading:
	http://www.cl.cam.ac.uk/~mgk25/unicode.html
	http://www.madboa.com/geek/utf8/

P.S. UTF-8 is not a national character set either.  It was actually invented by Ken Thompson in 1992 and drawn on a placemat (http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt).  It has an excellent pedigree. :)

- Jordan
Comment 2 Christian Weisgerber 2013-11-12 20:17:37 UTC
Julian H. Stacey:

> 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> 	national char set stuff as much as possible), but I want

That is your problem right there.

> 	to be able to edit files that simultaneously contain eg all
> 	of English German & French etc, so setting some var to eg
> 	just German would be inappropriate.  8 bit clean would be ideal,
> 	next best would be my patches I suppose.

You MUST define a character set for this.  "8-bit clean" is meaningless
for a tool that deals with runs of characters.  Without a defined
character set, you have no idea what those bytes mean.  Is 0x90 a
printable character?  Is it a control character?  Is it part of a
multibyte character?

And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
limit you to German.  For LC_CTYPE purposes, the language/country
part of the locale specification isn't used.
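
The point about byte meaning depending on LC_CTYPE can be sketched in C as follows. This is a minimal sketch under assumptions: the helper name byte_is_printable_in is hypothetical, and which locale names are installed varies from system to system, so the helper reports a missing locale rather than guessing.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether byte b is printable under the
 * LC_CTYPE locale named by locale_name ("" means "use the environment").
 * Returns 1 if printable, 0 if not, -1 if the locale is not available.
 */
static int
byte_is_printable_in(const char *locale_name, unsigned char b)
{
	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -1;
	return isprint(b) != 0;
}
```

In the "C" locale, 0x90 is not classified as printable. In ISO 8859-1 it is a C1 control code, while 0xFC is the letter u-umlaut, so byte_is_printable_in("de_DE.ISO8859-1", 0xFC) should report printable where that locale is installed. In UTF-8, 0x90 can only appear as a continuation byte inside a multibyte sequence, so a per-byte test like this one is not meaningful there.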

This is definitely a PEBKAC.

-- 
Christian "naddy" Weisgerber                          naddy@mips.inka.de
Comment 3 jhs 2013-11-13 17:48:51 UTC
Christian Weisgerber wrote:
> Julian H. Stacey:
> 
> > 	I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> > 	national char set stuff as much as possible), but I want
> 
> That is your problem right there.

My perspective & experience, or `problem' as you mislabel it, is that I
was supporting Unix Internationalisation back in 1985, & have long since
tired of aggravating German umlaut issues (Umlauts even back then
had AE OE UE [& SS] replacements but few used them).

Your problem is that, being German, you had an incentive to attain umlauts,
& probably being younger, wasted less time achieving umlauts by going
straight to the since available UTF; but you are myopic that others may be
averse to wasting more time on superfluous national oddities that
cleaner Roman derivatives like Italian & English etc find superfluous.

It seemed best to make fmt.c 8 bit clean[er], to help process
arbitrary text, harm no one, & not disturb users of eg UTF.

Your problem is you would obstruct a cleaner fmt, so fmt continues
to fail until users are forced to waste their time too like you did,
reading & configuring internationalisation variables some don't need. **


> > 	to be able to edit files that simultaneously contain eg all
> > 	of English German & French etc, so setting some var to eg
> > 	just German would be inappropriate.  8 bit clean would be ideal,
> > 	next best would be my patches I suppose.
> 
> You MUST define a character set for this.  "8-bit clean" is meaningless
> for a tool that deals with runs of characters.  Without a defined
> character set, you have no idea what those bytes mean.  Is 0x90 a

Not true. See below. **

> printable character?  Is it a control character?  Is it part of a
> multibyte character?
> 
> And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
> limit you to German.  For LC_CTYPE purposes, the language/country
> part of the locale specification isn't used.
> 
> This is definitely a PEBKAC.

Avoid junk acronyms.

Re-Read original post 
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html

Particularly:
        Example: Pasting notes into an xterm, clauses from
        http://seafrance.com in English then French original &
        German, to get the feel of what an unclear English translation

**:
Sometimes I mouse paste from Firefox in English, French, German &
other languages, making notes in a single file with vi in an
xterm, all with the standard env, no locale set. It edits OK in vi, &
displays with cat in xterm, till !}fmt in vi wraps long lines,
when fmt breaks it. So I fixed fmt.

It would not be appropriate to set a German locale, nor a French one etc.
Other utils might misbehave now or later. See eg man sort re LC_ALL.

No way I'd keep exiting vi & resetting LC_CTYPE between
mouse pastes from different language pages. The default American works fine.

I'm not bothered if vi+xterm might mis-display some odd accent,
as I can see something is there, so long as fmt does not strip the
accent, but FreeBSD fmt.c Does strip the French accents & German
umlauts, that's why I fixed fmt.c

Summary:
 Making fmt.c 8 bit cleaner would not break UTF & unicode I believe
 so no reason to object to removal of fmt.c '& 0x7f' cruft etc.
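
As a rough illustration of what "8 bit cleaner" could mean for a line filler, here is a minimal sketch in C. It is not fmt.c and not the patches referenced above: the function wrap_bytes and its parameters are hypothetical, and it assumes that only ASCII space, tab and newline separate words while every other byte is opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not fmt.c: refill the NUL-terminated text `in` into
 * lines of at most `width` bytes, writing the result to `out` (capacity
 * `outsz`).  Only ASCII space, tab and newline separate words; every other
 * byte, including 0x80-0xff, is copied through untouched.  A word longer
 * than `width` gets a line of its own.  Returns 0 on success, -1 if `out`
 * is too small.
 */
static int
wrap_bytes(const char *in, size_t width, char *out, size_t outsz)
{
	size_t col = 0, used = 0;

	assert(in != NULL && out != NULL);
	if (outsz < 2)
		return -1;
	for (;;) {
		/* Skip the separators in front of the next word. */
		in += strspn(in, " \t\n");
		size_t len = strcspn(in, " \t\n");
		if (len == 0)
			break;
		/* Room for a separator, the word, and the final "\n" + NUL. */
		if (used + 1 + len + 2 > outsz)
			return -1;
		if (col > 0 && col + 1 + len > width) {
			out[used++] = '\n';	/* word does not fit: new line */
			col = 0;
		} else if (col > 0) {
			out[used++] = ' ';	/* word fits: single space */
			col++;
		}
		memcpy(out + used, in, len);	/* bytes copied verbatim */
		used += len;
		col += len;
		in += len;
	}
	if (col > 0)
		out[used++] = '\n';
	out[used] = '\0';
	return 0;
}
```

Because every byte of a UTF-8 multibyte sequence is 0x80 or above, splitting only at ASCII separators never cuts a sequence in half, so the accents survive in both ISO 8859-1 and UTF-8 input. The trade-off is that the width is counted in bytes, so UTF-8 lines containing multibyte characters wrap earlier than the requested column; counting columns correctly does need a known character set.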

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
 Interleave replies below like a play script.  Indent old text with "> ".
 Send plain text, not quoted-printable, HTML, base64, or multipart/alternative.
    Extradite NSA spy chief Alexander.  http://berklix.eu/jhs/blog/2013_10_30
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:25 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:40:55 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>