In 2003 I looked at fmt.c to make it 8 bit clean for 4.8-RELEASE. I was conservative & my patches did just a subset, as I recall:
http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/
I have maintained the patches since.

In 2010 I posted to hackers@ on Tue May 25 11:29:20 UTC 2010
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
& got 1 comment: Christian's from Wed May 26 18:05:53 UTC 2010,
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031927.html

I'm still using & maintaining the patches through to current & 10.0-BETA3. Tonight 2 BSD people (cc'd) asked whether I'd sent patches, so this is also a send-pr.

WRT Christian's comment from Wed May 26 18:05:53 UTC 2010, I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid national char set stuff as much as possible), but I want to be able to edit files that simultaneously contain eg all of English German & French etc, so setting some var to eg just German would be inappropriate. 8 bit clean would be ideal, next best would be my patches I suppose.

We no longer use 7 bit teletypes, & no longer need parity, so fmt.c could be made pretty much 8 bit clean, (apart from eg Null etc, which'd doubtless be too much hassle). Or it can be tweaked to allow some chars, as I recall I did.

Options presumably are still the 4 from Tue May 25 11:29:20 UTC 2010
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
I assume either adopting Solution 1 (Discard "& 0x7f") or Solution 2 (my patches) would not disrupt locale users, but would stop fmt failing on some 8 bit text.

Fix: Look at my posting
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
& my patches
http://www.berklix.com/~jhs/src/bsd/fixes/FreeBSD/src/gen/usr.bin/fmt/

How-To-Repeat: Read the code
On Nov 11, 2013, at 5:06 PM, Julian H. Stacey <jhs@berklix.com> wrote:

> I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> national char set stuff as much as possible), but I want

Well, nobody can ever accuse you of following the herd! If there ever was a herd you were a member of, in fact, I'm sure the species has long since gone extinct. ;-)

Seriously though, this war is over and UTF-8 won. There may be some small pockets of resistance, but they're demographically less than significant (insert standard analogy here of soldiers still fighting WWII on isolated islands in the Pacific). The Linux crowd switched as early as 2002, and OS X has been using UTF-8 on the CLI as the default for at least 5 years now.

Required reading:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.madboa.com/geek/utf8/

P.S. UTF-8 is not a national character set either. It was actually invented by Ken Thompson in 1992 and drawn on a placemat (http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt). It has an excellent pedigree. :)

- Jordan
Julian H. Stacey:

> I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> national char set stuff as much as possible), but I want

That is your problem right there.

> to be able to edit files that simultaneously contain eg all
> of English German & French etc, so setting some var to eg
> just German would be inappropriate. 8 bit clean would be ideal,
> next best would be my patches I suppose.

You MUST define a character set for this. "8-bit clean" is meaningless for a tool that deals with runs of characters. Without a defined character set, you have no idea what those bytes mean. Is 0x90 a printable character? Is it a control character? Is it part of a multibyte character?

And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way limit you to German. For LC_CTYPE purposes, the language/country part of the locale specification isn't used.

This is definitely a PEBKAC.

--
Christian "naddy" Weisgerber naddy@mips.inka.de
Christian Weisgerber wrote:
> Julian H. Stacey:
>
> > I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> > national char set stuff as much as possible), but I want
>
> That is your problem right there.

My perspective & experience, or `problem' as you mislabel it, is that I was supporting Unix Internationalisation back in 1985, & long since tired of aggravating German umlaut issues (Umlauts even back then had AE OE UE [& SS] replacements, but few used them).

Your problem is that, being German, you had an incentive to attain umlauts, & probably being younger, wasted less time achieving umlauts, going straight to the since available UTF; but you are myopic that others may be averse to wasting more time on national oddities that cleaner Roman derivatives like Italian & English etc find superfluous.

It seemed best to make fmt.c 8 bit clean[er], to help process arbitrary text, harm no one, & not disturb users of eg UTF.

Your problem is you would obstruct a cleaner fmt, so fmt continues to fail until users are forced to waste their time too, like you did, reading & configuring internationalisation variables some don't need. **

> > to be able to edit files that simultaneously contain eg all
> > of English German & French etc, so setting some var to eg
> > just German would be inappropriate. 8 bit clean would be ideal,
> > next best would be my patches I suppose.
>
> You MUST define a character set for this. "8-bit clean" is meaningless
> for a tool that deals with runs of characters. Without a defined
> character set, you have no idea what those bytes mean. Is 0x90 a

Not true. See below. **

> printable character? Is it a control character? Is it part of a
> multibyte character?
>
> And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way
> limit you to German. For LC_CTYPE purposes, the language/country
> part of the locale specification isn't used.
>
> This is definitely a PEBKAC.

Avoid junk acronyms.
Re-Read original post
http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html
Particularly:

  Example: Pasting notes into an xterm, clauses from
  http://seafrance.com in English then French original & German,
  to get the feel of what an unclear English translation

**: Sometimes I mouse paste from Firefox in English, French, German & other languages, making notes in a single file with vi in an xterm, all with standard env. no Locale. & it edits OK in vi, & displays with cat in xterm, till !}fmt in vi wraps long lines, when fmt breaks it. So I fixed fmt.

It would Not be appropriate to set a German locale, nor a French etc. Other utils might misbehave now or later. See eg man sort re LC_ALL.

No way I'd keep exiting vi & resetting LC_CTYPE between mouse pastes from different language pages. The default American works fine.

I'm not bothered if vi+xterm might mis-display some odd accent, as I can see something is there, so long as fmt does not strip the accent, but FreeBSD fmt.c Does strip the French accents & German umlauts, that's why I fixed fmt.c

Summary: Making fmt.c 8 bit cleaner would not break UTF & unicode I believe, so no reason to object to removal of fmt.c '& 0x7f' cruft etc.

Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
Interleave replies below like a play script. Indent old text with "> ".
Send plain text, not quoted-printable, HTML, base64, or multipart/alternative.
Extradite NSA spy chief Alexander. http://berklix.eu/jhs/blog/2013_10_30
For bugs matching the following criteria:

  Status: In Progress
  Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk).

Keyword descriptions and search interface:
<https://bugs.freebsd.org/bugzilla/describekeywords.cgi>