Bug 210995 - cat -v fails to tag characters in extended table with M- prefix with some locales
Summary: cat -v fails to tag characters in extended table with M- prefix with some loc...
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-07-11 03:52 UTC by Sevan Janiyan
Modified: 2016-09-13 08:30 UTC (History)
3 users (show)

See Also:


Attachments
remove isprint check & switch setlocale() to use LC_ALL (558 bytes, patch)
2016-07-11 03:52 UTC, Sevan Janiyan
no flags Details | Diff
use wide chars with fallback to plain ones (3.24 KB, patch)
2016-08-05 14:05 UTC, Andrey A. Chernov
no flags Details | Diff
use wide chars with fallback to plain ones (3.24 KB, patch)
2016-08-05 14:11 UTC, Andrey A. Chernov
no flags Details | Diff
use wide chars with fallback to plain ones (3.29 KB, patch)
2016-08-05 14:41 UTC, Andrey A. Chernov
no flags Details | Diff
use wide chars with fallback to plain ones (1.26 KB, patch)
2016-09-13 08:20 UTC, Andrey A. Chernov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sevan Janiyan 2016-07-11 03:52:59 UTC
Created attachment 172364 [details]
remove isprint check & switch setlocale() to use LC_ALL

Using the following test:
setenv LC_ALL de_CH.ISO8859-1
foreach i ( `jot 178 178 356` )
echo a | tr 'a' "\$i" | cat -v
end

output will contain some printable characters listed instead with a question mark.

removing the !isprint() check for vflag results in printable characters being displayed correctly again. Making the behaviour of cat -v consistent whether locale is set to C (default) or another.

Attached patch removes the !isprint() check and switches from just setting locale settings set in LC_CTYPE to LC_ALL to import all set locale variables.
Comment 1 John Baldwin freebsd_committer freebsd_triage 2016-08-04 17:10:07 UTC
Taking a guess at adding some folks I think might be more knowledgable about locales to cc.
Comment 2 Andrey A. Chernov freebsd_committer 2016-08-05 06:38:43 UTC
Removing isprint() is good, but why you need LC_ALL?
Comment 3 Andrey A. Chernov freebsd_committer 2016-08-05 06:54:20 UTC
From second thought, this patch does not help at all.
Looking at the code I realize that cat should use wide chars everywhere, getwc(), isw*() etc (with fallback to getc() on error). Current code suppose single byte locale and will works incorrectly for any multibyte locale.
Comment 4 Andrey A. Chernov freebsd_committer 2016-08-05 14:05:02 UTC
Created attachment 173315 [details]
use wide chars with fallback to plain ones

More testing required, I test a little.
Comment 5 Andrey A. Chernov freebsd_committer 2016-08-05 14:11:58 UTC
Created attachment 173316 [details]
use wide chars with fallback to plain ones

Move err label higher to not miss clearerr(fp)
Comment 6 Andrey A. Chernov freebsd_committer 2016-08-05 14:41:24 UTC
Created attachment 173318 [details]
use wide chars with fallback to plain ones

Use traditional M-^X form for non-ascii controls
Comment 7 Andrey A. Chernov freebsd_committer 2016-08-05 14:49:38 UTC
BTW, this is wrong:
foreach i ( `jot 178 178 356` )
echo a | tr 'a' "\$i" | cat -v

tr suppose to take \octal numbers only, but they are decimal.
Comment 8 Andrey A. Chernov freebsd_committer 2016-08-05 15:02:28 UTC
It must be 
foreach i ( `( echo obase=8 ; jot 178 178 255 1) | bc` )
Comment 9 Sevan Janiyan 2016-08-15 09:50:21 UTC
(In reply to Andrey Chernov from comment #2)
unless you want to be specific in the LC_ variables to inherit, it's better moving forward that you opt for LC_ALL, at present (tested on FreeBSD), only having LC_CTYPE means that localisation is broken

Eg
Stock cat in base:
% cat ouauouau
cat: ouauouau: No such file or directory

Using LC_ALL instead.
% /usr/obj/usr/src/bin/cat/cat thotuhou
cat: thotuhou: Datei oder Verzeichnis nicht gefunden

I could ammend the patch to use the addition of the specific variable but it'd create less churn in the future to switch or add new additions to setlocale().
Comment 10 Sevan Janiyan 2016-08-15 10:14:48 UTC
(In reply to Andrey Chernov from comment #6)

Will give the new patch a spin & provide feedback


(In reply to Andrey Chernov from comment #8)

thanks, I'll use that in my testing
Comment 11 Andrey A. Chernov freebsd_committer 2016-09-13 08:20:13 UTC
Created attachment 174708 [details]
use wide chars with fallback to plain ones

Better version
Comment 12 Andrey A. Chernov freebsd_committer 2016-09-13 08:30:57 UTC
Since initial test was wrong (see my fixed version at
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210995#c8 )
my patch does not solve this problem, because the problem does not exist.
It solves completely different problem with multibyte encodings, and I plan to commit it soon.
About LC_ALL, please fill separate PR.