Bug 229682 - x11/xterm (possible terminal driver problem): UTF-8 characters cause the misalignment of tabstops
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Emanuel Haupt
Reported: 2018-07-11 02:32 UTC by kd-dev
Modified: 2018-09-20 09:29 UTC
CC List: 3 users

bugzilla: maintainer-feedback? (ehaupt)


Attachments
patch setting oflags to TAB0 by default (605 bytes, patch)
2018-09-05 07:06 UTC, Emanuel Haupt
Screenshot (16.89 KB, image/png)
2018-09-05 07:08 UTC, Emanuel Haupt

Description kd-dev 2018-07-11 02:32:30 UTC
In the default configuration of x11/xterm, multi-byte UTF-8 characters
cause any tab that follows them on the same line to be misaligned by
the number of extra bytes in their encoding:

Observed behavior:
	$ printf 'a\tb\n\302\254\tb\n'
	a       b
	¬      b

Expected behavior:
	$ printf 'a\tb\n\302\254\tb\n'
	a       b
	¬       b

Calling `stty tabs' or `stty tab0' beforehand results in the expected
behaviour.
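
For reference, the octal escapes `\302\254' in the commands above are
the two-byte UTF-8 encoding of U+00AC (NOT SIGN). A minimal Python
sketch, not part of the original report, that compares the byte length
with the character count:

```python
# The octal escapes from the printf command: \302\254 == 0xC2 0xAC.
encoded = b"\302\254"
decoded = encoded.decode("utf-8")

byte_length = len(encoded)   # bytes written to the terminal
char_count = len(decoded)    # characters the user sees

print(f"U+{ord(decoded):04X}: {byte_length} bytes, {char_count} character")
```

The one-column shortfall in the observed output matches the difference
between these two numbers.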
Comment 1 Emanuel Haupt freebsd_committer 2018-07-11 06:36:43 UTC
Confirmed. It is not isolated to the latest release: I downgraded to xterm 332 and observed the same behavior.

I've cc'ed Thomas from upstream.
Comment 2 Thomas E. Dickey 2018-07-11 08:13:12 UTC
I asked yesterday if the stty hardware tabstops were used or not, and Drake replied that the misbehavior was seen using the software (terminal-driver) tabs.

I don't see anything in my #333 changes which hints at this area.
Comment 3 Emanuel Haupt freebsd_committer 2018-07-11 08:35:19 UTC
(In reply to dickey from comment #2)
It happens in 332 as well. It was not introduced in changes between 332 and 333.
Comment 4 Thomas E. Dickey 2018-07-11 09:29:40 UTC
I'll have to investigate to pinpoint the problem.  The example
appears to show that the terminal driver is assuming the character
0xac is using more than one cell.  That would be the same effect
as if the print statement were modified to something like this:

printf 'a\tb\n\302\254\tb\n\302\254       b\n'

that is, 7 spaces between the two characters.  If xterm's showing
that incorrectly, that's a problem with xterm (or the locale settings
which it uses for width computation).

Drake said this happened in 12-Current (which I've not installed).
Comment 5 kd-dev 2018-07-11 12:41:30 UTC
ktrace(1) shows that x11/xterm does not receive any tab characters:

 75446 xterm    GIO   fd 4 read 24 bytes
       0x0000 6120 2020 2020 2020 620d 0ac2 ac20 2020  |a       b....   |
       0x0010 2020 2062 0d0a 2420                      |   b..$ |

Other terminals (vt(4) & x11/sterm) do receive tab characters, and
x11/xterm does when `stty tabs' is set.  I have tried changing the
`c_oflags' setting in the x11/xterm source and changing various
tab-related settings in the `termcap' database but the default
behavior remained the same.
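
The dump can be checked mechanically. The sketch below is an
illustration only: it copies the 24 bytes from the ktrace(1) output
above, and the helper name `spaces_per_line` is made up for this
example. It confirms that no tab byte (0x09) is present and counts the
spaces on each output line.

```python
# Bytes copied from the ktrace(1) dump above (24 bytes read on fd 4).
DUMP = bytes.fromhex(
    "6120 2020 2020 2020 620d 0ac2 ac20 2020 "
    "2020 2062 0d0a 2420"
)

def spaces_per_line(data: bytes) -> list[int]:
    """Count the spaces on each CR LF terminated line that ends in 'b'."""
    lines = data.split(b"\r\n")
    return [line.count(b" ") for line in lines if line.endswith(b"b")]

has_tab = b"\t" in DUMP
counts = spaces_per_line(DUMP)
print(has_tab, counts)
```

The first line carries 7 spaces after `a' and the second carries 6
after the two-byte character, so the padding was already computed
before the data reached xterm.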
Comment 6 Emanuel Haupt freebsd_committer 2018-07-11 13:16:28 UTC
(In reply to dickey from comment #4)
It also happens in 11.2-RELEASE.
Comment 7 Thomas E. Dickey 2018-07-12 00:27:55 UTC
Drake's hex dump in comment #5 shows 6 spaces after the character.

That has to be coming from the terminal driver,
which apparently is confused about the width of the
character.  If it were sending 7 spaces, as I commented
before, and if xterm miscounted, that would be a bug in xterm.

Offhand, the only "recent" changes that I recall making in xterm
deal with the wrapping behavior, and the treatment of ambiguous-width
characters.  This doesn't fall into that category (but there's always
the potential for bugs).

The default setting for tabs has long been to use the terminal driver,
since it's safer than assuming that some terminal handles hardware tabs,
and that the tab-stops have been initialized properly.  (A quick check
of (u)rxvt, Terminal.app, iTerm2 shows the same default, for instance).

I can add something to the ttyMode resource to simplify initialization,
but that's a feature, not a bug-fix.
Comment 8 Emanuel Haupt freebsd_committer 2018-07-12 11:52:58 UTC
(In reply to dickey from comment #7)
Thank you for your analysis. I've added ed@ to the CC list. Maybe he can shed some light on the terminal-driver side of the issue.
Comment 9 Ed Schouten freebsd_committer 2018-09-04 07:07:24 UTC
This can be attributed to the fact that FreeBSD's TTY layer does not implement termios IUTF8, right?
Comment 10 Thomas E. Dickey 2018-09-04 07:52:12 UTC
iutf8 applies to input characters.

The example isn't explicit, but I read it as output behavior.
Drake may clarify.
Comment 11 kd-dev 2018-09-05 00:49:53 UTC
(In reply to Ed Schouten from comment #9)
This happens because the configuration of x11/xterm in ports relies
on FreeBSD's TTY layer to expand tabs. The TTY layer expands tabs
based on a column number that it advances by the byte count, so it
miscounts multi-byte characters.

To further clarify: this is about TTY output, and this would not be an
issue if FreeBSD's TTY layer supported multi-byte characters or if the
port was patched to disable `TAB3' when applicable.
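
The mechanism described in this comment can be sketched as a small
model. This is an illustration in Python, not the FreeBSD TTY code:
the function `expand_tabs` and its `count_bytes` flag are hypothetical,
tab stops are assumed to fall every 8 columns, and every character is
assumed to occupy exactly one cell.

```python
def expand_tabs(data: bytes, count_bytes: bool) -> bytes:
    """Expand tabs to spaces while tracking an output column.

    count_bytes=True advances the column once per byte (the behavior
    described above); count_bytes=False skips UTF-8 continuation bytes
    so that each character advances the column once.
    """
    out = bytearray()
    column = 0
    for byte in data:
        if byte == 0x09:                      # tab: pad to next stop
            pad = 8 - column % 8
            out += b" " * pad
            column += pad
        elif byte in (0x0A, 0x0D):            # newline or carriage return
            out.append(byte)
            column = 0
        else:
            out.append(byte)
            # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
            if count_bytes or (byte & 0xC0) != 0x80:
                column += 1
    return bytes(out)

sample = b"a\tb\n\302\254\tb\n"
print(expand_tabs(sample, count_bytes=True))
print(expand_tabs(sample, count_bytes=False))
```

With byte counting the second line gets 6 spaces, as in the report;
with character counting both lines get 7.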
Comment 12 Emanuel Haupt freebsd_committer 2018-09-05 07:06:36 UTC
Created attachment 196879
patch setting oflags to TAB0 by default

Screenshot showing stty behaviour
Comment 13 Emanuel Haupt freebsd_committer 2018-09-05 07:08:12 UTC
Created attachment 196880
Screenshot

Thank you for clarifying. I've seen that other terminals (gnome-terminal, xfce4-terminal) already use tab0 by default.

The attached quick patch seems to do the trick (see screenshot). Thomas, would it be acceptable to add an #ifdef __FreeBSD__ block in main.c that sets d_tio.c_oflag to TAB0?
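
What `stty tab0' and a TAB0 default have in common is clearing the
tab-delay bits of c_oflag. A hedged Python sketch of that bit
manipulation, using the standard termios module, follows. It is not the
attached patch: the helper names `with_tab0` and `apply_tab0` are
hypothetical, and it assumes the platform's termios module exposes
TABDLY and TAB0.

```python
import termios

def with_tab0(oflag: int) -> int:
    """Return oflag with the tab-delay field set to TAB0 (no expansion)."""
    return (oflag & ~termios.TABDLY) | termios.TAB0

def apply_tab0(fd: int) -> None:
    """Apply TAB0 to the terminal open on fd, similar in effect to `stty tab0`."""
    attrs = termios.tcgetattr(fd)
    attrs[1] = with_tab0(attrs[1])  # index 1 of the list is c_oflag
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
```

Calling `apply_tab0(0)` from a program running inside the terminal
would change the setting for that terminal only; other output flags
such as OPOST are left untouched.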
Comment 14 Thomas E. Dickey 2018-09-05 09:29:21 UTC
A patch to the initialization seems like clutter.

If I added a tabs keyword to the ttyModes resource parsing,
then that could be patched in the system's app-defaults file.
Comment 15 Emanuel Haupt freebsd_committer 2018-09-05 14:52:56 UTC
Sounds good to me.
Comment 16 commit-hook freebsd_committer 2018-09-20 09:19:32 UTC
A commit references this bug:

Author: ehaupt
Date: Thu Sep 20 09:19:07 UTC 2018
New revision: 480159
URL: https://svnweb.freebsd.org/changeset/ports/480159

Log:
  - Update to 336
  - Pacify portlint
  - Since FreeBSD terminal drivers are not aware of UTF-8, change default ttyModes
    to tabs (tab0).

  PR:		229682 (based on)

Changes:
  head/x11/xterm/Makefile
  head/x11/xterm/distinfo
  head/x11/xterm/files/
  head/x11/xterm/files/patch-UXTerm.ad