Bug 233567 - sysutils/screen: screen crashes after a while with vertical regions on stable/12 and head
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Cy Schubert
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-27 15:45 UTC by Trond Endrestøl
Modified: 2021-02-19 22:15 UTC (History)
CC List: 1 user

See Also:
bugzilla: maintainer-feedback? (cy)


Attachments
List of installed ports (57.59 KB, text/plain)
2018-11-28 07:15 UTC, Trond Endrestøl
.screenrc (1.48 KB, text/plain)
2018-11-28 07:17 UTC, Trond Endrestøl
This patch works. (3.04 KB, patch)
2021-02-15 22:59 UTC, Cy Schubert

Description Trond Endrestøl 2018-11-27 15:45:43 UTC
Some time after 12.0-CURRENT made its debut, sysutils/screen started crashing whenever I use vertical regions (C-a |). It doesn't crash immediately. 11.x and older don't have this problem. sysutils/screen isn't compiled differently in any of the actively maintained branches. It doesn't matter whether I use plain xterm, sterm (st), Alacritty, or PuTTY from Windows. Running sysutils/screen without any regions is pretty much safe. Maybe sysutils/tmux is the way forward. I know this isn't much to go on, but at least a PR has been created.
Comment 1 Cy Schubert freebsd_committer freebsd_triage 2018-11-27 20:24:06 UTC
I'm not able to reproduce this on 13-CURRENT. How long does it take before it crashes?

Do you have a core dump you can share?

What other packages or ports do you have installed?

What options did you use when building screen? Or did you use a package?

Can you provide uname -a output please?
Comment 2 Trond Endrestøl 2018-11-28 07:15:19 UTC
(In reply to Cy Schubert from comment #1)
It may take an hour or so. Running synth along with htop and gstat should suffice. I believe you can substitute make buildworld for synth.

No core dumps were created.

See attached list for installed ports.

screen is configured like this:

Options        :
        INFO           : on
        MAN            : on
        NAMED_PIPES    : off
        NCURSES_BASE   : off
        NCURSES_DEFAULT: on
        NCURSES_PORT   : off
        NETHACK        : on
        SHOWENC        : off
        SOCKETS        : on
        SYSTEM_SCREENRC: on
        XTERM_256      : on

uname -aKU; freebsd-version -ku
FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r340833: Fri Nov 23 16:30:43 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
12.0-PRERELEASE
12.0-PRERELEASE

I usually run my screen with window 1 (gstat) at the top, with window 2 (synth) and 3 (htop) side by side beneath window 1, i.e.:

C-a 1
C-a s
C-a :resize 7
C-a TAB
C-a |
C-a 2
C-a :resize 80
C-a TAB
C-a 3
C-a TAB
C-a TAB

My .screenrc is also attached.
Comment 3 Trond Endrestøl 2018-11-28 07:15:55 UTC
Created attachment 199630
List of installed ports
Comment 4 Trond Endrestøl 2018-11-28 07:17:10 UTC
Created attachment 199631
.screenrc
Comment 5 Trond Endrestøl 2018-11-28 07:24:18 UTC
(In reply to Trond.Endrestol from comment #4)
Never mind the titles I've given to the numerous windows; the file once belonged to a different system.
Comment 6 Trond Endrestøl 2018-11-28 07:26:29 UTC
(In reply to Trond.Endrestol from comment #2)
That should be C-a S, not C-a s.
Comment 7 Trond Endrestøl 2018-11-28 09:04:18 UTC
(In reply to Cy Schubert from comment #1)
I managed to reproduce the crash, but sadly no core file was produced this time either, despite the promised core dump. Is there a way I can ensure the creation of a core dump? "ulimit -c" is set to unlimited.

The crash happened as I was switching between windows in the bottom left region (C-a n, C-a p).
Screenshot is available at https://ximalas.info/~trond/screen-2018-11-28/
The system ran stable/12 r341120 when this crash happened.

The last time this happened, screen had been left untouched over night with synth busy compiling my ports.

I failed to reproduce this crash using the same layout on a VM running head r340929 while running make buildworld buildkernel. I used PuTTY from Windows and ran PuTTY in full screen mode. I didn't fool around switching between the windows like I just did on my laptop, and maybe that's a clue.
Comment 8 Cy Schubert freebsd_committer freebsd_triage 2018-11-28 15:24:56 UTC
The reason there is no dump is that screen is setuid root. Without a dump I will not be able to fix this. So, please turn off the setuid bit (chmod -s /usr/local/bin/screen), then use it until it crashes. You won't need it to be setuid root unless you use it for screen sharing with other users.

Then post the dump.

Can you give me uname -a too?
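
The two preconditions discussed in this comment (a non-zero core size limit and a binary without the setuid bit) can also be checked programmatically. The following is a minimal sketch in C that relies only on POSIX stat(2) and getrlimit(2). The function names are made up for illustration and are not part of screen; the path handed to file_is_sugid() is whichever binary you want to inspect, for example /usr/local/bin/screen as named above. On FreeBSD, the kern.sugid_coredump sysctl additionally controls whether setuid processes may dump core at all.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/resource.h>

/* Returns 1 if the mode bits include setuid or setgid, otherwise 0. */
int mode_is_sugid(mode_t mode)
{
    return (mode & (S_ISUID | S_ISGID)) != 0;
}

/* Returns 1 if the file is setuid/setgid, 0 if it is not,
 * and -1 if stat(2) fails (for example, the path does not exist). */
int file_is_sugid(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return mode_is_sugid(st.st_mode);
}

/* Returns 1 if the soft core-size limit is non-zero, 0 if it is zero,
 * and -1 if getrlimit(2) fails. */
int core_limit_allows_dump(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/* Hypothetical self-check exercising the helpers above. */
void run_checks(void)
{
    assert(mode_is_sugid(S_ISUID | 0755) == 1);  /* e.g. rwsr-xr-x */
    assert(mode_is_sugid(0755) == 0);            /* e.g. rwxr-xr-x */
    assert(file_is_sugid("/nonexistent/path/for/illustration") == -1);
}
```

A caller would expect file_is_sugid() to return 0 and core_limit_allows_dump() to return 1 before waiting for the next crash; if either differs, a core file is unlikely to appear.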
Comment 9 Trond Endrestøl 2018-11-28 16:02:51 UTC
(In reply to Cy Schubert from comment #8)
I created a new screen, detached from it, reattached, and attached gdb to the forked off screen process earlier this morning. I'd like for synth to finish the current run, and then I'll muck about.

uname -a:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r340833: Fri Nov 23 16:30:43 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
Comment 10 Trond Endrestøl 2018-11-28 17:50:28 UTC
(In reply to Trond.Endrestol from comment #9)
Sorry, it's now running:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r341120: Wed Nov 28 08:55:40 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
Comment 11 Trond Endrestøl 2018-11-29 18:23:59 UTC
(In reply to Trond.Endrestol from comment #10)
I'm unable to reproduce the crash despite several attempts today.
screen is running without the setuid bit, so I should be able to get a core dump eventually, I hope.
I'll keep trying to reproduce the crash throughout the weekend while checking out if OpenSSL from base in stable/12 is useable by the ports I have installed.
Comment 12 Trond Endrestøl 2018-12-01 06:54:53 UTC
(In reply to Trond.Endrestol from comment #11)
We now have a core file to work with:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-01T04:15+0100.core

$ gdb /usr/local/bin/screen screen-2018-12-01T04\:15+0100.core 
GNU gdb (GDB) 8.2 [GDB v8.2 for FreeBSD]
[...]
Reading symbols from /usr/local/bin/screen...done.
[New LWP 100551]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump ()
#4  <signal handler called>
#5  0x000000000025f64f in LayPause ()
#6  0x00000000002328d6 in win_readev_fn ()
#7  0x0000000000261a67 in sched ()
#8  0x000000000021733e in main ()
(gdb) 

uname -aKU was at the time:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r341345: Fri Nov 30 20:40:30 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500

A new screenshot is available from https://ximalas.info/~trond/screen-2018-11-28/
https://ximalas.info/~trond/screen-2018-11-28/screenshot-2018-12-01-01.png
Comment 13 Trond Endrestøl 2018-12-01 12:32:00 UTC
(In reply to Trond.Endrestol from comment #12)
Another core file has emerged:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-01T12:49+0100.core

Its backtrace goes like this:

[New LWP 100875]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump ()
#4  <signal handler called>
#5  0x000000000025f64f in LayPause ()
#6  0x00000000002328d6 in win_readev_fn ()
#7  0x0000000000261a67 in sched ()
#8  0x000000000021733e in main ()
(gdb) 

It crashes in the same location as before.

I'll see if I can create a screen executable containing debug info once the current synth batch is complete.
Comment 14 Trond Endrestøl 2018-12-03 05:36:30 UTC
(In reply to Trond.Endrestol from comment #13)
I compiled screen with debug symbols and a new core file is now available:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-02T23:27+0100.core

This time the backtrace goes like:

[New LWP 100768]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3

warning: Source file is more recent than executable.
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump (sigsig=<optimized out>) at screen.c:1660
#4  <signal handler called>
#5  0x000000000025f65f in LayPause (layer=0x800e4b018, pause=<optimized out>) at layer.c:1160
#6  0x00000000002328d6 in win_readev_fn (ev=<optimized out>, data=0x800e4b000 "") at window.c:1959
#7  0x0000000000261a77 in sched () at sched.c:237
#8  0x000000000021733e in main (ac=0, av=0x7fffffffe4d0) at screen.c:1466

Moving up the stack frames, we get to LayPause():

(gdb) up
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
52              return (__sys_thr_kill(id, s));
(gdb) 
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
67              (void)raise(SIGABRT);
(gdb) 
#3  0x0000000000217d83 in CoreDump (sigsig=<optimized out>) at screen.c:1660
1660      for (disp = displays; disp; disp = disp->d_next) {
(gdb) 
#4  <signal handler called>
(gdb) 
#5  0x000000000025f65f in LayPause (layer=0x800e4b018, pause=<optimized out>) at layer.c:1160
1160                if (dw_left(ml, xe, UTF8))

This gives us some more context. It's related to the handling of UTF-8, which might explain why not so many are hit by this bug:

(gdb) list
1155              if (xe > vp->v_xe) xe = vp->v_xe;
1156    
1157    #if defined(DW_CHARS) && defined(UTF8)
1158              if (layer->l_encoding == UTF8 && xe < vp->v_xe && win) {
1159                struct mline *ml = win->w_mlines + line;
1160                if (dw_left(ml, xe, UTF8))
1161                  xe++;
1162              }
1163    #endif
1164    

I need to disable optimizations and recompile to get accurate values in the next core dump:

(gdb) info locals
ml = <optimized out>
xs = 80
xe = 232
vp = 0x800780150
win = <optimized out>
cv = 0x8010280c0
line = 45
(gdb) print *layer
$3 = {l_cvlist = 0x8010280c0, l_width = 192, l_height = 63, l_x = 61, l_y = 54, l_encoding = 8, l_layfn = 0x269f98 <WinLf>, l_data = 0x800e4b000, l_next = 0x0, l_bottom = 0x800e4b018, l_blocking = 0, l_mode = 0, l_mouseevent = {buffer = "\000\000", len = 0, start = 0}, l_pause = {d = 0, left = 0x8007ea600, 
    right = 0x8007ea780, top = 45, bottom = 54, lines = 94}}
(gdb) print *vp
$4 = {v_next = 0x0, v_canvas = 0x8010280c0, v_xoff = 80, v_yoff = 7, v_xs = 80, v_xe = 271, v_ys = 7, v_ye = 69}
(gdb)
Comment 15 Trond Endrestøl 2018-12-09 09:19:51 UTC
(In reply to Trond.Endrestol from comment #14)
I have now disabled optimizations for screen, and this is what it looks like after the most recent crash:

gdb /usr/local/bin/screen screen-2018-12-09T09\:59+0100.core 
GNU gdb (GDB) 8.2 [GDB v8.2 for FreeBSD]
[...]
Reading symbols from /usr/local/bin/screen...done.
[New LWP 100734]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x000000080053f904 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x00000008004b20e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217f40 in CoreDump (sigsig=11) at screen.c:1678
#4  <signal handler called>
#5  0x0000000000282189 in LayPause (layer=0x800e76018, pause=0) at layer.c:1160
#6  0x000000000024247f in win_readev_fn (ev=0x800e760c8, data=0x800e76000 "") at window.c:1959
#7  0x0000000000286315 in sched () at sched.c:237
#8  0x0000000000217758 in main (ac=0, av=0x7fffffffe4b8) at screen.c:1466
(gdb) up
#1  0x000000080053f904 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
52              return (__sys_thr_kill(id, s));
(gdb) 
#2  0x00000008004b20e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
67              (void)raise(SIGABRT);
(gdb) 
#3  0x0000000000217f40 in CoreDump (sigsig=11) at screen.c:1678
1678        abort();
(gdb) 
#4  <signal handler called>
(gdb) 
#5  0x0000000000282189 in LayPause (layer=0x800e76018, pause=0) at layer.c:1160
1160                if (dw_left(ml, xe, UTF8))
(gdb) list
1155              if (xe > vp->v_xe) xe = vp->v_xe;
1156    
1157    #if defined(DW_CHARS) && defined(UTF8)
1158              if (layer->l_encoding == UTF8 && xe < vp->v_xe && win) {
1159                struct mline *ml = win->w_mlines + line;
1160                if (dw_left(ml, xe, UTF8))
1161                  xe++;
1162              }
1163    #endif
1164    
(gdb) info locals
ml = 0x800e73b70
xs = 80
xe = 250
vp = 0x8007a7150
cv = 0x801055280
line = 61
win = 0x800e76000
(gdb) print *ml
$1 = {image = 0x80132f9e0 "36382     1 36382 36382 36382 trond     1000  20   0 12208  3024 S   0  0.0  0.0  0:00.00    \034", attr = 0x800e44000 "", font = 0x8013bbf20 "", fontx = 0x800e0d3c0 "", color = 0x8013bbe40 "", colorx = 0x800e0d3c0 ""}
(gdb) print *layer
$2 = {l_cvlist = 0x801055280, l_width = 192, l_height = 63, l_x = 0, l_y = 0, l_encoding = 8, l_layfn = 0x291068 <WinLf>, l_data = 0x800e76000, l_next = 0x0, l_bottom = 0x800e76018, l_blocking = 0, l_mode = 0, l_mouseevent = {buffer = "\000\000", len = 0, start = 0}, l_pause = {d = 0, left = 0x800e15300, 
    right = 0x800e15480, top = 0, bottom = 61, lines = 94}}
(gdb) 
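
One possible reading of these values, offered as a hypothesis rather than a diagnosis: xe is 250 while the layer is only 192 columns wide (l_width = 192), and the earlier dump in comment 14 showed the viewport starting at v_xoff = 80. If dw_left() indexes the per-line buffers by xe (the macro is not shown in this thread, so this is an assumption), a display column that still includes the viewport offset would point past the end of the line. The sketch below uses made-up helper names, not screen's own code, to show the kind of coordinate translation and bounds check that would catch this.

```c
#include <assert.h>

/* Hypothetical helper: convert a display column into a window-relative
 * column by subtracting the viewport's horizontal offset. */
int display_to_window_col(int display_x, int viewport_xoff)
{
    return display_x - viewport_xoff;
}

/* Hypothetical helper: returns 1 if column x is a valid index into a
 * line buffer that holds `width` cells, otherwise 0. */
int col_in_range(int width, int x)
{
    return x >= 0 && x < width;
}

/* Self-check using the numbers printed by gdb in comments 14 and 15. */
void run_checks(void)
{
    /* Comment 15: xe = 250, l_width = 192. The raw value is out of range. */
    assert(col_in_range(192, 250) == 0);

    /* With the offset of 80 removed, the column falls inside the line. */
    assert(display_to_window_col(250, 80) == 170);
    assert(col_in_range(192, display_to_window_col(250, 80)) == 1);

    /* Comment 14: xe = 232, v_xoff = 80, l_width = 192. */
    assert(col_in_range(192, 232) == 0);
    assert(col_in_range(192, display_to_window_col(232, 80)) == 1);
}
```

This would also fit the observation that the crash only shows up with vertical regions, since a layout without a vertical split has no non-zero horizontal offset. Whether subtracting the offset is the right fix inside LayPause() cannot be concluded from the dump alone.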

Core file is: https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-09T09:59+0100.core
A screenshot showing screen after it crashed: https://ximalas.info/~trond/screen-2018-11-28/screenshot-2018-12-09-01.png
Comment 16 Trond Endrestøl 2021-02-15 18:02:16 UTC
I just noticed r565281, and it may be related to this issue.

Unfortunately, the patches for ansi.c and encoding.c leave the terminal rather useless, with every character being rendered as ÿ. I have recompiled my screen-4.8.0_1 without patch-ansi.c and patch-encoding.c to regain usability. This is on stable/12 amd64 r369150.
Comment 17 Christos Chatzaras 2021-02-15 18:22:25 UTC
(In reply to Trond.Endrestol from comment #16)

I see the same here:

https://forums.freebsd.org/threads/sysutils-screen-and-strange-characters.78891/

It started after I upgraded to 4.8.0_1.

It happens with "LANG=en_US.UTF-8" but not with "LANG=C".

Should I open a new bug report about this?
Comment 18 Trond Endrestøl 2021-02-15 20:18:25 UTC
(In reply to Christos Chatzaras from comment #17)
Your guess is as good as mine. However, this PR is UTF-8 related, so maybe we can continue here for the time being.
Comment 19 Trond Endrestøl 2021-02-15 20:20:26 UTC
I noticed a typo on line 16 of patch-encoding.c.
The original line reads:

if (c >- 0xdf00 && c <= 0xdfff)

This is probably the intended code:

if (c >= 0xdf00 && c <= 0xdfff)
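
The effect of the typo can be shown in isolation. C tokenizes ">-" as the ">" operator followed by a unary minus, so the original line compiles without error and compares c against -0xdf00 instead. The sketch below assumes c is a signed int (its declaration is not quoted in this thread) and uses made-up function names; it is not the code from patch-encoding.c.

```c
#include <assert.h>

/* The mistyped test: parsed as (c > -0xdf00) && (c <= 0xdfff). */
int in_range_typo(int c)
{
    return c >- 0xdf00 && c <= 0xdfff;
}

/* The intended test: true only for 0xdf00 through 0xdfff. */
int in_range_fixed(int c)
{
    return c >= 0xdf00 && c <= 0xdfff;
}

/* Self-check comparing both versions on a few sample values. */
void run_checks(void)
{
    /* Plain ASCII 'A' (0x41) wrongly matches the mistyped test. */
    assert(in_range_typo(0x41) == 1);
    assert(in_range_fixed(0x41) == 0);

    /* Values inside the intended range match both versions. */
    assert(in_range_typo(0xdf00) == 1);
    assert(in_range_fixed(0xdf00) == 1);

    /* Values above the range are rejected by both versions. */
    assert(in_range_typo(0xe000) == 0);
    assert(in_range_fixed(0xe000) == 0);
}
```

With a signed c, every non-negative value up to 0xdfff passes the mistyped test, plain ASCII included, which would be consistent with the garbled output reported in comment 16.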
Comment 20 Cy Schubert freebsd_committer freebsd_triage 2021-02-15 21:03:57 UTC
git apply and patch didn't work. I cut&pasted most lines. I copied one or two by hand.

I'll look at it again tonight.
Comment 21 Cy Schubert freebsd_committer freebsd_triage 2021-02-15 22:59:39 UTC
Created attachment 222478
This patch works.

Fixing the typo resolves the regression.
Comment 22 Trond Endrestøl 2021-02-16 10:34:50 UTC
(In reply to Cy Schubert from comment #21)
ports-mgmt/dialog4ports and Midnight Commander (misc/mc) have problems drawing a non-black background while inside screen. dialog4ports doesn't draw the blue background when run inside screen. The default theme in MidC is missing the background color where there is whitespace. MidC themes like yadt256 use a black background and are "immune" to this issue. Does anyone else notice this behaviour?
Comment 23 Trond Endrestøl 2021-02-16 13:12:51 UTC
(In reply to Trond.Endrestol from comment #22)
Mystery solved. I enforced xterm-256color when I should just leave it as screen-256color.
Comment 24 Trond Endrestøl 2021-02-18 21:08:54 UTC
(In reply to Trond.Endrestol from comment #7)
I've been running screen 4.8.0_3 as described in comments 2, 6, and 7, on my laptop (14.0-CURRENT), and on the builder at work (12.2-STABLE), for the past few days. No crashes so far.
Comment 25 Trond Endrestøl 2021-02-19 22:15:27 UTC
(In reply to Trond.Endrestol from comment #24)
Well, I spoke too soon. I had screen crash twice at work today, although my laptop at home has yet to do the same. This leaves me with the impression that there still exist other UTF-8-related issues within screen. Luckily, FreeBSD 13 & 14 have switched to UTF-8 as the default character set (for root anyway). Hopefully everyone follows suit and gives UTF-8 its proper shakedown.