Bug 233567 - sysutils/screen: screen crashes after a while with vertical regions on stable/12 and head
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Cy Schubert
Reported: 2018-11-27 15:45 UTC by Trond.Endrestol
Modified: 2018-12-09 09:19 UTC
bugzilla: maintainer-feedback? (cy)


Attachments
List of installed ports (57.59 KB, text/plain)
2018-11-28 07:15 UTC, Trond.Endrestol
.screenrc (1.48 KB, text/plain)
2018-11-28 07:17 UTC, Trond.Endrestol

Description Trond.Endrestol 2018-11-27 15:45:43 UTC
Some time after 12.0-CURRENT made its debut, sysutils/screen started crashing whenever I use vertical regions (C-a |). It doesn't crash immediately. 11.x and older don't have this problem. sysutils/screen isn't compiled differently in any of the actively maintained branches. It doesn't matter whether I use plain xterm, sterm (st), Alacritty, or PuTTY from Windows. Running sysutils/screen without any regions is pretty much safe. Maybe sysutils/tmux is the way forward. I know this isn't much to go on, but at least a PR has been created.
Comment 1 Cy Schubert freebsd_committer 2018-11-27 20:24:06 UTC
I'm not able to reproduce this on 13-CURRENT. How long does it take before it crashes?

Do you have a core dump you can share?

What other packages or ports do you have installed?

What options did you use when building screen? Or did you use a package?

Can you provide uname -a output please?
Comment 2 Trond.Endrestol 2018-11-28 07:15:19 UTC
(In reply to Cy Schubert from comment #1)
It may take an hour or so. Running synth along with htop and gstat should suffice. I believe you can substitute make buildworld for synth.

No core dumps were created.

See attached list for installed ports.

screen is configured like this:

Options        :
        INFO           : on
        MAN            : on
        NAMED_PIPES    : off
        NCURSES_BASE   : off
        NCURSES_DEFAULT: on
        NCURSES_PORT   : off
        NETHACK        : on
        SHOWENC        : off
        SOCKETS        : on
        SYSTEM_SCREENRC: on
        XTERM_256      : on

uname -aKU; freebsd-version -ku
FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r340833: Fri Nov 23 16:30:43 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
12.0-PRERELEASE
12.0-PRERELEASE

I usually run my screen with window 1 (gstat) at the top, with window 2 (synth) and 3 (htop) side by side beneath window 1, i.e.:

C-a 1
C-a s
C-a :resize 7
C-a TAB
C-a |
C-a 2
C-a :resize 80
C-a TAB
C-a 3
C-a TAB
C-a TAB

My .screenrc is also attached.
Comment 3 Trond.Endrestol 2018-11-28 07:15:55 UTC
Created attachment 199630
List of installed ports
Comment 4 Trond.Endrestol 2018-11-28 07:17:10 UTC
Created attachment 199631
.screenrc
Comment 5 Trond.Endrestol 2018-11-28 07:24:18 UTC
(In reply to Trond.Endrestol from comment #4)
Never mind the titles I've given to the numerous windows; the file once belonged to a different system.
Comment 6 Trond.Endrestol 2018-11-28 07:26:29 UTC
(In reply to Trond.Endrestol from comment #2)
That should be C-a S, not C-a s.
Comment 7 Trond.Endrestol 2018-11-28 09:04:18 UTC
(In reply to Cy Schubert from comment #1)
I managed to reproduce the crash, but sadly no core file was produced this time either, despite the promised core dump. Is there a way I can ensure the creation of a core dump? "ulimit -c" is set to unlimited.

The crash happened as I was switching between windows in the bottom left region (C-a n, C-a p).
Screenshot is available at https://ximalas.info/~trond/screen-2018-11-28/
The system ran stable/12 r341120 when this crash happened.

The last time this happened, screen had been left untouched overnight with synth busy compiling my ports.

I failed to reproduce this crash using the same layout on a VM running head r340929 while running make buildworld buildkernel. I used PuTTY from Windows and ran PuTTY in full screen mode. I didn't fool around switching between the windows like I just did on my laptop, and maybe that's a clue.
Comment 8 Cy Schubert freebsd_committer 2018-11-28 15:24:56 UTC
The reason there is no dump is that screen is setuid root. Without a dump I will not be able to fix this. So, please turn off the setuid bit (chmod -s /usr/local/bin/screen), then use it until it crashes. You won't need it to be setuid root unless you use it for screen sharing with other users.

Then post the dump.

Can you give me uname -a too?
Comment 9 Trond.Endrestol 2018-11-28 16:02:51 UTC
(In reply to Cy Schubert from comment #8)
I created a new screen, detached from it, reattached, and attached gdb to the forked off screen process earlier this morning. I'd like for synth to finish the current run, and then I'll muck about.

uname -a:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r340833: Fri Nov 23 16:30:43 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
Comment 10 Trond.Endrestol 2018-11-28 17:50:28 UTC
(In reply to Trond.Endrestol from comment #9)
Sorry, it's now running:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r341120: Wed Nov 28 08:55:40 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500
Comment 11 Trond.Endrestol 2018-11-29 18:23:59 UTC
(In reply to Trond.Endrestol from comment #10)
I'm unable to reproduce the crash despite several attempts today.
screen is running without the setuid bit, so I should be able to get a core dump eventually, I hope.
I'll keep trying to reproduce the crash throughout the weekend while checking whether OpenSSL from base in stable/12 is usable by the ports I have installed.
Comment 12 Trond.Endrestol 2018-12-01 06:54:53 UTC
(In reply to Trond.Endrestol from comment #11)
We now have a core file to work with:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-01T04:15+0100.core

$ gdb /usr/local/bin/screen screen-2018-12-01T04\:15+0100.core 
GNU gdb (GDB) 8.2 [GDB v8.2 for FreeBSD]
[...]
Reading symbols from /usr/local/bin/screen...done.
[New LWP 100551]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump ()
#4  <signal handler called>
#5  0x000000000025f64f in LayPause ()
#6  0x00000000002328d6 in win_readev_fn ()
#7  0x0000000000261a67 in sched ()
#8  0x000000000021733e in main ()
(gdb) 

uname -aKU was at the time:

FreeBSD FQDN 12.0-PRERELEASE FreeBSD 12.0-PRERELEASE #0 r341345: Fri Nov 30 20:40:30 CET 2018     root@FQDN:/usr/obj/usr/src/amd64.amd64/sys/E5530  amd64 1200500 1200500

A new screenshot is available from https://ximalas.info/~trond/screen-2018-11-28/
https://ximalas.info/~trond/screen-2018-11-28/screenshot-2018-12-01-01.png
Comment 13 Trond.Endrestol 2018-12-01 12:32:00 UTC
(In reply to Trond.Endrestol from comment #12)
Another core file has emerged:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-01T12:49+0100.core

Its backtrace goes like this:

[New LWP 100875]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump ()
#4  <signal handler called>
#5  0x000000000025f64f in LayPause ()
#6  0x00000000002328d6 in win_readev_fn ()
#7  0x0000000000261a67 in sched ()
#8  0x000000000021733e in main ()
(gdb) 

It crashes in the same location as before.

I'll see if I can create a screen executable containing debug info once the current synth batch is complete.
Comment 14 Trond.Endrestol 2018-12-03 05:36:30 UTC
(In reply to Trond.Endrestol from comment #13)
I compiled screen with debug symbols and a new core file is now available:

https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-02T23:27+0100.core

This time the backtrace goes like:

[New LWP 100768]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3

warning: Source file is more recent than executable.
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217d83 in CoreDump (sigsig=<optimized out>) at screen.c:1660
#4  <signal handler called>
#5  0x000000000025f65f in LayPause (layer=0x800e4b018, pause=<optimized out>) at layer.c:1160
#6  0x00000000002328d6 in win_readev_fn (ev=<optimized out>, data=0x800e4b000 "") at window.c:1959
#7  0x0000000000261a77 in sched () at sched.c:237
#8  0x000000000021733e in main (ac=0, av=0x7fffffffe4d0) at screen.c:1466

Moving up the stack frames, we get to LayPause():

(gdb) up
#1  0x00000008005188b4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
52              return (__sys_thr_kill(id, s));
(gdb) 
#2  0x000000080048b0e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
67              (void)raise(SIGABRT);
(gdb) 
#3  0x0000000000217d83 in CoreDump (sigsig=<optimized out>) at screen.c:1660
1660      for (disp = displays; disp; disp = disp->d_next) {
(gdb) 
#4  <signal handler called>
(gdb) 
#5  0x000000000025f65f in LayPause (layer=0x800e4b018, pause=<optimized out>) at layer.c:1160
1160                if (dw_left(ml, xe, UTF8))

This gives us some more context. The crash is related to the handling of UTF-8, which might explain why so few people are hit by this bug:

(gdb) list
1155              if (xe > vp->v_xe) xe = vp->v_xe;
1156    
1157    #if defined(DW_CHARS) && defined(UTF8)
1158              if (layer->l_encoding == UTF8 && xe < vp->v_xe && win) {
1159                struct mline *ml = win->w_mlines + line;
1160                if (dw_left(ml, xe, UTF8))
1161                  xe++;
1162              }
1163    #endif
1164    

I need to disable optimizations and recompile to get accurate values in the next core dump:

(gdb) info locals
ml = <optimized out>
xs = 80
xe = 232
vp = 0x800780150
win = <optimized out>
cv = 0x8010280c0
line = 45
(gdb) print *layer
$3 = {l_cvlist = 0x8010280c0, l_width = 192, l_height = 63, l_x = 61, l_y = 54, l_encoding = 8, l_layfn = 0x269f98 <WinLf>, l_data = 0x800e4b000, l_next = 0x0, l_bottom = 0x800e4b018, l_blocking = 0, l_mode = 0, l_mouseevent = {buffer = "\000\000", len = 0, start = 0}, l_pause = {d = 0, left = 0x8007ea600, 
    right = 0x8007ea780, top = 45, bottom = 54, lines = 94}}
(gdb) print *vp
$4 = {v_next = 0x0, v_canvas = 0x8010280c0, v_xoff = 80, v_yoff = 7, v_xs = 80, v_xe = 271, v_ys = 7, v_ye = 69}
(gdb)
Comment 15 Trond.Endrestol 2018-12-09 09:19:51 UTC
(In reply to Trond.Endrestol from comment #14)
I have now disabled optimizations for screen, and this is what it looks like after the most recent crash:

gdb /usr/local/bin/screen screen-2018-12-09T09\:59+0100.core 
GNU gdb (GDB) 8.2 [GDB v8.2 for FreeBSD]
[...]
Reading symbols from /usr/local/bin/screen...done.
[New LWP 100734]
Core was generated by `screen'.
Program terminated with signal SIGABRT, Aborted.
#0  thr_kill () at thr_kill.S:3
3       RSYSCALL(thr_kill)
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x000000080053f904 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x00000008004b20e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x0000000000217f40 in CoreDump (sigsig=11) at screen.c:1678
#4  <signal handler called>
#5  0x0000000000282189 in LayPause (layer=0x800e76018, pause=0) at layer.c:1160
#6  0x000000000024247f in win_readev_fn (ev=0x800e760c8, data=0x800e76000 "") at window.c:1959
#7  0x0000000000286315 in sched () at sched.c:237
#8  0x0000000000217758 in main (ac=0, av=0x7fffffffe4b8) at screen.c:1466
(gdb) up
#1  0x000000080053f904 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
52              return (__sys_thr_kill(id, s));
(gdb) 
#2  0x00000008004b20e9 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
67              (void)raise(SIGABRT);
(gdb) 
#3  0x0000000000217f40 in CoreDump (sigsig=11) at screen.c:1678
1678        abort();
(gdb) 
#4  <signal handler called>
(gdb) 
#5  0x0000000000282189 in LayPause (layer=0x800e76018, pause=0) at layer.c:1160
1160                if (dw_left(ml, xe, UTF8))
(gdb) list
1155              if (xe > vp->v_xe) xe = vp->v_xe;
1156    
1157    #if defined(DW_CHARS) && defined(UTF8)
1158              if (layer->l_encoding == UTF8 && xe < vp->v_xe && win) {
1159                struct mline *ml = win->w_mlines + line;
1160                if (dw_left(ml, xe, UTF8))
1161                  xe++;
1162              }
1163    #endif
1164    
(gdb) info locals
ml = 0x800e73b70
xs = 80
xe = 250
vp = 0x8007a7150
cv = 0x801055280
line = 61
win = 0x800e76000
(gdb) print *ml
$1 = {image = 0x80132f9e0 "36382     1 36382 36382 36382 trond     1000  20   0 12208  3024 S   0  0.0  0.0  0:00.00    \034", attr = 0x800e44000 "", font = 0x8013bbf20 "", fontx = 0x800e0d3c0 "", color = 0x8013bbe40 "", colorx = 0x800e0d3c0 ""}
(gdb) print *layer
$2 = {l_cvlist = 0x801055280, l_width = 192, l_height = 63, l_x = 0, l_y = 0, l_encoding = 8, l_layfn = 0x291068 <WinLf>, l_data = 0x800e76000, l_next = 0x0, l_bottom = 0x800e76018, l_blocking = 0, l_mode = 0, l_mouseevent = {buffer = "\000\000", len = 0, start = 0}, l_pause = {d = 0, left = 0x800e15300, 
    right = 0x800e15480, top = 0, bottom = 61, lines = 94}}
(gdb) 

Core file is: https://ximalas.info/~trond/screen-2018-11-28/screen-2018-12-09T09:59+0100.core
A screenshot showing screen after it crashed: https://ximalas.info/~trond/screen-2018-11-28/screenshot-2018-12-09-01.png