Bug 240122

Summary: ddb: cont doesn't give a working terminal back
Product: Base System
Reporter: Bjoern A. Zeeb <bz>
Component: kern
Assignee: Mitchell Horne <mhorne>
Status: Closed FIXED
Severity: Affects Some People
CC: bz, emaste, gljennjohn, markj, mhorne
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
URL: https://reviews.freebsd.org/D29130
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223917

Description Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-08-26 12:16:33 UTC
I've seen this for a long time when using ddb over an IPMI serial console and people were saying that it could be because of IPMI and whatever.

Now I had a "KDB: enter: manual escape to debugger" situation (unclear where from yet) on my laptop.  I typed "cont" at the "db>" prompt on v0 and it is stuck there, not bringing the shell session back.  However, I can switch to v1, v2 and v3, and all of those shells work perfectly fine.

I wonder what is broken that returning from ddb doesn't work anymore?
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2019-08-26 20:12:31 UTC
Coming back hours later and pressing <Enter> again on the "cont" line, which had not shown a shell prompt, gave me the 4 <Enter>s I pressed in total (3 earlier, 1 now) and the command prompt came back.  Something is still fishy.

Anyone else with other experience (IPMI, serial, tty) would be welcome.
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2020-08-03 21:09:06 UTC
And I just had this on a classic serial-line machine as well.
sysctl debug.kdb.enter=1
ddb> cont

I still get kernel printfs coming but typing anything doesn't work.

I hope someone has an idea...  It's annoying if you cannot remotely power cycle a machine and need hands-on.
Comment 3 Gary Jennejohn 2020-08-05 17:28:15 UTC
(In reply to Bjoern A. Zeeb from comment #2)
I gave this a try in vt0 and, after entering cont at the ddb prompt, the terminal recovered and my bash prompt reappeared.  A number of carriage returns were apparently emitted by the kernel, but that was not a problem.
BUT I'm using sc and NOT vt.  What are you using?
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2020-10-07 13:59:32 UTC
I see it on an amd64 system.  With debug.kdb.alt_break_to_debugger=1, I can enter ddb using the alt break sequence and resuming works fine.  When I enter with sysctl debug.kdb.enter=1, I get the same hang.  Happily, I can re-enter ddb in this state using the alt break sequence, so it's possible to debug a bit.

In this state, the shell is stuck:

db> bt
Tracing pid 1447 tid 100097 td 0xfffffe000b532c00
sched_switch() at sched_switch+0x5b2/frame 0xfffffe003bac74c0
mi_switch() at mi_switch+0x155/frame 0xfffffe003bac74e0
sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe003bac7520
sleepq_catch_signals() at sleepq_catch_signals+0x262/frame 0xfffffe003bac7570
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x12/frame 0xfffffe003bac75b0
_cv_timedwait_sig_sbt() at _cv_timedwait_sig_sbt+0x184/frame 0xfffffe003bac7620
tty_drain() at tty_drain+0x1cc/frame 0xfffffe003bac7680
tty_ioctl() at tty_ioctl+0x26d/frame 0xfffffe003bac76d0
ttydev_ioctl() at ttydev_ioctl+0x247/frame 0xfffffe003bac7720
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe003bac7770
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe003bac7880
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe003bac78a0
kern_ioctl() at kern_ioctl+0x276/frame 0xfffffe003bac7900
sys_ioctl() at sys_ioctl+0x127/frame 0xfffffe003bac79d0
amd64_syscall() at amd64_syscall+0x135/frame 0xfffffe003bac7af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe003bac7af0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8005e016a, rsp = 0x7fffffffd8e8, rbp = 0x7fffffffd930 ---

Indeed, there are some bytes "stuck" in the tty queues:

db> show tty 0xfffff80004075000
0xfffff80004075000: ttyu0
        mtx: 0xfffff80004075008
        flags: 0xe<INITLOCK,CALLOUT,OPENED_IN>
        revokecnt: 2
        inq: 0xfffff80004075048 begin 0 linestart 2 reprint 2 end 2 nblocks 180 quota 180
        outq: 0xfffff80004075088 begin 16 end 29 nblocks 93 quota 93
        inlow: 20736
        outlow: 20757
        termios: iflag 0x2b02 oflag 0x7 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
        winsize: row 87 col 319 xpixel 0 ypixel 0
        column: 0
        writepos: 0
        compatflags: 0x0
        termios_init_in: iflag 0x2b02 oflag 0x3 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
        termios_init_out: iflag 0x2b02 oflag 0x3 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
        termios_lock_in: iflag 0x0 oflag 0x0 cflag 0x0 lflag 0x0 ispeed 0 ospeed 0
        termios_lock_out: iflag 0x0 oflag 0x0 cflag 0x0 lflag 0x0 ispeed 0 ospeed 0
        devsw: uart_tty_class (0xffffffff818d0a08)
          open: uart_tty_open
          close: uart_tty_close
          outwakeup: uart_tty_outwakeup
          inwakeup: uart_tty_inwakeup
          ioctl: uart_tty_ioctl
          param: uart_tty_param
          modem: uart_tty_modem
          mmap: ttydevsw_defmmap
          pktnotify: ttydevsw_defpktnotify
          free: uart_tty_free
        hook: 0 (0)
        pgrp: 0xfffff800036f6080 gid 1447 jobc 1
        session: 0xfffff80004587b80 count 2 leader 0xfffff80006b43000 tty 0xfffff80004075000 sid 1443 login root
        sessioncnt: 1
        devswsoftc: 0xfffff80004075800
        hooksoftc: 0
        dev: 0xfffff8000407a000
        dev: 0xfffff8000407a000

So I guess there is some race that results in uart(4) not handling an interrupt, so ttydisc_getc() isn't getting called to drain the outq.
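
As a rough userspace illustration of why a missed transmit interrupt leaves the shell waiting in tty_drain(), here is a much simplified model. The names fake_outq, fake_tx_interrupt and fake_drain are hypothetical and do not exist in the kernel; the real tty layer and uart(4) are considerably more involved, and the one-byte-per-interrupt behaviour is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified output queue: bytes in [begin, end) are pending. */
struct fake_outq {
	size_t begin;
	size_t end;
};

/* Number of bytes still waiting to be sent. */
static size_t
outq_pending(const struct fake_outq *q)
{
	assert(q->end >= q->begin);
	return (q->end - q->begin);
}

/* Models one TX-ready interrupt: the driver pulls one byte off the queue. */
static void
fake_tx_interrupt(struct fake_outq *q)
{
	if (q->begin < q->end)
		q->begin++;
}

/*
 * Models a drain wait with a bounded number of wakeups.  Returns true if
 * the queue emptied.  When tx_irq_enabled is false no interrupt ever
 * fires, so the queue never shrinks and every wakeup finds it unchanged.
 */
static bool
fake_drain(struct fake_outq *q, bool tx_irq_enabled, int max_wakeups)
{
	for (int i = 0; i < max_wakeups && outq_pending(q) > 0; i++) {
		if (tx_irq_enabled)
			fake_tx_interrupt(q);
	}
	return (outq_pending(q) == 0);
}
```

With the values from the "show tty" output above (begin 16, end 29), outq_pending() reports 13 bytes, and in this model they stay queued for as long as the transmit interrupt is not delivered.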
Comment 5 Mitchell Horne freebsd_committer freebsd_triage 2021-03-09 22:32:05 UTC
FYI, I believe I've tracked down this issue, and posted a fix to https://reviews.freebsd.org/D29130

I plan to commit in the next day or two, but if anyone else can verify that this fixes the issue for them it would be helpful.
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2021-03-10 11:27:39 UTC
Works.  The machine was broken without the patch, and works with just that patch added and the kernel recompiled and installed.

Tested-by: bz

Thank you so much!  This is going to make life a lot easier again :-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-03-10 15:06:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7e7f7beee732810d3afcc83828341ac3e139b5bd

commit 7e7f7beee732810d3afcc83828341ac3e139b5bd
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-10 15:04:42 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from
    ddb after entering with debug.kdb.enter=1. Checking the status further
    shows the terminal is blocked waiting in tty_drain(), but it never makes
    progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is
    set in the failure case. Since this bit is never tracked by the softc,
    it will not be restored by ns8250_bus_ungrab(). This creates a race in
    which a TX interrupt can be lost, creating the hang described above.
    Ensuring that this bit is restored is enough to prevent this, and resume
    from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same
    lifetime that TX interrupts are enabled.

    PR:             223917, 240122
    Reviewed by:    imp, manu
    Tested by:      bz
    MFC after:      5 days
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D29130

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
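
The idea in the commit message can be sketched in userspace as follows. This is not the driver code: struct fake_uart and the fake_* functions are hypothetical, the hardware register is just a struct field, and grab is simplified to clearing the whole register. The bit name follows the commit message, and the bit values are assumed to follow the usual 16550 IER layout. The track_tx flag switches between the old behaviour (bit set in hardware only) and the new one (bit also recorded in the shadow ier field).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interrupt-enable bits (assumed 16550 layout). */
#define IER_RXRDY	0x01u	/* receive-data interrupt enable */
#define IER_TXRDY	0x02u	/* transmit-ready interrupt enable */

/* Hypothetical stand-in for the softc plus the hardware register. */
struct fake_uart {
	uint8_t	hw_ier;		/* simulated hardware IER register */
	uint8_t	ier;		/* software shadow, restored by ungrab */
	bool	track_tx;	/* true models the behaviour after the fix */
};

static void
fake_init(struct fake_uart *u, bool track_tx)
{
	u->ier = IER_RXRDY;
	u->hw_ier = u->ier;
	u->track_tx = track_tx;
}

/* Start a transmission: enable the TX-ready interrupt. */
static void
fake_transmit(struct fake_uart *u)
{
	if (u->track_tx) {
		u->ier |= IER_TXRDY;		/* remembered in the shadow */
		u->hw_ier = u->ier;
	} else {
		u->hw_ier = u->ier | IER_TXRDY;	/* hardware only */
	}
}

/* Enter polling mode for the debugger: mask interrupts in hardware. */
static void
fake_grab(struct fake_uart *u)
{
	u->hw_ier = 0;
}

/* Leave polling mode: restore whatever the shadow copy remembers. */
static void
fake_ungrab(struct fake_uart *u)
{
	assert(u->hw_ier == 0);
	u->hw_ier = u->ier;
}

/* True if the TX interrupt is still enabled after a debugger round trip. */
static bool
tx_irq_survives_debugger(bool track_tx)
{
	struct fake_uart u;

	fake_init(&u, track_tx);
	fake_transmit(&u);
	fake_grab(&u);
	fake_ungrab(&u);
	return ((u.hw_ier & IER_TXRDY) != 0);
}
```

In this model tx_irq_survives_debugger(false) returns false, because ungrab restores a shadow value that never contained the bit, while tx_irq_survives_debugger(true) returns true.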
Comment 8 commit-hook freebsd_committer freebsd_triage 2021-03-15 14:26:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=17d301f7b59f49c52983fe0957208dddf40b1232

commit 17d301f7b59f49c52983fe0957208dddf40b1232
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-15 14:22:17 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from
    ddb after entering with debug.kdb.enter=1. Checking the status further
    shows the terminal is blocked waiting in tty_drain(), but it never makes
    progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is
    set in the failure case. Since this bit is never tracked by the softc,
    it will not be restored by ns8250_bus_ungrab(). This creates a race in
    which a TX interrupt can be lost, creating the hang described above.
    Ensuring that this bit is restored is enough to prevent this, and resume
    from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same
    lifetime that TX interrupts are enabled.

    PR:             223917, 240122
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 7e7f7beee732810d3afcc83828341ac3e139b5bd)

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2021-03-16 17:56:59 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a54c346ff3e80ff8f2f3d0ec56b5374a7dc34429

commit a54c346ff3e80ff8f2f3d0ec56b5374a7dc34429
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-16 17:56:03 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from
    ddb after entering with debug.kdb.enter=1. Checking the status further
    shows the terminal is blocked waiting in tty_drain(), but it never makes
    progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is
    set in the failure case. Since this bit is never tracked by the softc,
    it will not be restored by ns8250_bus_ungrab(). This creates a race in
    which a TX interrupt can be lost, creating the hang described above.
    Ensuring that this bit is restored is enough to prevent this, and resume
    from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same
    lifetime that TX interrupts are enabled.

    PR:             223917, 240122
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 7e7f7beee732810d3afcc83828341ac3e139b5bd)

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)