I've seen this for a long time when using ddb over an IPMI serial console, and people said it could be caused by IPMI or something similar. Now I hit a "KDB: enter: manual escape to debugger" situation (unclear where from yet) on my laptop. I typed "db> cont" on v0 and it is stuck there, not bringing the shell session back to life. However, I can switch to v1, v2, and v3, and all those shells work perfectly fine. I wonder what is broken so that returning from ddb doesn't work anymore?
Coming back hours later and pressing <Enter> again on the "cont" line, which hadn't shown a shell prompt, gave me the 4 <Enter>s I had pressed in total (3 earlier, 1 now), and the command prompt came back. Something is still fishy. Reports from anyone else with other setups (IPMI, serial, tty) would be welcome.
And I just had this on a classic serial line machine as well.

sysctl debug.kdb.enter=1
db> cont

I still get kernel printfs coming, but typing anything doesn't work. I hope someone has an idea... It's annoying if you cannot remotely power cycle a machine and need hands-on access.
(In reply to Bjoern A. Zeeb from comment #2) I gave this a try in vt0 and, after entering cont at the ddb prompt, the terminal recovered and my bash prompt reappeared. A number of carriage returns were apparently emitted by the kernel, but that was not a problem. BUT I'm using sc and NOT vt. What are you using?
I see it on an amd64 system. With debug.kdb.alt_break_to_debugger=1, I can enter ddb using the alt break sequence and resuming works fine. When I enter with sysctl debug.kdb.enter=1, I get the same hang. Happily, I can re-enter ddb in this state using the alt break sequence, so it's possible to debug a bit.

In this state, the shell is stuck:

db> bt
Tracing pid 1447 tid 100097 td 0xfffffe000b532c00
sched_switch() at sched_switch+0x5b2/frame 0xfffffe003bac74c0
mi_switch() at mi_switch+0x155/frame 0xfffffe003bac74e0
sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe003bac7520
sleepq_catch_signals() at sleepq_catch_signals+0x262/frame 0xfffffe003bac7570
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x12/frame 0xfffffe003bac75b0
_cv_timedwait_sig_sbt() at _cv_timedwait_sig_sbt+0x184/frame 0xfffffe003bac7620
tty_drain() at tty_drain+0x1cc/frame 0xfffffe003bac7680
tty_ioctl() at tty_ioctl+0x26d/frame 0xfffffe003bac76d0
ttydev_ioctl() at ttydev_ioctl+0x247/frame 0xfffffe003bac7720
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe003bac7770
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe003bac7880
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe003bac78a0
kern_ioctl() at kern_ioctl+0x276/frame 0xfffffe003bac7900
sys_ioctl() at sys_ioctl+0x127/frame 0xfffffe003bac79d0
amd64_syscall() at amd64_syscall+0x135/frame 0xfffffe003bac7af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe003bac7af0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8005e016a, rsp = 0x7fffffffd8e8, rbp = 0x7fffffffd930 ---

Indeed, there are some bytes "stuck" in the tty queues:

db> show tty 0xfffff80004075000
0xfffff80004075000: ttyu0
    mtx: 0xfffff80004075008
    flags: 0xe<INITLOCK,CALLOUT,OPENED_IN>
    revokecnt: 2
    inq: 0xfffff80004075048
        begin 0 linestart 2 reprint 2 end 2
        nblocks 180 quota 180
    outq: 0xfffff80004075088
        begin 16 end 29
        nblocks 93 quota 93
    inlow: 20736
    outlow: 20757
    termios: iflag 0x2b02 oflag 0x7 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
    winsize: row 87 col 319 xpixel 0 ypixel 0
    column: 0
    writepos: 0
    compatflags: 0x0
    termios_init_in: iflag 0x2b02 oflag 0x3 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
    termios_init_out: iflag 0x2b02 oflag 0x3 cflag 0xcb00 lflag 0x5cb ispeed 115200 ospeed 115200
    termios_lock_in: iflag 0x0 oflag 0x0 cflag 0x0 lflag 0x0 ispeed 0 ospeed 0
    termios_lock_out: iflag 0x0 oflag 0x0 cflag 0x0 lflag 0x0 ispeed 0 ospeed 0
    devsw: uart_tty_class (0xffffffff818d0a08)
        open: uart_tty_open
        close: uart_tty_close
        outwakeup: uart_tty_outwakeup
        inwakeup: uart_tty_inwakeup
        ioctl: uart_tty_ioctl
        param: uart_tty_param
        modem: uart_tty_modem
        mmap: ttydevsw_defmmap
        pktnotify: ttydevsw_defpktnotify
        free: uart_tty_free
    hook: 0 (0)
    pgrp: 0xfffff800036f6080 gid 1447 jobc 1
    session: 0xfffff80004587b80 count 2 leader 0xfffff80006b43000 tty 0xfffff80004075000 sid 1443 login root
    sessioncnt: 1
    devswsoftc: 0xfffff80004075800
    hooksoftc: 0
    dev: 0xfffff8000407a000

So I guess there is some race that results in uart(4) not handling an interrupt, so ttydisc_getc() isn't getting called to drain the outq.
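To make the "stuck bytes" reading of the show tty output concrete, here is a small user-space model in C. It is only a sketch under stated assumptions: the struct and every model_* function are hypothetical stand-ins, not the kernel's ttyoutq or tty_drain() code, and the real drain path sleeps on a condition variable rather than counting events. It assumes the pending byte count is end minus begin (29 - 16 = 13 for the outq above) and that the queue advances only when a transmit interrupt is actually delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the outq fields printed by "show tty". */
struct model_outq {
	size_t begin;	/* offset of the next byte to transmit */
	size_t end;	/* offset one past the last queued byte */
};

/* Bytes still waiting to be transmitted. */
static size_t
model_outq_pending(const struct model_outq *q)
{
	assert(q->begin <= q->end);
	return (q->end - q->begin);
}

/*
 * One simulated "transmitter ready" event. The queue advances only if
 * the interrupt is delivered, modelling an output path that depends on
 * the driver's interrupt handler to pull the next byte.
 */
static void
model_tx_event(struct model_outq *q, bool intr_delivered)
{
	if (intr_delivered && q->begin < q->end)
		q->begin++;
}

/*
 * Run a bounded number of events and report whether the queue drained.
 * (Assumption: the bound replaces the sleep/timeout of a real drain.)
 */
static bool
model_drain(struct model_outq *q, bool intr_delivered, int max_events)
{
	for (int i = 0; i < max_events && model_outq_pending(q) > 0; i++)
		model_tx_event(q, intr_delivered);
	return (model_outq_pending(q) == 0);
}
```

With begin 16 and end 29, the model never drains while interrupts are not delivered, no matter how many events are simulated, which matches a shell parked in tty_drain().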
FYI, I believe I've tracked down this issue, and posted a fix at https://reviews.freebsd.org/D29130. I plan to commit it in the next day or two, but if anyone else can verify that this fixes the issue for them, it would be helpful.
Tested on a machine that showed the problem: it is broken without the patch and works with just that patch added and the kernel recompiled and installed. Tested-by: bz Thank you so much! This is going to make life a lot easier again :-)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7e7f7beee732810d3afcc83828341ac3e139b5bd

commit 7e7f7beee732810d3afcc83828341ac3e139b5bd
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-10 15:04:42 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from ddb after entering with debug.kdb.enter=1. Checking the status further shows the terminal is blocked waiting in tty_drain(), but it never makes progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is set in the failure case. Since this bit is never tracked by the softc, it will not be restored by ns8250_bus_ungrab(). This creates a race in which a TX interrupt can be lost, creating the hang described above. Ensuring that this bit is restored is enough to prevent this, and resume from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same lifetime that TX interrupts are enabled.

    PR: 223917, 240122
    Reviewed by: imp, manu
    Tested by: bz
    MFC after: 5 days
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D29130

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
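The race described in the commit message can be illustrated with a tiny user-space model. This is a sketch under stated assumptions, not the actual uart_dev_ns8250.c change: the struct, the MODEL_* constants, and the model_* functions are all hypothetical, the "grab" step is simplified to masking every interrupt, and the "ungrab" step is assumed to restore the hardware register from the software copy. The track_txrdy flag switches between the old behavior (the TX-ready enable bit lives only in hardware) and the behavior the commit describes (the bit is also recorded in the software copy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the usual 16550 IER layout; names are hypothetical. */
#define MODEL_IER_RXRDY	0x01	/* receive-data interrupt enable */
#define MODEL_IER_TXRDY	0x02	/* transmitter-ready interrupt enable */

struct model_uart {
	uint8_t hw_ier;		/* simulated hardware IER register */
	uint8_t soft_ier;	/* driver's software copy of the wanted bits */
};

/*
 * Begin a transmit: enable the TX interrupt in hardware and, if
 * track_txrdy is nonzero, also remember it in the software copy.
 */
static void
model_start_tx(struct model_uart *u, int track_txrdy)
{
	assert(u != NULL);
	u->hw_ier |= MODEL_IER_TXRDY;
	if (track_txrdy)
		u->soft_ier |= MODEL_IER_TXRDY;
}

/* Enter polled (debugger) mode: mask interrupts in hardware. */
static void
model_grab(struct model_uart *u)
{
	u->hw_ier = 0;
}

/* Leave polled mode: restore hardware from the software copy only. */
static void
model_ungrab(struct model_uart *u)
{
	u->hw_ier = u->soft_ier;
}

/* Nonzero if a TX-ready interrupt could still be delivered. */
static int
model_tx_intr_enabled(const struct model_uart *u)
{
	return ((u->hw_ier & MODEL_IER_TXRDY) != 0);
}
```

In this model, a transmit that is in flight when the debugger grabs the console loses its TX interrupt after ungrab unless the bit was tracked, which is the condition under which the output queue can no longer drain.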
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=17d301f7b59f49c52983fe0957208dddf40b1232

commit 17d301f7b59f49c52983fe0957208dddf40b1232
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-15 14:22:17 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from ddb after entering with debug.kdb.enter=1. Checking the status further shows the terminal is blocked waiting in tty_drain(), but it never makes progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is set in the failure case. Since this bit is never tracked by the softc, it will not be restored by ns8250_bus_ungrab(). This creates a race in which a TX interrupt can be lost, creating the hang described above. Ensuring that this bit is restored is enough to prevent this, and resume from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same lifetime that TX interrupts are enabled.

    PR: 223917, 240122
    Sponsored by: The FreeBSD Foundation

    (cherry picked from commit 7e7f7beee732810d3afcc83828341ac3e139b5bd)

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a54c346ff3e80ff8f2f3d0ec56b5374a7dc34429

commit a54c346ff3e80ff8f2f3d0ec56b5374a7dc34429
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-10 14:57:12 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-16 17:56:03 +0000

    ns8250: don't drop IER_TXRDY on bus_grab/ungrab

    It has been observed that some systems are often unable to resume from ddb after entering with debug.kdb.enter=1. Checking the status further shows the terminal is blocked waiting in tty_drain(), but it never makes progress in clearing the output queue, because sc->sc_txbusy is high.

    I noticed that when entering polling mode for the debugger, IER_TXRDY is set in the failure case. Since this bit is never tracked by the softc, it will not be restored by ns8250_bus_ungrab(). This creates a race in which a TX interrupt can be lost, creating the hang described above. Ensuring that this bit is restored is enough to prevent this, and resume from ddb as expected.

    The solution is to track this bit in the sc->ier field, for the same lifetime that TX interrupts are enabled.

    PR: 223917, 240122
    Sponsored by: The FreeBSD Foundation

    (cherry picked from commit 7e7f7beee732810d3afcc83828341ac3e139b5bd)

 sys/dev/uart/uart_dev_ns8250.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)