Bug 256838 - devel/gdb: Enters busy-loop when interrupting a debuggee under different pty
Summary: devel/gdb: Enters busy-loop when interrupting a debuggee under different pty
Status: In Progress
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Luca Pizzamiglio
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-25 21:09 UTC by Gleb Popov
Modified: 2021-07-13 08:31 UTC (History)
0 users

See Also:
pizzamig: maintainer-feedback+


Description Gleb Popov freebsd_committer 2021-06-25 21:09:47 UTC
Reproduction steps are pretty simple:

1. Open a terminal emulator window, so that /dev/pts/n is created.
2. In a **different** terminal (one that doesn't correspond to /dev/pts/n), run GDB in machine interface mode:

% gdb --nx --interpreter=mi2 -quiet

3. Enter the following script into the GDB MI prompt:

1-gdb-show version
2-gdb-set width 0
3-gdb-set height 0
4handle SIG32 pass nostop noprint
5handle SIG41 pass nostop noprint
6handle SIG42 pass nostop noprint
7handle SIG43 pass nostop noprint
8-enable-pretty-printing
9-gdb-set charset UTF-8
10-gdb-set print sevenbit-strings off
11-gdb-set disable-randomization off
12-gdb-set print static-members off
13-gdb-set print asm-demangle on
14-file-exec-and-symbols /bin/cat
15-inferior-tty-set /dev/pts/n
16-exec-run

In the "15-inferior-tty-set" command, replace "n" with your tty number.

This will start the /bin/cat program under the debugger. At this point, the debuggee terminal will print

&"warning: GDB: Failed to set controlling terminal: Operation not permitted\n"

Not sure if this is relevant to the problem.

4. At the GDB prompt, press Ctrl+C to send an interrupt signal to the debuggee.
5. The gdb executable starts consuming 100% CPU and the debuggee doesn't stop.

This bug is pretty nasty, because it manifests itself on the KDE CI system. The hanging GDB process not only breaks KDevelop tests, but also slows down builds for other projects running on the same machine.

Interestingly, removing "15-inferior-tty-set /dev/pts/n" makes the bug go away, but that isn't a solution for the KDevelop case, which uses the pty for its "Launch in the external terminal" feature.
Comment 1 Luca Pizzamiglio freebsd_committer 2021-06-27 22:23:05 UTC
Hi Gleb,

Thanks for providing an example to reproduce the bug.
Even though it's quite simple (and I'm able to reproduce the bug), it's not so easy to investigate: running gdb under lldb doesn't propagate the signal to the debuggee, and sending the signal to the debuggee with kill doesn't trigger the bug.

After several attempts, now I know:
* the gdb main thread is the one running at 100% CPU, not the gdb worker threads
* the backtrace is (lldb attached to gdb built with debug symbols):

* thread #1, name = 'gdb'
  * frame #0: 0x000000080202670a libc.so.7`__sys_sigprocmask at _sigprocmask.S:4
    frame #1: 0x0000000801ed7de4 libthr.so.3`handle_signal(actp=0x00007fffffffc7c0, sig=2, info=0x00007fffffffcbb0, ucp=0x00007fffffffc840) at thr_sig.c:288:2
    frame #2: 0x0000000801ed73cf libthr.so.3`thr_sighandler(sig=2, info=0x00007fffffffcbb0, _ucp=0x00007fffffffc840) at thr_sig.c:246:2
    frame #3: 0x00007fffffffe003
    frame #4: 0x00000000011532f7 gdb`inf_ptrace_target::wait(this=0x000000000181d710, ptid=(m_pid = -1, m_lwp = 0, m_tid = 0), ourstatus=0x00007fffffffd2c8, options=<unavailable>) at inf-ptrace.c:315:10
    frame #5: 0x00000000010fd258 gdb`fbsd_nat_target::wait(this=0x000000000181d710, ptid=(m_pid = -1, m_lwp = 0, m_tid = 0), ourstatus=0x00007fffffffd2c8, target_options=0) at fbsd-nat.c:1320:34
    frame #6: 0x000000000134d507 gdb`target_wait(ptid=(m_pid = -1, m_lwp = 0, m_tid = 0), status=0x00007fffffffd2c8, options=0) at target.c:2017:18
    frame #7: 0x000000000116a381 gdb`do_target_wait_1(inf=0x0000000802c1fac0, ptid=(m_pid = -1, m_lwp = 0, m_tid = 0), status=<unavailable>, options=<unavailable>) at infrun.c:3544:18
    frame #8: 0x0000000001163833 gdb`do_target_wait(ptid_t, execution_control_state*, int) [inlined] do_target_wait(this=<unavailable>, inf=0x0000000802c1fac0)::$_4::operator()(inferior*) const at infrun.c:3606:17
    frame #9: 0x0000000001163807 gdb`do_target_wait(wait_ptid=(m_pid = -1, m_lwp = 0, m_tid = 0), ecs=<unavailable>, options=1) at infrun.c:3619
    frame #10: 0x000000000116562a gdb`fetch_inferior_event() at infrun.c:3905:10
    frame #11: 0x0000000000fca76d gdb`check_async_event_handlers() at async-event.c:295:4
    frame #12: 0x00000000017cb3f5 gdb`gdb_do_one_event() at event-loop.cc:194:10
    frame #13: 0x00000000011ad1c5 gdb`captured_command_loop() at main.c:356:13
    frame #14: 0x00000000011ad0f7 gdb`captured_command_loop() at main.c:416
    frame #15: 0x00000000011aa435 gdb`gdb_main(captured_main_args*) at main.c:1253:4
    frame #16: 0x00000000011aa414 gdb`gdb_main(args=0x00007fffffffd4b0) at main.c:1268
    frame #17: 0x0000000000f5b05e gdb`main(argc=4, argv=0x00007fffffffd548) at gdb.c:38:10
    frame #18: 0x0000000000f5ade0 gdb`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7

Frames 0-2 are clearly caused by lldb attaching to (and stopping) gdb.
I'm puzzled by frame 3, which I don't understand (I guess it's the signal).
Frame 4 is a do-while loop that calls waitpid(-1, *status, WNOHANG) and could be the source of the 100% CPU usage, even though I don't understand why.
Ignoring frame 3, it seems that gdb is waiting, at the OS level, for an event from the debuggee that never happens.
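
For illustration only: one way a wait loop like the one in frame 4 can spin is if it retries whenever a signal interrupts waitpid() and signals keep arriving. The sketch below assumes that retry-on-EINTR shape, which the backtrace alone does not confirm. The names wait_retrying_on_eintr, on_alarm and count_interrupted_waits are hypothetical, and a 20 ms timer stands in for whatever keeps sending signals.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Handler that does nothing; its only job is to interrupt waitpid(). */
static void on_alarm(int signo) { (void)signo; }

/* Hypothetical helper: wait for pid, retrying every time a signal
   interrupts the call.  *retries counts the EINTR returns. */
pid_t wait_retrying_on_eintr(pid_t pid, int *status, long *retries)
{
    *retries = 0;
    for (;;) {
        pid_t r = waitpid(pid, status, 0);
        if (r != -1 || errno != EINTR)
            return r;
        (*retries)++;   /* interrupted: go straight back into waitpid() */
    }
}

/* Demo: a child lives for about 300 ms while a 20 ms timer keeps
   signalling the parent.  Returns the number of interrupted waits,
   or -1 on error. */
long count_interrupted_waits(void)
{
    struct sigaction sa, old_sa;
    struct itimerval tick = { {0, 20000}, {0, 20000} };
    struct itimerval off = { {0, 0}, {0, 0} };
    sigset_t unblock;
    int status;
    long retries;
    pid_t child, reaped;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    /* no SA_RESTART, so waitpid() fails with EINTR */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGALRM);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1
        || sigprocmask(SIG_UNBLOCK, &unblock, NULL) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        struct timespec nap = { 0, 300000000L };
        nanosleep(&nap, NULL);
        _exit(0);
    }

    setitimer(ITIMER_REAL, &tick, NULL);
    reaped = wait_retrying_on_eintr(child, &status, &retries);
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);

    assert(reaped == child || reaped == -1);
    return reaped == child ? retries : -1;
}
```

Calling count_interrupted_waits() should report at least one interrupted wait. With a signal source that never pauses, a loop of this shape would never get to block, which would look like a busy loop from the outside.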

The warning you see in the "cat" terminal is printed by this gdb code:

#ifdef TIOCSCTTY
  /* Make tty our new controlling terminal.  */
  if (ioctl (tty, TIOCSCTTY, 0) == -1)
    /* Mention GDB in warning because it will appear in the inferior's
       terminal instead of GDB's.  */
      warning (_("GDB: Failed to set controlling terminal: %s"),
         safe_strerror (errno));
#endif

My knowledge of tty related stuff is limited and I don't know if it's relevant for the error or not.
Comment 2 Gleb Popov freebsd_committer 2021-06-28 10:36:17 UTC
If I'm not mistaken, the SIGINT is not single-shot in this case, but is being spammed. Your backtrace seems to confirm this - GDB probably loops in signal handling code.

How did you obtain it? Did you attach to gdb process?

And another question - have you figured out what exactly sends these signals?
Comment 3 Luca Pizzamiglio freebsd_committer 2021-06-28 14:32:30 UTC
Yes, I attached to gdb using lldb

(lldb) process attach --pid NNNNN

I built gdb with debug symbols and used the binary in the build folder ($WRKDIR/.build/gdb/gdb). (The DEBUG option in the Makefile doesn't prevent stripping; I'm going to fix that soon.)

If I send the signal directly to cat, I don't see any issue.
If I send a different signal, like TERM, I don't see any issue.
The endless loop is triggered by INT.
I enabled extra debug messages in gdbarch (the architecture translation layer that implements the FreeBSD architecture), but when I send the INT signal, I don't see any output.
I tried adding a printf to the sigint_handle function, but nothing shows up.

In gdb, signal management is somewhat complicated because gdb intercepts all signals and, depending on which signal is received, has to work out whether it was meant for the debuggee or the debugger. In my attempts, though, it seems that this framework doesn't even notice that a SIGINT has arrived...

Maybe we should ask jhb@ or kib@ , they have more experience in this area
Comment 4 Gleb Popov freebsd_committer 2021-06-30 19:50:48 UTC
How do I prevent GDB from being stripped when building with DEBUG enabled?
Comment 5 Luca Pizzamiglio freebsd_committer 2021-06-30 20:48:22 UTC
(In reply to Gleb Popov from comment #4)
Currently, you have to define WITH_DEBUG in /etc/make.conf

Or, you can use the gdb binary at: ${WRKDIR}/.build/gdb/gdb
Comment 6 Gleb Popov freebsd_committer 2021-07-01 09:48:12 UTC
I've found something. The handler being called for the SIGINT signal is gdb/inflow.c:880

static void
pass_signal (int signo)
{
#ifndef _WIN32
  kill (inferior_ptid.pid (), SIGINT);
#endif
}

The inferior_ptid structure contains 0 in all fields, so kill() is called with pid 0. That sends the signal to every process in GDB's own process group, GDB itself included, and not to the debuggee.
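
The effect of a zero pid can be checked in isolation. The sketch below is not GDB code: kill_zero_hits_sender and note_sigint are hypothetical names, and the experiment runs in a forked child that first moves into its own process group, so the signal cannot reach anything else. It relies only on the POSIX rule that kill() with pid 0 targets the caller's process group.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigint_seen = 0;

/* Records that SIGINT was delivered to this process. */
static void note_sigint(int signo) { (void)signo; sigint_seen = 1; }

/* Returns 1 if kill(0, SIGINT) reached the sender itself, 0 if it did
   not, and -1 if the experiment could not be set up. */
int kill_zero_hits_sender(void)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;

    if (child == 0) {
        struct sigaction sa;
        sigset_t unblock;

        /* Fresh process group, so the signal stays inside this child. */
        if (setpgid(0, 0) == -1)
            _exit(2);

        sa.sa_handler = note_sigint;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGINT);
        if (sigaction(SIGINT, &sa, NULL) == -1
            || sigprocmask(SIG_UNBLOCK, &unblock, NULL) == -1)
            _exit(2);

        /* pid 0 means "my own process group", which includes the caller. */
        if (kill(0, SIGINT) == -1)
            _exit(2);
        _exit(sigint_seen ? 1 : 0);
    }

    while (waitpid(child, &status, 0) == -1)
        if (errno != EINTR)
            return -1;

    assert(WIFEXITED(status) || WIFSIGNALED(status));
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

A handler that reacts to SIGINT by calling kill(0, SIGINT) would therefore signal its own process again each time it runs.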

However, the GDB source code is quite complex, and I haven't found where this structure gets filled in yet.

I'll continue debugging on my own, but if you, Luca, have more experience with the GDB codebase, please share your thoughts.
Comment 7 Gleb Popov freebsd_committer 2021-07-01 10:01:22 UTC
Another observation is that the "Failed to set controlling terminal: Operation not permitted" warning is not relevant to this problem.

Running GDB as root makes the warning go away, but the busy-loop issue persists.
Comment 8 Gleb Popov freebsd_committer 2021-07-01 11:35:12 UTC
While debugging further, I stumbled upon gdb/thread.c:1280 and the following two functions:


void
switch_to_thread_no_regs (struct thread_info *thread)
{
  ...
}


and


void
switch_to_no_thread ()
{
  ...
}


The latter fills the inferior_ptid variable with 0 and the former restores its value.

I verified that, on start, inferior_ptid contains the correct PID of the debuggee. While the debuggee is running, these functions are called occasionally. I put debugging printfs inside them and this is what I get:


16^running
*running,thread-id="all"
(gdb) 
switch_to_thread_no_regs
switch_to_no_thread
switch_to_thread_no_regs
switch_to_no_thread
switch_to_thread_no_regs
switch_to_no_thread
switch_to_thread_no_regs
=library-loaded,id="/lib/libc.so.7",target-name="/lib/libc.so.7",host-name="/lib/libc.so.7",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00000008010f1610",to="0x0000000801250f8c"}]
switch_to_no_thread
switch_to_thread_no_regs
switch_to_no_thread
switch_to_thread_no_regs
switch_to_thread_no_regs
switch_to_no_thread
^C


As you can see, while the debugged program is running, inferior_ptid is zeroed out, which is expected. However, when Ctrl+C is received, the switch_to_thread_no_regs function doesn't get called, leaving inferior_ptid zeroed. I believe this is the root cause of the issue, but I still have no idea how to fix it or why it works on !FreeBSD platforms.
Comment 9 Gleb Popov freebsd_committer 2021-07-01 12:30:42 UTC
The following hack fixed the problem for me:


--- gdb/inflow.c.orig	2021-04-25 04:06:26 UTC
+++ gdb/inflow.c
@@ -881,7 +881,10 @@ static void
 pass_signal (int signo)
 {
 #ifndef _WIN32
-  kill (inferior_ptid.pid (), SIGINT);
+  if (inferior_ptid.pid ())
+    kill (inferior_ptid.pid (), SIGINT);
+  else
+    kill (current_inferior ()->pid , SIGINT);
 #endif
 }
 


It probably does not fix the root cause, but applies a band-aid just before the disaster happens. Do you think it should be committed to our ports tree?
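
The guard in the patch can be exercised outside GDB with a small sketch. Everything here is a hypothetical stand-in, not GDB code: forward_sigint models the patched pass_signal with plain pid_t arguments instead of inferior_ptid and current_inferior (), and it also refuses to signal a non-positive pid, which the patch itself does not do. demo_forward_with_zeroed_selection forks a child to play the debuggee.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the patched handler: prefer the selected
   pid, fall back to the inferior's pid, never call kill() with pid 0. */
int forward_sigint(pid_t selected_pid, pid_t inferior_pid)
{
    pid_t target = selected_pid != 0 ? selected_pid : inferior_pid;

    if (target <= 0) {          /* extra safety check, not in the patch */
        errno = ESRCH;
        return -1;
    }
    return kill(target, SIGINT);
}

/* Demo: with the selected pid zeroed, SIGINT must still reach the child.
   Returns the signal that terminated the child, or -1 on error. */
int demo_forward_with_zeroed_selection(void)
{
    int ready[2], status;
    char byte;
    pid_t child;

    if (pipe(ready) == -1)
        return -1;
    child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        sigset_t unblock;

        close(ready[0]);
        signal(SIGINT, SIG_DFL);    /* make sure SIGINT terminates us */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGINT);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);
        if (write(ready[1], "r", 1) != 1)
            _exit(2);
        for (;;)
            pause();                /* wait to be interrupted */
    }

    assert(child > 0);
    close(ready[1]);
    /* Wait until the child has reset its SIGINT disposition. */
    if (read(ready[0], &byte, 1) != 1 || forward_sigint(0, child) == -1)
        kill(child, SIGKILL);       /* setup failed: do not leave it behind */
    close(ready[0]);

    while (waitpid(child, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With the selected pid set to 0, forward_sigint(0, child) still delivers SIGINT to the child, and forward_sigint(0, 0) fails instead of signalling the caller's own process group.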
Comment 10 Luca Pizzamiglio freebsd_committer 2021-07-01 20:30:28 UTC
(In reply to Gleb Popov from comment #9)
Hi Gleb, thanks a lot for the time you dedicated to this and for finding the solution. As you already noticed, the gdb code is a big mess and very difficult to read, and debugging a debugger can also be very challenging.

The "hack-fix" is good enough to land in the ports tree to fix the issue.
However, I would like to invest some time in understanding why this happens only on FreeBSD when trying to use a different tty (without inferior-tty-set, gdb works just fine).

The factors that combine to trigger the bug seem unrelated to each other, but obviously a reason exists, and it could be the cause of other nasty, rare bugs.
Comment 11 commit-hook freebsd_committer 2021-07-03 09:58:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=65af5b9da1535eba81a41c88e386aa1729ceb60c

commit 65af5b9da1535eba81a41c88e386aa1729ceb60c
Author:     Luca Pizzamiglio <pizzamig@FreeBSD.org>
AuthorDate: 2021-07-03 09:54:51 +0000
Commit:     Luca Pizzamiglio <pizzamig@FreeBSD.org>
CommitDate: 2021-07-03 09:54:51 +0000

    devel/gdb: Add a hack to fix the kill storm bug

    While here, fix the debug install avoid stripping gdb
    Bump portrevision

    PR:             256838
    Reported by:    arrowd@FreeBSD.org
    Co-Author:      arrowd@FreeBSD.org

 devel/gdb/Makefile                       |  3 ++-
 devel/gdb/files/patch-gdb_inflow.c (new) | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)
Comment 12 Gleb Popov freebsd_committer 2021-07-13 08:22:13 UTC
I don't have time to dig into this further. Luca, let's close this if you also don't plan on looking for a proper fix.
Comment 13 Luca Pizzamiglio freebsd_committer 2021-07-13 08:31:42 UTC
(In reply to Gleb Popov from comment #12)

I still have some time next weekend. After that, I'll close it.