I'm running into a problem debugging threaded programs on FreeBSD head amd64 r304294 and i386 r305230. The following program reproduces it, but not always:

% cat test.c
#include <pthread.h>

void *
thr( void *arg ) {
	return( arg );
}

int
main( void ) {
	pthread_t pthr[ 4 ];

	pthread_create( &pthr[ 0 ], NULL, thr, NULL );
	pthread_create( &pthr[ 1 ], NULL, thr, NULL );
	pthread_create( &pthr[ 2 ], NULL, thr, NULL );
	pthread_create( &pthr[ 3 ], NULL, thr, NULL );
	pthread_join( pthr[ 0 ], NULL );
	pthread_join( pthr[ 1 ], NULL );
	pthread_join( pthr[ 2 ], NULL );
	pthread_join( pthr[ 3 ], NULL );
	return( 0 );
}
% cc -ggdb -o test test.c -lpthread
% gdb ./test
Reading symbols from ./test...done.
(gdb) b thr
Breakpoint 1 at 0x4007d8: file test.c, line 5.
(gdb) r
Starting program: /usr/home/tijl/test
[New LWP 100221 of process 974]
[New LWP 100222 of process 974]
[Switching to LWP 100221 of process 974]

Thread 2 hit Breakpoint 1, thr (arg=0x0) at test.c:5
5		return( arg );
(gdb) c
Continuing.
[Switching to LWP 100222 of process 974]

Thread 3 hit Breakpoint 1, thr (arg=0x0) at test.c:5
5		return( arg );
(gdb) c
Continuing.
[LWP 100221 of process 974 exited]
[LWP 100222 of process 974 exited]
[New LWP 100223 of process 974]
[Switching to LWP 100223 of process 974]
0x0000000800828990 in ?? () from /lib/libthr.so.3
ptrace: No such process.

At this point gdb seems to be in an inconsistent state.

(gdb) bt
#0  0x0000000800828990 in ?? () from /lib/libthr.so.3
#1  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000
(gdb) q
A debugging session is active.

	Inferior 1 [process 974] will be killed.

Quit anyway? (y or n) y

Here gdb locks up and has to be killed with SIGKILL.

ports r411099 is the first commit that gives this behaviour.
I'm currently extremely busy (my girlfriend is in hospital), but I can take a look at this problem next week. Thanks for reporting.
Hello, I'm getting "ptrace: no such process" errors too. I filed a bug report upstream (whoops?): https://sourceware.org/bugzilla/show_bug.cgi?id=20743 I'm not sure if this deals with the "gdb locks up" issue too though (which I also experience). FreeBSD 11.0-RELEASE-p1 with gdb 7.11.1_3
Hello there,

Unfortunately I'm having exactly the same issue. I have two computers: one running FreeBSD 11.0-RELEASE-p1 (amd64), where I found the error, and a laptop running FreeBSD 10.3-RELEASE (i386), where I tried a few times but could not reproduce the issue.

I also tried linking with -lthr, -lpthread and -pthread, but it seems that the real implementation underneath is libthr.

I'm willing to give a hand if necessary. After some testing I can conclude:

- GDB 6.6 works without problems, although it uses libthread_db.so to manage threads.
- GDB 7.x uses a different implementation: it no longer uses libthread_db.so but calls ptrace directly (especially after the latest addition for LWP events).
- Initially I ruled out problems in the kernel because GDB 6.6 was working, but now that I realize the two versions use different code paths I'm not sure; maybe the new LWP code is broken.
- I tried vanilla GDB from elsewhere, with exactly the same problem.

If I can help in any way just let me know.

Cheers,
Javi.
Pinging.... I'm still hitting this issue. Did you have time to take a look, Luca?

Thanks in advance,
Javi.
I'm really having trouble reproducing the problem: after 50 runs, no error. I've spoken with jhb@ and he's not sure it's a gdb problem; it could be related to ptrace or the kernel. AFAIK, the base gdb6 supports threads in a different way. I'm running CURRENT, which could be why I can't reproduce the error.

@Javier: do you have another test case that triggers the error more often? I could use it to debug this.
(In reply to Tijl Coosemans from comment #0)
(In reply to Javier Bizcocho from comment #4)

Did you try the patch I linked to in comment #2?

The crux of the patch is to change the top of resume_all_threads_cb() in fbsd-nat.c to:

resume_all_threads_cb (struct thread_info *tp, void *data)
{
  ptid_t *filter = (ptid_t *) data;

  /* don't resume an exited thread */
  if (tp->state == THREAD_EXITED)
    return 0;

  [existing code, starting with if() continues from here]

I'm not able to run CURRENT right now, but I did suffer from this problem with GDB v7 on 11-RC and 11-RELEASE.
(In reply to misc-freebsd-bugzilla from comment #6)
Your patch seems to fix the test case. I can keep continuing until the process exits and then cleanly quit gdb. However, when I try to quit gdb while the process is still running, it still locks up and needs to be killed with SIGKILL.
Sorry that I haven't sat down to test this yet. The hang on kill seems to be an issue with PT_KILL (and I'm not sure if it's gdb or the kernel that is broken).
A commit references this bug:

Author: olivier
Date: Thu Jan 12 21:40:07 UTC 2017
New revision: 431323
URL: https://svnweb.freebsd.org/changeset/ports/431323

Log:
  Add MIPS support and other fixes

  PR:		215938 - Main PR that merge all
  Submitted by:	luca.pizzamiglio@gmail.com (maintainer)

  PR:		215783 - Add MIPS support
  Submitted by:	jhb
  Sponsored by:	DARPA / AFRL

  PR:		215868 - Fix build on powerpc architecture
  Reported by:	Mark Millard

  PR:		212607 - Add a workaround to mitigate gdb hangs under some
  		circumstances with multi-threaded applications
  		(thanks to misc-freebsd-bugzilla@talk2dom.com)
  Reported by:	tijl

  PR:		215578 - Fix build by removing option to use system readline
  Reported by:	rozhuk.im@gmail.com

Changes:
  head/devel/gdb/Makefile
  head/devel/gdb/distinfo
  head/devel/gdb/files/commit-387360daf9
  head/devel/gdb/files/commit-b268007c68
  head/devel/gdb/files/extrapatch-base-readline
  head/devel/gdb/files/extrapatch-kgdb
  head/devel/gdb/files/kgdb/mipsfbsd-kern.c
  head/devel/gdb/files/kgdb/ppcfbsd-kern.c
  head/devel/gdb/files/kgdb/sparc64fbsd-kern.c
  head/devel/gdb/files/patch-gdb-fbsd-nat.c
Is the committed workaround enough to close this PR?
(In reply to Olivier Cochard from comment #10) No, the root of the problem is still unknown, so this PR should stay open
A commit references this bug:

Author: badger
Date: Mon Feb 20 15:53:17 UTC 2017
New revision: 313992
URL: https://svnweb.freebsd.org/changeset/base/313992

Log:
  Defer ptracestop() signals that cannot be delivered immediately

  When a thread is stopped in ptracestop(), the ptrace(2) user may request
  a signal be delivered upon resumption of the thread. Heretofore, those
  signals were discarded unless ptracestop()'s caller was issignal().
  Fix this by modifying ptracestop() to queue up signals requested by the
  ptrace user that will be delivered when possible. Take special care when
  the signal is SIGKILL (usually generated from a PT_KILL request); no new
  stop events should be triggered after a PT_KILL.

  Add a number of tests for the new functionality. Several tests were
  authored by jhb.

  PR:		212607
  Reviewed by:	kib
  Approved by:	kib (mentor)
  MFC after:	2 weeks
  Sponsored by:	Dell EMC
  In collaboration with:	jhb
  Differential Revision:	https://reviews.freebsd.org/D9260

Changes:
  head/sys/kern/kern_fork.c
  head/sys/kern/kern_sig.c
  head/sys/kern/kern_thr.c
  head/sys/kern/subr_syscall.c
  head/sys/kern/sys_process.c
  head/sys/sys/signalvar.h
  head/tests/sys/kern/Makefile
  head/tests/sys/kern/ptrace_test.c
(In reply to commit-hook from comment #12)
This commit addresses the hang when trying to quit gdb. I still see the original problem (ptrace errors) on FreeBSD 10.3, I presume because it lacks thread events. I think a patch like the one alluded to by jhb here:

https://sourceware.org/bugzilla/show_bug.cgi?id=20743#c2

is required to fix that.
(In reply to Eric Badger from comment #13) I don't have a patch for the non-LWP events errors with 10.3. My inclination at this point is to tell folks to just backport the LWP commits from stable/10 to 10.3. :-/ The errors GDB gets in the non-LWP event case do not have an obvious solution.
A commit references this bug:

Author: badger
Date: Sat Mar 25 13:33:25 UTC 2017
New revision: 315949
URL: https://svnweb.freebsd.org/changeset/base/315949

Log:
  MFC r313992, r314075, r314118, r315484:

  r315484:
      ptrace_test: eliminate assumption about thread scheduling

      A couple of the ptrace tests make assumptions about which thread in a
      multithreaded process will run after a halt. This makes the tests less
      portable across branches, and susceptible to future breakage. Instead,
      twiddle thread scheduling and priorities to match the tests'
      expectation.

  r314118:
      Actually fix buildworlds other than i386/amd64/sparc64 after r313992

      Disable offending test for platforms without a userspace visible
      breakpoint().

  r314075:
      Fix world build for archs where __builtin_debugtrap() does not work.

      The offending code was introduced in r313992.

  r313992:
      Defer ptracestop() signals that cannot be delivered immediately

      When a thread is stopped in ptracestop(), the ptrace(2) user may
      request a signal be delivered upon resumption of the thread.
      Heretofore, those signals were discarded unless ptracestop()'s caller
      was issignal(). Fix this by modifying ptracestop() to queue up signals
      requested by the ptrace user that will be delivered when possible.
      Take special care when the signal is SIGKILL (usually generated from a
      PT_KILL request); no new stop events should be triggered after a
      PT_KILL.

      Add a number of tests for the new functionality. Several tests were
      authored by jhb.

  PR:		212607
  Sponsored by:	Dell EMC

Changes:
_U  stable/10/
  stable/10/sys/kern/kern_fork.c
  stable/10/sys/kern/kern_sig.c
  stable/10/sys/kern/kern_thr.c
  stable/10/sys/kern/subr_syscall.c
  stable/10/sys/kern/sys_process.c
  stable/10/sys/sys/signalvar.h
  stable/10/tests/sys/kern/Makefile
  stable/10/tests/sys/kern/ptrace_test.c
_U  stable/11/
  stable/11/sys/kern/kern_fork.c
  stable/11/sys/kern/kern_sig.c
  stable/11/sys/kern/kern_thr.c
  stable/11/sys/kern/subr_syscall.c
  stable/11/sys/kern/sys_process.c
  stable/11/sys/sys/signalvar.h
  stable/11/tests/sys/kern/Makefile
  stable/11/tests/sys/kern/ptrace_test.c
FYI, a final patch has been merged to GDB master and will appear in the 8.0 release. The final patch is a bit different from the one in the port, but is functionally identical. I don't think we need to rework the patch in the current port, but we can drop it when we import 8.0. The GDB hangs were due to issues in the kernel that Eric Badger has thankfully fixed.