Bug 250954 - ptrace(): weird ordering between inheriting debug registers and reporting a new thread
Summary: ptrace(): weird ordering between inheriting debug registers and reporting a n...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-08 15:01 UTC by Michał Górny
Modified: 2020-11-11 09:53 UTC (History)
3 users (show)

See Also:


Attachments
Minimal demonstration program (2.58 KB, text/plain)
2020-11-08 15:02 UTC, Michał Górny
no flags Details
More verbose demo (3.46 KB, text/plain)
2020-11-09 08:03 UTC, Michał Górny
no flags Details
print the list of threads on each stop (3.60 KB, text/plain)
2020-11-09 11:50 UTC, Konstantin Belousov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michał Górny 2020-11-08 15:01:53 UTC
Disclaimer: I'm not sure if it's a bug or just a surprising behavior.  Please consider the following case:

1. The debugger enables reporting new threads via PT_SET_EVENT_MASK.

2. Debugged program creates a thread (e.g. via pthread_create() or std::thread).

3. Debugged program is stopped immediately afterwards (e.g. due to breakpoint).

Now, I can reliably reproduce that the kernel reports the breakpoint (trace trap) *before* the new thread.  However, it seems that the thread is already created at this point -- if the debugger alters debug registers in the main thread, the new thread has the previous value.

Is dbreg inheritance something we want programmers to rely on?
Comment 1 Michał Górny 2020-11-08 15:02:21 UTC
Created attachment 219458 [details]
Minimal demonstration program
Comment 2 Michał Górny 2020-11-08 15:03:44 UTC
I can reproduce the problem using the attached program.  The dr0 output for the new thread matches g_val rather than g_val2, as set *before* new thread event arrives.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-08 21:46:42 UTC
Generally if you using PT_LWPINFO, you should use it after each reported stop.
I am not sure, in your test, which stop occurs when the thread is created,
it could be earlier than you call to PT_LWPINFO.

And then, since you use pid and not tid for ptrace(), it is relatively
random which thread new dbregs are applied.
Comment 4 Michał Górny 2020-11-08 22:14:16 UTC
That's just a cheap example.

So do I understand correctly that your recommendation is to call PT_LWPINFO and update the thread list every time the debugger stops rather than relying on LWP events to add/remove threads?
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-08 23:49:58 UTC
(In reply to Michał Górny from comment #4)
I do not quite understand what do you mean by 'relying on LWP events'.  Events
are delivered by reporting stops, there is no separate channel that communicates
events.  And then, you need to call PT_LWPINFO to get the events reported in
pl_flags, after getting stop notification from waitpid(2).

That said, I am not saying that there is no bug with debug registers control,
but to make the conclusion, I need a convincing example, while your sample is
not (yet).  I also would need it to confirm that the bug is indeed fixed.
Comment 6 Michał Górny 2020-11-09 07:43:16 UTC
I'm sorry, I've confused PT_LWPINFO with PT_GETLWPLIST.  I'll make the example more clear.
Comment 7 Michał Górny 2020-11-09 08:03:23 UTC
Created attachment 219480 [details]
More verbose demo

Is this clearer?
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-09 11:49:19 UTC
(In reply to Michał Górny from comment #7)
Actually not.  Or rather, I updated your second example some more to query
the list of threads on each stop, and for me it behaves exactly as I would
expect it to behave:
wait: pid=38618, waited=38618, ret=117f
threads: 100447
tid = 100447, SIGSTOP
set DR0=0x601878 (&g_val) on tid=100447
wait: pid=38618, waited=38618, ret=117f
threads: 100447 100820
tid = 100447, SIGSTOP
set DR0=0x601874 (&g_val2) on tid=100447
wait: pid=38618, waited=38618, ret=57f
threads: 100447 100820
tid = 100820, SIGTRAP w/ PL_FLAG_BORN
dr0=0x601878, g_val=0x601878, g_val2=0x601874
wait: pid=38618, waited=38618, ret=57f
threads: 100447 100820
tid = 100820, SIGTRAP w/ PL_FLAG_EXITED
thread started
thread joined
wait: pid=38618, waited=38618, ret=0
Now you set DR0 using explicit tid, and second PT_SETDBREGS only affects the
main thread, while new thread 100820 inherits initial DR0.
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-09 11:50:07 UTC
Created attachment 219487 [details]
print the list of threads on each stop
Comment 10 Michał Górny 2020-11-09 12:29:37 UTC
Hmm, I should have mentioned that this affect LLDB right now.

The scenario where this happens is this:

1. Create breakpoint immediately after pthread_create().

2. When LLDB hits the breakpoint, create a watchpoint.

3. Resume.

The problem is that the new thread is already created at 2. but LLDB does not know about it, so the watchpoint is created only on thread 1.

We can solve this on LLDB end in either of two ways:

a. Relist all threads (via PT_GETLWPLIST) on every stop.  In this case, LLDB at point 2. would have the correct thread list even though the event is not yet delivered.

b. Copy dbregs to new threads in LLDB when they are reported.  In this case, the watchpoints are correctly propagated to new threads when LLDB becomes aware of these threads.


Regardless of this, it would be nice if kernel had predictable behavior here, i.e. reported the new thread before pthread_create() returned.
Comment 11 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-09 12:47:20 UTC
(In reply to Michał Górny from comment #10)
And again I do not understand what you are complaining about.

We stop target before new thread returns to userspace.  This is the moment where
TDB_BORN or TBD_SCX are checked.  So if you are tracing either syscall exits
or forks you get stop with SIGTRAP and pl_flag PL_FLAG_BORN.  More, I believe
that pl_lwpid is set to the lwpid of the new thread.

At that stop, you can set whatever registers in the newborn thread.
Comment 12 Michał Górny 2020-11-09 12:59:05 UTC
Watchpoints are global in LLDB.  If user sets a watchpoint, he expects it to apply to all threads, existing and new.  To achieve LLDB does the following:

1. When watchpoint is being set, it sets debug registers on all threads it knows of.

2. When a new thread is created, it assumes it inherited debug registers from the previous thread.

The problem is that there is a window when the new thread is created already but it is not reported to LLDB yet.  During this window, changes to debug registers do not apply to the new thread because LLDB does not know about it.

Firstly, I'm asking whether this is better worked around by having LLDB recheck the thread list on every stop (i.e. assume that LWP events are unreliable) or by having it explicitly set dbregs on every new thread (i.e. assuming that dbreg inheritance is unreliable).

Secondly, I believe this behavior is suboptimal.  If the kernel is 1) reporting new threads, and 2) making them inherit dbregs from parent, it would be reasonable to do it in a way that would permit programs to actually rely on that and not have to reimplement either of the two functions.
Comment 13 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-09 18:56:37 UTC
(In reply to Michał Górny from comment #12)
I believe the intended use of PT_SETDBREGS in combination with the new thread
tracking is to set debug registers to desired configuration on each PL_FLAG_BORN
event.  This should be most straight-forward.

WRT to the race itself, trying to close it would affect normal operation of
thread creation, as opposed to just causing minor inconvenience to the ptrace(2)
consumers.  We create thread, including the setup of all machine state, then
we fire it running, up to the point of running at trampoline returning to the
user space.  This moment is when we report TBD_BORN to ptracestop().  It is
important to do that stop from the context of the running new thread, so that
we already know that all hardware state is copied as needed.  Would we report
TDB_BORN before copying, it means that copying should occur under process lock
to guarantee consistency with potential modifications from debuggers, and we
do not want this.

Also I think existing consumers of TDB_BORN already expect that the event comes
from the running thread, not from the embryo.  So it really need be yet another
event to not break API.
Comment 14 Ed Maste freebsd_committer freebsd_triage 2020-11-09 21:04:05 UTC
See also LLVM review https://reviews.llvm.org/D91032
Comment 15 Pavel Labath 2020-11-10 12:56:51 UTC
Hi all. I have a semi-random peanut gallery comment.

This talk of two events has reminded me of how thread creating works on linux. There, the tracer indeed gets two events for thread creation, one on the main(parent) thread, and one for the newly created child thread.

The parent event (PTRACE_EVENT_CLONE) comes with the pid of the new thread, and it allows the tracer to wait for the creation event on the newly created thread, if he chooses to do that. Lldb chooses to do that, as it makes it easier to reason about the state of the process.

I am not very familiar with the ptrace model on freebsd (I understand there are significant differences), but it sounds like something like that might be useful here too...
Comment 16 Konstantin Belousov freebsd_committer freebsd_triage 2020-11-10 21:18:10 UTC
(In reply to Pavel Labath from comment #15)
Do you mean that events channel for Linux debugging facility is per-thread ?

In FreeBSD, ptrace(2)/waitpid(2) report all process events, with proper
qualification when it comes from specific thread.  As result, we do not really
need pre-notification about new thread to be created.  ptrace() produces a stop
in the context of new thread before it executes first userspace instruction,
informing debugger about lwpid and allowing it to make changes to the thread
state before it ever has chance to execute.

I suspect that the race that we discussed there either handled somewhat different
in Linux, or exists in Linux as well.  It is between the event from the context
of the new thread, and kernel taking snapshot of the state of the parent thread
to set child state.
Comment 17 Pavel Labath 2020-11-11 09:53:33 UTC
(In reply to Konstantin Belousov from comment #16)
> Do you mean that events channel for Linux debugging facility is per-thread ?
Yes, the linux ptrace interface deals with individual threads, essentially. It even allows funny things like tracing a subset of the threads of a process, tracing different threads by different processes, etc.

At first, I've found this setup strange, but now that I see the effort other kernels need to go to in order to present a sequentially consistent view of events that are inherently unsequenced, I am beginning to think that was the right decision after all.

But I digress...

> As result, we do not really need pre-notification about new thread to be created.

It's true that this event is not as critical here as in the linux case. However, the fact that the debugger cannot know whether its modification of the debug registers will be reflected in the threads that are (supposedly) created after it makes those modifications, makes the whole concept of inheriting debug registers pretty useless, imo.

That said, it probably doesn't make sense to change overall design because of this issue, as the workaround is pretty straight-forward.

> I suspect that the race that we discussed there either handled somewhat different in Linux, or exists in Linux as well.  It is between the event from the context of the new thread, and kernel taking snapshot of the state of the parent thread to set child state.

I am not completely sure what happens inside the kernel, but the way that lldb uses this interface (it forces a synchronization by waiting for both events, before doing anything else) means that the race (if it exists) does not have a chance to surface. I don't know what would happen if we actually tried to quickly start messing with the parent thread before the child thread is fully set up. It might be an interesting experiment...