Bug 201276 - truss(1): truss -f doesn't followed pdfork()ed descendents
Summary: truss(1): truss -f doesn't followed pdfork()ed descendents
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 10.1-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-02 13:00 UTC by David Drysdale
Modified: 2021-02-05 15:46 UTC (History)
5 users (show)

See Also:
emaste: mfc-stable10?


Attachments
Special case traced pdfork processes in wait6 (3.80 KB, patch)
2017-11-12 15:15 UTC, Jan Kokemüller
no flags Details | Diff
v2 of patch (2.02 KB, patch)
2018-10-13 12:48 UTC, Jan Kokemüller
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Drysdale 2015-07-02 13:00:12 UTC
I guess the tests around:
https://svnweb.freebsd.org/base/head/usr.bin/truss/amd64-fbsd.c?view=markup#l159
(and the other arch variants) need updating.
Comment 1 Ed Maste freebsd_committer freebsd_triage 2015-07-02 13:52:03 UTC
Review here: https://reviews.freebsd.org/D2976
Comment 2 commit-hook freebsd_committer freebsd_triage 2015-07-24 16:58:17 UTC
A commit references this bug:

Author: emaste
Date: Fri Jul 24 16:57:17 UTC 2015
New revision: 285842
URL: https://svnweb.freebsd.org/changeset/base/285842

Log:
  truss: follow pdfork()ed descendents with -f

  PR:		201276
  Reported by:	David Drysdale
  Reviewed by:	oshogbo
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D2976

Changes:
  head/usr.bin/truss/amd64-fbsd.c
  head/usr.bin/truss/amd64-fbsd32.c
  head/usr.bin/truss/arm-fbsd.c
  head/usr.bin/truss/i386-fbsd.c
  head/usr.bin/truss/mips-fbsd.c
  head/usr.bin/truss/powerpc-fbsd.c
  head/usr.bin/truss/powerpc64-fbsd.c
  head/usr.bin/truss/sparc64-fbsd.c
Comment 3 Ed Maste freebsd_committer freebsd_triage 2015-08-14 21:45:08 UTC
MFC'd to 10 in r286789
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2017-04-05 08:01:44 UTC
It looks like this was undone by r286698:
https://svnweb.freebsd.org/changeset/base/286698

truss -f uses waitid(P_ALL, WTRAPPED) to wait for children. When a PT_FOLLOW_FORK-traced process forks, the child reparents itself to the tracing process and stops, which should cause the tracing process to return from kern_wait6(). But it doesn't, so the child stays stuck in ptracestop().
Comment 5 John Baldwin freebsd_committer freebsd_triage 2017-06-03 15:57:26 UTC
I think the kernel change is wrong for a debugger.  The parent should not see the pid and need pdwait(), but a debugger like truss should be able to see them.  There is a question if this means that a process that can't use fork can use pdfork() and then ptrace(PT_TRACE_ME) in the child and be able to obtain the pid via wait(P_ALL) in the parent, but if you're blocking fork() but not ptrace() that may be what you are stuck with.
Comment 6 Jan Kokemüller 2017-11-12 15:15:22 UTC
Created attachment 187941 [details]
Special case traced pdfork processes in wait6

I've attached a (only very lightly tested) patch that makes "truss -f" work with pdfork. It acknowledges that traced processes created in PTRACE_FORK mode are a special case and makes their notifications visible to wait(2) again.

Some questions:
 - Why is P_TRACED cleared so early in exit1()? This makes it impossible to just check for P_TRACED in proc_to_reap() because some notifications near process exit would be missed. I've hacked around it by introducing another process flag that is set if P_TRACED was set.
 - This is also the reason why I check for P_WEXIT in procdesc_close and don't send SIGKILL in this case. Otherwise the traced process would be reparented to init and sent SIGCHLD to init instead of the tracing process.
 - I've seen that procdesc_free() is called two times in procdesc_close(). Is that correct?
Comment 7 Jan Kokemüller 2018-01-26 06:58:59 UTC
Has anybody had a chance to look at this yet? I've been using this patch for a few weeks now to truss(1) my capsicum enabled program.
Comment 8 John Baldwin freebsd_committer freebsd_triage 2018-01-26 19:37:31 UTC
I wonder if instead we should only hide procdesc processes from the parent.  If a debugger used pdfork() then it would be responsible for using pdwait().  The only non-parent's who can call wait() on a process are debuggers, so if the P_ALL check only bailed for 'p->p_pptr == curproc' (or however one finds the "real" parent now), then I think that might work without requiring the new flag?

I need to think more about the other change (avoiding reparent if the process is exiting).
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2018-10-10 16:58:26 UTC
It would be nice to fix this in 12.0: it makes debugging capsicumized processes rather challenging.  In my case I can't truss libcasper services because they're created using pdfork().
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2018-10-10 19:29:40 UTC
(In reply to Jan Kokemüller from comment #6)
I'm not sure why P_TRACED is cleared when it is.  Did you try deferring the clear?

Regarding the procdesc_free() calls: yes, I believe that's correct.  procdescs are initialized with 2 references: one from the file descriptor and one from the process.  procdesc_close() may need to release both.

There seem to be other problems here; for instance, procdesc_close() always reparents to init(8) instead of the process' reaper.  Also, with the patch, if someone closes a procdesc for a traced process and that traced process exits, the "closed && parent not init" assertion in procdesc_exit() will trigger.
Comment 11 Jan Kokemüller 2018-10-13 12:48:54 UTC
Created attachment 198099 [details]
v2 of patch

Deferring the clear of the P_TRACED flag to proc_reap() seems to work. This is also the place where the parent process gets reset to the "real" parent, so there is symmetry here. FWIW, all of OpenBSD, NetBSD and Dragonfly defer the clear to this place now.

To fix procdesc_close() I guess it's enough to check for P_TRACED to omit the reparenting to a reaper? Is replacing "initproc" by "p->p_reaper" enough to fix the "always reparent to init" bug?

I haven't tried John's suggestion yet. With P_TRACED at the new position, wouldn't this be more or less equivalent to checking for P_TRACED?
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2018-10-16 17:35:37 UTC
(In reply to Jan Kokemüller from comment #11)
Indeed, I believe s/initproc/p->p_reaper/ is sufficient for the reaper bug.  I'll fix it first since it's more straightforward: https://reviews.freebsd.org/D17589

I'm looking at your patch now.
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2018-10-16 18:05:30 UTC
The patch looks reasonable to me and passes a couple of tests that I wrote.  Some comments:

- I would not reference PT_FOLLOW_FORK in the comment; the problem applies to ptrace() generally.
- If I understand John's comment correctly, the check in proc_to_reap() should instead be "if (p->p_procdesc != NULL && td->td_proc == proc_realparent(p))", and I tend to agree.
- I believe we can and should assert that P_TRACED is set before clearing it in proc_reap().

I'll do some more testing with these modifications applied.
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2018-10-16 22:36:14 UTC
(In reply to Mark Johnston from comment #13)
My third comment is not right.  Consider the case where C is a child of P, and is traced.  If P exits, the kernel clears P_TRACED on C and sends SIGKILL to C.  Then (p->p_oppid != 0 && p->p_oppid != p->p_pptr->p_pid) is true for p = C, but C does not have P_TRACED set.

On the other hand, if P is tracing C and C exits, we end up leaving P_TRACED set on C through proc_reap() since p->p_oppid == p->p_pptr->p_pid.  So the latest patch is leaking P_TRACED in some cases.

My suspicion is that the code which sends SIGKILL to C in exit1() should set q->p_oppid = 0 in addition to clearing P_TRACED.  There's quite a lot of open-coded process state updates related to ptrace(), though, so it's hard to be sure. :(
Comment 15 Mark Johnston freebsd_committer freebsd_triage 2021-02-05 15:46:12 UTC
I believe this was fixed by commit 8e49361164e2da0e0647.