Bug 195222

Summary: dtrace kernel sdt probes must always be anchored
Product: Base System Reporter: Andriy Gapon <avg>
Component: kernAssignee: Mark Johnston <markj>
Status: Open ---    
Severity: Affects Some People CC: emaste, gonzo, markj
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Andriy Gapon freebsd_committer 2014-11-20 16:18:40 UTC
DTrace has two broad categories of probes, anchored and unanchored.  An anchored probes is associated with a certain place in code (point of execution) while an unanchored probe does not have such an association.  For example, probes provided by profile provider are unanchored.  On the other hand fbt probes are always anchored to function entry and exit points.

Given this definition, all sdt probes must be anchored because each probe is placed at a particular point of execution.

Some things are done differently by the DTrace code depending on whether a probe is anchored or not.  The code test if the probe is anchored based on whether it is tied to a function (see DTRACE_ANCHORED() macro).  But the current FreeBSD sdt implementation allows the function name to be left unset (empty).  Thus a fundamentally anchored sdt probe would appear as an unanchored probe to the DTrace code.

The above may lead to incorrect behavior.
For example try the following:
1. load dtrace_test.ko module
2. run this command: dtrace -n 'test:dtrace_test::sdttest { stack(); exit(0); }'
3. trigger the probe: sysctl debug.dtracetest.sdttest=1

Observe an incorrect stack trace as a result.

If definition of the probe is changed to include a function name, then stack() would provide an expected result.

Thus, FreeBSD sdt implementation needs to be change to always define function name for a probe.  Preferably, the function name should be set automatically.  For example, module name is automatically determined if left empty.

Ideally, FreeBSD sdt interface should not ask for module and function names to be provided at all.
Comment 1 Mark Johnston freebsd_committer 2015-01-13 23:14:49 UTC
I've done some work towards solving this problem, together with implementing Solaris-style SDT probes. By the latter I'm referring to the technique of overwriting SDT probe sites with NOPs until the probes are enabled.

The core of the change is the move from

#define SDT_PROBE(...)
        if (sdt_<probe identifier> != NULL)
                sdt_<probe identifier>(<args>);


#define SDT_PROBE(...)
        __dtrace_probe_<probe identifier>(<args>);

In the new implementation, we call a function directly instead of through a pointer, and the call is unconditional.

The main challenge is in identifying probe sites (represented by struct sdt_instance in my patch). We need to define the __dtrace_probe_* stub functions so that the kernel will link, but we also want to be able to find all of the calls to __dtrace_probe_* so that we know where the probe sites are actually located.

Trying to do this after the kernel (or a kld) has been linked seems challenging - the only way to find __dtrace_probe_* calls would be to disassemble the text section and look for calls/jmps to the address of a __dtrace_probe_* stub. What Solaris does instead is look for relocations against the __dtrace_probe_* stubs, and handle them specially: instead of processing the relocation, the target offset is overwritten with NOPs, and the location is recorded for use by the SDT kernel code.

I've written a userland program, sdtconvert (better name suggestions are welcome), which does exactly this. It is intended to be run against all object files in a kld immediately prior to linking. It looks for the relocations described above, overwrites the probe sites with NOPs, and records each probe site in a linker set, sdt_instances_set. Each struct sdt_instance in the set contains a pointer to the corresponding sdt_probe (filled in with a relocation emitted by sdtconvert), and the offset of the probe site within the kld.

I've also changed sdt.ko to examine the sdt_instances set when it processes a kld. For each instance, we use the kernel linker to look up the name of the function containing the probe site, and create a probe with the correct function name. sdtconvert has already overwritten the probe sites with NOPs, so sdt.ko's responsibility is to insert/remove a breakpoint at the probe site whenever the probe is enabled/disabled.

As a result I now get the following:

markj@charmander: /usr/home/markj $ sudo dtrace -l -P tcp
   ID   PROVIDER            MODULE                          FUNCTION NAME
36965        tcp            kernel                    tcp_do_segment accept-established
36966        tcp            kernel                    tcp_do_segment connect-established
36967        tcp            kernel                    tcp_do_segment connect-refused
36968        tcp            kernel                         tcp_input receive
36969        tcp            kernel                        tcp_output send
36970        tcp            kernel                        tcp_output connect-request
36971        tcp            kernel                  tcp_state_change state-change
36972        tcp            kernel                          tcp_drop state-change
36973        tcp            kernel                       tcp_respond send
36974        tcp            kernel                       tcp_respond accept-refused

In particular, the function names are filled in automatically. Probes that specify their own function name are left alone.

sdtconvert.c: https://github.com/markjdb/sdtconvert/blob/master/sdtconvert.c
current FreeBSD changes (includes sdtconvert integration): http://people.freebsd.org/~markj/patches/sdt-proper/sdt-proper-1.diff

The FreeBSD changes are on branch 20141214-sdt-proper in github.com/markjdb/freebsd.

Note that this is still a WIP; I haven't yet written the kernel code to enable/disable probes by overwriting the probe site. But that's pretty straightforward.

One thing I haven't figured out is how to handle compatibility. There are a number of SDT probes that specify the module and/or function, even though they're not supposed to. I'd like to rename them to avoid abusing the function and module components, but that would obviously break any script which uses these probes (though they aren't documented and mostly likely aren't widely used). It's preferable to just leave them alone, but I'd also like to have a new KPI for SDT probes that doesn't allow the module or function name to be specified, and convert "correct" SDT providers (io, ip, tcp, udp, proc, sched, ...) to use them. Then the old SDT(9) KPI would be deprecated.

Note that the work done so far only supports amd64.

Any thoughts?
Comment 2 Andriy Gapon freebsd_committer 2015-04-23 14:20:29 UTC
(In reply to Mark Johnston from comment #1)

sorry for not getting back sooner.
It so happened that when opened this bug report I also did some quick hacking along the same lines as your work:

This is also WIP.
Some things that I considered which are not done:
- remove function parameter from SDT probes, so that it is not set by a programmer but is determined automatically
- #BP could be used instead of #UD to trigger an SDT probe

Instead of sdtconvert like yours I used a shell script that generates an assembly file that defines a linker set pointing to all the interesting addresses.

Unfortunately, I do not have much time to work at this now.
I hope that eventually one of us would be able to converge the changes and commit the result.

One last note, I cared only about amd64 in my changes.  Obviously, all supported architectures have to be covered.  Or perhaps we could support both ways of implementing SDT probes, so that architectures could be converted one by one.
Comment 3 Mark Johnston freebsd_committer 2015-05-18 00:51:35 UTC
(In reply to Andriy Gapon from comment #2)

Hi Andriy,

Glad to see that you've been working on this too. I have some more time now and
I'm going to try and finish this work in the next few weeks.

> - remove function parameter from SDT probes, so that it is not set by a programmer but is determined automatically

I'd very much like to do this (and remove the module parameter too), but there are a number of existing probes that abuse these parameters; they can't be modified without changing the probe identifiers. My implementation automatically sets the function name correctly if it's not specified. Maybe it's worth the pain to fix this properly in 11 though.

> Instead of sdtconvert like yours I used a shell script that generates an assembly file that defines a linker set pointing to all the interesting addresses.

That's definitely simpler than what I did. Just a couple of notes:
- At the moment the probe stubs are empty functions, so we'll (probably) crash if they're called before patching is done. This can be fixed by changing them to just ret. Even better, each stub function should be an alias for a single no-op function defined in kern_sdt.c, so that we don't bloat the text section more than necessary.
- The patching should probably be done by the kernel rather than sdt.ko. It could be done from a SYSINIT method in kern_sdt.c.
- The probe calls may be tail calls, so it's not always correct to overwrite the probe site with nops - sometimes it needs to be a ret.
- Some more work needs to be done to handle KLDs and cross-builds, but don't see any issues there wrt your approach.

> One last note, I cared only about amd64 in my changes.  Obviously, all supported architectures have to be covered.  Or perhaps we could support both ways of implementing SDT probes, so that architectures could be converted one by one.

I've only handled amd64 so far as well. I'd prefer to try and make it work for all arches rather than maintaining two different SDT implementations, since they end up being fairly different, and sdt.h is already a mess as it is. :)

I'll try to get your changes working and post an updated patch soon.
Comment 4 commit-hook freebsd_committer 2015-09-29 12:02:54 UTC
A commit references this bug:

Author: avg
Date: Tue Sep 29 12:02:23 UTC 2015
New revision: 288363
URL: https://svnweb.freebsd.org/changeset/base/288363

  std: it is important that func name is never an empty string

  otherwise DTRACE_ANCHORED() returns false and that makes stack()
  insert a bogus frame at the top.
  For example:
  dtrace -n 'test:dtrace_test::sdttest { stack(); }

  This change is not really a solution, but just a work-around.
  The real solution is to record the probe's call site and to use
  that for resolving a function name.

  PR:		195222
  MFC after:	22 days

Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:48:44 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
- Untouched since 2018-01-01.
- Affects Base System OR Documentation


Reset to open status.

I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 6 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 09:59:17 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Comment 7 Andriy Gapon freebsd_committer 2019-01-21 10:04:21 UTC
The main problem has not been fixed.