Bug 192906 - [dtrace] running dtest.pl causes VM to sporadically reboot
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
Reported: 2014-08-22 04:47 UTC by Enji Cooper
Modified: 2014-08-28 03:26 UTC
Description Enji Cooper freebsd_committer 2014-08-22 04:47:50 UTC
Running dtest.pl causes my vanilla-ish VMware VM to sporadically reboot when the tests are executing.

Repro:

1. Build and install a kernel with the cyclic, dtrace, opensolaris "modules" in MODULES_OVERRIDE.
2. Run the following commands as root:

% pkg install -y ksh93 nawk
% git clone https://github.com/yaneurabeya/freebsd/
% cd freebsd
% git checkout isilon-atf-integrate-dtrace
% export WITH_TESTS=yes
% make hier
% cd cddl/tests/dtrace
% make obj
% make depend
% make all
% make install
% cd /usr/tests/cddl/dtrace
% kyua test

Expected result:

The tests should complete.

Actual result:

The VM reboots. VMware Workstation reports an uncaught/unhandled hardware fault generated by either the host hardware or the guest OS.
Comment 1 Enji Cooper freebsd_committer 2014-08-22 04:49:08 UTC
Note from Anton Rang at Isilon (Isilon had fixed a similar, potentially identical, issue in 7.x):

From: Rang, Anton
Sent: Thursday, August 21, 2014 11:12 PM
To: 'Cooper, Garrett'
Subject: dtrace bug

When dtrace is running, it’s important that traps be caught by dtrace if required. This is implemented via hook variables such as dtrace_trap_func and dtrace_doubletrap_func. However, these can’t be checked from normal C functions, since dtrace may add a probe at the start of the function, which causes a trap before the check, which causes an infinite loop.

For dtrace_trap_func, a special trap_check function is used – dtrace knows about this name and avoids instrumenting it, thus the trap occurs and dtrace is invoked without an intervening probe.

The same needs to be done for dtrace_dbltrap_func.
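
For illustration only, here is a userland sketch of the pattern described in the note above; it is not the kernel code. The names trap_hook, has_entry_probe, deliver_trap, claim_all and NEST_LIMIT are made up for this sketch; only trap_check and dtrace_trap_func come from the note. It assumes the worst case, where every function not on the exclusion list has an entry probe enabled, and it uses a nesting limit to stand in for the infinite loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel hook such as dtrace_trap_func. */
typedef bool (*trap_hook_t)(int trapno);
static trap_hook_t trap_hook;		/* NULL while no tracer is attached */

/* Function names the simulated tracer refuses to instrument. */
static const char *const no_probe[] = { "trap_check" };

bool
has_entry_probe(const char *fn)
{
	/* Assumed worst case: anything not excluded carries an entry probe. */
	for (size_t i = 0; i < sizeof(no_probe) / sizeof(no_probe[0]); i++)
		if (strcmp(fn, no_probe[i]) == 0)
			return (false);
	return (true);
}

enum { NEST_LIMIT = 16 };		/* stands in for "loops forever" */

/*
 * Deliver one trap through a checker function named `checker`.
 * Returns the nesting depth at which the hook was consulted, or -1
 * if the checker's own entry probe kept raising new traps.
 */
int
deliver_trap(const char *checker, int trapno, int depth)
{
	assert(checker != NULL);
	if (depth >= NEST_LIMIT)
		return (-1);
	if (has_entry_probe(checker))	/* entering the checker traps again */
		return (deliver_trap(checker, trapno, depth + 1));
	if (trap_hook != NULL)
		(void)trap_hook(trapno);	/* tracer gets first look at the trap */
	return (depth);
}

/* Example hook: claims every trap and only records that it ran. */
static int hook_calls;

bool
claim_all(int trapno)
{
	(void)trapno;
	hook_calls++;
	return (true);
}
```

With trap_hook set to claim_all, deliver_trap("trap_check", 3, 0) reaches the hook at depth 0, while any checker name that is not on the exclusion list never reaches the hook and returns -1.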
Comment 2 Enji Cooper freebsd_committer 2014-08-22 23:58:45 UTC
This also causes a stable/10 VM to reboot with a double fault.
Comment 3 Mark Johnston freebsd_committer 2014-08-23 00:17:32 UTC
Does this happen if you run the legacy test suite from tools/test/dtrace? I can't reproduce this using kyua, but I didn't manage to get the test suite to complete either; it runs some tests that seem to never terminate. In the legacy suite, some tests are excluded, so this problem isn't encountered.

Also, which revision are you running? The bug that Anton mentioned should be fixed in r268600/r268869. I'm confused by the reference to dtrace_dbltrap_func though; DTrace does indeed have a double fault handler, but DTrace should never trigger a double fault.

There have also been some other recent fixes (r269525 and r270024) which are relevant.
Comment 4 Enji Cooper freebsd_committer 2014-08-26 21:59:15 UTC
(In reply to Mark Johnston from comment #3)
> Does this happen if you run the legacy test suite from tools/test/dtrace? I
> can't reproduce this using kyua, but I didn't manage to get the test suite
> to complete either; it runs some tests that seem to never terminate. In the
> legacy suite, some tests are excluded, so this problem isn't encountered.

Yes. Thanks for the reminder about that directory (hadn't realized that it was committed back in concert with some of the other changes we made).

> Also, which revision are you running? The bug that Anton mentioned should be
> fixed in r268600/r268869. I'm confused by the reference to
> dtrace_dbltrap_func though; DTrace does indeed have a double fault handler,
> but DTrace should never trigger a double fault.

I'm just referencing the panic message/message from VMware.
 
> There have also been some other recent fixes (r269525 and r270024) which are
> relevant.

I'm running CURRENT as of last week, but it doesn't include these changes (I'll pull them in and retest).
Comment 5 Enji Cooper freebsd_committer 2014-08-26 22:00:59 UTC
(In reply to Garrett Cooper from comment #4)
> (In reply to Mark Johnston from comment #3)
> > Does this happen if you run the legacy test suite from tools/test/dtrace? I
> > can't reproduce this using kyua, but I didn't manage to get the test suite
> > to complete either; it runs some tests that seem to never terminate. In the
> > legacy suite, some tests are excluded, so this problem isn't encountered.

Another note: compiling the tests under tools/test/dtrace with clang doesn't work. They need to be fixed to build with compilers other than gcc (I have some fixes outstanding on my GitHub fork, but they still need to be polished and pushed back upstream).
Comment 6 Enji Cooper freebsd_committer 2014-08-28 02:27:26 UTC
After I applied the compilation fixes I noted in the related bug and updated to the latest sources, I was able to get the tests to complete without crashing my box; there are a large number of failures that need to be resolved, but I'll follow up with that in another bug...
Comment 7 Enji Cooper freebsd_committer 2014-08-28 02:28:35 UTC
Sorry... neglected to provide the important information:

$ uname -a
FreeBSD freebsd-11-x64.localdomain 11.0-CURRENT FreeBSD 11.0-CURRENT #12 r270674+0129dfc(isilon-atf-integrate-dtrace): Tue Aug 26 16:50:36 PDT 2014     root@freebsd-11-x64.localdomain:/usr/obj/usr/src/sys/GENERIC-DEBUG  amd64
Comment 8 Enji Cooper freebsd_committer 2014-08-28 03:26:21 UTC
(In reply to Garrett Cooper from comment #6)
> After I applied the compilation fixes I noted in the related bug and updated
> to the latest sources, I was able to get the tests to complete without
> crashing my box; there are a large number of failures that need to be
> resolved, but I'll follow up with that in another bug...

Most of the failures are because the tests call ksh where it should be ksh93.