Bug 203772 - panic: assertion failed: agg->dtag_hasarg, file: /usr/src/sys/cddl/dev/dtrace/dtrace_ioctl.c, line: 141
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Mark Johnston
Reported: 2015-10-14 15:53 UTC by Danilo Egea Gondolfo
Modified: 2016-02-05 02:04 UTC
Attachments
core.txt (191.95 KB, text/plain)
2015-10-14 15:53 UTC, Danilo Egea Gondolfo

Description Danilo Egea Gondolfo freebsd_committer freebsd_triage 2015-10-14 15:53:58 UTC
Created attachment 162042
core.txt

Assert triggered when /usr/share/dtrace/disklatency is executed. core.txt attached.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2015-10-15 04:33:40 UTC
It seems that the problem has to do with the fact that args[1]->device_name is not always set in the io:::{start,end} probes. For some reason, it only causes a crash when there are multiple aggregation keys. For instance,

dtrace -n 'io:::start {@[args[1]->device_name, execname] = count()}'

triggers the crash for me, but

dtrace -n 'io:::start {@[args[1]->device_name] = count()}'

does not.

I'm also not sure why this problem has only appeared recently. I've definitely run this script before: it triggered the bug fixed in r278114. The first example in the dtrace_io(4) man page also triggers this crash. (And I definitely tested it when I wrote it!) Nonetheless, reverting several recent DTrace changes doesn't make the problem go away.
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2015-10-16 04:52:20 UTC
Digging somewhat further, it seems the problem has to do with the encoding of
array lengths. Contrary to what I thought last night, struct devstat's device_name
field is an array, not a pointer. And it turns out that all the array types in
the kernel's CTF info have length 0 for some reason. kgdb shows the same thing, so
I'm guessing it's related to the clang 3.7 import. Interestingly, the new kgdb
support has the correct array sizes:

kgdb710:
(kgdb) ptype struct devstat
type = struct devstat {
    u_int sequence0;
    int allocated;
    u_int start_count;
    u_int end_count;
    struct bintime busy_from;
    struct {
        struct devstat *stqe_next;
    } dev_links;
    u_int32_t device_number;
    char device_name[16];
    ...

kgdb (base):
(kgdb) ptype struct devstat
type = struct devstat {
    u_int sequence0;
    int allocated;
    u_int start_count;
    u_int end_count;
    struct bintime busy_from;
    struct {
        struct devstat *stqe_next;
    } dev_links;
    u_int32_t device_number;
    char device_name[0];
    ...

I've fixed a few bugs related to zero-length arrays in the CTF code in the past,
so it's not surprising that it's causing problems for DTrace. There are multiple
problems here: we need to figure out why the array lengths are now zero, and
fix DTrace to handle this case properly (i.e. without crashing the kernel).
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2015-10-16 05:17:02 UTC
Seems that clang switched from using DW_AT_upper_bound to DW_AT_count:

< 1><0x0000294d>    DW_TAG_array_type           
                      DW_AT_type                  <0x00000137>
< 2><0x00002952>      DW_TAG_subrange_type
                        DW_AT_type                  <0x0000013f>
                        DW_AT_upper_bound           16

vs.

< 1><0x0000410d>    DW_TAG_array_type
                      DW_AT_type                  <0x000000c0>
< 2><0x00004112>      DW_TAG_subrange_type
                        DW_AT_type                  <0x00000100>
                        DW_AT_count                 0x00000010

and sure enough, the CTF tools only recognize the former. That should be straightforward to fix.
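
For illustration only, here is a minimal sketch of how a converter could derive an array's element count from either attribute. It is not the ctfconvert code: struct subrange and subrange_nelems() are hypothetical names, and the sketch assumes the usual DWARF convention that DW_AT_upper_bound is the highest valid index (so the count is the bound plus one for C's lower bound of 0), while DW_AT_count is the element count itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified view of a DW_TAG_subrange_type DIE.
 * A real converter would read these attributes through a DWARF library.
 */
struct subrange {
	bool		has_upper_bound;
	uint64_t	upper_bound;	/* DW_AT_upper_bound: highest valid index */
	bool		has_count;
	uint64_t	count;		/* DW_AT_count: number of elements */
};

/*
 * Return the number of elements described by the subrange.
 * Falls back to 0 (unknown length) when neither attribute is present,
 * which is the value a converter reports if it only checks for
 * DW_AT_upper_bound and the compiler emitted DW_AT_count instead.
 */
static uint64_t
subrange_nelems(const struct subrange *sr)
{
	if (sr->has_upper_bound)
		return (sr->upper_bound + 1);	/* assumes lower bound 0 */
	if (sr->has_count)
		return (sr->count);
	return (0);
}
```

Under these assumptions, a subrange with only DW_AT_upper_bound set to 15 and a subrange with only DW_AT_count set to 0x10 both yield 16 elements, and a subrange with neither attribute yields 0.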
Comment 4 commit-hook freebsd_committer freebsd_triage 2015-10-24 03:15:00 UTC
A commit references this bug:

Author: markj
Date: Sat Oct 24 03:14:36 UTC 2015
New revision: 289866
URL: https://svnweb.freebsd.org/changeset/base/289866

Log:
  DWARF emitted by clang 3.7 encodes array sizes using the DW_AT_count
  attribute rather than DW_AT_upper_bound. Teach ctfconvert about this so that
  array type sizes are encoded correctly.

  PR:		203772
  MFC after:	1 week

Changes:
  head/cddl/contrib/opensolaris/tools/ctf/cvt/dwarf.c
Comment 5 Danilo Egea Gondolfo freebsd_committer freebsd_triage 2015-10-26 21:04:10 UTC
It's working now, thanks!
Comment 6 commit-hook freebsd_committer freebsd_triage 2015-11-13 01:28:15 UTC
A commit references this bug:

Author: markj
Date: Fri Nov 13 01:27:20 UTC 2015
New revision: 290738
URL: https://svnweb.freebsd.org/changeset/base/290738

Log:
  MFC r289866:
  DWARF emitted by clang 3.7 encodes array sizes using the DW_AT_count
  attribute rather than DW_AT_upper_bound. Teach ctfconvert about this so that
  array type sizes are encoded correctly.

  PR:           203772

Changes:
_U  stable/10/
  stable/10/cddl/contrib/opensolaris/tools/ctf/cvt/dwarf.c
Comment 7 Danilo Egea Gondolfo freebsd_committer freebsd_triage 2016-01-12 16:21:27 UTC
Closing. Problem fixed. Thanks!