Summary: | [dtrace] Certain llquantize() parameters trigger assertion | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Fabian Keil <fk> | ||||
Component: | bin | Assignee: | Mark Johnston <markj> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | dtrace, emaste, gnn, markj | ||||
Priority: | --- | Keywords: | patch | ||||
Version: | 11.0-STABLE | Flags: | koobs:
mfc-stable11+
|
||||
Hardware: | Any | ||||||
OS: | Any | ||||||
Attachments: |
|
Slightly simpler repro that we can put in a test case: # dtrace -q -n 'syscall:::return {@[execname] = llquantize(arg0, 100, 0, 10, 100);} tick-5s {exit(0);}' Assertion failed: (value < next), file /mnt/ssd/markj/src/freebsd-dev/dev/graphics/cddl/contrib/opensolaris/lib/libdtrace/common/dt_aggregate.c, line 248. Abort trap It seems the real issue here is an overflow when computing 100^10? 9 is the largest power of 100 that fits in a 64-bit int. There's a check that's supposed to catch this. Indeed, it does if we change the upper power from 10 to 11: git (dev/vm-init) markj@wkstn-mjohnston> sudo dtrace -n 'syscall:::return {@[execname] = llquantize(arg0, 100, 0, 11, 100);} tick-1s {exit(0);}' dtrace: invalid probe specifier syscall:::return {@[execname] = llquantize(arg0, 100, 0, 11, 100);} tick-1s {exit(0);}: llquantize( ) factor (100) raised to power of high magnitude (11) overflows 64-bits Right, so there's an off-by-one in that check. However, fixing that isn't sufficient: we also hit an overflow with a maximum power of 9. That's because the upper bound ends up being base^{max+1}. For instance, with a 4-tuple (10, 0, 3, 10) we show buckets up to 10000, i.e., 10^4. So it's actually an off-by-two. A commit references this bug: Author: markj Date: Mon Aug 21 21:56:03 UTC 2017 New revision: 322773 URL: https://svnweb.freebsd.org/changeset/base/322773 Log: Fix an off-by-two in the llquantize() action parameter validation. The aggregation created by llquantize() partitions values into buckets; the lower bound of the bucket containing the largest values is b^{m+1}, where b and m are the second and fourth parameters to the action, respectively. Bucket bounds are stored in a 64-bit integer, and so the llquantize() validation checks need to verify that b^{m+1} fits in 64 bits. However, it was only verifying that b^{m-1} fits in 64 bits, so certain parameter combinations could trigger assertion failures in libdtrace. PR: 219451 MFC after: 1 week Changes: head/cddl/contrib/opensolaris/lib/libdtrace/common/dt_cc.c MFCed to stable/11 in r323012. |
Created attachment 182796 [details] libdtrace: Prevent an assertion from triggering with certain llquantize() parameters On a system based on r318579/0c33b79a4 the following dtrace command reliably triggers an assertion when printing output: fk@t520 ~ $sudo dtrace -n 'syscall::read:return /execname == "privoxy"/ { @[execname] = llquantize(arg0, 100, 0, 10, 100); @m = max(arg0)}' [...] 9800 | 0 9900 | 0 10000 |@@@@@@@@@@@@@@@@@@@@ 37 20000 | 0 Assertion failed: (value < next), file /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c, line 1083. Abort trap (gdb) where #0 0x00000008011effda in thr_kill () from /lib/libc.so.7 #1 0x00000008011effa4 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52 #2 0x00000008011eff19 in abort () at /usr/src/lib/libc/stdlib/abort.c:65 #3 0x000000080088c3b2 in __assert (expr=0x8008d3172 "value < next", file=0x8008d3078 "/usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c", line=1083) at /usr/src/cddl/lib/libdtrace/../../../cddl/compat/opensolaris/include/assert.h:56 #4 0x000000080088c190 in dt_print_llquantize (dtp=0x802633000, fp=0x8014c37e8, addr=0x80269a110, size=7840, normal=1) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume .c:1083 #5 0x000000080088e37d in dt_print_datum (dtp=0x802633000, fp=0x8014c37e8, rec=0x8026900e8, addr=0x80269a110 "d", size=7848, aggdata=0x802690150, normal=1, pd=0x7fffffffe750) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c:2211 #6 0x000000080088dc12 in dt_print_aggs (aggsdata=0x7fffffffe630, naggvars=1, arg=0x7fffffffe750) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c:2313 #7 0x000000080088e6cf in dt_print_agg (aggdata=0x802690150, arg=0x7fffffffe750) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c:2361 #8 0x0000000800895f8b in dt_aggregate_walk_sorted (dtp=0x802633000, func=0x80088e610 <dt_print_agg>, arg=0x7fffffffe750, sfunc=0x0) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_aggregate.c:1585 #9 0x0000000800895d39 in dtrace_aggregate_walk_sorted (dtp=0x802633000, func=0x80088e610 <dt_print_agg>, arg=0x7fffffffe750) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_aggregate.c:1605 #10 0x0000000800897f12 in dtrace_aggregate_print (dtp=0x802633000, fp=0x8014c37e8, func=0x800895d10 <dtrace_aggregate_walk_sorted>) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_aggregate.c:2130 #11 0x0000000000403a5e in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/cddl/usr.sbin/dtrace/../../../cddl/contrib/opensolaris/cmd/dtrace/dtrace.c:2005 (gdb) f 4 #4 0x000000080088c190 in dt_print_llquantize (dtp=0x802633000, fp=0x8014c37e8, addr=0x80269a110, size=7840, normal=1) at /usr/src/cddl/lib/libdtrace/../../../cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume .c:1083 1083 assert(value < next); (gdb) p step $2915 = 77662796314522419 (gdb) p value $2916 = 7834326075677972872 (gdb) p next $2917 = 7766279631452241920 It works as expected when replacing the 10 with a 5. Various other parameter combinations work as expected as well and I've used similar commands for weeks without issues. The problem is reproducible with other execnames as long as the probe fires. The "@m = max(arg0)" part isn't required to trigger the assertion but I only noticed it after already patching the system where libdtrace is build with reduced optimizations. The attached patch prevents the assertion from triggering but may not be the best solution. The code flow in dt_print_llquantize() seems strange to me and maybe the loop should break if "bin" reaches "last_bin" instead. My impression is that it does a bunch of cycles at the end without doing meaningful work. Obtained from: ElectroBSD