Bug 206128

Summary: vmstat -z is truncated when stdout is not terminal
Product: Base System Reporter: Jan Beich <jbeich>
Component: binAssignee: Phil Shafer <phil>
Status: Closed FIXED    
Severity: Affects Some People CC: allanjude, fk, markj, ngie, phil, rodrigc
Priority: --- Keywords: regression
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
vmstat: Prevent output truncation when piping zone statistics
none
File created with "script vmstat-working vmstat -z"
none
File created with "script vmstat-not-working sh -c 'vmstat -z | cat'" none

Description Jan Beich freebsd_committer freebsd_triage 2016-01-11 10:01:53 UTC
After base r291090 non-libxo output for -z option is unreliable.

$ vmstat -z | wc -l
     113

$ vmstat -z | tail -3
16384:                16384,      0,       6,       6,  368241,   0,   0
16384:                16384,      0,       1,       5,    1088,   0,   0
32768:         

$ script -q vmstat.out vmstat -z
[snipped]

$ wc -l vmstat.out
     326 vmstat.out

$ fgrep -m1 32768: vmstat.out
32768:                32768,      0,       0,       0,       0,   0,   0
Comment 1 Jan Beich freebsd_committer freebsd_triage 2016-01-11 10:29:17 UTC
Only happens with MALLOC_CONF=junk:false, see malloc(3) and MALLOC_PRODUCTION.
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2016-01-15 18:00:43 UTC
I've noticed this too, but only on some of my systems. The output ends in the same place, i.e., after we start printing the 32KB malloc bucket stats. It does not occur if vmstat's stdout is a terminal.

I haven't dug into this very deeply yet, but I noticed that removing the field width for the zone name makes the problem go away:

1528                 xo_open_instance("zone");
1529                 xo_emit("{d:name/%-20s}{ke:name/%s} {:size/%6" PRIu64 "}, "
1530                         "{:limit/%6" PRIu64 "},{:used/%8" PRIu64 "},"
1531                         "{:free/%8" PRIu64 "},{:requests/%8" PRIu64 "},"
1532                         "{:fail/%4" PRIu64 "},{:sleep/%4" PRIu64 "}\n", name,

So this seems like a libxo bug.
Comment 3 Fabian Keil 2016-05-22 15:04:24 UTC
Created attachment 170546 [details]
vmstat: Prevent output truncation when piping zone statistics

The attached patch works around the issue by frequently calling xo_flush().
Comment 4 Enji Cooper freebsd_committer freebsd_triage 2016-05-22 18:24:28 UTC
Adding phil@ for feedback on unexpected behavior with libxo not flushing quickly when the buffer is full.
Comment 5 Phil Shafer freebsd_committer freebsd_triage 2016-05-22 18:52:33 UTC
I'll take a look tonight.

Thanks,
 Phil
Comment 6 Phil Shafer freebsd_committer freebsd_triage 2016-05-22 19:06:43 UTC
No luck reproducing this, even with MALLOC_CONF=junk:false.  I'll dig more later tonight.  What grn are you seeing this at?

Thanks,
 Phil
Comment 7 Fabian Keil 2016-05-26 09:47:52 UTC
After reverting the patch again, I still see the problem with
a system based on 11-CURRENT r300376.

You could try running:

while vmstat -z | wc -c; do sleep 60; done

and wait until it prints 8192 which indicates a truncation.

Usually the output stays the same for a while, so if you
see 8192 once, there's a fair chance that you'll be able
to reproduce the problem with a debugger attached.

Yesterday I saw the truncation issue for more than 80 minutes
until it went away again.
Comment 8 Phil Shafer freebsd_committer freebsd_triage 2016-05-26 15:14:34 UTC
Could you please send me the working and failing output?

Thanks,
 Phil
Comment 9 Fabian Keil 2016-05-26 15:27:04 UTC
Created attachment 170683 [details]
File created with "script vmstat-working vmstat -z"
Comment 10 Fabian Keil 2016-05-26 15:28:15 UTC
Created attachment 170684 [details]
File created with "script vmstat-not-working sh -c 'vmstat -z | cat'"
Comment 11 Phil Shafer freebsd_committer freebsd_triage 2016-05-28 06:42:57 UTC
Was finally able to reproduce this and have a fix.  I was using the wrong value when checking for available buffer space:

diff --git a/libxo/libxo.c b/libxo/libxo.c
index 9df67de..7a714b1 100644
--- a/libxo/libxo.c
+++ b/libxo/libxo.c
@@ -2816,7 +2816,7 @@ xo_format_string (xo_handle_t *xop, xo_buffer_t *xbp, xo_xff_flags_t flags,
         * but if we did the work ourselves, then we need to do it.
         */
        int delta = xfp->xf_width[XF_WIDTH_MIN] - cols;
-       if (!xo_buf_has_room(xbp, delta))
+       if (!xo_buf_has_room(xbp, xfp->xf_width[XF_WIDTH_MIN]))
            goto bail;

        /*

In addition, I'm adding a high-water mark that will trigger a flush:

@@ -6128,6 +6128,12 @@ xo_do_emit_fields (xo_handle_t *xop, xo_field_info_t *fields,

     XOIF_CLEAR(xop, XOIF_REORDER);

+    /*
+     * If we've got enough data, flush it.
+     */
+    if (xo_buf_offset(&xop->xo_data) > XO_BUF_HIGH_WATER)
+       flush = 1;
+
     /* If we don't have an anchor, write the text out */
     if (flush && !XOIF_ISSET(xop, XOIF_ANCHOR)) {
        if (xo_write(xop) < 0)

I'll roll a new version of libxo tomorrow.

Apologies for taking so long with this, but it was mostly just an issue of reproducing it.  I wasn't seeing enough churn in my "vmstat -z" value to trigger the issue, so I munged one of the regression tests to help.

Thanks,
 Phil
Comment 12 Phil Shafer freebsd_committer freebsd_triage 2016-05-29 01:58:10 UTC
Fixed in r300925.

Thanks,
 Phil