Bug 206128 - vmstat -z is truncated when stdout is not terminal
Summary: vmstat -z is truncated when stdout is not terminal
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Phil Shafer
Keywords: regression
Depends on:
Reported: 2016-01-11 10:01 UTC by Jan Beich
Modified: 2016-05-29 07:31 UTC (History)
6 users (show)

See Also:

vmstat: Prevent output truncation when piping zone statistics (981 bytes, patch)
2016-05-22 15:04 UTC, Fabian Keil
no flags Details | Diff
File created with "script vmstat-working vmstat -z" (24.04 KB, text/plain)
2016-05-26 15:27 UTC, Fabian Keil
no flags Details
File created with "script vmstat-not-working sh -c 'vmstat -z | cat'" (8.25 KB, text/plain)
2016-05-26 15:28 UTC, Fabian Keil
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Beich freebsd_committer 2016-01-11 10:01:53 UTC
After base r291090 non-libxo output for -z option is unreliable.

$ vmstat -z | wc -l

$ vmstat -z | tail -3
16384:                16384,      0,       6,       6,  368241,   0,   0
16384:                16384,      0,       1,       5,    1088,   0,   0

$ script -q vmstat.out vmstat -z

$ wc -l vmstat.out
     326 vmstat.out

$ fgrep -m1 32768: vmstat.out
32768:                32768,      0,       0,       0,       0,   0,   0
Comment 1 Jan Beich freebsd_committer 2016-01-11 10:29:17 UTC
Only happens with MALLOC_CONF=junk:false, see malloc(3) and MALLOC_PRODUCTION.
Comment 2 Mark Johnston freebsd_committer 2016-01-15 18:00:43 UTC
I've noticed this too, but only on some of my systems. The output ends in the same place, i.e., after we start printing the 32KB malloc bucket stats. It does not occur if vmstat's stdout is a terminal.

I haven't dug into this very deeply yet, but I noticed that removing the field width for the zone name makes the problem go away:

1528                 xo_open_instance("zone");
1529                 xo_emit("{d:name/%-20s}{ke:name/%s} {:size/%6" PRIu64 "}, "
1530                         "{:limit/%6" PRIu64 "},{:used/%8" PRIu64 "},"
1531                         "{:free/%8" PRIu64 "},{:requests/%8" PRIu64 "},"
1532                         "{:fail/%4" PRIu64 "},{:sleep/%4" PRIu64 "}\n", name,

So this seems like a libxo bug.
Comment 3 Fabian Keil 2016-05-22 15:04:24 UTC
Created attachment 170546 [details]
vmstat: Prevent output truncation when piping zone statistics

The attached patch works around the issue by frequently calling xo_flush().
Comment 4 Ngie Cooper freebsd_committer 2016-05-22 18:24:28 UTC
Adding phil@ for feedback on unexpected behavior with libxo not flushing quickly when the buffer is full.
Comment 5 Phil Shafer freebsd_committer 2016-05-22 18:52:33 UTC
I'll take a look tonight.

Comment 6 Phil Shafer freebsd_committer 2016-05-22 19:06:43 UTC
No luck reproducing this, even with MALLOC_CONF=junk:false.  I'll dig more later tonight.  What grn are you seeing this at?

Comment 7 Fabian Keil 2016-05-26 09:47:52 UTC
After reverting the patch again, I still see the problem with
a system based on 11-CURRENT r300376.

You could try running:

while vmstat -z | wc -c; do sleep 60; done

and wait until it prints 8192 which indicates a truncation.

Usually the output stays the same for a while, so if you
see 8192 once, there's a fair chance that you'll be able
to reproduce the problem with a debugger attached.

Yesterday I saw the truncation issue for more than 80 minutes
until it went away again.
Comment 8 Phil Shafer freebsd_committer 2016-05-26 15:14:34 UTC
Could you please send me the working and failing output?

Comment 9 Fabian Keil 2016-05-26 15:27:04 UTC
Created attachment 170683 [details]
File created with "script vmstat-working vmstat -z"
Comment 10 Fabian Keil 2016-05-26 15:28:15 UTC
Created attachment 170684 [details]
File created with "script vmstat-not-working sh -c 'vmstat -z | cat'"
Comment 11 Phil Shafer freebsd_committer 2016-05-28 06:42:57 UTC
Was finally able to reproduce this and have a fix.  I was using the wrong value when checking for available buffer space:

diff --git a/libxo/libxo.c b/libxo/libxo.c
index 9df67de..7a714b1 100644
--- a/libxo/libxo.c
+++ b/libxo/libxo.c
@@ -2816,7 +2816,7 @@ xo_format_string (xo_handle_t *xop, xo_buffer_t *xbp, xo_xff_flags_t flags,
         * but if we did the work ourselves, then we need to do it.
        int delta = xfp->xf_width[XF_WIDTH_MIN] - cols;
-       if (!xo_buf_has_room(xbp, delta))
+       if (!xo_buf_has_room(xbp, xfp->xf_width[XF_WIDTH_MIN]))
            goto bail;


In addition, I'm adding a high-water mark that will trigger a flush:

@@ -6128,6 +6128,12 @@ xo_do_emit_fields (xo_handle_t *xop, xo_field_info_t *fields,


+    /*
+     * If we've got enough data, flush it.
+     */
+    if (xo_buf_offset(&xop->xo_data) > XO_BUF_HIGH_WATER)
+       flush = 1;
     /* If we don't have an anchor, write the text out */
     if (flush && !XOIF_ISSET(xop, XOIF_ANCHOR)) {
        if (xo_write(xop) < 0)

I'll roll a new version of libxo tomorrow.

Apologies for taking so long with this, but it was mostly just an issue of reproducing it.  I wasn't seeing enough churn in my "vmstat -z" value to trigger the issue, so I munged one of the regression tests to help.

Comment 12 Phil Shafer freebsd_committer 2016-05-29 01:58:10 UTC
Fixed in r300925.