Bug 206128

Summary:

vmstat -z is truncated when stdout is not terminal

Product:

Base System

Reporter:

Jan Beich <jbeich>

Component:

bin

Assignee:

Phil Shafer <phil>

Status:

Closed FIXED

Severity:

Affects Some People

CC:

allanjude, fk, markj, ngie, phil, rodrigc

Priority:

---

Keywords:

regression

Version:

CURRENT

Hardware:

Any

OS:

Any

Attachments:

Description	Flags
vmstat: Prevent output truncation when piping zone statistics	none
File created with "script vmstat-working vmstat -z"	none
File created with "script vmstat-not-working sh -c 'vmstat -z \| cat'"	none

Description Jan Beich freebsd_committer

2016-01-11 10:01:53 UTC

After base r291090 non-libxo output for -z option is unreliable.

$ vmstat -z | wc -l
     113

$ vmstat -z | tail -3
16384:                16384,      0,       6,       6,  368241,   0,   0
16384:                16384,      0,       1,       5,    1088,   0,   0
32768:         

$ script -q vmstat.out vmstat -z
[snipped]

$ wc -l vmstat.out
     326 vmstat.out

$ fgrep -m1 32768: vmstat.out
32768:                32768,      0,       0,       0,       0,   0,   0

Comment 1 Jan Beich freebsd_committer

2016-01-11 10:29:17 UTC

Only happens with MALLOC_CONF=junk:false, see malloc(3) and MALLOC_PRODUCTION.

Comment 2 Mark Johnston freebsd_committer

2016-01-15 18:00:43 UTC

I've noticed this too, but only on some of my systems. The output ends in the same place, i.e., after we start printing the 32KB malloc bucket stats. It does not occur if vmstat's stdout is a terminal.

I haven't dug into this very deeply yet, but I noticed that removing the field width for the zone name makes the problem go away:

1528                 xo_open_instance("zone");
1529                 xo_emit("{d:name/%-20s}{ke:name/%s} {:size/%6" PRIu64 "}, "
1530                         "{:limit/%6" PRIu64 "},{:used/%8" PRIu64 "},"
1531                         "{:free/%8" PRIu64 "},{:requests/%8" PRIu64 "},"
1532                         "{:fail/%4" PRIu64 "},{:sleep/%4" PRIu64 "}\n", name,

So this seems like a libxo bug.

Comment 3 Fabian Keil 2016-05-22 15:04:24 UTC

Created attachment 170546 [details]
vmstat: Prevent output truncation when piping zone statistics

The attached patch works around the issue by frequently calling xo_flush().

Comment 4 Enji Cooper freebsd_committer

2016-05-22 18:24:28 UTC

Adding phil@ for feedback on unexpected behavior with libxo not flushing quickly when the buffer is full.

Comment 5 Phil Shafer freebsd_committer

2016-05-22 18:52:33 UTC

I'll take a look tonight.

Thanks,
 Phil

Comment 6 Phil Shafer freebsd_committer

2016-05-22 19:06:43 UTC

No luck reproducing this, even with MALLOC_CONF=junk:false.  I'll dig more later tonight.  What grn are you seeing this at?

Thanks,
 Phil

Comment 7 Fabian Keil 2016-05-26 09:47:52 UTC

After reverting the patch again, I still see the problem with
a system based on 11-CURRENT r300376.

You could try running:

while vmstat -z | wc -c; do sleep 60; done

and wait until it prints 8192 which indicates a truncation.

Usually the output stays the same for a while, so if you
see 8192 once, there's a fair chance that you'll be able
to reproduce the problem with a debugger attached.

Yesterday I saw the truncation issue for more than 80 minutes
until it went away again.

Comment 8 Phil Shafer freebsd_committer

2016-05-26 15:14:34 UTC

Could you please send me the working and failing output?

Thanks,
 Phil

Comment 9 Fabian Keil 2016-05-26 15:27:04 UTC

Created attachment 170683 [details]
File created with "script vmstat-working vmstat -z"

Comment 10 Fabian Keil 2016-05-26 15:28:15 UTC

Created attachment 170684 [details]
File created with "script vmstat-not-working sh -c 'vmstat -z | cat'"

Comment 11 Phil Shafer freebsd_committer

2016-05-28 06:42:57 UTC

Was finally able to reproduce this and have a fix.  I was using the wrong value when checking for available buffer space:

diff --git a/libxo/libxo.c b/libxo/libxo.c
index 9df67de..7a714b1 100644
--- a/libxo/libxo.c
+++ b/libxo/libxo.c
@@ -2816,7 +2816,7 @@ xo_format_string (xo_handle_t *xop, xo_buffer_t *xbp, xo_xff_flags_t flags,
         * but if we did the work ourselves, then we need to do it.
         */
        int delta = xfp->xf_width[XF_WIDTH_MIN] - cols;
-       if (!xo_buf_has_room(xbp, delta))
+       if (!xo_buf_has_room(xbp, xfp->xf_width[XF_WIDTH_MIN]))
            goto bail;

        /*

In addition, I'm adding a high-water mark that will trigger a flush:

@@ -6128,6 +6128,12 @@ xo_do_emit_fields (xo_handle_t *xop, xo_field_info_t *fields,

     XOIF_CLEAR(xop, XOIF_REORDER);

+    /*
+     * If we've got enough data, flush it.
+     */
+    if (xo_buf_offset(&xop->xo_data) > XO_BUF_HIGH_WATER)
+       flush = 1;
+
     /* If we don't have an anchor, write the text out */
     if (flush && !XOIF_ISSET(xop, XOIF_ANCHOR)) {
        if (xo_write(xop) < 0)

I'll roll a new version of libxo tomorrow.

Apologies for taking so long with this, but it was mostly just an issue of reproducing it.  I wasn't seeing enough churn in my "vmstat -z" value to trigger the issue, so I munged one of the regression tests to help.

Thanks,
 Phil

Comment 12 Phil Shafer freebsd_committer

2016-05-29 01:58:10 UTC

Fixed in r300925.

Thanks,
 Phil