Bug 211033 - sysutils/rsyslog8: CPU hangs at 100% with I/O error on stdout/stderr
Summary: sysutils/rsyslog8: CPU hangs at 100% with I/O error on stdout/stderr
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Matthew Seaman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-12 13:59 UTC by Xavier Garcia
Modified: 2017-01-02 18:08 UTC (History)
1 user (show)

See Also:
bugzilla: maintainer-feedback? (brd)


Attachments
Fix tryTTYRecover in stream.c to also match on errno 5 (482 bytes, patch)
2017-01-02 16:58 UTC, Damien Fleuriot
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xavier Garcia 2016-07-12 13:59:13 UTC
rsyslogd hangs in an infinite loop trying to write a syslog record to a file descriptor. 
truss -p pid outputs

write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'
write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'
write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'
write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'
write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'
write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error'

It is not displayed here but the message written is exactly the same every time (truss -s 1024 -p pid)


We assumed it was the same bug that was closed upstream on https://github.com/rsyslog/rsyslog/issues/318 but we cannot reproduce it with the proof of concept. Furthermore, building the binary with debug mode (WITH_DEBUG=yes) stopped the bug from being triggered and we cannot get a coredump to investigate.

We then forced the debug build with optimisations on (-O2) and we still cannot reproduce the error.


We have seen this happening in fd 1 and fd 2.

The command `fstat -p pid`  returned

root     rsyslogd    1073    2 -         -         bad

that means bad file descriptor in stat(2) and errno.h (EBADF).
Comment 1 Xavier Garcia 2016-08-01 08:39:39 UTC
After rebooting one of our servers, we hit the bug again and we were able to investigate further.

It seems to be related to the already closed bug report  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200429

Our rsyslog.conf sends *.err messages to /dev/console, that should be the file descriptor FD1. Unfortunately, this FD is not available and the rsyslog process ends up in an infinite loop, trying to write the same message.


# fstat -p `cat /var/run/syslog.pid`
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     rsyslogd    1375 text /        100290 -rwxr-xr-x  606760  r
root     rsyslogd    1375   wd /             4 drwxr-xr-x      27  r
root     rsyslogd    1375 root /             4 drwxr-xr-x      27  r
root     rsyslogd    1375    0* local dgram fffff80331b1dd20
root     rsyslogd    1375    1 -         -         bad    -
root     rsyslogd    1375    2 -           881 -rw-r-----     202  w
root     rsyslogd    1375    3* internet dgram udp fffff803318f5dc8
root     rsyslogd    1375    4 /dev         32 crw-rw-rw-  random  r
root     rsyslogd    1375    5* internet stream tcp fffff80400488810
root     rsyslogd    1375    6 -            33 -rw-r--r--   48744  w
root     rsyslogd    1375    7 -          2025 -rw-------    2570  w
root     rsyslogd    1375    8 -           623 -rw-------   91099  w
root     rsyslogd    1375    9 -           622 -rw-------   93137  w
root     rsyslogd    1375   10 -          2691 -rw-r-----  29562283  w


We have a core dump available in case it is needed.
Comment 2 Damien Fleuriot 2017-01-02 16:58:58 UTC
Created attachment 178452 [details]
Fix tryTTYRecover in stream.c to also match on errno 5
Comment 3 Damien Fleuriot 2017-01-02 17:00:47 UTC
Xavier,


Your problem is very similar to the one encountered by Alexandre and for which he proposed a fix in #200429 :
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200429

I'm not sure what FreeBSD version Alexandre runs to get errno 6 ENXIO.

I, however, run 10-STABLE and get errno 5 EIO.


I've attached a quick fix which solves the problem for me.



Brad, would you kindly review and commit the proposed patch ?
Comment 4 Brad Davis freebsd_committer freebsd_triage 2017-01-02 17:03:13 UTC
I thought this had been resolved, but I am not using rsyslog anymore so over to the new maintainer.
Comment 5 Damien Fleuriot 2017-01-02 17:26:32 UTC
Issue opened over at github :
https://github.com/rsyslog/rsyslog/issues/1351
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-01-02 18:05:31 UTC
A commit references this bug:

Author: matthew
Date: Mon Jan  2 18:05:06 UTC 2017
New revision: 430388
URL: https://svnweb.freebsd.org/changeset/ports/430388

Log:
  Handle errno==5 (EIO) as the result of trying to write(2) to the console.

  Submittet upstream by Damien Fleurot:
  https://github.com/rsyslog/rsyslog/issues/1351

  PR:		211033
  Submitted by:	dam@my.gd
  Reported by:	xavi.garcia@gmail.com

Changes:
  head/sysutils/rsyslog8/files/patch-grammar_rainerscript.c
  head/sysutils/rsyslog8/files/patch-plugins_impstats_impstats.c
  head/sysutils/rsyslog8/files/patch-runtime_nsd__ptcp.c
  head/sysutils/rsyslog8/files/patch-runtime_stream.c
Comment 7 Matthew Seaman freebsd_committer freebsd_triage 2017-01-02 18:08:13 UTC
Patch committed, thanks!

Thank you for submitting this upstream.  Agreed that EIO looks rather more likely as something that write(2) would return.  In fact, I don't see where
ENXIO would come from at all.