rsyslogd hangs in an infinite loop trying to write a syslog record to a file descriptor. truss -p pid outputs write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' write(1,"2016-07-06T10:01:23.061615+00:00"...,116) ERR#5 'Input/output error' It is not displayed here but the message written is exactly the same every time (truss -s 1024 -p pid) We assumed it was the same bug that was closed upstream on https://github.com/rsyslog/rsyslog/issues/318 but we cannot reproduce it with the proof of concept. Furthermore, building the binary with debug mode (WITH_DEBUG=yes) stopped the bug from being triggered and we cannot get a coredump to investigate. We then forced the debug build with optimisations on (-O2) and we still cannot reproduce the error. We have seen this happening in fd 1 and fd 2. The command `fstat -p pid` returned root rsyslogd 1073 2 - - bad that means bad file descriptor in stat(2) and errno.h (EBADF).
After rebooting one of our servers, we hit the bug again and we were able to investigate further. It seems to be related to the already closed bug report https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200429 Our rsyslog.conf sends *.err messages to /dev/console, that should be the file descriptor FD1. Unfortunately, this FD is not available and the rsyslog process ends up in an infinite loop, trying to write the same message. # fstat -p `cat /var/run/syslog.pid` USER CMD PID FD MOUNT INUM MODE SZ|DV R/W root rsyslogd 1375 text / 100290 -rwxr-xr-x 606760 r root rsyslogd 1375 wd / 4 drwxr-xr-x 27 r root rsyslogd 1375 root / 4 drwxr-xr-x 27 r root rsyslogd 1375 0* local dgram fffff80331b1dd20 root rsyslogd 1375 1 - - bad - root rsyslogd 1375 2 - 881 -rw-r----- 202 w root rsyslogd 1375 3* internet dgram udp fffff803318f5dc8 root rsyslogd 1375 4 /dev 32 crw-rw-rw- random r root rsyslogd 1375 5* internet stream tcp fffff80400488810 root rsyslogd 1375 6 - 33 -rw-r--r-- 48744 w root rsyslogd 1375 7 - 2025 -rw------- 2570 w root rsyslogd 1375 8 - 623 -rw------- 91099 w root rsyslogd 1375 9 - 622 -rw------- 93137 w root rsyslogd 1375 10 - 2691 -rw-r----- 29562283 w We have a core dump available in case it is needed.
Created attachment 178452 [details] Fix tryTTYRecover in stream.c to also match on errno 5
Xavier, Your problem is very similar to the one encountered by Alexandre and for which he proposed a fix in #200429 : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200429 I'm not sure what FreeBSD version Alexandre runs to get errno 6 ENXIO. I, however, run 10-STABLE and get errno 5 EIO. I've attached a quick fix which solves the problem for me. Brad, would you kindly review and commit the proposed patch ?
I thought this had been resolved, but I am not using rsyslog anymore so over to the new maintainer.
Issue opened over at github : https://github.com/rsyslog/rsyslog/issues/1351
A commit references this bug: Author: matthew Date: Mon Jan 2 18:05:06 UTC 2017 New revision: 430388 URL: https://svnweb.freebsd.org/changeset/ports/430388 Log: Handle errno==5 (EIO) as the result of trying to write(2) to the console. Submittet upstream by Damien Fleurot: https://github.com/rsyslog/rsyslog/issues/1351 PR: 211033 Submitted by: dam@my.gd Reported by: xavi.garcia@gmail.com Changes: head/sysutils/rsyslog8/files/patch-grammar_rainerscript.c head/sysutils/rsyslog8/files/patch-plugins_impstats_impstats.c head/sysutils/rsyslog8/files/patch-runtime_nsd__ptcp.c head/sysutils/rsyslog8/files/patch-runtime_stream.c
Patch committed, thanks! Thank you for submitting this upstream. Agreed that EIO looks rather more likely as something that write(2) would return. In fact, I don't see where ENXIO would come from at all.