Observed fairly regularly on arm64: root@:~ # Sep 15 15:00:11 syslogd: /dev/console: Interrupted system call
I can observe this on 11.0-RELEASE-p8 (amd64) rather regularly as well. The message occurs nearly every time when cron starts newsyslog every hour which in turn triggers a restart of syslogd. After the first occurance of the message there are no more messages written to /dev/console. Sometimes syslogd even hangs in a state where it is still visible in the process list but does not do any logging anymore. And a restart even fails. I have discussed this issue on IRC and besides other advice I have been told it is somehow related to the serial console. I have been playing with different speeds with no effect. But after disabling agetty on ttyu0 the messages are gone. At this point I am a bit lost, because need serial console logins with my headless server. And I cannot decide, if my uart device is somehow broken or if agetty/syslogd may have an error.
Same here after adding to /etc/ttys: ttyu1 "/usr/libexec/getty std.115200" vt100 onifconsole secure 11.1 release Supermicro ipmi is used on ttyu1
Commenting out the /dev/console line in syslog.conf is a workaround.
Also seeing this, and worryingly it can prevent a su to root.
Me too. Like J.Catrysse@, I have a Supermicro X9 ipmi at 115200 with console forwarding set in the BIOS on fbsd 11.1. In addition, I have loader.conf configured for dual consoles (serial/ttyu0 and KVM), as in: hint.sio.0.flags="0x30" boot_multicons="YES" boot_serial="YES" comconsole_speed="115200" console="comconsole,vidconsole" only one syslog line logging to /dev/console: kern.*;syslog.err /dev/console ktrace syslogd & kill -1 it (or restart it), reveals: 37969 syslogd GIO fd 10 read 1409 bytes XXX syslog.conf ...... 37969 syslogd CALL openat(AT_FDCWD,0x7fffffffc4d6,0x9<O_WRONLY|O_APPEND>) 37969 syslogd NAMI "/dev/console" 37969 syslogd RET openat -1 errno 4 Interrupted system call 37969 syslogd PSIG SIGALRM caught handler=0x404c60 mask=0x80001 code=SI_KERNEL ....... 37969 syslogd CALL openat(AT_FDCWD,0x408828,0x5<O_WRONLY|O_NONBLOCK>) 37969 syslogd NAMI "/dev/console" 37969 syslogd RET openat 11/0xb 37969 syslogd CALL openat(AT_FDCWD,0x7fffffffa300,0x5<O_WRONLY|O_NONBLOCK>) 37969 syslogd NAMI "/dev/console" 37969 syslogd RET openat 12/0xc 37969 syslogd CALL writev(0xc,0x7fffffffaa10,0x7) 37969 syslogd GIO fd 12 wrote 65 bytes "Sep 28 17:18:59 syslogd: /dev/console: Interrupted system call\r " This is a recent upgrade to 11.1, but it worked in 10.3.
Hardware changed to "any" based on follow-ups
This seems to be related to a possibly inadvertent change I noticed on our 11.1-RELEASE (p1) hosts: the serial console device can now have its baud rate and modem control settings changed other than at boot. On 10.3 and below, modem control was prohibited for serial console, and baud rate was fixed at boot. We had /etc/ttys configured to do "/usr/libexec/getty std.9600", but the serial console is actually set for 115200, with no modem control. On 10.3 and earlier, the getty config was ignored, but on 11.1, the config is used, and since our console connection (also Supermicro IPMI SOL) isn't doing modem control (or not properly), this made output to the serial console block -- which was causing segfaulting processes to never exit. This seems to be resolvable, by setting /etc/ttys to the proper settings (3wire.115200 in our case), but I didn't see a kernel change that looked like it was intentionally changing this, and sys/kern/tty.c:323 still seems to be trying to prevent modem control on the console port.
3wire.115200 seemed to have fixed the issue.
I haven't tried dropping modem control yet, which is how my console server is configured. This seems like a bug to not treat the console specially; without an active connection, the console server will not assert CTS/DTR - like a modem would without an active connection, so the server will not be able to write and things will block. The kernel must treat the console specially, as annoying as this may be. Its not enough to stop syslog writing to the console - which i admit is normally useless anyway.
I have noticed another regressive behavior from this: Local system status: 3:01AM up 36 days, 7 hrs, 3 users, load averages: 165.55, 164.19, 161.10 I am forced to reboot at this point. This is a headless server with a service processor and a serial console with h/w flow control running 11.1-p4. A similar server without a head and no serial console does not have this problem, nor does a headless 10.3 server, nor did this server when it was running 10.3. By process of elimination, I have found that removing syslog.conf lines that log to /dev/console, this is avoided (or seemes to be after 2 days without reoccurring). What happens is that the daily cron newsyslog rotates files and HUPs syslogd. At some point after that, usually a few hours, the load starts to climb. On this server, it is many postgres processes driving that load in a tight error/retry loop. Postgres is configured to lot to syslog. That loop traces as: 96701 postgres CALL sendto(0x7,0x7fffffffceb0,0x9d,0,0,0) 96701 postgres RET sendto -1 errno 55 No buffer space available 96701 postgres CALL nanosleep(0x7fffffffc980,0) 96701 postgres RET nanosleep 0 96701 postgres CALL sendto(0x7,0x7fffffffceb0,0x9d,0,0,0) 96701 postgres RET sendto -1 errno 55 No buffer space available 96701 postgres CALL nanosleep(0x7fffffffc980,0) I failed to take note of what FD 7 is. PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 96701 pgsql 1 48 0 179M 144M RUN 9 102:21 9.47% postgres 40970 pgsql 1 48 0 179M 144M RUN 8 69:55 9.08% postgres 44637 pgsql 1 48 0 179M 144M RUN 10 67:07 8.98% postgres 70361 pgsql 1 48 0 179M 144M RUN 7 77:29 8.59% postgres ...... USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND pgsql 32593 9.1 0.2 182788 147740 - Rs 22:50 95:11.30 postgres: gld gld 127.0.0.1(57118) (postgres) pgsql 80018 9.0 0.2 182788 147932 - Rs 00:26 74:00.04 postgres: gld gld 127.0.0.1(18337) (postgres) pgsql 67956 8.9 0.2 182788 147940 - Rs 00:26 73:59.81 postgres: gld gld 127.0.0.1(18309) (postgres) ..... Active Internet connections (including servers) Proto Recv-Q Send-Q Local Address Foreign Address (state) gld to postfix: tcp4 0 0 127.0.0.1.5432 127.0.0.1.50479 ESTABLISHED tcp4 0 0 127.0.0.1.50479 127.0.0.1.5432 ESTABLISHED tcp4 0 0 127.0.0.1.2525 127.0.0.1.50478 CLOSE_WAIT tcp4 0 0 127.0.0.1.5432 127.0.0.1.29439 ESTABLISHED tcp4 0 0 127.0.0.1.29439 127.0.0.1.5432 ESTABLISHED tcp4 0 0 127.0.0.1.2525 127.0.0.1.29438 CLOSE_WAIT tcp4 0 0 127.0.0.1.5432 127.0.0.1.37203 CLOSED tcp4 0 0 127.0.0.1.5432 127.0.0.1.31279 CLOSED tcp4 0 0 127.0.0.1.5432 127.0.0.1.31277 CLOSED tcp4 0 0 127.0.0.1.5432 127.0.0.1.35576 CLOSED ...... these would be postfix to gld: tcp4 596 0 127.0.0.1.2525 127.0.0.1.37998 CLOSE_WAIT tcp4 627 0 127.0.0.1.2525 127.0.0.1.37994 CLOSE_WAIT tcp4 627 0 127.0.0.1.2525 127.0.0.1.37993 CLOSE_WAIT tcp4 602 0 127.0.0.1.2525 127.0.0.1.37992 CLOSE_WAIT tcp4 605 0 127.0.0.1.2525 127.0.0.1.37988 CLOSE_WAIT tcp4 605 0 127.0.0.1.2525 127.0.0.1.37987 CLOSE_WAIT tcp4 605 0 127.0.0.1.2525 127.0.0.1.37986 CLOSE_WAIT ...... Active UNIX domain sockets Address Type Recv-Q Send-Q Inode Conn Refs Nextref Addr fffff8012e26b690 dgram 0 0 fffff8012e2d3938 0 0 0 /var/named/var/run/log fffff8012e26b780 dgram 49898 0 fffff8012e2d3b10 0 fffff808375de3c0 0 /var/run/logpriv fffff8012e26b870 dgram 44120 0 fffff8012e2d3ce8 0 fffff81015ef8870 0 /var/run/log fffff8012e2b51e0 seqpac 0 0 fffff80077b11938 0 0 0 /var/run/devd.seqpacket.pipe .... Active Internet connections Proto Recv-Q Send-Q Local Address Foreign Address R-MBUF S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-BMAX rexmt persist keep 2msl delack rcvtime tcp4 0 0 127.0.0.1.5432 127.0.0.1.38803 0 0 0 0 81720 49032 1 2048 0 0 653760 392256 0.00 0.00 7145.47 0.00 0.00 54.52 tcp4 0 0 127.0.0.1.38803 127.0.0.1.5432 0 0 0 0 81720 49032 1 2048 0 0 653760 392256 0.00 0.00 7145.57 0.00 0.00 54.42 tcp4 668 0 127.0.0.1.2525 127.0.0.1.54066 1 0 1 0 81720 49032 1 2048 2304 0 653760 392256 0.00 0.00 7145.47 0.00 0.00 54.52 tcp4 0 0 127.0.0.1.54066 127.0.0.1.2525 0 0 0 0 81720 49032 1 2048 0 0 653760 392256 0.00 0.00 7145.57 0.00 0.00 54.42 tcp4 668 0 127.0.0.1.2525 127.0.0.1.33204 1 0 1 0 81720 49032 1 2048 2304 0 653760 392256 0.00 0.00 7145.47 0.00 0.00 54.52 tcp4 0 0 127.0.0.1.33204 127.0.0.1.2525 0 0 0 0 81720 49032 1 2048 0 0 653760 392256 0.00 0.00 7145.47 545.47 0.00 54.52 ..... That is as far as I've analyzed this, but toward a vote to revert whatever changed with console from 10.3.
+1 to reverting 11.x to the previous console behavior in 10.3! Same problem here on 11.1-RELEASE-p9 amd64, using serial console server (Cyclades) on ttyu0, on Supermicro SM X6DHR-TG motherboard. Sadly, no luck yet with the suggested fixes here; our console speed is set to 9600, but neither "std.9600" nor "3wire.9600" is getting it working so far.
Sorry for my untidy commenting, but two additions to make: 1. The workaround of removing any /dev/console logging from syslog.conf did do the trick for us, it's just not really an ideal or complete fix. 2. It took us a long time to identify syslogd as the root cause among all the symptoms and to notice the right error message (the interrupted system call) that led us to this bug. Instead, the only visible error message we were aware of for a while was this one, repeated every 30 seconds in the messages log whenever we attempted to connect to the serial console: init: getty repeating too quickly on port /dev/ttyu0, sleeping 30 secs But a web search on that error turned up a lot of stuff, none of it relevant. FWIW, 'ps' showed the getty process for the serial port with just "-" in its TT column instead of "u0", and lacking the "s+" in its STAT column that the healthy gettys had. I hope that will help someone else find this bug quicker.