Bug 203129 - syslogd: /dev/console: Interrupted system call
Summary: syslogd: /dev/console: Interrupted system call
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks: 203349 231027
  Show dependency treegraph
 
Reported: 2015-09-15 15:56 UTC by Ed Maste
Modified: 2019-01-11 19:03 UTC (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ed Maste freebsd_committer 2015-09-15 15:56:33 UTC
Observed fairly regularly on arm64:

root@:~ # Sep 15 15:00:11  syslogd: /dev/console: Interrupted system call
Comment 1 pvoigt 2017-04-02 14:43:35 UTC
I can observe this on 11.0-RELEASE-p8 (amd64) rather regularly as well.

The message occurs nearly every time when cron starts newsyslog every hour which in turn triggers a restart of syslogd.

After the first occurance of the message there are no more messages written to /dev/console. Sometimes syslogd even hangs in a state where it is still visible in the process list but does not do any logging anymore. And a restart even fails.

I have discussed this issue on IRC and besides other advice I have been told it is somehow related to the serial console. I have been playing with different speeds with no effect. But after disabling agetty on ttyu0 the messages are gone.

At this point I am a bit lost, because need serial console logins with my headless server. And I cannot decide, if my uart device is somehow broken or if agetty/syslogd may have an error.
Comment 2 J.Catrysse 2017-08-02 01:28:20 UTC
Same here after adding to /etc/ttys:
ttyu1   "/usr/libexec/getty std.115200"   vt100   onifconsole secure

11.1 release

Supermicro ipmi is used on ttyu1
Comment 3 J.Catrysse 2017-08-02 01:43:31 UTC
Commenting out the /dev/console line in syslog.conf is a workaround.
Comment 4 Paul Macdonald 2017-08-07 09:40:27 UTC
Also seeing this, and worryingly it can prevent a su to root.
Comment 5 heas 2017-09-28 17:32:43 UTC
Me too.  Like J.Catrysse@, I have a Supermicro X9 ipmi at 115200 with console forwarding set in the BIOS on fbsd 11.1.  In addition, I have loader.conf configured for dual consoles (serial/ttyu0 and KVM), as in:
hint.sio.0.flags="0x30"
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"
console="comconsole,vidconsole"

only one syslog line logging to /dev/console:
kern.*;syslog.err                                       /dev/console

ktrace syslogd & kill -1 it (or restart it), reveals:
 37969 syslogd  GIO   fd 10 read 1409 bytes     XXX syslog.conf
      ......
 37969 syslogd  CALL  openat(AT_FDCWD,0x7fffffffc4d6,0x9<O_WRONLY|O_APPEND>)
 37969 syslogd  NAMI  "/dev/console"
 37969 syslogd  RET   openat -1 errno 4 Interrupted system call
 37969 syslogd  PSIG  SIGALRM caught handler=0x404c60 mask=0x80001 code=SI_KERNEL
      .......
 37969 syslogd  CALL  openat(AT_FDCWD,0x408828,0x5<O_WRONLY|O_NONBLOCK>)
 37969 syslogd  NAMI  "/dev/console"
 37969 syslogd  RET   openat 11/0xb
 37969 syslogd  CALL  openat(AT_FDCWD,0x7fffffffa300,0x5<O_WRONLY|O_NONBLOCK>)
 37969 syslogd  NAMI  "/dev/console"
 37969 syslogd  RET   openat 12/0xc
 37969 syslogd  CALL  writev(0xc,0x7fffffffaa10,0x7)
 37969 syslogd  GIO   fd 12 wrote 65 bytes
       "Sep 28 17:18:59  syslogd: /dev/console: Interrupted system call\r
       "


This is a recent upgrade to 11.1, but it worked in 10.3.
Comment 6 Ed Maste freebsd_committer 2017-10-04 16:26:31 UTC
Hardware changed to "any" based on follow-ups
Comment 7 Richard Russo 2017-10-19 18:22:48 UTC
This seems to be related to a possibly inadvertent change I noticed on our 11.1-RELEASE (p1) hosts: the serial console device can now have its baud rate and modem control settings changed other than at boot. On 10.3 and below, modem control was prohibited for serial console, and baud rate was fixed at boot.

We had /etc/ttys configured to do "/usr/libexec/getty std.9600", but the serial console is actually set for 115200, with no modem control. On 10.3 and earlier, the getty config was ignored, but on 11.1, the config is used, and since our console connection (also Supermicro IPMI SOL) isn't doing modem control (or not properly), this made output to the serial console block -- which was causing segfaulting processes to never exit.

This seems to be resolvable, by setting /etc/ttys to the proper settings (3wire.115200 in our case), but I didn't see a kernel change that looked like it was intentionally changing this, and sys/kern/tty.c:323 still seems to be trying to prevent modem control on the console port.
Comment 8 J.Catrysse 2017-10-23 11:48:46 UTC
3wire.115200 seemed to have fixed the issue.
Comment 9 heas 2017-10-31 13:23:31 UTC
I haven't tried dropping modem control yet, which is how my console server is configured.  This seems like a bug to not treat the console specially; without an active connection, the console server will not assert CTS/DTR - like a modem would without an active connection, so the server will not be able to write and things will block.

The kernel must treat the console specially, as annoying as this may be.  Its not enough to stop syslog writing to the console - which i admit is normally useless anyway.
Comment 10 heas 2017-12-20 17:14:48 UTC
I have noticed another regressive behavior from this:

Local system status:
 3:01AM  up 36 days, 7 hrs, 3 users, load averages: 165.55, 164.19, 161.10

I am forced to reboot at this point.

This is a headless server with a service processor and a serial console with h/w flow control running 11.1-p4. A similar server without a head and no serial console does not have this problem, nor does a headless 10.3 server, nor did this server when it was running 10.3.

By process of elimination, I have found that removing syslog.conf lines that log to /dev/console, this is avoided (or seemes to be after 2 days without reoccurring).

What happens is that the daily cron newsyslog rotates files and HUPs syslogd.  At some point after that, usually a few hours, the load starts to climb.  On this server, it is many postgres processes driving that load in a tight error/retry loop.  Postgres is configured to lot to syslog. That loop traces as:

 96701 postgres CALL  sendto(0x7,0x7fffffffceb0,0x9d,0,0,0)
 96701 postgres RET   sendto -1 errno 55 No buffer space available
 96701 postgres CALL  nanosleep(0x7fffffffc980,0)
 96701 postgres RET   nanosleep 0
 96701 postgres CALL  sendto(0x7,0x7fffffffceb0,0x9d,0,0,0)
 96701 postgres RET   sendto -1 errno 55 No buffer space available
 96701 postgres CALL  nanosleep(0x7fffffffc980,0)

I failed to take note of what FD 7 is.

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
96701 pgsql         1  48    0   179M   144M RUN     9 102:21   9.47% postgres
40970 pgsql         1  48    0   179M   144M RUN     8  69:55   9.08% postgres
44637 pgsql         1  48    0   179M   144M RUN    10  67:07   8.98% postgres
70361 pgsql         1  48    0   179M   144M RUN     7  77:29   8.59% postgres
       ......

USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED         TIME COMMAND
pgsql   32593  9.1  0.2 182788 147740  -  Rs   22:50       95:11.30 postgres: gld gld 127.0.0.1(57118)  (postgres)
pgsql   80018  9.0  0.2 182788 147932  -  Rs   00:26       74:00.04 postgres: gld gld 127.0.0.1(18337)  (postgres)
pgsql   67956  8.9  0.2 182788 147940  -  Rs   00:26       73:59.81 postgres: gld gld 127.0.0.1(18309)  (postgres)
   .....

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
gld to postfix:
tcp4       0      0 127.0.0.1.5432         127.0.0.1.50479        ESTABLISHED
tcp4       0      0 127.0.0.1.50479        127.0.0.1.5432         ESTABLISHED
tcp4       0      0 127.0.0.1.2525         127.0.0.1.50478        CLOSE_WAIT
tcp4       0      0 127.0.0.1.5432         127.0.0.1.29439        ESTABLISHED
tcp4       0      0 127.0.0.1.29439        127.0.0.1.5432         ESTABLISHED
tcp4       0      0 127.0.0.1.2525         127.0.0.1.29438        CLOSE_WAIT
tcp4       0      0 127.0.0.1.5432         127.0.0.1.37203        CLOSED
tcp4       0      0 127.0.0.1.5432         127.0.0.1.31279        CLOSED
tcp4       0      0 127.0.0.1.5432         127.0.0.1.31277        CLOSED
tcp4       0      0 127.0.0.1.5432         127.0.0.1.35576        CLOSED
   ......
these would be postfix to gld:
tcp4     596      0 127.0.0.1.2525         127.0.0.1.37998        CLOSE_WAIT 
tcp4     627      0 127.0.0.1.2525         127.0.0.1.37994        CLOSE_WAIT
tcp4     627      0 127.0.0.1.2525         127.0.0.1.37993        CLOSE_WAIT
tcp4     602      0 127.0.0.1.2525         127.0.0.1.37992        CLOSE_WAIT
tcp4     605      0 127.0.0.1.2525         127.0.0.1.37988        CLOSE_WAIT
tcp4     605      0 127.0.0.1.2525         127.0.0.1.37987        CLOSE_WAIT
tcp4     605      0 127.0.0.1.2525         127.0.0.1.37986        CLOSE_WAIT
   ......

Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8012e26b690 dgram       0      0 fffff8012e2d3938                0                0                0 /var/named/var/run/log
fffff8012e26b780 dgram   49898      0 fffff8012e2d3b10                0 fffff808375de3c0                0 /var/run/logpriv
fffff8012e26b870 dgram   44120      0 fffff8012e2d3ce8                0 fffff81015ef8870                0 /var/run/log
fffff8012e2b51e0 seqpac      0      0 fffff80077b11938                0                0                0 /var/run/devd.seqpacket.pipe
   ....

Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address        R-MBUF S-MBUF R-CLUS S-CLUS R-HIWA S-HIWA R-LOWA S-LOWA R-BCNT S-BCNT R-BMAX S-BMAX   rexmt persist    keep    2msl  delack rcvtime
tcp4       0      0 127.0.0.1.5432         127.0.0.1.38803             0      0      0      0  81720  49032      1   2048      0      0 653760 392256    0.00    0.00 7145.47    0.00    0.00   54.52
tcp4       0      0 127.0.0.1.38803        127.0.0.1.5432              0      0      0      0  81720  49032      1   2048      0      0 653760 392256    0.00    0.00 7145.57    0.00    0.00   54.42
tcp4     668      0 127.0.0.1.2525         127.0.0.1.54066             1      0      1      0  81720  49032      1   2048   2304      0 653760 392256    0.00    0.00 7145.47    0.00    0.00   54.52
tcp4       0      0 127.0.0.1.54066        127.0.0.1.2525              0      0      0      0  81720  49032      1   2048      0      0 653760 392256    0.00    0.00 7145.57    0.00    0.00   54.42
tcp4     668      0 127.0.0.1.2525         127.0.0.1.33204             1      0      1      0  81720  49032      1   2048   2304      0 653760 392256    0.00    0.00 7145.47    0.00    0.00   54.52
tcp4       0      0 127.0.0.1.33204        127.0.0.1.2525              0      0      0      0  81720  49032      1   2048      0      0 653760 392256    0.00    0.00 7145.47  545.47    0.00   54.52
   .....

That is as far as I've analyzed this, but toward a vote to revert whatever changed with console from 10.3.
Comment 11 Adam Bernstein 2018-04-11 22:53:44 UTC
+1 to reverting 11.x to the previous console behavior in 10.3!

Same problem here on 11.1-RELEASE-p9 amd64, using serial console server (Cyclades) on ttyu0, on Supermicro SM X6DHR-TG motherboard. Sadly, no luck yet with the suggested fixes here; our console speed is set to 9600, but neither "std.9600" nor "3wire.9600" is getting it working so far.
Comment 12 Adam Bernstein 2018-04-12 21:42:34 UTC
Sorry for my untidy commenting, but two additions to make:

1. The workaround of removing any /dev/console logging from syslog.conf did do the trick for us, it's just not really an ideal or complete fix.

2. It took us a long time to identify syslogd as the root cause among all the symptoms and to notice the right error message (the interrupted system call) that led us to this bug. Instead, the only visible error message we were aware of for a while was this one, repeated every 30 seconds in the messages log whenever we attempted to connect to the serial console:

  init: getty repeating too quickly on port /dev/ttyu0, sleeping 30 secs

But a web search on that error turned up a lot of stuff, none of it relevant. FWIW, 'ps' showed the getty process for the serial port with just "-" in its TT column instead of "u0", and lacking the "s+" in its STAT column that the healthy gettys had.

I hope that will help someone else find this bug quicker.