Bug 13992

Summary: routed exits after some days of work with signal 6 (core dump)
Product: Base System
Reporter: Riccardo Torrini <riccardo>
Component: misc
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.3-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description          Flags
gettimeofday_test.c  none
dmesg.boot           none

Description Riccardo Torrini 1999-09-27 08:40:00 UTC
For the 3rd time, routed -s has exited after some days of work with signal 6
(SIGABRT) without any other message. This is visible on the console (and as
the last line of dmesg) but not always in /var/log/messages.
The machine has been up since 22.9.1999-23:42 (reboot after make world).

From dmesg (this happened at 02:22 GMT+1 this morning, 27.9.1999):
-----8<----------8<----------8<-----
CPU: i486 DX2 (486-class CPU)
  Origin = "GenuineIntel"  Id = 0x435  Stepping = 5
  Features=0x3<FPU,VME>
real memory  = 92274688 (90112K bytes)
avail memory = 86380544 (84356K bytes)
[...]
changing root device to da0s1a
pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
-----8<----------8<----------8<-----

From messages:
-----8<----------8<----------8<-----
Sep 24 21:24:43 snail routed[1147]: select: Invalid argument
Sep 24 21:24:44 snail /kernel: pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
-----8<----------8<----------8<-----

Fix: 

I have a workaround: a script that polls the process list and
respawns "routed -s" when it dies.  Not a real fix... :-(

How-To-Repeat:

On my machine, an HP NetServer 4/66-LC (486/DX2-66) used as an internal
router with 4 Intel EtherExpress Pro/10 cards on ISA and as an Internet
gateway with an internal 56k USRobotics modem, it happens every few days.
I recompiled world and kernel, after cvsupping, at the end of August, at
the beginning of September, and on September 20 and 24.  I cannot do it
more often because a build and install takes a full 24 hours :-(
Comment 1 Ruslan Ermilov 1999-09-27 11:18:41 UTC
On Mon, Sep 27, 1999 at 12:30:21AM -0700, riccardo@torrini.org wrote:
> 
> For the 3rd time routed -s exits after some day of work with signal 6
> (SIGABRT) without any other message. Visible on console (and as last
> line of dmesg) but not always on /var/log/messages.
> The machine is up from 22.9.1999-23:42 (reboot after make world)
> 
[...]
> From messages:
> -----8<----------8<----------8<-----
> Sep 24 21:24:43 snail routed[1147]: select: Invalid argument
> Sep 24 21:24:44 snail /kernel: pid 1147 (routed), uid 0: exited on signal 6 (core dumped)
> 
Could you please compile routed(8) with debug symbols, i.e.

# cd /usr/src/sbin/routed; make DEBUG_FLAGS=-g clean all

And run gdb(1) against the core file with this version of routed(8)?


Thanks,
-- 
Ruslan Ermilov		Sysadmin and DBA of the
ru@ucb.crimea.ua	United Commercial Bank,
ru@FreeBSD.org		FreeBSD committer,
+380.652.247.647	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 2 Ruslan Ermilov 1999-09-28 13:56:26 UTC
On Tue, Sep 28, 1999 at 12:52:08PM +0200, Riccardo Torrini wrote:
> Ruslan Ermilov wrote:
> 
> > Great!  Send me this core file as well, and in two minutes after
> > that you'll know what happened, I'm very close to it.
> 
> Here it is.
> 
The problem is that gettimeofday(2) returns garbage on your machine, so
the timeout value passed to select(2) becomes invalid:

: GNU gdb 4.18
: Copyright 1998 Free Software Foundation, Inc.
: GDB is free software, covered by the GNU General Public License, and you are
: welcome to change it and/or distribute copies of it under certain conditions.
: Type "show copying" to see the conditions.
: There is absolutely no warranty for GDB.  Type "show warranty" for details.
: This GDB was configured as "i386-unknown-freebsd"...
: Core was generated by `routed'.
: Program terminated with signal 6, Abort trap.
: #0  0x806d114 in kill ()
: (gdb) where
: #0  0x806d114 in kill ()
: #1  0x806c608 in abort ()
: #2  0x804c208 in logbad (dump=1, p=0x80718c7 "select: %s")
:     at /usr/src/sbin/routed/main.c:901
: #3  0x804b90b in main (argc=0, argv=0xbfbfdbfc)
:     at /usr/src/sbin/routed/main.c:468
: #4  0x80480e9 in _start ()
: (gdb) up 3
: #3  0x804b90b in main (argc=0, argv=0xbfbfdbfc)
:     at /usr/src/sbin/routed/main.c:468
: 468					BADERR(1,"select");
: (gdb) list
: 463			trace_flush();
: 464			ibits = fdbits;
: 465			n = select(sock_max, &ibits, 0, 0, &wtime);
: 466			if (n <= 0) {
: 467				if (n < 0 && errno != EINTR && errno != EAGAIN)
: 468					BADERR(1,"select");
: 469				continue;
: 470			}
: 471	
: 472			if (FD_ISSET(rt_sock, &ibits)) {
: (gdb) print sock_max
: $1 = 6
: (gdb) print wtime
: $2 = {tv_sec = 3, tv_usec = 695150852}
                              ^^^^^^^^^
			      that's why select(2) returned EINVAL
: (gdb) print ifinit_timer
: $3 = {tv_sec = 184988, tv_usec = 841300}
: (gdb) print now
: $4 = {tv_sec = 184985, tv_usec = -694309552}
                                   ^^^^^^^^^^
				   what's up?
: (gdb) print epoch
: $5 = {tv_sec = 938326603, tv_usec = 194765}
: (gdb) print clk
: $6 = {tv_sec = 938511589, tv_usec = -695114787}
                                      ^^^^^^^^^^
				      bah, gettimeofday(2) failed!
: (gdb) print prev_clk
: $7 = {tv_sec = 938511586, tv_usec = 900334}
: (gdb) quit
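
The values printed above are consistent with a timeval subtraction that
borrows one second at most once, so a wildly negative tv_usec is never
brought back into range.  The following is a minimal sketch under that
assumption; the names timeval_sub and timeval_ok are illustrative and are
not taken from the routed sources.  Feeding it clk, epoch and ifinit_timer
from the session above gives the same now and wtime that gdb printed.

```c
#include <sys/time.h>

/*
 * t1 = t2 - t3, borrowing a single second when the microsecond
 * difference is negative.  ASSUMPTION: this mirrors the kind of helper
 * the daemon uses; it is not copied from routed.
 */
void
timeval_sub(struct timeval *t1, const struct timeval *t2,
    const struct timeval *t3)
{
	t1->tv_sec = t2->tv_sec - t3->tv_sec;
	t1->tv_usec = t2->tv_usec - t3->tv_usec;
	if (t1->tv_usec < 0) {
		/* One borrow only: enough for sane input, not for garbage. */
		t1->tv_sec--;
		t1->tv_usec += 1000000;
	}
}

/*
 * A timeout is usable with select(2) only if tv_sec is not negative
 * and tv_usec lies in [0, 1000000).  Returns 1 if usable, 0 otherwise.
 */
int
timeval_ok(const struct timeval *tv)
{
	return (tv->tv_sec >= 0 &&
	    tv->tv_usec >= 0 && tv->tv_usec < 1000000);
}
```

With clk = {938511589, -695114787} and epoch = {938326603, 194765}, the
subtraction yields now = {184985, -694309552}; subtracting that from
ifinit_timer = {184988, 841300} yields wtime = {3, 695150852}, which
timeval_ok() rejects.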


Could you please compile and run the attached test program?
Let it run until it finishes.  If it finishes, it will print the
incorrect time returned by gettimeofday().
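
The attachment itself is not reproduced in this report.  As a rough
sketch of what such a test could look like (the function names and the
call limit are assumptions, not the contents of gettimeofday_test.c), the
code below calls gettimeofday(2) in a loop and stops at the first result
whose tv_usec falls outside [0, 1000000).

```c
#include <stddef.h>
#include <sys/time.h>

/* Returns 1 if tv_usec is a valid microsecond count, 0 otherwise. */
int
usec_in_range(const struct timeval *tv)
{
	return (tv->tv_usec >= 0 && tv->tv_usec < 1000000);
}

/*
 * Calls gettimeofday(2) up to max_calls times (0 means no limit).
 * Returns 1 and stores the offending value in *bad when an
 * out-of-range time is seen, 0 if every call looked sane, and -1 if
 * gettimeofday(2) itself reported an error.
 */
int
find_bad_time(unsigned long max_calls, struct timeval *bad)
{
	struct timeval tv;
	unsigned long i;

	for (i = 0; max_calls == 0 || i < max_calls; i++) {
		if (gettimeofday(&tv, NULL) != 0)
			return (-1);
		if (!usec_in_range(&tv)) {
			*bad = tv;
			return (1);
		}
	}
	return (0);
}
```

A driver would call find_bad_time(0, &bad) and, when it returns 1, print
bad.tv_sec and bad.tv_usec separated by a colon.  On a healthy clock the
unlimited loop never returns, which is why the test is left running.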

Then please send me the output of this test (if any), as well as
the output of the following commands:

# cat /var/run/dmesg.boot
# sysctl kern.timecounter.method machdep.tsc_freq


Cheers,
Comment 3 Riccardo Torrini 1999-09-28 15:20:43 UTC
Ruslan Ermilov wrote:

> Let it run until it finishes.  If it finishes, it will print an
> incorrect date returned by gettimeofday().

After about 20 minutes:
gettimeofday_test: invalid time returned: 938527821:-695331771


# sysctl kern.timecounter.method machdep.tsc_freq
kern.timecounter.method: 0


# sysctl -a | grep -i machdep
machdep.consdev: { major = 0, minor = 0 }
machdep.adjkerntz: -7200
machdep.disable_rtc_set: 0
machdep.wall_cmos_clock: 1
machdep.do_dump: 1
machdep.ispc98: 0
machdep.msgbuf: 
machdep.msgbuf_clear: 0
machdep.i8254_freq: 1193182
machdep.conspeed: 9600


# sysctl -a | grep -i tsc


# sysctl -a | grep -i freq
kern.acct_chkfreq: 15
machdep.i8254_freq: 1193182


Sorry, there is no machdep.tsc_freq (only the similar-sounding
machdep.i8254_freq).  If you are sure of the spelling, I am missing
something :-(


Ciao++
Vic.
/------------------------+---------------------------------------\
| Riccardo "VIC" Torrini | W.W.W.: www.torrini.org            // |
|   Via Montebello, 64   | e-mail : riccardo@torrini.org     //  |
|   50123 Firenze  (I)   +--------------------------------\\//---|
| phone: +39-055-286.574 |        This space for rent :-)        |
\------------------------+---------------------------------------/
Comment 4 ru 1999-09-29 09:52:27 UTC
State Changed
From-To: open->closed

Superseded by PR kern/14034.