Bug 83220

Summary: Daily crashes on 5.4 SMP (with backtrace)
Product: Base System
Reporter: Blaz Zupan <blaz>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.4-RELEASE   
Hardware: Any   
OS: Any   

Description Blaz Zupan 2005-07-10 15:20:24 UTC
Crash happens with both HTT turned on and off in the BIOS. Machine is a
heavily loaded incoming mail server, running postfix, amavisd-new and
F-Secure under Linux emulation. ipfilter is running.

See also this thread on freebsd-stable:

http://lists.freebsd.org/pipermail/freebsd-stable/2005-July/016767.html

Below is a backtrace from a crash dump. The crash dump and a kernel compiled
with -g are available on request.

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159	pcpu.h: No such file or directory.
 	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc044b006 in db_fncall (dummy1=0, dummy2=0, dummy3=-1067606609, dummy4=0xe4b6c9d0 "üɶä(\205]Àèɶäìɶä\222\a")
     at /usr/src5/sys/ddb/db_command.c:531
#2  0xc044ae14 in db_command (last_cmdp=0xc0674644, cmd_table=0x0, aux_cmd_tablep=0xc064226c, aux_cmd_tablep_end=0xc0642270)
     at /usr/src5/sys/ddb/db_command.c:349
#3  0xc044aedc in db_command_loop () at /usr/src5/sys/ddb/db_command.c:455
#4  0xc044ca75 in db_trap (type=12, code=0) at /usr/src5/sys/ddb/db_main.c:221
#5  0xc04e6599 in kdb_trap (type=12, code=0, tf=0xe4b6cb3c) at /usr/src5/sys/kern/subr_kdb.c:468
#6  0xc05f4c79 in trap_fatal (frame=0xe4b6cb3c, eva=36) at /usr/src5/sys/i386/i386/trap.c:812
#7  0xc05f43e9 in trap (frame=
       {tf_fs = -1040580584, tf_es = -1029439472, tf_ds = 16, tf_edi = -1038000128, tf_esi = -1066898900, tf_ebp = -457782384, tf_isp = -457782424, tf_ebx = -1040530304, tf_edx = -1040524364, tf_ecx = -1040524544, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1068574101, tf_cs = 8, tf_eflags = 65683, tf_esp = 180, tf_ss = 0}) at /usr/src5/sys/i386/i386/trap.c:255
#8  0xc05e283a in calltrap () at /usr/src5/sys/i386/i386/exception.s:140
#9  0xc1fa0018 in ?? ()
#10 0xc2a40010 in ?? ()
#11 0x00000010 in ?? ()
#12 0xc2216000 in ?? ()
#13 0xc0686a2c in tcbinfo ()
#14 0xe4b6cb90 in ?? ()
#15 0xe4b6cb68 in ?? ()
#16 0xc1fac480 in ?? ()
#17 0xc1fadbb4 in ?? ()
#18 0xc1fadb00 in ?? ()
#19 0x00000000 in ?? ()
#20 0x0000000c in ?? ()
#21 0x00000000 in ?? ()
#22 0xc04eda6b in propagate_priority (td=0xc2216000) at /usr/src5/sys/kern/subr_turnstile.c:243
#23 0xc04ee225 in turnstile_wait (ts=0xc1fadb00, lock=0xc0686a2c, owner=0xc2216000)
     at /usr/src5/sys/kern/subr_turnstile.c:556
#24 0xc04c5ced in _mtx_lock_sleep (m=0xc0686a2c, td=0xc1fac480, opts=0, file=0x0, line=0)
     at /usr/src5/sys/kern/kern_mutex.c:552
#25 0xc0559ad8 in tcp_usr_rcvd (so=0x0, flags=0) at /usr/src5/sys/netinet/tcp_usrreq.c:602
#26 0xc0506103 in soreceive (so=0xc27bf798, psa=0x0, uio=0xe4b6cc88, mp0=0x0, controlp=0x0, flagsp=0x0)
     at /usr/src5/sys/kern/uipc_socket.c:1395
#27 0xc04f4bd9 in soo_read (fp=0x0, uio=0xe4b6cc88, active_cred=0xc2884a80, flags=0, td=0xc1fac480)
     at /usr/src5/sys/kern/sys_socket.c:91
#28 0xc04ee865 in dofileread (td=0xc1fac480, fp=0xc2e17bb0, fd=10, buf=0x0, nbyte=4096, offset=Unhandled dwarf expression opcode 0x93
) at file.h:233
#29 0xc04ee72f in read (td=0xc1fac480, uap=0xe4b6cd14) at /usr/src5/sys/kern/sys_generic.c:107
#30 0xc05f4fe7 in syscall (frame=
       {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 10, tf_esi = 300, tf_ebp = -1077942168, tf_isp = -457781900, tf_ebx = 134822152, tf_edx = 0, tf_ecx = 10, tf_eax = 3, tf_trapno = 0, tf_err = 2, tf_eip = 672556795, tf_cs = 31, tf_eflags = 658, tf_esp = -1077942212, tf_ss = 47}) at /usr/src5/sys/i386/i386/trap.c:1009
#31 0xc05e288f in Xint0x80_syscall () at /usr/src5/sys/i386/i386/exception.s:201
#32 0x0000002f in ?? ()
#33 0x0000002f in ?? ()
#34 0xbfbf002f in ?? ()
#35 0x0000000a in ?? ()
#36 0x0000012c in ?? ()
#37 0xbfbfe868 in ?? ()
#38 0xe4b6cd74 in ?? ()
#39 0x08093908 in ?? ()
#40 0x00000000 in ?? ()
#41 0x0000000a in ?? ()
#42 0x00000003 in ?? ()
#43 0x00000000 in ?? ()
#44 0x00000002 in ?? ()
#45 0x281666fb in ?? ()
#46 0x0000001f in ?? ()
#47 0x00000292 in ?? ()
#48 0xbfbfe83c in ?? ()
#49 0x0000002f in ?? ()
#50 0x00000000 in ?? ()
#51 0x00000000 in ?? ()
#52 0x00000000 in ?? ()
#53 0x00000000 in ?? ()
#54 0x2c75b000 in ?? ()
#55 0xc22de000 in ?? ()
#56 0xc1fac480 in ?? ()
#57 0xe4b6ccac in ?? ()
#58 0xe4b6cc94 in ?? ()
#59 0xc1f26000 in ?? ()
#60 0xc04ded13 in sched_switch (td=0x12c, newtd=0x8093908, flags=Cannot access memory at address 0xbfbfe878
) at /usr/src5/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit
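
kgdb prints the trap frame registers in frame #7 as signed decimal integers, which makes them hard to compare with the hex addresses in the rest of the trace. The following is a minimal sketch, not part of the original report, that reinterprets a few of those fields as unsigned 32-bit values. The helper names `u32` and `decode` are illustrative only; the numbers are copied from frame #7.

```python
def u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF


# Selected fields copied verbatim from the trap frame in frame #7.
trap_frame = {
    "tf_eip": -1068574101,
    "tf_ebp": -457782384,
    "tf_ebx": -1040530304,
    "tf_esi": -1066898900,
    "tf_edi": -1038000128,
}


def decode(frame: dict) -> dict:
    """Return each register formatted as an 8-digit hex string."""
    return {name: f"0x{u32(value):08x}" for name, value in frame.items()}


if __name__ == "__main__":
    for name, hex_value in decode(trap_frame).items():
        print(f"{name} = {hex_value}")
```

With these inputs, tf_eip decodes to 0xc04eda6b, the same address kgdb reports for propagate_priority in frame #22. tf_esi decodes to 0xc0686a2c (the tcbinfo lock passed to turnstile_wait), tf_edi to 0xc2216000 (the owner thread), and tf_ebx to 0xc1fac480 (the thread doing the read).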

Fix: Unknown

How-To-Repeat:
Only happens on SMP boxes: a very similar HP DL380 G3 box with only a
single processor currently has an uptime of 12 days, while the SMP box
crashes at least once daily.

Comment 1 Blaz Zupan 2005-07-17 12:20:07 UTC
After removing ipfilter from the kernel and using pf instead, the machine has 
stayed up for 5 days now, while the maximum uptime was a couple of hours 
before the change.

Comment 2 Darren Reed freebsd_committer freebsd_triage 2005-07-18 23:50:36 UTC
State Changed
From-To: open->closed

user has solved the problem for themselves and there's not nearly enough 
useful information here to try and debug it.

Comment 3 Blaz Zupan 2005-07-19 06:14:18 UTC
> user has solved the problem for themselves and there's not nearly enough
> useful information here to try and debug it.

Do you really think this is the correct response? I would rather hear *WHAT 
KIND* of useful information you would like to see. I have provided a full 
backtrace. I have a (production) machine ready to do more testing on. Sure, if 
you don't want to work on the problem, I can live with pf...