A box running with openVPN along with a FastIPSEC tunnel will enter into a deadlock under the right conditions Fix: Using pf instead of ipfw seems to work around the issue although I dont know if that just makes it harder to trigger Using regular IPSEC seems to work just fine, but debug.mpsafenet forced to 0 as ipsec requires Giant How-To-Repeat: client machines come in via tun0 (openvpn interface) and are natted going out vlan7 /sbin/ipfw add 3400 divert natd ip from any to any via vlan7 /sbin/natd -unregistered_only -n vlan7 /sbin/ifconfig gif0 create tunnel xxx.yyy.zzz.86 aaa.bbb.ccc.118 /sbin/ifconfig gif0 10.44.99.2 netmask 255.255.255.252 10.44.99.1 netmask 255.255.255.252 Then we add a simple IPSEC policy across the gif tunnel setkey -c <<EOF add xxx.yyy.zzz.86 aaa.bbb.ccc.118 esp 1044 -m any -E blowfish-cbc "JOy1QIbr8swoTIiuZYCtkDiCSHeI1eb48rux7IzwbEXyA"; add aaa.bbb.ccc.118 xxx.yyy.zzz.86 esp 1044 -m any -E blowfish-cbc "JOy1QIbr8swoTIiuZYCtkDiCSHeI1eb48rux7IzwbEXyA"; spdadd xxx.yyy.zzz.86/32 aaa.bbb.ccc.118/32 any -P out ipsec esp/tunnel/xxx.yyy.zzz.86-aaa.bbb.ccc.118/require; spdadd aaa.bbb.ccc.118/32 xxx.yyy.zzz.86/32 any -P in ipsec esp/tunnel/aaa.bbb.ccc.118-xxx.yyy.zzz.86/require; EOF I then bring up the openvpn tunnels (just a few test clients). I can then trigger the deadlock by doing an scp across the gif tunnel On the serial console I see lock order reversal 1st 0xc292a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327 2nd 0xc28bc950 ipsec request (ipsec request) @ /usr/src/sys/netipsec/ipsec_output.c:354 KDB: stack backtrace: kdb_backtrace(0,ffffffff,c075ed70,c075ed98,c07261c4) at kdb_backtrace+0x29 witness_checkorder(c28bc950,9,c06f7aa6,162) at witness_checkorder+0x564 _mtx_lock_flags(c28bc950,0,c06f7aa6,162,0) at _mtx_lock_flags+0x5b ipsec4_process_packet(c2a4a400,c28bc900,22,0,c28b7600) at ipsec4_process_packet+0x45 ip_output(c2a4a400,0,e741eb28,22,0) at ip_output+0x74f div_output(c2926b20,c2a4a400,c255a280,0,e741ec08) at div_output+0x185 div_send(c2926b20,0,c2a4a400,c255a280,0) at div_send+0x3f sosend(c2926b20,c255a280,e741ec3c,c2a4a400,0) at sosend+0x5e3 kern_sendit(c2731600,3,e741ecbc,0,0) at kern_sendit+0x104 sendit(c2731600,3,e741ecbc,0,bfbdec1c) at sendit+0x163 sendto(c2731600,e741ed04,6,2,296) at sendto+0x4d syscall(3b,3b,3b,2,7c) at syscall+0x22f Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = 0xbfbdeb0c, ebp = 0xbfbeebb8 --- And here the box is frozen up. On the serial console, I can break into the debugger telnet> send break KDB: enter: Line break on console [thread pid 12 tid 100004 ] Stopped at kdb_enter+0x2b: nop db> db> show pcpu cpuid = 0 curthread = 0xc22a9900: pid 12 "idle: cpu0" curpcb = 0xe3481d90 fpcurthread = none idlethread = 0xc22a9900: pid 12 "idle: cpu0" APIC ID = 0 currentldt = 0x50 spin locks held: db> show alllocks Process 682 (ssh) thread 0xc2aa6000 (100137) exclusive sleep mutex crypto (crypto op queues) r = 0 (0xc07aa420) locked @ /usr/src/sys/opencrypto/crypto.c:669 exclusive sleep mutex ipsec request r = 1 (0xc28bc950) locked @ /usr/src/sys/netipsec/xform_esp.c:872 exclusive sleep mutex inp (tcpinp) r = 0 (0xc28c6a68) locked @ /usr/src/sys/netinet/tcp_usrreq.c:651 Process 586 (natd) thread 0xc2731600 (100080) exclusive sleep mutex inp (divinp) r = 0 (0xc292a090) locked @ /usr/src/sys/netinet/ip_divert.c:327 exclusive sleep mutex div r = 0 (0xc07a0b6c) locked @ /usr/src/sys/netinet/ip_divert.c:325 Process 37 (swi4: clock sio) thread 0xc2317d80 (100036) exclusive sleep mutex tcp r = 0 (0xc07a1ecc) locked @ /usr/src/sys/netinet/tcp_timer.c:457 db> ps pid proc uid ppid pgrp flag stat wmesg wchan cmd 682 c2aa4830 0 681 681 0004002 [LOCK div c29f57c0] ssh 681 c2aa4a3c 0 679 681 0004002 [SLPQ piperd 0xc272f000][SLP] scp 679 c2894830 0 678 679 0004002 [SLPQ pause 0xc2894864][SLP] csh 678 c2894624 1001 675 678 0004102 [SLPQ wait 0xc2894624][SLP] su 675 c26efa3c 1001 674 675 0004002 [SLPQ pause 0xc26efa70][SLP] csh 674 c2894a3c 1001 672 672 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 672 c2734830 0 523 672 0004100 [SLPQ sbwait 0xc2a2920c][SLP] sshd 663 c29f320c 0 657 663 0004102 [SLPQ select 0xc079d3c4][SLP] openvpn 657 c2894c48 0 656 657 0004002 [SLPQ pause 0xc2894c7c][SLP] csh 656 c29f3000 1001 653 656 0004102 [SLPQ wait 0xc29f3000][SLP] su 653 c29f3624 1001 652 653 0004002 [SLPQ pause 0xc29f3658][SLP] csh 652 c26ec624 1001 650 650 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 650 c29f3a3c 0 523 650 0004100 [SLPQ sbwait 0xc2a2a20c][SLP] sshd 649 c29f3c48 0 1 649 0004002 [SLPQ ttyin 0xc254d410][SLP] getty 648 c29f7000 0 1 648 0004002 [SLPQ ttyin 0xc254f010][SLP] getty 647 c29f720c 0 1 647 0004002 [SLPQ ttyin 0xc254c410][SLP] getty 646 c29f7418 0 1 646 0004002 [SLPQ ttyin 0xc254fc10][SLP] getty 645 c29f7624 0 1 645 0004002 [SLPQ ttyin 0xc2555010][SLP] getty 644 c29f7830 0 1 644 0004002 [SLPQ ttyin 0xc254f810][SLP] getty 643 c2734000 0 1 643 0004002 [SLPQ ttyin 0xc2555810][SLP] getty 642 c2733000 0 1 642 0004002 [SLPQ ttyin 0xc2557010][SLP] getty 641 c26efc48 0 1 641 0004002 [SLPQ ttyin 0xc2556010][SLP] getty 595 c2730a3c 0 1 64 0000002 [SLPQ select 0xc079d3c4][SLP] 3dm2 586 c2730624 0 1 586 0000000 [LOCK ipsec request c26bacc0] natd 570 c2734624 0 1 570 0000101 [SLPQ select 0xc079d3c4][SLP] bgpd 568 c289020c 0 1 568 0000101 [SLPQ select 0xc079d3c4][SLP] zebra 545 c2734c48 0 1 545 0000000 [SLPQ nanslp 0xc07500ac][SLP] cron 532 c2733830 25 1 532 0000100 [SLPQ pause 0xc2733864][SLP] sendmail 528 c2730418 0 1 528 0000100 [SLPQ select 0xc079d3c4][SLP] sendmail 523 c2733c48 0 1 523 0000100 [SLPQ select 0xc079d3c4][SLP] sshd 521 c2890c48 0 505 505 0000000 [SLPQ pause 0xc2890c7c][SLP] ntpd 505 c26eca3c 0 1 505 0000000 [SLPQ select 0xc079d3c4][SLP] ntpd 407 c2890624 53 1 407 0000100 [SLPQ select 0xc079d3c4][SLP] named 346 c2890000 0 1 346 0000000 [SLPQ select 0xc079d3c4][SLP] syslogd 313 c26ec000 0 1 313 0000000 [SLPQ select 0xc079d3c4][SLP] devd 191 c26ef000 0 1 191 0000000 [SLPQ pause 0xc26ef034][SLP] adjkerntz 63 c238d20c 0 0 0 0000204 [SLPQ - 0xe4f4fd04][SLP] schedcpu 62 c238d418 0 0 0 0000204 [SLPQ - 0xc07a502c][SLP] nfsiod 3 61 c238d624 0 0 0 0000204 [SLPQ - 0xc07a5028][SLP] nfsiod 2 60 c238d830 0 0 0 0000204 [SLPQ - 0xc07a5024][SLP] nfsiod 1 59 c238da3c 0 0 0 0000204 [SLPQ - 0xc07a5020][SLP] nfsiod 0 58 c238dc48 0 0 0 0000204 [SLPQ vlruwt 0xc238dc48][SLP] vnlru 57 c23f4000 0 0 0 0000204 [SLPQ syncer 0xc074fe1c][SLP] syncer 56 c23f420c 0 0 0 0000204 [SLPQ psleep 0xc079d90c][SLP] bufdaemon 55 c23f4418 0 0 0 000020c [SLPQ pgzero 0xc07ab684][SLP] pagezero 54 c23f4624 0 0 0 0000204 [SLPQ psleep 0xc07ab1d4][SLP] vmdaemon 53 c23f4830 0 0 0 0000204 [SLPQ psleep 0xc07ab190][SLP] pagedaemon 52 c23f4a3c 0 0 0 0000204 [IWAIT] swi0: sio 51 c23f4c48 0 0 0 0000204 [SLPQ usbevt 0xc238b210][SLP] usb4 50 c2316624 0 0 0 0000204 [SLPQ usbevt 0xc23d4210][SLP] usb3 49 c2316830 0 0 0 0000204 [SLPQ usbevt 0xc23c0210][SLP] usb2 48 c2316a3c 0 0 0 0000204 [SLPQ usbevt 0xc2394210][SLP] usb1 47 c2316c48 0 0 0 0000204 [SLPQ usbtsk 0xc074ad64][SLP] usbtask 46 c238c000 0 0 0 0000204 [SLPQ usbevt 0xc2398210][SLP] usb0 45 c238c20c 0 0 0 0000204 [IWAIT] swi6: task queue 44 c238c418 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task2 43 c238c624 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task1 9 c238c830 0 0 0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task0 8 c238ca3c 0 0 0 0000204 [SLPQ - 0xc233aa00][SLP] kqueue taskq 42 c238cc48 0 0 0 0000204 [IWAIT] swi2: cambio 41 c238d000 0 0 0 0000204 [IWAIT] swi5:+ 7 c2309c48 0 0 0 0000204 [SLPQ - 0xc233ac80][SLP] thread taskq 40 c2315000 0 0 0 0000204 [IWAIT] swi6:+ 39 c231520c 0 0 0 0000204 [SLPQ - 0xc074a5a0][SLP] yarrow 6 c2315418 0 0 0 0000204 [SLPQ - 0xc074d5a8][SLP] g_down 5 c2315624 0 0 0 0000204 [SLPQ - 0xc074d5a4][SLP] g_up 4 c2315830 0 0 0 0000204 [SLPQ - 0xc074d59c][SLP] g_event 3 c2315a3c 0 0 0 0000204 [SLPQ crypto_ret_wait 0xc07aa444][SLP] crypto returns 2 c2315c48 0 0 0 0000204 [SLPQ crypto_wait 0xc07aa404][SLP] crypto 38 c2316000 0 0 0 0000204 [IWAIT] swi3: vm 37 c231620c 0 0 0 000020c [LOCK inp c22f8dc0] swi4: clock sio 36 c2316418 0 0 0 0000204 [LOCK div c29f57c0] swi1: net 35 c2300624 0 0 0 0000204 [IWAIT] irq23: uhci0 ehci0 34 c2300830 0 0 0 0000204 [IWAIT] irq22: em0 33 c2300a3c 0 0 0 0000204 [IWAIT] irq21: twe0 32 c2300c48 0 0 0 0000204 [IWAIT] irq20: fxp0 31 c2309000 0 0 0 0000204 [IWAIT] irq19: uhci1++ 30 c230920c 0 0 0 0000204 [IWAIT] irq18: uhci2 29 c2309418 0 0 0 0000204 [IWAIT] irq17: fwohci0 28 c2309624 0 0 0 0000204 [IWAIT] irq16: uhci3 27 c2309830 0 0 0 0000204 [IWAIT] irq15: ata1 26 c2309a3c 0 0 0 0000204 [IWAIT] irq14: ata0 25 c22ad20c 0 0 0 0000204 [IWAIT] irq13: 24 c22ad418 0 0 0 0000204 [IWAIT] irq12: 23 c22ad624 0 0 0 0000204 [IWAIT] irq11: 22 c22ad830 0 0 0 0000204 [IWAIT] irq10: 21 c22ada3c 0 0 0 0000204 [IWAIT] irq9: acpi0 20 c22adc48 0 0 0 0000204 [IWAIT] irq8: 19 c2300000 0 0 0 0000204 [IWAIT] irq7: ppc0 18 c230020c 0 0 0 0000204 [IWAIT] irq6: 17 c2300418 0 0 0 0000204 [IWAIT] irq5: 16 c22a8000 0 0 0 0000204 [IWAIT] irq4: sio0 15 c22a820c 0 0 0 0000204 [IWAIT] irq3: 14 c22a8418 0 0 0 0000204 [IWAIT] irq0: 13 c22a8624 0 0 0 0000204 [IWAIT] irq1: atkbd0 12 c22a8830 0 0 0 000020c [CPU 0] idle: cpu0 11 c22a8a3c 0 0 0 000020c [CPU 1] idle: cpu1 1 c22a8c48 0 0 1 0004200 [SLPQ wait 0xc22a8c48][SLP] init 10 c22ad000 0 0 0 0000204 [SLPQ ktrace 0xc074dff8][SLP] ktrace 0 c074d6a0 0 0 0 0000200 [IWAIT] swapper db> show lockedvnods Locked vnodes db> show lockedbufs db> trace Tracing pid 12 tid 100004 td 0xc22a9900 kdb_enter(c07056df) at kdb_enter+0x2b siointr1(c254b000,c07ad5c0,0,c07054eb,56e) at siointr1+0xce siointr(c254b000) at siointr+0x21 intr_execute_handlers(c229f490,e3481c94,4,e3481cd8,c068c0e3) at intr_execute_handlers+0xa5 lapic_handle_intr(34) at lapic_handle_intr+0x2e Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc08a18fd, esp = 0xe3481cd8, ebp = 0xe3481cd8 --- acpi_cpu_c1(c074f780,1,e3481cf8,1,c22a8830) at acpi_cpu_c1+0x5 acpi_cpu_idle(e3481d0c,c0516a51,c05169f4,e3481d24,c0516834) at acpi_cpu_idle+0x13e cpu_idle(c05169f4,e3481d24,c0516834,0,e3481d38) at cpu_idle+0x28 idle_proc(0,e3481d38,0,c05169f4,0) at idle_proc+0x5d fork_exit(c05169f4,0,e3481d38) at fork_exit+0xa0 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe3481d6c, ebp = 0 --- db> show lockedvnods Locked vnodes db>
For the archives. I added the LOR with ID 163 to "the LOR page" See http://sources.zabbadoz.net/freebsd/lor.html#163
I'm getting what seems to be either the same problem or its fraternal twin ... only without either IPSEC (any flavor) or any vpn. Running FreeBSD 7.0-CURRENT #0: Wed Jan 4 13:41:21 EST 20 I get this Jan 12 19:27:51 jerusalem kernel: lock order reversal: Jan 12 19:27:51 jerusalem kernel: 1st 0xc364a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327 Jan 12 19:27:51 jerusalem kernel: 2nd 0xc07655cc in_multi_mtx (in_multi_mtx) @ /usr/src/sys/netinet/ip_output.c:291 Jan 12 19:27:51 jerusalem kernel: KDB: stack backtrace: Jan 12 19:27:51 jerusalem kernel: kdb_backtrace(c06b7a91,c07655cc,c06b7470,c06b7470,c06c0886) at kdb_backtrace+0x2f Jan 12 19:27:51 jerusalem kernel: witness_checkorder(c07655cc,9,c06c0886,123,c06bead6) at witness_checkorder+0x6e1 Jan 12 19:27:51 jerusalem kernel: _mtx_lock_flags(c07655cc,0,c06c0886,123,c05427bd) at _mtx_lock_flags+0x85 Jan 12 19:27:51 jerusalem kernel: ip_output(c33c0b00,0,d56e5afc,22,0) at ip_output+0x460 Jan 12 19:27:51 jerusalem kernel: div_output(c35ff000,c33c0b00,c341d970,0,d56e5bb8) at div_output+0x1d5 Jan 12 19:27:51 jerusalem kernel: div_send(c35ff000,0,c33c0b00,c341d970,0) at div_send+0x5d Jan 12 19:27:51 jerusalem kernel: sosend(c35ff000,c341d970,d56e5be4,c33c0b00,0) at sosend+0x49e Jan 12 19:27:51 jerusalem kernel: kern_sendit(c339c900,3,d56e5c64,0,0) at kern_sendit+0x106 Jan 12 19:27:51 jerusalem kernel: sendit(c339c900,3,d56e5c64,0,bfbeedb0) at sendit+0x1a8 Jan 12 19:27:51 jerusalem kernel: sendto(c339c900,d56e5d04,18,43c,6) at sendto+0x5b Jan 12 19:27:51 jerusalem kernel: syscall(3b,3b,3b,bfbeed90,2) at syscall+0x2a6 Jan 12 19:27:51 jerusalem kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f Jan 12 19:27:51 jerusalem kernel: --- syscall (133, FreeBSD ELF32, sendto), eip and startup continues. If this is a problem with ipfw, it's one that happened since approximately the middle of December.
On Fri, 13 Jan 2006, Robert Huff wrote: > The following reply was made to PR kern/86427; it has been noted by GNATS. > > From: Robert Huff <roberthuff@rcn.com> > To: bug-followup@FreeBSD.org, mike@sentex.net > Cc: > Subject: Re: kern/86427: LOR / Deadlock with FASTIPSEC and nat > Date: Thu, 12 Jan 2006 20:23:11 -0500 > > I'm getting what seems to be either the same problem or its fraternal > twin ... only without either IPSEC (any flavor) or any vpn. > Running This may in part be due to a bug in IP divert sockets, resulting in recursion in the network stack. The attached untested patch may help, or at least, eliminate part of the problem. This hasn't yet been committed because I've had trouble finding someone to test it. Index: ip_divert.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/ip_divert.c,v retrieving revision 1.113 diff -u -r1.113 ip_divert.c --- ip_divert.c 13 May 2005 11:44:37 -0000 1.113 +++ ip_divert.c 13 Nov 2005 19:27:32 -0000 @@ -61,6 +61,7 @@ #include <vm/uma.h> #include <net/if.h> +#include <net/netisr.h> #include <net/route.h> #include <netinet/in.h> @@ -378,7 +379,7 @@ SOCK_UNLOCK(so); #endif /* Send packet to input processing */ - ip_input(m); + netisr_queue(NETISR_IP, m); } return error;
(I don't know this will be useful, but in case ....) Running: huff@jerusalem>> uname -v FreeBSD 7.0-CURRENT #0: Fri Jan 13 13:21:14 EST 2006 at the last re-boot I got this: Feb 14 07:42:47 jerusalem kernel: lock order reversal: Feb 14 07:42:47 jerusalem kernel: 1st 0xc3628090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:328 Feb 14 07:42:47 jerusalem kernel: 2nd 0xc0764fe8 in_multi_mtx (in_multi_mtx) @ /usr/src/sys/netinet/ip_output.c:291 Feb 14 07:42:47 jerusalem kernel: KDB: stack backtrace: Feb 14 07:42:47 jerusalem kernel: kdb_backtrace(c06b7d5c,c0764fe8,c06b772c,c06b772c,c06c0c61) at kdb_backtrace+0x2f Feb 14 07:42:47 jerusalem kernel: witness_checkorder(c0764fe8,9,c06c0c61,123,c06beeb1) at witness_checkorder+0x6e4 Feb 14 07:42:47 jerusalem kernel: _mtx_lock_flags(c0764fe8,0,c06c0c61,123,c0542b9f) at _mtx_lock_flags+0x8b Feb 14 07:42:47 jerusalem kernel: ip_output(c394d700,0,d56daafc,22,0) at ip_output+0x460 Feb 14 07:42:47 jerusalem kernel: div_output(c35fd3e4,c394d700,c3522760,0,d56dabb8) at div_output+0x1d5 Feb 14 07:42:47 jerusalem kernel: div_send(c35fd3e4,0,c394d700,c3522760,0) at div_send+0x5d Feb 14 07:42:47 jerusalem kernel: sosend(c35fd3e4,c3522760,d56dabe4,c394d700,0) at sosend+0x49e Feb 14 07:42:47 jerusalem kernel: kern_sendit(c339b300,3,d56dac64,0,0) at kern_sendit+0x106 Feb 14 07:42:47 jerusalem kernel: sendit(c339b300,3,d56dac64,0,bfbeedb0) at sendit+0x1a8 Feb 14 07:42:47 jerusalem kernel: sendto(c339b300,d56dad04,18,43c,6) at sendto+0x5b Feb 14 07:42:47 jerusalem kernel: syscall(3b,3b,3b,bfbeed90,2) at syscall+0x2a6 Feb 14 07:42:47 jerusalem kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f Feb 14 07:42:47 jerusalem kernel: --- syscall (133, FreeBSD ELF32, sendto), eip = 0x4814230b, esp = 0xbfbeecfc, ebp = 0xbfbfeda8 --- which was was expected, and then this, which was new: Feb 14 07:43:22 jerusalem kernel: lock order reversal: Feb 14 07:43:22 jerusalem kernel: 1st 0xc3629480 inp (rawinp) @ /usr/src/sys/netinet/raw_ip.c:202 Feb 14 07:43:22 jerusalem kernel: 2nd 0xc36293d8 inp (raw6inp) @ /usr/src/sys/netinet/raw_ip.c:202 Feb 14 07:43:22 jerusalem kernel: KDB: stack backtrace: Feb 14 07:43:22 jerusalem kernel: kdb_backtrace(c06b7d5c,c36293d8,c06c4e3d,c06c4ca8,c06c0cdb) at kdb_backtrace+0x2f Feb 14 07:43:22 jerusalem kernel: witness_checkorder(c36293d8,9,c06c0cdb,ca,246) at witness_checkorder+0x6e4 Feb 14 07:43:22 jerusalem kernel: _mtx_lock_flags(c36293d8,0,c06c0cdb,ca,1) at _mtx_lock_flags+0x8b Feb 14 07:43:22 jerusalem kernel: rip_input(c394d600,14,0,d4477be8,c0542b9f) at rip_input+0x7b Feb 14 07:43:22 jerusalem kernel: icmp_input(c394d600,14,c3429000,1,0) at icmp_input+0x511 Feb 14 07:43:22 jerusalem kernel: ip_input(c394d600,0,c06bec13,e9,c0764bd8) at ip_input+0x656 Feb 14 07:43:22 jerusalem kernel: netisr_processqueue(c0764bd8,0,c06bec13,153,c32b41c0) at netisr_processqueue+0x8a Feb 14 07:43:22 jerusalem kernel: swi_net(0,d4477cdc,c050e8f0,c0719970,1) at swi_net+0xa4 Feb 14 07:43:22 jerusalem kernel: ithread_execute_handlers(c32acac8,c32a0500,c06b14b0,2f9,c32ad780) at ithread_execute_handlers+0x10d Feb 14 07:43:22 jerusalem kernel: ithread_loop(c327e6c0,d4477d38,c06b12e3,30e,c327e6c0) at ithread_loop+0x77 Feb 14 07:43:22 jerusalem kernel: fork_exit(c0502d1c,c327e6c0,d4477d38) at fork_exit+0xc5 Feb 14 07:43:22 jerusalem kernel: fork_trampoline() at fork_trampoline+0x8 Feb 14 07:43:22 jerusalem kernel: --- trap 0x1, eip = 0, esp = 0xd4477d6c, ebp = 0 --- The machine in question is still functional: huff@jerusalem>> uptime 1:39PM up 1 day, 5:57, 6 users, load averages: 1.39, 2.44, 2.99 and I believe (but have no hard data) Robert's patch has reduced the impact (time between failure was smallnum days, now smallnum weeks). Robert Huff
I have some new and hopefully useful information. Now running: FreeBSD 7.0-CURRENT #0: Mon Mar 13 09:23:39 EST 2006 a reboot today produced: Mar 27 18:14:44 jerusalem kernel: lock order reversal: Mar 27 18:14:44 jerusalem kernel: 1st 0xc362a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327 Mar 27 18:14:44 jerusalem kernel: 2nd 0xc076c618 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:73 Mar 27 18:14:44 jerusalem kernel: KDB: stack backtrace: Mar 27 18:14:44 jerusalem kernel: kdb_backtrace(c06bdee4,c076c618,c06c5154,c06c5154,c06c5120) at kdb_backtrace+0x2f Mar 27 18:14:44 jerusalem kernel: witness_checkorder(c076c618,1,c06c5120,49,c051034d) at witness_checkorder+0x6e4 Mar 27 18:14:44 jerusalem kernel: _rw_rlock(c076c618,c06c5120,49,c36049c0,0) at _rw_rlock+0x6d Mar 27 18:14:44 jerusalem kernel: pfil_run_hooks(c076c600,d5917b34,c34f8c00,2,0) at pfil_run_hooks+0x37 Mar 27 18:14:44 jerusalem kernel: ip_output(c33bc100,0,d5917b00,22,0) at ip_output+0x6f4 Mar 27 18:14:44 jerusalem kernel: div_output(c36003e4,c33bc100,c334f9b0,0,d5917bbc) at div_output+0x1d5 Mar 27 18:14:44 jerusalem kernel: div_send(c36003e4,0,c33bc100,c334f9b0,0) at div_send+0x5d Mar 27 18:14:44 jerusalem kernel: sosend(c36003e4,c334f9b0,d5917be8,c33bc100,0) at sosend+0x49e Mar 27 18:14:44 jerusalem kernel: kern_sendit(c3507870,3,d5917c68,0,0) at kern_sendit+0x106 Mar 27 18:14:44 jerusalem kernel: sendit(c3507870,3,d5917c68,0,bfbeedd1) at sendit+0x1a8 Mar 27 18:14:44 jerusalem kernel: sendto(c3507870,d5917d04,18,c35068d0,6) at sendto+0x5b Mar 27 18:14:44 jerusalem kernel: syscall(3b,3b,3b,bfbeed90,2) at syscall+0x2a4 Mar 27 18:14:44 jerusalem kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f Mar 27 18:14:44 jerusalem kernel: --- syscall (133, FreeBSD ELF32, sendto), eip = 0x4814d283, esp = 0xbfbeecfc, ebp = 0xbfbfeda8 ---
Responsible Changed From-To: freebsd-bugs->gnn LOR #163 was assigned to gnn as he is working on the locking in netipsec.
Responsible Changed From-To: gnn->freebsd-net I believe this is fixed but others can comment on it at will.
This for sure is not an issue anymore. Closing and if needed can be re-opened.