As suggested by Mark Johnston, I am filing this as a "new" bug because, after some tests, it does not seem to be related to #255104.

Description: The stock 13.0 kernel crashes within a few minutes if ipfw has been enabled in rc.conf. If it is disabled in rc.conf and started manually by root later on, ipfw works flawlessly and the machine is stable for weeks!

There is no fancy ipfw setup: no divert, no NAT, just plain "deny if it comes from addr x" rules.

As I was told to, I built a kernel with the latest patches (including 255104) and turned on INVARIANTS.

The core.txt file is attached at the end; a brief summary follows:

panic: Assertion m->m_nextpkt == NULL failed at /root/src/sys/net/iflib.c:4087
cpuid = 0
time = 1620674444
KDB: stack backtrace:
#0 0xffffffff80c400e5 at kdb_backtrace+0x65
#1 0xffffffff80bf5be1 at vpanic+0x181
#2 0xffffffff80bf59b3 at panic+0x43
#3 0xffffffff80d29c5b at iflib_if_transmit+0x15b
#4 0xffffffff80d0fb9b at ether_output_frame+0xab
#5 0xffffffff80d0faa1 at ether_output+0x6b1
#6 0xffffffff80da58ef at ip_output_send+0x8f
#7 0xffffffff80da55a5 at ip_output+0x1495
#8 0xffffffff80d12350 at gif_transmit+0x2f0
#9 0xffffffff80df2b9b at ip6_forward+0x95b
#10 0xffffffff80df4414 at ip6_input+0xf04
#11 0xffffffff80d2cb11 at netisr_dispatch_src+0xb1
#12 0xffffffff80d0fd3e at ether_demux+0x17e
#13 0xffffffff80d113cc at ether_nh_input+0x40c
#14 0xffffffff80d2cb11 at netisr_dispatch_src+0xb1
#15 0xffffffff80d10231 at ether_input+0xa1
#16 0xffffffff80d28bd7 at iflib_rxeof+0xe07
#17 0xffffffff80d2274a at _task_fn_rx+0x7a
Uptime: 25s
Dumping 1160 out of 32617 MB:..2%..12%..21%..31%..42%..51%..61%..71%..82%..91%

__curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /root/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80bf580b in kern_reboot (howto=260) at /root/src/sys/kern/kern_shutdown.c:486
#3 0xffffffff80bf5c50 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /root/src/sys/kern/kern_shutdown.c:919
#4 0xffffffff80bf59b3 in panic (fmt=<unavailable>) at /root/src/sys/kern/kern_shutdown.c:843
#5 0xffffffff80d29c5b in iflib_if_transmit (ifp=0xfffff80003dff800, m=0xfffff8005ce3ce00) at /root/src/sys/net/iflib.c:4087
#6 0xffffffff80d0fb9b in ether_output_frame (ifp=ifp@entry=0xfffff80003dff800, m=<unavailable>) at /root/src/sys/net/if_ethersubr.c:511
#7 0xffffffff80d0faa1 in ether_output (ifp=<optimized out>, ifp@entry=<error reading variable: value is not available>, m=<unavailable>, m@entry=<error reading variable: value is not available>, dst=0xfffffe003499c5a0, dst@entry=<error reading variable: value is not available>, ro=<optimized out>, ro@entry=<error reading variable: value is not available>) at /root/src/sys/net/if_ethersubr.c:438
#8 0xffffffff80da58ef in ip_output_send (inp=inp@entry=0x0, ifp=<unavailable>, ifp@entry=0xfffff80003dff800, m=m@entry=0xfffff8005ce3ce00, gw=gw@entry=0xfffffe003499c5a0, ro=<unavailable>, ro@entry=0x0, stamp_tag=<optimized out>) at /root/src/sys/netinet/ip_output.c:275
#9 0xffffffff80da55a5 in ip_output (m=0xfffff8005ce3ce00, m@entry=0x0, opt=opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=<optimized out>, flags@entry=0, imo=imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /root/src/sys/netinet/ip_output.c:812
#10 0xffffffff80d92c59 in in_gif_output (ifp=ifp@entry=0xfffff80134802000, m=<optimized out>, m@entry=0xfffff8005cc87200, proto=<optimized out>, ecn=<optimized out>) at /root/src/sys/netinet/in_gif.c:306
#11 0xffffffff80d12350 in gif_transmit (ifp=0xfffff80134802000, m=0xfffff8005cc87200) at /root/src/sys/net/if_gif.c:380
#12 0xffffffff80df2b9b in ip6_forward (m=<unavailable>, srcrt=srcrt@entry=0) at /root/src/sys/netinet6/ip6_forward.c:387
#13 0xffffffff80df4414 in ip6_input (m=<unavailable>, m@entry=<error reading variable: value is not available>) at /root/src/sys/netinet6/ip6_input.c:897
#14 0xffffffff80d2cb11 in netisr_dispatch_src (proto=6, source=source@entry=0, m=0xfffff8005cc87200) at /root/src/sys/net/netisr.c:1143
#15 0xffffffff80d2ce5f in netisr_dispatch (proto=<unavailable>, m=<unavailable>) at /root/src/sys/net/netisr.c:1234
#16 0xffffffff80d0fd3e in ether_demux (ifp=ifp@entry=0xfffff80003dff800, m=<unavailable>) at /root/src/sys/net/if_ethersubr.c:923
#17 0xffffffff80d113cc in ether_input_internal (ifp=0xfffff80003dff800, m=<unavailable>) at /root/src/sys/net/if_ethersubr.c:709
#18 ether_nh_input (m=<optimized out>, m@entry=<error reading variable: value is not available>) at /root/src/sys/net/if_ethersubr.c:739
#19 0xffffffff80d2cb11 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xfffff8005cc87200) at /root/src/sys/net/netisr.c:1143
#20 0xffffffff80d2ce5f in netisr_dispatch (proto=<unavailable>, proto@entry=5, m=<unavailable>, m@entry=0xfffff8005cc87200) at /root/src/sys/net/netisr.c:1234
#21 0xffffffff80d10231 in ether_input (ifp=0xfffff80003dff800, ifp@entry=<error reading variable: value is not available>, m=0xfffff8005cc87200, m@entry=<error reading variable: value is not available>) at /root/src/sys/net/if_ethersubr.c:830
#22 0xffffffff80d28bd7 in iflib_rxeof (rxq=<optimized out>, rxq@entry=0xfffff80003dcc000, budget=<optimized out>) at /root/src/sys/net/iflib.c:3006
#23 0xffffffff80d2274a in _task_fn_rx (context=0xfffff80003dcc000) at /root/src/sys/net/iflib.c:3949
#24 0xffffffff80c3ea77 in gtaskqueue_run_locked (queue=queue@entry=0xfffff80003988100) at /root/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c3e874 in gtaskqueue_thread_loop (arg=arg@entry=0xfffffe00379de008) at /root/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bb1f00 in fork_exit (callout=0xffffffff80c3e7e0 <gtaskqueue_thread_loop>, arg=0xfffffe00379de008, frame=0xfffffe003499cc00) at /root/src/sys/kern/kern_fork.c:1069
#27 <signal handler called>
(kgdb)
Created attachment 224828: crashlog full version
I was wondering what the difference is between ipfw being started from rc.conf and being started manually by root afterwards. Scrolling through the crashlog showed me that with the rc start, the tunnel interface gif0 has not yet been created and attached. But ipfw contains a lot of rules for gif0 (actually, its main job is to keep the baddies from the net out of this machine and to grant access only to certain services/machines). Maybe this is a hint about where to look for the cause of the mbuf problem???
Your panic doesn't seem to be related to ipfw. This backtrace shows that your host received an IPv6 packet that was forwarded into an IP-IP tunnel. The panic was then triggered by a KASSERT in iflib because the packet's mbuf has an unexpected non-NULL m_nextpkt field.
Thanks for the analysis, but what am I supposed to do now??? This mbuf problem only occurs if ipfw is loaded BEFORE the tunnel is even created. (Or maybe the tunnel is being created just then and the first packets are coming in; I can't tell. It looks like the machine survives until the tunnel starts receiving.)
(In reply to Michael Meiszl from comment #4)
Can you show some output from kgdb? I think these commands should work to obtain the needed info:

# cd /var/crash/
# kgdb -q /boot/kernel/kernel vmcore.0
f 11
p/x *m
f 8
p/x *m

You can also try patching the kernel as a workaround for testing:

--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -807,6 +807,7 @@ ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags,
 		 * Reset layer specific mbuf flags
 		 * to avoid confusing lower layers.
 		 */
+		m->m_nextpkt = NULL;
 		m_clrprotoflags(m);
 		IP_PROBE(send, NULL, NULL, ip, ifp, ip, NULL);
 		error = ip_output_send(inp, ifp, m, gw, ro,
The data you asked for:

[root@l3router ~]# cd /var/crash/
[root@l3router /var/crash]# kgdb -q /boot/kernel/kernel vmcore.0
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
__curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) f 11
#11 0xffffffff80d12350 in gif_transmit (ifp=0xfffff80134802000, m=0xfffff8005cc87200) at /root/src/sys/net/if_gif.c:380
380 error = in_gif_output(ifp, m, proto, ecn);
(kgdb) p/x *m
$1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8005cc872a2, m_len = 0x14, m_type = 0x1, m_flags = 0x0, {{{m_pkthdr = {{ snd_tag = 0xfffff80003dff800, rcvif = 0xfffff80003dff800}, tags = {slh_first = 0x0}, len = 0x50, flowid = 0x7a3d245c, csum_flags = 0xc000000, fibnum = 0x0, numa_domain = 0xff, rsstype = 0xbf, {rcv_tstmp = 0x0, {l2hlen = 0x0, l3hlen = 0x0, l4hlen = 0x0, l5hlen = 0x0, inner_l2hlen = 0x0, inner_l3hlen = 0x0, inner_l4hlen = 0x0, inner_l5hlen = 0x0}}, PH_per = { eight = {0x0, 0x0, 0x0, 0x0, 0x1c, 0x0, 0x0, 0x0}, sixteen = {0x0, 0x0, 0x1c, 0x0}, thirtytwo = {0x0, 0x1c}, sixtyfour = { 0x1c00000000}, unintptr = {0x1c00000000}, ptr = 0x1c00000000}, PH_loc = {eight = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, sixteen = {0x0, 0x0, 0x0, 0x0}, thirtytwo = {0x0, 0x0}, sixtyfour = {0x0}, unintptr = {0x0}, ptr = 0x0}}, {m_epg_npgs = 0x0, m_epg_nrdy = 0xf8, m_epg_hdrlen = 0xdf, m_epg_trllen = 0x3, m_epg_1st_off = 0xf800, m_epg_last_len = 0xffff, m_epg_flags = 0x0, m_epg_record_type = 0x0, __spare = {0x0, 0x0}, m_epg_enc_cnt = 0x0, m_epg_tls = 0x7a3d245c00000050, m_epg_so = 0xbfff00000c000000, m_epg_seqno = 0x0, m_epg_stailq = {stqe_next = 0x1c00000000}}}, {m_ext = {{ ext_count = 0x209f36a0, ext_cnt = 0x26702e30209f36a0}, ext_size = 0x1c1fd705, ext_type = 0x86, ext_flags = 0x60dd, {{ ext_buf = 0x1203f0628000000, ext_arg2 = 0xb41d0100af707004}, {extpg_pa = {0x1203f0628000000, 0xb41d0100af707004, 0x2aa924398bd2db, 0x2f0801405014, 0xb2a1032000000000}, extpg_trail = {0x1, 0xbb, 0xbf, 0xf9, 0x9e, 0xae, 0x0, 0x0, 0x0, 0x0, 0xa0, 0x2, 0xff, 0xff, 0xb8, 0xf1, 0x0, 0x0, 0x2, 0x4, 0x5, 0xa0, 0x4, 0x2, 0x8, 0xa, 0x1, 0x74, 0xa6, 0x8b, 0x0, 0x0, 0x0, 0x0, 0x1, 0x3, 0x3, 0x6, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}}, ext_free = 0xdeadc0dedeadc0de, ext_arg1 = 0xdeadc0dedeadc0de}, m_pktdat = 0xfffff8005cc87258}}, m_dat = 0xfffff8005cc87220}}
(kgdb) f 8
#8 0xffffffff80da58ef in ip_output_send (inp=inp@entry=0x0, ifp=<unavailable>, ifp@entry=0xfffff80003dff800, m=m@entry=0xfffff8005ce3ce00, gw=gw@entry=0xfffffe003499c5a0, ro=<unavailable>, ro@entry=0x0, stamp_tag=<optimized out>) at /root/src/sys/netinet/ip_output.c:275
275 error = (*ifp->if_output)(ifp, m, (const struct sockaddr *)gw, ro);
(kgdb) p/x *m
$2 = {{m_next = 0xdeadc0dedeadc0de, m_slist = {sle_next = 0xdeadc0dedeadc0de}, m_stailq = {stqe_next = 0xdeadc0dedeadc0de}}, { m_nextpkt = 0xdeadc0dedeadc0de, m_slistpkt = {sle_next = 0xdeadc0dedeadc0de}, m_stailqpkt = {stqe_next = 0xdeadc0dedeadc0de}}, m_data = 0xdeadc0dedeadc0de, m_len = 0xdeadc0de, m_type = 0xde, m_flags = 0xdeadc0, {{{m_pkthdr = {{snd_tag = 0xdeadc0dedeadc0de, rcvif = 0xdeadc0dedeadc0de}, tags = {slh_first = 0xdeadc0dedeadc0de}, len = 0xdeadc0de, flowid = 0xdeadc0de, csum_flags = 0xdeadc0de, fibnum = 0xc0de, numa_domain = 0xad, rsstype = 0xde, {rcv_tstmp = 0xdeadc0dedeadc0de, {l2hlen = 0xde, l3hlen = 0xc0, l4hlen = 0xad, l5hlen = 0xde, inner_l2hlen = 0xde, inner_l3hlen = 0xc0, inner_l4hlen = 0xad, inner_l5hlen = 0xde}}, PH_per = {eight = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, sixteen = {0xc0de, 0xdead, 0xc0de, 0xdead}, thirtytwo = {0xdeadc0de, 0xdeadc0de}, sixtyfour = {0xdeadc0dedeadc0de}, unintptr = {0xdeadc0dedeadc0de}, ptr = 0xdeadc0dedeadc0de}, PH_loc = {eight = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, sixteen = {0xc0de, 0xdead, 0xc0de, 0xdead}, thirtytwo = {0xdeadc0de, 0xdeadc0de}, sixtyfour = {0xdeadc0dedeadc0de}, unintptr = {0xdeadc0dedeadc0de}, ptr = 0xdeadc0dedeadc0de}}, {m_epg_npgs = 0xde, m_epg_nrdy = 0xc0, m_epg_hdrlen = 0xad, m_epg_trllen = 0xde, m_epg_1st_off = 0xc0de, m_epg_last_len = 0xdead, m_epg_flags = 0xde, m_epg_record_type = 0xc0, __spare = {0xad, 0xde}, m_epg_enc_cnt = 0xdeadc0de, m_epg_tls = 0xdeadc0dedeadc0de, m_epg_so = 0xdeadc0dedeadc0de, m_epg_seqno = 0xdeadc0dedeadc0de, m_epg_stailq = {stqe_next = 0xdeadc0dedeadc0de}}}, {m_ext = {{ext_count = 0xdeadc0de, ext_cnt = 0xdeadc0dedeadc0de}, ext_size = 0xdeadc0de, ext_type = 0xde, ext_flags = 0xdeadc0, {{ext_buf = 0xdeadc0dedeadc0de, ext_arg2 = 0xdeadc0dedeadc0de}, { extpg_pa = {0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de}, extpg_trail = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}}, ext_free = 0xdeadc0dedeadc0de, ext_arg1 = 0xdeadc0dedeadc0de}, m_pktdat = 0xfffff8005ce3ce58}}, m_dat = 0xfffff8005ce3ce20}}
(kgdb)

I'm afraid I can't try out the kernel patch today; the machine is busy and cannot be rebooted for now (otherwise you might hear on the TV news tomorrow: "admin got killed by upset users"). I will try it as soon as possible and report back.
BTW: cute :-) I don't understand anything in the output, but "deadcode" sounds funny, even written in hex :-)))
(In reply to Michael Meiszl from comment #7)
I think there is no need to test the suggested patch. The problem seems to be a use-after-free: the mbuf had already been freed and its memory was filled with the 0xdeadc0de pattern. This is why the KASSERT was triggered.
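To illustrate why a freed, pattern-filled mbuf trips an "m_nextpkt == NULL" check, here is a minimal userland sketch in C. It is not kernel code: `struct fake_mbuf`, `poison_fill`, `looks_freed` and `poison_demo` are hypothetical names made up for this example, and only the 0xdeadc0de value is taken from the kgdb output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill pattern seen in the kgdb dump of the freed mbuf. */
#define POISON32 UINT32_C(0xdeadc0de)

/* Hypothetical stand-in for the two link fields at the start of an mbuf. */
struct fake_mbuf {
	struct fake_mbuf *m_next;
	struct fake_mbuf *m_nextpkt;
};

/* Overwrite an object with the pattern, as a debug allocator might on free. */
static void
poison_fill(void *buf, size_t len)
{
	const uint32_t pat = POISON32;
	unsigned char *p = buf;

	for (size_t i = 0; i + sizeof(pat) <= len; i += sizeof(pat))
		memcpy(p + i, &pat, sizeof(pat));
}

/* Return true if every complete 32-bit word of the object holds the pattern. */
static bool
looks_freed(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t word;

	if (len < sizeof(word))
		return (false);
	for (size_t i = 0; i + sizeof(word) <= len; i += sizeof(word)) {
		memcpy(&word, p + i, sizeof(word));
		if (word != POISON32)
			return (false);
	}
	return (true);
}

/* A live object passes the NULL check; a poisoned one cannot. */
static bool
poison_demo(void)
{
	struct fake_mbuf m = { NULL, NULL };

	assert(!looks_freed(&m, sizeof(m)));	/* still "allocated" */
	poison_fill(&m, sizeof(m));		/* simulate a free with poisoning */
	return (looks_freed(&m, sizeof(m)) && m.m_nextpkt != NULL);
}
```

Once the object has been overwritten, every pointer field reads back as a non-NULL garbage value, so any later consumer that asserts on a NULL link fails even though nobody ever set that link on purpose.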
OK, as you wish. But tracking down a "use after free" may be very complicated; sorry to have triggered it somehow. If you need more kgdb output, just tell me what to run, or I can upload the whole contents of /var/crash somewhere if you want to look at more details.
Just a note: in the file you pointed out for patching, a few lines further down (at line 834), you find

	error = ip_fragment(ip, &m, mtu, ifp->if_hwassist);
	if (error)
		goto bad;
	for (; m; m = m0) {
		m0 = m->m_nextpkt;
		m->m_nextpkt = 0;	<<<<<<<!!!!
		if (error == 0) {
			/* Record statistics for this interface address. */
			if (ia != NULL) {
				counter_u64_add(ia->ia_ifa.ifa_opackets, 1);
				counter_u64_add(ia->ia_ifa.ifa_obytes, m->m_pkthdr.len);
			}

Although legal, "m->m_nextpkt = 0;" does not look right; I think "m->m_nextpkt = NULL;" would be better. But 0 is surely not 0xdeadc0de...
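The pattern in the loop quoted above (save the link, clear it, hand over a single packet) can be sketched in plain C as follows. This is an illustrative userland example only: `struct pkt`, `transmit_one` and `send_chain` are hypothetical names, not kernel APIs. It also shows that assigning 0 to a pointer is the same as assigning NULL, since 0 is a null pointer constant in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet type; only the chain pointer matters here. */
struct pkt {
	struct pkt *nextpkt;
	int id;
};

/* Stand-in for a driver entry point that accepts exactly one packet. */
static int
transmit_one(struct pkt *p)
{
	assert(p->nextpkt == NULL);	/* same shape as the iflib assertion */
	return (p->id);
}

/* Walk a chain, unlink each packet, and pass it on. Returns packets sent. */
static int
send_chain(struct pkt *p)
{
	struct pkt *next;
	int sent = 0;

	for (; p != NULL; p = next) {
		next = p->nextpkt;	/* save the link before clearing it */
		p->nextpkt = 0;		/* 0 is a null pointer constant, same as NULL */
		(void)transmit_one(p);
		sent++;
	}
	return (sent);
}
```

The consumer's assertion only holds because each packet is detached from the chain before it is passed down; a packet that reaches the consumer with a stale or garbage link would fail it.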
After running for some days (with the firewall started manually), it crashed again yesterday, but in a totally different function:

Unread portion of the kernel message buffer:
panic: Assertion stp->st_flags == 0 failed at /root/src/sys/kern/sys_generic.c:1942
cpuid = 1
time = 1620999784
KDB: stack backtrace:
#0 0xffffffff80c400e5 at kdb_backtrace+0x65
#1 0xffffffff80bf5be1 at vpanic+0x181
#2 0xffffffff80bf59b3 at panic+0x43
#3 0xffffffff80c63b20 at seltdfini+0xa0
#4 0xffffffff80bac8fa at exit1+0x49a
#5 0xffffffff80bbddda at kproc_exit+0xaa
#6 0xffffffff82b5116e at smb_iod_thread+0x37e
#7 0xffffffff80bb1f00 at fork_exit+0x80
#8 0xffffffff8105c6ae at fork_trampoline+0xe
Uptime: 2d2h27m29s

I guess this has nothing to do with the main issue, but it made me revert to the original, unpatched 13.0 kernel for now.

My current approach is to start the firewall with a combination of cron and at.

Cron entry:
@reboot at -f /root/startfirewall now+3min

startfirewall script:
#!/bin/sh
/usr/sbin/service ipfw onestart

Totally simple, but it seems to work for now. I did not notice the panic yesterday, so the whole network ran without firewall protection for almost a day. This is not acceptable. Cron plus at limits the dangerous window after a panic or reboot to 3 minutes.
Do you have the net.link.ether.ipfw sysctl set to 1 by any chance?
(In reply to Mark Johnston from comment #12) no, it is 0 here
My suggested workaround still works flawlessly. When started "later", ipfw does not crash here. The machine has now been up for weeks without any problems, and it also comes back up fine after a reboot. So I did not investigate any further during the last weeks. But if you tell me to try out some new changes, I will. Sadly, there is no L2 filtering going on here at all.
(In reply to Michael Meiszl from comment #14) I think to make progress on this we'd have to look at a vmcore from one of the panics. For this I'd also need a copy of the matching /boot/kernel and /usr/lib/debug/boot/kernel directories.
Yeah, it still happens if I switch on ipfw at boot time. Where do you want all those files to be uploaded?
(In reply to Michael Meiszl from comment #16) I don't have a good place to upload vmcores, sorry. Google drive is used sometimes. Reading through again, I'm not sure that a vmcore will be very useful. I suspect your comment 2 is a good clue. I don't quite follow though: the firewall rules reference gif0, and the rules are loaded before gif0 is created? I would have assumed that this would not be permitted. It may be more useful to share the exact ruleset you're using.
(In reply to Mark Johnston from comment #17)
No big problem, the rules are straightforward. See the attached rules file. (I needed to blur out some IPs; they are not meant for the public.) Originally I had tables 1, 2 and 3 filled by fail2ban, but I converted them to static rules in the hope that the bug was inside the table management (it did not help, sniff). Anyway, it's a very restrictive setup: 80% of the world is locked out. For IPv6, only certain ports on certain hosts are offered to the outside (there is no IPv4 restriction on ports because that is handled by a different machine).
Created attachment 225933: ruleset mam
About the load sequence: I was also surprised when I checked it with "rcorder ...". But it does not seem to matter. I tried changing the order to ensure that gif0 is created and up before ipfw is loaded; it made no difference to the crashes.
One more idea about where to search: this machine uses an "Intel(R) Ethernet Converged Network Adapter X550-T2" (or rather ONE port of it, since it is a dual-port card) on a 10GbE copper connection. After the card is enabled, it takes really long (>10s) to establish the link with the switch. Maybe during this time some "strange" (wrong, cut-off or otherwise broken) packets come in and drive the firewall crazy??? Maybe that patch already helps me even though I do not filter L2 packets yet??? If I get some spare time next week, I will give it another try and see whether the bug has already vanished. I will report...
Yeah Kubilay, we know that already. But so far we think it does not apply, because I have net.link.ether.ipfw set to 0, effectively turning off all of ipfw's L2 processing.
After waiting some weeks (and some kernel patches :-) ), I gave it another chance this morning. IT WORKED!!! It did not crash anymore and has now been up and running fine for at least 2 hours (remember: before, it did not survive 30 seconds after a reboot). So I guess we can close this strange case now. Thanks for your patience. MAM
(In reply to Michael Meiszl from comment #23) Thanks for following up. I had stared at this for quite a while but wasn't able to make much progress. To be clear, you're no longer able to reproduce the problem on recent stable/13?
Yeah, since yesterday (or maybe already some patches earlier; I did not test again after each update, call me lazy :-) ) I can start ipfw from rc.conf again without any crash. The machine has been up for 24 hours now; I rebooted it once more yesterday to prove that the bug is gone. I am still a bit curious about what triggered it, but then, I'm pragmatic: as long as it works, I don't care. I guess I can now dare to update the external production machines too (if they crashed, I would have no means to get to the console and boot into single-user mode, so I am very, very careful when it comes to kernel crashes and instant reboots). MAM
(In reply to Michael Meiszl from comment #25)
> I guess I can now dare to update the external production machine too (if they would crash I have no means to get to the console and start into single user mode, so I am very very careful when it comes to kernel crashes and instant reboots)

If you're building kernels from source, you might try installing like this:

# make installkernel INSTKERNNAME=kernel.test
# nextboot -k kernel.test
# shutdown -r now

That way, if the new kernel (installed to /boot/kernel.test) panics and reboots, the system will boot back into the old kernel without any intervention in single-user mode. Be sure to try this before updating userland.
Thanks for your suggestions, but they are not usable for me:

a) I usually don't build my own kernel (unless I am told to, in order to activate debugging for catching a bug).
b) This happened to me on the update from 12 to 13, so userland had to be updated too before it occurred, and then there was no way back except restoring the complete backup.
c) The bug only showed up when going multi-user.
d) I need the network to be up and running to access the machine. The console is several hundred miles away from me and my arms are not that long :-)))

(But the hint is good if I ever plan to run my own kernel, although I have no idea why I should use anything other than GENERIC. Since additional drivers can be added as loadable modules, there is no need for a custom kernel, at least for me.)