The last CURRENT that has been really solid for me has been from the first week of June. The consistently reproducible issue I had earlier in bug 201642 has been fixed but I'm still running into intermittent issues. This was just seen on a recent build and I'm still working on a case to reproduce it. This is one `uname -a` FreeBSD xju-rtr 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r285871: Sat Jul 25 15:52:43 UTC 2015 root@xts-bsd:/usr/obj/nanobsd.soekris/i386.i386/usr/src/head/sys/GENERIC i386 panic: rw lock 0xcc0bef50 not unlocked cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper(c1538ae5,c1793ce8,eb7eae30,eb7ead90,c0a2a98d,...) at db_trace_self_wrapper+0x2a/frame 0xeb7ead60 kdb_backtrace(c170905b,1,c153365c,eb7eae30,1,...) at kdb_backtrace+0x2d/frame 0xeb7eadc8 vpanic(c153365c,eb7eae30,c153365c,eb7eae30,eb7eae30,...) at vpanic+0x117/frame 0xeb7eae00 kassert_panic(c153365c,cc0bef50,cc0bef00,cc0bef00,eb7eae50,...) at kassert_panic+0xe9/frame 0xeb7eae24 _rw_destroy(cc0bef60,cc0bef00,ccd9f180,eb7eae80,c0e30141,...) at _rw_destroy+0x40/frame 0xeb7eae3c ipsec_delisr(cc0bef00,c1565abd,49d,3f2,eb7eae80,...) at ipsec_delisr+0x15/frame 0xeb7eae50 _key_freesp(cc0bef4c,c1567acf,3f3,baf5ab2c,7e951cde,...) at _key_freesp+0xb1/frame 0xeb7eae80 esp_output_cb(cc658a14,41,eb7eaf38,c0e52170,cc350800,...) at esp_output_cb+0x4fc/frame 0xeb7eaf08 crypto_done(cc658a14,cc350800,fc,20,eb7eafc0,...) at crypto_done+0xa3/frame 0xeb7eaf38 swcr_process(c7197200,cc658a14,0,c23ba700,0,...) at swcr_process+0xca/frame 0xeb7eb5d0 crypto_invoke(0,0,c17d23a8,c7ea29a0,cc3e1888,...) at crypto_invoke+0x92/frame 0xeb7eb604 crypto_dispatch(cc658a14,c1567acf,363,eb7eb683,c719f888,...) at crypto_dispatch+0x78/frame 0xeb7eb620 esp_output(cc350800,cc0bef00,0,14,9,...) at esp_output+0xa03/frame 0xeb7eb6d8 ipsec4_process_packet(cc350800,cc0bef00,eb7eb870,0,ccd9f180,...) at ipsec4_process_packet+0x226/frame 0xeb7eb7d0 ip_ipsec_output(eb7eb874,0,eb7eb870,4a3e5,eb7eb8c0,...) at ip_ipsec_output+0xc9/frame 0xeb7eb7f8 ip_output(cc350800,0,0,1,0,...) at ip_output+0xb74/frame 0xeb7eb8b4 in_gre_output(cc350800,2,18,373,35b,...) at in_gre_output+0xad/frame 0xeb7eb8e0 gre_transmit(cb442c00,cc350800,4,1,2,...) at gre_transmit+0x30f/frame 0xeb7eb948 gre_output(cb442c00,cc350800,c7934bd0,eb7eb9f8,cb415000,...) at gre_output+0x1a5/frame 0xeb7eb970 ip_output(cc350800,0,0,0,0,...) at ip_output+0x18da/frame 0xeb7eba30 udp_send(c9f298e8,0,cc350800,cce47000,0,...) at udp_send+0xb97/frame 0xeb7ebaec sosend_dgram(c9f298e8,cce47000,eb7ebb70,0,0,...) at sosend_dgram+0x39a/frame 0xeb7ebb30 kern_sendit(c7e6f660,7,eb7ebbf8,0,0,...) at kern_sendit+0x298/frame 0xeb7ebba4 sendit(eb7ebbf8,0,bfbfd20a,0,cce47000,...) at sendit+0x131/frame 0xeb7ebbe0 sys_sendto(c7e6f660,eb7ebca8,c152c1b1,4,c1960a10,...) at sys_sendto+0x58/frame 0xeb7ebc18 syscall(eb7ebce8) at syscall+0x33e/frame 0xeb7ebcdc Xint0x80_syscall() at Xint0x80_syscall+0x2c/frame 0xeb7ebcdc --- syscall (133, FreeBSD ELF32, sys_sendto), eip = 0x2818ecbf, esp = 0xbfbfcf94, ebp = 0xbfbfd644 --- KDB: enter: panic [ thread pid 53412 tid 100133 ] Stopped at kdb_enter+0x3d: movl $0,kdb_why
Probably you did `setkey -FP` while processing some IPSEC request. There is possible race.
(In reply to Andrey V. Elsukov from comment #1) That could very well be the case. The basic background is Strongswan is maintaining a transport mode IPSEC router to router link for GRE traffic only. This could have been when Strongswan was behind the scenes doing a rekey. I did see another panic before doing an `ifconfig gre0 down` for the time being. I'm going to see if the manual `setkey -FP` when there is traffic over the link triggers a panic. Fatal double fault: eip = 0xc0c35e49 esp = 0xe23a8fec ebp = 0xe23a901c cpuid = 1; apic id = 01 panic: double fault cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper(c1538ae5,c1793ce8,c19e545c,c19e53d0,c0a2a98d,...) at db_trace_self_wrapper+0x2a/frame 0xc19e53a0 kdb_backtrace(c170905b,1,c16ff6e2,c19e545c,1,...) at kdb_backtrace+0x2d/frame 0xc19e5408 vpanic(c16ff6e2,c19e545c,c19e545c,c19e5474,c1159ee6,...) at vpanic+0x117/frame 0xc19e543c panic(c16ff6e2,1,1,1,e23a901c,...) at panic+0x1b/frame 0xc19e5450 dblfault_handler() at dblfault_handler+0xa6/frame 0xc19e5450 --- trap 0x17, eip = 0xc0c35e49, esp = 0xe23a8fec, ebp = 0xe23a901c --- sched_add(c7e69cc0,0,c1536ae4,7e9,3c22,2710) at sched_add+0x59/frame 0xe23a901c sched_wakeup(c7e69cc0,4,c1534db8,1ef,c7e69cc0,...) at sched_wakeup+0x188/frame 0xe23a9048 setrunnable(c7e69cc0,4,c153a711,2da,c1a28a10,...) at setrunnable+0x77/frame 0xe23a9068 sleepq_resume_thread(0,4,c153a711,10b,c1a28ac4,...) at sleepq_resume_thread+0x1d9/frame 0xe23a9094 sleepq_timeout(c7e69cc0,0,c1535b0a,2aa,0,...) at sleepq_timeout+0x121/frame 0xe23a90b8 softclock_call_cc(1,2,c1535b0a,1bb,e23a91ac,...) at softclock_call_cc+0x191/frame 0xe23a9140 callout_process(78d9306d,1d52,c1162107,0,e23a926c,...) at callout_process+0x1cc/frame 0xe23a91ac handleevents(0,c764f660,0,c764f660,e23a9228,...) at handleevents+0x25c/frame 0xe23a91f0 timercb(c1a1ce84,0,1,1,15aa349c,...) at timercb+0x267/frame 0xe23a9240 lapic_handle_timer(e23a926c) at lapic_handle_timer+0x7f/frame 0xe23a9260 Xtimerint() at Xtimerint+0x2b/frame 0xe23a9260 --- interrupt, eip = 0xc054a498, esp = 0xe23a92ac, ebp = 0xe23a9408 --- _libmd_SHA256_Transform(e23a9934,e23a9460,3c,80,84,...) at _libmd_SHA256_Transform+0xcf8/frame 0xe23a9408 _libmd_SHA256_Update(e23a9908,e23a9460,100,106,0,...) at _libmd_SHA256_Update+0xb9/frame 0xe23a9438 random_fortuna_pre_read(0,c719f7b8,c71a9898,c199fc40,c71a9898,...) at random_fortuna_pre_read+0x20a/frame 0xe23a9988 read_random(e23a9ae0,100,0,0,210,...) at read_random+0x2b/frame 0xe23a9ab0 arc4rand(e23a9c70,10,0,20,e23a9c80,...) at arc4rand+0x77/frame 0xe23a9bf8 swcr_process(c7197200,cc3e9c6c,0,c23ba700,0,...) at swcr_process+0x4ed/frame 0xe23aa2c0 crypto_invoke(0,0,c17d23a8,c7ea4a40,cc940150,...) at crypto_invoke+0x92/frame 0xe23aa2f4 crypto_dispatch(cc3e9c6c,c1567acf,363,e23aa373,c719fd00,...) at crypto_dispatch+0x78/frame 0xe23aa310 esp_output(cbc96300,cc253b00,0,14,9,...) at esp_output+0xa03/frame 0xe23aa3c8 ipsec4_process_packet(cbc96300,cc253b00,e23aa560,0,cce89c80,...) at ipsec4_process_packet+0x226/frame 0xe23aa4c0 ip_ipsec_output(e23aa564,0,e23aa560,c78e5990,e23aa570,...) at ip_ipsec_output+0xc9/frame 0xe23aa4e8 ip_output(cbc96300,0,0,1,0,...) at ip_output+0xb74/frame 0xe23aa5a4 in_gre_output(cbc96300,2,18,373,35b,...) at in_gre_output+0xad/frame 0xe23aa5d0 gre_transmit(ca1b5800,cbc96300,4,1,2,...) at gre_transmit+0x30f/frame 0xe23aa638 gre_output(ca1b5800,cbc96300,c7934b90,e23aa838,cc49a300,...) at gre_output+0x1a5/frame 0xe23aa660 ip_output(cbc96300,0,e23aa838,0,0,...) at ip_output+0x18da/frame 0xe23aa720 tcp_output(cc9a5c20,1,c155db2d,769,cce89e80,...) at tcp_output+0x226e/frame 0xe23aa898 tcp_do_segment(cc47c390,cc9a5c20,34,554,0,...) at tcp_do_segment+0x3352/frame 0xe23aa948 tcp_input(e23aaa94,e23aaa90,6,c7934fa8,a,...) at tcp_input+0x100d/frame 0xe23aaa48 ip_input(cb62e200,c0c8b39d,c7934600,c17bd7a8,c7934600,...) at ip_input+0x2a5/frame 0xe23aaab4 netisr_dispatch_src(1,0,cb62e200) at netisr_dispatch_src+0xab/frame 0xe23aaaf8 netisr_dispatch(1,cb62e200,588,0,14,...) at netisr_dispatch+0x20/frame 0xe23aab0c gre_input(e23aabe4,e23aabe0,2f,1,14,...) at gre_input+0x224/frame 0xe23aab40 encap4_input(e23aabe4,e23aabe0,2f,c1960a00,c1960a10,...) at encap4_input+0x1e2/frame 0xe23aab98 ip_input(cb62e200,0,c154c9dd,2f3,1,...) at ip_input+0x2a5/frame 0xe23aac08 swi_net(c2025880,c152c859,560,8a3d8a50,c763e0c8,...) at swi_net+0x1a3/frame 0xe23aac48 intr_event_execute_handlers(c1960a10,c763e080,c152c859,560,c1960a00,...) at intr_event_execute_handlers+0xde/frame 0xe23aac70 ithread_loop(c7643410,e23aace8,c152c5cd,3e6,0,...) at ithread_loop+0x90/frame 0xe23aacac fork_exit(c0bd1ff0,c7643410,e23aace8) at fork_exit+0x7e/frame 0xe23aacd4 fork_trampoline() at fork_trampoline+0x8/frame 0xe23aacd4 --- trap 0, eip = 0, esp = 0xe23aad20, ebp = 0 --- KDB: enter: panic [ thread pid 12 tid 100006 ] Stopped at kdb_enter+0x3d: movl $0,kdb_why db>
(In reply to Andrey V. Elsukov from comment #1) gre0 is still down, but I did notice an LOR doing a setkey -DP. The LOR was not reproducible on subsequent setkey -DP runs. lock order reversal: 1st 0xc19d043c sptree (sptree) @ /usr/src/head/sys/netipsec/key.c:2403 2nd 0xc1a2bdbc rawcb (rawcb) @ /usr/src/head/sys/netipsec/keysock.c:286 KDB: stack backtrace: db_trace_self_wrapper(c1538ae5,eb7b000a,c0bedbfb,c7603400,8,...) at db_trace_self_wrapper+0x2a/frame 0xeb7b6748 kdb_backtrace(c153c771,c1a2bdbc,c154d495,c71a2e90,c1567253,...) at kdb_backtrace+0x2d/frame 0xeb7b67b0 witness_checkorder(c1a2bdbc,9,c1567253,11e,0,...) at witness_checkorder+0xd6e/frame 0xeb7b6804 __mtx_lock_flags(c1a2bdcc,0,c1567253,11e,0,...) at __mtx_lock_flags+0x99/frame 0xeb7b6838 key_sendup_mbuf(cc8261c8,cb507400,0,c1565abd,963,...) at key_sendup_mbuf+0x1da/frame 0xeb7b6888 key_spddump(cc8261c8,cc0f4a00,eb7b6908,c199dfb8,10,...) at key_spddump+0x14d/frame 0xeb7b68e0 key_parse(cc0f4a00,cc8261c8,eb7b6a88,cc0f4aac,10,...) at key_parse+0xedf/frame 0xeb7b6a68 key_output(cc0f4a00,cc8261c8) at key_output+0x24a/frame 0xeb7b6aa4 raw_usend(cc8261c8,0,cc0f4a00,0,0,...) at raw_usend+0x49/frame 0xeb7b6abc sosend_generic(cc8261c8,0,eb7b6b70,0,0,...) at sosend_generic+0x423/frame 0xeb7b6b30 kern_sendit(c78e6660,3,eb7b6bf8,0,0,...) at kern_sendit+0x298/frame 0xeb7b6ba4 sendit(eb7b6bf8,0,bfbfece0,0,0,...) at sendit+0x131/frame 0xeb7b6be0 sys_sendto(c78e6660,eb7b6ca8,c152c1b1,4,c1161a77,...) at sys_sendto+0x58/frame 0xeb7b6c18 syscall(eb7b6ce8) at syscall+0x33e/frame 0xeb7b6cdc Xint0x80_syscall() at Xint0x80_syscall+0x2c/frame 0xeb7b6cdc --- syscall (133, FreeBSD ELF32, sys_sendto), eip = 0x2818acbf, esp = 0xbfbf6c3c, ebp = 0xbfbf6c60 --- For reference, this is the result of the setkey -DP x.x.246.90[any] x.x.115.49[any] gre in ipsec esp/transport//unique:1 created: Jul 26 13:38:57 2015 lastused: Jul 26 13:38:57 2015 lifetime: 2147483647(s) validtime: 0(s) spid=136 seq=1 pid=66946 refcnt=1 x.x.115.49[any] x.x.246.90[any] gre out ipsec esp/transport//unique:1 created: Jul 26 13:38:57 2015 lastused: Jul 26 13:38:57 2015 lifetime: 2147483647(s) validtime: 0(s) spid=135 seq=0 pid=66946 refcnt=1
I've had some time to get back to this and have found something interesting. This is on the below `uname -a`: FreeBSD xju-rtr 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r286781: Fri Aug 14 15:38:29 UTC 2015 root@xts-bsd:/usr/obj/nanobsd.soekris/i386.i386/usr/src/head/sys/GENERIC i386 This configuration seems stable for me. /etc/pf.conf #scrub on $wan_phy all no-df random-id fragment reassemble scrub on $wan_phy all no-df fragment reassemble /etc/sysctl.conf #net.inet.ip.random_id="1" With both the random_id sysctl AND a random-id scrub rule I see a panic within a minute of doing a heavy rsync. With just one random IPID measure, I was able to do and rsync for about 10-15 minutes without seeing a panic. Here's another panic backtrace with both enabled. Now that I have some things of interest, I am going to try one config for a day each to evaluate if it is stable over time since my individual test cases were only for about 10-15 minutes of passing traffic. Fatal double fault: eip = 0xc054980c esp = 0xeb5a2ffc ebp = 0xeb5a3158 cpuid = 0; apic id = 00 panic: double fault cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper(c153b57d,c1796cb4,c19e865c,c19e85d0,c0a2aa0d,...) at db_trace_self_wrapper+0x2a/frame 0xc19e85a0 kdb_backtrace(c170be7b,0,c1702502,c19e865c,0,...) at kdb_backtrace+0x2d/frame 0xc19e8608 vpanic(c1702502,c19e865c,c19e865c,c19e8674,c115c696,...) at vpanic+0x117/frame 0xc19e863c panic(c1702502,0,0,0,eb5a3158,...) at panic+0x1b/frame 0xc19e8650 dblfault_handler() at dblfault_handler+0xa6/frame 0xc19e8650 --- trap 0x17, eip = 0xc054980c, esp = 0xeb5a2ffc, ebp = 0xeb5a3158 --- _libmd_SHA256_Transform(c18b1774,eb5a319c,8,c19a2fa0,aa55aa55,...) at _libmd_SHA256_Transform+0xbc/frame 0xeb5a3158 _libmd_SHA256_Update(c18b1714,eb5a319c,8,0,809a0000,...) at _libmd_SHA256_Update+0x78/frame 0xeb5a3188 _libmd_SHA256_Final(eb5a35e8,c18b1714) at _libmd_SHA256_Final+0x8a/frame 0xeb5a31b0 randomdev_hash_finish(c18b1714,eb5a35e8,c14dfc14,161,aa55aa55,...) at randomdev_hash_finish+0x18/frame 0xeb5a31c0 random_fortuna_pre_read(2811818,ffffffff,58f,bd95c999,bd90b689,...) at random_fortuna_pre_read+0x124/frame 0xeb5a3710 read_random(eb5a3868,100,c154a471,8c,8,...) at read_random+0x2b/frame 0xeb5a3838 arc4rand(eb5a39a0,2,0,db,2,...) at arc4rand+0x77/frame 0xeb5a3980 ip_fillid(cc1083e8,c7ecba00,c9fe4bbc,4a0,c7190da8,...) at ip_fillid+0x93/frame 0xeb5a39c0 pf_normalize_ip(eb5a3c1c,2,c7ecba00,eb5a3b80,eb5a3b28,...) at pf_normalize_ip+0x1062/frame 0xeb5a3aa0 pf_test(2,c760bc00,eb5a3c1c,0,c1a2f344,...) at pf_test+0x216/frame 0xeb5a3bcc pf_check_out(0,eb5a3c1c,c760bc00,2,0,...) at pf_check_out+0x4b/frame 0xeb5a3bf0 pfil_run_hooks(c1a2f344,eb5a3cec,c760bc00,2,0,...) at pfil_run_hooks+0x9f/frame 0xeb5a3c50 ip_output(cc108300,0,0,2,0,...) at ip_output+0xb05/frame 0xeb5a3d30 ipsec_process_done(cc108300,cc4fd680,3a1,f5276691,df38d9c3,...) at ipsec_process_done+0x29a/frame 0xeb5a3d80 esp_output_cb(cc694c30,41,eb5a3e38,c0e545c0,cc108300,...) at esp_output_cb+0x4a6/frame 0xeb5a3e08 crypto_done(cc694c30,cc108300,5bc,10,eb5a3ec0,...) at crypto_done+0xa3/frame 0xeb5a3e38 swcr_process(c76cdc00,cc694c30,0,c23bbc40,0,...) at swcr_process+0xca/frame 0xeb5a4508 crypto_invoke(0,0,c7622600,9,cc698738,...) at crypto_invoke+0x92/frame 0xeb5a453c crypto_dispatch(cc694c30,c156a7f6,36e,eb5a45bb,3,...) at crypto_dispatch+0x78/frame 0xeb5a4558 esp_output(cc108300,cc4fd680,0,14,9,...) at esp_output+0x9d2/frame 0xeb5a4610 ipsec4_process_packet(cc108300,cc4fd680,eb5a47ac,0,cc5ef380,...) at ipsec4_process_packet+0x226/frame 0xeb5a4708 ip_ipsec_output(eb5a47cc,0,eb5a47ac,c0bee61b,c7605400,...) at ip_ipsec_output+0xc9/frame 0xeb5a4730 ip_output(cc108300,0,0,1,0,...) at ip_output+0xa96/frame 0xeb5a480c in_gre_output(cc108300,2,18,373,35b,...) at in_gre_output+0xad/frame 0xeb5a4838 gre_transmit(c7f5dc00,cbd3b200,4,cbd3b200,2,...) at gre_transmit+0x30f/frame 0xeb5a48a0 gre_output(c7f5dc00,cbd3b200,c7b9bbf0,eb5a49f8,0,...) at gre_output+0x1a5/frame 0xeb5a48c8 ip_output(cbd3b200,0,eb5a49f8,1,0,...) at ip_output+0x191d/frame 0xeb5a49a8 ip_forward(cbd3b200,0,28e,1,0,...) at ip_forward+0x425/frame 0xeb5a4a30 ip_input(cbd3b200,c7872330,c7e7021c,c7e7020c,cbd3b200,...) at ip_input+0xa98/frame 0xeb5a4a9c netisr_dispatch_src(1,0,cbd3b200) at netisr_dispatch_src+0xab/frame 0xeb5a4ae0 netisr_dispatch(1,cbd3b200,1,c760a800,c7e70200,cbd3b200) at netisr_dispatch+0x20/frame 0xeb5a4af4 ether_demux(c760a800,cbd3b200,6,0,8,...) at ether_demux+0x18d/frame 0xeb5a4b20 ether_nh_input(cbd3b200,801,246,eb5a4bac,cbd41800,...) at ether_nh_input+0x377/frame 0xeb5a4b4c netisr_dispatch_src(5,0,cbd3b200) at netisr_dispatch_src+0xab/frame 0xeb5a4b90 netisr_dispatch(5,cbd3b200,c7896920,1,c7896900,...) at netisr_dispatch+0x20/frame 0xeb5a4ba4 ether_input(c760c800,cbd3b200,eb5a4c2c,c06f4bbc,c760c800,...) at ether_input+0x4f/frame 0xeb5a4bc0 if_input(c760c800,cbd3b200,c1253435,1239,c78a3000,...) at if_input+0x19/frame 0xeb5a4bd0 em_rxeof(c760c800,c1963510,0,c74dab00,c7519b80,...) at em_rxeof+0x3bc/frame 0xeb5a4c2c em_msix_rx(c7896900,c152f3b5,560,1f3619f8,c7519bc8,...) at em_msix_rx+0x2f/frame 0xeb5a4c48 intr_event_execute_handlers(c1963510,c7519b80,c152f3b5,560,c1963500,...) at intr_event_execute_handlers+0xde/frame 0xeb5a4c70 ithread_loop(c7698ed0,eb5a4ce8,c152f129,3e6,0,...) at ithread_loop+0x90/frame 0xeb5a4cac fork_exit(c0bd2a20,c7698ed0,eb5a4ce8) at fork_exit+0x7e/frame 0xeb5a4cd4 fork_trampoline() at fork_trampoline+0x8/frame 0xeb5a4cd4 --- trap 0, eip = 0, esp = 0xeb5a4d20, ebp = 0 --- KDB: enter: panic [ thread pid 12 tid 100072 ] Stopped at kdb_enter+0x3d: movl $0,kdb_why db>
After the couple hours of uptime with both the random_id options turn off, I got a panic with the original message: "panic: rw lock 0xcb208a50 not unlocked" It would seem there are two issues going on here. I am going to keep the random stuff off to focus on one thing at a time. Any ideas on what would be beneficial to help to debug this?
Seeing bug 202667 I got the idea to try amd64 since the Soekris does support it with device mptable is in the kernel config. Without all the INVARIANTS and other debugging options I didn't see any panics for 10 days on both routers. I switched one to be GENERIC with device mptable added to get all the debugging and still see the same basic panic. panic: rw lock 0xfffff8000a37b660 not unlocked cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00970cb930 vpanic() at vpanic+0x189/frame 0xfffffe00970cb9b0 kassert_panic() at kassert_panic+0x132/frame 0xfffffe00970cba20 _rw_destroy() at _rw_destroy+0x42/frame 0xfffffe00970cba40 ipsec_delisr() at ipsec_delisr+0x12/frame 0xfffffe00970cba60 _key_freesp() at _key_freesp+0xc1/frame 0xfffffe00970cbaa0 esp_output_cb() at esp_output_cb+0x1d8/frame 0xfffffe00970cbb30 swcr_process() at swcr_process+0xde/frame 0xfffffe00970cc2a0 crypto_dispatch() at crypto_dispatch+0x82/frame 0xfffffe00970cc2c0 esp_output() at esp_output+0x850/frame 0xfffffe00970cc3b0 ipsec4_process_packet() at ipsec4_process_packet+0x222/frame 0xfffffe00970cc4a0 ip_ipsec_output() at ip_ipsec_output+0xb1/frame 0xfffffe00970cc4e0 ip_output() at ip_output+0x96d/frame 0xfffffe00970cc620 gre_transmit() at gre_transmit+0x2e4/frame 0xfffffe00970cc6c0 gre_output() at gre_output+0x183/frame 0xfffffe00970cc700 ip_output() at ip_output+0x12f1/frame 0xfffffe00970cc840 udp_send() at udp_send+0xaab/frame 0xfffffe00970cc920 sosend_dgram() at sosend_dgram+0x380/frame 0xfffffe00970cc990 kern_sendit() at kern_sendit+0x211/frame 0xfffffe00970cca40 sendit() at sendit+0x129/frame 0xfffffe00970cca90 sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00970ccae0 amd64_syscall() at amd64_syscall+0x2d6/frame 0xfffffe00970ccbf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00970ccbf0 --- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x20b6aa7a, rsp = 0x7fffffffcce8, rbp = 0x7fffffffd420 --- KDB: enter: panic [ thread pid 55975 tid 100107 ] The PID listed was syslogd. 55975 1 55975 0 Rs CPU 0 syslogd I tried to do some stress testing with a while true/logger loop but wasn't able to trigger a panic.
Ermal, it looks like I have introduced this problem, so I' taking this PR.
It seems the crypto callback releases the last reference to SP (probably IKE daemon has removed security policy from userland). But since ISR is locked by ipsec4_process_packet() you get a panic triggered by KASSERT. Do you use IKE daemon?
(In reply to Andrey V. Elsukov from comment #8) I am using strongswan-5.3.2
Created attachment 161220 [details] Proposed patch (untested) Change the way how we protect from removing SP. Take one reference before calling ipsec[46]_process_packet. This protects us from removing SP while we preparing crypto request. We release this reference just after crypto request will be queued. Then we take one reference when scheduling crypto request. This protects us from removing SP while crypto request will be processed. We will release it crypto callback function.
Created attachment 161221 [details] Proposed patch (untested) There is no need to take extra reference for bundled SA, since we still hold reference in the crypto callback.
(In reply to Andrey V. Elsukov from comment #11) Thanks! I just put this later patch in place. The only thing is I will be away from home for a couple days so I won't be able to provide more feedback for a bit. For clarity as well this is my home router with a single IPSec transport mode connection for GRE traffic to a family member. If it's been panicing for me in this small use case hopefully we can get to the bottom before this gets used in production environments. Thanks again!
(In reply to Andrey V. Elsukov from comment #11) I did not have a panic while I was away. I updated the other side of the link so both are running r288061M with the patch in comment 11. Now that I am home again I will continue to monitor as well as throw some more traffic across the link.
A commit references this bug: Author: ae Date: Wed Sep 30 08:16:36 UTC 2015 New revision: 288418 URL: https://svnweb.freebsd.org/changeset/base/288418 Log: Take extra reference to security policy before calling crypto_dispatch(). Currently we perform crypto requests for IPSEC synchronous for most of crypto providers (software, aesni) and only VIA padlock calls crypto callback asynchronous. In synchronous mode it is possible, that security policy will be removed during the processing crypto request. And crypto callback will release the last reference to SP. Then upon return into ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already freed request. To prevent this we will take extra reference to SP. PR: 201876 Sponsored by: Yandex LLC Changes: head/sys/netinet/ip_ipsec.c head/sys/netinet6/ip6_ipsec.c head/sys/netipsec/ipsec_output.c head/sys/netipsec/xform_ah.c head/sys/netipsec/xform_esp.c head/sys/netipsec/xform_ipcomp.c