Bug 201876 - panic: rw lock 0xcc0bef50 not unlocked (CURRENT w/IPSEC + if_gre)
Summary: panic: rw lock 0xcc0bef50 not unlocked (CURRENT w/IPSEC + if_gre)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Andrey V. Elsukov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-25 20:15 UTC by Jason Unovitch
Modified: 2015-09-30 08:27 UTC (History)
3 users (show)

See Also:


Attachments
Proposed patch (untested) (3.40 KB, patch)
2015-09-21 03:40 UTC, Andrey V. Elsukov
no flags Details | Diff
Proposed patch (untested) (3.53 KB, patch)
2015-09-21 06:24 UTC, Andrey V. Elsukov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jason Unovitch freebsd_committer freebsd_triage 2015-07-25 20:15:53 UTC
The last CURRENT that has been really solid for me has been from the first week of June.  The consistently reproducible issue I had earlier in bug 201642 has been fixed but I'm still running into intermittent issues.  This was just seen on a recent build and I'm still working on a case to reproduce it.

This is one `uname -a`
FreeBSD xju-rtr 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r285871: Sat Jul 25 15:52:43 UTC 2015 root@xts-bsd:/usr/obj/nanobsd.soekris/i386.i386/usr/src/head/sys/GENERIC  i386


panic: rw lock 0xcc0bef50 not unlocked
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper(c1538ae5,c1793ce8,eb7eae30,eb7ead90,c0a2a98d,...) at db_trace_self_wrapper+0x2a/frame 0xeb7ead60
kdb_backtrace(c170905b,1,c153365c,eb7eae30,1,...) at kdb_backtrace+0x2d/frame 0xeb7eadc8
vpanic(c153365c,eb7eae30,c153365c,eb7eae30,eb7eae30,...) at vpanic+0x117/frame 0xeb7eae00
kassert_panic(c153365c,cc0bef50,cc0bef00,cc0bef00,eb7eae50,...) at kassert_panic+0xe9/frame 0xeb7eae24
_rw_destroy(cc0bef60,cc0bef00,ccd9f180,eb7eae80,c0e30141,...) at _rw_destroy+0x40/frame 0xeb7eae3c
ipsec_delisr(cc0bef00,c1565abd,49d,3f2,eb7eae80,...) at ipsec_delisr+0x15/frame 0xeb7eae50
_key_freesp(cc0bef4c,c1567acf,3f3,baf5ab2c,7e951cde,...) at _key_freesp+0xb1/frame 0xeb7eae80
esp_output_cb(cc658a14,41,eb7eaf38,c0e52170,cc350800,...) at esp_output_cb+0x4fc/frame 0xeb7eaf08
crypto_done(cc658a14,cc350800,fc,20,eb7eafc0,...) at crypto_done+0xa3/frame 0xeb7eaf38
swcr_process(c7197200,cc658a14,0,c23ba700,0,...) at swcr_process+0xca/frame 0xeb7eb5d0
crypto_invoke(0,0,c17d23a8,c7ea29a0,cc3e1888,...) at crypto_invoke+0x92/frame 0xeb7eb604
crypto_dispatch(cc658a14,c1567acf,363,eb7eb683,c719f888,...) at crypto_dispatch+0x78/frame 0xeb7eb620
esp_output(cc350800,cc0bef00,0,14,9,...) at esp_output+0xa03/frame 0xeb7eb6d8
ipsec4_process_packet(cc350800,cc0bef00,eb7eb870,0,ccd9f180,...) at ipsec4_process_packet+0x226/frame 0xeb7eb7d0
ip_ipsec_output(eb7eb874,0,eb7eb870,4a3e5,eb7eb8c0,...) at ip_ipsec_output+0xc9/frame 0xeb7eb7f8
ip_output(cc350800,0,0,1,0,...) at ip_output+0xb74/frame 0xeb7eb8b4
in_gre_output(cc350800,2,18,373,35b,...) at in_gre_output+0xad/frame 0xeb7eb8e0
gre_transmit(cb442c00,cc350800,4,1,2,...) at gre_transmit+0x30f/frame 0xeb7eb948
gre_output(cb442c00,cc350800,c7934bd0,eb7eb9f8,cb415000,...) at gre_output+0x1a5/frame 0xeb7eb970
ip_output(cc350800,0,0,0,0,...) at ip_output+0x18da/frame 0xeb7eba30
udp_send(c9f298e8,0,cc350800,cce47000,0,...) at udp_send+0xb97/frame 0xeb7ebaec
sosend_dgram(c9f298e8,cce47000,eb7ebb70,0,0,...) at sosend_dgram+0x39a/frame 0xeb7ebb30
kern_sendit(c7e6f660,7,eb7ebbf8,0,0,...) at kern_sendit+0x298/frame 0xeb7ebba4
sendit(eb7ebbf8,0,bfbfd20a,0,cce47000,...) at sendit+0x131/frame 0xeb7ebbe0
sys_sendto(c7e6f660,eb7ebca8,c152c1b1,4,c1960a10,...) at sys_sendto+0x58/frame 0xeb7ebc18
syscall(eb7ebce8) at syscall+0x33e/frame 0xeb7ebcdc
Xint0x80_syscall() at Xint0x80_syscall+0x2c/frame 0xeb7ebcdc
--- syscall (133, FreeBSD ELF32, sys_sendto), eip = 0x2818ecbf, esp = 0xbfbfcf94, ebp = 0xbfbfd644 ---
KDB: enter: panic
[ thread pid 53412 tid 100133 ]
Stopped at      kdb_enter+0x3d: movl    $0,kdb_why
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-07-26 11:40:20 UTC
Probably you did `setkey -FP` while processing some IPSEC request. There is possible race.
Comment 2 Jason Unovitch freebsd_committer freebsd_triage 2015-07-26 13:51:21 UTC
(In reply to Andrey V. Elsukov from comment #1)

That could very well be the case.  The basic background is Strongswan is maintaining a transport mode IPSEC router to router link for GRE traffic only.  This could have been when Strongswan was behind the scenes doing a rekey.

I did see another panic before doing an `ifconfig gre0 down` for the time being.  I'm going to see if the manual `setkey -FP` when there is traffic over the link triggers a panic.

Fatal double fault:
eip = 0xc0c35e49
esp = 0xe23a8fec
ebp = 0xe23a901c
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper(c1538ae5,c1793ce8,c19e545c,c19e53d0,c0a2a98d,...) at db_trace_self_wrapper+0x2a/frame 0xc19e53a0
kdb_backtrace(c170905b,1,c16ff6e2,c19e545c,1,...) at kdb_backtrace+0x2d/frame 0xc19e5408
vpanic(c16ff6e2,c19e545c,c19e545c,c19e5474,c1159ee6,...) at vpanic+0x117/frame 0xc19e543c
panic(c16ff6e2,1,1,1,e23a901c,...) at panic+0x1b/frame 0xc19e5450
dblfault_handler() at dblfault_handler+0xa6/frame 0xc19e5450
--- trap 0x17, eip = 0xc0c35e49, esp = 0xe23a8fec, ebp = 0xe23a901c ---
sched_add(c7e69cc0,0,c1536ae4,7e9,3c22,2710) at sched_add+0x59/frame 0xe23a901c
sched_wakeup(c7e69cc0,4,c1534db8,1ef,c7e69cc0,...) at sched_wakeup+0x188/frame 0xe23a9048
setrunnable(c7e69cc0,4,c153a711,2da,c1a28a10,...) at setrunnable+0x77/frame 0xe23a9068
sleepq_resume_thread(0,4,c153a711,10b,c1a28ac4,...) at sleepq_resume_thread+0x1d9/frame 0xe23a9094
sleepq_timeout(c7e69cc0,0,c1535b0a,2aa,0,...) at sleepq_timeout+0x121/frame 0xe23a90b8
softclock_call_cc(1,2,c1535b0a,1bb,e23a91ac,...) at softclock_call_cc+0x191/frame 0xe23a9140
callout_process(78d9306d,1d52,c1162107,0,e23a926c,...) at callout_process+0x1cc/frame 0xe23a91ac
handleevents(0,c764f660,0,c764f660,e23a9228,...) at handleevents+0x25c/frame 0xe23a91f0
timercb(c1a1ce84,0,1,1,15aa349c,...) at timercb+0x267/frame 0xe23a9240
lapic_handle_timer(e23a926c) at lapic_handle_timer+0x7f/frame 0xe23a9260
Xtimerint() at Xtimerint+0x2b/frame 0xe23a9260
--- interrupt, eip = 0xc054a498, esp = 0xe23a92ac, ebp = 0xe23a9408 ---
_libmd_SHA256_Transform(e23a9934,e23a9460,3c,80,84,...) at _libmd_SHA256_Transform+0xcf8/frame 0xe23a9408
_libmd_SHA256_Update(e23a9908,e23a9460,100,106,0,...) at _libmd_SHA256_Update+0xb9/frame 0xe23a9438
random_fortuna_pre_read(0,c719f7b8,c71a9898,c199fc40,c71a9898,...) at random_fortuna_pre_read+0x20a/frame 0xe23a9988
read_random(e23a9ae0,100,0,0,210,...) at read_random+0x2b/frame 0xe23a9ab0
arc4rand(e23a9c70,10,0,20,e23a9c80,...) at arc4rand+0x77/frame 0xe23a9bf8
swcr_process(c7197200,cc3e9c6c,0,c23ba700,0,...) at swcr_process+0x4ed/frame 0xe23aa2c0
crypto_invoke(0,0,c17d23a8,c7ea4a40,cc940150,...) at crypto_invoke+0x92/frame 0xe23aa2f4
crypto_dispatch(cc3e9c6c,c1567acf,363,e23aa373,c719fd00,...) at crypto_dispatch+0x78/frame 0xe23aa310
esp_output(cbc96300,cc253b00,0,14,9,...) at esp_output+0xa03/frame 0xe23aa3c8
ipsec4_process_packet(cbc96300,cc253b00,e23aa560,0,cce89c80,...) at ipsec4_process_packet+0x226/frame 0xe23aa4c0
ip_ipsec_output(e23aa564,0,e23aa560,c78e5990,e23aa570,...) at ip_ipsec_output+0xc9/frame 0xe23aa4e8
ip_output(cbc96300,0,0,1,0,...) at ip_output+0xb74/frame 0xe23aa5a4
in_gre_output(cbc96300,2,18,373,35b,...) at in_gre_output+0xad/frame 0xe23aa5d0
gre_transmit(ca1b5800,cbc96300,4,1,2,...) at gre_transmit+0x30f/frame 0xe23aa638
gre_output(ca1b5800,cbc96300,c7934b90,e23aa838,cc49a300,...) at gre_output+0x1a5/frame 0xe23aa660
ip_output(cbc96300,0,e23aa838,0,0,...) at ip_output+0x18da/frame 0xe23aa720
tcp_output(cc9a5c20,1,c155db2d,769,cce89e80,...) at tcp_output+0x226e/frame 0xe23aa898
tcp_do_segment(cc47c390,cc9a5c20,34,554,0,...) at tcp_do_segment+0x3352/frame 0xe23aa948
tcp_input(e23aaa94,e23aaa90,6,c7934fa8,a,...) at tcp_input+0x100d/frame 0xe23aaa48
ip_input(cb62e200,c0c8b39d,c7934600,c17bd7a8,c7934600,...) at ip_input+0x2a5/frame 0xe23aaab4
netisr_dispatch_src(1,0,cb62e200) at netisr_dispatch_src+0xab/frame 0xe23aaaf8
netisr_dispatch(1,cb62e200,588,0,14,...) at netisr_dispatch+0x20/frame 0xe23aab0c
gre_input(e23aabe4,e23aabe0,2f,1,14,...) at gre_input+0x224/frame 0xe23aab40
encap4_input(e23aabe4,e23aabe0,2f,c1960a00,c1960a10,...) at encap4_input+0x1e2/frame 0xe23aab98
ip_input(cb62e200,0,c154c9dd,2f3,1,...) at ip_input+0x2a5/frame 0xe23aac08
swi_net(c2025880,c152c859,560,8a3d8a50,c763e0c8,...) at swi_net+0x1a3/frame 0xe23aac48
intr_event_execute_handlers(c1960a10,c763e080,c152c859,560,c1960a00,...) at intr_event_execute_handlers+0xde/frame 0xe23aac70
ithread_loop(c7643410,e23aace8,c152c5cd,3e6,0,...) at ithread_loop+0x90/frame 0xe23aacac
fork_exit(c0bd1ff0,c7643410,e23aace8) at fork_exit+0x7e/frame 0xe23aacd4
fork_trampoline() at fork_trampoline+0x8/frame 0xe23aacd4
--- trap 0, eip = 0, esp = 0xe23aad20, ebp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100006 ]
Stopped at      kdb_enter+0x3d: movl    $0,kdb_why
db>
Comment 3 Jason Unovitch freebsd_committer freebsd_triage 2015-07-26 13:55:53 UTC
(In reply to Andrey V. Elsukov from comment #1)

gre0 is still down, but I did notice an LOR doing a setkey -DP.  The LOR was not reproducible on subsequent setkey -DP runs.

lock order reversal:
 1st 0xc19d043c sptree (sptree) @ /usr/src/head/sys/netipsec/key.c:2403
 2nd 0xc1a2bdbc rawcb (rawcb) @ /usr/src/head/sys/netipsec/keysock.c:286
KDB: stack backtrace:
db_trace_self_wrapper(c1538ae5,eb7b000a,c0bedbfb,c7603400,8,...) at db_trace_self_wrapper+0x2a/frame 0xeb7b6748
kdb_backtrace(c153c771,c1a2bdbc,c154d495,c71a2e90,c1567253,...) at kdb_backtrace+0x2d/frame 0xeb7b67b0
witness_checkorder(c1a2bdbc,9,c1567253,11e,0,...) at witness_checkorder+0xd6e/frame 0xeb7b6804
__mtx_lock_flags(c1a2bdcc,0,c1567253,11e,0,...) at __mtx_lock_flags+0x99/frame 0xeb7b6838
key_sendup_mbuf(cc8261c8,cb507400,0,c1565abd,963,...) at key_sendup_mbuf+0x1da/frame 0xeb7b6888
key_spddump(cc8261c8,cc0f4a00,eb7b6908,c199dfb8,10,...) at key_spddump+0x14d/frame 0xeb7b68e0
key_parse(cc0f4a00,cc8261c8,eb7b6a88,cc0f4aac,10,...) at key_parse+0xedf/frame 0xeb7b6a68
key_output(cc0f4a00,cc8261c8) at key_output+0x24a/frame 0xeb7b6aa4
raw_usend(cc8261c8,0,cc0f4a00,0,0,...) at raw_usend+0x49/frame 0xeb7b6abc
sosend_generic(cc8261c8,0,eb7b6b70,0,0,...) at sosend_generic+0x423/frame 0xeb7b6b30
kern_sendit(c78e6660,3,eb7b6bf8,0,0,...) at kern_sendit+0x298/frame 0xeb7b6ba4
sendit(eb7b6bf8,0,bfbfece0,0,0,...) at sendit+0x131/frame 0xeb7b6be0
sys_sendto(c78e6660,eb7b6ca8,c152c1b1,4,c1161a77,...) at sys_sendto+0x58/frame 0xeb7b6c18
syscall(eb7b6ce8) at syscall+0x33e/frame 0xeb7b6cdc
Xint0x80_syscall() at Xint0x80_syscall+0x2c/frame 0xeb7b6cdc
--- syscall (133, FreeBSD ELF32, sys_sendto), eip = 0x2818acbf, esp = 0xbfbf6c3c, ebp = 0xbfbf6c60 ---


For reference, this is the result of the setkey -DP

x.x.246.90[any] x.x.115.49[any] gre
        in ipsec
        esp/transport//unique:1
        created: Jul 26 13:38:57 2015  lastused: Jul 26 13:38:57 2015
        lifetime: 2147483647(s) validtime: 0(s)
        spid=136 seq=1 pid=66946
        refcnt=1
x.x.115.49[any] x.x.246.90[any] gre
        out ipsec
        esp/transport//unique:1
        created: Jul 26 13:38:57 2015  lastused: Jul 26 13:38:57 2015
        lifetime: 2147483647(s) validtime: 0(s)
        spid=135 seq=0 pid=66946
        refcnt=1
Comment 4 Jason Unovitch freebsd_committer freebsd_triage 2015-08-14 20:18:55 UTC
I've had some time to get back to this and have found something interesting.  This is on the below `uname -a`:

FreeBSD xju-rtr 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r286781: Fri Aug 14 15:38:29 UTC 2015     root@xts-bsd:/usr/obj/nanobsd.soekris/i386.i386/usr/src/head/sys/GENERIC  i386

This configuration seems stable for me.

/etc/pf.conf
#scrub on $wan_phy all no-df random-id fragment reassemble
scrub on $wan_phy all no-df fragment reassemble

/etc/sysctl.conf
#net.inet.ip.random_id="1"

With both the random_id sysctl AND a random-id scrub rule I see a panic within a minute of doing a heavy rsync.  With just one random IPID measure, I was able to do and rsync for about 10-15 minutes without seeing a panic.

Here's another panic backtrace with both enabled.  Now that I have some things of interest, I am going to try one config for a day each to evaluate if it is stable over time since my individual test cases were only for about 10-15 minutes of passing traffic.

Fatal double fault:
eip = 0xc054980c
esp = 0xeb5a2ffc
ebp = 0xeb5a3158
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper(c153b57d,c1796cb4,c19e865c,c19e85d0,c0a2aa0d,...) at db_trace_self_wrapper+0x2a/frame 0xc19e85a0
kdb_backtrace(c170be7b,0,c1702502,c19e865c,0,...) at kdb_backtrace+0x2d/frame 0xc19e8608
vpanic(c1702502,c19e865c,c19e865c,c19e8674,c115c696,...) at vpanic+0x117/frame 0xc19e863c
panic(c1702502,0,0,0,eb5a3158,...) at panic+0x1b/frame 0xc19e8650
dblfault_handler() at dblfault_handler+0xa6/frame 0xc19e8650
--- trap 0x17, eip = 0xc054980c, esp = 0xeb5a2ffc, ebp = 0xeb5a3158 ---
_libmd_SHA256_Transform(c18b1774,eb5a319c,8,c19a2fa0,aa55aa55,...) at _libmd_SHA256_Transform+0xbc/frame 0xeb5a3158
_libmd_SHA256_Update(c18b1714,eb5a319c,8,0,809a0000,...) at _libmd_SHA256_Update+0x78/frame 0xeb5a3188
_libmd_SHA256_Final(eb5a35e8,c18b1714) at _libmd_SHA256_Final+0x8a/frame 0xeb5a31b0
randomdev_hash_finish(c18b1714,eb5a35e8,c14dfc14,161,aa55aa55,...) at randomdev_hash_finish+0x18/frame 0xeb5a31c0
random_fortuna_pre_read(2811818,ffffffff,58f,bd95c999,bd90b689,...) at random_fortuna_pre_read+0x124/frame 0xeb5a3710
read_random(eb5a3868,100,c154a471,8c,8,...) at read_random+0x2b/frame 0xeb5a3838
arc4rand(eb5a39a0,2,0,db,2,...) at arc4rand+0x77/frame 0xeb5a3980
ip_fillid(cc1083e8,c7ecba00,c9fe4bbc,4a0,c7190da8,...) at ip_fillid+0x93/frame 0xeb5a39c0
pf_normalize_ip(eb5a3c1c,2,c7ecba00,eb5a3b80,eb5a3b28,...) at pf_normalize_ip+0x1062/frame 0xeb5a3aa0
pf_test(2,c760bc00,eb5a3c1c,0,c1a2f344,...) at pf_test+0x216/frame 0xeb5a3bcc
pf_check_out(0,eb5a3c1c,c760bc00,2,0,...) at pf_check_out+0x4b/frame 0xeb5a3bf0
pfil_run_hooks(c1a2f344,eb5a3cec,c760bc00,2,0,...) at pfil_run_hooks+0x9f/frame 0xeb5a3c50
ip_output(cc108300,0,0,2,0,...) at ip_output+0xb05/frame 0xeb5a3d30
ipsec_process_done(cc108300,cc4fd680,3a1,f5276691,df38d9c3,...) at ipsec_process_done+0x29a/frame 0xeb5a3d80
esp_output_cb(cc694c30,41,eb5a3e38,c0e545c0,cc108300,...) at esp_output_cb+0x4a6/frame 0xeb5a3e08
crypto_done(cc694c30,cc108300,5bc,10,eb5a3ec0,...) at crypto_done+0xa3/frame 0xeb5a3e38
swcr_process(c76cdc00,cc694c30,0,c23bbc40,0,...) at swcr_process+0xca/frame 0xeb5a4508
crypto_invoke(0,0,c7622600,9,cc698738,...) at crypto_invoke+0x92/frame 0xeb5a453c
crypto_dispatch(cc694c30,c156a7f6,36e,eb5a45bb,3,...) at crypto_dispatch+0x78/frame 0xeb5a4558
esp_output(cc108300,cc4fd680,0,14,9,...) at esp_output+0x9d2/frame 0xeb5a4610
ipsec4_process_packet(cc108300,cc4fd680,eb5a47ac,0,cc5ef380,...) at ipsec4_process_packet+0x226/frame 0xeb5a4708
ip_ipsec_output(eb5a47cc,0,eb5a47ac,c0bee61b,c7605400,...) at ip_ipsec_output+0xc9/frame 0xeb5a4730
ip_output(cc108300,0,0,1,0,...) at ip_output+0xa96/frame 0xeb5a480c
in_gre_output(cc108300,2,18,373,35b,...) at in_gre_output+0xad/frame 0xeb5a4838
gre_transmit(c7f5dc00,cbd3b200,4,cbd3b200,2,...) at gre_transmit+0x30f/frame 0xeb5a48a0
gre_output(c7f5dc00,cbd3b200,c7b9bbf0,eb5a49f8,0,...) at gre_output+0x1a5/frame 0xeb5a48c8
ip_output(cbd3b200,0,eb5a49f8,1,0,...) at ip_output+0x191d/frame 0xeb5a49a8
ip_forward(cbd3b200,0,28e,1,0,...) at ip_forward+0x425/frame 0xeb5a4a30
ip_input(cbd3b200,c7872330,c7e7021c,c7e7020c,cbd3b200,...) at ip_input+0xa98/frame 0xeb5a4a9c
netisr_dispatch_src(1,0,cbd3b200) at netisr_dispatch_src+0xab/frame 0xeb5a4ae0
netisr_dispatch(1,cbd3b200,1,c760a800,c7e70200,cbd3b200) at netisr_dispatch+0x20/frame 0xeb5a4af4
ether_demux(c760a800,cbd3b200,6,0,8,...) at ether_demux+0x18d/frame 0xeb5a4b20
ether_nh_input(cbd3b200,801,246,eb5a4bac,cbd41800,...) at ether_nh_input+0x377/frame 0xeb5a4b4c
netisr_dispatch_src(5,0,cbd3b200) at netisr_dispatch_src+0xab/frame 0xeb5a4b90
netisr_dispatch(5,cbd3b200,c7896920,1,c7896900,...) at netisr_dispatch+0x20/frame 0xeb5a4ba4
ether_input(c760c800,cbd3b200,eb5a4c2c,c06f4bbc,c760c800,...) at ether_input+0x4f/frame 0xeb5a4bc0
if_input(c760c800,cbd3b200,c1253435,1239,c78a3000,...) at if_input+0x19/frame 0xeb5a4bd0
em_rxeof(c760c800,c1963510,0,c74dab00,c7519b80,...) at em_rxeof+0x3bc/frame 0xeb5a4c2c
em_msix_rx(c7896900,c152f3b5,560,1f3619f8,c7519bc8,...) at em_msix_rx+0x2f/frame 0xeb5a4c48
intr_event_execute_handlers(c1963510,c7519b80,c152f3b5,560,c1963500,...) at intr_event_execute_handlers+0xde/frame 0xeb5a4c70
ithread_loop(c7698ed0,eb5a4ce8,c152f129,3e6,0,...) at ithread_loop+0x90/frame 0xeb5a4cac
fork_exit(c0bd2a20,c7698ed0,eb5a4ce8) at fork_exit+0x7e/frame 0xeb5a4cd4
fork_trampoline() at fork_trampoline+0x8/frame 0xeb5a4cd4
--- trap 0, eip = 0, esp = 0xeb5a4d20, ebp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100072 ]
Stopped at      kdb_enter+0x3d: movl    $0,kdb_why
db>
Comment 5 Jason Unovitch freebsd_committer freebsd_triage 2015-08-14 22:05:47 UTC
After the couple hours of uptime with both the random_id options turn off, I got a panic with the original message:

"panic: rw lock 0xcb208a50 not unlocked"

It would seem there are two issues going on here.  I am going to keep the random stuff off to focus on one thing at a time.  Any ideas on what would be beneficial to help to debug this?
Comment 6 Jason Unovitch freebsd_committer freebsd_triage 2015-09-20 22:38:19 UTC
Seeing bug 202667 I got the idea to try amd64 since the Soekris does support it with device mptable is in the kernel config.  Without all the INVARIANTS and other debugging options I didn't see any panics for 10 days on both routers.  I switched one to be GENERIC with device mptable added to get all the debugging and still see the same basic panic.

panic: rw lock 0xfffff8000a37b660 not unlocked
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00970cb930
vpanic() at vpanic+0x189/frame 0xfffffe00970cb9b0
kassert_panic() at kassert_panic+0x132/frame 0xfffffe00970cba20
_rw_destroy() at _rw_destroy+0x42/frame 0xfffffe00970cba40
ipsec_delisr() at ipsec_delisr+0x12/frame 0xfffffe00970cba60
_key_freesp() at _key_freesp+0xc1/frame 0xfffffe00970cbaa0
esp_output_cb() at esp_output_cb+0x1d8/frame 0xfffffe00970cbb30
swcr_process() at swcr_process+0xde/frame 0xfffffe00970cc2a0
crypto_dispatch() at crypto_dispatch+0x82/frame 0xfffffe00970cc2c0
esp_output() at esp_output+0x850/frame 0xfffffe00970cc3b0
ipsec4_process_packet() at ipsec4_process_packet+0x222/frame 0xfffffe00970cc4a0
ip_ipsec_output() at ip_ipsec_output+0xb1/frame 0xfffffe00970cc4e0
ip_output() at ip_output+0x96d/frame 0xfffffe00970cc620
gre_transmit() at gre_transmit+0x2e4/frame 0xfffffe00970cc6c0
gre_output() at gre_output+0x183/frame 0xfffffe00970cc700
ip_output() at ip_output+0x12f1/frame 0xfffffe00970cc840
udp_send() at udp_send+0xaab/frame 0xfffffe00970cc920
sosend_dgram() at sosend_dgram+0x380/frame 0xfffffe00970cc990
kern_sendit() at kern_sendit+0x211/frame 0xfffffe00970cca40
sendit() at sendit+0x129/frame 0xfffffe00970cca90
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00970ccae0
amd64_syscall() at amd64_syscall+0x2d6/frame 0xfffffe00970ccbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00970ccbf0
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x20b6aa7a, rsp = 0x7fffffffcce8, rbp = 0x7fffffffd420 ---
KDB: enter: panic
[ thread pid 55975 tid 100107 ]


The PID listed was syslogd.

55975     1 55975     0  Rs      CPU 0                       syslogd

I tried to do some stress testing with a while true/logger loop but wasn't able to trigger a panic.
Comment 7 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-09-21 02:05:31 UTC
Ermal, it looks like I have introduced this problem, so I' taking this PR.
Comment 8 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-09-21 02:11:28 UTC
It seems the crypto callback releases the last reference to SP (probably IKE daemon has removed security policy from userland). But since ISR is locked by ipsec4_process_packet() you get a panic triggered by KASSERT.

Do you use IKE daemon?
Comment 9 Jason Unovitch freebsd_committer freebsd_triage 2015-09-21 02:12:33 UTC
(In reply to Andrey V. Elsukov from comment #8)

I am using strongswan-5.3.2
Comment 10 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-09-21 03:40:40 UTC
Created attachment 161220 [details]
Proposed patch (untested)

Change the way how we protect from removing SP. Take one reference before calling ipsec[46]_process_packet. This protects us from removing SP while we preparing crypto request. We release this reference just after crypto request will be queued. Then we take one reference when scheduling crypto request. This protects us from removing SP while crypto request will be processed. We will release it crypto callback function.
Comment 11 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-09-21 06:24:17 UTC
Created attachment 161221 [details]
Proposed patch (untested)

There is no need to take extra reference for bundled SA, since we still hold reference in the crypto callback.
Comment 12 Jason Unovitch freebsd_committer freebsd_triage 2015-09-21 14:34:49 UTC
(In reply to Andrey V. Elsukov from comment #11)
Thanks!  I just put this later patch in place.  The only thing is I will be away from home for a couple days so I won't be able to provide more feedback for a bit.

For clarity as well this is my home router with a single IPSec transport mode connection for GRE traffic to a family member.  If it's been panicing for me in this small use case hopefully we can get to the bottom before this gets used in production environments.

Thanks again!
Comment 13 Jason Unovitch freebsd_committer freebsd_triage 2015-09-26 15:00:35 UTC
(In reply to Andrey V. Elsukov from comment #11)
I did not have a panic while I was away.  I updated the other side of the link so both are running r288061M with the patch in comment 11.

Now that I am home again I will continue to monitor as well as throw some more traffic across the link.
Comment 14 commit-hook freebsd_committer freebsd_triage 2015-09-30 08:17:11 UTC
A commit references this bug:

Author: ae
Date: Wed Sep 30 08:16:36 UTC 2015
New revision: 288418
URL: https://svnweb.freebsd.org/changeset/base/288418

Log:
  Take extra reference to security policy before calling crypto_dispatch().

  Currently we perform crypto requests for IPSEC synchronous for most of
  crypto providers (software, aesni) and only VIA padlock calls crypto
  callback asynchronous. In synchronous mode it is possible, that security
  policy will be removed during the processing crypto request. And crypto
  callback will release the last reference to SP. Then upon return into
  ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already
  freed request. To prevent this we will take extra reference to SP.

  PR:		201876
  Sponsored by:	Yandex LLC

Changes:
  head/sys/netinet/ip_ipsec.c
  head/sys/netinet6/ip6_ipsec.c
  head/sys/netipsec/ipsec_output.c
  head/sys/netipsec/xform_ah.c
  head/sys/netipsec/xform_esp.c
  head/sys/netipsec/xform_ipcomp.c