Bug 211486 - [panic] [IPSec] [IP6] Crash with IPv6 ESP usage
Summary: [panic] [IPSec] [IP6] Crash with IPv6 ESP usage
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Andrey V. Elsukov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-31 21:04 UTC by Harald Schmalzbauer
Modified: 2016-09-22 18:06 UTC

See Also:
op: mfc-stable10?


Attachments
Proposed patch (519 bytes, patch)
2016-08-02 11:56 UTC, Andrey V. Elsukov

Description Harald Schmalzbauer 2016-07-31 21:04:08 UTC
Unread portion of the kernel message buffer:
Kernel page fault with the following non-sleepable locks held:
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff80007b1fe18) locked @ /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet6/in6_pcb.c:1172
shared rw tcp (tcp) r = 0 (0xffffffff82ad2bd8) locked @ /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_input.c:802
stack backtrace:
#0 0xffffffff80ab4d30 at witness_debugger+0x70
#1 0xffffffff80ab6017 at witness_warn+0x3d7
#2 0xffffffff80ec63d7 at trap_pfault+0x57
#3 0xffffffff80ec5a64 at trap+0x284
#4 0xffffffff80ea6161 at calltrap+0x8
#5 0xffffffff80c43c51 at tcp_twrespond+0x231
#6 0xffffffff80c436f5 at tcp_twstart+0x1f5
#7 0xffffffff80c34078 at tcp_do_segment+0x23c8
#8 0xffffffff80c310b4 at tcp_input+0xe44
#9 0xffffffff80c30221 at tcp6_input+0xf1
#10 0xffffffff80c82799 at ipsec6_common_input_cb+0x4c9
#11 0xffffffff80c97101 at esp_input_cb+0x671
#12 0xffffffff80ca9e69 at swcr_process+0xd69
#13 0xffffffff80ca6c2f at crypto_dispatch+0x7f
#14 0xffffffff80c9605a at esp_input+0x4fa
#15 0xffffffff80c8179b at ipsec_common_input+0x40b
#16 0xffffffff80c8222d at ipsec6_common_input+0xcd
#17 0xffffffff80c64070 at ip6_input+0xc70


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x1a
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c65afc
stack pointer           = 0x28:0xfffffe0091f1e5f0
frame pointer           = 0x28:0xfffffe0091f1e850
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (em0 que)


I have static keys and policies (via ipsec.conf) that have been in use for several years.
After updating from stable/10 to stable/11, the machine crashes as soon as there is traffic matching the IPSec policy.

A core dump is available; just tell me how I can help. I am not able to diagnose this any further :-(

-Harry
Comment 1 Harald Schmalzbauer 2016-08-01 07:19:19 UTC
(In reply to Harald Schmalzbauer from comment #0)
Missed helpful info I guess:


list *0xffffffff80c65afc
0xffffffff80c65afc is in ip6_output (/usr/local/share/deploy-tools/RELENG_11/src/sys/netinet6/ip6_output.c:1060).
1055    done:
1056            /*
1057             * Release the route if using our private route, or if
1058             * (with flowtable) we don't have our own reference.
1059             */
1060            if (ro == &ip6route || ro->ro_flags & RT_NORTREF)
1061                    RO_RTFREE(ro);
1062            return (error);
1063
1064    freehdrs:

#0  doadump (textdump=-1846419440) at pcpu.h:221

#1  0xffffffff80393346 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/ddb/db_command.c:568
#2  0xffffffff80392de9 in db_command (cmd_table=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/ddb/db_command.c:440
#3  0xffffffff80392b44 in db_command_loop () at /usr/local/share/deploy-tools/RELENG_11/src/sys/ddb/db_command.c:493
#4  0xffffffff80395a7b in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/ddb/db_main.c:251
#5  0xffffffff80a96133 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80ec6331 in trap_fatal (frame=0xfffffe0091f1e540, eva=26) at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/trap.c:836
#7  0xffffffff80ec657d in trap_pfault (frame=0xfffffe0091f1e540, usermode=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/trap.c:691
#8  0xffffffff80ec5a64 in trap (frame=0xfffffe0091f1e540) at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/trap.c:442
#9  0xffffffff80ea6161 in calltrap () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80c65afc in ip6_output (m0=<value optimized out>, opt=<value optimized out>, ro=<value optimized out>, flags=<value optimized out>, im6o=0x0, 
    ifpp=0x0, inp=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet6/ip6_output.c:1060
#11 0xffffffff80c43c51 in tcp_twrespond () at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_timewait.c:594
#12 0xffffffff80c436f5 in tcp_twstart (tp=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_timewait.c:336
#13 0xffffffff80c34078 in tcp_do_segment (m=0xfffff8000732b400, th=<value optimized out>, so=<value optimized out>, tp=0xfffff80007b22000, drop_hdrlen=72, 
    tlen=<value optimized out>, iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
) at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_input.c:3141
#14 0xffffffff80c310b4 in tcp_input (mp=<value optimized out>, offp=<value optimized out>, proto=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_input.c:1442
#15 0xffffffff80c30221 in tcp6_input (mp=0xfffffe0091f1ebf8, offp=0xfffffe0091f1ebf4, proto=203)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet/tcp_input.c:578
#16 0xffffffff80c82799 in ipsec6_common_input_cb (m=<value optimized out>, sav=<value optimized out>, skip=40, protoff=6)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netipsec/ipsec_input.c:827
#17 0xffffffff80c97101 in esp_input_cb (crp=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/netipsec/xform_esp.c:626
#18 0xffffffff80ca9e69 in swcr_process (dev=<value optimized out>, crp=<value optimized out>, hint=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/opencrypto/cryptosoft.c:1185
#19 0xffffffff80ca6c2f in crypto_dispatch (crp=0xfffff80028f93840) at /usr/local/share/deploy-tools/RELENG_11/src/sys/opencrypto/crypto.c:807
#20 0xffffffff80c9605a in esp_input (m=<value optimized out>, sav=0xfffff80003ebb300, skip=<value optimized out>, protoff=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netipsec/xform_esp.c:459
#21 0xffffffff80c8179b in ipsec_common_input (m=0xfffff8000732b400, skip=40, protoff=6, af=28, sproto=50)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netipsec/ipsec_input.c:236
#22 0xffffffff80c8222d in ipsec6_common_input (mp=<value optimized out>, offp=<value optimized out>, proto=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/netipsec/ipsec_input.c:581
#23 0xffffffff80c64070 in ip6_input (m=0x3b003b00000001) at /usr/local/share/deploy-tools/RELENG_11/src/sys/netinet6/ip6_input.c:921
#24 0xffffffff80b5a7e0 in netisr_dispatch_src (proto=6, source=0, m=0xfffff8000732b400) at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/netisr.c:1121
#25 0xffffffff80b4540a in ether_demux (ifp=<value optimized out>, m=0xffffffff81428eff)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/if_ethersubr.c:850
#26 0xffffffff80b46200 in ether_nh_input (m=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/if_ethersubr.c:639
#27 0xffffffff80b5a7e0 in netisr_dispatch_src (proto=5, source=0, m=0xfffff8000732b400) at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/netisr.c:1121
#28 0xffffffff80b45772 in ether_input (ifp=<value optimized out>, m=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/if_ethersubr.c:759
#29 0xffffffff80b421fa in if_input (ifp=0xfffffe0091f1e5c8, sendmp=0xffffffff81428eff) at /usr/local/share/deploy-tools/RELENG_11/src/sys/net/if.c:3956
#30 0xffffffff80524acc in em_rxeof (count=98) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/e1000/if_em.c:4873
#31 0xffffffff80526110 in em_handle_que (context=0xfffffe0000eb6000, pending=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/e1000/if_em.c:1599
#32 0xffffffff80aa7a6c in taskqueue_run_locked (queue=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_taskqueue.c:465
#33 0xffffffff80aa85b8 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_taskqueue.c:719
#34 0xffffffff80a18904 in fork_exit (callout=0xffffffff80aa8530 <taskqueue_thread_loop>, arg=0xfffffe0000eb8730, frame=0xfffffe0091f1fac0)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_fork.c:103
#35 0xffffffff80ea669e in fork_trampoline () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:611
#36 0x0000000000000000 in ?? ()
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2016-08-01 21:53:37 UTC
Can you show your ipsec.conf (with masked keys/addresses)?
Comment 3 Harald Schmalzbauer 2016-08-02 08:33:57 UTC
(In reply to Andrey V. Elsukov from comment #2)

Thanks for your attention!

ipsec.conf of the affected machine:
############
# policies #
############
#----------------------------------------------------#
# Encrypt any IPv6 LDAP traffic to/from own networks #
#----------------------------------------------------#
# No local IP, -> site1
spdadd -6 ::/0 2001:db8:abcd::/48[389] any -P out ipsec esp/transport//require;
spdadd -6 2001:db8:abcd::/48[389] ::/0 any -P in ipsec esp/transport//require;
# No local IP, -> site2
spdadd -6 ::/0 2001:db8:ef00::/48[389] any -P out ipsec esp/transport//require;
spdadd -6 2001:db8:ef00::/48[389] ::/0 any -P in ipsec esp/transport//require;
#-----------------------------------------------#
#                    keys                       #
#-----------------------------------------------#
# key for host<->client
add -6 1stf.q.d.n 2ndf.q.d.n esp 54320 -E rijndael-cbc 0x00000000000000000000000000000000
add -6 2ndf.q.d.n 1st.f.q.d.n esp 54321 -E rijndael-cbc 0x000000000000000000000000000000000000

netstat -f inet6 -nr:

Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
default                           2001:db8:abcd:2::1            UGS         myif
::1                               link#2                        UH          lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
2001:db8:abcd:2::/64              link#1                        U           myif
2001:db8:abcd:2::3:1              link#1                        UHS         lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%myif/64                    link#1                        U           myif
fe80::20c:29ff:feac:e09a%myif     link#1                        UHS         lo0
fe80::%lo0/64                     link#2                        U           lo0
fe80::1%lo0                       link#2                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0

Additional notes:
1st.f.q.d.n has the AAAA record 2001:db8:abcd::efgh:10, so the default gateway is on the route to it.
The netif "myif" is a renamed (and masked) em0|vmx0; the panic is the same with both interfaces. MTU settings (normally 9000 on the interface and 1500 on the default route) don't influence the panic either.
No pf|ipfw involved.

As soon as I fire up 'ldapsearch', I get the result, followed by an immediate crash. Since I'd like to help test that this will work in 11-RELEASE, I'll keep 11 installed on this host, but that means a moderately severe outage, since LDAP users can no longer log in. I hope the fix isn't too hard to find!

-Harry
Comment 4 Andrey V. Elsukov freebsd_committer freebsd_triage 2016-08-02 11:56:58 UTC
Created attachment 173191
Proposed patch

Can you test this patch? It looks like it should help.
As I understand from your backtrace, your host is about to send a TCP ACK via ip6_output(). Because a corresponding IPSec policy is present, the packet is handled by ip6_ipsec_output(). But since the *ro* pointer is NULL and has not been initialized yet, a NULL pointer dereference occurs in the check you listed. This also matches the trap output: ro_flags is at offset 0x1a in struct route_in6, and the fault address is 0x1a.
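
The offset argument can be illustrated with a small stand-alone C sketch. This is not the attached patch and not the kernel's definition of struct route_in6: `struct demo_route`, `DEMO_NORTREF`, `flags_address()` and `should_free_route()` are hypothetical names, and the field layout is simply arranged so that `ro_flags` lands at offset 0x1a. It shows why reading `ro->ro_flags` through a NULL `ro` touches address 0x1a, and how checking `ro` for NULL first would avoid the read.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct route_in6 (NOT the kernel definition).
 * Fixed-width fields replace the pointers so that ro_flags sits at
 * offset 0x1a regardless of the platform's pointer size.
 */
struct demo_route {
	uint64_t ro_rt;      /* offset 0x00, stands in for a pointer */
	uint64_t ro_lle;     /* offset 0x08, stands in for a pointer */
	uint64_t ro_prepend; /* offset 0x10, stands in for a pointer */
	uint16_t ro_plen;    /* offset 0x18 */
	uint16_t ro_flags;   /* offset 0x1a */
};

#define DEMO_NORTREF 0x0001 /* hypothetical flag value */

/*
 * Address the CPU would have to read to evaluate ro->ro_flags.
 * Computed arithmetically, so nothing is dereferenced here.
 */
static uintptr_t
flags_address(const struct demo_route *ro)
{
	return ((uintptr_t)ro + offsetof(struct demo_route, ro_flags));
}

/*
 * Guarded form of the "release the route?" check: ro is tested for
 * NULL before ro_flags is read.
 */
static int
should_free_route(const struct demo_route *ro,
    const struct demo_route *private_ro)
{
	return (ro == private_ro ||
	    (ro != NULL && (ro->ro_flags & DEMO_NORTREF) != 0));
}
```

With `ro == NULL`, `flags_address(ro)` evaluates to 0x1a, which is the fault virtual address in the trap output, while `should_free_route(NULL, &some_route)` returns 0 without reading any memory through `ro`.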
Comment 5 commit-hook freebsd_committer freebsd_triage 2016-08-02 12:18:48 UTC
A commit references this bug:

Author: ae
Date: Tue Aug  2 12:18:06 UTC 2016
New revision: 303657
URL: https://svnweb.freebsd.org/changeset/base/303657

Log:
  Fix NULL pointer dereference.
  ro pointer can be NULL when IPSec consumes mbuf.

  PR:		211486
  MFC after:	3 days

Changes:
  head/sys/netinet6/ip6_output.c
Comment 6 Harald Schmalzbauer 2016-08-02 14:00:21 UTC
(In reply to Andrey V. Elsukov from comment #4)

Thank you very much for your quick solution, which seems to solve the problem.
No more immediate crashes so far, though I have only tested very briefly.
Looking forward to seeing the MFC happen.
Thanks,

-Harry
Comment 7 commit-hook freebsd_committer freebsd_triage 2016-08-05 15:12:32 UTC
A commit references this bug:

Author: ae
Date: Fri Aug  5 15:12:29 UTC 2016
New revision: 303768
URL: https://svnweb.freebsd.org/changeset/base/303768

Log:
  MFC r303657:
    Fix NULL pointer dereference.
    ro pointer can be NULL when IPSec consumes mbuf.

    PR:		211486
  Approved by:	re (gjb)

Changes:
_U  stable/11/
  stable/11/sys/netinet6/ip6_output.c
Comment 8 Andrey V. Elsukov freebsd_committer freebsd_triage 2016-08-05 15:23:02 UTC
Fixed in head/ and stable/11. Thanks!