There are lots of corrupted IP packets when the aesni module is loaded.

I've set up a very simple lab across two FreeBSD servers in IPsec gateway mode:

pkt-generator ====== fbsd srv1 ====== fbsd srv2 ==== pkt-receiver

With the aesni module loaded and this very simple static IPsec configuration (setkey.conf):

flush;
spdflush;
spdadd 1.0.0.0/8 3.0.0.0/8 any -P in ipsec esp/tunnel/2.2.2.2-2.2.2.3/require;
spdadd 3.0.0.0/8 1.0.0.0/8 any -P out ipsec esp/tunnel/2.2.2.3-2.2.2.2/require;
add 2.2.2.2 2.2.2.3 esp 0x1000 -E rijndael-cbc "1234567890123456";
add 2.2.2.3 2.2.2.2 esp 0x1001 -E rijndael-cbc "1234567890123456";

I then generate exactly 100 000 packets at a low rate of 1000 packets per second using netmap's pktgen, crossing these two FreeBSD IPsec gateways.

=> On the packet receiver, only about 80-95% of these 100 000 packets are received.

Troubleshooting shows that the "receiving" IPsec gateway correctly receives all 100 000 encrypted packets and correctly decrypts them, but once decrypted these packets are no longer valid IP packets: the IP section of "netstat -s" on fbsd srv2 shows lots of invalid IP packets, exactly matching the number of missing packets. These bad packets are never forwarded to the pkt-receiver.

Here is an example of the stats on fbsd srv2:

[root@srv2]~# sysctl dev.igb.2.mac_stats.rx_frames_512_1023
dev.igb.2.mac_stats.rx_frames_512_1023: 100000
[root@srv2]~# sysctl dev.igb.3.mac_stats.tx_frames_512_1023
dev.igb.3.mac_stats.tx_frames_512_1023: 99128

=> Here, 100K encrypted packets are received, but only 99128 are forwarded: 872 packets are missing.

netstat -s output on srv2:

ip:
    200131 total packets received
    38 with data size < data length
    15 with header length < data size
    1 with bad options
    818 with incorrect version number
    99128 packets forwarded

=> 38+15+1+818=872 bad IP packets: this accounts for all of the missing packets.

To fix this problem we just had to NOT load the aesni module on srv1.
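The reconciliation above (bad-packet counters versus missing frames) can be automated for repeated test runs. The following is a minimal sketch, not part of the original report: the helper names `parse_counters` and `reconcile` are hypothetical, the choice of which counters count as "bad" is an assumption taken from the four lines quoted above, and the sample text is the srv2 output copied from this comment.

```python
import re

# IP section of "netstat -s" on srv2, copied from the report above.
NETSTAT_IP = """\
ip:
    200131 total packets received
    38 with data size < data length
    15 with header length < data size
    1 with bad options
    818 with incorrect version number
    99128 packets forwarded
"""

# Assumption: these four counters are the "bad IP packet" reasons
# seen in this report; other error counters are ignored here.
ERROR_LABELS = (
    "with data size < data length",
    "with header length < data size",
    "with bad options",
    "with incorrect version number",
)


def parse_counters(text):
    """Map each 'N description' line to {description: N}."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def reconcile(text, frames_in):
    """Return (bad IP packets, frames received but not forwarded)."""
    counters = parse_counters(text)
    bad = sum(counters.get(label, 0) for label in ERROR_LABELS)
    missing = frames_in - counters.get("packets forwarded", 0)
    return bad, missing


# 100000 is the rx_frames_512_1023 value reported by sysctl above.
bad, missing = reconcile(NETSTAT_IP, 100000)
print(bad, missing)
```

If the two numbers are equal, every missing frame is accounted for by an IP-level sanity check failure rather than by a drop elsewhere in the path.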
Bug reproduced with these releases:
- FreeBSD 10.1-RELEASE-p10 #0 r282880M (about 1-5% of packets corrupted)
- FreeBSD 11.0-CURRENT #2 r283536M (about 10-20% of packets corrupted)

Bug reproduced with these CPUs:
- Intel Atom CPU C2558
- Intel Xeon CPU L5630

More information on the IPsec lab here: http://bsdrp.net/documentation/examples/ipsec_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580
Can you kldload hwpmc and look at which functions are in use while you are running the tests? (run pmcstat -TS instructions -w1)
You should see aesni_process() or swcr_process() near the top of the list. If you see swcr_process(), this means aesni isn't used.
Also, are there any errors in the IPsec stats?
# netstat -sp ipsec
# netstat -sp esp
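For repeated runs, the pmcstat check described above can be scripted. This is only a rough sketch under stated assumptions: the function name `crypto_backend` is hypothetical, matching on the `aesni_` and `swcr_` name prefixes is a simplification of "look for aesni_process() or swcr_process()", and the sample rows merely follow the `pmcstat -TS` column layout (%SAMP, IMAGE, FUNCTION, CALLERS).

```python
def crypto_backend(pmcstat_text):
    """Guess which crypto driver shows up in 'pmcstat -TS' output.

    Returns "aesni", "software" or "unknown". If both drivers appear,
    "aesni" wins (an arbitrary choice made for this sketch).
    """
    names = set()
    for line in pmcstat_text.splitlines():
        for token in line.split():
            # Caller entries look like "name:1.3"; keep only the name.
            names.add(token.split(":", 1)[0])
    if any(name.startswith("aesni_") for name in names):
        return "aesni"
    if any(name.startswith("swcr_") for name in names):
        return "software"
    return "unknown"


# Illustrative rows in the pmcstat -TS layout.
SAMPLE = """\
%SAMP IMAGE      FUNCTION             CALLERS
  7.4 aesni.ko   aesni_encrypt_cbc    aesni_process
  4.2 kernel     cpu_search_highest   sched_idletd:2.6 cpu_search_highest:1.7
"""

print(crypto_backend(SAMPLE))
```

A result of "software" would correspond to the swcr_process() case, meaning the aesni driver is not handling the traffic.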
OK, new test under FreeBSD 11.0-CURRENT #3 r283536 (still generating 100 000 packets at 1000 pps).

Here are the first lines of the pmcstat output during the load (taken on the "encrypter" IPsec gateway side):

PMC: [INSTR_RETIRED_ANY] Samples: 544 (100.0%) , 0 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
  7.4 aesni.ko   aesni_encrypt_cbc    aesni_process
  4.2 kernel     cpu_search_highest   sched_idletd:2.6 cpu_search_highest:1.7
  2.8 kernel     spinlock_exit        intr_event_schedule_thread:1.1 handleevents:0.6
  2.4 kernel     uma_zalloc_arg       crypto_getreq:1.3 malloc:0.9
  2.4 libc.so.7  bsearch              0x63b4
  2.4 kernel     cpu_search_lowest    cpu_search_lowest:1.3 sched_pickcpu:1.1
  2.0 kernel     critical_exit        spinlock_exit:1.1 sched_idletd:0.6
  2.0 kernel     __rw_rlock           in_lltable_lookup:0.6 ip_input:0.6
  1.8 kernel     _rw_runlock_cookie   rtalloc1_fib
  1.8 kernel     igb_rxeof            igb_msix_que
  1.8 kernel     ip_output            ipsec_process_done
  1.7 kernel     spinlock_enter       thread_lock_flags_
  1.5 kernel     sched_switch         mi_switch
  1.3 kernel     key_allocsp          ipsec_getpolicybyaddr
  1.3 kernel     sched_pickcpu        sched_add
  1.1 kernel     rn_match             rtalloc1_fib
  1.1 kernel     bzero
  1.1 kernel     cpu_switch           mi_switch
  1.1 kernel     bounce_bus_dmamap_lo bus_dmamap_load_mbuf_sg
  1.1 pmcstat    0x63d3               bsearch

Now, on the "decrypter" IPsec gateway side, the netstat output:

[root@R3]~# netstat -sp ipsec
ipsec:
    0 inbound packets violated process security policy
    0 inbound packets failed due to insufficient memory
    0 invalid inbound packets
    0 outbound packets violated process security policy
    0 outbound packets with no SA available
    0 outbound packets failed due to insufficient memory
    0 outbound packets with no route available
    0 invalid outbound packets
    0 outbound packets with bundled SAs
    0 mbufs coalesced during clone
    0 clusters coalesced during clone
    0 clusters copied during clone
    0 mbufs inserted during makespace

[root@R3]~# netstat -sp esp
esp:
    0 packets shorter than header shows
    0 packets dropped; protocol family not supported
    0 packets dropped; no TDB
    0 packets dropped; bad KCR
    0 packets dropped; queue full
    0 packets dropped; no transform
    0 packets dropped; bad ilen
    0 replay counter wraps
    0 packets dropped; bad encryption detected
    0 packets dropped; bad authentication detected
    0 possible replay packets detected
    100000 packets in
    0 packets out
    0 packets dropped; invalid TDB
    54400000 bytes in
    0 bytes out
    0 packets dropped; larger than IP_MAXPACKET
    0 packets blocked due to policy
    0 crypto processing failures
    0 tunnel sanity check failures
    ESP output histogram:
        rijndael-cbc: 100000

=> No "ipsec/esp" problem: the IPsec packets are correctly generated.

But once decrypted, there are lots of errors (too small, bad header, incorrect version number, etc.):

[root@R3]~# netstat -sp ip
ip:
    200145 total packets received
    0 bad header checksums
    0 with size smaller than minimum
    40 with data size < data length
    0 with ip length > max ip packet size
    19 with header length < data size
    0 with data length < header length
    1 with bad options
    818 with incorrect version number
    0 fragments received
    0 fragments dropped (dup or out of space)
    0 fragments dropped after timeout
    0 packets reassembled ok
    100145 packets for this host
    0 packets for unknown/unsupported protocol
    99122 packets forwarded (0 packets fast forwarded)
    0 packets not forwardable
    0 packets received for unknown multicast group
    0 redirects sent
    120 packets sent from this host
    0 packets sent with fabricated ip header
    0 output packets dropped due to no bufs, etc.
    0 output packets discarded due to no route
    0 output datagrams fragmented
    0 fragments created
    0 datagrams that can't be fragmented
    0 tunneling packets that can't find gif
    0 datagrams with bad address in header

=> Of the 100 000 IPsec packets received, ALL are correctly decrypted, but once decrypted their contents are corrupted.
It is possible that the IP packets were corrupted before encryption or after decryption (encapsulation/decapsulation). Can you try a similar test, but using IPsec transport mode? You need to create a gif or gre tunnel between srv1 and srv2. Packets from pkt-generator should be routed through this tunnel. You can test such a configuration without encryption first, then add an SP and SA for this traffic and see how it works :)
If I unload the aesni module on the "encrypter" side, the problem disappears. So how can the packets be getting corrupted after decryption?

New test without the aesni module loaded on the "encrypter" side (srv1), but still loaded on the "decrypter" side (srv2):

Encrypter:

[root@srv1]~# kldstat
Id Refs Address            Size     Name
 1    8 0xffffffff80200000 17dc0f0  kernel
 2    1 0xffffffff81c11000 2dd6     ichsmb.ko
 3    1 0xffffffff81c14000 e7e      smbus.ko
 4    1 0xffffffff81c15000 2a16     coretemp.ko

Decrypter:

[root@srv2]~# kldstat
Id Refs Address            Size     Name
 1   11 0xffffffff80200000 17dc0f0  kernel
 2    1 0xffffffff81c11000 7fe8     aesni.ko
 3    1 0xffffffff81c19000 2dd6     ichsmb.ko
 4    1 0xffffffff81c1c000 e7e      smbus.ko
 5    1 0xffffffff81c1d000 2a16     coretemp.ko

Then, again, I generate exactly 100 000 packets at a low rate of 1000 packets per second using netmap's pktgen, crossing these two FreeBSD IPsec gateways.

Stats on the "decrypter" side (srv2):

[root@srv2]~# sysctl dev.igb.2.mac_stats.rx_frames_512_1023
dev.igb.2.mac_stats.rx_frames_512_1023: 100000
[root@srv2]~# sysctl dev.igb.3.mac_stats.tx_frames_512_1023
dev.igb.3.mac_stats.tx_frames_512_1023: 100000

=> All packets are correctly decrypted AND forwarded.

No more "bad IP packet" errors on the decrypter side:

[root@srv2]~# netstat -ssp ip
ip:
    200064 total packets received
    100064 packets for this host
    100000 packets forwarded
    69 packets sent from this host

So, should I still run a new test in transport mode?
I've found a workaround for this problem (tested multiple times): if I remove the automatic load of the aesni module (whether it is in /boot/loader.conf or in kld_list of rc.conf) and load it manually instead, the payload corruption of some IPsec packets disappears!
With aesni compiled into the kernel, I still get packet corruption.

In conclusion: with an early-initialized aesni module, we get some IPsec payload corruption.

A tcpdump capture (500MB) of the encrypted traffic is available online: http://dev.bsdrp.net/aesni-corrupt-ipsec.pcap
It's rijndael-cbc with the key "1234567890123456", for decoding it in Wireshark.
Commit r285289 fixes this problem.