Bug 200472 - aesni module corrupts IP packets during encryption with IPSec
Summary: aesni module corrupts IP packets during encryption with IPSec
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-26 19:40 UTC by olivier
Modified: 2015-07-09 21:31 UTC
CC List: 4 users

See Also:


Description olivier 2015-05-26 19:40:00 UTC
There are lots of corrupted IP packets when the aesni module is loaded.

I've set up a very simple lab with two FreeBSD servers acting as IPsec gateways.

pkt-generator ====== fbsd srv1 ====== fbsd srv2 ==== pkt-receiver

With the aesni module loaded and this very simple static IPsec configuration (setkey.conf):

flush;
spdflush;
spdadd 1.0.0.0/8 3.0.0.0/8 any -P in ipsec esp/tunnel/2.2.2.2-2.2.2.3/require;
spdadd 3.0.0.0/8 1.0.0.0/8 any -P out ipsec esp/tunnel/2.2.2.3-2.2.2.2/require;
add 2.2.2.2 2.2.2.3 esp 0x1000 -E rijndael-cbc "1234567890123456";
add 2.2.2.3 2.2.2.2 esp 0x1001 -E rijndael-cbc "1234567890123456";

Then I generate exactly 100 000 packets at a low rate of 1000 packets per second with netmap's pktgen, crossing these two FreeBSD IPsec gateways.

=> Only about 80-95% of these 100 000 packets arrive at the pkt-receiver.

Troubleshooting shows that the "receiving" IPsec gateway correctly receives all 100 000 encrypted packets and correctly decrypts them… but once decrypted, these packets are no longer valid IP packets: the IP section of "netstat -s" on fbsd srv2 shows lots of invalid IP packets, exactly matching the number of missing packets. These bad packets are never forwarded to the pkt-receiver.

Here are example statistics from fbsd srv2:

[root@srv2]~# sysctl dev.igb.2.mac_stats.rx_frames_512_1023
dev.igb.2.mac_stats.rx_frames_512_1023: 100000
[root@srv2]~# sysctl dev.igb.3.mac_stats.tx_frames_512_1023
dev.igb.3.mac_stats.tx_frames_512_1023: 99128

=> Here, 100K encrypted packets are received, but only 99128 are forwarded: 872 packets are missing.

netstat -s output on srv2:
ip:
        200131 total packets received
        38 with data size < data length
        15 with header length < data size
        1 with bad options
        818 with incorrect version number
        99128 packets forwarded

=> 38 + 15 + 1 + 818 = 872 bad IP packets: all of our missing packets are accounted for.
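The reconciliation above can be written as a small check. This is a minimal sketch: the counter values are hard-coded from the srv2 output in this report rather than read from a live system, and the dictionary keys are just illustrative labels for the netstat lines, not real sysctl or netstat identifiers.

```python
# Counters copied from fbsd srv2 (sysctl and "netstat -s" output above).
rx_encrypted = 100000   # dev.igb.2.mac_stats.rx_frames_512_1023
tx_forwarded = 99128    # dev.igb.3.mac_stats.tx_frames_512_1023

# Illustrative labels for the non-zero IP input error lines.
ip_errors = {
    "data size < data length": 38,
    "header length < data size": 15,
    "bad options": 1,
    "incorrect version number": 818,
}


def reconcile(rx: int, tx: int, errors: dict[str, int]) -> tuple[int, int]:
    """Return (packets missing between NICs, total IP input errors)."""
    return rx - tx, sum(errors.values())


missing, bad = reconcile(rx_encrypted, tx_forwarded, ip_errors)
print(f"missing={missing} bad_ip={bad} match={missing == bad}")
```

If the two numbers match, every packet lost between the receiving and the forwarding interface was dropped by IP input validation rather than somewhere else.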

To fix this problem, we just had to NOT load the aesni module on srv1.

Bug reproduced with these releases:
- FreeBSD 10.1-RELEASE-p10 #0 r282880M (about 1-5% of packets corrupted)
- FreeBSD 11.0-CURRENT #2 r283536M (about 10-20% of packets corrupted)

Bug reproduced with these CPUs:
- Intel Atom CPU C2558
- Intel Xeon CPU L5630

More information on the IPsec lab here:
http://bsdrp.net/documentation/examples/ipsec_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-05-27 10:32:05 UTC
Can you kldload hwpmc and look at which functions are in use while you run the tests? (run pmcstat -TS instructions -w1)
You should see aesni_process() or swcr_process() at the top of the list.
If you see swcr_process(), aesni isn't being used.
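A rough way to automate that check is to scan the pmcstat top-mode text for the two function names. This sketch assumes the table layout shown in Comment 3 (%SAMP / IMAGE / FUNCTION / CALLERS) and counts a name if it appears in either the FUNCTION or the CALLERS column; the helper name crypto_backend is hypothetical. The sample line is copied from Comment 3, where aesni_process shows up as the caller of aesni_encrypt_cbc rather than in the FUNCTION column.

```python
import re


def crypto_backend(pmcstat_text: str) -> str:
    """Guess which crypto driver is doing the IPsec work.

    Searches the FUNCTION and CALLERS columns (layout assumed from this
    report) for aesni_process / swcr_process as whole identifiers.
    Returns "aesni", "software", "both" or "unknown".
    """
    names = set(re.findall(r"\b(aesni_process|swcr_process)\b", pmcstat_text))
    if names == {"aesni_process", "swcr_process"}:
        return "both"
    if "aesni_process" in names:
        return "aesni"
    if "swcr_process" in names:
        return "software"
    return "unknown"


# One line of the pmcstat output from Comment 3.
sample = "  7.4 aesni.ko   aesni_encrypt_cbc    aesni_process"
print(crypto_backend(sample))
```

Matching whole identifiers with \b keeps names like aesni_process_foo from counting.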
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-05-27 10:35:41 UTC
Also, are there any errors in the IPsec stats?
# netstat -sp ipsec
# netstat -sp esp
Comment 3 olivier 2015-05-27 13:08:24 UTC
OK, new test under FreeBSD 11.0-CURRENT #3 r283536 (still generating 100 000 packets at 1000 pps).

Here are the first lines of the pmcstat output during the load (captured on the "encrypter" IPsec gateway side):

PMC: [INSTR_RETIRED_ANY] Samples: 544 (100.0%) , 0 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
  7.4 aesni.ko   aesni_encrypt_cbc    aesni_process
  4.2 kernel     cpu_search_highest   sched_idletd:2.6 cpu_search_highest:1.7
  2.8 kernel     spinlock_exit        intr_event_schedule_thread:1.1 handleevents:0.6
  2.4 kernel     uma_zalloc_arg       crypto_getreq:1.3 malloc:0.9
  2.4 libc.so.7  bsearch              0x63b4
  2.4 kernel     cpu_search_lowest    cpu_search_lowest:1.3 sched_pickcpu:1.1
  2.0 kernel     critical_exit        spinlock_exit:1.1 sched_idletd:0.6
  2.0 kernel     __rw_rlock           in_lltable_lookup:0.6 ip_input:0.6
  1.8 kernel     _rw_runlock_cookie   rtalloc1_fib
  1.8 kernel     igb_rxeof            igb_msix_que
  1.8 kernel     ip_output            ipsec_process_done
  1.7 kernel     spinlock_enter       thread_lock_flags_
  1.5 kernel     sched_switch         mi_switch
  1.3 kernel     key_allocsp          ipsec_getpolicybyaddr
  1.3 kernel     sched_pickcpu        sched_add
  1.1 kernel     rn_match             rtalloc1_fib
  1.1 kernel     bzero
  1.1 kernel     cpu_switch           mi_switch
  1.1 kernel     bounce_bus_dmamap_lo bus_dmamap_load_mbuf_sg
  1.1 pmcstat    0x63d3               bsearch


Now, the netstat output on the "decrypter" IPsec gateway side:

[root@R3]~# netstat -sp ipsec
ipsec:
        0 inbound packets violated process security policy
        0 inbound packets failed due to insufficient memory
        0 invalid inbound packets
        0 outbound packets violated process security policy
        0 outbound packets with no SA available
        0 outbound packets failed due to insufficient memory
        0 outbound packets with no route available
        0 invalid outbound packets
        0 outbound packets with bundled SAs
        0 mbufs coalesced during clone
        0 clusters coalesced during clone
        0 clusters copied during clone
        0 mbufs inserted during makespace
[root@R3]~# netstat -sp esp
esp:
        0 packets shorter than header shows
        0 packets dropped; protocol family not supported
        0 packets dropped; no TDB
        0 packets dropped; bad KCR
        0 packets dropped; queue full
        0 packets dropped; no transform
        0 packets dropped; bad ilen
        0 replay counter wraps
        0 packets dropped; bad encryption detected
        0 packets dropped; bad authentication detected
        0 possible replay packets detected
        100000 packets in
        0 packets out
        0 packets dropped; invalid TDB
        54400000 bytes in
        0 bytes out
        0 packets dropped; larger than IP_MAXPACKET
        0 packets blocked due to policy
        0 crypto processing failures
        0 tunnel sanity check failures
        ESP output histogram:
                rijndael-cbc: 100000

=> No IPsec/ESP problem: the IPsec packets are correctly generated.
But once decrypted, there are lots of errors (data too short, bad header, incorrect version number, etc…):

[root@R3]~# netstat -sp ip
ip:
        200145 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        40 with data size < data length
        0 with ip length > max ip packet size
        19 with header length < data size
        0 with data length < header length
        1 with bad options
        818 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        100145 packets for this host
        0 packets for unknown/unsupported protocol
        99122 packets forwarded (0 packets fast forwarded)
        0 packets not forwardable
        0 packets received for unknown multicast group
        0 redirects sent
        120 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 tunneling packets that can't find gif
        0 datagrams with bad address in header

=> Of the 100 000 IPsec packets received, ALL are correctly decrypted, but once decrypted, their contents are corrupted.
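The IP counters on the decrypter can also be balanced against each other. This sketch rests on an assumption not stated in the report: each ESP packet is counted once by IP input on arrival (and delivered locally, so it appears in "packets for this host"), and its decrypted inner packet is counted again when it is re-injected into IP input. Under that assumption, "total received" minus "for this host" is the number of decrypted inner packets.

```python
# Counters copied from "netstat -sp ip" on the decrypter (Comment 3).
total_received = 200145
for_this_host = 100145   # assumed: outer ESP packets plus other local traffic
forwarded = 99122
bad_ip = {
    "data size < data length": 40,
    "header length < data size": 19,
    "bad options": 1,
    "incorrect version number": 818,
}

# Assumption: packets not delivered locally are the decrypted inner packets.
inner = total_received - for_this_host
dropped = sum(bad_ip.values())

print(f"inner={inner} forwarded+dropped={forwarded + dropped} "
      f"loss={dropped / inner:.3%}")
```

With these numbers every decrypted packet is either forwarded or rejected by one of the IP sanity checks, which is consistent with corruption of the inner payload rather than an ESP-level failure.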
Comment 4 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-05-28 11:21:03 UTC
It is possible that the IP packets were corrupted before encryption or after decryption (encapsulation/decapsulation). Can you try a similar test, but using IPsec transport mode?
You need to create a gif or gre tunnel between srv1 and srv2 and route packets from pkt-generator through this tunnel. You can test that configuration without encryption first, then add an SP and SA for this traffic and see how it works :)
Comment 5 olivier 2015-05-28 12:27:50 UTC
If I unload the aesni module on the "encrypter" side, the problem disappears: so how could the packets be corrupted after decryption?

New test without the aesni module loaded on the "encrypter" side (srv1), but still loaded on the "decrypter" side (srv2):

Encrypter:

[root@srv1]~# kldstat
Id Refs Address            Size     Name
 1    8 0xffffffff80200000 17dc0f0  kernel
 2    1 0xffffffff81c11000 2dd6     ichsmb.ko
 3    1 0xffffffff81c14000 e7e      smbus.ko
 4    1 0xffffffff81c15000 2a16     coretemp.ko

Decrypter:

[root@srv2]~# kldstat
Id Refs Address            Size     Name
 1   11 0xffffffff80200000 17dc0f0  kernel
 2    1 0xffffffff81c11000 7fe8     aesni.ko
 3    1 0xffffffff81c19000 2dd6     ichsmb.ko
 4    1 0xffffffff81c1c000 e7e      smbus.ko
 5    1 0xffffffff81c1d000 2a16     coretemp.ko

Then, again, I generate exactly 100 000 packets at a low rate of 1000 packets per second with netmap's pktgen, crossing these two FreeBSD IPsec gateways.

Statistics on the "decrypter" side (srv2):
[root@srv2]~# sysctl dev.igb.2.mac_stats.rx_frames_512_1023
dev.igb.2.mac_stats.rx_frames_512_1023: 100000
[root@srv2]~# sysctl dev.igb.3.mac_stats.tx_frames_512_1023
dev.igb.3.mac_stats.tx_frames_512_1023: 100000

=> All packets are correctly decrypted AND forwarded

No more "bad IP packet" errors on the decrypter side:
[root@srv2]~# netstat -ssp ip
ip:
        200064 total packets received
        100064 packets for this host
        100000 packets forwarded
        69 packets sent from this host

So, should I still run a new test in transport mode?
Comment 6 olivier 2015-06-03 20:26:54 UTC
I've found a workaround for this problem (tested multiple times):

Remove the automatic loading of the aesni module (either from /boot/loader.conf or from kld_list in rc.conf) and load it manually instead: the payload corruption of some IPsec packets disappears!
Comment 7 olivier 2015-06-10 23:21:56 UTC
With aesni compiled into the kernel, I still get packet corruption.
In conclusion: when the aesni module is initialized early, some IPsec payloads get corrupted.
A tcpdump capture (500 MB) of the encrypted traffic is available online:
http://dev.bsdrp.net/aesni-corrupt-ipsec.pcap

To decode it in Wireshark, use rijndael-cbc with the key "1234567890123456".
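The key in setkey.conf is a 16-character ASCII string, i.e. 16 bytes, which is an AES-128 key for rijndael-cbc. Some decoders take the key as a 0x-prefixed hex string (Wireshark's ESP SA settings accept that form, as far as I know). The sketch below only converts and length-checks the key; the helper name esp_key_hex is hypothetical.

```python
def esp_key_hex(key: str) -> str:
    """Convert an ASCII setkey key to a 0x-prefixed hex string.

    Raises ValueError unless the key is a valid AES key size
    (16, 24 or 32 bytes, i.e. AES-128/192/256).
    """
    raw = key.encode("ascii")
    if len(raw) not in (16, 24, 32):
        raise ValueError(f"{len(raw)}-byte key is not a valid AES key size")
    return "0x" + raw.hex()


print(esp_key_hex("1234567890123456"))
# 0x31323334353637383930313233343536
```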
Comment 8 olivier 2015-07-09 21:31:01 UTC
Commit r285289 fixes this problem.