Bug 242606 - Low capacity of Variable "IPSEC_MANUAL_REQID_MAX" crashes StrongSwan IPSec/IKEV2 VPN Server
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL: https://wiki.strongswan.org/issues/2315
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-12 16:09 UTC by Geovane
Modified: 2020-10-13 08:15 UTC
CC List: 3 users

See Also:


Description Geovane 2019-12-12 16:09:30 UTC
Hi,

We have an IPsec/IKEv2 server running on pfSense 2.4.4-RELEASE-p3 (amd64).
The VPN server serves an average of 40 concurrent mobile clients.
Each phase 1 tunnel created has three phase 2 tunnels.
When the "reqid" counter reaches 16384, the "trap not found" error shown in the logs below occurs: users can still connect, but cannot pass traffic over the VPN.
In my environment this value is reached approximately every 30 days.
To work around the issue, I have to stop the VPN service and start it again so that the counter is reset.

Log samples:

Aug 18 20:12:10 vpn2 charon: 02[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
Aug 18 20:12:10 vpn2 charon: 13[CFG] trap not found, unable to acquire reqid 16384

Dec 11 11:34:34 vpn2 charon: 14[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
Dec 11 11:34:34 vpn2 charon: 01[CFG] trap not found, unable to acquire reqid 16384

strongSwan developer response:

That's because of IPSEC_MANUAL_REQID_MAX (0x3fff == 16383) in "include/uapi/linux/ipsec.h", which is a strangely low limit (at least for keying daemons like strongSwan that manage reqids themselves), since reqids are 32-bit numbers.

reqids are currently allocated sequentially using a static counter (source:src/libcharon/kernel/kernel_interface.c#L328). The code that allocates them does not know anything about the limit above (it doesn't even know or care that it runs on a FreeBSD kernel).

My report:
https://forum.netgate.com/topic/148857/ipsec-ikev2-error-trap-not-found-unable-to-acquire-reqid

Other reports:

https://wiki.strongswan.org/issues/2315
https://lists.strongswan.org/pipermail/dev/2018-August/001929.html
Comment 1 crest 2019-12-12 16:23:13 UTC
(In reply to Geovane from comment #0)

FreeBSD already contains a suitable allocator in "sys/kern/subr_unit.c".
Comment 2 Eugene Grosbein freebsd_committer freebsd_triage 2019-12-13 15:31:23 UTC
Andrey, can you comment on this?
Comment 3 Geovane 2019-12-13 15:40:05 UTC
(In reply to crest from comment #1)

Hi, 

The strongSwan team answered: "Which doesn't seem related to the issue. Probably someone replied to the wrong email thread."

Thanks
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2019-12-14 05:48:33 UTC
IPSEC_MANUAL_REQID_MAX is not FreeBSD-specific; it is also 0x3fff on Linux.
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2019-12-14 06:00:06 UTC
Anyway, the comment in the header is clear enough: REQIDs over 0x3fff are reserved for the kernel.  Linux uses this range for the kernel as well (see net/key/af_key.c#L1915, gen_reqid()). They simply ignore bogus user requests for higher numbers:

https://github.com/torvalds/linux/blob/master/net/key/af_key.c#L1959

		if (t->reqid > IPSEC_MANUAL_REQID_MAX)
			t->reqid = 0;
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2019-12-14 06:08:14 UTC
In fact, FreeBSD does something similar, but produces a warning first (ipseclog LOG_DEBUG, "reqid=%d range violation, updated by kernel").  That code is present since 2002.  I can't tell if libcharon is broken on Linux and merely doesn't observe it there, or if it's just poorly designed.  I don't know if pfsense has any modifications to FreeBSD in this area that might be relevant.  Can you reproduce the problem on FreeBSD, or just pfsense?
Comment 7 Geovane 2019-12-16 13:34:13 UTC
(In reply to Conrad Meyer from comment #6)

Hi Conrad,

Unfortunately, in our environment we have only one PFSense VPN server with enough demand to reach the 16k limit of the "reqid" variable.

It seems the strongSwan team started working on a reqid reuse solution after my report:

https://wiki.strongswan.org/issues/2315

Thank you.


Geovane
Comment 8 Andrey V. Elsukov freebsd_committer freebsd_triage 2020-10-13 07:30:20 UTC
It seems the problem has been fixed since strongSwan 5.8.3.
Comment 9 Eugene Grosbein freebsd_committer freebsd_triage 2020-10-13 08:15:49 UTC
Fixed in strongSwan, see https://wiki.strongswan.org/issues/2315