I'm using mpd5 to run an L2TP server on FreeBSD. This crash currently happens at least once every 24 hours. I first noticed it on the client side, in pfSense (https://redmine.pfsense.org/issues/9058).

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x1c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b79f76
stack pointer           = 0x28:0xfffffe0094a7ca60
frame pointer           = 0x28:0xfffffe0094a7caa0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 369 (ng_queue1)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b3d517 at kdb_backtrace+0x67
#1 0xffffffff80af6ab7 at vpanic+0x177
#2 0xffffffff80af6933 at panic+0x43
#3 0xffffffff80f7827f at trap_fatal+0x35f
#4 0xffffffff80f782d9 at trap_pfault+0x49
#5 0xffffffff80f77aa7 at trap+0x2c7
#6 0xffffffff80f5803c at calltrap+0x8
#7 0xffffffff82835bf4 at ng_l2tp_seq_rack_timeout+0x164
#8 0xffffffff8282868d at ng_apply_item+0xcd
#9 0xffffffff8282b084 at ngthread+0x1a4
#10 0xffffffff80aba033 at fork_exit+0x83
#11 0xffffffff80f58f6e at fork_trampoline+0xe
Uptime: 12h56m8s
Dumping 1124 out of 1999 MB:..2%..12%..22%..32%..42%..52%..62%..72%..82%..92%
Dump complete
Seems likely this was fixed by https://svnweb.freebsd.org/changeset/base/353027
A commit references this bug:

Author: markj
Date: Fri Sep 25 18:55:50 UTC 2020
New revision: 366167
URL: https://svnweb.freebsd.org/changeset/base/366167

Log:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  A received control packet may cause the transmit queue to be flushed,
  in which case ng_l2tp_seq_recv_nr() cancels the transmit timeout
  handler. The handler checks to see if it was cancelled before doing
  anything, but did so before acquiring the node lock, so a small race
  window could cause ng_l2tp_seq_rack_timeout() to attempt to flush an
  empty queue, ultimately causing a null pointer dereference.

  PR:                     241133
  Reviewed by:            bz, glebius, Lutz Donnerhacke
  MFC after:              3 days
  Sponsored by:           Rubicon Communications, LLC (Netgate)
  Differential Revision:  https://reviews.freebsd.org/D26548

Changes:
  head/sys/netgraph/ng_l2tp.c
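The race described in the log above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the ng_l2tp.c code: `struct seq_model`, `seq_xmit()`, `seq_recv_ack_all()` and `seq_rack_timeout()` are hypothetical names, a pthread mutex stands in for the netgraph node lock, and "armed" is a plain flag rather than the callout state the real handler inspects. What it shows is the ordering: the handler decides whether it is still armed only after taking the same lock the flush path holds, so "armed implies a non-empty queue" is guaranteed whenever the handler reads the head of the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define XWIN_MAX 4

/* Hypothetical, much-simplified model of the L2TP sequence state: only
 * the fields needed to illustrate the race fixed by r366167. */
struct seq_model {
	pthread_mutex_t mtx;        /* stands in for the netgraph node lock */
	const char *xwin[XWIN_MAX]; /* unacknowledged control packets */
	int nxwin;                  /* number of entries in xwin */
	bool rack_armed;            /* retransmit timer armed? */
	const char *last_rexmit;    /* last packet "retransmitted" */
};

/* Queue a control packet and arm the retransmit timer. */
static void seq_xmit(struct seq_model *s, const char *pkt)
{
	pthread_mutex_lock(&s->mtx);
	if (s->nxwin < XWIN_MAX) {
		s->xwin[s->nxwin++] = pkt;
		s->rack_armed = true;
	}
	pthread_mutex_unlock(&s->mtx);
}

/* Peer acknowledged everything: flush the queue and disarm the timer
 * (the model's analogue of ng_l2tp_seq_recv_nr() emptying the window). */
static void seq_recv_ack_all(struct seq_model *s)
{
	pthread_mutex_lock(&s->mtx);
	for (int i = 0; i < s->nxwin; i++)
		s->xwin[i] = NULL;
	s->nxwin = 0;
	s->rack_armed = false;
	pthread_mutex_unlock(&s->mtx);
}

/* Retransmit timeout.  In the buggy ordering the "still armed?" test ran
 * before the lock was taken, so seq_recv_ack_all() could empty the queue
 * in between.  Testing under the lock closes that window. */
static bool seq_rack_timeout(struct seq_model *s)
{
	bool sent = false;

	pthread_mutex_lock(&s->mtx);
	if (s->rack_armed) {
		/* Invariant maintained under the lock: armed => queue non-empty. */
		assert(s->nxwin > 0);
		s->last_rexmit = s->xwin[0];
		sent = true;
	}
	pthread_mutex_unlock(&s->mtx);
	return sent;
}
```

Calling seq_rack_timeout() after seq_recv_ack_all() returns false instead of reading an empty slot. Moving the rack_armed test in front of pthread_mutex_lock() would reintroduce the window in which a flush can run between the test and the read.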
It's great to see progress on this. Can someone point me to a how-to or tutorial that explains what I need to do to apply this fix to my environment? I've hit this bug multiple times during the last month and want to verify that it fixes the crash. I'm currently running FreeBSD 12.1-RELEASE-p8 GENERIC.
(In reply to zivillian from comment #3) You would have to either build a new copy of ng_l2tp.ko with the patches applied, or update to 12.2, which is still being finalized. The most recent fix is not yet in the 12 branch, but I expect to merge it in a few days, so it will appear in 12.2 (due to be released in about a month). If you want to try building ng_l2tp.ko from 12.1 sources, the steps are:

    $ svn checkout https://svn.freebsd.org/base/releng/12.1 /usr/src
    $ cd /usr/src
    $ svn merge -c 353027 ^/head .
    $ svn merge -c 366167 ^/head .
    $ cd sys/modules/netgraph/l2tp
    $ make
    $ sudo make install

This will install ng_l2tp.ko into /boot/modules. By default the system will use the copy in /boot/kernel, so either copy the new one over or ensure that ng_l2tp.ko gets loaded from /boot/modules (e.g., by kldload'ing it manually).
A commit references this bug:

Author: markj
Date: Mon Sep 28 11:46:04 UTC 2020
New revision: 366220
URL: https://svnweb.freebsd.org/changeset/base/366220

Log:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:  241133

Changes:
_U  stable/12/
    stable/12/sys/netgraph/ng_l2tp.c

A commit references this bug:

Author: markj
Date: Mon Sep 28 11:48:20 UTC 2020
New revision: 366221
URL: https://svnweb.freebsd.org/changeset/base/366221

Log:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:  241133

Changes:
_U  stable/11/
    stable/11/sys/netgraph/ng_l2tp.c
I will mark the bug as resolved for now. Please re-open if you are still able to trigger ng_l2tp panics with r353027 and r366167 applied.
A commit references this bug:

Author: markj
Date: Mon Sep 28 12:14:38 UTC 2020
New revision: 366223
URL: https://svnweb.freebsd.org/changeset/base/366223

Log:
  MFS r366220:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:           241133
  Approved by:  re (gjb)

Changes:
_U  releng/12.2/
    releng/12.2/sys/netgraph/ng_l2tp.c
Related patch: https://reviews.freebsd.org/D26586
After applying r353027 and r366167, it crashed again after 5 days.
(In reply to zivillian from comment #10) Thanks for testing. Could you try applying this patch on top of the others? https://people.freebsd.org/~markj/patches/ng_l2tp_debug.diff
I've applied the patch, and it crashed again after 6 days. Please let me know if you need a dump or any logs.
(In reply to zivillian from comment #12) Argh. Thanks for your patience. Would you be willing to provide the vmcore and debug symbols?
My vmcore is 30 MB (gzipped), so I can't upload it here. Do you have a preferred file host? Also, I don't know where to find the debug symbols (I've built the module with the commands from Comment 4, but I don't see any obvious debug files).
(In reply to zivillian from comment #14) I have no preference. Feel free to mail me a link privately if you prefer. Debug symbols should be under /usr/lib/debug/boot/*. A tarball of that is required.
Created attachment 218744: patched l2tp module

/usr/lib/debug/boot contains only an empty kernel folder. Have I missed something? I've attached the module; `file` reports that it is not stripped, but I'm unsure whether that is enough. The vmcore can be downloaded from https://we.tl/t-wPU1j9UOwm
It looks like pfSense has added a check to its latest code: https://github.com/pfsense/FreeBSD-src/blob/RELENG_2_5_2/sys/netgraph/ng_l2tp.c
I have also seen that and got a patch in progress: https://reviews.freebsd.org/D31476
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=89042ff77668555e77c88549e6ba697088ee72f9

commit 89042ff77668555e77c88549e6ba697088ee72f9
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2021-08-06 23:04:31 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2021-09-10 18:27:19 +0000

    ng_l2tp: improve callout locking.

    Apparently e62e4b85942 wasn't enough to close the race between
    a queue being flushed by a packet and callout executing, because
    the callouts used without a lock aren't 100% bulletproof.

    To close the race use callout_init_mtx() for L2TP timers, and make
    sure that all calls to ng_callout()/ng_uncallout() are done under
    the seq lock. If used properly, a locked callout can be used
    transparently with old netgraph KPI of ng_callout/ng_uncallout
    which predates locked callouts.

    While here, utilize ng_uncallout_drain() instead of ng_uncallout()
    on the node shutdown.

    PR:                     241133
    Reviewed by:            mjg, markj
    Differential Revision:  https://reviews.freebsd.org/D31476

 sys/netgraph/ng_l2tp.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)
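The idea behind switching the L2TP timers to callout_init_mtx() can be illustrated with a small userspace model. It is a hedged sketch, not the callout(9) implementation: `struct locked_timer`, `lt_init_mtx()`, `lt_reset()`, `lt_stop()`, `lt_expire()` and `count_run()` are hypothetical names, and `lt_expire()` stands in for the callout thread firing. What it shows is where the check moves: the timer framework, not each handler, takes the associated mutex before deciding whether to run, so a stop performed while holding that mutex cannot be overtaken by a handler that already passed its "still armed?" test. This is why the commit also requires every ng_callout()/ng_uncallout() call to be made under the seq lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a "locked" timer in the spirit of
 * callout_init_mtx(): the timer remembers the caller's mutex, and the
 * expiry path takes that mutex before deciding whether to run. */
struct locked_timer {
	pthread_mutex_t *mtx;   /* lock shared with the protected state */
	bool pending;           /* armed and not yet fired or stopped */
	void (*fn)(void *);     /* handler, always invoked with *mtx held */
	void *arg;
};

static void lt_init_mtx(struct locked_timer *t, pthread_mutex_t *mtx)
{
	t->mtx = mtx;
	t->pending = false;
	t->fn = NULL;
	t->arg = NULL;
}

/* Arm the timer.  Caller must hold *t->mtx. */
static void lt_reset(struct locked_timer *t, void (*fn)(void *), void *arg)
{
	t->fn = fn;
	t->arg = arg;
	t->pending = true;
}

/* Disarm the timer.  Caller must hold *t->mtx.  Because lt_expire()
 * re-checks `pending` under the same mutex, the handler cannot run
 * after this returns. */
static bool lt_stop(struct locked_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* What the timer thread does on expiry: lock first, then check. */
static void lt_expire(struct locked_timer *t)
{
	pthread_mutex_lock(t->mtx);
	if (t->pending) {
		t->pending = false;
		assert(t->fn != NULL);
		t->fn(t->arg);      /* runs with *t->mtx held */
	}
	pthread_mutex_unlock(t->mtx);
}

/* Example handler: counts how many times it ran. */
static void count_run(void *arg)
{
	int *runs = arg;

	(*runs)++;
}
```

Draining on shutdown (ng_uncallout_drain() in the commit) is not modelled here; it additionally waits for a handler that is already running to return before the node goes away.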
*** Bug 259894 has been marked as a duplicate of this bug. ***
Created attachment 230123: the fix for stable/12

For users of FreeBSD 12.x, here is a backport of the fixes by glebius@ to stable/12, reviewed by him and lightly tested by me. The fixes can be merged to stable/13 automatically but require a manual merge to stable/12, so I am attaching the patch. The ng_l2tp.ko kernel module must be rebuilt and reinstalled by those who use the module; rebuild the kernel instead if you include options NETGRAPH_L2TP in a custom kernel.
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=42301a9db1a4e877cb8de8e2d30d62cff09d60f9

commit 42301a9db1a4e877cb8de8e2d30d62cff09d60f9
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2021-08-06 22:49:51 +0000
Commit:     Eugene Grosbein <eugen@FreeBSD.org>
CommitDate: 2021-12-19 18:21:38 +0000

    ng_l2tp: improve seq structure locking.

    PR:                     241133
    Reviewed by:            mjg, markj
    Differential Revision:  https://reviews.freebsd.org/D31476
    Author:                 glebius

    (cherry picked from commit 0a76c63dd4987d8f7af37fe93569ce8a020cf43e)
    (cherry picked from commit 89042ff77668555e77c88549e6ba697088ee72f9)
    (cherry picked from commit ae04d30451056f16096cba7d8debcb15dac275d7)

 sys/netgraph/ng_l2tp.c | 221 +++++++++++++++++++++++--------------------------
 1 file changed, 102 insertions(+), 119 deletions(-)

A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8b167e14837950ab2a2a5aae834932c22752c147

commit 8b167e14837950ab2a2a5aae834932c22752c147
Author:     Eugene Grosbein <eugen@FreeBSD.org>
AuthorDate: 2021-12-19 18:28:37 +0000
Commit:     Eugene Grosbein <eugen@FreeBSD.org>
CommitDate: 2021-12-19 18:28:37 +0000

    ng_l2tp: improve seq structure locking.

    This is direct commit to stable/12 due to different code base.

    PR:                     241133
    Reviewed by:            glebius (author)
    Differential Revision:  https://reviews.freebsd.org/D31476

 sys/netgraph/ng_l2tp.c | 219 ++++++++++++++++++++++--------------------------
 1 file changed, 96 insertions(+), 123 deletions(-)
Merged down to stable/12 and tested on a 12.3-STABLE server that previously suffered from this problem. It is now stable.