Bug 241133 - page fault in ng_l2tp_seq_rack_timeout
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Mark Johnston
Keywords: crash
Reported: 2019-10-08 15:53 UTC by zivillian
Modified: 2020-10-14 15:54 UTC (History)

Attachments
patched l2tp module (31.46 KB, application/x-object)
2020-10-14 15:54 UTC, zivillian
Description zivillian 2019-10-08 15:53:46 UTC
I'm using mpd5 to run an L2TP server on FreeBSD. The crash currently occurs at least once every 24 hours.

I first encountered this on the client side, in pfSense (https://redmine.pfsense.org/issues/9058).

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address      = 0x1c
fault code         = supervisor read data, page not present
instruction pointer        = 0x20:0xffffffff80b79f76
stack pointer              = 0x28:0xfffffe0094a7ca60
frame pointer              = 0x28:0xfffffe0094a7caa0
code segment               = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process            = 369 (ng_queue1)
trap number                = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80b3d517 at kdb_backtrace+0x67
#1 0xffffffff80af6ab7 at vpanic+0x177
#2 0xffffffff80af6933 at panic+0x43
#3 0xffffffff80f7827f at trap_fatal+0x35f
#4 0xffffffff80f782d9 at trap_pfault+0x49
#5 0xffffffff80f77aa7 at trap+0x2c7
#6 0xffffffff80f5803c at calltrap+0x8
#7 0xffffffff82835bf4 at ng_l2tp_seq_rack_timeout+0x164
#8 0xffffffff8282868d at ng_apply_item+0xcd
#9 0xffffffff8282b084 at ngthread+0x1a4
#10 0xffffffff80aba033 at fork_exit+0x83
#11 0xffffffff80f58f6e at fork_trampoline+0xe
Uptime: 12h56m8s
Dumping 1124 out of 1999 MB:..2%..12%..22%..32%..42%..52%..62%..72%..82%..92%
Dump complete
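
A fault virtual address as small as 0x1c usually means that a structure member was read through a NULL pointer: the faulting address is then simply the member's offset within the structure. The trace above does not say which structure was involved, so the following is only a minimal sketch with a hypothetical `struct pkt` whose layout was chosen so that one field lands at offset 0x1c on an LP64 platform such as amd64. The names `struct pkt` and `flags_address` are assumptions made for illustration and do not come from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical packet header (an assumption, not a kernel structure).
 * With 8-byte pointers: 3 * 8 + 4 = 28 = 0x1c, so `flags` sits at 0x1c.
 */
struct pkt {
    struct pkt *next;      /* offset 0x00 */
    struct pkt *nextpkt;   /* offset 0x08 */
    char       *data;      /* offset 0x10 */
    int32_t     len;       /* offset 0x18 */
    uint32_t    flags;     /* offset 0x1c */
};

/*
 * Compute the address that a read of p->flags would touch,
 * without actually dereferencing p.
 */
uintptr_t flags_address(const struct pkt *p)
{
    /* The offsets above only hold when pointers are 8 bytes wide. */
    assert(sizeof(void *) == 8);
    return (uintptr_t)p + offsetof(struct pkt, flags);
}
```

With this layout, passing a NULL pointer to `flags_address` yields 0x1c, the same kind of low address reported in the trap message.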
Comment 1 Mark Johnston freebsd_committer 2020-09-24 18:09:09 UTC
Seems likely this was fixed by https://svnweb.freebsd.org/changeset/base/353027
Comment 2 commit-hook freebsd_committer 2020-09-25 18:56:08 UTC
A commit references this bug:

Author: markj
Date: Fri Sep 25 18:55:50 UTC 2020
New revision: 366167
URL: https://svnweb.freebsd.org/changeset/base/366167

Log:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  A received control packet may cause the transmit queue to be flushed, in
  which case ng_l2tp_seq_recv_nr() cancels the transmit timeout handler.
  The handler checks to see if it was cancelled before doing anything, but
  did so before acquiring the node lock, so a small race window could
  cause ng_l2tp_seq_rack_timeout() to attempt to flush an empty queue,
  ultimately causing a null pointer dereference.

  PR:		241133
  Reviewed by:	bz, glebius, Lutz Donnerhacke
  MFC after:	3 days
  Sponsored by:	Rubicon Communications, LLC (Netgate)
  Differential Revision:	https://reviews.freebsd.org/D26548

Changes:
  head/sys/netgraph/ng_l2tp.c
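
The race described in the commit log can be illustrated in userspace. This is a minimal sketch, not the code from ng_l2tp.c: `struct seq_state`, `seq_send`, `seq_ack_all` and `seq_timeout` are hypothetical names, a pthread mutex stands in for the node lock, and a boolean flag stands in for the callout state. The point it shows is that the timeout handler tests for cancellation only after taking the lock, so it cannot act on a state that the receive path has already changed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 4

/* Hypothetical stand-in for the per-node sequence state. */
struct seq_state {
    pthread_mutex_t lock;    /* plays the role of the node lock */
    bool            armed;   /* false once the timeout is cancelled */
    size_t          queued;  /* packets awaiting acknowledgement */
    int             queue[QLEN];
};

void seq_init(struct seq_state *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->armed = false;
    s->queued = 0;
}

/* Transmit path: queue a non-negative packet id and arm the timer. */
bool seq_send(struct seq_state *s, int pkt_id)
{
    bool ok = false;

    pthread_mutex_lock(&s->lock);
    if (s->queued < QLEN) {
        s->queue[s->queued++] = pkt_id;
        s->armed = true;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Receive path: everything was acknowledged, so flush and cancel. */
void seq_ack_all(struct seq_state *s)
{
    pthread_mutex_lock(&s->lock);
    s->queued = 0;
    s->armed = false;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Timeout handler: returns the packet id to retransmit, or -1.
 * The cancellation check is made while holding the lock; checking
 * before locking would leave a window in which seq_ack_all() could
 * empty the queue between the check and the queue access.
 */
int seq_timeout(struct seq_state *s)
{
    int pkt_id = -1;

    pthread_mutex_lock(&s->lock);
    if (s->armed && s->queued > 0)
        pkt_id = s->queue[0];
    pthread_mutex_unlock(&s->lock);
    return pkt_id;
}
```

In this sketch, calling `seq_timeout` after `seq_ack_all` returns -1 instead of touching the flushed queue, which is the behavior the commit aims for in the kernel handler.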
Comment 3 zivillian 2020-09-25 19:25:23 UTC
It's great to see progress on this. Can someone point me to a howto or tutorial that explains what I need to do to apply this fix in my environment? I've hit this bug multiple times during the last month and want to verify that it fixes the crash.

I'm currently running FreeBSD 12.1-RELEASE-p8 GENERIC.
Comment 4 Mark Johnston freebsd_committer 2020-09-25 20:18:48 UTC
(In reply to zivillian from comment #3)
You would have to either build a new copy of ng_l2tp.ko with the patches applied, or update to 12.2, which is still being finalized.  The most recent fix is not yet in the 12 branch, but I expect to merge it in a few days, so it will appear in 12.2 (due to be released in about a month).

If you want to try building ng_l2tp.ko from 12.1 sources, the steps are:

$ svn checkout https://svn.freebsd.org/base/releng/12.1 /usr/src
$ cd /usr/src
$ svn merge -c 353027 ^/head .
$ svn merge -c 366167 ^/head .
$ cd sys/modules/l2tp
$ make
$ sudo make install

This will install ng_l2tp.ko into /boot/modules.  By default the system will use the copy in /boot/kernel, so either copy the new one over or ensure that ng_l2tp.ko gets loaded from /boot/modules (e.g., by kldload'ing it manually).
Comment 5 commit-hook freebsd_committer 2020-09-28 11:46:24 UTC
A commit references this bug:

Author: markj
Date: Mon Sep 28 11:46:04 UTC 2020
New revision: 366220
URL: https://svnweb.freebsd.org/changeset/base/366220

Log:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:	241133

Changes:
_U  stable/12/
  stable/12/sys/netgraph/ng_l2tp.c
Comment 6 commit-hook freebsd_committer 2020-09-28 11:48:27 UTC
A commit references this bug:

Author: markj
Date: Mon Sep 28 11:48:20 UTC 2020
New revision: 366221
URL: https://svnweb.freebsd.org/changeset/base/366221

Log:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:	241133

Changes:
_U  stable/11/
  stable/11/sys/netgraph/ng_l2tp.c
Comment 7 Mark Johnston freebsd_committer 2020-09-28 11:53:29 UTC
I will mark the bug as resolved for now.  Please re-open if you are still able to trigger ng_l2tp panics with r353027 and r366167 applied.
Comment 8 commit-hook freebsd_committer 2020-09-28 12:15:32 UTC
A commit references this bug:

Author: markj
Date: Mon Sep 28 12:14:38 UTC 2020
New revision: 366223
URL: https://svnweb.freebsd.org/changeset/base/366223

Log:
  MFS r366220:
  MFC r366167:
  ng_l2tp: Fix callout synchronization in the rexmit timeout handler

  PR:		241133
  Approved by:	re (gjb)

Changes:
_U  releng/12.2/
  releng/12.2/sys/netgraph/ng_l2tp.c
Comment 9 Mark Johnston freebsd_committer 2020-09-30 17:58:35 UTC
Related patch: https://reviews.freebsd.org/D26586
Comment 10 zivillian 2020-10-05 08:16:56 UTC
After applying r353027 and r366167, it crashed again after 5 days.
Comment 11 Mark Johnston freebsd_committer 2020-10-06 13:30:44 UTC
(In reply to zivillian from comment #10)
Thanks for testing.  Could you try applying this patch on top of the others? https://people.freebsd.org/~markj/patches/ng_l2tp_debug.diff
Comment 12 zivillian 2020-10-13 19:19:40 UTC
I've applied the patch, and it crashed again after 6 days. Please let me know if you need any dumps or logs.
Comment 13 Mark Johnston freebsd_committer 2020-10-13 19:29:21 UTC
(In reply to zivillian from comment #12)
Argh.  Thanks for your patience.  Would you be willing to provide the vmcore and debug symbols?
Comment 14 zivillian 2020-10-13 20:11:59 UTC
My vmcore is 30MB (gzipped), so I can't upload it here. Do you have a preferred hosting service? Also, I don't know where to find the debug symbols (I built the module with the commands from Comment 4, but there are no obvious files).
Comment 15 Mark Johnston freebsd_committer 2020-10-13 20:38:08 UTC
(In reply to zivillian from comment #14)
I have no preference.  Feel free to mail me a link privately if you prefer.

Debug symbols should be under /usr/lib/debug/boot/*.  Please provide a tarball of that directory.
Comment 16 zivillian 2020-10-14 15:54:39 UTC
Created attachment 218744
patched l2tp module

/usr/lib/debug/boot contains only an empty folder named kernel. Have I missed something?

I've attached the module. The file utility says it's not stripped, but I'm unsure whether that is enough.
The vmcore can be downloaded from https://we.tl/t-wPU1j9UOwm