Bug 206053 - kqueue support code of netmap causes panic
Summary: kqueue support code of netmap causes panic
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.2-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: David Bright
Keywords: crash
Depends on:
Reported: 2016-01-09 04:30 UTC by Tiwei Bie
Modified: 2018-11-30 02:38 UTC (History)
6 users (show)

See Also:
dab: mfc-stable12+
dab: mfc-stable11+
dab: mfc-stable10-


Note You need to log in before you can comment on or make changes to this bug.
Description Tiwei Bie 2016-01-09 04:30:45 UTC
It seems that there is a bug in kqueue support code of netmap which can cause kernel panic. And I can reproduce the kernel panic with this simple program on -HEAD:

#include <stdio.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <net/netmap_user.h>

static int
kq_add(int kq, int fd)
	struct kevent changes[1];
	int ret;

	EV_SET(&changes[0], fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
	ret = kevent(kq, changes, 1, NULL, 0, NULL);
	assert(ret != -1);

	printf("[%s] success\n", __func__);

	return (ret);

static void
kq_wait(int kq)
	struct kevent events[1];
	int ret;

	ret = kevent(kq, NULL, 0, events, 1, NULL);
	assert(ret != -1);

	printf("[%s] success\n", __func__);

int main(void)
	const char *ifname = "vale0:vm1";
	struct nm_desc *d;
	int kq;

	d = nm_open(ifname, NULL, 0, 0);
	assert(d != NULL);

	kq = kqueue();
	assert(kq != -1);

	kq_add(kq, d->fd);

	return (0);

And below is part of the crash summary:

freebsd dumped core - see /var/crash/vmcore.0

Fri Jan  8 22:19:39 CST 2016

FreeBSD freebsd 11.0-CURRENT FreeBSD 11.0-CURRENT #2 d1f6105(master)-dirty: Fri Jan  8 21:55:33 CST 2016     btw@freebsd:/usr/obj/root/freebsd/sys/GENERIC  amd64


panic: mutex nm_kn_lock not owned at /root/freebsd/sys/kern/kern_event.c:2073
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe011744c420
vpanic() at vpanic+0x182/frame 0xfffffe011744c4a0
panic() at panic+0x43/frame 0xfffffe011744c500
__mtx_assert() at __mtx_assert+0xbf/frame 0xfffffe011744c510
knlist_add() at knlist_add+0x20/frame 0xfffffe011744c540
netmap_kqfilter() at netmap_kqfilter+0x101/frame 0xfffffe011744c580
devfs_kqfilter_f() at devfs_kqfilter_f+0x81/frame 0xfffffe011744c5d0
kqueue_register() at kqueue_register+0x5bf/frame 0xfffffe011744c670
kqueue_kevent() at kqueue_kevent+0xc8/frame 0xfffffe011744c840
kern_kevent_fp() at kern_kevent_fp+0x99/frame 0xfffffe011744c890
kern_kevent() at kern_kevent+0x9f/frame 0xfffffe011744c8f0
sys_kevent() at sys_kevent+0x11c/frame 0xfffffe011744c9a0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe011744cab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe011744cab0
--- syscall (363, FreeBSD ELF64, sys_kevent), rip = 0x80095bc7a, rsp = 0x7fffffffe988, rbp = 0x7fffffffe9e0 ---
KDB: enter: panic

Comment 1 Sean Bruno freebsd_committer 2016-01-10 18:50:19 UTC
I don't know enough to verify as a real bug, adding some folks who might have a grasp of what's going on here.
Comment 2 Konstantin Belousov freebsd_committer 2016-01-10 19:22:43 UTC
(In reply to Sean Bruno from comment #1)
I think replacing 1 with 0 in the sys/dev/netmap/netmap_freebsd.c:netmap_kqfilter(), the call to knlist_add() would fix the issue at hand.

But the panic probably means that netmap authors never ever run the code with INVARIANTS and WITNESS.

Also, there are other comments in the netmap_freebsd.c, esp. about 'not even munmap() on close()' which are, together with conslusions made, at least strange.
Comment 3 David Bright freebsd_committer 2018-07-31 01:50:50 UTC
Logged as netmap issue #521: https://github.com/luigirizzo/netmap/issues/521
Comment 4 David Bright freebsd_committer 2018-07-31 01:55:23 UTC
(In reply to Konstantin Belousov from comment #2)

I've verified that the change suggested by kib@ does, indeed, avoid the panic. I've submitted a pull request upstream to make that change. I don't know if we want a local fix in the meantime or not. Adding mmacy@ for an opinion as the commit log indicates he has been syncing with upstream recently.
Comment 5 David Bright freebsd_committer 2018-08-21 18:27:37 UTC
This bug has been fixed in upstream (netmap PR #520).
Comment 6 David Bright freebsd_committer 2018-11-30 02:05:33 UTC
(In reply to David Bright from comment #5)

This fix was (accidentally) committed in base r337812 and incorporated into the upcoming 12.0-RELEASE; I just noticed that now when I went to MFC that commit to stable/11. Noting the commit now, considerably after the fact. Also, flagging this as MFC-ed to stable/12 as it actually was committed before 12.0-RELEASE.
Comment 7 commit-hook freebsd_committer 2018-11-30 02:07:28 UTC
A commit references this bug:

Author: dab
Date: Fri Nov 30 02:06:32 UTC 2018
New revision: 341275
URL: https://svnweb.freebsd.org/changeset/base/341275

  MFC r337812,r337814,r337820,r341068:

  Fix several memory leaks (r337812 & r337814).

  The libkqueue tests have several places that leak memory by using an
  idiom like:


  Rework to save the pointer returned from kevent_to_str() and then
  free() it after it has been used.

  r337812 also fixed a bug in the netmap kevent code. The inclusion of
  that fix was an oversight that I didn't notice until this
  MFC. Reference the code review and PR here in the MFC for

  r337820 & r341068 were white-space only changes as a follow-up to
  r337812 & r337814:

  After r337820, which "corrected" some spaces-instead-of-tab whitespace
  issues in the libkqueue tests, jmg@ pointed out that these files were
  originally space-based, not tab-spaced, and so the correction should
  have been to get rid of the tabs that had been introduced in previous
  changes, not the spaces. This change does that. This is a whitespace
  only change; no functional change is intended.

  PR:  206053
  Differential Revision:    https://reviews.freebsd.org/D16531
  Sponsored by:	Dell EMC Isilon

_U  stable/11/
Comment 8 David Bright freebsd_committer 2018-11-30 02:12:15 UTC
I'm going to close this as fixed in 11 & 12. I don't intend to try to MFC to 10.