Bug 163370 - [bpf] [request] enable zero-copy BPF by default
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks: 164534 203827
Reported: 2011-12-17 00:00 UTC by guy
Modified: 2022-11-22 23:15 UTC
CC List: 0 users

See Also:


Description guy 2011-12-17 00:00:26 UTC
In revision 1.207 of sys/net/bpf.c, the kernel variable
bpf_zerocopy_enable was changed to be initialized to 0 rather than 1.
The commit log message was:

    Disable zerocopy by default for now.  It's causing some problems in pcap
    consumers which fork after the shared pages have been setup.  pflogd(8)
    is an example.  The problem is understood and there is a fix coming in
    shortly.

    Folks who want to continue using it can do so by setting

    net.bpf.zerocopy_enable

    to 1.

However, as of the current top-of-trunk, it's still disabled by default;
no fix has come in.

Back in 2009, Christian Peron sent me (as I'm one of the libpcap core
developers at tcpdump.org) a mail message saying:

  Ran into a bit of an issue with zerocopy bpf.  We have enabled zerocopy
  by default in -CURRENT in hopes of shaking out some bugs. We found
  issues for processes which fork.  An example would be the privsep
  requirements in pflogd.  The problem was easy enough to fix but it
  points to an issue with it being enabled for everyone un-conditionally.

  In some cases, it makes sense to have zerocopy while in others not so
  much.  So I am thinking about adding a pcap_set_bufmode() so if the
  application writers are aware, they can specify which buffer mode they
  prefer instead of having libpcap query the operating system directly.

  I guess I have two questions:

  (1) Is there anything in the existing API I can use to specify machine
     dependent options?

  (2) If not, are you ok with having a pcap_set_bufmode() or some other
     similar function?

  I would be interested in hearing your thoughts on this.

and, in response to a reply from me, said:

  If net.bpf.zerocopy_enable is set to 1 (which on current at least it is
  by default) libpcap will make use of zerocopy.  It checks for this via
  ioctl (i.e. the kernel inspects this variable to tell pcap whether
  or not zerocopy is enabled).  The problem is when an application
  initializes the pages and then forks.  This results in the pages being
  copied into the child which breaks zerocopy.  An example of where this
  is an issue is the privsep code in pflogd.

  Disabling net.bpf.zerocopy_enable fixes the problem; however, it means
  things like tcpdump can't take advantage of it.

  Calling minherit(INHERIT_SHARE) will fix this problem, however I am not
  sure I want to unconditionally do this.  If a child process does not want
  these pages mmaped, it has no way of knowing which pages to un-map,
  unlike closing un-wanted file descriptors. i.e. if a process forks to do
  a dns lookup, these pages would appear in the process as an example.

  So I was thinking about introducing three buffer modes the
  application can specify:

  PCAP_BUFMODE_BUFFER       - regular old buffer mode
  PCAP_BUFMODE_ZBUF         - zerocopy buffers without page inheritance
  PCAP_BUFMODE_ZBUF_INHERIT - zerocopy calling minherit so pages can be shared
                             across forks

  The only problem is pcap_open_live() calls pcap_activate() directly, so I am
  not sure how we could process a flag from the application after the pcap
  object is created, but before we call pcap_activate().

  It would be nice if the applications were explicit, so if they use
  PCAP_BUFMODE_ZBUF_INHERIT, they are aware of it and can operate with
  caution.  Instead of libpcap operating behind the scenes and changing
  the page inheritance policy for the application.

  So I was wondering if you had any ideas on approach.

In response to the comment about minherit(), I said:

  Why would that be an issue?  Why would there be a problem leaving those
  pages in the child's address space?

  libpcap changes a bunch of state when it opens a capture device - it
  gets a file descriptor, it mallocs some memory, and it might memory-map
  some stuff from the kernel.  There's currently no way of releasing that
  without closing the pcap_t, but that would cause libpcap to attempt to,
  for example, turn monitor mode off if monitor mode was turned on.

but I never got a response.

If zero-copy BPF is more efficient than non-zero-copy BPF, I would want
the default mode in libpcap to be zero-copy; applications should, by
default, get the best behavior out of the packet-capture mechanism,
and should not have to know about *any* of the details of how libpcap
uses that mechanism.

It sounds from "The problem is when an application initializes the pages
and then forks.  This results in the pages being copied into the child
which breaks zerocopy." and "Calling minherit(INHERIT_SHARE) will fix
this problem" as if the "best" mode for most programs involves sharing
the pages across forks.  Programs that don't fork won't care.  I presume
programs that fork and exec won't care either, as all the pages,
including the zerocopy-buffer pages, would be unmapped by the exec.
Programs that fork without exec'ing won't cause copies that will break
BPF.  (I'm assuming from "being copied into the child" that the default
behavior is INHERIT_COPY and that this breaks BPF, presumably because,
in the child, the mapped region *isn't* shared by the kernel and
userland and thus doesn't deliver packets.)
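
To make the copy-versus-share distinction concrete, here is a minimal
sketch in portable C.  It does *not* touch BPF or minherit(); it assumes
that a MAP_PRIVATE anonymous mapping can stand in for a region inherited
with INHERIT_COPY and a MAP_SHARED one for a region inherited with
INHERIT_SHARE.  The function name child_write_visible() is made up for
this illustration.

```c
#define _DEFAULT_SOURCE         /* ask glibc to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD spelling */
#endif

/*
 * Hypothetical helper, for illustration only.
 *
 * Map one anonymous page with share_flag (MAP_PRIVATE or MAP_SHARED),
 * fork, let the child write a marker byte into the page, and report
 * whether the parent can see that write afterwards.
 *
 * Returns 1 if the child's write is visible to the parent (the page is
 * shared across the fork), 0 if the child wrote to its own copy, and
 * -1 on error.
 */
int
child_write_visible(int share_flag)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;

    volatile unsigned char *page = mmap(NULL, (size_t)pagesz,
        PROT_READ | PROT_WRITE, MAP_ANONYMOUS | share_flag, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    page[0] = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)page, (size_t)pagesz);
        return -1;
    }
    if (pid == 0) {
        /* Child: write the marker and exit without running atexit hooks. */
        page[0] = 0x5a;
        _exit(0);
    }

    /* Parent: wait for the child, then look at the page. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap((void *)page, (size_t)pagesz);
        return -1;
    }
    int visible = (page[0] == 0x5a);
    munmap((void *)page, (size_t)pagesz);
    return visible;
}
```

Calling child_write_visible(MAP_PRIVATE) returns 0, because the child
gets its own copy of the page, and child_write_visible(MAP_SHARED)
returns 1.  If my reading above is right, the first case is the analogue
of what happens to the zero-copy buffer in a forked child today.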

The only disadvantage appears to be that, if the child doesn't want to
use the pcap_t (or only uses it for injecting packets), it still has
pages mapped into its address space.  If that's a real problem, libpcap
could add a pcap_close_child() function, or something similar, which
closes file descriptors etc. but does *not* do any of the manual mode
cleanup, such as turning monitor mode off on *BSD and Linux without
mac80211, deleting the monN device on Linux with mac80211, etc.  (Ideally,
there would be ways of requesting monitor mode that work similarly to
requesting promiscuous mode, so that monitor mode is on as long as at
least one BPF/PF_PACKET socket/etc. descriptor that wants monitor mode
is open and is turned off when the last such descriptor is closed.)

How-To-Repeat: Code inspection.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-01-16 01:32:37 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

reclassify.
Comment 2 Christian S.J. Peron freebsd_committer freebsd_triage 2012-01-21 05:54:39 UTC
Responsible Changed
From-To: freebsd-net->csjp

Take, I will follow up with Guy
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:36 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped