Bug 200323 - BPF userland misuse can crash the system
Summary: BPF userland misuse can crash the system
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.1-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL: https://reviews.freebsd.org/D2828
Keywords: needs-qa, patch
Depends on:
Blocks:
 
Reported: 2015-05-19 08:44 UTC by Ermal Luçi
Modified: 2015-08-17 19:08 UTC (History)
9 users (show)

See Also:
op: mfc-stable10+


Attachments
Misbehaving proxy arp daemon on bpf (13.12 KB, text/plain)
2015-05-19 08:44 UTC, Ermal Luçi
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ermal Luçi 2015-05-19 08:44:24 UTC
Created attachment 156930 [details]
Misbehaving proxy arp daemon on bpf

A bpf daemon can crash FreeBSD 10 system if it wrongly behaves.

After analysis seems that a userland process opening a blocking read descriptor and never reading from it will consume the BPF buffers in kernel eventually leading to a crash on the system.
This happens with bpf zerocopy enabled/disabled.

Attached is a software used for proxy arp through BPF that caused the misbehaviour on the system.
Depending on the load, traffic moving through the system this issue could be reproduced several times a day or even needing up to 4 days to reproduce.
Comment 1 Ermal Luçi 2015-06-13 19:39:59 UTC
This patch fixes the issue and the issue seems to a locked LLE which does not allow BPF to sleep when it needs to.

+diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c
+index baa9c26..f31576d 100644
+--- a/sys/netinet/if_ether.c
++++ b/sys/netinet/if_ether.c
+@@ -353,6 +353,10 @@ retry:
+ 	if ((la->la_flags & LLE_VALID) &&
+ 	    ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) {
+ 		bcopy(&la->ll_addr, desten, ifp->if_addrlen);
++		if (flags & LLE_EXCLUSIVE)
++			LLE_WUNLOCK(la);
++		else
++			LLE_RUNLOCK(la);
+ 		/*
+ 		 * If entry has an expiry time and it is approaching,
+ 		 * see if we need to send an ARP request within this
+@@ -365,8 +369,7 @@ retry:
+ 		}
+ 
+ 		*lle = la;
+-		error = 0;
+-		goto done;
++		return (0);
+ 	}
+ 
+ 	if (la->la_flags & LLE_STATIC) {   /* should not happen! */
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2015-06-15 12:06:50 UTC
Taking into account your description, this patch looks very strange. Can you show some core.txt.N files from described panics?
Comment 3 Ermal Luçi 2015-06-15 14:36:04 UTC
Here is a patch against HEAD of FreeBSD.
Also take a look at this link for a trace https://redmine.pfsense.org/issues/4685

If its ok with you Andrey i would like to commit this one.


diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c
index fec6aa0..91e9568 100644
--- a/sys/netinet/if_ether.c
+++ b/sys/netinet/if_ether.c
@@ -364,6 +364,10 @@ retry:
        if ((la->la_flags & LLE_VALID) &&
            ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) {
                bcopy(&la->ll_addr, desten, ifp->if_addrlen);
+               if (flags & LLE_EXCLUSIVE)
+                       LLE_WUNLOCK(la);
+               else
+                       LLE_RUNLOCK(la);
                /*
                 * If entry has an expiry time and it is approaching,
                 * see if we need to send an ARP request within this
@@ -377,8 +381,7 @@ retry:
 
                if (pflags != NULL)
                        *pflags = la->la_flags;
-               error = 0;
-               goto done;
+               return (0);
        }
 
        if (la->la_flags & LLE_STATIC) {   /* should not happen! */
Comment 4 Franco Fichtner 2015-06-15 14:58:59 UTC
la is being read and modified after unlock with the attached patch.  Though it looks like arprequest() may indeed work as expected without the lock held.  Not sure if unlock/lock around arprequest is advisable; there may be a more fundamental issue with the code block itself?
Comment 6 Ermal Luçi 2015-06-15 15:47:44 UTC
(In reply to Franco Fichtner from comment #4)
Yeah but its a very quick read to me which does not create any race of sort from what i could tell.
For sure la will be there when the fields are accessed.
Comment 7 Franco Fichtner 2015-06-15 16:17:08 UTC
You read a value from a lock-protected entity, you acquire a (read) lock.  You write a value to a lock-protected entity, you acquire a (write) lock.  After the patch, there is neither.  Do you want to risk pushing a regression into production code?
Comment 8 Ermal Luçi 2015-06-15 16:22:21 UTC
(In reply to Franco Fichtner from comment #7)
Oh you refer to the decrement of the la preempt value.
Yeah but i highly dislike unlock and relock again code paths, that can be a solution.
Or moving this to atomics or even to a taskqueue might be better than this.

Still since the entry is just going to be updated or removed shortly after timeout/reply i am not sure the implications if any there.
Comment 9 Franco Fichtner 2015-06-15 16:33:14 UTC
The previous code in place before the "bad" revision in 2008 accessed la, made its changes and called arprequest() after releasing the lock, like the other code block still does.  You can maybe set a temporary variable inside the locked area that calls arprequest() and afterwards trigger the function based on the value of the temporary variable?
Comment 10 Kubilay Kocak freebsd_committer freebsd_triage 2015-06-15 16:49:53 UTC
Phabricator (reviews.freebsd.org) looks like the perfect place for this changeset/discussion, and it is open for !committer registrations and accounts :)
Comment 11 Ermal Luçi 2015-06-15 16:51:17 UTC
(In reply to Kubilay Kocak from comment #10)
Ok i am posting this to phabricator since i am a freebsd developer :)

Just for the reference here is another iteration of the patch

diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c
index fec6aa0..aa370d6 100644
--- a/sys/netinet/if_ether.c
+++ b/sys/netinet/if_ether.c
@@ -364,6 +364,7 @@ retry:
        if ((la->la_flags & LLE_VALID) &&
            ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) {
                bcopy(&la->ll_addr, desten, ifp->if_addrlen);
+               renew = 0;
                /*
                 * If entry has an expiry time and it is approaching,
                 * see if we need to send an ARP request within this
@@ -371,14 +372,22 @@ retry:
                 */
                if (!(la->la_flags & LLE_STATIC) &&
                    time_uptime + la->la_preempt > la->la_expire) {
-                       arprequest(ifp, NULL, &SIN(dst)->sin_addr, NULL);
+                       renew = 1;
                        la->la_preempt--;
                }
 
                if (pflags != NULL)
                        *pflags = la->la_flags;
-               error = 0;
-               goto done;
+
+               if (flags & LLE_EXCLUSIVE)
+                       LLE_WUNLOCK(la);
+               else
+                       LLE_RUNLOCK(la);
+
+               if (renew == 1)
+                       arprequest(ifp, NULL, &SIN(dst)->sin_addr, NULL);
+
+               return (0);
        }
 
        if (la->la_flags & LLE_STATIC) {   /* should not happen! */
Comment 12 Ermal Luçi 2015-06-15 17:00:07 UTC
Moved to https://reviews.freebsd.org/D2828
Comment 13 Franco Fichtner 2015-06-15 17:35:07 UTC
Looks good now, thanks Ermal. :)
Comment 14 commit-hook freebsd_committer freebsd_triage 2015-06-17 12:42:01 UTC
A commit references this bug:

Author: eri
Date: Wed Jun 17 12:23:05 UTC 2015
New revision: 284512
URL: https://svnweb.freebsd.org/changeset/base/284512

Log:
  If there is a system with a bpf consumer running and a packet is wanted
  to be transmitted but the arp cache entry expired, which triggers an arp request
  to be sent, the bpf code might want to sleep but crash the system due
  to a non sleep lock held from the arp entry not released properly.

  Release the lock before calling the arp request code to solve the issue
  as is done on all the other code paths.

  PR:	200323
  Approved by: ae, gnn(mentor)
  MFC after:	1 week
  Sponsored by:	Netgate
  Differential Revision:	https://reviews.freebsd.org/D2828

Changes:
  head/sys/netinet/if_ether.c
Comment 15 Franco Fichtner 2015-06-28 08:52:00 UTC
MFC still pending.  Having this in 10.2 would be awesome. :)
Comment 16 Oliver Pinter freebsd_committer freebsd_triage 2015-06-28 16:21:10 UTC
This is already done in 10-STABLE: https://github.com/freebsd/freebsd/commit/5d11dcc72032e3027520c3aa2ffb5905115760e7
Comment 17 Franco Fichtner 2015-06-28 16:23:03 UTC
Ah, wasn't picked up by "PR:" in commit message, I see. My bad. Close this ticket then? :)
Comment 18 Oliver Pinter freebsd_committer freebsd_triage 2015-06-28 16:24:05 UTC
@eri: could you / we close this ticket? Or this patch is still needs on older branches too?
@franco: see the previous comment from me
Comment 19 Luiz Otavio O Souza,+55 (14) 99772-1255 freebsd_committer freebsd_triage 2015-07-30 14:42:55 UTC
This is not completely correct, BPF must never sleep on the hot path.

The problem still happens, because BPF is sleeping with the NIC lock held.

I tracked this down to the changes introduced in r244090 and now I'm looking for a better fix for it.
Comment 20 Guy Helmer freebsd_committer freebsd_triage 2015-07-30 14:52:10 UTC
(In reply to Luiz Otavio O Souza,+55 (14) 99772-1255 from comment #19)

The hold/store/free buffer sharing two buffers in BPF has been a source of trouble. Should there be a third buffer in BPF so we don't have to wait for access to a buffer in bpfread()? It would waste memory but seems like it could cleanly resolve some problems.
Comment 21 Luiz Otavio O Souza,+55 (14) 99772-1255 freebsd_committer freebsd_triage 2015-07-31 22:31:26 UTC
(In reply to Guy Helmer from comment #20)

I studied this code (a lot) and found that some of sleeps are unnecessary.

The cases we have to protect are: setting a new filter, setting a new interface and the flush/reset of buffers. The other cases can be dealt with some extra caution while handling the buffers (under the descriptor lock).

This should cover your previous issues while still permit that we eliminate some of the sleeps.
Comment 22 commit-hook freebsd_committer freebsd_triage 2015-08-03 22:15:17 UTC
A commit references this bug:

Author: loos
Date: Mon Aug  3 22:14:46 UTC 2015
New revision: 286260
URL: https://svnweb.freebsd.org/changeset/base/286260

Log:
  Remove the mtx_sleep() from the kqueue f_event filter.

  The filter is called from the network hot path and must not sleep.

  The filter runs with the descriptor lock held and does not manipulates the
  buffers, so it is not necessary sleep when the hold buffer is in use.

  Just ignore the hold buffer contents when it is being copied to user space
  (when hold buffer in use is set).

  This fix the "Sleeping thread owns a non-sleepable lock" panic when the
  userland thread is too busy reading the packets from bpf(4).

  PR:		200323
  MFC after:	2 weeks
  Sponsored by:	Rubicon Communications (Netgate)

Changes:
  head/sys/net/bpf.c
Comment 23 commit-hook freebsd_committer freebsd_triage 2015-08-17 19:06:28 UTC
A commit references this bug:

Author: loos
Date: Mon Aug 17 19:06:15 UTC 2015
New revision: 286859
URL: https://svnweb.freebsd.org/changeset/base/286859

Log:
  MFC r286260:
    Remove the mtx_sleep() from the kqueue f_event filter.

    The filter is called from the network hot path and must not sleep.

    The filter runs with the descriptor lock held and does not manipulate the
    buffers, so it is not necessary sleep when the hold buffer is in use.

    Just ignore the hold buffer contents when it is being copied to user space
    (when hold buffer in use is set).

    This fix the "Sleeping thread owns a non-sleepable lock" panic when the
    userland thread is too busy reading the packets from bpf(4).

    PR:           200323
    Sponsored by: Rubicon Communications (Netgate)

Changes:
_U  stable/10/
  stable/10/sys/net/bpf.c