Created attachment 156930 [details] Misbehaving proxy arp daemon on bpf A bpf daemon can crash FreeBSD 10 system if it wrongly behaves. After analysis seems that a userland process opening a blocking read descriptor and never reading from it will consume the BPF buffers in kernel eventually leading to a crash on the system. This happens with bpf zerocopy enabled/disabled. Attached is a software used for proxy arp through BPF that caused the misbehaviour on the system. Depending on the load, traffic moving through the system this issue could be reproduced several times a day or even needing up to 4 days to reproduce.
This patch fixes the issue and the issue seems to a locked LLE which does not allow BPF to sleep when it needs to. +diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c +index baa9c26..f31576d 100644 +--- a/sys/netinet/if_ether.c ++++ b/sys/netinet/if_ether.c +@@ -353,6 +353,10 @@ retry: + if ((la->la_flags & LLE_VALID) && + ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) { + bcopy(&la->ll_addr, desten, ifp->if_addrlen); ++ if (flags & LLE_EXCLUSIVE) ++ LLE_WUNLOCK(la); ++ else ++ LLE_RUNLOCK(la); + /* + * If entry has an expiry time and it is approaching, + * see if we need to send an ARP request within this +@@ -365,8 +369,7 @@ retry: + } + + *lle = la; +- error = 0; +- goto done; ++ return (0); + } + + if (la->la_flags & LLE_STATIC) { /* should not happen! */
Taking into account your description, this patch looks very strange. Can you show some core.txt.N files from described panics?
Here is a patch against HEAD of FreeBSD. Also take a look at this link for a trace https://redmine.pfsense.org/issues/4685 If its ok with you Andrey i would like to commit this one. diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c index fec6aa0..91e9568 100644 --- a/sys/netinet/if_ether.c +++ b/sys/netinet/if_ether.c @@ -364,6 +364,10 @@ retry: if ((la->la_flags & LLE_VALID) && ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) { bcopy(&la->ll_addr, desten, ifp->if_addrlen); + if (flags & LLE_EXCLUSIVE) + LLE_WUNLOCK(la); + else + LLE_RUNLOCK(la); /* * If entry has an expiry time and it is approaching, * see if we need to send an ARP request within this @@ -377,8 +381,7 @@ retry: if (pflags != NULL) *pflags = la->la_flags; - error = 0; - goto done; + return (0); } if (la->la_flags & LLE_STATIC) { /* should not happen! */
la is being read and modified after unlock with the attached patch. Though it looks like arprequest() may indeed work as expected without the lock held. Not sure if unlock/lock around arprequest is advisable; there may be a more fundamental issue with the code block itself?
https://github.com/freebsd/freebsd/commit/ec826ad5c7f97de814529d3b3bae7950f91d9a5d#diff-e08033318b7a3c6cc3ffb3e431a0f8f2L461 vs. https://github.com/freebsd/freebsd/commit/ec826ad5c7f97de814529d3b3bae7950f91d9a5d#diff-e08033318b7a3c6cc3ffb3e431a0f8f2R359
(In reply to Franco Fichtner from comment #4) Yeah but its a very quick read to me which does not create any race of sort from what i could tell. For sure la will be there when the fields are accessed.
You read a value from a lock-protected entity, you acquire a (read) lock. You write a value to a lock-protected entity, you acquire a (write) lock. After the patch, there is neither. Do you want to risk pushing a regression into production code?
(In reply to Franco Fichtner from comment #7) Oh you refer to the decrement of the la preempt value. Yeah but i highly dislike unlock and relock again code paths, that can be a solution. Or moving this to atomics or even to a taskqueue might be better than this. Still since the entry is just going to be updated or removed shortly after timeout/reply i am not sure the implications if any there.
The previous code in place before the "bad" revision in 2008 accessed la, made its changes and called arprequest() after releasing the lock, like the other code block still does. You can maybe set a temporary variable inside the locked area that calls arprequest() and afterwards trigger the function based on the value of the temporary variable?
Phabricator (reviews.freebsd.org) looks like the perfect place for this changeset/discussion, and it is open for !committer registrations and accounts :)
(In reply to Kubilay Kocak from comment #10) Ok i am posting this to phabricator since i am a freebsd developer :) Just for the reference here is another iteration of the patch diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c index fec6aa0..aa370d6 100644 --- a/sys/netinet/if_ether.c +++ b/sys/netinet/if_ether.c @@ -364,6 +364,7 @@ retry: if ((la->la_flags & LLE_VALID) && ((la->la_flags & LLE_STATIC) || la->la_expire > time_uptime)) { bcopy(&la->ll_addr, desten, ifp->if_addrlen); + renew = 0; /* * If entry has an expiry time and it is approaching, * see if we need to send an ARP request within this @@ -371,14 +372,22 @@ retry: */ if (!(la->la_flags & LLE_STATIC) && time_uptime + la->la_preempt > la->la_expire) { - arprequest(ifp, NULL, &SIN(dst)->sin_addr, NULL); + renew = 1; la->la_preempt--; } if (pflags != NULL) *pflags = la->la_flags; - error = 0; - goto done; + + if (flags & LLE_EXCLUSIVE) + LLE_WUNLOCK(la); + else + LLE_RUNLOCK(la); + + if (renew == 1) + arprequest(ifp, NULL, &SIN(dst)->sin_addr, NULL); + + return (0); } if (la->la_flags & LLE_STATIC) { /* should not happen! */
Moved to https://reviews.freebsd.org/D2828
Looks good now, thanks Ermal. :)
A commit references this bug: Author: eri Date: Wed Jun 17 12:23:05 UTC 2015 New revision: 284512 URL: https://svnweb.freebsd.org/changeset/base/284512 Log: If there is a system with a bpf consumer running and a packet is wanted to be transmitted but the arp cache entry expired, which triggers an arp request to be sent, the bpf code might want to sleep but crash the system due to a non sleep lock held from the arp entry not released properly. Release the lock before calling the arp request code to solve the issue as is done on all the other code paths. PR: 200323 Approved by: ae, gnn(mentor) MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D2828 Changes: head/sys/netinet/if_ether.c
MFC still pending. Having this in 10.2 would be awesome. :)
This is already done in 10-STABLE: https://github.com/freebsd/freebsd/commit/5d11dcc72032e3027520c3aa2ffb5905115760e7
Ah, wasn't picked up by "PR:" in commit message, I see. My bad. Close this ticket then? :)
@eri: could you / we close this ticket? Or this patch is still needs on older branches too? @franco: see the previous comment from me
This is not completely correct, BPF must never sleep on the hot path. The problem still happens, because BPF is sleeping with the NIC lock held. I tracked this down to the changes introduced in r244090 and now I'm looking for a better fix for it.
(In reply to Luiz Otavio O Souza,+55 (14) 99772-1255 from comment #19) The hold/store/free buffer sharing two buffers in BPF has been a source of trouble. Should there be a third buffer in BPF so we don't have to wait for access to a buffer in bpfread()? It would waste memory but seems like it could cleanly resolve some problems.
(In reply to Guy Helmer from comment #20) I studied this code (a lot) and found that some of sleeps are unnecessary. The cases we have to protect are: setting a new filter, setting a new interface and the flush/reset of buffers. The other cases can be dealt with some extra caution while handling the buffers (under the descriptor lock). This should cover your previous issues while still permit that we eliminate some of the sleeps.
A commit references this bug: Author: loos Date: Mon Aug 3 22:14:46 UTC 2015 New revision: 286260 URL: https://svnweb.freebsd.org/changeset/base/286260 Log: Remove the mtx_sleep() from the kqueue f_event filter. The filter is called from the network hot path and must not sleep. The filter runs with the descriptor lock held and does not manipulates the buffers, so it is not necessary sleep when the hold buffer is in use. Just ignore the hold buffer contents when it is being copied to user space (when hold buffer in use is set). This fix the "Sleeping thread owns a non-sleepable lock" panic when the userland thread is too busy reading the packets from bpf(4). PR: 200323 MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate) Changes: head/sys/net/bpf.c
A commit references this bug: Author: loos Date: Mon Aug 17 19:06:15 UTC 2015 New revision: 286859 URL: https://svnweb.freebsd.org/changeset/base/286859 Log: MFC r286260: Remove the mtx_sleep() from the kqueue f_event filter. The filter is called from the network hot path and must not sleep. The filter runs with the descriptor lock held and does not manipulate the buffers, so it is not necessary sleep when the hold buffer is in use. Just ignore the hold buffer contents when it is being copied to user space (when hold buffer in use is set). This fix the "Sleeping thread owns a non-sleepable lock" panic when the userland thread is too busy reading the packets from bpf(4). PR: 200323 Sponsored by: Rubicon Communications (Netgate) Changes: _U stable/10/ stable/10/sys/net/bpf.c