Network polling has unnecessary calls to the scheduler. These require acquiring the sched lock which imposes a variable delay depending on contention for this lock. in netisr_pollmore() in kern/kern_poll.c there are two calls to schednetisrbits(). schednetisrbits is defined in net/netisr.h to set some bits in netisr and call legacy_setsoftnet(). legacy_setsoftnet() in net/netisr.c calls swi_sched(). swi_sched() in kern/kern_intr.c which calls ithread_schedule() in the same file. ithread_schedule() acquires and releases the sched_lock. Fix: Since the netisr is running when netisr_pollmore() is executing and swi_net() the main netisr despatcher loops until netisr is zero, it is sufficient in netisr_pollmore() to just set the bits in netisr and not also call legacy_setsoftnet(): replace the two instances of: schednetisrbits(1 << NETISR_POLL | 1 << NETISR_POLLMORE); in netisr_pollmore() by: atomic_set_rel_int(&netisr, (1 << NETISR_POLL | 1 << NETISR_POLLMORE));
Arthur, On Wed, Jul 19, 2006 at 07:49:59AM +0000, Arthur Hartwig wrote: A> >Fix: A> Since the netisr is running when netisr_pollmore() is executing and swi_net() the main netisr despatcher loops until netisr is zero, it is sufficient in netisr_pollmore() to just set the bits in netisr and not also call legacy_setsoftnet(): A> A> replace the two instances of: A> schednetisrbits(1 << NETISR_POLL | 1 << NETISR_POLLMORE); A> A> in netisr_pollmore() by: A> atomic_set_rel_int(&netisr, (1 << NETISR_POLL | 1 << NETISR_POLLMORE)); Hmm, interesting. Have you done any profiling? -- Totus tuus, Glebius. GLEBIUS-RIPN GLEB-RIPE
attach this to PR, so that it won't be lost in my mailbox. ----- Forwarded message from Arthur Hartwig <Arthur.Hartwig@nokia.com> ----- >Hmm, interesting. Have you done any profiling? G'day Gleb, My work was then largely concerned with fast forwarding performance between two em interfaces on FreeBSD 6.0 and in particular with trying to get some performance improvement at high rates by using additional CPUs. I did some timing using the tsc and found that the em driver in polling mode was using roughly roughly half the time spent in ether_input() to the point of putting a frame on the output queue. So I thought if I ran the polling code on one CPU and ether_input() on another CPU I would get an observable improvement. This turned out to be the case. At some stage I noticed that vmstat showed lots of interrupts even though I was using polling mode. I noticed v_intr was incremented in swi_sched() and intr_execute_handlers(). Another system not using polling mode didn't show the same high interrupt rate. After making the described change I found the interrupt rate reported by vmstat dropped significantly. The change didn't result in a significant performance improvement but clearly reduce the overhead in the polling thread, thus making more CPU time available to the system. Arthur I found interrupt driven fast forwarding performed significantly worse on a dual CPU system than on a single CPU system. Some profiling showed a lot of contention for the routing table locks, sometimes with significant delays between a thread attempting to acquire a lock and its actual acquisition. Changing to polling mode seemed as if would help reduce lock contention and so improve performance. A number of tweaks combined to get noloss fast forwarding performance between two GigE interfaces on a dual CPU system up to about what it was on a single CPU interrupt driven system Arthur ----- End forwarded message -----
Responsible Changed From-To: freebsd-bugs->bms I'll take this for the sake of convenience...
For convenience, I've attached a diff against bleeding edge HEAD. When I put fxp0 into polling mode with this patch, network operations appear to wedge. anglepoise# sysctl debug.mpsafenet debug.mpsafenet: 1 anglepoise# uname -a FreeBSD anglepoise.lon.incunabulum.net 7.0-CURRENT FreeBSD 7.0-CURRENT #19: Sat Feb 3 04:20:32 GMT 2007 bms@anglepoise.lon.incunabulum.net:/usr/obj/usr/home/bms/head/src/sys/ANGLEPOISE7 amd64 anglepoise# ifconfig fxp0 fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 options=48<VLAN_MTU,POLLING> inet 192.168.123.17 netmask 0xffffff00 broadcast 192.168.123.255 ether 00:90:27:59:40:2c media: Ethernet autoselect (10baseT/UTP) status: active anglepoise# ifconfig fxp0 -polling -> network operations resume
I tried the attached patch. With SWI_DELAY, nothing happens and traffic stalls. Without it, it's business as usual. Of course, with !SWI_DELAY. this will force a call to intr_event_schedule_thread(), making no real difference. I should point out all my testing was done on a uniprocessor amd64 system with an fxp card, though debug.mpsafenet is 1.
On Mon, Feb 05, 2007 at 01:06:06AM +0000, Bruce M Simpson wrote: > I tried the attached patch. With SWI_DELAY, nothing happens and [for the benefit of the archives] hmmm... so either the analysis in the PR is not correct and the call is done while the isr is not under service, or there is something that is done in swi_sched which we are not taking into account properly. cheers luigi > traffic stalls. Without it, it's business as usual. > > Of course, with !SWI_DELAY. this will force a call to > intr_event_schedule_thread(), making no real difference. > > I should point out all my testing was done on a uniprocessor > amd64 system with an fxp card, though debug.mpsafenet is 1. > > Index: src/sys/kern/kern_poll.c > =================================================================== > RCS file: /home/ncvs/src/sys/kern/kern_poll.c,v > retrieving revision 1.28 > diff -u -p -r1.28 kern_poll.c > --- src/sys/kern/kern_poll.c 6 Dec 2006 06:34:55 -0000 1.28 > +++ src/sys/kern/kern_poll.c 5 Feb 2007 00:50:30 -0000 > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD: src/sys/kern/kern_po > > #include <sys/param.h> > #include <sys/systm.h> > +#include <sys/bus.h> > +#include <sys/interrupt.h> > #include <sys/kernel.h> > #include <sys/socket.h> /* needed by net/if.h */ > #include <sys/sockio.h> > @@ -314,7 +316,10 @@ hardclock_device_poll(void) > if (phase != 0) > suspect++; > phase = 1; > - schednetisrbits(1 << NETISR_POLL | 1 << NETISR_POLLMORE); > + /* optimize out unnecessary calls to the scheduler. */ > + atomic_set_rel_int(&netisr, > + (1 << NETISR_POLL | 1 << NETISR_POLLMORE)); > + swi_sched(net_ih, SWI_DELAY); > phase = 2; > } > if (pending_polls++ > 0) > @@ -371,7 +376,10 @@ netisr_pollmore() > mtx_lock(&poll_mtx); > phase = 5; > if (residual_burst > 0) { > - schednetisrbits(1 << NETISR_POLL | 1 << NETISR_POLLMORE); > + /* optimize out unnecessary calls to the scheduler. */ > + atomic_set_rel_int(&netisr, > + (1 << NETISR_POLL | 1 << NETISR_POLLMORE)); > + swi_sched(net_ih, SWI_DELAY); > mtx_unlock(&poll_mtx); > /* will run immediately on return, followed by netisrs */ > return; > Index: src/sys/net/netisr.h > =================================================================== > RCS file: /home/ncvs/src/sys/net/netisr.h,v > retrieving revision 1.33 > diff -u -p -r1.33 netisr.h > --- src/sys/net/netisr.h 7 Jan 2005 01:45:35 -0000 1.33 > +++ src/sys/net/netisr.h 5 Feb 2007 00:50:30 -0000 > @@ -69,6 +69,7 @@ > #ifdef _KERNEL > > void legacy_setsoftnet(void); > +extern void *net_ih; > > extern volatile unsigned int netisr; /* scheduling bits for network */ > #define schednetisr(anisr) do { \ > Index: src/sys/net/netisr.c > =================================================================== > RCS file: /home/ncvs/src/sys/net/netisr.c,v > retrieving revision 1.18 > diff -u -p -r1.18 netisr.c > --- src/sys/net/netisr.c 28 Nov 2006 11:19:36 -0000 1.18 > +++ src/sys/net/netisr.c 5 Feb 2007 00:50:30 -0000 > @@ -82,7 +82,7 @@ struct netisr { > int ni_flags; > } netisrs[32]; > > -static void *net_ih; > +void *net_ih; > > /* > * Not all network code is currently capable of running MPSAFE; however,
State Changed From-To: open->feedback Think we're back to the drawing board here, at least for the current code base...
I haven't yet looked at the 7.0 code, but in the 6.0 code the call to schednetisrbits() in hardclock_device_poll() was necessary to get the netisr thread running. But in netisr_pollmore() the call to the scheduler was unnecessary because the netisr was already running. The patch by bms did rather more than I suggested (/replace two instance . . . in netisrpollmore()/) in that it it removed the call to schednetisrbits() in hardclock_device_poll() I'm about to go to work. I try to take a look at the 7.0 code later today. Arthur
I've looked at the HEAD version of kern_poll.c and it seems similar enough to 6.0 version. The call to schednetisrbits() in hardclock device_poll() needs to remain to ensure the netisr is scheduled to run. However, in netisr_pollmore() the two calls to schednetisrbits() can be replaced as I originally described. My reading of the diffs is that you replaced the call to schednetisrbits() in hardclock_device_poll() (which I didn't suggest) and replaced only one of the two calls to schednetisrbits() in netisr_pollmore() (which is only part of what I suggested.) Arthur
State Changed From-To: feedback->open Back to the net pool
Responsible Changed From-To: bms->net Back to the net pool
Responsible Changed From-To: net->freebsd-net Assign to freebsd-net instead of net, since that's the more usual name for net@ assignments.