Bug 165220

Summary: [ath] "ath_rx_tasklet: sc_inreset_cnt > 0; skipping" messages
Product: Base System Reporter: Adrian Chadd <adrian>
Component: wirelessAssignee: freebsd-wireless (Nobody) <wireless>
Status: Open ---    
Severity: Affects Only Me CC: andriyano-31, chronophob
Priority: Normal    
Version: 9.0-RELEASE   
Hardware: Any   
OS: Any   

Description Adrian Chadd freebsd_committer freebsd_triage 2012-02-17 03:00:21 UTC
There are far too many of the following messages showing up:


[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_start: sc_inreset_cnt > 0; bailing
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping
[100191] ath0: ath_rx_tasklet: sc_inreset_cnt > 0; skipping

. What's happening is that a bunch of TX or RX completion is occuring at the same time as a channel change/reset. The previous code just had these run concurrently, causing all kinds of random and weird behaviour.

I introduced some debugging in FreeBSD-HEAD that tries to (a) stop these and (b) log when they occur, so I could actually try finding/figuring out what's going on.

This is one of these cases. :-)

Fix: 

The problem is that although interrupts are disabled in reset, the ath taskqueue may currently be running and this means existing TX completion and RX will occur.

I think the fix is to do this:

* grab the PCU lock
* disable interrupts
* grab the reset counter
* stop the ath taskqueue and wait for pending tasks to complete

. but is this still racy? A patch I've done to do the above in ath_reset() and ath_chan_change() reduces the instances of this quite heavily but it still occasionally turns up.
How-To-Repeat: Just run ath :-)
Comment 1 dfilter service freebsd_committer freebsd_triage 2012-02-17 03:23:16 UTC
Author: adrian
Date: Fri Feb 17 03:23:01 2012
New Revision: 231854
URL: http://svn.freebsd.org/changeset/base/231854

Log:
  Begin breaking out the txrx stop code into a locked and unlocked variant.
  
  PR:	kern/165220

Modified:
  head/sys/dev/ath/if_ath.c

Modified: head/sys/dev/ath/if_ath.c
==============================================================================
--- head/sys/dev/ath/if_ath.c	Fri Feb 17 03:01:29 2012	(r231853)
+++ head/sys/dev/ath/if_ath.c	Fri Feb 17 03:23:01 2012	(r231854)
@@ -1903,15 +1903,16 @@ ath_stop_locked(struct ifnet *ifp)
 
 #define	MAX_TXRX_ITERATIONS	1000
 static void
-ath_txrx_stop(struct ath_softc *sc)
+ath_txrx_stop_locked(struct ath_softc *sc)
 {
 	int i = MAX_TXRX_ITERATIONS;
 
 	ATH_UNLOCK_ASSERT(sc);
+	ATH_PCU_LOCK_ASSERT(sc);
+
 	/* Stop any new TX/RX from occuring */
 	taskqueue_block(sc->sc_tq);
 
-	ATH_PCU_LOCK(sc);
 	/*
 	 * Sleep until all the pending operations have completed.
 	 *
@@ -1925,7 +1926,6 @@ ath_txrx_stop(struct ath_softc *sc)
 		msleep(sc, &sc->sc_pcu_mtx, 0, "ath_txrx_stop", 1);
 		i--;
 	}
-	ATH_PCU_UNLOCK(sc);
 
 	if (i <= 0)
 		device_printf(sc->sc_dev,
@@ -1935,6 +1935,17 @@ ath_txrx_stop(struct ath_softc *sc)
 #undef	MAX_TXRX_ITERATIONS
 
 static void
+ath_txrx_stop(struct ath_softc *sc)
+{
+	ATH_UNLOCK_ASSERT(sc);
+	ATH_PCU_UNLOCK_ASSERT(sc);
+
+	ATH_PCU_LOCK(sc);
+	ath_txrx_stop_locked(sc);
+	ATH_PCU_UNLOCK(sc);
+}
+
+static void
 ath_txrx_start(struct ath_softc *sc)
 {
 
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 2 dfilter service freebsd_committer freebsd_triage 2012-02-17 03:46:48 UTC
Author: adrian
Date: Fri Feb 17 03:46:38 2012
New Revision: 231857
URL: http://svn.freebsd.org/changeset/base/231857

Log:
  Enforce some consistent ordering and handling of interrupt disable/enable
  with RX/TX halting.
  
  * Always disable/enable interrupts during a channel change, just to simply
    things.
  
  * Ensure that the ath taskqueue has completed and is paused before
    continuing.
  
  This dramatically reduces the instances of overlapping RX and reset
  conditions.
  
  PR:	kern/165220

Modified:
  head/sys/dev/ath/if_ath.c

Modified: head/sys/dev/ath/if_ath.c
==============================================================================
--- head/sys/dev/ath/if_ath.c	Fri Feb 17 03:39:06 2012	(r231856)
+++ head/sys/dev/ath/if_ath.c	Fri Feb 17 03:46:38 2012	(r231857)
@@ -1934,6 +1934,7 @@ ath_txrx_stop_locked(struct ath_softc *s
 }
 #undef	MAX_TXRX_ITERATIONS
 
+#if 0
 static void
 ath_txrx_stop(struct ath_softc *sc)
 {
@@ -1944,6 +1945,7 @@ ath_txrx_stop(struct ath_softc *sc)
 	ath_txrx_stop_locked(sc);
 	ATH_PCU_UNLOCK(sc);
 }
+#endif
 
 static void
 ath_txrx_start(struct ath_softc *sc)
@@ -2049,11 +2051,12 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
 	ATH_UNLOCK_ASSERT(sc);
 
 	ATH_PCU_LOCK(sc);
+	ath_hal_intrset(ah, 0);		/* disable interrupts */
+	ath_txrx_stop_locked(sc);	/* Ensure TX/RX is stopped */
 	if (ath_reset_grablock(sc, 1) == 0) {
 		device_printf(sc->sc_dev, "%s: concurrent reset! Danger!\n",
 		    __func__);
 	}
-	ath_hal_intrset(ah, 0);		/* disable interrupts */
 	ATH_PCU_UNLOCK(sc);
 
 	/*
@@ -2061,7 +2064,6 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
 	 * and block future ones from occuring. This needs to be
 	 * done before the TX queue is drained.
 	 */
-	ath_txrx_stop(sc);
 	ath_draintxq(sc, reset_type);	/* stop xmit side */
 
 	/*
@@ -5383,21 +5385,16 @@ ath_chan_set(struct ath_softc *sc, struc
 	struct ieee80211com *ic = ifp->if_l2com;
 	struct ath_hal *ah = sc->sc_ah;
 	int ret = 0;
-	int dointr = 0;
 
 	/* Treat this as an interface reset */
 	ATH_PCU_LOCK(sc);
+	ath_hal_intrset(ah, 0);		/* Stop new RX/TX completion */
+	ath_txrx_stop_locked(sc);	/* Stop pending RX/TX completion */
 	if (ath_reset_grablock(sc, 1) == 0) {
 		device_printf(sc->sc_dev, "%s: concurrent reset! Danger!\n",
 		    __func__);
 	}
-	if (chan != sc->sc_curchan) {
-		dointr = 1;
-		/* XXX only do this if inreset_cnt is 1? */
-		ath_hal_intrset(ah, 0);
-	}
 	ATH_PCU_UNLOCK(sc);
-	ath_txrx_stop(sc);
 
 	DPRINTF(sc, ATH_DEBUG_RESET, "%s: %u (%u MHz, flags 0x%x)\n",
 	    __func__, ieee80211_chan2ieee(ic, chan),
@@ -5466,10 +5463,10 @@ ath_chan_set(struct ath_softc *sc, struc
 			ath_beacon_config(sc, NULL);
 		}
 
-#if 0
 		/*
 		 * Re-enable interrupts.
 		 */
+#if 0
 		ath_hal_intrset(ah, sc->sc_imask);
 #endif
 	}
@@ -5478,8 +5475,7 @@ finish:
 	ATH_PCU_LOCK(sc);
 	sc->sc_inreset_cnt--;
 	/* XXX only do this if sc_inreset_cnt == 0? */
-	if (dointr)
-		ath_hal_intrset(ah, sc->sc_imask);
+	ath_hal_intrset(ah, sc->sc_imask);
 	ATH_PCU_UNLOCK(sc);
 
 	/* XXX do this inside of IF_LOCK? */
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 3 Adrian Chadd freebsd_committer freebsd_triage 2012-02-17 07:23:32 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-wireless

Change
Comment 4 dfilter service freebsd_committer freebsd_triage 2012-02-25 19:13:10 UTC
Author: adrian
Date: Sat Feb 25 19:12:54 2012
New Revision: 232163
URL: http://svn.freebsd.org/changeset/base/232163

Log:
  Attempt to further fix some of the concurrency/reset issues that occur.
  
  * ath_reset() is being called in softclock context, which may have the
    thing sleep on a lock.  To avoid this, since we really _shouldn't_
    be sleeping on any locks, break out the no-loss reset path into a tasklet
    and call that from:
  
    + ath_calibrate()
    + ath_watchdog()
  
    This has the added advantage that it'll end up also doing the frame
    RX cleanup from within the taskqueue context, rather than the softclock
    context.
  
  * Shuffle around the taskqueue_block() call to be before we grab the lock
    and disable interrupts.
  
    The trouble here is that taskqueue_block() doesn't block currently
    queued (but not yet running) tasks so calling it doesn't guarantee
    no further tasks (that weren't running on _A_ CPU at the time of this
    call) will complete.  Calling taskqueue_drain() on these tasks won't
    work because if any _other_ thread calls taskqueue_enqueue() for whatever
    reason, everything gets very angry and stops working.
  
    This slightly changes the race condition enough to let ath_rx_tasklet()
    run before we try disabling it, and thus quietens the warnings a bit.
  
    The (more) true solution will be doing something like the following:
  
    * having a taskqueue_blocked mask in ath_softc;
    * having an interrupt_blocked mask in ath_softc;
    * only calling taskqueue_drain() on each individual task _after_ the
      lock has been acquired - that way no further tasklet scheduling
      is going to occur.
    * Then once the tasks have been blocked _and_ the interrupt has been
      disabled, call taskqueue_drain() on each, ensuring that anything
      that _was_ scheduled or running is removed.
  
    The trouble is if something calls taskqueue_enqueue() on a task
    after taskqueue_blocked() has been called but BEFORE taskqueue_drain()
    has been called, ta_pending will be set to 1 and taskqueue_drain()
    will sit there stuck in msleep() until you hard-kill the machine.
  
  PR: kern/165382
  PR: kern/165220

Modified:
  head/sys/dev/ath/if_ath.c
  head/sys/dev/ath/if_athvar.h

Modified: head/sys/dev/ath/if_ath.c
==============================================================================
--- head/sys/dev/ath/if_ath.c	Sat Feb 25 18:48:06 2012	(r232162)
+++ head/sys/dev/ath/if_ath.c	Sat Feb 25 19:12:54 2012	(r232163)
@@ -159,6 +159,7 @@ static void	ath_beacon_proc(void *, int)
 static struct ath_buf *ath_beacon_generate(struct ath_softc *,
 			struct ieee80211vap *);
 static void	ath_bstuck_proc(void *, int);
+static void	ath_reset_proc(void *, int);
 static void	ath_beacon_return(struct ath_softc *, struct ath_buf *);
 static void	ath_beacon_free(struct ath_softc *);
 static void	ath_beacon_config(struct ath_softc *, struct ieee80211vap *);
@@ -392,6 +393,7 @@ ath_attach(u_int16_t devid, struct ath_s
 	TASK_INIT(&sc->sc_rxtask, 0, ath_rx_tasklet, sc);
 	TASK_INIT(&sc->sc_bmisstask, 0, ath_bmiss_proc, sc);
 	TASK_INIT(&sc->sc_bstucktask,0, ath_bstuck_proc, sc);
+	TASK_INIT(&sc->sc_resettask,0, ath_reset_proc, sc);
 
 	/*
 	 * Allocate hardware transmit queues: one queue for
@@ -1910,9 +1912,6 @@ ath_txrx_stop_locked(struct ath_softc *s
 	ATH_UNLOCK_ASSERT(sc);
 	ATH_PCU_LOCK_ASSERT(sc);
 
-	/* Stop any new TX/RX from occuring */
-	taskqueue_block(sc->sc_tq);
-
 	/*
 	 * Sleep until all the pending operations have completed.
 	 *
@@ -2050,6 +2049,9 @@ ath_reset(struct ifnet *ifp, ATH_RESET_T
 	ATH_PCU_UNLOCK_ASSERT(sc);
 	ATH_UNLOCK_ASSERT(sc);
 
+	/* Try to (stop any further TX/RX from occuring */
+	taskqueue_block(sc->sc_tq);
+
 	ATH_PCU_LOCK(sc);
 	ath_hal_intrset(ah, 0);		/* disable interrupts */
 	ath_txrx_stop_locked(sc);	/* Ensure TX/RX is stopped */
@@ -3163,6 +3165,23 @@ ath_beacon_start_adhoc(struct ath_softc 
 }
 
 /*
+ * Reset the hardware, with no loss.
+ *
+ * This can't be used for a general case reset.
+ */
+static void
+ath_reset_proc(void *arg, int pending)
+{
+	struct ath_softc *sc = arg;
+	struct ifnet *ifp = sc->sc_ifp;
+
+#if 0
+	if_printf(ifp, "%s: resetting\n", __func__);
+#endif
+	ath_reset(ifp, ATH_RESET_NOLOSS);
+}
+
+/*
  * Reset the hardware after detecting beacons have stopped.
  */
 static void
@@ -5387,6 +5406,12 @@ ath_chan_set(struct ath_softc *sc, struc
 	int ret = 0;
 
 	/* Treat this as an interface reset */
+	ATH_PCU_UNLOCK_ASSERT(sc);
+	ATH_UNLOCK_ASSERT(sc);
+
+	/* (Try to) stop TX/RX from occuring */
+	taskqueue_block(sc->sc_tq);
+
 	ATH_PCU_LOCK(sc);
 	ath_hal_intrset(ah, 0);		/* Stop new RX/TX completion */
 	ath_txrx_stop_locked(sc);	/* Stop pending RX/TX completion */
@@ -5526,18 +5551,10 @@ ath_calibrate(void *arg)
 			DPRINTF(sc, ATH_DEBUG_CALIBRATE,
 				"%s: rfgain change\n", __func__);
 			sc->sc_stats.ast_per_rfgain++;
-			/*
-			 * Drop lock - we can't hold it across the
-			 * ath_reset() call. Instead, we'll drop
-			 * out here, do a reset, then reschedule
-			 * the callout.
-			 */
-			callout_reset(&sc->sc_cal_ch, 1, ath_calibrate, sc);
 			sc->sc_resetcal = 0;
 			sc->sc_doresetcal = AH_TRUE;
-			ATH_UNLOCK(sc);
-			ath_reset(ifp, ATH_RESET_NOLOSS);
-			ATH_LOCK(sc);
+			taskqueue_enqueue(sc->sc_tq, &sc->sc_resettask);
+			callout_reset(&sc->sc_cal_ch, 1, ath_calibrate, sc);
 			return;
 		}
 		/*
@@ -6187,11 +6204,12 @@ ath_watchdog(void *arg)
 
 	/*
 	 * We can't hold the lock across the ath_reset() call.
+	 *
+	 * And since this routine can't hold a lock and sleep,
+	 * do the reset deferred.
 	 */
 	if (do_reset) {
-		ATH_UNLOCK(sc);
-		ath_reset(sc->sc_ifp, ATH_RESET_NOLOSS);
-		ATH_LOCK(sc);
+		taskqueue_enqueue(sc->sc_tq, &sc->sc_resettask);
 	}
 
 	callout_schedule(&sc->sc_wd_ch, hz);

Modified: head/sys/dev/ath/if_athvar.h
==============================================================================
--- head/sys/dev/ath/if_athvar.h	Sat Feb 25 18:48:06 2012	(r232162)
+++ head/sys/dev/ath/if_athvar.h	Sat Feb 25 19:12:54 2012	(r232163)
@@ -501,6 +501,7 @@ struct ath_softc {
 	struct ath_txq		*sc_cabq;	/* tx q for cab frames */
 	struct task		sc_bmisstask;	/* bmiss int processing */
 	struct task		sc_bstucktask;	/* stuck beacon processing */
+	struct task		sc_resettask;	/* interface reset task */
 	enum {
 		OK,				/* no change needed */
 		UPDATE,				/* update pending */
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 5 andriyano-31 2016-02-23 07:35:07 UTC
Same problem with AR9565 (Acer Aspire E5-571G) in 10.3, but these patches do not help
Comment 6 chronophob 2017-11-21 00:16:09 UTC
Hello! I don't know if you got this bug solved at any case but I had this in FreeBSD Release 11.1. In my system this started as soon as I enabled Linux support with [linux_enable="YES"] in /etc/rc.conf. Somehow it generated a conflict with IPv6 because having wireless connection with IPv4 only didn't generate this bug.

The best solution to solve this with is to enable Gateway in /etc/rc.conf with:

gateway_enable="YES"
ipv6_gateway_enable="YES"

to be extra sure I also added

ipv6_activate_all_interfaces="YES"
ip6addrctl_policy="ipv6_prefer"

to make this post complete this is also in my /etc/rc.conf

wlans_ath0="wlan0"
ifconfig_wlan0="WPA DHCP"
create_args_wlan0="country SE regdomain ETSI"

ifconfig_wlan0_ipv6="inet6 accept_rtadv"

- - -

If this does not work for your system you can check if Linux support is enabled and in that case turn it OFF, or keep it ON but only use wireless IPv4.

Hopefully this will help.


Yours sincerely,

JT (Aspiring Daemon)
Comment 7 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:47:37 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 8 Daniel Menelkir 2021-05-07 02:30:47 UTC
Happening to me using FreeBSD 13.0 with AR9485 Atheros. I've had this error (in comment #0) and makes me unable to connect. It works just fine if I disable ipv6.