Bug 42080

Summary: (No subject)
Product: Base System Reporter: Peter Edwards <pmedwards>
Component: kernAssignee: GNATS administrator <gnats-admin>
Status: Closed FIXED    
Severity: Affects Only Me CC: doug
Priority: Normal    
Version: 1.0-RELEASE   
Hardware: Any   
OS: Any   

Description Peter Edwards 2002-08-27 14:40:01 UTC
 This is a multi-part message in MIME format.
 ---------6ELL480NZK8OPLUEHSDOKZRS
 Content-Type: text/plain
 Content-Transfer-Encoding: 7bit
 
 After almost seeing the problem and retracting my near-miss on
 -hackers and being pointed to this PR by Doug, I decided to have a
 look at that area of code again, and I understand the race condition
 properly now
 
 Doug's patch requries aquires splbio() to protect, but the code
 seems to be going to great lengths to avoid that. (contrary to an
 existing comment)
 
 The following patch should protect against the race without requiring
 splbio.  (I don't have a vinum config handy to test it, this is
 more of a mental exercise for myself). It works by making sure the
 inner and outer loops around the BUF_STRATEGY call finish as soon
 as all the BUF_STRATEGYs have been completed, rather than always
 decending down the (possibly inactive) requests. (ie, it only loops on the
 groups as long as there are groups for which we will do IO, and it only loops
 on the elements in the groups as long as there are elements for which we will
 do IO)
 -- 
 Peter Edwards.
 
 
 ---------6ELL480NZK8OPLUEHSDOKZRS
 Content-Type: text/plain
 Content-Disposition: attachment; filename="patch.txt"
 
 Index: vinumrequest.c
 ===================================================================
 RCS file: /pub/FreeBSD/development/FreeBSD-CVS/src/sys/dev/vinum/vinumrequest.c,v
 retrieving revision 1.44.2.4
 diff -u -r1.44.2.4 vinumrequest.c
 --- vinumrequest.c	3 Feb 2002 07:10:26 -0000	1.44.2.4
 +++ vinumrequest.c	27 Aug 2002 13:33:26 -0000
 @@ -299,11 +299,11 @@
  int
  launch_requests(struct request *rq, int reviveok)
  {
 -    struct rqgroup *rqg;
 +    struct rqgroup *rqg, *nextrqg;
      int rqno;						    /* loop index */
      struct rqelement *rqe;				    /* current element */
      struct drive *drive;
 -    int rcount;						    /* request count */
 +    int iocount, activegroupcount;
  
      /*
       * First find out whether we're reviving, and the
 @@ -374,7 +374,10 @@
       * This loop happens without any participation
       * of the bottom half, so it requires no
       * protection.
 +     * XXX: the update of rqg->active must mirror the
 +     * calls to BUF_STRATEGY() below.
       */
 +    activegroupcount = 0;
      for (rqg = rq->rqg; rqg != NULL; rqg = rqg->next) {	    /* through the whole request chain */
  	rqg->active = rqg->count;			    /* they're all active */
  	for (rqno = 0; rqno < rqg->count; rqno++) {
 @@ -382,28 +385,31 @@
  	    if (rqe->flags & XFR_BAD_SUBDISK)		    /* this subdisk is bad, */
  		rqg->active--;				    /* one less active request */
  	}
 -	if (rqg->active)				    /* we have at least one active request, */
 +	if (rqg->active) {				    /* we have at least one active request, */
  	    rq->active++;				    /* one more active request group */
 +	    activegroupcount++;
 +	}
      }
  
      /*
       * Now fire off the requests.  In this loop the
       * bottom half could be completing requests
 -     * before we finish, so we need splbio() protection.
 +     * before we finish, so be careful to avoid manipulating rq, and avoid
 +     * accessing it if there's a possibility that the entire request may
 +     * have finished.
       */
 -    for (rqg = rq->rqg; rqg != NULL;) {			    /* through the whole request chain */
 +    for (rqg = rq->rqg; activegroupcount != 0; rqg = nextrqg) {
 +	nextrqg = rqg->next;
  	if (rqg->lockbase >= 0)				    /* this rqg needs a lock first */
  	    rqg->lock = lockrange(rqg->lockbase, rqg->rq->bp, &PLEX[rqg->plexno]);
 -	rcount = rqg->count;
 -	for (rqno = 0; rqno < rcount;) {
 +
 +	KASSERT(rqg->count >= rqg->active, ("vinum: overactive rqg"));
 +	iocount = rqg->count;
 +	if (iocount)
 +	    activegroupcount--;
 +	for (rqno = 0; iocount != 0;) {
  	    rqe = &rqg->rqe[rqno];
  
 -	    /*
 -	     * Point to next rqg before the bottom end
 -	     * changes the structures.
 -	     */
 -	    if (++rqno >= rcount)
 -		rqg = rqg->next;
  	    if ((rqe->flags & XFR_BAD_SUBDISK) == 0) {	    /* this subdisk is good, */
  		drive = &DRIVE[rqe->driveno];		    /* look at drive */
  		drive->active++;
 @@ -429,6 +435,7 @@
  #endif
  		/* fire off the request */
  		BUF_STRATEGY(&rqe->b, 0);
 +		iocount--;
  	    }
  	}
      }
 ---------6ELL480NZK8OPLUEHSDOKZRS--
Comment 1 Giorgos Keramidas freebsd_committer freebsd_triage 2002-08-28 22:35:42 UTC
State Changed
From-To: open->closed

Followup to kern/41740 misfiled as a new PR.