Bug 160992 - buf_ring(9) statistics accounting not MPSAFE
Summary: buf_ring(9) statistics accounting not MPSAFE
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-24 21:20 UTC by Arnaud
Modified: 2017-12-31 22:32 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Arnaud 2011-09-24 21:20:10 UTC
The following block of code, in `sys/sys/buf_ring.h':


       /*
        * If there are other enqueues in progress
        * that preceeded us, we need to wait for them
        * to complete
        */
       while (br->br_prod_tail != prod_head)
               cpu_spinwait();
       br->br_prod_bufs++;
       br->br_prod_bytes += nbytes;
       br->br_prod_tail = prod_next;
       critical_exit();


can be seen at runtime, memory-wise as:

      while (br->br_prod_tail != prod_head)
              cpu_spinwait();
      br->br_prod_tail = prod_next;
      br->br_prod_bufs++;
      br->br_prod_bytes += nbytes;
      critical_exit();


That is, there is no memory barrier to enforce completion of the
load/increment/store/load/load/addition/store operations before
updating what other thread spin on. 

Even if `br_prod_tail' is marked `volatile', there is no guarantee that
it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs' and
`br_prod_bytes').


Confirmed by Kip Macy (kmacy@) in
http://lists.freebsd.org/pipermail/freebsd-hackers/2011-September/036454.html.

How-To-Repeat: code review.
Comment 1 Bruce Evans freebsd_committer freebsd_triage 2011-09-25 04:01:04 UTC
On Sat, 24 Sep 2011, Arnaud Lacombe wrote:

>> Description:
> The following block of code, in `sys/sys/buf_ring.h':
>
>
>       /*
>        * If there are other enqueues in progress
>        * that preceeded us, we need to wait for them
>        * to complete
>        */
>       while (br->br_prod_tail != prod_head)
>               cpu_spinwait();
>       br->br_prod_bufs++;
>       br->br_prod_bytes += nbytes;
>       br->br_prod_tail = prod_next;
>       critical_exit();
>
>
> can be seen at runtime, memory-wise as:
>
>      while (br->br_prod_tail != prod_head)
>              cpu_spinwait();
>      br->br_prod_tail = prod_next;
>      br->br_prod_bufs++;
>      br->br_prod_bytes += nbytes;
>      critical_exit();
>
> That is, there is no memory barrier to enforce completion of the
> load/increment/store/load/load/addition/store operations before
> updating what other thread spin on.

The counters are 64 bits, so it also does non-atomic increments of them
no 32-bit arches.

> Even if `br_prod_tail' is marked `volatile', there is no guarantee
> that it will not be re-ordered wrt. non-volatile write (to `br_prod_bufs'
> and `br_prod_bytes').

Using volatile is generally bogus.  Here it seems to mainly give
pessimizations and more opportunities for bad memory orders.  The i386
code for incrementing a 64-bit volatile x is:

 	movl	x, %eax
 	movl	x+4, %edx
 	addl	$1, %eax
 	adcl	$0, %edx
 	movl	%eax, x
 	movl	%edx, x+4

while for a 64-bit non-volatile it is:

 	addl	$1, x
 	adcl	$0, x+4

so volatile gives more caching in registers instead of less.  The following
are some of the bad memory orders possible:

with volatile:
        lo = br->br_prod_bytes.lo;
        hi = br->br_prod_bytes.hi;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        lo += nbytes;
        hi += carry;
        br->br_prod_bytes.hi = hi;
        br->br_prod_bytes.lo = lo;

without volatile:

        br->br_prod_bytes.lo += nbytes;
        br->br_prod_tail = prod_next;
        br->br_prod_bufs++;
        br->br_prod_bytes.hi += carry;

I think the token method would make the nonatomic accesses to the
counters sufficiently atomic if it worked.  The necessary memory
barriers would probably have a memory clobber which effectively makes
all memory variables transiently volatile (where volatile actually
means non-volatile with respect to them changing -- holding the token
prevents them changing -- but actually means volatile with respect to
their memory accesses) so the effect of declaring the counters permanently
volatile would be reduced to a pessimization.  Even reads of them in
sysctls must hold the token to get a consistent snapshot.

Bruce
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:07 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped