Bug 194314 - [ixgbe] driver makes some dangerous assumptions with struct mbuf sizing with IXGBE_RX_COPY_LEN
Summary: [ixgbe] driver makes some dangerous assumptions with struct mbuf sizing with ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: Normal Affects Many People
Assignee: Enji Cooper
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2014-10-12 09:13 UTC by Enji Cooper
Modified: 2015-07-30 05:59 UTC (History)
8 users (show)

See Also:
ngie: mfc-stable10+
ngie: mfc-stable9+
ngie: mfc-stable8?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Enji Cooper freebsd_committer freebsd_triage 2014-10-12 09:13:04 UTC
The code in sys/dev/ixgbe/ixgbe.h assumes that MLEN is always > 160, and doesn't exceed the size of the mbuf. MSIZE is set to 256, so if MHLEN = MSIZE - sizeof(struct m_hdr) - sizeof(struct pkthdr) < 160, ixgbe will scribble over the header information in mbufs. Similarly, if IXGBE_RX_COPY_LEN is greater than the size of the mbuf, it will scribble over other memory, potentially in the same mbuf chain, or elsewhere.

This optimization needs better bounds checking/handling.

 155 /*
 156  * Used for optimizing small rx mbufs.  Effort is made to keep the copy
 157  * small and aligned for the CPU L1 cache.
 158  * 
 159  * MHLEN is typically 168 bytes, giving us 8-byte alignment.  Getting
 160  * 32 byte alignment needed for the fast bcopy results in 8 bytes being
 161  * wasted.  Getting 64 byte alignment, which _should_ be ideal for
 162  * modern Intel CPUs, results in 40 bytes wasted and a significant drop
 163  * in observed efficiency of the optimization, 97.9% -> 81.8%.
 164  */
 165 #define IXGBE_RX_COPY_LEN       160
 166 #define IXGBE_RX_COPY_ALIGN     (MHLEN - IXGBE_RX_COPY_LEN)

  60  * MLEN is data length in a normal mbuf.
  61  * MHLEN is data length in an mbuf with pktheader.
  62  * MINCLSIZE is a smallest amount of data that should be put into cluster.
  63  */
  64 #define MLEN            ((int)(MSIZE - sizeof(struct m_hdr)))
  65 #define MHLEN           ((int)(MLEN - sizeof(struct pkthdr)))
  66 #define MINCLSIZE       (MHLEN + 1)
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2014-10-12 09:15:01 UTC
Credit goes to Anton Rang @ Isilon for finding the bug.
Comment 2 Adrian Chadd freebsd_committer freebsd_triage 2014-10-12 20:07:27 UTC
Hi!

Does he/Isilon have a patch to fix/address this?


-adrian
Comment 3 Enji Cooper freebsd_committer freebsd_triage 2014-10-13 04:21:19 UTC
(In reply to Adrian Chadd from comment #2)
> Hi!
> 
> Does he/Isilon have a patch to fix/address this?

We don't yet. This was a recently discovered issue on a driver that isn't available in HEAD yet...
Comment 4 Enji Cooper freebsd_committer freebsd_triage 2014-10-15 00:12:31 UTC
(In reply to Garrett Cooper from comment #3)
> We don't yet. This was a recently discovered issue on a driver that isn't
> available in HEAD yet...

Note:

This bug is only harmful if you modify any of the structures that comprise `struct mbuf` as the observation hasn't been violated on FreeBSD CURRENT [yet]. Isilon runs into this issue because we add additional fields to track mbufs.

Status update:

Isilon is waiting for a fix from upstream (Intel); we will work with Intel to get the fix tested and pushed into FreeBSD.
Comment 5 Enji Cooper freebsd_committer freebsd_triage 2014-10-15 00:14:15 UTC
(In reply to Garrett Cooper from comment #4)
> `struct mbuf` as the observation hasn't been violated on FreeBSD CURRENT

"as the observation hasn't" -> "; the conditions for the optimization to work haven't"
Comment 6 Eric Joyner 2014-10-16 21:19:08 UTC
Does that mean only the sizes of struct m_hdr and struct m_pkthdr change, or do you change MSIZE as well?
Comment 7 Enji Cooper freebsd_committer freebsd_triage 2014-10-16 21:29:42 UTC
(In reply to Eric Joyner from comment #6)
> Does that mean only the sizes of struct m_hdr and struct m_pkthdr change, or
> do you change MSIZE as well?

The size of struct m_ext and struct m_hdr is larger than stock FreeBSD, with certain compile-time options.

However, you can technically tweak MSIZE though so it's less than 160, which would also reproduce the situation. That scenario seems kind of silly though.
Comment 8 Eric Joyner 2014-10-16 21:37:31 UTC
So it sounds like we may just need to change IXGBE_RX_COPY_LEN from a fixed value of 160 to something that's calculated based on the length of struct m_hdr and struct m_pkthdr combined.

I think they got 160 by adding together the size of the two header structs (88 in FreeBSD amd64), adding some number (8) to make that size divisible by 32 (96 % 32 == 0), and then subtracted that from MSIZE (256) to get IXGBE_RX_COPY_LEN (160), while keeping the padding added as IXGBE_RX_COPY_ALIGN.
Comment 9 Eric Joyner 2014-10-16 23:41:02 UTC
You could try replacing the two lines in ixgbe.h with these:

#define IXGBE_RX_COPY_HDR       (sizeof(struct m_hdr) + sizeof(struct pkthdr))
#define IXGBE_RX_COPY_HDR_PADDED ((((IXGBE_RX_COPY_HDR - 1) / 32) + 1) * 32)
#define IXGBE_RX_COPY_LEN       (MSIZE - IXGBE_RX_COPY_HDR_PADDED)
#define IXGBE_RX_COPY_ALIGN     (IXGBE_RX_COPY_HDR_PADDED - IXGBE_RX_COPY_HDR)
Comment 10 Enji Cooper freebsd_committer freebsd_triage 2014-10-17 21:42:06 UTC
(In reply to Eric Joyner from comment #8)
> So it sounds like we may just need to change IXGBE_RX_COPY_LEN from a fixed
> value of 160 to something that's calculated based on the length of struct
> m_hdr and struct m_pkthdr combined.
> 
> I think they got 160 by adding together the size of the two header structs
> (88 in FreeBSD amd64), adding some number (8) to make that size divisible by
> 32 (96 % 32 == 0), and then subtracted that from MSIZE (256) to get
> IXGBE_RX_COPY_LEN (160), while keeping the padding added as
> IXGBE_RX_COPY_ALIGN.

A couple questions then:

- What architectures is this driver supported on?
- Is this optimization only valid for amd64?
- Why 32?
Comment 11 Enji Cooper freebsd_committer freebsd_triage 2014-10-18 00:15:59 UTC
(In reply to Garrett Cooper from comment #10)
...
> - What architectures is this driver supported on?

What I was really driving at with this question is "What architectures does Intel support this driver on?"
Comment 12 Eric Joyner 2014-10-20 18:13:38 UTC
(In reply to Garrett Cooper from comment #10)
> (In reply to Eric Joyner from comment #8)
> > So it sounds like we may just need to change IXGBE_RX_COPY_LEN from a fixed
> > value of 160 to something that's calculated based on the length of struct
> > m_hdr and struct m_pkthdr combined.
> > 
> > I think they got 160 by adding together the size of the two header structs
> > (88 in FreeBSD amd64), adding some number (8) to make that size divisible by
> > 32 (96 % 32 == 0), and then subtracted that from MSIZE (256) to get
> > IXGBE_RX_COPY_LEN (160), while keeping the padding added as
> > IXGBE_RX_COPY_ALIGN.
> 
> A couple questions then:
> 
> - What architectures is this driver supported on?
> - Is this optimization only valid for amd64?
> - Why 32?

- At least i386 and amd64, but it should theoretically work on anything with a PCI-E slot
- This optimization specifically was contributed by scottl/luigi, looking at the commit logs. You should ask either of those two for details.
- Doesn't the comment in the code you posted answer that?
Comment 13 Eric Joyner 2015-01-28 00:17:18 UTC
Has this error gone away with the code change, or did something else happen?
Comment 14 Enji Cooper freebsd_committer freebsd_triage 2015-01-28 00:42:26 UTC
(In reply to Eric Joyner from comment #13)

I haven't tested the change yet, but thanks for the reminder (I'll verify it this
week).

If it works, do I have Intel's blessing for committing the fix, or should jfv do
the commit?
Comment 15 Enji Cooper freebsd_committer freebsd_triage 2015-01-29 22:12:53 UTC
(In reply to Garrett Cooper,425-314-3911 from comment #14)

Hi Eric!

I confirmed that the fix you provided worked for our environment. Unfortunately struct mbuf lacks some of the fields we have in it so it doesn't work with FreeBSD proper. I believe this patch would work for CURRENT, based on my basic C program:

$ clang -Wall -Wno-c++1z-extensions -std=c11 -o ~/test_ixgbe_defines ~/test_ixgbe_defines.c
$ cat ~/test_ixgbe_defines.c 
#include <sys/cdefs.h>
#include <sys/param.h>
#include <sys/mbuf.h>
#include <assert.h>

#ifndef offsetof
#define offsetof        __offsetof
#endif

#define IXGBE_RX_COPY_HDR_PADDED        ((((MPKTHSIZE - 1) / 32) + 1) * 32)
#define IXGBE_RX_COPY_LEN       (MSIZE - IXGBE_RX_COPY_HDR_PADDED)
#define IXGBE_RX_COPY_ALIGN     (IXGBE_RX_COPY_HDR_PADDED - MPKTHSIZE)

static_assert(IXGBE_RX_COPY_LEN == 160);
static_assert(IXGBE_RX_COPY_ALIGN == (MHLEN - IXGBE_RX_COPY_LEN));

int
main(void)
{

        return (0);
}

I'm going to try this out internally, but could you and Adrian (or Randall, because Netflix runs CURRENT-ish and I believe uses ixgbe in some cases) please look at this as well to make sure that it works accordingly?

diff --git a/sys/dev/ixgbe/ixgbe.h b/sys/dev/ixgbe/ixgbe.h
index f7e9efa..1ddeebb 100644
--- a/sys/dev/ixgbe/ixgbe.h
+++ b/sys/dev/ixgbe/ixgbe.h
@@ -162,8 +162,10 @@
  * modern Intel CPUs, results in 40 bytes wasted and a significant drop
  * in observed efficiency of the optimization, 97.9% -> 81.8%.
  */
-#define IXGBE_RX_COPY_LEN      160
-#define IXGBE_RX_COPY_ALIGN    (MHLEN - IXGBE_RX_COPY_LEN)
+
+#define        IXGBE_RX_COPY_HDR_PADDED ((((MPKTHSIZE - 1) / 32) + 1) * 32)
+#define        IXGBE_RX_COPY_LEN       (MSIZE - IXGBE_RX_COPY_HDR_PADDED)
+#define        IXGBE_RX_COPY_ALIGN     (IXGBE_RX_COPY_HDR_PADDED - MPKTHSIZE)
 
 /* Keep older OS drivers building... */
 #if !defined(SYSCTL_ADD_UQUAD)
Comment 16 Enji Cooper freebsd_committer freebsd_triage 2015-02-20 03:34:02 UTC
Ping.
Comment 17 Enji Cooper freebsd_committer freebsd_triage 2015-02-26 21:52:53 UTC
Ping?

I'll post the CR on phabricator if need be so it gets reviewed/committed to head.
Comment 18 commit-hook freebsd_committer freebsd_triage 2015-02-28 14:58:07 UTC
A commit references this bug:

Author: ngie
Date: Sat Feb 28 14:57:58 UTC 2015
New revision: 279393
URL: https://svnweb.freebsd.org/changeset/base/279393

Log:
  Pad RX copy alignment calculation to avoid illegal memory accesses

  The optimization made in r239940 is valid for struct mbuf's current structure
  and size in FreeBSD, but hardcodes assumptions about sizes of struct mbuf,
  which are unfortunately broken if additional data is added to the beginning of
  struct mbuf

  X-MFC note (discussed with rwatson):

  This change requires the MPKTHSIZE definition, which is only available after
  head@r277203 and will not be MFCed as it breaks mbuf(9) KPI.

  A direct commit to stable/10 and merges to other branches to add the necessary
  definitions to work with the code as-is will be done to facilitate this MFC

  PR: 194314
  MFC after: 2 weeks
  Approved/Reviewed by: erj, jfv
  Sponsored by: EMC / Isilon Storage Division

Changes:
  head/sys/dev/ixgbe/ixgbe.h
Comment 19 commit-hook freebsd_committer freebsd_triage 2015-04-24 22:19:10 UTC
A commit references this bug:

Author: ngie
Date: Fri Apr 24 22:18:51 UTC 2015
New revision: 281954
URL: https://svnweb.freebsd.org/changeset/base/281954

Log:
  MFC r279393:

  Pad RX copy alignment calculation to avoid illegal memory accesses

  The optimization made in r239940 is valid for struct mbuf's current structure
  and size in FreeBSD, but hardcodes assumptions about sizes of struct mbuf,
  which are unfortunately broken if additional data is added to the beginning of
  struct mbuf

  X-MFC note (discussed with rwatson):

  This change requires the MPKTHSIZE definition, which is only available after
  head@r277203 and will not be MFCed as it breaks mbuf(9) KPI.

  A direct commit to stable/10 and merges to other branches to add the necessary
  definitions to work with the code as-is will be done to facilitate this MFC

  PR: 194314
  Approved/Reviewed by: erj, jfv
  Sponsored by: EMC / Isilon Storage Division

Changes:
_U  stable/10/
  stable/10/sys/dev/ixgbe/ixgbe.h
Comment 20 commit-hook freebsd_committer freebsd_triage 2015-05-13 10:44:25 UTC
A commit references this bug:

Author: ngie
Date: Wed May 13 10:43:50 UTC 2015
New revision: 282847
URL: https://svnweb.freebsd.org/changeset/base/282847

Log:
  MFstable/10 r281951,r281954:

  r281951:

  Backport MHSIZE/MPKTHSIZE equivalents from head

  These macros are equivalent to the ones on head, except they are only exposed
  when _KERNEL is defined, i.e. to kernel code, whereas the code on head is exposed
  to userland as well

  This is for improved forwards compatibility with mbuf(9) macros in head@r277203+,
  and is required for a clean MFC of r279393

  This is a direct commit to stable/10

  Differential Revision: https://reviews.freebsd.org/D2126
  Reviewed by: glebius, rwatson
  Sponsored by: EMC / Isilon Storage Division

  r281954:

  MFC r279393:

  Pad RX copy alignment calculation to avoid illegal memory accesses

  The optimization made in r239940 is valid for struct mbuf's current structure
  and size in FreeBSD, but hardcodes assumptions about sizes of struct mbuf,
  which are unfortunately broken if additional data is added to the beginning of
  struct mbuf

  X-MFC note (discussed with rwatson):

  This change requires the MPKTHSIZE definition, which is only available after
  head@r277203 and will not be MFCed as it breaks mbuf(9) KPI.

  A direct commit to stable/10 and merges to other branches to add the necessary
  definitions to work with the code as-is will be done to facilitate this MFC

  PR: 194314
  Approved/Reviewed by: erj, jfv
  Sponsored by: EMC / Isilon Storage Division

Changes:
_U  stable/9/
_U  stable/9/sys/
_U  stable/9/sys/dev/
_U  stable/9/sys/dev/ixgbe/
  stable/9/sys/dev/ixgbe/ixgbe.h
_U  stable/9/sys/sys/
  stable/9/sys/sys/mbuf.h
Comment 21 Sean Bruno freebsd_committer freebsd_triage 2015-06-30 18:58:06 UTC
Is this ticket closeable now?  I'm confused on the status of patches in flight and commits.
Comment 22 Enji Cooper freebsd_committer freebsd_triage 2015-07-30 05:59:05 UTC
(In reply to Sean Bruno from comment #21)

I meant to MFC it to stable/8, but the rest of the driver hasn't been backported back that far. Just going to close this bug.

Thanks for the followup :)!