Bug 237072 - netgraph(4): performance issue [on HardenedBSD]?
Summary: netgraph(4): performance issue [on HardenedBSD]?
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-STABLE
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: needs-qa, performance
Depends on:
Blocks:
 
Reported: 2019-04-07 14:35 UTC by Larry Rosenman
Modified: 2019-04-10 03:17 UTC
CC: 6 users

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
dmesg.boot from this hardware (14.58 KB, text/plain)
2019-04-07 14:52 UTC, Larry Rosenman
netgraph creation script that's run at boot (3.23 KB, text/plain)
2019-04-07 14:56 UTC, Larry Rosenman
Speed Test on the i3 (496.20 KB, image/png)
2019-04-08 23:39 UTC, Larry Rosenman
dmesg.boot from i3-7100U (FW6B) (7.52 KB, text/plain)
2019-04-08 23:46 UTC, Larry Rosenman

Description Larry Rosenman freebsd_committer 2019-04-07 14:35:43 UTC
From my post to the #hardenedbsd IRC channel, but this *MAY* be a generic netgraph(4) perf issue.


<ler> Hey all: I'm having a performance issue with netgraph(4) on OPNSense (based on HBSD-11.2).  
<ler> I've got https://github.com/aus/pfatt loading / configuring netgraph(4), and a 1g/1g ATT Fiber link.
<ler> with this setup, my Download side is ~600 meg.
<ler> with my previous setup, a ubiquiti USG, I got ~900+ meg
<ler> I'm currently using a https://protectli.com/product/fw4a/ as the firewall
<ler> I also have a https://protectli.com/product/fw6b/ coming monday
<ler> I'm using OPNSense's 19.1.4-netmap kernel, and have done some of the em(4) tuning from their IPS/IDS forum topic.
<ler> The tuning DEFINITELY helped the UPLOAD side, but did nothing for the DOWNLOAD side.
<ler> I'm looking for ideas on how to troubleshoot this.
<ler> I'm more than happy to work with whoever, including making access available to the FW.
<ler> the manufacturer (protectli.com) has run iperf tests on the raw hardware (without the netgraph(4) stuff in play) and gets ~940meg across it.
<ler> (using OPNSense 19.1.4)
<ler> so that's why I'm pointing the finger at netgraph(4).
<ler> I see the same issue on a https://protectli.com/product/fw1/ 
<ler> Let me know what all you need/want, and I'll supply it.
- {Day changed to April 7, 2019}
<Ellenor> Can your installation be done without netgraph?
<ler> No. Needs to be there to do the 802.1X dance with the ATT ONT.  Read the README at https://github.com/aus/pfatt for more details.
<ler> And tag all the packets going to the ONT as VLAN0.
<Ellenor> Ok.
<lattera> ler: for opnsense questions, #opnsense would be the right channel
<ler> This is more a hardenedbsd perf issue.
<lattera> did you test in freebsd?
<ler> No, as this is a FW issue, and I refuse to try pfSense.
<ler> I have philosophical issues with them.
<lattera> understood. but, to help me out, I'd prefer if you tested your setup with vanilla fbsd. hbsd's only changes to the networking stack were to use ipv6 privacy extensions by default
<ler> and ASLR
<lattera> ASLR affects userland, not kernel, which is where the bulk of network performance occurs
<lattera> https://github.com/HardenedBSD/hardenedBSD/wiki#generic-system-hardening
<lattera> we do set net.inet.ip.random_id, too
<lattera> so you could unset that to start with, I guess
<ler> it's definitely netgraph(4) related, see the statement above re: protectli testing on the same HW/SW w/o netgraph(4), and getting expected perf.
<ler> I doubt that random_id is affecting this, for that same reason.
<lattera> agreed. can you file a bug report with us? https://github.com/HardenedBSD/hardenedBSD/issues
<ler> sure.
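
For anyone wanting to A/B the two knobs mentioned above: random_id is a runtime sysctl, and the em(4) receive tuning lives in loader tunables. A minimal sketch (the rx_process_limit value is one illustrative example of the sort of knob discussed in the OPNsense IPS/IDS thread, not a copy of that tuning):

$ sysctl net.inet.ip.random_id=0     # HardenedBSD defaults this to 1
$ grep em /boot/loader.conf.local
hw.em.rx_process_limit="-1"          # example: let the em(4) rx handler drain the whole ring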
Comment 1 Larry Rosenman freebsd_committer 2019-04-07 14:50:27 UTC
summary:
- native (non-netgraph) performance on this hardware is the expected ~940 Mbps
- I need to add netgraph(4) into the hot path to appease the AT&T Fiber ONT
- adding netgraph(4) drops the download side to ~600 Mbps
- I'll attach a dmesg.boot, as well as the netgraph script.
-netgraph diagram: https://www.lerctr.org/~ler/ng.png
-documentation on why: https://github.com/aus/pfatt
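
For anyone who doesn't want to open the attachment, a minimal sketch of the kind of wiring involved (node/hook names here are illustrative, em0 is assumed to be the ONT-facing NIC; the attached script and the pfatt README are authoritative):

$ ngctl mkpeer em0: vlan lower downstream       # splice an ng_vlan node onto em0's wire-side hook
$ ngctl name em0:lower vlanfilter
$ ngctl mkpeer vlanfilter: eiface vlan0 ether   # hang a virtual NIC (ng_eiface) off a VID-0 hook
$ ngctl msg vlanfilter: addfilter '{ vlan=0 hook="vlan0" }'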
Comment 2 Larry Rosenman freebsd_committer 2019-04-07 14:52:05 UTC
Created attachment 203441 [details]
dmesg.boot from this hardware
Comment 3 Larry Rosenman freebsd_committer 2019-04-07 14:53:33 UTC
pciconf -lv:
$ sudo pciconf -lv
hostb0@pci0:0:0:0:	class=0x060000 card=0x22128086 chip=0x0f008086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x030000 card=0x22128086 chip=0x0f318086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor Z36xxx/Z37xxx Series Graphics & Display'
    class      = display
    subclass   = VGA
ahci0@pci0:0:19:0:	class=0x010601 card=0x0f238086 chip=0x0f238086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
xhci0@pci0:0:20:0:	class=0x0c0330 card=0x0f358086 chip=0x0f358086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI'
    class      = serial bus
    subclass   = USB
none0@pci0:0:26:0:	class=0x108000 card=0x0f188086 chip=0x0f188086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine'
    class      = encrypt/decrypt
pcib1@pci0:0:28:0:	class=0x060400 card=0x0f488086 chip=0x0f488086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:28:1:	class=0x060400 card=0x0f4a8086 chip=0x0f4a8086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series PCI Express Root Port 2'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:2:	class=0x060400 card=0x0f4c8086 chip=0x0f4c8086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series PCI Express Root Port 3'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:3:	class=0x060400 card=0x0f4e8086 chip=0x0f4e8086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series PCI Express Root Port 4'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:	class=0x060100 card=0x0f1c8086 chip=0x0f1c8086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor Z36xxx/Z37xxx Series Power Control Unit'
    class      = bridge
    subclass   = PCI-ISA
none1@pci0:0:31:3:	class=0x0c0500 card=0x0f128086 chip=0x0f128086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom Processor E3800 Series SMBus Controller'
    class      = serial bus
    subclass   = SMBus
em0@pci0:1:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em1@pci0:2:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em2@pci0:3:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em3@pci0:4:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
$
Comment 4 Larry Rosenman freebsd_committer 2019-04-07 14:56:18 UTC
Created attachment 203442 [details]
netgraph creation script that's run at boot
Comment 5 Larry Rosenman freebsd_committer 2019-04-07 15:02:36 UTC
As stated in the description, I'm more than willing to supply access to the box to a qualified developer.

This is my *ONLY* Internet connection, and I really would love to get the full 1g/1g I know I can get from ATT Fiber on this hardware.

I know the rest of my infrastructure can do it, as with my previous firewall (a Ubiquiti USG) I got ~900/~900.
Comment 6 Larry Rosenman freebsd_committer 2019-04-07 15:10:34 UTC
see also:
https://github.com/HardenedBSD/hardenedBSD/issues/376
Comment 7 Larry Rosenman freebsd_committer 2019-04-08 17:22:52 UTC
Brent from Protectli is sending me an additional FW4A and memory/disk for my FW1.  This means I should be able to:
1) do testing here
2) give remote access to both to a dev with either pfSense, OPNsense, or raw FreeBSD on it

Please let me know what else y'all need (desired tests, etc).
Comment 8 Larry Rosenman freebsd_committer 2019-04-08 23:38:41 UTC
I got the FW6B (i3-7100U) based box today, and moved the exact same SSD to it.

The performance is more in line with what I would expect.

919 Mbps down / 942 Mbps up.

I still want to investigate why the E3845 has such a sharp dropoff with netgraph(4) in play.
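
A quick way to check whether the E3845 case is single-thread-bound while a speed test runs (netgraph does its deferred work in kernel ng_queue threads, alongside the NIC's interrupt/taskqueue threads):

$ top -SH             # -S: kernel processes, -H: threads; look for an ng_queue or
                      # em taskqueue/interrupt thread pinned near 100%
$ vmstat -i           # per-device interrupt totals and rates
$ netstat -I em0 -w 1 # per-second packet/error counters on the WAN port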
Comment 9 Larry Rosenman freebsd_committer 2019-04-08 23:39:09 UTC
Created attachment 203507 [details]
Speed Test on the i3
Comment 10 Larry Rosenman freebsd_committer 2019-04-08 23:46:35 UTC
Created attachment 203508 [details]
dmesg.boot from i3-7100U (FW6B)
Comment 11 Phillip R. Jaenke 2019-04-09 00:17:39 UTC
This does surprise me somewhat, as the performance is definitely NOT as expected based on other comparative benchmarks. I think this may actually be a severe CPU performance regression.

The processor involved is an Atom E3845 @ 1.9GHz. Using Passmark as a baseline for comparison, we should expect performance to be approximately 55-60% of an Atom C2758 SoC, within margin of error of an AMD GX-412HC (PCEngines APU2 with i210), and at minimum approximately 3x faster than an AMD G-T40E (PCEngines APU1 with Realtek). This should reasonably apply to both forwarding and firewalling throughput.

Looking at BSDRP's results to establish reasonable expectations, what we instead see is the E3845 managing a peak throughput roughly comparable to the AMD G-T40E, missing the expected mark by 40% or more. Switching to the i3-7100U (slightly faster than the C2758), meanwhile, yields an immediate performance gain of GREATER than 100% (likely quantifiable as more than 150% total).

Based on that, I think this might actually be exposing some flavor of regression. Independent benchmarks of various E3845 appliances running 11.2 put the expected numbers for NAT firewalling at 800-900 Mbps with no tuning (and encrypted traffic at a whole 300 Mbps best case with the CPU completely pegged). What was observed here was 500-600 Mbps with <50% total CPU utilization. It would seem to me that outside of a regression, one of those numbers (either, really) should be higher.
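
For scale, a back-of-the-envelope check: at 1 Gbps with full-size frames (1538 bytes on the wire, including preamble and inter-frame gap), line rate is about 10^9 / (1538 x 8) = ~81,000 packets/sec, which gives a 1.9 GHz core roughly 1.9e9 / 81,000 = ~23,000 cycles per packet. A serialized per-packet path (a single netgraph queue thread, say) can therefore become the bottleneck well before aggregate CPU utilization looks high, which would also be consistent with 500-600 Mbps at <50% total CPU.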
Comment 12 Larry Rosenman freebsd_committer 2019-04-09 00:28:59 UTC
Remember, brent@protectli.com has run iperf on the bare hardware with both pfSense and OPNsense, and gets the expected throughput.

My issue comes when I add https://github.com/aus/pfatt and the netgraph(4) stuff.
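
For reproducing that comparison, the same measurement works with the graph loaded and unloaded; a sketch using iperf3 from ports (host names are placeholders):

$ iperf3 -s                            # run on the far end
$ iperf3 -c <far-end-host> -t 30       # measures the upload direction
$ iperf3 -c <far-end-host> -t 30 -R    # -R reverses the stream, i.e. download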
Comment 13 Larry Rosenman freebsd_committer 2019-04-09 02:25:45 UTC
pciconf -lv from the i3-7100U:
$ pciconf -lv
hostb0@pci0:0:0:0:	class=0x060000 card=0x20158086 chip=0x59048086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x030000 card=0x00008086 chip=0x59168086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'HD Graphics 620'
    class      = display
    subclass   = VGA
xhci0@pci0:0:20:0:	class=0x0c0330 card=0x72708086 chip=0x9d2f8086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP USB 3.0 xHCI Controller'
    class      = serial bus
    subclass   = USB
none0@pci0:0:22:0:	class=0x078000 card=0x19998086 chip=0x9d3a8086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP CSME HECI'
    class      = simple comms
ahci0@pci0:0:23:0:	class=0x010601 card=0x72708086 chip=0x9d038086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
pcib1@pci0:0:28:0:	class=0x060400 card=0x72708086 chip=0x9d108086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:28:1:	class=0x060400 card=0x72708086 chip=0x9d118086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:2:	class=0x060400 card=0x72708086 chip=0x9d128086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:3:	class=0x060400 card=0x72708086 chip=0x9d138086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:28:4:	class=0x060400 card=0x72708086 chip=0x9d148086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib6@pci0:0:28:5:	class=0x060400 card=0x72708086 chip=0x9d158086 rev=0xf1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:	class=0x060100 card=0x72708086 chip=0x9d4e8086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
none1@pci0:0:31:2:	class=0x058000 card=0x72708086 chip=0x9d218086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP PMC'
    class      = memory
none2@pci0:0:31:4:	class=0x0c0500 card=0x72708086 chip=0x9d238086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-LP SMBus'
    class      = serial bus
    subclass   = SMBus
em0@pci0:1:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em1@pci0:2:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em2@pci0:3:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em3@pci0:4:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em4@pci0:5:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
em5@pci0:6:0:0:	class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
$
Comment 14 Larry Rosenman freebsd_committer 2019-04-10 01:09:29 UTC
More information from Austin Robertson (aus on github):
Hey Larry,

I hope you don't mind me emailing you, but I've seen you've been posting about an issue similar to one I experienced regarding netgraph performance with pfatt. (I had seen some freebsd.org bug report referrals in my GitHub repo's traffic analytics.)

In my experience, the netgraph configuration in pfatt can max out a single core when reaching gigabit speeds. In some cases the CPU's single-core performance can handle it; in others it cannot, and speed suffers.

In the case of another user, their C2758 CPU wasn't getting full gigabit performance. Upgrading to a beefier E3-1230v6 got them full line speed.

When being throttled by the CPU, I see a high percentage of interrupts (relative to core count) against the NIC via systat -vmstat. I suspect the extra packet processing isn't hardware-accelerated by the NIC and is instead handled in kernel space by netgraph.

BSD and performance aren't my expertise, and you seem to be more savvy in those areas, so I thought I'd pass along my experience. If you come up with a solution, I'd definitely like to hear it!
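
For reference, the interrupt load Austin describes can be watched live during a speed test:

$ systat -vmstat 1    # per-second interrupt rates and CPU state breakdown
$ top -SH -o cpu      # sorted by CPU; see whether a single ng_queue or em0
                      # interrupt thread sits near 100% while total CPU stays low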