After enabling RACK I notice high system load caused by a high rate of interrupts.
I compile a kernel with these options:
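(The exact options are not preserved above; for reference, enabling RACK on a 13.x kernel normally needs something along these lines in the kernel configuration file. This is my assumption of what was used, not a copy of the original config:)

makeoptions WITH_EXTRA_TCP_STACKS=1   # build the extra TCP stacks (RACK etc.) as modules
options     TCPHPTS                   # high precision timer system that RACK depends on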
I add this to /etc/sysctl.conf:
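(the exact line is not preserved here; presumably it is the setting discussed throughout this report:)
net.inet.tcp.functions_default=rack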
and this to /boot/loader.conf:
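(again presumably, matching the loader setting mentioned further down:)
tcp_rack_load="YES"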
After I reboot the servers, "top" shows 5% - 45% interrupt load (some servers show more and some less).
The issue happens with "kern.eventtimer.timer=LAPIC" too.
Then I disable RACK and switch back to LAPIC, and after a reboot "top" shows 0.0% - 0.1% interrupts.
Switching to "net.inet.tcp.functions_default=freebsd" without a reboot does not make the interrupts decrease.
Searching for similar issues I found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241958
With both LAPIC and HPET there are many interrupts.
CPU 0: 39.0% user, 0.0% nice, 6.7% system, 47.2% interrupt, 7.1% idle
CPU 1: 40.6% user, 0.0% nice, 3.5% system, 46.1% interrupt, 9.8% idle
CPU 2: 44.5% user, 0.0% nice, 5.5% system, 43.7% interrupt, 6.3% idle
CPU 3: 42.5% user, 0.0% nice, 6.7% system, 42.9% interrupt, 7.9% idle
CPU 4: 38.2% user, 0.0% nice, 5.5% system, 48.4% interrupt, 7.9% idle
CPU 5: 40.2% user, 0.0% nice, 5.1% system, 47.2% interrupt, 7.5% idle
CPU 6: 42.5% user, 0.0% nice, 5.9% system, 44.1% interrupt, 7.5% idle
CPU 7: 44.5% user, 0.0% nice, 6.7% system, 44.5% interrupt, 4.3% idle
cpu0:timer 3814 3810
cpu1:timer 3868 3864
cpu2:timer 3947 3943
cpu3:timer 3916 3912
cpu4:timer 3780 3776
cpu5:timer 3917 3913
cpu6:timer 3860 3856
cpu7:timer 3814 3810
irq120: hpet0:t0 3047 3047
irq121: hpet0:t1 3053 3053
irq122: hpet0:t2 3146 3146
irq123: hpet0:t3 3099 3099
irq124: hpet0:t4 2942 2942
irq125: hpet0:t5 3120 3120
irq126: hpet0:t6 3037 3037
irq127: hpet0:t7 3049 3049
irq129: ahci0 101 101
irq130: em0:irq0 401 401
Are you experiencing this in a VM? If yes, what is the virtualiser and what is the host OS?
Do you also experience the load when using the kernel you described, but:
* don't load the RACK stack at all?
* load the RACK stack, but don't use it at all (i.e. don't set net.inet.tcp.functions_default=rack)?
We had a report earlier, but could never reproduce it.
These are dedicated servers with no virtualisation.
em0: <Intel(R) PRO/1000 Network Connection> mem 0xf7000000-0xf701ffff irq 16 at device 31.6 on pci0
1) tcp_rack_load="YES" && "net.inet.tcp.functions_default=freebsd" && reboot = problem does NOT exist
2) tcp_rack_load="NO" && "net.inet.tcp.functions_default=freebsd" && reboot = problem does NOT exist
3) tcp_rack_load="YES" && "net.inet.tcp.functions_default=rack" && reboot = the problem EXISTS.
Then if, without a reboot, I set "net.inet.tcp.functions_default=freebsd", the problem still EXISTS.
But if, after changing to "freebsd", I run "kldunload -f tcp_rack", the problem does NOT exist.
So loading the RACK module, but not using it, does not trigger the issue.
Getting rid of all RACK based TCP connections seems to resolve the issue.
Can you do the following experiment?
1) Load RACK and use net.inet.tcp.functions_default=freebsd at boot time. You should not experience the problem.
2) Switch the stack for new connections to RACK by using sysctl net.inet.tcp.functions_default=rack. You can check the stack being used by using sockstat -SPtcp. The problem should now show up once new connections are established.
3) Switch the stack for new connections to the base stack by using sysctl net.inet.tcp.functions_default=freebsd. Either wait until the connections using RACK have been closed or kill them by using tcpdrop -S rack. This should resolve the issue.
If the behaviour is as I think it is, the problem is only there if you have active RACK connections.
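(To put the commands for steps 2 and 3 in one place; nothing new here, just the invocations mentioned above:)

sysctl net.inet.tcp.functions_default=rack     # step 2: new connections use the RACK stack
sockstat -SPtcp                                # check which stack each connection is using
sysctl net.inet.tcp.functions_default=freebsd  # step 3: new connections use the base stack again
tcpdrop -S rack                                # drop connections still handled by RACK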
To avoid a reboot I did the following, which I believe is equivalent:
I restart all services so that new connections (mostly nginx) use "rack".
I wait a few minutes and interrupts increase to ~5%.
tcpdrop -S rack
Interrupts decrease to ~0.3% (it looks like fewer connections = fewer interrupts).
New connections still use "rack", so I restart all services again.
Now all connections use "freebsd" and interrupts decrease to ~0.0%.
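Put together, my understanding of the equivalent no-reboot sequence is roughly the following (assuming nginx under rc, and assuming that listening sockets keep the stack they were created with, which would explain why a second restart is needed):

sysctl net.inet.tcp.functions_default=rack      # new sockets use the RACK stack
service nginx restart                           # recreate the listen sockets under RACK
# ...interrupt load goes up while RACK connections are active...
sysctl net.inet.tcp.functions_default=freebsd   # new sockets use the base stack again
tcpdrop -S rack                                 # drop the existing RACK connections
service nginx restart                           # recreate the listen sockets under the base stack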
Something else I notice: during the issue, if I run "sysctl kern.eventtimer.timer=HPET" the interrupts immediately increase, and if I run "sysctl kern.eventtimer.timer=LAPIC" they immediately decrease.
So it looks like active RACK connections cause the issue.
(In reply to Christos Chatzaras from comment #5)
Thanks for testing, I guess you confirmed what I assumed is causing the behaviour.
I have no idea why this happens and whether it is expected or not. I'll try to reproduce it locally and report back. I'll also bring it up on the biweekly transport call.
I was able to reproduce the issue with a "test" server and 10 VPSes running Linux. On each VPS I run wrk (a benchmarking tool):
wrk -c 1000 -d 3600s http://url
This creates 10000 concurrent connections to the "test" server.
Also, in my nginx.conf I replace "keepalive_timeout 60;" with "keepalive_timeout 0;" so connections are not reused, which makes the interrupt load show up faster.
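(The change itself is just this fragment in nginx.conf, shown here only for completeness:)

http {
    keepalive_timeout 0;   # was: keepalive_timeout 60; each request now gets its own TCP connection
}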
Then I kill all "wrk" and after 1 minute:
sockstat -sSPtcp | grep rack | wc -l
At that moment "top" shows 20% interrupts and "netstat 1" shows 1-10 packets/sec, which I believe is my SSH session, so there is no real traffic. Also at that moment the TCP "rack" connections were in the FIN_WAIT_1 and CLOSING states.
After a few minutes most "rack" connections close and "top" shows ~0.7% interrupts. At that point I had 4 stuck connections in the LAST_ACK state, which I drop using "tcpdrop -s LAST_ACK", and finally "top" shows ~0.0% interrupts.
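(To find those stuck connections before dropping them, something like this works; the -s flag makes sockstat print the protocol state:)

sockstat -sPtcp | grep LAST_ACK   # list TCP sockets sitting in LAST_ACK
tcpdrop -s LAST_ACK               # drop every connection in that state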
If it helps I can give root access to the "test" server.
(In reply to Christos Chatzaras from comment #7)
According to rrs@ (who wrote the HPTS system), the interrupt issue is known. There is code which reduces the interrupt load, but that code has not yet been committed to the tree.
Regarding stuck connections: do they clear up after some time or do they stay around?
The HPTS system is "kern.eventtimer.timer=HPET", right? With HPET I see (with "top") ~2 times more interrupts in comparison with LAPIC.
I tried again to reproduce the issue with a STABLE/13 kernel and it exists too. Next I will try with a CURRENT kernel as I see some fixes for RACK.
Regarding the "stuck" connections in the LAST_ACK state: someone tries to brute force SSH and "sshguard" blocks the connections (I have 4 "stuck" SSH connections at the moment). I also have 2 "stuck" connections on port 80 from the "wrk" benchmark. I found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=25986 which is an old PR, but @johalun replied that he could still see the issue a few months ago. They are not talking about "rack", so it is probably not related; the only relation to "rack" is that these "stuck" connections create interrupts. At the moment these connections have been "stuck" for more than 35 minutes. I also see some "stuck" connections on other servers that use the "freebsd" stack.
(In reply to Christos Chatzaras from comment #9)
> The HPTS system is "kern.eventtimer.timer=HPET", right? With HPET I see (with "top") ~2 times more interrupts in comparison with LAPIC.
No. HPTS is a system for high resolution timing, which can be used by TCP. The configuration parameters you are referring to are generic time sources.
Please note that when using HPTS with RACK, more events are generated and handled compared to the base stack. That is what HPTS is for. However, the interrupt load can be reduced by some optimisations. These are the optimisations rrs@ is referring to.
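(To see the two side by side: the generic event timers live under kern.eventtimer, while HPTS exposes its own sysctl tree, which should be net.inet.tcp.hpts, assuming the node name has not changed:)

sysctl kern.eventtimer.choice   # generic event timers available (LAPIC, HPET, ...)
sysctl kern.eventtimer.timer    # the event timer currently in use
sysctl net.inet.tcp.hpts        # HPTS tunables, only present when the kernel has TCPHPTS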
> I tried again to reproduce the issue with a STABLE/13 kernel and it exists too. Next I will try with a CURRENT kernel as I see some fixes for RACK.
releng/13 and stable/13 should be very similar with respect to RACK. All improvements are only in current right now.
> Regarding the "stuck" connections in the LAST_ACK state: someone tries to brute force SSH and "sshguard" blocks the connections (I have 4 "stuck" SSH connections at the moment). I also have 2 "stuck" connections on port 80 from the "wrk" benchmark. I found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=25986 which is an old PR, but @johalun replied that he could still see the issue a few months ago. They are not talking about "rack", so it is probably not related; the only relation to "rack" is that these "stuck" connections create interrupts. At the moment these connections have been "stuck" for more than 35 minutes. I also see some "stuck" connections on other servers that use the "freebsd" stack.
I don't know how sshguard works, but I would be interested in understanding why that leads to connections getting stuck in the LAST_ACK state.
SSHGuard ( https://www.freshports.org/security/sshguard ) is a tool that watches /var/log/auth.log, /var/log/maillog, etc. and blocks IPs that try to brute force passwords. When someone tries, for example, to log in to SSH multiple times with a wrong password, it blocks their IP using IPFW (or PF). As the remote host is no longer "reachable", we can't receive any packets from it, which I believe makes the connection stick in the LAST_ACK state.
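(A quick way to confirm this is to compare the firewall state with the stuck sockets; with the IPFW backend sshguard adds offenders to a lookup table. Table 22 is, if I remember the port's default correctly, the one it uses, so treat the number as an assumption:)

ipfw table 22 list                # addresses currently blocked by sshguard (table number assumed)
sockstat -sPtcp | grep LAST_ACK   # the stuck connections should match one of those addresses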
I tried to use a CURRENT kernel with a 13.0 userland. I can connect successfully using SSH, but after I run the "wrk" benchmark the server hangs. Maybe I have to build a CURRENT userland too. I asked the datacenter to connect a KVM to see if the monitor shows more information.
Created attachment 225360
The CURRENT kernel (with the 13.0 userland) panics because of LRO. I disable LRO and successfully boot the server. But during the "wrk" benchmark it panics (tcp_hpts).
I have MFCed to stable/13 some performance improvements committed by rrs@ to main.
Can you retest to see if the load is now less than before?
(In reply to Michael Tuexen from comment #13)
Thank you. At the moment I don't have a test system available. When I have I will redo the tests and report the results.