Created attachment 246389 [details] Kernel config After we upgraded some of our hosts from FreeBSD 12.4 to 13.2 we started seeing some of our physical hosts freezing randomly and without panicing. We were able to narrow it down to some IPFW rules. The setup is the following: - Host A: Recently upgraded physical host with FreeBSD 13.2 - Host B: Also a physical host with FreeBSD 13.2 which runs a webserver, serving our own pkg-repos (10.1.1.20). Webserver doesn't matter - we tried Apache, nginx, thttpd. Filetype also doesn't matter (.pkg, .txt, whatever :)). We also tried with multiple hosts. Host A has following IPFW rule: ipfw add 1000 allow ip from me to 10.1.1.20/32 uid 0 Host B has the following IPFW rule: ipfw add 2000 allow tcp from any to 10.1.1.20 80,443 keep-state We can reproduce the freeze by repeatedly fetching a file on Host A from Host B (we initially triggered the bug when running "pkg upgrade"): [root@host-a] $ while true; do curl -v http://10.1.1.20/test.txt --output /dev/null; done Sometimes immediately, sometimes after a few seconds the network connection of Host A is lost. Sometimes we are still able to log in through a local shell. Sometimes after a few seconds, sometimes after 1-2 minutes the host freezes completely. There is no kernel panic and nothing in the logs. Host B is still running fine and doesn't freeze. Whats interesting: - Freezes do NOT happen if the "uid 0" selector from Host A's rule is removed. - Freezes do NOT happen if the "keep-state" of Host B's rule is removed. - Freezes do NOT happen with our virtual servers - only physical hosts are affected. After building and installing the kernel with debug options, we were finally able to cause a panic and get some more informations. The Kernel has been built with the following options: makeoptions DEBUG=-g options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN options DIAGNOSTIC You can find the full kernel config attached (config.txt). PANIC: rw_rlock: wlock already held for tcpinp @ /usr/src/sys/netinet/in_pcb.c:2529 Attached you find: - HW-Info.txt: Hardware information of one of the hosts that freezes. Other hosts that freeze (and also Host B of the example abvoe) are physical too and also use the same NIC-Driver (ix). - info.txt: File written @ Panic - Contains FreeBSD version info and so on. - config.txt: Kernel config (see above). - ddb.txt: Contains the ddb-Dump - reduced to the relevant stuff (panic, backtrace, locks). The full ddb-Dump (containing all procs) can be provided if needed. Any help in further debuging or fixing this would be highly appreciated!
Created attachment 246390 [details] File written @ Panic - Contains FreeBSD version info and so on
Created attachment 246391 [details] Hardware information of one of the hosts that freezes. Other hosts that freeze (and also Host B of the example abvoe) are physical too and also use the same NIC-Driver (ix).
Created attachment 246392 [details] Contains the ddb-Dump - reduced to the relevant stuff (panic, backtrace, locks). The full ddb-Dump (containing all procs) can be provided if needed.
I've just added Dimitry Andric, Gordon Bergling and Michael Tuexen - in the hope you guys maybe have an idea or can help to triage. Reason: You guys mostly commited to sys/netpfil/ipfw and sys/netinet. Thanks for any help in advance!
(In reply to Dani I. from comment #4) It seems related to ipfw, so I added kp@. I also added glebius@, since inp locks seems to be affected and he worked on it.
(In reply to Michael Tuexen from comment #5) you may have mixed up firewalls there :) ipfw is the one maintained by smart people. I do pf.
(In reply to Kristof Provost from comment #6) Kristof, yes I mixed them up. Sorry for the noise. Feel free to unsubscribe yourself from the PR.
I'd really appreciate if you check out FreeBSD 14.0-RELEASE. The inpcb synchronization has underwent total rewrite in there. I'd be really interested to see if the bug has gone. The ipfw/uid checks hasn't undergo much change, so it could be the bug is still there. However, debugging the problem on 14.0-RELEASE is much more probable to end up in problem resolution rather than on 13.2, where old inpcb database is in use.
(In reply to Michael Tuexen from comment #5) Thanks for triaging! As kp@ isn't responsible for ipfw - do you know who could maybe help there? Thanks! (In reply to Gleb Smirnoff from comment #8) For us it's currently the "worst" option to upgrade to 14.0, as we're already having issues with 13.2 and we didn't make very good experiences with early RELENG-Releases in the past. As we didn't have these issues with 12.4, we'd prefer to debug this further with 13.2 and would also be able to revert commits, test patches and try debug builds. We also saw that there weren't that much changes for ipfw between 12.4 and 13.2 - so maybe the changes are somewhere else in the network stack. The question is, where to search and what could have an impact here. We'd really appreciate any help to narrow it down and find releated changes between 12.4 and 13.2 that might cause the issue.(In reply to Gleb Smirnoff from comment #8)
Created attachment 246450 [details] Panic caused by sysctl net.link.ether.ipfw=1 A small update: 1: We're also able to trigger a panic when setting the following sysctl: # sysctl net.link.ether.ipfw=1 After setting the sysctl the system crashes after a few seconds. The panic seems to be the same but the locks shown in the dump seem to be different. 2: This isn't specific to http-traffic, more to tcp-traffic in general - we can also reproduce this with ssh for example. 3: While debugging and trying out, we tried setting "hw.ix.enable_rss" to 0 in loader.conf (default=1), which didn't have any impact (panic still occurs).
I'm fairly sure the problem isn't fixed in 14.0. tcp_output() and ip_output() may transmit a packet while holding the inpcb lock, and the pfil hook in ether_output_frame() doesn't know this, so when ipfw does a jail/user/group match, it tries to look up the inpcb and re-locks it, resulting in recursion. Actually, I am a bit surprised that TCP/IP layers hold onto the PCB lock while calling into lower layers.
Normally inp pointer is passed down to ipfw and if it is passed, the lookup is avoided. So in the trace with tcp_respond the pointer was somehow lost and ipfw looked up the same inpcb again. I'll try to reproduce it first on 13 and if I can, then will check 14.
(In reply to Gleb Smirnoff from comment #12) Oh, I was looking at the crash from the net.link.ether.ipfw=1 case, where the problem is clear: ether_output_frame() cannot pass an inp at all. But yes, you're right, we should be passing the inpcb to the ip_output() pfil hook, so it seems there are two bugs here.
I assumed that ether_ipfw & uid/gid rules problem has been there since very beginning and is well known, maybe even documented. :)
(In reply to Mark Johnston from comment #11) (In reply to Gleb Smirnoff from comment #14) Hey guys, thanks already for assisting and helping! That is really very much appreciated! Please just get back to us if we can help in any way! I'm not sure what time range you mean when you say "I assumed that ether_ipfw & uid/gid rules problem has been there since very beginning and is well known, maybe even documented. :)", but as mentioned in the initial bug report, we didn't not have this issue at all with FreeBSD 12.4. The same hardware and ipfw rules (with uid matching) worked like a charm and without any problem. So there must have been some changes between 12.4 and 13.2 which causes this "new" behavior.
Did you have net.link.ether.ipfw=1 on FreeBSD 12?
(In reply to Gleb Smirnoff from comment #16) Ah sorry, that wasn't clear in my last comment. No, we have just used "net.link.ether.ipfw=1" while testing with out 13.2 hosts - we do not have it set in productive environment. What i meant were the fw rules from our initial report, which worked well and without a problem on 12.4: Host A has following IPFW rule: ipfw add 1000 allow ip from me to 10.1.1.20/32 uid 0 Host B has the following IPFW rule: ipfw add 2000 allow tcp from any to 10.1.1.20 80,443 keep-state
Right. I was saying that a bug with Layer2 ipfw and uid/gid rules is a lifetime bug, I believe. I haven't yet started looking into the problem. But a script to quickly reproduce the problem would be appreacited, if you already have one. Or just a sequence of commands (as a recipe).
Basically a simple setup like this should be enough: Host 1: pkg install thttpd mkdir /webdata chown root:www /webdata curl https://download.freebsd.org/ftp/releases/amd64/amd64/13.2-RELEASE/kernel.txz --output /webdata/kernel.txz /etc/rc.conf: thttpd_enable="YES" thttpd_flags="-d /webdata-r -u www -l /dev/null -h 10.1.1.20" "basic" ipfw ruleset with default deny / allow any: ipfw add 1000 allow tcp from any to 10.1.1.20 80 keep-state Host 2 (the one that freezes): "basic" ipfw ruleset with default deny / allow any: ipfw add 1000 allow ip from me to 10.1.1.20/32 uid 0 while true; do curl http://10.1.1.20/kernel.txz --output /dev/null; done Just to mention it again: We only experience the issue with this setup on PHYSICAL hosts (HW-Info see: https://bz-attachments.freebsd.org/attachment.cgi?id=246391). We do not have this issue with virtual machines running @ OpenStack and using the vtnet driver.
Dani, I tried to reach you via email, but outlook.com blocks email from me. I'm interested in obtaining the core file that corresponds to the backtrace that contains tcp_respond.
(In reply to Gleb Smirnoff from comment #20) Sadly the core.txt.0 file only contains: /var/crash/vmcore.0 not found Any idea? I could only provide you the full ddb.txt, but i'm not sure if that would help. If i'd be able to generate a core dump, i surely could send it to your freebsd mail?
Tiny update: If we configure the IPFW rules like this (keep-state on both hosts), we can't trigger the crash anymore: Host A has following IPFW rule: ipfw add 1000 allow ip from me to 10.1.1.20/32 uid 0 keep-state Host B has the following IPFW rule: ipfw add 2000 allow tcp from any to 10.1.1.20 80,443 keep-state
Having a core will really help to get some progress. Can you please re-check your dump configuration and see if you can generate a core? https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-obtain
(In reply to Gleb Smirnoff from comment #23) We've finally managed to get a coredump. The dump is 630MB (compress with zstd) or 9.6GB uncompressed. Do you offer a save way to transfer the dump to you? Do you need anything else than the vmcore.0-file?
I will need /boot/kernel/kernel and all loaded modules as well as their corresponding .debug parts from /usr/lib/debug/boot/kernel. Can you please contact me via email to glebius@FreeBSD.org with some other email address? I was trying to reach you, but outlook.com blocks email from my personal mail server (it is glebi.us).
(In reply to Gleb Smirnoff from comment #25) You've got mail :). If you didn't receive anything, please get back to me.
https://reviews.freebsd.org/D43065
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=513f2e2e7180202167ca2963d815d2a4c3ac0af9 commit 513f2e2e7180202167ca2963d815d2a4c3ac0af9 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2023-12-19 19:24:17 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2023-12-19 19:24:17 +0000 tcp: always set tcp_tun_port to a correct value The tcp_tun_port field that is used to pass port value between UDP and TCP in case of tunneling is a generic field that used to pass data between network layers. It can be contaminated on entry, e.g. by a VLAN tag set by a NIC driver. Explicily set it, so that it is zeroed out in a normal not-tunneled TCP. If it contains garbage, tcp_twcheck() later can enter wrong block of code and treat the packet as incorrectly tunneled one. On main and stable/14 that will end up with sending incorrect responses, but on stable/13 with ipfw(8) and pcb-matching rules it may end up in a panic. This is a minimal conservative patch to be merged to stable branches. Later we may redesign this. PR: 275169 Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43065 sys/netinet/tcp_input.c | 1 + 1 file changed, 1 insertion(+)
^Triage: assign to committer that resolved; set possible MFC flags.
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=943814893bda5d74d8a2cace0a613dd5733c2845 commit 943814893bda5d74d8a2cace0a613dd5733c2845 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2023-12-19 19:24:17 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2024-01-02 19:06:31 +0000 tcp: always set tcp_tun_port to a correct value The tcp_tun_port field that is used to pass port value between UDP and TCP in case of tunneling is a generic field that used to pass data between network layers. It can be contaminated on entry, e.g. by a VLAN tag set by a NIC driver. Explicily set it, so that it is zeroed out in a normal not-tunneled TCP. If it contains garbage, tcp_twcheck() later can enter wrong block of code and treat the packet as incorrectly tunneled one. On main and stable/14 that will end up with sending incorrect responses, but on stable/13 with ipfw(8) and pcb-matching rules it may end up in a panic. This is a minimal conservative patch to be merged to stable branches. Later we may redesign this. PR: 275169 Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43065 (cherry picked from commit 513f2e2e7180202167ca2963d815d2a4c3ac0af9) sys/netinet/tcp_input.c | 1 + 1 file changed, 1 insertion(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7bad06cd0d8bf63a47cee1b209fd6c1864fb2da5 commit 7bad06cd0d8bf63a47cee1b209fd6c1864fb2da5 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2023-12-19 19:24:17 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2024-01-02 19:07:15 +0000 tcp: always set tcp_tun_port to a correct value The tcp_tun_port field that is used to pass port value between UDP and TCP in case of tunneling is a generic field that used to pass data between network layers. It can be contaminated on entry, e.g. by a VLAN tag set by a NIC driver. Explicily set it, so that it is zeroed out in a normal not-tunneled TCP. If it contains garbage, tcp_twcheck() later can enter wrong block of code and treat the packet as incorrectly tunneled one. On main and stable/14 that will end up with sending incorrect responses, but on stable/13 with ipfw(8) and pcb-matching rules it may end up in a panic. This is a minimal conservative patch to be merged to stable branches. Later we may redesign this. PR: 275169 Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43065 (cherry picked from commit 513f2e2e7180202167ca2963d815d2a4c3ac0af9) sys/netinet/tcp_input.c | 1 + 1 file changed, 1 insertion(+)