Created attachment 246887 [details] extract from /var/log/messages with KASAN output The broadcom bwn Wifi driver causes a kernel panic soon after a connection with an encrypted network is established. Soon means: immediate to approx. 30sec. after the connection is established. - This did not happen with FreeBSD RELEASE 13.x - This does not happen when connecting to an unencrypted network - It crashes both with CCMP and TKIP encryption Hardware: ————————- Macbook Pro 3,1 with Broadcom BCM4321 wireless card Setup: ______ At boot time, the bwn_v4_ucode firmware module is not loaded, the machine goes up and running. When subsequently the firmware is loaded and the interface restarted, it crashes after the connection is established. kldload bwn_v4_ucode service netif restart The WLAN is a 2,4GHz 802.11g network. wpa_supplicant.conf: network={ ssid="SomeNet" psk="SomePassword" pairwise=TKIP # or CCMP } The attached messages extract contains KASAN output indicating trouble in the ieee_80211 crypto module.
What is the actual kernel panic?
Do you have IEEE80211_DEBUG enabled? Also, if you have src+obj code, can you lookup and verify my assuptions based on main w/o kasan in kernel: ieee80211_crypto_ccmp.c:669 ff: 68 #1 0xffffffff81580899 at ccmp_encap+0xe99 (memcpy or debug error after?) 193 #2 0xffffffff815808df at ccmp_encap+0xedf (also around the memcpy, stats?) ieee80211_crypto_ccmp.c:299: 62 #1 0xffffffff81580953 at ccmp_encap+0xf53 (m_adj?) I am just curious as the distribution later is quite wide, which indeed seems to indicate that the data buffer at that point isn't valid. 25 #1 0xffffffff80599a6c at rijndaelEncrypt+0x25c 25 #1 0xffffffff80599af7 at rijndaelEncrypt+0x2e7 25 #1 0xffffffff80599b7c at rijndaelEncrypt+0x36c 24 #1 0xffffffff80599c01 at rijndaelEncrypt+0x3f1 26 #1 0xffffffff80599ca5 at rijndaelEncrypt+0x495 27 #1 0xffffffff80599d2c at rijndaelEncrypt+0x51c 21 #1 0xffffffff80599db1 at rijndaelEncrypt+0x5a1 20 #1 0xffffffff80599e36 at rijndaelEncrypt+0x626 Strangely the ccmp code has not seen functional changes in years.
Created attachment 246904 [details] vmcore backtrace with crypto_tkip used
Created attachment 246905 [details] vmcore backtrace with crypto_ccmp used
- The actual panic is caused by a page fault. - IEEE80211_DEBUG is enabled, the sysctl entry is present. For me it seems that the root cause could be somewhere in the common crypto code, since both CCMP and TKIP dump core. Pls. find attached two vmcore backtraces. I am not used working with gdb, but I will what Ican to help resolving the issue.
It would be useful (for me at least) to see the gdb backtrace from a KASAN report. That is, either use addr2line to answer Bjoern's question from comment 2[*], or get a kernel dump from a KASAN kernel. [*] I would also like to see what "ieee80211_encap+0x512" is. To get this info: $ sudo pkg install binutils $ echo "ieee80211_encap+0x512" | /usr/local/bin/addr2line -e /usr/lib/debug/boot/kernel/kernel.debug assuming that your kernel is installed at /boot/kernel. (If it was installed to, for example, /boot/kernel.test, the debug file path would be /usr/lib/debug/boot/kernel.test/kernel.debug.)
Given I would assume the release kernel wouldn't ship with KASAN on, is this indeed releng/14.0 or is this stable/14 ? What hash are you on according to uname -v? And are you inded on 11g or on 11ng?
(In reply to Bjoern A. Zeeb from comment #7) I checked out the releng/14.0 branch and compiled a GENERIC kernel with, KASAN BWN_DEBUG and BWN_GPL_PHY enabled, config file: include GENERIC-KASAN ident MACBOOKPRO31-KASAN options BWN_GPL_PHY options BWN_DEBUG uname -v: FreeBSD 14.0-RELEASE-p2 #2 releng/14.0-n265396-06497fbd52e2: Fri Dec 8 20:57:05 CET 2023 root@hilschubookBSD:/usr/obj/usr/src/amd64.amd64/sys/MACBOOKPRO31-KASAN (I had to recompile the KASAN Kernel because I unfortunately lost the old one) WLAN: The access point is configured for 802.11gn, so both are possible. I can try to figure out whether a g or n connection is established next time before the system crashes
(In reply to Mark Johnston from comment #6) "ieee80211_encap+0x512" - the line is: /usr/src/sys/net80211/ieee80211_output.c:1666
(In reply to Bjoern A. Zeeb from comment #2) ieee80211_crypto_ccmp.c:669 ff: 68 #1 0xffffffff81580899 at ccmp_encap+0xe99 (memcpy or debug error after?) -> ieee80211_crypto_ccmp.c:325 193 #2 0xffffffff815808df at ccmp_encap+0xedf (also around the memcpy, stats?) -> ieee80211_crypto_ccmp.c:? ieee80211_crypto_ccmp.c:299: 62 #1 0xffffffff81580953 at ccmp_encap+0xf53 (m_adj?) -> ieee80211_crypto_ccmp.c:325 25 #1 0xffffffff80599a6c at rijndaelEncrypt+0x25c -> rijndael-alg-fst.c:956 25 #1 0xffffffff80599af7 at rijndaelEncrypt+0x2e7 -> rijndael-alg-fst.c:962 25 #1 0xffffffff80599b7c at rijndaelEncrypt+0x36c -> rijndael-alg-fst.c:968 24 #1 0xffffffff80599c01 at rijndaelEncrypt+0x3f1 -> rijndael-alg-fst.c:974 26 #1 0xffffffff80599ca5 at rijndaelEncrypt+0x495 -> rijndael-alg-fst.c:986 27 #1 0xffffffff80599d2c at rijndaelEncrypt+0x51c -> rijndael-alg-fst.c:992 21 #1 0xffffffff80599db1 at rijndaelEncrypt+0x5a1 -> rijndael-alg-fst.c:998 20 #1 0xffffffff80599e36 at rijndaelEncrypt+0x626 -> rijndael-alg-fst.c:1004
(In reply to Frank Hilgendorf from comment #9) So that corresponds to: 1660 /* NB: this could be optimized 'cuz of ieee80211_mbuf_adjust */ 1661 m_adj(m, sizeof(struct ether_header) - sizeof(struct llc)); 1662 llc = mtod(m, struct llc *); 1663 llc->llc_dsap = llc->llc_ssap = LLC_SNAP_LSAP; 1664 llc->llc_control = LLC_UI; 1665 llc->llc_snap.org_code[0] = 0; 1666 llc->llc_snap.org_code[1] = 0; <--- starts here 1667 llc->llc_snap.org_code[2] = 0; 1668 llc->llc_snap.ether_type = eh.ether_type; which seems to make sense, since KASAN reports 2 1-byte writes followed by a 2-byte write. That is strange. I suspect the bug is elsewhere. Perhaps something (bwn?) concurrently freed the mbuf chain?
I repeated the test 5 times. The 5 reports first entry was always 'ieee80211_encap+0x512'. The bug seems to work deterministic. If it was kind of a race condition, where thread B frees the mbuf currently processed by thread A, I would expect multiple KASAN reports to start with different code lines because they should not be so exactly synchronized that thread B would always hit exactly the same point of execution of thread A. One more observation: Running the test with an unencrypted WLAN connection produces no KASAN report at all.
(In reply to Frank Hilgendorf from comment #12) I think you're right. In that case, it'd be useful to see a kernel dump from KASAN, i.e., configure KASAN to panic when a report is generated (which is the default setting). In particular, I'd want to see the output of "p *m" from the ieee80211_encap() frame.
Hmm, for bwn vap->iv_ic->ic_headroom is pretty big, it's equal to sizeof(struct bwn_txhdr), which is 118 bytes. So, ieee80211_mbuf_adjust() is going to set needed_space = 118 + 24 + 8 for CCMP. If ieee80211_mbuf_adjust() needs to allocate a new mbuf header, the header will have MHLEN - 150 = 10 bytes of space. The assertion "needed_space <= MHLEN" should actually be "needed_space + sizeof(struct ether_header) <= MHLEN", and I suspect that that assertion would fail for bwn. I'm not sure what changed in 14.0 to break - maybe struct mbuf layout changes that made MHLEN smaller?
(In reply to Mark Johnston from comment #14) Oops, my comment about the assertion is wrong. I don't really understand the frame layout. But the problem is clear: ieee80211_mbuf_adjust() may return an mbuf with just MHLEN - 150 == 10 bytes left in its buffer, and ieee80211_encap() then calls m_adj(m, 6), leaving 4 bytes of space, but sizeof(struct llc) is 8 bytes. And, in stable/13 sizeof(struct pkthdr) is 8 bytes smaller, so there was enough space then, but now it's not enough.
Created attachment 246957 [details] output of gdb, ieee80211_encap() *m in reply to comment #13
(In reply to Frank Hilgendorf from comment #16) Right, so here m_data is 0xfffffe006f8cb8fc, which is 4 bytes before the end of the buffer, but m_len is 8, which can't be right. ieee80211_encap() is assuming that ieee80211_mbuf_adjust() returns an mbuf with at least sizeof(ether_header) byte of space, and that assumption is false now. Hopefully Bjoern can comment on how this should be solved.
(In reply to Mark Johnston from comment #17) I'll take it and have a look. Probably also with an assertion rather than a comment...
(In reply to Bjoern A. Zeeb from comment #18) And just as a semi-related comment given I hit it as well with rtw88 before (but hitting different assertions which also were changed during 14-CURRENT): https://lists.freebsd.org/archives/freebsd-transport/2022-February/000012.html / 3d09d310d9981dde1d6e51fed6ecf9576480b9f7 / 232d323ef227109acce37f5a0d62492673666ee2 / which was later improved by c414347bc572219a0c849853ade6e664a6092f33 allowing us to grow. In my initial email I already called bwi/bwn out for the large ic_headroom.
(In reply to Bjoern A. Zeeb from comment #19) Possible solution? I was wondering what the bwn driver needs the huge amount of space in each data mbuf for. Ok, the headroom in the mbufs claimed by the bwn driver is intended for a 'struct bwn_txhdr'. This seems to be driver/hardware related stuff that is not going to be sent out onto the network. Looking into the bwn driver code, I found no piece of code using this space in the mbuf, not writing nor reading. Maybe the reserved space was not used at all? To check it out, in if_bwn.c I set ic->ic_headroom to 0 and... now the bwn driver works without complaints. I cannot check this with other broadcom chips, but unless there is something I overlooked it might solve the problem.
(In reply to Frank Hilgendorf from comment #20) The net80211 code needs adjusting anyway as even if bwn now works other drivers exist. In the non-native LinuxKPI based drivers I work around the problem by copying data which is annoying too. So even if the code now works the next driver will come or avail mbuf size may shrink again and make it go beyond assumptions as well. I have a draft for net80211 but let me look into bwn for a minute. Hmm it indeed looks like the code DMAs the txhdr separately from the mbuf data later rather than merging the txhdr with the mbuf and then dmaing it all in possibly a single segment. Seems it's been doing this since inception on FreeBSD and ic_headroom indeed was never used but only set. Good spotting! Can I take your ic_headroom = 0 as a patch you provided and commit it with you as Author and then I can do net80211 bits in peace and quite as a separate thing?
For the records: bwi, malo, and mwl all do seem to use the ic_headroom and M_PREPEND(). And mt76, rtw88 and rtw89 would love to.
(In reply to Bjoern A. Zeeb from comment #21) Yes, please proceed.
(In reply to Bjoern A. Zeeb from comment #21) It is just a patch for Freebsd, but a solution for me :-) .
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=59dba901f227420647e4856f03cb782a3375c220 commit 59dba901f227420647e4856f03cb782a3375c220 Author: Frank Hilgendorf <frank.hilgendorf@posteo.de> AuthorDate: 2023-12-13 23:48:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2023-12-13 23:54:05 +0000 bwn: remove unused ic_headroom Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it set) but splits the bwn_txhdr (first) segment into its own transaction. Remove ic_headroom to avoid net80211 troubles with not enough space in the mbuf around ieee80211_mbuf_adjust(). PR: 275616 MFC after: 3 days sys/dev/bwn/if_bwn.c | 2 -- 1 file changed, 2 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=4e219655d0f7e1134c040468c0e7d91c671e57fb commit 4e219655d0f7e1134c040468c0e7d91c671e57fb Author: Frank Hilgendorf <frank.hilgendorf@posteo.de> AuthorDate: 2023-12-13 23:48:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-18 18:31:14 +0000 bwn: remove unused ic_headroom Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it set) but splits the bwn_txhdr (first) segment into its own transaction. Remove ic_headroom to avoid net80211 troubles with not enough space in the mbuf around ieee80211_mbuf_adjust(). PR: 275616 (cherry picked from commit 59dba901f227420647e4856f03cb782a3375c220) sys/dev/bwn/if_bwn.c | 2 -- 1 file changed, 2 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0f032804b6ec70c8f8ee4fe1ed83a3356946b5e8 commit 0f032804b6ec70c8f8ee4fe1ed83a3356946b5e8 Author: Frank Hilgendorf <frank.hilgendorf@posteo.de> AuthorDate: 2023-12-13 23:48:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 08:01:59 +0000 bwn: remove unused ic_headroom Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it set) but splits the bwn_txhdr (first) segment into its own transaction. Remove ic_headroom to avoid net80211 troubles with not enough space in the mbuf around ieee80211_mbuf_adjust(). PR: 275616 (cherry picked from commit 59dba901f227420647e4856f03cb782a3375c220) sys/dev/bwn/if_bwn.c | 2 -- 1 file changed, 2 deletions(-)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dc50f981e17c39a486616315d142785a20f87612 commit dc50f981e17c39a486616315d142785a20f87612 Author: Frank Hilgendorf <frank.hilgendorf@posteo.de> AuthorDate: 2023-12-13 23:48:08 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-19 16:05:54 +0000 bwn: remove unused ic_headroom Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it set) but splits the bwn_txhdr (first) segment into its own transaction. Remove ic_headroom to avoid net80211 troubles with not enough space in the mbuf around ieee80211_mbuf_adjust(). PR: 275616 Approved by: re (cperciva) (cherry picked from commit 59dba901f227420647e4856f03cb782a3375c220) (cherry picked from commit 0f032804b6ec70c8f8ee4fe1ed83a3356946b5e8) sys/dev/bwn/if_bwn.c | 2 -- 1 file changed, 2 deletions(-)
^Triage: committed and MFCed.
Re-open as the underlying problem is still not fixed (hence the change in title).