rtw88 experiences issues with skb allocation after a few hours of uptime, spamming the console with "skb allocation failed" messages and causing networking to stop (I cannot even ping the wireless access point):

Dec 21 11:35:39 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Dec 21 11:42:13 kernel: rx routine starvation
Dec 21 11:42:13 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Dec 21 11:42:40 syslogd: last message repeated 9 times
Dec 21 11:44:54 syslogd: last message repeated 621 times

It happens with this wifi adapter:

rtw880@pci0:2:0:0: class=0x028000 rev=0x00 hdr=0x00 vendor=0x10ec device=0xc821 subvendor=0x1a3b subdevice=0x3040
    vendor   = 'Realtek Semiconductor Co., Ltd.'
    device   = 'RTL8821CE 802.11ac PCIe Wireless Network Adapter'
    class    = network

Another important point: this is vanilla CURRENT amd64, main-n274315-da2c88dfcf4f-dirty, with the patch from https://reviews.freebsd.org/D48161 applied, and with sysctl compat.linuxkpi.skb.mem_limit set to 1.
Oleg provided me with some data; I also have a Dtrace script to run and a possible patch (not finalized) based on a hunch (so it may or may not solve the problem). I'll try to follow up here in the next few days. PS: the referenced patch from D48161 is in main as of today.
(In reply to oleg.nauman from comment #0) It should be noted, as said by Oleg in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283142#c20, that this part of the bug is quite old, and not tied to master + D48161. The same happens on a 14.1-RELEASE (releng/14.1-n267679-10e31f0946d8).
Here it seems to happen only after memory has been exhausted at least once: after a reboot there *always* (see (1)) has to be a bunch of "kernel: pid xxx (xxxxx), jid 0, uid 1001, was killed: failed to reclaim memory" (2) before any "skb alloc failed" appears.

Could it be that the leaks only happen after some allocations have been pushed out to swap, and that even when memory gets freed again the driver is unable to regain its ability to allocate anywhere other than swap? (Please forgive my FreeBSD and VM inexperience if my hypothesis is full of inaccuracies and nowhere near reality.)

(1) { bunzip2 < messages.0.bz2 ; cat messages ; } | egrep alloc\|memory | awk '/real memory/{s="0 boot"}/failed to reclaim memory/{s="1 killed"}/skb alloc failed/{s="2 skb"}old!=s{if(old)print n" x\t"old;n=0;old=s}{++n}'

(2) Repeatedly building a full LLVM for all its architectures + clang + compiler-rt, with make -j4, on an 8 GB laptop.
(In reply to Guillaume Outters from comment #3) Hello, I cannot confirm that the "skb allocation failure" happens only when virtual memory is exhausted. Could you please confirm directly that you are also observing the "skb allocation failure" messages? Unfortunately Bjoern cannot reproduce it, so I was under the assumption that it is somehow triggered by my load pattern (a laptop which I am using as my desktop). Thank you
(In reply to oleg.nauman from comment #4) Yes, I see the same:

Dec 31 16:39:14 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed

as you, with sysctl compat.linuxkpi.skb.mem_limit set to 1 too. In my case they start interleaved with, or sometimes hours after, some:

Dec 31 16:35:26 pasdfric kernel: pid 54454 (clang++), jid 0, uid 1001, was killed: failed to reclaim memory

But once I start getting "skb alloc failed", the situation degrades for rtw88, *even when the pressure on memory has decreased* (no more "failed to reclaim memory"). So it is inexact to say that it occurs "when VM is exhausted"; it is rather that it occurs "once the system has gone through at least one VM exhaustion since booting".

If I take another session of December 31 from /var/log/messages, from boot to crash, I get:

18:47:20 boot
19:41:08 two kills after VM exhaustion
20:19:51 first skb alloc failed (repeated hundreds of times until a reboot)

In this case the skb alloc failures were not interleaved with the VM exhaustion; they came after it. Maybe the problem is not about going into virtual memory, but instead just about crossing the 4 GB border, which would explain why the problem persists even after the memory pressure has diminished. I would say that:

1. either a first failed allocation, due to memory exhaustion, leaves the driver in a dirty state it is unable to recover from even after memory becomes available again (the structure it wanted to allocate was crucial for later operations); but if that were the case, you should find memory-exhaustion-related logs in your /var/log/messages, between the boot and the first skb alloc failed;
2. or, once the kernel allocates at addresses > 4 GB, LinuxKPI declines the allocs.
Hello, it would be nice if you could provide Bjoern with the output of vmstat -m | egrep -E "lkpiskb|Memory" when it happens again.
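For reference, a minimal capture loop (just a sketch in plain sh; the interval and log path are arbitrary) could record that output periodically so the data is already at hand when the failure happens:

while true ; do
	date "+%Y-%m-%d %H:%M:%S"                 # timestamp each sample
	vmstat -m | egrep -E "lkpiskb|Memory"     # header line + lkpiskb malloc type
	sleep 60
done >> /var/tmp/lkpiskb.log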
Whoops, I got a new one! After an uptime of more than 2 days without any problem, I once again had two killed processes yesterday evening (while building GHC, the Haskell compiler), and then this morning, while trying a big rsync, an "skb alloc failed" loop happened.

Jan  8 16:01:44 Boot
Jan 10 22:26:56 pasdfric kernel: pid 9508 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Jan 10 22:36:53 pasdfric kernel: pid 78425 (ghc), jid 0, uid 1001, was killed: failed to reclaim memory
Jan 11 11:20:18 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Jan 11 11:20:19 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Jan 11 11:21:38 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
[…]

At 14:00 my rsync was still running, but at about 10 KB/s. Here is the vmstat -m (with everything lkpi-related):

        Type     Use      Memory        Req  Size(s)
   lkpi80211      35       42112         39  128,256,1024,2048,32768
 lkpikmalloc    7531    17555504  260476706  16,32,64,128,256,384,512,1024,2048,4096,8192,16384,32768,65536
      lkpifw      12         384         24  32,128
     lkpiskb  580877  2391855104    8175104

If I read that right, that makes *2 GB* for lkpiskb???

After a reboot, the rsync finished really quickly (I can't estimate at which speed, but many MB/s).
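A rough cross-check of those numbers (plain sh arithmetic, nothing driver-specific assumed):

% echo "scale=2; 2391855104 / (1024^3)" | bc
2.22
% expr 2391855104 / 580877
4117

So about 2.2 GiB in total, and on average ~4117 bytes per in-use lkpiskb allocation, i.e. roughly one 4 KiB page per skb.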
Created attachment 256704 [details] vmstat req and mem during scp before / after going rogue
OK, I was wrong in thinking that the compiler eating memory drove rtw88 into error: I am now convinced that rtw88 ate so much memory that clang didn't have enough anymore. Twice this week I observed lkpiskb going rogue, unfortunately without catching the precise moment it became Mr Hyde. But (as seen in https://bz-attachments.freebsd.org/attachment.cgi?id=256704) I managed to test a loop of a 2 MB scp every minute, while watching vmstat -m | grep lkpiskb, with surprising results:

A. Before going rogue

Each scp added to the req counter, but memory stayed remarkably stable, at 16 MB. I didn't even notice any temporary increase of memory use during the scps (and statistically, my probe every 10 seconds should have fired multiple times simultaneously with my 4-second download).

B. After having gone rogue

The reqs are still there (fortunately!), but **memory use increases by ~3.4 MB** per scp of 2 MB (/!\ those are file bytes, not bandwidth; and I don't know scp's bandwidth / file bytes ratio). And having the server lag helped me see something more interesting, thanks to multiple probes occurring during the transfer: the 3.4 MB didn't follow a monotonic growth; instead memory use increased considerably, and _only some_ of it got freed once scp finished.

--------------------------------------------------------------------------------

Here is a one-go increase (probes at :45 and :56, scp occurred roughly between :47 and :51) (+3.4 MB):

2025-01-14 01:06:45 lkpiskb 329256 1361215488 9513310 +2 +8192 +212
2025-01-14 01:06:56 lkpiskb 330093 1364643840 9517636 +837 +3428352 +4326
2025-01-14 01:07:06 lkpiskb 330095 1364652032 9517845 +2 +8192 +209

Here is a two-go (+5 MB - 1.6 MB):

2025-01-14 01:07:47 lkpiskb 330115 1364733952 9518743 +5 +20480 +223
2025-01-14 01:07:57 lkpiskb 331360 1369833472 9522027 +1245 +5099520 +3284
2025-01-14 01:08:07 lkpiskb 330958 1368186880 9523298 -402 -1646592 +1271
2025-01-14 01:08:17 lkpiskb 330964 1368211456 9523527 +6 +24576 +229

Another one-go (+6.8 MB - 3.4 MB):

2025-01-14 01:10:00 lkpiskb 331828 1371750400 9529814 +2 +8192 +217
2025-01-14 01:10:10 lkpiskb 333506 1378623488 9534152 +1678 +6873088 +4338
2025-01-14 01:10:20 lkpiskb 332671 1375203328 9534368 -835 -3420160 +216
2025-01-14 01:10:30 lkpiskb 332674 1375215616 9534584 +3 +12288 +216

Here, periodic security on the server made it really hard for scp to finish, and let rtw88 free some of its memory:

2025-01-14 07:21:34 lkpiskb 586598 2415288320 11209247 +291 +1191936 +933
2025-01-14 07:21:57 lkpiskb 587114 2417401856 11210910 +516 +2113536 +1663
2025-01-14 07:22:23 lkpiskb 587579 2419306496 11212416 +465 +1904640 +1506
2025-01-14 07:22:45 lkpiskb 587877 2420527104 11213510 +298 +1220608 +1094
2025-01-14 07:23:07 lkpiskb 588170 2421727232 11214571 +293 +1200128 +1061
2025-01-14 07:23:29 lkpiskb 588474 2422972416 11215673 +304 +1245184 +1102
2025-01-14 07:23:49 lkpiskb 587441 2418741248 11216292 -1033 -4231168 +619

--------------------------------------------------------------------------------

while true ; do date "+%Y-%m-%d %H:%M:%S" ; vmstat -m | grep lkpi ; sleep 10 ; done | tee -a ~/tmp/depuisboot.log

awk < ~/tmp/depuisboot.log '/^2025-/{h=$0}/lkpiskb/{variation="+"(-use+$2)"\t+"(-mem+$3)"\t+"(-req+$4);gsub(/[+]-/,"-",variation);gsub(/ +/,"\t");print h"\t"$0"\t"variation;use=0+$2;mem=0+$3;req=0+$4}' | less
This is a shot in the dark and a hack but can you two try to apply the changes from https://reviews.freebsd.org/D48474 and see if that helps? If it does we can find a proper solution not abusing an skb field...
(In reply to Bjoern A. Zeeb from comment #10) Hello Bjoern, I will test it tomorrow. Thank you
(In reply to Bjoern A. Zeeb from comment #10) I would be glad to add my laptop to the test battery, but I will need a bit of guidance on how a total newbie can run this patched driver (until now I have only used FreeBSD on preinstalled dedicated servers, or in VirtualBox guests).

Ideally I'd test it live, with a bit of kldunload / kldload, rebuilding only in modules/rtw88 and modules/linuxkpi_wlan. However, this only works for if_rtw88.ko (ifconfig wlan0 down ; kldunload if_rtw88.ko ; kldload /usr/obj/usr/src/amd64.amd64/sys/modules/rtw88/if_rtw88.ko ; sleep 1 ; ifconfig wlan0 up).

On the other hand, the linuxkpi_wlan.ko present in /boot/kernel seems to be an empty shell, because kldload rejects it with a dmesg telling me it is already part of the kernel (where I do find the "skb allocation failed" string, by the way). But in the GENERIC kernel config I see no dedicated setting for linuxkpi_wlan, only the main COMPAT_LINUXKPI switch, which certainly should not go away.

Do I have any way other than compiling a full kernel to test the linuxkpi_wlan part? I'd be grateful for any link to a "how to get linuxkpi_wlan out of the kernel and into a module", or a simple "no way", so that I go recompile my world without discuss^H^H^H^H^H^H^Hwasting my time and yours any longer.
(In reply to Bjoern A. Zeeb from comment #10) It caused a kernel panic after 1 hour 47 minutes of uptime:

Jan 17 08:21:23 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed 48 + 163, lsta 0xfffff80003654000 sta 0xfffff80003654080 ni 0xfffffe0117b8d000
Jan 17 08:21:23 kernel: rtw880: tid 0 ltxq 0xfffff8000395f400 seen_dequeue 1 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 1 ltxq 0xfffff8000395f500 seen_dequeue 1 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 2 ltxq 0xfffff8000395f600 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 3 ltxq 0xfffff8000395f700 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 4 ltxq 0xfffff8000395f800 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 5 ltxq 0xfffff8000395f900 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 6 ltxq 0xfffff8000395fa00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 7 ltxq 0xfffff8000395fb00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 8 ltxq 0xfffff8000395fc00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 9 ltxq 0xfffff8000395fd00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 10 ltxq 0xfffff8000395fe00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 11 ltxq 0xfffff8000395e100 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 12 ltxq 0xfffff80003761600 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 13 ltxq 0xfffff80003706100 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 14 ltxq 0xfffff80003706000 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880: tid 15 ltxq 0xfffff800034afe00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel:
Jan 17 08:21:23 syslogd: last message repeated 1 times
Jan 17 08:21:23 kernel: Fatal trap 12: page fault while in kernel mode
Jan 17 08:21:23 kernel: cpuid = 1; apic id = 01
Jan 17 08:21:23 kernel: fault virtual address = 0xffffffffffffffb0
Jan 17 08:21:23 kernel: fault code = supervisor read data, page not present
Jan 17 08:21:23 kernel: instruction pointer = 0x20:0xffffffff806fbafc
Jan 17 08:21:23 kernel: stack pointer = 0x28:0xfffffe00e466edb0
Jan 17 08:21:23 kernel: frame pointer = 0x28:0xfffffe00e466ee00
Jan 17 08:21:23 kernel: code segment = base rx0, limit 0xfffff, type 0x1b
Jan 17 08:21:23 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 17 08:21:23 kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jan 17 08:21:23 kernel: current process = 0 (thread taskq)
Jan 17 08:21:23 kernel: rdi: ffffffff80c48880 rsi: fffffe00e466ec40 rdx: 0000000000000043
Jan 17 08:21:23 kernel: rcx: ffffffffffffff80  r8: 0000000000000043  r9: 00000000000000f0
Jan 17 08:21:23 kernel: rax: 0000000000000000 rbx: fffff80003654000 rbp: fffffe00e466ee00
Jan 17 08:21:23 kernel: r10: 0000000000000042 r11: fffffffffffff5cf r12: fffffe0118cb3000
Jan 17 08:21:23 kernel: r13: fffffe0117b8d000 r14: fffff8054596e200 r15: 0000000000000010
Jan 17 08:21:23 kernel: trap number = 12
Jan 17 08:21:23 kernel: panic: page fault
Jan 17 08:21:23 kernel: cpuid = 1
Jan 17 08:21:23 kernel: time = 1737094737
Jan 17 08:21:23 kernel: Uptime: 1h47m51s

------------------------------------------

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57        __asm("movq %%gs:%c1,%0" : "=r" (td)
(kgdb) #0  __curthread () at
    /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:404
        error = 0
        coredump = <optimized out>
#2  0xffffffff80510839 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:524
        once = 0
#3  0xffffffff80510cdb in vpanic (fmt=0xffffffff808946e7 "%s", ap=ap@entry=0xfffffe00e466ec70) at /usr/src/sys/kern/kern_shutdown.c:979
        buf = "page fault", '\000' <repeats 245 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {253, 0 <repeats 15 times>}}
        td = 0xfffff8010282a740
        bootopt = <unavailable>
        newpanic = <optimized out>
#4  0xffffffff80510b73 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:892
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0xfffffe00e466eca0, reg_save_area = 0xfffffe00e466ec40}}
#5  0xffffffff8083d396 in trap_fatal (frame=<optimized out>, eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:960
        type = <optimized out>
#6  0xffffffff8083d396 in trap_pfault (frame=0xfffffe00e466ecf0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>)
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        td = <optimized out>
        p = <optimized out>
        eva = 18446744073709551536
        map = <optimized out>
        ftype = <optimized out>
        rv = <optimized out>
#7  <signal handler called>
No locals.
#8  lkpi_80211_txq_tx_one (lsta=lsta@entry=0xfffff80003654000, m=0xfffff8054596e200) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:3830
        tid = 16
        skb_alloc_failures = 1 '\001'
        control = {sta = 0xfffffe00e466ed38}
        ni = 0xfffffe0117b8d000
        k = <optimized out>
        wh = <optimized out>
        ic = 0xfffffe0118cb3000
        lhw = 0xfffffe0118ca3200
        hw = <optimized out>
        c = 0xfffffe0118cb34a4
        skb = <optimized out>
        sta = <optimized out>
        ltxq = 0xffffffffffffff80
        buf = <optimized out>
        lvif = <optimized out>
        vif = <optimized out>
        hdr = <optimized out>
        tid = <optimized out>
        ac = <optimized out>
        info = <optimized out>
#9  0xffffffff806fcce3 in lkpi_80211_txq_task (ctx=0xfffff80003654000, pending=<optimized out>) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:3999
        mq = {mq_head = {stqh_first = 0x0, stqh_last = 0xfffffe00e466ee10}, mq_len = 0, mq_maxlen = 50}
        m = 0xfffffe00e466ec40
        lsta = <optimized out>
        shall_tx = <optimized out>
#10 0xffffffff805708a2 in taskqueue_run_locked (queue=queue@entry=0xfffff8000129d000) at /usr/src/sys/kern/subr_taskqueue.c:517
        et = {et_link = {tqe_next = 0xfffffe00e466eef0, tqe_prev = 0x0}, et_td = 0x0, et_section = {bucket = 0}, et_old_priority = 0 '\000'}
        tb = {tb_running = 0xfffff80003654018, tb_seq = 511873, tb_canceling = false, tb_link = {le_next = 0x0, le_prev = 0xfffff8000129d010}}
        in_net_epoch = false
        pending = 1
        task = <optimized out>
#11 0xffffffff80571a72 in taskqueue_thread_loop (arg=arg@entry=0xffffffff80eedd50 <taskqueue_thread>) at /usr/src/sys/kern/subr_taskqueue.c:829
        tqp = <optimized out>
        tq = 0xfffff8000129d000
#12 0xffffffff804d254b in fork_exit (callout=0xffffffff805719b0 <taskqueue_thread_loop>, arg=0xffffffff80eedd50 <taskqueue_thread>, frame=0xfffffe00e466ef40) at /usr/src/sys/kern/kern_fork.c:1152
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff8010282a740
        p = 0xffffffff80e7da98 <proc0>
        dtd = <optimized out>
#13 <signal handler called>
No locals.
#14 0x4b92e29a0de050f2 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x116bd3ff1f0b16f2

Thank you
(In reply to oleg.nauman from comment #13) I'll just put it here, given I am a bit short on time over the next few days to look at this further. Oleg commented out the loop dumping the queues in LinuxKPI upon allocation error, and with that he could run the test with the remaining patch to the end. Unfortunately my shot in the dark was not the right place and the "leak" still happens. I'll go back and see if we can find different approaches as I find time (hopefully soon).
(Regarding my question on how to start playing with the kernel and whether linuxkpi_wlan was externalizable as a module: I resolved to learn how to compile and boot a full custom kernel, which is a bit tedious but works.)

I could observe the fully binary behaviour of the allocations by tracing in linux_80211.c at:
- lkpi_80211_txq_tx_one(), after the dev_alloc_skb() of https://github.com/freebsd/freebsd-src/blob/main/sys/compat/linuxkpi/common/src/linux_80211.c#L3816
- linuxkpi_ieee80211_free_txskb(), before the _lkpi_ieee80211_free_txskb()

Test procedure: reboot around 14:40 with the traced kernel, then run two loops of:

while true ; do scp ... ; vmstat -m | grep skb ; sleep 20 ; done

One scp lasts 14 s, the other one 10 s, to create interference (so that from time to time they run simultaneously). I added two web browsing sessions, one around 15:00, the other around 17:40.

MINUTES AFTER THE SECOND WEB BROWSING SESSION (interleaved with the scps), VMSTAT STARTED REPORTING INCREASES WITHOUT FREEING ANYMORE. The increase speed matches that of the consumed bandwidth (1 MB of download -> 1 MB of leaking sk_buff mem).

Here is a typical pattern BEFORE the tipping point; each skb gets freed rapidly:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 206 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000

Here is the pattern AT the tipping point:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5279000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52fc000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fd3000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fca000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c525e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c53ff000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4d26000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5341000

Now I'll just have to go up linuxkpi_ieee80211_free_txskb()'s callers to see what normally calls it, and what stops calling it after the tipping point...
(In reply to Guillaume Outters from comment #15) This is interesting. The reason I was thinking that it cannot be here is that the TX SKBs have an mbuf attached to them which is only freed (as you can see in lkpi_ieee80211_free_txskb and the function called from there) just before the SKB is freed. Given we do not seem to leak mbufs according to the netstat -m output provided before, it seemed unlikely that these SKBs are leaked. But also, 170 bytes is really not much each time.

Did you also instrument the RX path? Are your SCPs pushing or pulling data? As in, do you copy a file off from the rtw88 device or do you copy a file to the rtw88 device?

BTW, you do not have to patch the kernel for this. DTrace provides adequate tracing functionality in this case. Here's a sample I shared earlier on which you can probably use as a start:

# dtrace -s rtw88-skb.d -o rtw88-skb.results

--- 8< 8< 8< rtw88-skb.d -----------------------------------------------
#!/usr/sbin/dtrace -s
/*-
 * SPDX-License-Identifier: BSD-2-Clause
 * Copyright (c) 2024 Bjoern A. Zeeb
 *
 * 20241224 -- rtw88 SKB leak debugging.
 */

#pragma D option quiet
#pragma D option switchrate=2
#pragma D option bufsize=4M

fbt:kernel:linuxkpi_alloc_skb:entry
{
	@[ probefunc, stack() ] = count();
}
/* fbt:kernel:linuxkpi_alloc_skb:return */

fbt:kernel:linuxkpi_kfree_skb:entry
{
	@[ probefunc, stack() ] = count();
}
/* fbt:kernel:linuxkpi_kfree_skb:return */

/* end */
--- 8< 8< 8< -----------------------------------------------------------

You could add the relevant functions and see where things are no longer called. Given you are on fairly low bandwidth it's likely fine, but eventually the overhead will be noticeable and DTrace may drop records.
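As one sketch of "adding the relevant functions" (a shell one-liner; it assumes fbt probes exist for the two LinuxKPI symbols instrumented in comment #15, which should be double-checked with dtrace -l first):

dtrace -n '
fbt:kernel:linuxkpi_dev_alloc_skb:entry,
fbt:kernel:linuxkpi_ieee80211_free_txskb:entry
{
	/* count calls per function and call path */
	@[ probefunc, stack() ] = count();
}'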
This may or may not be the same problem (and I do have four different Realtek cards in this system) but another possible data point:

db> show malloc
         Type    InUse   MemUse   Requests
      lkpiskb   212076  885168K    6581552  <<<<
       bounce     8336   33344K       8336
     mbuf_tag   203410   12721K   10820863  <<<
...

db> show uma
             Zone  Size    Used  Free  Requests  Sleeps  Bucket  Total Mem  XFree
             mbuf   256  229803  1084  41147484       0     254   59107072      0  <<<<
     mbuf_cluster  2048   18161  1651   9708177       0     254   40574976      0
        malloc-64    64  204469  1541    278960       0     254   13184640      0
        vmem btag    56  224005   402    230802       0     254   12566792      0
...

It started like this (but no SKB alloc failures in the log):

rtw881: failed to dequeue 3967 skb TX queue 5, BD=0xffffffff, rp 127 -> 4095
rtw881: failed to dequeue 4094 skb TX queue 6, BD=0xffffffff, rp 0 -> 4095
rtw881: failed to dequeue 4008 skb TX queue 1, BD=0xffffffff, rp 86 -> 4095
rtw881: failed to dequeue 4094 skb TX queue 0, BD=0xffffffff, rp 0 -> 4095
rtw881: failed to dequeue 4094 skb TX queue 3, BD=0xffffffff, rp 0 -> 4095
rtw881: failed to dequeue 4094 skb TX queue 2, BD=0xffffffff, rp 0 -> 4095
rtw881: failed to dequeue 4091 skb TX queue 7, BD=0xffffffff, rp 3 -> 4095
rtw881: unhandled firmware c2h interrupt
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
rtw880: failed to send h2c command
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
...

rtw881 is:

[109.713728] rtw881: <rtw_8821ce> port 0x3000-0x30ff mem 0xd0100000-0xd010ffff irq 41 at device 0.0 on pci9
[109.732101] rtw88/rtw8821c_fw.bin: could not load firmware image, error 8
[109.737883] Trying to load binary firmware from /boot/firmware/rtw88/rtw8821c_fw.bin
[109.759739] firmware: '/boot/firmware/rtw88/rtw8821c_fw.bin' version 0: 139472 bytes loaded at 0xfffffe0087a82000
[109.768918] rtw88/rtw8821c_fw.bin: Loaded binary firmware using /boot/firmware/rtw88/rtw8821c_fw.bin
[109.777329] rtw881: successfully loaded firmware image 'rtw88/rtw8821c_fw.bin'
[109.783534] rtw881: Firmware version 24.11.0, H2C version 12

db> trace 100239
Tracing pid 0 tid 100239 td 0xfffff80005fbb740
kdb_alt_break_internal() at kdb_alt_break_internal+0x15d/frame 0xfffffe0068f7d940
kdb_alt_break() at kdb_alt_break+0xb/frame 0xfffffe0068f7d950
uart_intr_rxready() at uart_intr_rxready+0xf4/frame 0xfffffe0068f7d980
uart_intr() at uart_intr+0x138/frame 0xfffffe0068f7d9c0
intr_event_handle() at intr_event_handle+0xc8/frame 0xfffffe0068f7da30
intr_execute_handlers() at intr_execute_handlers+0x63/frame 0xfffffe0068f7da60
Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe0068f7da60
--- interrupt, rip = 0xffffffff8106e0d8, rsp = 0xfffffe0068f7db30, rbp = 0xfffffe0068f7db40 ---
spinlock_exit() at spinlock_exit+0x38/frame 0xfffffe0068f7db40
sbuf_printf_drain() at sbuf_printf_drain+0xaa/frame 0xfffffe0068f7db80
sbuf_drain() at sbuf_drain+0x57/frame 0xfffffe0068f7dba0
sbuf_finish() at sbuf_finish+0x8c/frame 0xfffffe0068f7dbc0
device_printf() at device_printf+0x10b/frame 0xfffffe0068f7dd10
rtw_pci_napi_poll() at rtw_pci_napi_poll+0x20e/frame 0xfffffe0068f7de20  <<<<
lkpi_napi_task() at lkpi_napi_task+0x1f/frame 0xfffffe0068f7de40
taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe0068f7dec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0068f7def0
fork_exit() at fork_exit+0x82/frame 0xfffffe0068f7df30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0068f7df30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Sadly I cannot dump here to gather more info later, but I'll go and check and keep an eye on this.
Shot in the dark II: the patch to apply is here (you can use arc or download it from the right side): https://reviews.freebsd.org/D48723

Set compat.linuxkpi.rtw88_debug_mask=0x08000000 in loader.conf to also get the logging if you want.
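In loader.conf(5) syntax that would be the line below (a sketch; this assumes the mask is a boot-time tunable as the comment implies, with the usual value quoting):

compat.linuxkpi.rtw88_debug_mask="0x08000000"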
So I am currently doing transfers every minute (TX) off the rtw881 instance to another machine. Coming back after a while, I see the following 1-minute differences in vmstat -m for lkpiskb (but not mbuf-tags):

Use:
% expr 4071 - 3997
74

Reqs:
% expr 17869937 - 17557380
312557

Memory:
% expr 54423552 - 54120448
303104

# It's exactly one page at a time!
% expr 74 \* 4096
303104

The previous logging doesn't catch it, so it's not that, though; that's just another error path without an skb free call (there are more in the driver which I fixed in the past). I wonder if we are slowly leaking somewhere else inside rtw88; we should go and see where skb allocations match a page (like the rsvd page or similar). It seems to be a small number (0.023676%).

[and while I am writing, rtw881 exploded again]

rtw881: failed to dequeue 3908 skb TX queue 1, BD=0xffffffff, rp 183 -> 4095
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status

[private note from somewhere later in the trace: WARNING bw > RTW_CHANNEL_WIDTH_40 || nrx >= 4 failed at /sys/contrib/dev/rtw88/rtw8822c.c:4412]

db> show malloc
    Type  InUse  MemUse  Requests
 lkpiskb   3109  49300K  19622568
  bounce   8336  33344K      8336
     BPF     34   6150K        48

So maybe using dtrace to find allocations with the size of a page (4096) would be interesting, to see where they are coming from and how many there are. I'll let that run overnight and see in the morning. It's just another hunch (the crystal ball currently doesn't seem to work very well).
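A sketch of such a probe (shell one-liner; it assumes the first argument of linuxkpi_alloc_skb is the requested size, which should be double-checked against the LinuxKPI source, and smaller requests rounded up to a page by the allocator would not match this predicate):

dtrace -n '
fbt:kernel:linuxkpi_alloc_skb:entry
/arg0 == 4096/
{
	/* aggregate page-sized skb allocation requests by call path */
	@[ stack() ] = count();
}'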
(In reply to Bjoern A. Zeeb from comment #18) I have applied the patch from https://reviews.freebsd.org/D48723 with compat.linuxkpi.rtw88_debug_mask=0x08000000 set, but the kernel does not produce any new error messages; networking still hangs after a few hours of uptime with messages like the one below (it comes from https://reviews.freebsd.org/D48474):

Jan 31 15:48:51 kernel: [15609] rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed 48 + 141, lsta 0xfffff80003108800 sta 0xfffff80003108880 ni 0xfffffe0119d22000

Thank you
(In reply to Bjoern A. Zeeb from comment #16, comment #17, comment #19)

> This may or may not be the same problem (and I do have four different Realtek cards in this system) but another possible data point:
> [...]
> It started like (but no SKB alloc failures in the log):
> rtw881: failed to dequeue 3967 skb TX queue 5, BD=0xffffffff, rp 127 -> 4095

"skb alloc failed" is not the primary symptom; it is rather a consequence once the situation has degraded too much. For me the first symptom is `vmstat -m | grep skb` growing at the pace of network activity; sometimes as soon as 10 min after the reboot (but sometimes far later than that; I still don't understand what triggers the change).

At boot I always have a disturbingly flat value of 16781312 (16 MB + 1 buffer of 4096 B); some activity will make it eat 1 or 2 or more buffers, but it always ends up returning to this value. Then, as soon as I see it going over 18 MB, I know that something has gone rogue, and that _any new activity will make unreleased skbuffs grow proportionally to network traffic_.

In my last 5 attempts, for a 10 MiB nc transfer (either from or to my laptop), it had grown by 19.43 MiB, then 18.60, 18.52, 14.52, 15.04, after having temporarily topped at 23.20 MiB, 45.36, 42.29, 21.64, 26.52 (from a baseline which was the vmstat -m value before the transfer, at around 2.2 GiB; that is, before the first transfer vmstat was at 2.2 GiB, during the transfer it peaked at 2.2232 GiB, and after the transfer it went back to "only" 2.21943 GiB).

During that phase, I noticed some sluggishness or small Wayland freezes (3 to dozens of seconds) from time to time; as if (but that's just a guess) it did a lookup over all the already allocated memory (to look for the oldest buffer to free? For a free memory block where to allocate a new buffer? To relocate buffers? To reach the end of a linked-list pool of buffers, looking for one to reuse?).

The "skb alloc failed" only occurs ONCE IT CANNOT ALLOCATE ANYMORE IN THE FIRST 4 GB OF RAM (due to compat.linuxkpi.skb.mem_limit=1, as I understand it), after having grown MB by MB (and while competing with userspace processes: this afternoon, after some "skb alloc failed", I quit a long-running Firefox, and as a result got 1 or 2 hours without "skb alloc failed").

> Did you also instrument the RX path?
> Are your SCPs pushing or pulling data? as in do you copy a file off from the rtw88 device or do you copy a file to the rtw88 device?
> But also 170 bytes is really not much each time.

I just instrumented those functions because that file was the one I could grep 'skb_free|free_skb' in, and I didn't look further. And my original tests were on pushes (via scp). But today, with my 10 MiB of a .gz file pushed and pulled via nc, I could measure the same non-freed allocations: ALL of the ~3400 allocations traced for one transfer were between 126 and 196 bytes long (but it's not an absolute limit: another test later saw 4 out of 4000 at 243, 249, 368 and 434 bytes).

However, the `vmstat -m | grep skb` increase was far more than the sum of all those traced packets: for a 10 MiB transfer that resulted in a 15 - 20 MiB increase of skb allocated space, only 650 KB were traced by my artisanal probe. On the other hand, I don't know how the allocator works: IF FOR EACH 170 B ALLOCATION REQUESTED, THE ALLOCATOR RETURNS A FULL 4096 B PAGE, THEN THIS EXPLAINS OUR 15 - 20 MB INCREASE (I saw from 3500 to 7000 calls through my trace, which multiplied by 4 K give 14.3 to 28.7 MiB: it would perfectly match the vmstat report!).

> BTW.
> you do not have to patch the kernel for this. Dtrace provides adequate tracing functionality in this case.
> Here's a sample I shared earlier on which you can probably use as a start:
> [...]

Nice! I've been telling myself for a long time that I should look into DTrace, but I never did; you're adding to the good reasons to do it in 2025.

> Coming back after a while, I see the following 1-minute differences in vmstat -m for lkpiskb (but not mbuf-tags):
> # It's exactly one page at a time!
> % expr 74 \* 4096
> 303104

From my experience (see the first block of this reply), during the "non-problematic phase" a continuous use of the network (a transfer) made skb memory grow by multiples of 4 KB, but as soon as the network pressure lowered, they were released and I got back to 16 MB + 4 KB. So it may be normal to have a small, permanent increase... as long as the dequeue has opportunities to run and, more importantly, as long as it does not simply give up.

Now I'll have to reboot to post this long comment: any HTTP request now gets at least one "skb alloc failed"; the network isn't usable anymore.
(In reply to Guillaume Outters from comment #21) The 16M ring + 1 rsvd page would match as a start. Thanks a lot for the detailed answer.

I just realized why we do see the 4K increases while I did not see any 4K allocations: contigmalloc in linux_skb due to the memory limit as-is in the tree. There are changes to come to change that for certain behavior.

Can you also keep an eye on the mbuf-tag numbers (as that will tell RX vs. TX apart)? On main I use the following (to also get the header; stable/14 might still be different):

vmstat -m | egrep -e '(Use|lkpi|mbuf)'
Created attachment 257182 [details] Dtrace and vmstat after a 10 MiB receive
Created attachment 257183 [details] Dtrace and vmstat after a 10 MiB send
(In reply to Bjoern A. Zeeb from comment #16) With your Dtrace script I got a flagrant difference between the "before" and "after" tipping point.

--------------------------------
-- The paths to allocation and freeing

There are 2 paths to linuxkpi_alloc_skb, let's call them A as in Alloc:
- A1: lkpi_80211_txq_task (explicitly Tx)
- A2: lkpi_napi_task > rtw_pci_napi_poll (which would be the RX path?)

And 3 paths to linuxkpi_kfree_skb (D for Dealloc):
- D1: linux_work_fn > rtw_c2h_work > rtw_fw_c2h_cmd_handle > rtw_tx_report_handle > linuxkpi_ieee80211_tx_status
  Note that there is a "branch" at rtw_c2h_work, with:
  D1.1: rtw_c2h_work+0x62 leading to the full path above, at the tail of which linuxkpi_kfree_skb is called (by linuxkpi_ieee80211_tx_status)
  D1.2: rtw_c2h_work+0x6a doing a direct call to linuxkpi_kfree_skb
- D2: lkpi_napi_task > rtw_pci_napi_poll > linuxkpi_ieee80211_rx (explicitly Rx)
- D3: [softclock_thread > softclock_call_cc] > rtw_tx_report_purge_timer
  Contrary to all of the above, which start with a [taskqueue_thread_loop > taskqueue_run_locked], this one's name suggests it is called periodically.

--------------------------------
-- My tests

I did a first test just after the reboot (I don't remember if it was mostly Rx or Tx), with interesting results (see (2) below for the full result):
- 283 allocs through A1 got freed by D1.1
- 1144 allocs through A2, of which:
  - 861 freed by D2
  - 283 freed by D1.2

Note how the 283 matches: there's an interesting mix of allocations handled by the "other side"'s dealloc (and due to this mix I couldn't say for sure that 1 is Tx and 2 is Rx).

Now, after some time running, with vmstat -m starting to show an increase in skb memory consumption, I got totally different paths. Be it in Tx or Rx (see the details in ):
- during the transfer I got 17870 A2, some of which got freed by a D2. But I had some A1 too, with:
  A2 = A1 + D2
  This is **really surprising**, as I expected a balanced A1 + A2 = D2 (the sum of allocations = the sum of deallocations).
  The increase of vmstat was exactly 2 * A1 * 4 KB.
  This is explained very well: to achieve balance we should have freed as many as we had allocated, (A1 + A2) - D2 = 0, i.e. an expected D2 = A1 + A2; however here D2 = A2 - A1, thus our balance is A1 + A2 - D2 = A1 + A2 - (A2 - A1) = 2 * A1.
- if waiting after the transfer, apart from some negligible new allocations in this mode (A2 = A1 + D2), we see a **new process** (D3, with rtw_tx_report_purge_timer on softclock_thread) running to free a bit of memory; although it seems to be dedicated to garbage collecting, **it doesn't keep up with the pace**.

--------------------------------
-- My god!

It looks like SOME SKBUFFERS GET **ALLOCATED** INSTEAD OF **FREED**.

--------------------------------
Notes

--- (1) Test procedure

vms() { vmstat -m | egrep -e '(Use|lkpi|mbuf)' ; }
dt() { sudo dtrace -s rtw88-skb.d | egrep -v 'kernel.0xffff|fork_exit|taskqueue_thread_loop|softclock_thread' ; }

for p in "rx b:/tmp/1 /tmp/" "tx /tmp/1 b:/tmp/"	# /tmp/1 contains the first 10000000 bytes of a gzip file.
do
	set -- $p
	{
		dt $1.during &
		vms
		echo "# scp ($1) of a 10 MiB file"
		time scp $2 $3 2>&1
		sudo killall dtrace
		vms
		dt $1.after &
		echo "# Sleep 20"
		sleep 20
		sudo killall dtrace
		vms
	} | tee rtw88-skb.results.pourri.$1
done

--- (2) Before

  linuxkpi_alloc_skb
              kernel`linuxkpi_dev_alloc_skb+0xd
              kernel`lkpi_80211_txq_task+0x1ec
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283

  linuxkpi_kfree_skb
              kernel`linuxkpi_ieee80211_tx_status_ext+0x163
              kernel`linuxkpi_ieee80211_tx_status+0x45
              if_rtw88.ko`rtw_tx_report_handle+0x136
              if_rtw88.ko`rtw_fw_c2h_cmd_handle+0x15a
              if_rtw88.ko`rtw_c2h_work+0x62
              kernel`linux_work_fn+0xe4
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283

  linuxkpi_alloc_skb
              kernel`linuxkpi_dev_alloc_skb+0xd
              if_rtw88.ko`rtw_pci_napi_poll+0x254
              kernel`lkpi_napi_task+0xf
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              1144

  linuxkpi_kfree_skb
              if_rtw88.ko`rtw_c2h_work+0x6a
              kernel`linux_work_fn+0xe4
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283

  linuxkpi_kfree_skb
              kernel`linuxkpi_ieee80211_rx+0x5a3
              if_rtw88.ko`rtw_pci_napi_poll+0x31b
              kernel`lkpi_napi_task+0xf
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              861
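For what it's worth, a quick cross-check (plain sh arithmetic) that the "(2) Before" counts really do balance, allocations against frees:

# A1 + A2 (allocations)
% expr 283 + 1144
1427
# D1.1 + D1.2 + D2 (frees)
% expr 283 + 283 + 861
1427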