Bug 283903 - rtw88: possible skb leak
Summary: rtw88: possible skb leak
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Bjoern A. Zeeb
URL:
Keywords:
Depends on:
Blocks: 273621
Reported: 2025-01-07 15:39 UTC by oleg.nauman
Modified: 2025-02-02 02:13 UTC (History)
3 users

See Also:


Attachments
vmstat req and mem during scp before / after going rogue (113.74 KB, image/png)
2025-01-14 23:23 UTC, Guillaume Outters
no flags Details
Dtrace and vmstat after a 10 MiB receive (2.79 KB, text/plain)
2025-02-02 02:12 UTC, Guillaume Outters
no flags Details
Dtrace and vmstat after a 10 MiB send (2.62 KB, text/plain)
2025-02-02 02:13 UTC, Guillaume Outters
no flags Details

Description oleg.nauman 2025-01-07 15:39:50 UTC
rtw88 experiences issues with skb allocation after a few hours of uptime, spamming the console with "skb allocation failed" messages and causing networking to stop (the wireless access point cannot even be pinged):

Dec 21 11:35:39 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Dec 21 11:42:13 kernel: rx routine starvation
Dec 21 11:42:13 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Dec 21 11:42:40 syslogd: last message repeated 9 times
Dec 21 11:44:54 syslogd: last message repeated 621 times

It happens with

rtw880@pci0:2:0:0:      class=0x028000 rev=0x00 hdr=0x00 vendor=0x10ec device=0xc821 subvendor=0x1a3b subdevice=0x3040
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8821CE 802.11ac PCIe Wireless Network Adapter'
    class      = network

Wi-Fi adapter.

Another important point: this is vanilla CURRENT amd64 main-n274315-da2c88dfcf4f-dirty with the patch from https://reviews.freebsd.org/D48161 applied, and sysctl compat.linuxkpi.skb.mem_limit is set to 1.
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-07 17:57:21 UTC
Oleg provided me with some data; I also have a DTrace script to run and a possible patch (not finalized) based on a hunch (so it may or may not solve the problem).  I'll try to follow up here in the next days.

PS: the referenced patch from D48161 is in main as-of today.
Comment 2 Guillaume Outters 2025-01-08 23:26:09 UTC
(In reply to oleg.nauman from comment #0)

It should be noted, as Oleg said in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283142#c20, that this part of the bug is quite old and not tied to main + D48161. The same happens on 14.1-RELEASE (releng/14.1-n267679-10e31f0946d8).
Comment 3 Guillaume Outters 2025-01-09 00:00:06 UTC
Here it seems to happen only after memory has been exhausted at least once:
after a reboot, a bunch of "kernel: pid xxx (xxxxx), jid 0, uid 1001, was killed: failed to reclaim memory" messages (2) *always* (see (1)) appear before any "skb alloc failed" does.

Could it be that the leaks only happen after some allocations have been pushed to swap, and that even once memory is freed again the driver is unable to regain its ability to allocate anywhere other than swap? (Please forgive my FreeBSD and VM newbieness if my hypothesis is full of inaccuracies and nowhere near reality.)

(1)
{ bunzip2 < messages.0.bz2 ; cat messages ; } | egrep alloc\|memory | awk '/real memory/{s="0 boot"}/failed to reclaim memory/{s="1 killed"}/skb alloc failed/{s="2 skb"}old!=s{if(old)print n" x\t"old;n=0;old=s}{++n}'

(2)
Repeatedly building a full LLVM for all its architectures + clang + compiler-rt, with make -j4, on an 8 GB laptop.
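For readers, the dense one-liner in (1) can be unpacked into a commented, standalone sketch. The log excerpt below is fabricated for illustration; the real input is the egrep-filtered /var/log/messages, and an END clause is added to flush the final run, which the original one-liner drops:

```shell
# Fabricated /var/log/messages excerpt (the real run pipes the
# bunzip2'ed logs through egrep first).
cat > /tmp/messages.sample <<'EOF'
kernel: real memory  = 8589934592 (8192 MB)
kernel: pid 123 (clang++), jid 0, uid 1001, was killed: failed to reclaim memory
kernel: pid 124 (ghc), jid 0, uid 1001, was killed: failed to reclaim memory
kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
EOF

# Classify each line into a state (boot / killed / skb) and print the
# run length every time the state changes.
awk '
  /real memory/              { s = "0 boot"   }
  /failed to reclaim memory/ { s = "1 killed" }
  /skb alloc failed/         { s = "2 skb"    }
  old != s { if (old) print n " x\t" old; n = 0; old = s }
  { ++n }
  END { if (old) print n " x\t" old }
' /tmp/messages.sample
```

On the fabricated excerpt this prints one run per state ("1 x 0 boot", "2 x 1 killed", "3 x 2 skb"), i.e. the boot is followed by kills before the first skb failure.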
Comment 4 oleg.nauman 2025-01-09 08:38:51 UTC
(In reply to Guillaume Outters from comment #3)
 
Hello,

I can not confirm that the "skb allocation failure" happens only when virtual memory is exhausted.
Could you please confirm explicitly that you are also observing "skb allocation failure" messages? Unfortunately Bjoern can not reproduce it, so I was under the assumption that it is somehow triggered by my load pattern (a laptop which I use as my desktop).


Thank you
Comment 5 Guillaume Outters 2025-01-09 10:40:53 UTC
(In reply to oleg.nauman from comment #4)

Yes, I have the same:
Dec 31 16:39:14 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
as you, with sysctl compat.linuxkpi.skb.mem_limit set to 1 as well.

In my case they start interleaved with, or sometimes hours after, messages like:
Dec 31 16:35:26 pasdfric kernel: pid 54454 (clang++), jid 0, uid 1001, was killed: failed to reclaim memory

But once I started getting "skb alloc failed" messages, the situation kept degrading for rtw88, *even after the pressure on memory had decreased* (no more "failed to reclaim memory").

So it is not quite accurate that it occurs "when VM is exhausted";
it is more that it occurs "once the system has gone through at least one VM exhaustion since booting".

If I take another session of December 31 from /var/log/messages, from boot to crash, I get:
18:47:20 boot
19:41:08 two kills after VM exhaust
20:19:51 first skb alloc failed
(repeated hundreds of times until a reboot)
In this case the skb alloc failures were not interleaved with the VM exhaustion; they came afterwards.

Maybe the problem is not about going into virtual memory,
but instead just about crossing the 4 GB border,
which would explain why the problem persists even after the memory pressure diminished.

I would say that:
1. either a first failed allocation, due to memory exhaustion, left the driver in a dirty state it is unable to recover from even after memory became available again (the structure it wanted to allocate was crucial for later operations); but if that were the case you should find memory-exhaustion-related logs in your /var/log/messages, between the boot and the first "skb alloc failed";
2. or, once the kernel allocates at addresses > 4 GB, LinuxKPI declines the allocations.
Comment 6 oleg.nauman 2025-01-09 14:44:11 UTC
Hello,

It would be nice if you could provide Bjoern with the output of

vmstat -m | grep -E "lkpiskb|Memory"

when it happens again.
Comment 7 Guillaume Outters 2025-01-11 15:35:29 UTC
Whoops, I got a new one!
After an uptime of more than 2 days without any problem, I once again had two killed processes yesterday evening (while building GHC, the Haskell compiler), and then this morning, while trying a big rsync, a "skb alloc failed" loop happened.

Jan  8 16:01:44 Boot
Jan 10 22:26:56 pasdfric kernel: pid 9508 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Jan 10 22:36:53 pasdfric kernel: pid 78425 (ghc), jid 0, uid 1001, was killed: failed to reclaim memory
Jan 11 11:20:18 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Jan 11 11:20:19 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
Jan 11 11:21:38 pasdfric kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed
[…]

At 14:00 my rsync was still running, but at about 10 KB/s.
Here is the vmstat -m output (with everything lkpi-related):

            Type  Use Memory Req Size(s)
       lkpi80211   35 42112   39 128,256,1024,2048,32768
     lkpikmalloc 7531 17555504 260476706 16,32,64,128,256,384,512,1024,2048,4096,8192,16384,32768,65536
          lkpifw   12   384   24 32,128
         lkpiskb 580877 2391855104 8175104

If I read that correctly, that makes *2 GB* for lkpiskb?!
After a reboot, the rsync finished really quickly (I can't estimate the speed, but many MB/s).
Comment 8 Guillaume Outters 2025-01-14 23:23:50 UTC
Created attachment 256704 [details]
vmstat req and mem during scp before / after going rogue
Comment 9 Guillaume Outters 2025-01-15 00:01:37 UTC
OK, I was wrong to think that the compiler eating memory drove rtw88 into error:
now I'm convinced that rtw88 ate so much memory that clang no longer had enough.

Twice this week I observed lkpiskb going rogue, unfortunately without catching the precise moment it became Mr. Hyde.

But (as seen in https://bz-attachments.freebsd.org/attachment.cgi?id=256704),
I managed to test a loop of a 2 MB scp every minute, while watching vmstat -m | grep lkpiskb,
with surprising results:

A. Before going rogue

Each scp added to the req counter, but memory stayed remarkably stable, at 16 MB.
I didn't even notice any temporary increase of memory use during the scps (and statistically, my probe every 10 seconds should have coincided multiple times with my 4-second download).

B. After having gone rogue

Now the reqs are still there (fortunately!), but **memory use increases by ~3.4 MB** per 2 MB scp (/!\ those are file bytes, not bandwidth; and I don't know scp's bandwidth-to-file-bytes ratio).

And the lagging server helped me see something more interesting, thanks to multiple probes occurring during the transfer:
the 3.4 MB did not follow a monotonic growth; instead, memory use increased considerably and _only some_ of it got freed once the scp finished.

--------------------------------------------------------------------------------

Here is a one-go increase (probes at :45 and :56, scp occurred roughly between :47 and :51) (+3.4 MB):
2025-01-14 01:06:45             lkpiskb 329256  1361215488      9513310         +2      +8192   +212
2025-01-14 01:06:56             lkpiskb 330093  1364643840      9517636         +837    +3428352        +4326
2025-01-14 01:07:06             lkpiskb 330095  1364652032      9517845         +2      +8192   +209

Here is a two-step one (+5 MB, then -1.6 MB):
2025-01-14 01:07:47             lkpiskb 330115  1364733952      9518743         +5      +20480  +223
2025-01-14 01:07:57             lkpiskb 331360  1369833472      9522027         +1245   +5099520        +3284
2025-01-14 01:08:07             lkpiskb 330958  1368186880      9523298         -402    -1646592        +1271
2025-01-14 01:08:17             lkpiskb 330964  1368211456      9523527         +6      +24576  +229

Another one-go (+6.8 MB, then -3.4 MB):
2025-01-14 01:10:00             lkpiskb 331828  1371750400      9529814         +2      +8192   +217
2025-01-14 01:10:10             lkpiskb 333506  1378623488      9534152         +1678   +6873088        +4338
2025-01-14 01:10:20             lkpiskb 332671  1375203328      9534368         -835    -3420160        +216
2025-01-14 01:10:30             lkpiskb 332674  1375215616      9534584         +3      +12288  +216

Here the periodic security run on the server made it really hard for the scp to finish, which let rtw88 free its memory:
2025-01-14 07:21:34             lkpiskb 586598  2415288320      11209247                +291    +1191936        +933
2025-01-14 07:21:57             lkpiskb 587114  2417401856      11210910                +516    +2113536        +1663
2025-01-14 07:22:23             lkpiskb 587579  2419306496      11212416                +465    +1904640        +1506
2025-01-14 07:22:45             lkpiskb 587877  2420527104      11213510                +298    +1220608        +1094
2025-01-14 07:23:07             lkpiskb 588170  2421727232      11214571                +293    +1200128        +1061
2025-01-14 07:23:29             lkpiskb 588474  2422972416      11215673                +304    +1245184        +1102
2025-01-14 07:23:49             lkpiskb 587441  2418741248      11216292                -1033   -4231168        +619

--------------------------------------------------------------------------------

while true ; do date "+%Y-%m-%d %H:%M:%S" ; vmstat -m | grep lkpi ; sleep 10 ; done | tee -a ~/tmp/depuisboot.log
awk < ~/tmp/depuisboot.log '/^2025-/{h=$0}/lkpiskb/{variation="+"(-use+$2)"\t+"(-mem+$3)"\t+"(-req+$4);gsub(/[+]-/,"-",variation);gsub(/ +/,"\t");print h"\t"$0"\t"variation;use=0+$2;mem=0+$3;req=0+$4}' | less
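For readers, the delta-computing awk above can be unpacked into a commented sketch. The two-sample input below is fabricated from the figures in this comment; the real input is ~/tmp/depuisboot.log:

```shell
# Fabricated two-sample extract of the monitoring log: a timestamp
# line followed by the vmstat -m line for lkpiskb (Use, Memory, Req).
cat > /tmp/lkpiskb.sample <<'EOF'
2025-01-14 01:06:45
 lkpiskb 329256 1361215488 9513310
2025-01-14 01:06:56
 lkpiskb 330093 1364643840 9517636
EOF

# Remember the last timestamp, then print per-sample deltas of the
# Use/Memory/Req counters against the previous lkpiskb sample.
awk '
  /^2025-/  { stamp = $0; next }
  /lkpiskb/ {
      if (seen)
          printf "%s use%+d mem%+d req%+d\n",
                 stamp, $2 - use, $3 - mem, $4 - req
      use = $2; mem = $3; req = $4; seen = 1
  }
' /tmp/lkpiskb.sample
```

On this input it prints "2025-01-14 01:06:56 use+837 mem+3428352 req+4326", matching the +837/+3428352/+4326 deltas shown above.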
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-15 22:47:42 UTC
This is a shot in the dark and a hack but can you two try to apply the changes from

https://reviews.freebsd.org/D48474

and see if that helps?  If it does, we can find a proper solution that doesn't abuse an skb field...
Comment 11 oleg.nauman 2025-01-16 08:07:18 UTC
(In reply to Bjoern A. Zeeb from comment #10)

Hello Bjoern,

I will test it tomorrow.

Thank you
Comment 12 Guillaume Outters 2025-01-16 22:43:01 UTC
(In reply to Bjoern A. Zeeb from comment #10)

I would be glad to add my laptop to the test battery,
but I will need a bit of guidance on how a total newbie can run this patched driver (until now I have only used FreeBSD on preinstalled dedicated servers or in VirtualBox guests).

Ideally I'd test it live, with a bit of kldunload/kldload, rebuilding only in modules/rtw88 and modules/linuxkpi_wlan.
However, this only works for if_rtw88.ko (ifconfig wlan0 down ; kldunload if_rtw88.ko ; kldload /usr/obj/usr/src/amd64.amd64/sys/modules/rtw88/if_rtw88.ko ; sleep 1 ; ifconfig wlan0 up).
On the other hand, the linuxkpi_wlan.ko present in /boot/kernel seems to be an empty shell, because kldload rejects it with a dmesg message saying it's already part of the kernel (where, by the way, I find the "skb allocation failed" string).
But in the GENERIC kernel config I see no dedicated setting for linuxkpi_wlan, only the main COMPAT_LINUXKPI switch, which certainly should not go away.

Do I have any way other than compiling a full kernel to test the linuxkpi_wlan part?

I'd be grateful for any link to a "how to get linuxkpi_wlan out of the kernel and into a module", or a simple "no way", so that I go recompile my world without discuss^H^H^H^H^H^H^Hlosing my time and yours any longer.
Comment 13 oleg.nauman 2025-01-17 06:47:31 UTC
(In reply to Bjoern A. Zeeb from comment #10)

It caused a kernel panic after 1 hour 47 minutes of uptime:

Jan 17 08:21:23 kernel: rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed 48 + 163, lsta 0xfffff80003654000 sta 0xfffff80003654080 ni 0xfffffe0117b8d000
Jan 17 08:21:23 kernel: rtw880:   tid 0 ltxq 0xfffff8000395f400 seen_dequeue 1 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 1 ltxq 0xfffff8000395f500 seen_dequeue 1 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 2 ltxq 0xfffff8000395f600 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 3 ltxq 0xfffff8000395f700 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 4 ltxq 0xfffff8000395f800 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 5 ltxq 0xfffff8000395f900 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 6 ltxq 0xfffff8000395fa00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 7 ltxq 0xfffff8000395fb00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 8 ltxq 0xfffff8000395fc00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 9 ltxq 0xfffff8000395fd00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 10 ltxq 0xfffff8000395fe00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 11 ltxq 0xfffff8000395e100 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 12 ltxq 0xfffff80003761600 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 13 ltxq 0xfffff80003706100 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 14 ltxq 0xfffff80003706000 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel: rtw880:   tid 15 ltxq 0xfffff800034afe00 seen_dequeue 0 stopped 0 skb_queue_len 0
Jan 17 08:21:23 kernel:
Jan 17 08:21:23 syslogd: last message repeated 1 times
Jan 17 08:21:23 kernel: Fatal trap 12: page fault while in kernel mode
Jan 17 08:21:23 kernel: cpuid = 1; apic id = 01
Jan 17 08:21:23 kernel: fault virtual address   = 0xffffffffffffffb0
Jan 17 08:21:23 kernel: fault code              = supervisor read data, page not present
Jan 17 08:21:23 kernel: instruction pointer     = 0x20:0xffffffff806fbafc
Jan 17 08:21:23 kernel: stack pointer           = 0x28:0xfffffe00e466edb0
Jan 17 08:21:23 kernel: frame pointer           = 0x28:0xfffffe00e466ee00
Jan 17 08:21:23 kernel: code segment            = base rx0, limit 0xfffff, type 0x1b
Jan 17 08:21:23 kernel:                         = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 17 08:21:23 kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
Jan 17 08:21:23 kernel: current process         = 0 (thread taskq)
Jan 17 08:21:23 kernel: rdi: ffffffff80c48880 rsi: fffffe00e466ec40 rdx: 0000000000000043
Jan 17 08:21:23 kernel: rcx: ffffffffffffff80  r8: 0000000000000043  r9: 00000000000000f0
Jan 17 08:21:23 kernel: rax: 0000000000000000 rbx: fffff80003654000 rbp: fffffe00e466ee00
Jan 17 08:21:23 kernel: r10: 0000000000000042 r11: fffffffffffff5cf r12: fffffe0118cb3000
Jan 17 08:21:23 kernel: r13: fffffe0117b8d000 r14: fffff8054596e200 r15: 0000000000000010
Jan 17 08:21:23 kernel: trap number             = 12
Jan 17 08:21:23 kernel: panic: page fault
Jan 17 08:21:23 kernel: cpuid = 1
Jan 17 08:21:23 kernel: time = 1737094737
Jan 17 08:21:23 kernel: Uptime: 1h47m51s

------------------------------------------

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%c1,%0" : "=r" (td)
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:404
        error = 0
        coredump = <optimized out>
#2  0xffffffff80510839 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:524
        once = 0
#3  0xffffffff80510cdb in vpanic (fmt=0xffffffff808946e7 "%s", 
    ap=ap@entry=0xfffffe00e466ec70) at /usr/src/sys/kern/kern_shutdown.c:979
        buf = "page fault", '\000' <repeats 245 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {253, 0 <repeats 15 times>}}
        td = 0xfffff8010282a740
        bootopt = <unavailable>
        newpanic = <optimized out>
#4  0xffffffff80510b73 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:892
        ap = {{gp_offset = 16, fp_offset = 48, 
            overflow_arg_area = 0xfffffe00e466eca0, 
            reg_save_area = 0xfffffe00e466ec40}}
#5  0xffffffff8083d396 in trap_fatal (frame=<optimized out>, 
    eva=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:960
        type = <optimized out>
#6  0xffffffff8083d396 in trap_pfault (frame=0xfffffe00e466ecf0, 
    usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>)
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        td = <optimized out>
        p = <optimized out>
        eva = 18446744073709551536
        map = <optimized out>
        ftype = <optimized out>
        rv = <optimized out>
#7  <signal handler called>
No locals.
#8  lkpi_80211_txq_tx_one (lsta=lsta@entry=0xfffff80003654000, 
    m=0xfffff8054596e200)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:3830
        tid = 16
        skb_alloc_failures = 1 '\001'
        control = {sta = 0xfffffe00e466ed38}
        ni = 0xfffffe0117b8d000
        k = <optimized out>
        wh = <optimized out>
        ic = 0xfffffe0118cb3000
        lhw = 0xfffffe0118ca3200
        hw = <optimized out>
        c = 0xfffffe0118cb34a4
        skb = <optimized out>
        sta = <optimized out>
        ltxq = 0xffffffffffffff80
        buf = <optimized out>
        lvif = <optimized out>
        vif = <optimized out>
        hdr = <optimized out>
        tid = <optimized out>
        ac = <optimized out>
        info = <optimized out>
#9  0xffffffff806fcce3 in lkpi_80211_txq_task (ctx=0xfffff80003654000, 
    pending=<optimized out>)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:3999
        mq = {mq_head = {stqh_first = 0x0, stqh_last = 0xfffffe00e466ee10}, 
          mq_len = 0, mq_maxlen = 50}
        m = 0xfffffe00e466ec40
        lsta = <optimized out>
        shall_tx = <optimized out>
#10 0xffffffff805708a2 in taskqueue_run_locked (
    queue=queue@entry=0xfffff8000129d000)
    at /usr/src/sys/kern/subr_taskqueue.c:517
        et = {et_link = {tqe_next = 0xfffffe00e466eef0, tqe_prev = 0x0}, 
          et_td = 0x0, et_section = {bucket = 0}, et_old_priority = 0 '\000'}
        tb = {tb_running = 0xfffff80003654018, tb_seq = 511873, 
          tb_canceling = false, tb_link = {le_next = 0x0, 
            le_prev = 0xfffff8000129d010}}
        in_net_epoch = false
        pending = 1
        task = <optimized out>
#11 0xffffffff80571a72 in taskqueue_thread_loop (
    arg=arg@entry=0xffffffff80eedd50 <taskqueue_thread>)
    at /usr/src/sys/kern/subr_taskqueue.c:829
        tqp = <optimized out>
        tq = 0xfffff8000129d000
#12 0xffffffff804d254b in fork_exit (
    callout=0xffffffff805719b0 <taskqueue_thread_loop>, 
    arg=0xffffffff80eedd50 <taskqueue_thread>, frame=0xfffffe00e466ef40)
    at /usr/src/sys/kern/kern_fork.c:1152
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff8010282a740
        p = 0xffffffff80e7da98 <proc0>
        dtd = <optimized out>
#13 <signal handler called>
No locals.
#14 0x4b92e29a0de050f2 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x116bd3ff1f0b16f2


 Thank you
Comment 14 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-23 20:17:40 UTC
(In reply to oleg.nauman from comment #13)

I'll just put this here, given I am a bit short on time in the coming days to look at this further.

Oleg commented out the loop that dumps the queues in LinuxKPI upon allocation error, and with that could run the test with the remaining patch to the end.
Unfortunately my shot in the dark was not the right place, and the "leak" still happens.

I'll go back and see if we can find out different ways as I find time (hopefully soon).
Comment 15 Guillaume Outters 2025-01-29 10:10:21 UTC
(Regarding my question on how to start playing with the kernel and whether linuxkpi_wlan was externalizable as a module: I resolved to learn how to compile and boot a full custom kernel, which is a bit tedious but works.)

I could observe the fully binary behaviour of the allocations by tracing in linux_80211.c at:
- lkpi_80211_txq_tx_one(), after the dev_alloc_skb() of https://github.com/freebsd/freebsd-src/blob/main/sys/compat/linuxkpi/common/src/linux_80211.c#L3816
- linuxkpi_ieee80211_free_txskb(), before the _lkpi_ieee80211_free_txskb()

Test procedure:
reboot around 14:40 with the traced kernel, then run two loops of:
while true ; do scp ... ; vmstat -m | grep skb ; sleep 20 ; done
One scp lasts 14 s, the other 10 s, to create interference (so that from time to time they run simultaneously).
I added two web browsing sessions, one around 15:00, the other around 17:40.

MINUTES AFTER THE SECOND WEB BROWSING SESSION (interleaved with the scps),
VMSTAT STARTED REPORTING INCREASES WITHOUT ANY FREES ANYMORE.
The rate of increase matches the consumed bandwidth (1 MB of download -> 1 MB of leaked sk_buff memory).

Here is a typical pattern BEFORE the tipping point; each SKB gets freed quickly:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 206 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000

Here is the pattern AT the tipping point:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5279000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52fc000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fd3000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fca000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c525e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c53ff000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4d26000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5341000

Now I'll just have to walk up linuxkpi_ieee80211_free_txskb()'s callers to see what normally calls it, and what no longer calls it after the tipping point...
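A quick way to hunt for the unmatched pairs mechanically is to track live pointers over the trace (a hypothetical helper; the field positions assume the exact trace format above, where the pointer is the last field of both the alloc and free lines):

```shell
# Fabricated trace excerpt: three allocations, two frees.
cat > /tmp/trace.sample <<'EOF'
kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5279000
kernel: ERROR Guillaume: free_skb 0xfffffe00c52ae000
EOF

# Add a pointer to the live set on alloc, remove it on free; whatever
# remains at EOF was never freed.
awk '
  /alloc_skb/ { live[$NF] = 1 }
  /free_skb/  { delete live[$NF] }
  END { for (p in live) print "leaked:", p }
' /tmp/trace.sample
```

Here it reports "leaked: 0xfffffe00c5279000". Beware that kernel allocators reuse addresses, so on a long trace each free has to be paired with the most recent alloc of that address; the simple live set above is still enough to show the net drift.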
Comment 16 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-29 15:12:20 UTC
(In reply to Guillaume Outters from comment #15)

This is interesting.  The reason I was thinking it cannot be here is that the TX SKBs have an mbuf attached to them, which is only freed (as you can see in lkpi_ieee80211_free_txskb and the function called from there) just before the SKB is freed.  Given we do not seem to leak mbufs according to the netstat -m output provided earlier, it seemed unlikely that these SKBs are leaked.
But also, 170 bytes is really not much each time.

Did you also instrument the RX path?

Are your scps pushing or pulling data?  That is, do you copy a file off the rtw88 device, or onto it?


BTW, you do not have to patch the kernel for this.  DTrace provides adequate tracing functionality in this case.

Here's a sample I shared earlier, which you can probably use as a start:

# dtrace -s rtw88-skb.d -o rtw88-skb.results

--- 8< 8< 8< rtw88-skb.d -----------------------------------------------
#!/usr/sbin/dtrace -s
/*-
 * SPDX-License-Identifier: BSD-2-Clause
 * Copyright (c) 2024 Bjoern A. Zeeb
 *
 * 20241224 -- rtw88 SKB leak debugging.
 */

#pragma D option quiet
#pragma D option switchrate=2
#pragma D option bufsize=4M

fbt:kernel:linuxkpi_alloc_skb:entry
{
        @[ probefunc, stack() ] = count();
}

/* fbt:kernel:linuxkpi_alloc_skb:return */

fbt:kernel:linuxkpi_kfree_skb:entry
{
        @[ probefunc, stack() ] = count();
}

/* fbt:kernel:linuxkpi_kfree_skb:return */

/* end */
--- 8< 8< 8< -----------------------------------------------------------

You could add the relevant functions and see where things are no longer called.

Given you are on fairly low bandwidth it's likely fine, but eventually the overhead will become noticeable and DTrace may drop records.
Comment 17 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-29 23:35:35 UTC
This may or may not be the same problem (and I do have four different Realtek cards in this system), but here is another possible data point:

db> show malloc
              Type        InUse        MemUse     Requests
           lkpiskb       212076      885_168K      6581552 <<<<
            bounce         8336        33344K         8336
          mbuf_tag       203410        12721K     10820863  <<<
...

db> show uma
              Zone   Size    Used    Free    Requests  Sleeps  Bucket  Total Mem    XFree
              mbuf    256  229803    1084    41147484       0     254   59107072        0 <<<<
      mbuf_cluster   2048   18161    1651     9708177       0     254   40574976        0
         malloc-64     64  204469    1541      278960       0     254   13184640        0
         vmem btag     56  224005     402      230802       0     254   12566792        0
...


It started like this (but with no SKB alloc failures in the log):

rtw881: failed to dequeue 3967 skb TX queue 5, BD=0xffffffff, rp 127 -> 4095^M
rtw881: failed to dequeue 4094 skb TX queue 6, BD=0xffffffff, rp 0 -> 4095^M
rtw881: failed to dequeue 4008 skb TX queue 1, BD=0xffffffff, rp 86 -> 4095^M
rtw881: failed to dequeue 4094 skb TX queue 0, BD=0xffffffff, rp 0 -> 4095^M
rtw881: failed to dequeue 4094 skb TX queue 3, BD=0xffffffff, rp 0 -> 4095^M
rtw881: failed to dequeue 4094 skb TX queue 2, BD=0xffffffff, rp 0 -> 4095^M
rtw881: failed to dequeue 4091 skb TX queue 7, BD=0xffffffff, rp 3 -> 4095^M
rtw881: unhandled firmware c2h interrupt^M
rtw881: pci bus timeout, check dma status^M
rtw881: pci bus timeout, check dma status^M
rtw880: failed to send h2c command^M
rtw881: pci bus timeout, check dma status^M
rtw881: pci bus timeout, check dma status^M
rtw881: pci bus timeout, check dma status^M
rtw881: pci bus timeout, check dma status^M
...

rtw881 is:
[109.713728] rtw881: <rtw_8821ce> port 0x3000-0x30ff mem 0xd0100000-0xd010ffff irq 41 at device 0.0 on pci9^M
[109.732101] rtw88/rtw8821c_fw.bin: could not load firmware image, error 8^M
[109.737883] Trying to load binary firmware from /boot/firmware/rtw88/rtw8821c_fw.bin^M
[109.759739] firmware: '/boot/firmware/rtw88/rtw8821c_fw.bin' version 0: 139472 bytes loaded at 0xfffffe0087a82000^M
[109.768918] rtw88/rtw8821c_fw.bin: Loaded binary firmware using /boot/firmware/rtw88/rtw8821c_fw.bin^M
[109.777329] rtw881: successfully loaded firmware image 'rtw88/rtw8821c_fw.bin'^M
[109.783534] rtw881: Firmware version 24.11.0, H2C version 12^M

db> trace 100239
Tracing pid 0 tid 100239 td 0xfffff80005fbb740
kdb_alt_break_internal() at kdb_alt_break_internal+0x15d/frame 0xfffffe0068f7d940
kdb_alt_break() at kdb_alt_break+0xb/frame 0xfffffe0068f7d950
uart_intr_rxready() at uart_intr_rxready+0xf4/frame 0xfffffe0068f7d980
uart_intr() at uart_intr+0x138/frame 0xfffffe0068f7d9c0
intr_event_handle() at intr_event_handle+0xc8/frame 0xfffffe0068f7da30
intr_execute_handlers() at intr_execute_handlers+0x63/frame 0xfffffe0068f7da60
Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe0068f7da60
--- interrupt, rip = 0xffffffff8106e0d8, rsp = 0xfffffe0068f7db30, rbp = 0xfffffe0068f7db40 ---
spinlock_exit() at spinlock_exit+0x38/frame 0xfffffe0068f7db40
sbuf_printf_drain() at sbuf_printf_drain+0xaa/frame 0xfffffe0068f7db80
sbuf_drain() at sbuf_drain+0x57/frame 0xfffffe0068f7dba0
sbuf_finish() at sbuf_finish+0x8c/frame 0xfffffe0068f7dbc0
device_printf() at device_printf+0x10b/frame 0xfffffe0068f7dd10
rtw_pci_napi_poll() at rtw_pci_napi_poll+0x20e/frame 0xfffffe0068f7de20   <<<<
lkpi_napi_task() at lkpi_napi_task+0x1f/frame 0xfffffe0068f7de40
taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe0068f7dec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe0068f7def0
fork_exit() at fork_exit+0x82/frame 0xfffffe0068f7df30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0068f7df30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Sadly I cannot dump here to gather more info later, but I'll go and check, and keep an eye on this.
Comment 18 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-30 00:59:35 UTC
Shot in the dark II:

The patch to apply is here (you can use arc or download it from the right-hand side): https://reviews.freebsd.org/D48723

set

compat.linuxkpi.rtw88_debug_mask=0x08000000

in loader.conf to also get the logging if you want.
Comment 19 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-30 02:44:11 UTC
So I am currently doing transfers every minute (TX) from the rtw881 instance to another machine.

Coming back after a while, I see differences in the 1-minute vmstat -m updates for lkpiskb (but not mbuf-tags):

Use:
% expr 4071 - 3997
74

Reqs:
% expr 17869937 - 17557380
312557

Memory:
% expr 54423552 - 54120448
303104

# It's exactly one page at a time!
% expr 74 \* 4096
303104
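
The arithmetic above can be wrapped in a tiny check (a hypothetical helper, not part of any existing tool): given two vmstat -m samples for lkpiskb, it reports how many allocations leaked if the growth is exactly one page per allocation.

```python
PAGE = 4096  # amd64 page size

def leaked_pages(use_before, use_after, mem_before, mem_after):
    """Return the number of leaked allocations if the memory delta is
    exactly one page per additional in-use allocation, else None."""
    d_use = use_after - use_before
    d_mem = mem_after - mem_before
    if d_use > 0 and d_mem == d_use * PAGE:
        return d_use
    return None

# Numbers from the expr runs above: 74 more in-use, 303104 more bytes.
print(leaked_pages(3997, 4071, 54120448, 54423552))  # → 74
```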

The previous logging doesn't catch it, so it's not that (though that's just another error path without an skb free call; there are more in the driver that I fixed in the past).

I wonder if we are slowly leaking somewhere else inside rtw88; we should go and see where skb allocations match a page size (like the rsvd page or similar).
It seems to be a small number (0.023676%).

[and while I am writing rtw881 exploded again]

rtw881: failed to dequeue 3908 skb TX queue 1, BD=0xffffffff, rp 183 -> 4095
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
rtw881: pci bus timeout, check dma status
[private note from somewhere later in the trace: WARNING bw > RTW_CHANNEL_WIDTH_40 || nrx >= 4 failed at /sys/contrib/dev/rtw88/rtw8822c.c:4412]

db> show malloc
              Type        InUse        MemUse     Requests
           lkpiskb         3109        49300K     19622568
            bounce         8336        33344K         8336
               BPF           34         6150K           48


So maybe using dtrace to find allocations the size of a page (4096 bytes) would be interesting, to see where they come from and how many there are.
I'll let that run overnight and see in the morning.  It's just another hunch (the crystal ball currently doesn't seem to work very well).
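
A D script for that hunch might look like the following (a sketch only: it assumes an fbt entry probe exists for linuxkpi_alloc_skb and that its first argument is the requested length — both worth verifying against the linuxkpi sources before trusting the counts):

```d
/* rtw88-skb-pages.d -- aggregate page-or-larger skb allocation requests
 * by kernel stack.  The /arg0 >= 4096/ predicate assumes arg0 is the
 * requested size; that is an assumption, not a verified fact. */
fbt::linuxkpi_alloc_skb:entry
/arg0 >= 4096/
{
	@[stack()] = count();
}
```

Run it with `dtrace -s rtw88-skb-pages.d` and interrupt it after a while to print the aggregation.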
Comment 20 oleg.nauman 2025-01-31 16:43:09 UTC
(In reply to Bjoern A. Zeeb from comment #18)

 I have applied the patch from https://reviews.freebsd.org/D48723 with compat.linuxkpi.rtw88_debug_mask=0x08000000 set, but the kernel does not produce any new error messages; networking still hangs after a few hours of uptime with messages like the one below (it is from https://reviews.freebsd.org/D48474):

 Jan 31 15:48:51 kernel: [15609] rtw880: ERROR lkpi_80211_txq_tx_one: skb alloc failed 48 + 141, lsta 0xfffff80003108800 sta 0xfffff80003108880 ni 0xfffffe0119d22000




Thank you
Comment 21 Guillaume Outters 2025-01-31 22:54:32 UTC
(In reply to Bjoern A. Zeeb from comment #16, comment #17, comment #19)

> This may or may not be the same problem (and I do have four different Realtek cards in this system) but another possible data point:
> [...]
> It started like (but no SKB alloc failures in the log):
> rtw881: failed to dequeue 3967 skb TX queue 5, BD=0xffffffff, rp 127 -> 4095

"skb alloc failed" is not the primary symptom, rather it's a consequence once the situation has degraded too much.

For me the first symptom is `vmstat -m | grep skb` growing at the pace of network activity; sometimes as soon as 10 minutes after the reboot (but sometimes far later than that; I still don't understand what triggers the change).

At boot I always have a disturbingly flat value of 16781312 (16 MiB + 1 buffer of 4096 bytes); some activity will make it eat 1, 2 or more buffers, but it always ends up returning to this value.
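
A quick arithmetic check of that flat baseline (plain Python, nothing driver-specific):

```python
# 16 MiB plus one 4096-byte buffer, as described above.
print(16 * 2**20 + 4096)  # → 16781312
```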

Then as long as I see it going over 18 MB, I know that something has gone rogue, and that _any new activity will make unreleased skbuffs grow proportionally to network traffic_.

In my last 5 attempts, for a 10 MiB nc transfer (either from or to my laptop), it had grown by 19.43 MiB, then 18.60, 18.52, 14.52, 15.04,
after having temporarily topped out at 23.20 MiB, 45.36, 42.29, 21.64, 26.52
(from a baseline which was the vmstat -m total before the transfer, at around 2.2 GiB;
that is, before the first transfer vmstat was at 2.2 GiB, during the transfer it peaked at 2.2232 GiB, and after the transfer it went back to "only" 2.21943 GiB).

During that phase, I noticed some sluggishness or small Wayland freezes (3 seconds to dozens of seconds) from time to time; as if (but that's just a guess) it did a lookup over all the already-allocated memory (to look for the oldest buffer to free? for a free memory block in which to allocate a new buffer? to relocate buffers? to reach the end of a linked-list pool of buffers, looking for one to reuse?).

The "skb alloc failed" only occurs ONCE IT CANNOT ALLOCATE ANYMORE IN THE FIRST 4 GB OF RAM (due to compat.linuxkpi.skb.mem_limit=1, as I understand), after having grown MB by MB (and competing with userspace processes: this afternoon, after some "skb alloc failed", I quit a long-running Firefox and as a result got 1 or 2 hours without "skb alloc failed").

> Did you also instrument the RX path?
> Are your SCPs pushing or pulling data?  as in do you copy a file off from the rtw88 device or do you copy a file to the rtw88 device?
> But also 170 bytes is really not much each time.

I just instrumented those functions, because that file was the one I could grep 'skb_free|free_skb' in, and I didn't look further.

And my original tests were on pushes (via scp).
But today with my 10 MiB of a .gz file pushed and pulled via nc,
I could measure the same non-freed allocations: ALL of the ~3400 allocations traced for one transfer were between 126 and 196 bytes long (but it's not an absolute limit: another test later saw 4 out of 4000 at 243, 249, 368 and 434 bytes).

However, the `vmstat -m | grep skb` increase was way more than the sum of all those traced packets; for a 10 MiB transfer that resulted in a 15 - 20 MiB increase of skb-allocated space, only 650 KB were traced by my artisanal probe.
On the other hand, I don't know how the allocator works: IF, FOR EACH ~170 B ALLOCATION REQUESTED, THE ALLOCATOR RETURNS A FULL 4096 B PAGE, THEN THIS EXPLAINS OUR 15 - 20 MB INCREASE (I saw from 3500 to 7000 calls through my trace, which multiplied by 4 KB give 14.3 to 28.7 MB: that would match the vmstat report perfectly!).
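
That rounding hypothesis is easy to bracket with plain arithmetic (no allocator internals assumed):

```python
PAGE = 4096  # bytes consumed if every small request takes a whole page

# Range of traced allocation calls seen per 10 MiB transfer.
for calls in (3500, 7000):
    print(f"{calls} calls -> {calls * PAGE / 1e6:.1f} MB")
```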

> BTW. you do not have to patch the kernel for this.  Dtrace provides adequate tracing functionality in this case.
> Here's a sample I shared earlier on which you can probably use as a start:
> [...]

Nice! I've been telling myself for a long time that I should look into Dtrace, but I never did; you're adding to the good reasons to do it in 2025.

> Coming back after a while I see on the 1 minute update differences for vmstat -m for lkpiskb (but not mbuf-tags):
> # It's exactly one page a time!
> % expr 74 \* 4096
> 303104

From my experience (see the first block of this reply), during the "non-problematic phase", continuous network use (a transfer) made skb allocations grow by multiples of 4 KB, but as soon as the network pressure lowered, they were released and I got back to 16 MB + 4 KB.

So it may be normal to have a small, permanent increase... as long as the dequeue has opportunities to run and, more importantly, as long as it does not simply give up.

Now I'll have to reboot to post this long comment: any HTTP request now gets at least one "skb alloc failed"; the network isn't usable anymore.
Comment 22 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-01-31 23:28:46 UTC
(In reply to Guillaume Outters from comment #21)

The 16M ring + 1 rsvd page would match as a start.

Thanks a lot for the detailed answer.

I just realized why we see the 4K increases although I did not see any 4K allocations: contigmalloc in linux_skb, due to the memory limit as-is in the tree.  There are changes coming that will alter that for certain behavior.

Can you also keep an eye on the mbuf-tag numbers (as that'll tell RX and TX apart)?  On main I use the following (to also get the header; stable/14 might still be different):   vmstat -m | egrep -e '(Use|lkpi|mbuf)'
Comment 23 Guillaume Outters 2025-02-02 02:12:56 UTC
Created attachment 257182 [details]
Dtrace and vmstat after a 10 MiB receive
Comment 24 Guillaume Outters 2025-02-02 02:13:26 UTC
Created attachment 257183 [details]
Dtrace and vmstat after a 10 MiB send
Comment 25 Guillaume Outters 2025-02-02 02:13:49 UTC
(In reply to Bjoern A. Zeeb from comment #16)

With your Dtrace I got a flagrant difference between "before" and "after" the tipping point.

--------------------------------
-- The paths to allocation and freeing

There are 2 paths to linuxkpi_alloc_skb; let's call them A for Alloc:
- A1 lkpi_80211_txq_task (explicitly Tx)
- A2 lkpi_napi_task > rtw_pci_napi_poll (which would be the RX path?)
And 3 paths to linuxkpi_kfree_skb (D for Dealloc):
- D1 linux_work_fn > rtw_c2h_work > rtw_fw_c2h_cmd_handle > rtw_tx_report_handle > linuxkpi_ieee80211_tx_status
     Note that there is a "branch" at rtw_c2h_work, with:
     D1.1 rtw_c2h_work+0x62 leading to the full path above, at the tail of which linuxkpi_kfree_skb is called (by linuxkpi_ieee80211_tx_status)
     D1.2 rtw_c2h_work+0x6a does a direct call to linuxkpi_kfree_skb
- D2 lkpi_napi_task > rtw_pci_napi_poll > linuxkpi_ieee80211_rx (explicitly Rx)
- D3 [softclock_thread > softclock_call_cc] > rtw_tx_report_purge_timer
     contrary to all of the above, which start with [taskqueue_thread_loop > taskqueue_run_locked], this one's name suggests it is called on a periodic timer

--------------------------------
-- My tests

I did a first test just after the reboot (I don't remember if it was mostly Rx or Tx),
with interesting results (see (2) below for the full result):
- 283 allocs through A1 got freed by D1.1
- 1144 allocs through A2, of which:
  - 861 freed by D2
  - 283 freed by D1.2
Note how the 283 matches: there's an interesting mix of allocations handled by the "other side"'s dealloc (and due to this mix I couldn't say for sure that 1 is Tx and 2 is Rx).  

Now after some time running, and vmstat -m starting to show an increase in skb mem consumption, I had totally different paths:

Be it in Tx or Rx (see the details in the two attachments above):
- during the transfer I got 17870 A2, some of which got freed by a D2. But I had some A1 too, with:
  A2 = A1 + D2
  This is **really surprising**, as I expected a balanced A1 + A2 = D2 (the sum of allocations = the sum of deallocations).
  The increase of vmstat was exactly 2 * A1 * 4 KB.
  The arithmetic works out: to achieve balance we should have freed as many as we had allocated, (A1 + A2) - D2 = 0, i.e. an expected D2 = A1 + A2;
  however here D2 = A2 - A1, thus our imbalance is A1 + A2 - D2 = A1 + A2 - (A2 - A1) = 2 * A1
- if waiting after the transfer, apart from some negligible new allocations of the same pattern (A2 = A1 + D2),
  we see a **new process** (D3, with rtw_tx_report_purge_timer on softclock_thread) running to free a bit of memory;
  although it seems to be dedicated to garbage collecting, **it doesn't keep up with the pace**
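
The bookkeeping above can be restated as a tiny model (hypothetical numbers: A2 = 17870 is from the trace, but A1 was not quoted, so 100 below is an arbitrary stand-in):

```python
def outstanding(a1, a2, d2):
    """skbs still allocated: total allocations minus total frees."""
    return a1 + a2 - d2

a1, a2 = 100, 17870   # a1 is a made-up stand-in; a2 is from the trace
d2 = a2 - a1          # the surprising observed relation: D2 = A2 - A1
print(outstanding(a1, a2, d2))  # → 200, i.e. exactly 2 * A1
```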

--------------------------------
-- My god!

It looks like SOME SKBUFFERS GET **ALLOCATED** INSTEAD OF **FREED**. 

--------------------------------
-- Notes

--- (1) Test procedure

vms() { vmstat -m | egrep -e '(Use|lkpi|mbuf)' ; }
dt() { sudo dtrace -s rtw88-skb.d | egrep -v 'kernel.0xffff|fork_exit|taskqueue_thread_loop|softclock_thread' ; }
# Note: the argument given to dt at its call sites below is only a label; dt's output is captured by the tee at the end of the block.
for p in "rx b:/tmp/1 /tmp/" "tx /tmp/1 b:/tmp/" # /tmp/1 contains the first 10000000 bytes of a gzip file.
do
    set -- $p
    {
        dt $1.during &
        vms
        echo "# scp ($1) of a 10 MiB file"
        time scp $2 $3 2>&1
        sudo killall dtrace
        vms
        dt $1.after &
        echo "# Sleep 20"
        sleep 20
        sudo killall dtrace
        vms
    } | tee rtw88-skb.results.pourri.$1
done

--- (2) Before

  linuxkpi_alloc_skb
              kernel`linuxkpi_dev_alloc_skb+0xd
              kernel`lkpi_80211_txq_task+0x1ec
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283
  linuxkpi_kfree_skb                                
              kernel`linuxkpi_ieee80211_tx_status_ext+0x163
              kernel`linuxkpi_ieee80211_tx_status+0x45
              if_rtw88.ko`rtw_tx_report_handle+0x136
              if_rtw88.ko`rtw_fw_c2h_cmd_handle+0x15a
              if_rtw88.ko`rtw_c2h_work+0x62
              kernel`linux_work_fn+0xe4
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283

  linuxkpi_alloc_skb                           
              kernel`linuxkpi_dev_alloc_skb+0xd
              if_rtw88.ko`rtw_pci_napi_poll+0x254
              kernel`lkpi_napi_task+0xf
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
             1144
  linuxkpi_kfree_skb
              if_rtw88.ko`rtw_c2h_work+0x6a
              kernel`linux_work_fn+0xe4
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              283
  linuxkpi_kfree_skb
              kernel`linuxkpi_ieee80211_rx+0x5a3
              if_rtw88.ko`rtw_pci_napi_poll+0x31b
              kernel`lkpi_napi_task+0xf
              kernel`taskqueue_run_locked+0x182
              kernel`taskqueue_thread_loop+0xc2
              861