In bus_dmamap_unload(), the counters for free_bpages and reserved_bpages appear to be vulnerable to unprotected read-modify-write operations that result in accounting that looks like a page leak. This was noticed on a 2GB quad core i.MX6 system that has more than one device attached via FTDI based USB serial connection. This system happens to be using FTDI US4232H quad port chips, but the problem is more general. There is a latency timer setting in FTDI chips that is used to set the interval at which short packets of data are flushed from the USB endpoint by the FTDI chip (which has some internal buffer memory). The default latency is 16 ms. We had set the latency to 4 ms to get data more quickly. We started noticing problems with slower USB responses and eventually the network stack would be affected as well. In the system in question, it fairly reliably "locked up" (couldn't ssh any more, trouble spawning processes when logged in on the serial port). In the locked up state, the usb/usbus0.xplr thread of the usb system process was hung and the system could not process usb messages (this i.MX6 system has an ehci USB controller). The typical stack dump for usbus0.xplr when things were hung is: 13 100029 usb usbus0.xplr sched_switch+0x9d4 mi_switch+0x184 sleepq_wait+0x2c _cv_wait+0x1bc usbd_do_request_flags+0x4bc usbd_req_get_port_status+0x44 uhub_explore+0xc4 uhub_explore+0x8f8 uhub_explore+0x8f8 usb_bus_explore+0x150 usb_process+0x124 fork_exit+0xc0 swi_exit+0 Once we noticed that hw.busdma.zone0.free_pages was steadily decrementing - eventually down to zero - we started investigating (dtrace was helpful here) why there appeared to be a leak of bounce pages. That's when we found what appears to be the vulnerability in bus_dmamap_unload(). This code has been this way for more than a decade, but it takes a lot of transactions for this to occur and was particularly hard to find. For a long time, we would work around the problem by detecting the symptoms and just reboot this system to recover - hardly ideal. It would take weeks to months depending on USB traffic load. Adjusting the FTDI latency timer to 0 ms (force packet delivery on every high speed microframe) finally made this happen more quickly. Even if someone else has enough traffic to experience the same problem, I will submit a patch for review. Early results seem promising (in particular the free bounce page accounting now does not show what looks like a leak). This was originally noticed quite a while ago on 11.x, but it has been confirmed on 13.x as well. As an indication of system load, with the 0 ms latency timer we see more than 100 bounce pages per second (based on hw.busdma.zone0.total_bounced), and the load due to interrupts is about 15%. The high rate of bounce page and interrupt activity gives lots of good opportunity for preemption at just the right time to trigger the accounting leak. Work sponsored by: Microchip Technology, Inc.
Created attachment 234876 [details] [patch] protect bounce page counters with global bounce page mutex This patch protects the otherwise unprotected bpage counters in bus_dmamap_unload() with the global bounce_lock mutex.
Can you check the value of "map->pagesreserved". Was it always zero?
Created attachment 234881 [details] Can you test this patch instead?
(In reply to Hans Petter Selasky from comment #2) Yes, map->pagesreserved was always zero in our testing. Tested with: dtrace -n 'fbt::_bus_dmamap_unload:entry { @ = quantize(((struct bus_dmamap*)arg1)->pagesreserved); }' (In reply to Hans Petter Selasky from comment #3) Yes, we'll run test some units with your optimization to skip taking the mutex if map->pagesreserved is 0.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6c4b6f55f77d8d7cee1b277bd6579a77d6890ef9 commit 6c4b6f55f77d8d7cee1b277bd6579a77d6890ef9 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-06-23 09:31:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-06-25 10:01:59 +0000 busdma: Protect ARM busdma bounce page counters using the bounce page lock. In bus_dmamap_unload() on ARM, the counters for free_bpages and reserved_bpages appear to be vulnerable to unprotected read-modify-write operations that result in accounting that looks like a page leak. This was noticed on a 2GB quad core i.MX6 system that has more than one device attached via FTDI based USB serial connection. Submitted by: John Hein <jcfyecrayz@liamekaens.com> Differential Revision: https://reviews.freebsd.org/D35553 PR: 264836 MFC after: 3 days Sponsored by: NVIDIA Networking sys/arm/arm/busdma_machdep.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a0ffae30c593d829bc9cdd7b32017352e501d247 commit a0ffae30c593d829bc9cdd7b32017352e501d247 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-06-23 09:31:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-06-28 06:17:20 +0000 busdma: Protect ARM busdma bounce page counters using the bounce page lock. In bus_dmamap_unload() on ARM, the counters for free_bpages and reserved_bpages appear to be vulnerable to unprotected read-modify-write operations that result in accounting that looks like a page leak. This was noticed on a 2GB quad core i.MX6 system that has more than one device attached via FTDI based USB serial connection. Submitted by: John Hein <jcfyecrayz@liamekaens.com> Differential Revision: https://reviews.freebsd.org/D35553 PR: 264836 Sponsored by: NVIDIA Networking (cherry picked from commit 6c4b6f55f77d8d7cee1b277bd6579a77d6890ef9) sys/arm/arm/busdma_machdep-v6.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=78088c5714e065058e8b18595d31a2c115cc00d6 commit 78088c5714e065058e8b18595d31a2c115cc00d6 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-06-23 09:31:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-06-28 06:20:34 +0000 busdma: Protect ARM busdma bounce page counters using the bounce page lock. In bus_dmamap_unload() on ARM, the counters for free_bpages and reserved_bpages appear to be vulnerable to unprotected read-modify-write operations that result in accounting that looks like a page leak. This was noticed on a 2GB quad core i.MX6 system that has more than one device attached via FTDI based USB serial connection. Submitted by: John Hein <jcfyecrayz@liamekaens.com> Differential Revision: https://reviews.freebsd.org/D35553 PR: 264836 Sponsored by: NVIDIA Networking (cherry picked from commit 6c4b6f55f77d8d7cee1b277bd6579a77d6890ef9) sys/arm/arm/busdma_machdep.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)