Bug 246852 - bounce_bus_dmamap_load_buffer() does not clean up state on error EFBIG.
Summary: bounce_bus_dmamap_load_buffer() does not clean up state on error EFBIG.
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Only Me
Assignee: freebsd-arm (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2020-05-29 23:51 UTC by Thomas Skibo
Modified: 2020-06-04 02:11 UTC (History)
2 users (show)

See Also:


Attachments
bounce_bus_dmamap_load_buffer fix. (735 bytes, text/plain)
2020-05-29 23:51 UTC, Thomas Skibo
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Skibo 2020-05-29 23:51:34 UTC
Created attachment 215037 [details]
bounce_bus_dmamap_load_buffer fix.

When the arm64 version of bounce_bus_dmamap_load_buffer() exits with error EFBIG (due to too many segments), it leaves the dma map in a state where the next call fails with EFBIG too even though the next request has a single segment.

This happens when a network driver calls bus_dmamap_load_mbuf_sg() with a complicated mbuf chain.  Typical drivers respond to the EFBIG error by defragging the mbuf and trying the bus_dmamap_load_mbuf_sg() again.  In my case, the second call would also fail in which case the driver gives up and drops the packet.

It looks like this was fixed in the arm and mips versions of busdma_machdep.c at some point.  In those architectures, the routine calls bus_dmamap_unload() to clean up the state if it fails with EFBIG.  The same change was made to bounce_bus_dmamap_load_phys() too.  I've attached a possible fix based on those.  It seems to work for me.

It looks like riscv also has the same issue.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2020-06-01 16:12:47 UTC
It looks like x86 has the same issue as well?  I guess the problem is only noticeable if the request required the allocation of bounce pages.
Comment 2 John Baldwin freebsd_committer freebsd_triage 2020-06-03 23:16:09 UTC
For MIPS it least it looks like the unload was added back in r246713 and wasn't present in the old code prior to the rework.  RISC-V undoubtedly copied the bug from arm64 (just as it copied all of the other arm64 code).  arm64 was copied from x86 in r282655 and x86 didn't have the unload calls added in r246713.  I guess it is less work to do the unloads in the MD backends vs doing them in subr_busdma.c itself though conceptually it would be a bit cleaner to do the unload in the MI code I think.  However, I think you'd ultimately have to patch more places in the MI code?
Comment 3 Thomas Skibo 2020-06-04 02:11:36 UTC
I don't think the x86 code has a problem.  It looks like the arm64 code keeps a list of virtual addresses for cache operations and the list doesn't get reset when bounce_bus_dmamap_load_buffer() fails with EFBIG.  I think it's because map->sync_count isn't reset to zero that the next call fails.  That was introduced after the arm64 code was cloned from x86.