Bug 277476 - graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Summary: graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Status: In Progress
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-x11 (Nobody)
URL: https://github.com/freebsd/drm-kmod/i...
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-04 14:12 UTC by Josef 'Jeff' Sipek
Modified: 2024-11-14 09:24 UTC
CC List: 12 users

See Also:
linimon: maintainer-feedback? (x11)


Attachments
PR277476 fix (2.87 KB, patch)
2024-11-14 00:43 UTC, sigsys

Description Josef 'Jeff' Sipek 2024-03-04 14:12:20 UTC
Two weeks ago I replaced an ancient nvidia graphics card with an AMD RX580 card to run open source drivers. Everything works fine most of the time, but occasionally the system hangs for a few seconds (5-10, usually).  The longer the system has been up, the worse it gets.

Digging into it a bit, it is because userspace (it always looks like X) does an ioctl into drm which then tries to allocate a large-ish piece of physically contiguous memory.  This explains why it gets worse as uptime increases (free physical memory fragmentation) and when running firefox (the most memory hungry application I use).  I know nothing about graphics cards, the software stack supporting them, or the linux kernel API compatibility layer, but clearly it'd be beneficial if amdgpu/drm/whatever could make use of *virtually* contiguous pages or some kind of allocation caching/reuse to avoid repeatedly asking the vm code for physically contiguous ranges.

To reach the conclusion above, I did a handful of dtrace-based experiments.

While one of the "temporary hangs" was happening, the following was the most common (non-idle) profiler stack:

# dtrace -n 'profile-97{@[stack()]=count()}'
...
              kernel`vm_phys_alloc_contig+0x11d
              kernel`linux_alloc_pages+0x8f
              ttm.ko`ttm_pool_alloc+0x2cb
              ttm.ko`ttm_tt_populate+0xc5
              ttm.ko`ttm_bo_handle_move_mem+0xc3
              ttm.ko`ttm_bo_validate+0xb4
              ttm.ko`ttm_bo_init_reserved+0x199
              amdgpu.ko`amdgpu_bo_create+0x1eb
              amdgpu.ko`amdgpu_bo_create_user+0x21
              amdgpu.ko`amdgpu_gem_create_ioctl+0x1e2
              drm.ko`drm_ioctl_kernel+0xc6
              drm.ko`drm_ioctl+0x2b5
              kernel`linux_file_ioctl+0x312
              kernel`kern_ioctl+0x255
              kernel`sys_ioctl+0x123
              kernel`amd64_syscall+0x109
              kernel`0xffffffff80fe43eb

The latency of vm_phys_alloc_contig (entry to return, in nanoseconds) is bimodal, with latencies in the single-digit *milli*seconds during the "temporary hangs":

# dtrace -n 'fbt::vm_phys_alloc_contig:entry{self->ts=timestamp}' -n 'fbt::vm_phys_alloc_contig:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@                                        2606     
            1024 |@@@@@@@@@                                18207    
            2048 |@                                        2534     
            4096 |                                         894      
            8192 |                                         34       
           16384 |                                         78       
           32768 |                                         58       
           65536 |                                         219      
          131072 |                                         306      
          262144 |                                         310      
          524288 |                                         735      
         1048576 |                                         174      
         2097152 |@@                                       4364     
         4194304 |@@@@@@@@@@@@@@@@@@@@@@@@                 47475    
         8388608 |@                                        1546     
        16777216 |                                         2        
        33554432 |                                         0     
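
For reading the histogram above: DTrace's `timestamp` is in nanoseconds, and each quantize() row is labelled with the lower bound of a power-of-two bucket. The following is only a minimal sketch of that bucketing for positive values; `quantize_bucket` and `ns_to_us` are hypothetical helper names, not part of DTrace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound of the power-of-two bucket that a positive value falls
 * into, i.e. the largest power of two that is <= v.
 */
static uint64_t
quantize_bucket(uint64_t v)
{
	uint64_t b = 1;

	assert(v > 0);
	while (b <= v / 2)
		b *= 2;
	return (b);
}

/* Convert nanoseconds to whole microseconds. */
static uint64_t
ns_to_us(uint64_t ns)
{
	return (ns / 1000);
}
```

Under that reading, the dominant 4194304 row covers calls that took roughly 4.2 to 8.4 ms, while the 1024 row covers calls that took roughly 1 to 2 microseconds.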

The number of pages being allocated (only requests for more than one page are counted):

# dtrace -n 'fbt::vm_phys_alloc_contig:entry/arg1>1/{@=quantize(arg1)}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count    
               1 |                                         0        
               2 |@@@                                      15       
               4 |@                                        7        
               8 |@@@                                      16       
              16 |@@                                       10       
              32 |@@                                       10       
              64 |@                                        7        
             128 |@@@                                      12       
             256 |@@@                                      12       
             512 |@@@@@@@@@@@@@@                           68       
            1024 |@@@@@@@                                  32       
            2048 |                                         0      

I did a few more dtrace experiments, but they all point to the same thing - a drm/amdgpu related ioctl wants 4MB of physically contiguous memory often enough to become a headache.  4MB isn't too much given that the system has 32GB of RAM, but a physically contiguous request sometimes takes a while to fulfill.
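
The 4MB figure follows from the page counts in the second histogram. A minimal sketch of the arithmetic, assuming 4 KiB base pages (the amd64 default); `SKETCH_PAGE_SIZE`, `pages_to_bytes` and `order_to_pages` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumption: 4 KiB base pages */

/* Size in bytes of a request for npages physically contiguous pages. */
static uint64_t
pages_to_bytes(uint64_t npages)
{
	return (npages * SKETCH_PAGE_SIZE);
}

/* Number of pages requested by a Linux-style "order" allocation. */
static uint64_t
order_to_pages(unsigned int order)
{
	assert(order < 63);
	return ((uint64_t)1 << order);
}
```

With these assumptions, the 1024-page row corresponds to an order-10 request of 4 MiB, and the 512 row covers requests from 2 MiB up to just under 4 MiB.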


The card:

vgapci0@pci0:1:0:0:     class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1da2 subdevice=0xe353
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
    class      = display
    subclass   = VGA

$ pkg info|grep -i amd             
gpu-firmware-amd-kmod-aldebaran-20230625 Firmware modules for aldebaran AMD GPUs
gpu-firmware-amd-kmod-arcturus-20230625 Firmware modules for arcturus AMD GPUs
gpu-firmware-amd-kmod-banks-20230625 Firmware modules for banks AMD GPUs
gpu-firmware-amd-kmod-beige-goby-20230625 Firmware modules for beige_goby AMD GPUs
gpu-firmware-amd-kmod-bonaire-20230625 Firmware modules for bonaire AMD GPUs
gpu-firmware-amd-kmod-carrizo-20230625 Firmware modules for carrizo AMD GPUs
gpu-firmware-amd-kmod-cyan-skillfish2-20230625 Firmware modules for cyan_skillfish2 AMD GPUs
gpu-firmware-amd-kmod-dimgrey-cavefish-20230625 Firmware modules for dimgrey_cavefish AMD GPUs
gpu-firmware-amd-kmod-fiji-20230625 Firmware modules for fiji AMD GPUs
gpu-firmware-amd-kmod-green-sardine-20230625 Firmware modules for green_sardine AMD GPUs
gpu-firmware-amd-kmod-hainan-20230625 Firmware modules for hainan AMD GPUs
gpu-firmware-amd-kmod-hawaii-20230625 Firmware modules for hawaii AMD GPUs
gpu-firmware-amd-kmod-kabini-20230625 Firmware modules for kabini AMD GPUs
gpu-firmware-amd-kmod-kaveri-20230625 Firmware modules for kaveri AMD GPUs
gpu-firmware-amd-kmod-mullins-20230625 Firmware modules for mullins AMD GPUs
gpu-firmware-amd-kmod-navi10-20230625 Firmware modules for navi10 AMD GPUs
gpu-firmware-amd-kmod-navi12-20230625 Firmware modules for navi12 AMD GPUs
gpu-firmware-amd-kmod-navi14-20230625 Firmware modules for navi14 AMD GPUs
gpu-firmware-amd-kmod-navy-flounder-20230625 Firmware modules for navy_flounder AMD GPUs
gpu-firmware-amd-kmod-oland-20230625 Firmware modules for oland AMD GPUs
gpu-firmware-amd-kmod-picasso-20230625 Firmware modules for picasso AMD GPUs
gpu-firmware-amd-kmod-pitcairn-20230625 Firmware modules for pitcairn AMD GPUs
gpu-firmware-amd-kmod-polaris10-20230625 Firmware modules for polaris10 AMD GPUs
gpu-firmware-amd-kmod-polaris11-20230625 Firmware modules for polaris11 AMD GPUs
gpu-firmware-amd-kmod-polaris12-20230625 Firmware modules for polaris12 AMD GPUs
gpu-firmware-amd-kmod-raven-20230625 Firmware modules for raven AMD GPUs
gpu-firmware-amd-kmod-raven2-20230625 Firmware modules for raven2 AMD GPUs
gpu-firmware-amd-kmod-renoir-20230625 Firmware modules for renoir AMD GPUs
gpu-firmware-amd-kmod-si58-20230625 Firmware modules for si58 AMD GPUs
gpu-firmware-amd-kmod-sienna-cichlid-20230625 Firmware modules for sienna_cichlid AMD GPUs
gpu-firmware-amd-kmod-stoney-20230625 Firmware modules for stoney AMD GPUs
gpu-firmware-amd-kmod-tahiti-20230625 Firmware modules for tahiti AMD GPUs
gpu-firmware-amd-kmod-tonga-20230625 Firmware modules for tonga AMD GPUs
gpu-firmware-amd-kmod-topaz-20230625 Firmware modules for topaz AMD GPUs
gpu-firmware-amd-kmod-vangogh-20230625 Firmware modules for vangogh AMD GPUs
gpu-firmware-amd-kmod-vega10-20230625 Firmware modules for vega10 AMD GPUs
gpu-firmware-amd-kmod-vega12-20230625 Firmware modules for vega12 AMD GPUs
gpu-firmware-amd-kmod-vega20-20230625 Firmware modules for vega20 AMD GPUs
gpu-firmware-amd-kmod-vegam-20230625 Firmware modules for vegam AMD GPUs
gpu-firmware-amd-kmod-verde-20230625 Firmware modules for verde AMD GPUs
gpu-firmware-amd-kmod-yellow-carp-20230625 Firmware modules for yellow_carp AMD GPUs
suitesparse-amd-3.3.0          Symmetric approximate minimum degree
suitesparse-camd-3.3.0         Symmetric approximate minimum degree
suitesparse-ccolamd-3.3.0      Constrained column approximate minimum degree ordering
suitesparse-colamd-3.3.0       Column approximate minimum degree ordering algorithm
webcamd-5.17.1.2_1             Port of Linux USB webcam and DVB drivers into userspace
xf86-video-amdgpu-22.0.0_1     X.Org amdgpu display driver
$ pkg info|grep -i drm
drm-515-kmod-5.15.118_3        DRM drivers modules
drm-kmod-20220907_1            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20230210_1,1 Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers
Comment 1 Josef 'Jeff' Sipek 2024-03-04 14:28:43 UTC
While gathering all the dtrace data, I was so distracted I forgot to mention:

$ freebsd-version -kru
14.0-RELEASE-p5
14.0-RELEASE-p5
14.0-RELEASE-p5
Comment 2 Josef 'Jeff' Sipek 2024-04-06 14:22:49 UTC
I dug a bit more into this.  It looks like the drm code has provisions for
allocating memory via dma APIs.  The FreeBSD port doesn't implement those.

Specifically, looking at drm-kmod-drm_v5.15.25_5 source:

drivers/gpu/drm/amd/amdgpu/gmc_v*.c sets adev->need_swiotlb to
drm_need_swiotlb(...).  drm_need_swiotlb is implemented in
drivers/gpu/drm/drm_cache.c as a 'return false' on FreeBSD.

Later on, amdgpu_ttm_init calls ttm_device_init with the use_dma_alloc
argument equal to adev->need_swiotlb (IOW, false).

Much later on, ttm_pool_alloc is called to allocate a buffer.  That in turn
calls ttm_pool_alloc_page which amounts to:

	if (!use_dma_alloc)
		return alloc_pages(...);
	
	panic("ttm_pool.c: use_dma_alloc not implemented");

So, because of the 'return false' during initialization, we always call
alloc_pages (aka. linux_alloc_pages) which tries to allocate physically
contiguous memory.

As I said before, I don't know anything about the graphics stack, so it is
possible that this dma API is completely irrelevant.


Looking at ttm_pool_alloc some more, it immediately turns the physically
contiguous allocation into an array of struct page pointers (tt->pages).
So, depending on how the rest of the module uses the buffer & pages, it
may be relatively easy to switch to a virtually-contiguous allocation.
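
To illustrate the last point, here is a user-space model (not the actual ttm code) of expanding one contiguous run into per-page pointers. `struct sketch_page` and `expand_run` are hypothetical stand-ins; the only assumption carried over from the description above is that consumers see an array of page pointers rather than the run itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct sketch_page {
	unsigned long	pfn;	/* physical frame number */
};

/*
 * Expand a run of 2^order pages starting at `first` into one pointer
 * per page. Returns the number of entries written to `pages`.
 */
static size_t
expand_run(struct sketch_page *first, unsigned int order,
    struct sketch_page **pages, size_t cap)
{
	size_t n = (size_t)1 << order;
	size_t i;

	assert(n <= cap);
	for (i = 0; i < n; i++)
		pages[i] = first + i;
	return (n);
}
```

Once the array is filled, nothing in it records whether neighbouring entries came from one contiguous run or from separate single-page allocations, which is why a non-contiguous allocation might be a drop-in replacement if no later code relies on the physical layout.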
Comment 3 Tomasz "CeDeROM" CEDRO 2024-05-15 23:23:56 UTC
I also have an RX580. After upgrading from 13.2 to 14.0 I got frequent kernel panics.

Related reports:
* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276985
* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278212

I noticed that manually setting the amdgpu option DRI=2 in xorg.conf made kernel panics less frequent, but instead the system got slower and slower until it became unresponsive. If I managed to kill xorg in time, it would work for a while until I had to kill xorg again.

I could not work out a fallback using a safe, non-accelerated xorg.conf with scfb that would allow a dual-screen setup with one screen rotated. Dual monitor is only possible with amdgpu loaded. It was also not possible to disable acceleration in amdgpu and still have xrandr (secondary screen rotation).

Rolled back to 13.2. DRM 5.15 / AMDGPU / LinuxKPI makes 14.0 unreliable.
Comment 4 Olivier Certner freebsd_committer freebsd_triage 2024-09-02 19:04:09 UTC
(In reply to Josef 'Jeff' Sipek from comment #0)

I reported this independently in the drm-kmod GitHub project as https://github.com/freebsd/drm-kmod/issues/302.  Going to also cross-link this PR there.

I'd really like to get to the bottom of this.  However, I don't expect to have the time to do so before the end of the month at the very least.

drm-61-kmod exhibits the same problem.  However, drm-510-kmod works fine for me.

(In reply to Tomasz "CeDeROM" CEDRO from comment #3)

Please see my comment 8 in bug #278212.
Comment 5 sigsys 2024-11-08 09:04:51 UTC
Yeah so this problem was super annoying. But thanks to the information already posted here, seems like it wasn't too hard to fix.

IIUC the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't actually need contiguous pages. It's just an opportunistic optimization. When allocation fails, it falls back to asking for fewer and fewer contiguous pages (eventually only asking for one page at a time). When ttm_pool_alloc_page() asks for more than one page, it passes alloc_pages() some extra flags (__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM).
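
The fallback described above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the ttm implementation: `struct sketch_allocator`, `sketch_alloc`, `floor_log2` and `populate` are hypothetical names, and the allocator simply pretends that free runs exist only up to a given order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator model: runs of 2^order pages are available
 * only for order <= max_free_order (-1 means nothing is free). */
struct sketch_allocator {
	int		max_free_order;
	unsigned int	calls;		/* allocation attempts made */
};

static bool
sketch_alloc(struct sketch_allocator *a, unsigned int order)
{
	a->calls++;
	return ((int)order <= a->max_free_order);
}

static unsigned int
floor_log2(unsigned long n)
{
	unsigned int r = 0;

	assert(n > 0);
	while (n >>= 1)
		r++;
	return (r);
}

/*
 * Satisfy num_pages by trying the largest order first and dropping to
 * smaller orders on failure. Returns the number of chunks used, or 0
 * if even a single-page allocation fails.
 */
static unsigned int
populate(struct sketch_allocator *a, unsigned long num_pages,
    unsigned int max_order)
{
	unsigned int order, top, chunks = 0;

	order = floor_log2(num_pages);
	if (order > max_order)
		order = max_order;
	while (num_pages > 0) {
		if (sketch_alloc(a, order)) {
			num_pages -= 1UL << order;
			chunks++;
			/* Never ask for more than what is still needed. */
			if (num_pages > 0 &&
			    (top = floor_log2(num_pages)) < order)
				order = top;
		} else if (order > 0) {
			order--;	/* fall back to a smaller run */
		} else {
			return (0);
		}
	}
	return (chunks);
}
```

In this model a 1024-page request costs one attempt when an order-10 run is free, six attempts when only order-8 runs are free, and 1034 attempts when only single pages are free. The loop only stays cheap if each failed high-order attempt returns quickly.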

What's expensive is the vm_page_reclaim_contig() in linux_alloc_pages(). The function tries too hard to find contiguous memory (that the drm code doesn't even require) and as physical memory gets too fragmented it becomes very slow.

So, very simple fix, make linux_alloc_pages() react to one of the flags passed by the drm code:

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index 2fcc0dc05f29..58a021086c98 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define	__GFP_NOWARN	0
 #define	__GFP_HIGHMEM	0
 #define	__GFP_ZERO	M_ZERO
-#define	__GFP_NORETRY	0
 #define	__GFP_NOMEMALLOC 0
 #define	__GFP_RECLAIM   0
 #define	__GFP_RECLAIMABLE   0
@@ -58,7 +57,8 @@
 #define	__GFP_KSWAPD_RECLAIM	0
 #define	__GFP_WAIT	M_WAITOK
 #define	__GFP_DMA32	(1U << 24) /* LinuxKPI only */
-#define	__GFP_BITS_SHIFT 25
+#define	__GFP_NORETRY	(1U << 25) /* LinuxKPI only */
+#define	__GFP_BITS_SHIFT 26
 #define	__GFP_BITS_MASK	((1 << __GFP_BITS_SHIFT) - 1)
 #define	__GFP_NOFAIL	M_WAITOK
 
diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 18b90b5e3d73..71a6890a3795 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,7 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
 			page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
 			    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
 			if (page == NULL) {
-				if (flags & M_WAITOK) {
+				if ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK) {
 					int err = vm_page_reclaim_contig(req,
 					    npages, 0, pmax, PAGE_SIZE, 0);
 					if (err == ENOMEM)

Been working fine here with amdgpu for about 3 weeks.

(The drm modules need to be recompiled with the modified kernel header.)
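
The new condition in the patch can be read as "the caller may sleep and did not ask for no-retry". A minimal stand-alone sketch of that mask test follows; `SKETCH_WAITOK`, `SKETCH_NORETRY` and `should_reclaim` are hypothetical names, and the bit values are illustrative only (what matters is that the two flags occupy distinct, non-zero bits).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; assumption: distinct non-zero bits. */
#define	SKETCH_WAITOK	0x0002u
#define	SKETCH_NORETRY	(1u << 25)

/*
 * Mirror of the patched test: attempt the expensive contiguous reclaim
 * only when WAITOK is set and NORETRY is clear.
 */
static bool
should_reclaim(unsigned int flags)
{
	return ((flags & (SKETCH_WAITOK | SKETCH_NORETRY)) == SKETCH_WAITOK);
}
```

This also shows why the header change is needed: while `__GFP_NORETRY` is defined as 0, OR-ing it into the flags leaves no trace, so no test inside linux_alloc_pages() could distinguish the two kinds of callers.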
Comment 6 Emmanuel Vadot freebsd_committer freebsd_triage 2024-11-10 11:00:44 UTC
Interesting find. I have never been able to reproduce this bug on my RX550. Olivier, can you test whether the code (which looks OK to me) fixes the issue for you?
Comment 7 rk 2024-11-10 19:55:42 UTC
(In reply to sigsys from comment #5)

I've been suffering from this issue on a Ryzen 9 4900H with embedded
Renoir graphics using drm-61-kmod-6.1.92_2 on stable/14-n268738-048132192698.
Playing videos using mpv easily triggered the slowdown after some time
(especially 4K videos).
I've implemented the suggested fix and now I cannot reproduce the behaviour
anymore (tested for 2 days now). Even playing multiple 4K videos in
parallel does not cause the problem.
Thanks for the fix.
Comment 8 Olivier Certner freebsd_committer freebsd_triage 2024-11-12 09:43:41 UTC
(In reply to sigsys from comment #5)

> IIUC the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't actually need contiguous pages. It's just an opportunistic optimization.

That would be very good news (at least from the users' point of view).

Have not spent time on this issue since my last posts.  I had naively thought that the new DRM ports really needed contiguous allocation for whatever reason, and should probably have looked a bit further instead of assuming this would need some deep and highly time consuming analysis.

(In reply to Emmanuel Vadot from comment #6)

Will test that soon and report.
Comment 9 Emmanuel Vadot freebsd_committer freebsd_triage 2024-11-12 12:23:19 UTC
(In reply to sigsys from comment #5)

Waiting for more people to test but in the meantime could you add a git-format patch to this bug please ? (So with full commit message and correct authorship).
Comment 10 Pierre Pronchery 2024-11-13 19:05:38 UTC
(In reply to Emmanuel Vadot from comment #6)
The patch also works well for me, no slowdowns to report after 24 hours.
Comment 11 Eirik Oeverby 2024-11-13 20:49:14 UTC
(In reply to sigsys from comment #5)
Has this patch landed already? I'm eager to test on my Threadripper with Navi 24 [Radeon PRO W6400]; it's borderline useless after ~48h uptime and needs frequent reboots to fix. At least it's better than before I clamped the ARC to 8GB to slow the process down.
Comment 12 sigsys 2024-11-14 00:43:11 UTC
Created attachment 255155 [details]
PR277476 fix
Comment 13 sigsys 2024-11-14 00:48:43 UTC
(In reply to Emmanuel Vadot from comment #9)
Alright here it is.

Is it already too late to have this merged in 14.2?

I'm pretty sure this patch is safe. __GFP_NORETRY isn't used in-tree at all right now. And this patch makes it do pretty much what it says. It doesn't retry. You'd hope that any code using this flag would expect allocations to fail...

The problem doesn't always happen for everyone, but when it does, man, it's rough. After a week or two I was sometimes getting hangs that lasted 15 seconds. Restarting firefox would fix it for a while, but eventually it becomes unusable.

Even if this made it in 14.2 IIUC it would take a while before the 14.X packages would be compiled against the new kernel headers, but it would already be useful to have it in base so that you could get the fix by compiling drm-kmod from ports.
Comment 14 fullermd 2024-11-14 02:08:03 UTC
FWIW, I definitely ran into what sounds just like this (with several different cards, on both 515 and 61; 510 was always rock solid).  After a few days, I'd sometimes get freezes lasting a minute or more.

A workaround that seems to work for me has been switching from the amdgpu to the modesetting X driver; I still occasionally see little blips, but they resolve and don't seem to pile up the way they did on amdgpu, even after months of uptime.
Comment 15 Olivier Certner freebsd_committer freebsd_triage 2024-11-14 09:24:12 UTC
(In reply to sigsys from comment #13)

It seems manu@ is having a crash with the patch applied on 5.15.  So while it seems safe, we have to rule out some possible impacts in certain situations.

I'm afraid it is too late to have it merged in 14.2 anyway, so let's be sure we are not regressing anything while fixing the problem.