Two weeks ago I replaced an ancient nvidia graphics card with an AMD RX580 card to run open source drivers. Everything works fine most of the time, but occasionally the system hangs for a few seconds (5-10, usually). The longer the system has been up, the worse it gets.

Digging into it a bit, it is because userspace (it always looks like X) does an ioctl into drm, which then tries to allocate a large-ish piece of physically contiguous memory. This explains why it gets worse as uptime increases (free physical memory becomes more fragmented) and when running firefox (the most memory-hungry application I use). I know nothing about graphics cards, the software stack supporting them, or the linux kernel API compatibility layer, but clearly it'd be beneficial if amdgpu/drm/whatever could make use of *virtually* contiguous pages, or some kind of allocation caching/reuse, to avoid repeatedly asking the vm code for physically contiguous ranges.

I arrived at the above conclusions via a handful of dtrace-based experiments. While one of the "temporary hangs" was happening, the following was the most common (non-idle) profiler stack:

# dtrace -n 'profile-97{@[stack()]=count()}'
...
              kernel`vm_phys_alloc_contig+0x11d
              kernel`linux_alloc_pages+0x8f
              ttm.ko`ttm_pool_alloc+0x2cb
              ttm.ko`ttm_tt_populate+0xc5
              ttm.ko`ttm_bo_handle_move_mem+0xc3
              ttm.ko`ttm_bo_validate+0xb4
              ttm.ko`ttm_bo_init_reserved+0x199
              amdgpu.ko`amdgpu_bo_create+0x1eb
              amdgpu.ko`amdgpu_bo_create_user+0x21
              amdgpu.ko`amdgpu_gem_create_ioctl+0x1e2
              drm.ko`drm_ioctl_kernel+0xc6
              drm.ko`drm_ioctl+0x2b5
              kernel`linux_file_ioctl+0x312
              kernel`kern_ioctl+0x255
              kernel`sys_ioctl+0x123
              kernel`amd64_syscall+0x109
              kernel`0xffffffff80fe43eb

The latency of vm_phys_alloc_contig (entry to return) is bimodal, with latencies in the single-digit *milli*seconds during the "temporary hangs":

# dtrace -n 'fbt::vm_phys_alloc_contig:entry{self->ts=timestamp}' -n 'fbt::vm_phys_alloc_contig:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@                                        2606
            1024 |@@@@@@@@@                                18207
            2048 |@                                        2534
            4096 |                                         894
            8192 |                                         34
           16384 |                                         78
           32768 |                                         58
           65536 |                                         219
          131072 |                                         306
          262144 |                                         310
          524288 |                                         735
         1048576 |                                         174
         2097152 |@@                                       4364
         4194304 |@@@@@@@@@@@@@@@@@@@@@@@@                 47475
         8388608 |@                                        1546
        16777216 |                                         2
        33554432 |                                         0

The number of pages being allocated:

# dtrace -n 'fbt::vm_phys_alloc_contig:entry/arg1>1/{@=quantize(arg1)}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@                                      15
               4 |@                                        7
               8 |@@@                                      16
              16 |@@                                       10
              32 |@@                                       10
              64 |@                                        7
             128 |@@@                                      12
             256 |@@@                                      12
             512 |@@@@@@@@@@@@@@                           68
            1024 |@@@@@@@                                  32
            2048 |                                         0

I did a few more dtrace experiments, but they all point to the same thing: a drm/amdgpu-related ioctl wants 4MB of physically contiguous memory often enough to become a headache. 4MB isn't too much given that the system has 32GB of RAM, but a physically contiguous allocation sometimes takes a while to fulfill.

The card:

vgapci0@pci0:1:0:0: class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1da2 subdevice=0xe353
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
    class      = display
    subclass   = VGA

$ pkg info | grep -i amd
gpu-firmware-amd-kmod-aldebaran-20230625        Firmware modules for aldebaran AMD GPUs
gpu-firmware-amd-kmod-arcturus-20230625         Firmware modules for arcturus AMD GPUs
gpu-firmware-amd-kmod-banks-20230625            Firmware modules for banks AMD GPUs
gpu-firmware-amd-kmod-beige-goby-20230625       Firmware modules for beige_goby AMD GPUs
gpu-firmware-amd-kmod-bonaire-20230625          Firmware modules for bonaire AMD GPUs
gpu-firmware-amd-kmod-carrizo-20230625          Firmware modules for carrizo AMD GPUs
gpu-firmware-amd-kmod-cyan-skillfish2-20230625  Firmware modules for cyan_skillfish2 AMD GPUs
gpu-firmware-amd-kmod-dimgrey-cavefish-20230625 Firmware modules for dimgrey_cavefish AMD GPUs
gpu-firmware-amd-kmod-fiji-20230625             Firmware modules for fiji AMD GPUs
gpu-firmware-amd-kmod-green-sardine-20230625    Firmware modules for green_sardine AMD GPUs
gpu-firmware-amd-kmod-hainan-20230625           Firmware modules for hainan AMD GPUs
gpu-firmware-amd-kmod-hawaii-20230625           Firmware modules for hawaii AMD GPUs
gpu-firmware-amd-kmod-kabini-20230625           Firmware modules for kabini AMD GPUs
gpu-firmware-amd-kmod-kaveri-20230625           Firmware modules for kaveri AMD GPUs
gpu-firmware-amd-kmod-mullins-20230625          Firmware modules for mullins AMD GPUs
gpu-firmware-amd-kmod-navi10-20230625           Firmware modules for navi10 AMD GPUs
gpu-firmware-amd-kmod-navi12-20230625           Firmware modules for navi12 AMD GPUs
gpu-firmware-amd-kmod-navi14-20230625           Firmware modules for navi14 AMD GPUs
gpu-firmware-amd-kmod-navy-flounder-20230625    Firmware modules for navy_flounder AMD GPUs
gpu-firmware-amd-kmod-oland-20230625            Firmware modules for oland AMD GPUs
gpu-firmware-amd-kmod-picasso-20230625          Firmware modules for picasso AMD GPUs
gpu-firmware-amd-kmod-pitcairn-20230625         Firmware modules for pitcairn AMD GPUs
gpu-firmware-amd-kmod-polaris10-20230625        Firmware modules for polaris10 AMD GPUs
gpu-firmware-amd-kmod-polaris11-20230625        Firmware modules for polaris11 AMD GPUs
gpu-firmware-amd-kmod-polaris12-20230625        Firmware modules for polaris12 AMD GPUs
gpu-firmware-amd-kmod-raven-20230625            Firmware modules for raven AMD GPUs
gpu-firmware-amd-kmod-raven2-20230625           Firmware modules for raven2 AMD GPUs
gpu-firmware-amd-kmod-renoir-20230625           Firmware modules for renoir AMD GPUs
gpu-firmware-amd-kmod-si58-20230625             Firmware modules for si58 AMD GPUs
gpu-firmware-amd-kmod-sienna-cichlid-20230625   Firmware modules for sienna_cichlid AMD GPUs
gpu-firmware-amd-kmod-stoney-20230625           Firmware modules for stoney AMD GPUs
gpu-firmware-amd-kmod-tahiti-20230625           Firmware modules for tahiti AMD GPUs
gpu-firmware-amd-kmod-tonga-20230625            Firmware modules for tonga AMD GPUs
gpu-firmware-amd-kmod-topaz-20230625            Firmware modules for topaz AMD GPUs
gpu-firmware-amd-kmod-vangogh-20230625          Firmware modules for vangogh AMD GPUs
gpu-firmware-amd-kmod-vega10-20230625           Firmware modules for vega10 AMD GPUs
gpu-firmware-amd-kmod-vega12-20230625           Firmware modules for vega12 AMD GPUs
gpu-firmware-amd-kmod-vega20-20230625           Firmware modules for vega20 AMD GPUs
gpu-firmware-amd-kmod-vegam-20230625            Firmware modules for vegam AMD GPUs
gpu-firmware-amd-kmod-verde-20230625            Firmware modules for verde AMD GPUs
gpu-firmware-amd-kmod-yellow-carp-20230625      Firmware modules for yellow_carp AMD GPUs
suitesparse-amd-3.3.0                           Symmetric approximate minimum degree
suitesparse-camd-3.3.0                          Symmetric approximate minimum degree
suitesparse-ccolamd-3.3.0                       Constrained column approximate minimum degree ordering
suitesparse-colamd-3.3.0                        Column approximate minimum degree ordering algorithm
webcamd-5.17.1.2_1                              Port of Linux USB webcam and DVB drivers into userspace
xf86-video-amdgpu-22.0.0_1                      X.Org amdgpu display driver

$ pkg info | grep -i drm
drm-515-kmod-5.15.118_3        DRM drivers modules
drm-kmod-20220907_1            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20230210_1,1 Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers
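To make the mechanism behind those numbers concrete, here is a condensed sketch of the allocation path from the profiler stack above, paraphrased from sys/compat/linuxkpi/common/src/linux_page.c. It is simplified and not verbatim: the 'req'/'pmax' setup shown is a placeholder for the real flag translation. The fast path is a single vm_page_alloc_noobj_contig() call; the slow mode of the bimodal latency is vm_page_reclaim_contig() scanning and relocating physical pages to manufacture a contiguous run.

/* Condensed sketch, not verbatim; in linuxkpi a struct page is a vm_page. */
static vm_page_t
linux_alloc_pages_sketch(gfp_t flags, unsigned int order)
{
	unsigned long npages = 1UL << order;
	int req = VM_ALLOC_NORMAL;		/* simplification */
	vm_paddr_t pmax = ~(vm_paddr_t)0;	/* simplification: ignores __GFP_DMA32 */
	vm_page_t page;

retry:
	/* Fast path: grab an existing run of contiguous free pages. */
	page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
	    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
	if (page == NULL) {
		if (flags & M_WAITOK) {
			/*
			 * Slow path: scan and relocate pages to create a
			 * contiguous run. This accounts for the
			 * multi-millisecond latency mode above.
			 */
			int err = vm_page_reclaim_contig(req, npages, 0,
			    pmax, PAGE_SIZE, 0);
			if (err == ENOMEM)
				vm_wait(NULL);
			else if (err != 0)
				return (NULL);
			flags &= ~M_WAITOK;
			goto retry;
		}
		return (NULL);
	}
	return (page);
}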
While gathering all the dtrace data, I was so distracted I forgot to mention the OS version:

$ freebsd-version -kru
14.0-RELEASE-p5
14.0-RELEASE-p5
14.0-RELEASE-p5
I dug a bit more into this. It looks like the drm code has provisions for allocating memory via the dma APIs; the FreeBSD port doesn't implement those. Specifically, looking at the drm-kmod-drm_v5.15.25_5 source: drivers/gpu/drm/amd/amdgpu/gmc_v*.c sets adev->need_swiotlb to drm_need_swiotlb(...). drm_need_swiotlb is implemented in drivers/gpu/drm/drm_cache.c as a 'return false' on FreeBSD. Later on, amdgpu_ttm_init calls ttm_device_init with the use_dma_alloc argument equal to adev->need_swiotlb (IOW, false). Much later on, ttm_pool_alloc is called to allocate a buffer. That in turn calls ttm_pool_alloc_page, which amounts to:

    if (!use_dma_alloc)
            return alloc_pages(...);
    panic("ttm_pool.c: use_dma_alloc not implemented");

So, because of the 'return false' during initialization, we always call alloc_pages (aka linux_alloc_pages), which tries to allocate physically contiguous memory.

As I said before, I don't know anything about the graphics stack, so it is possible that this dma API is completely irrelevant here. Looking at ttm_pool_alloc some more, it immediately turns the physically contiguous allocation into an array of struct page pointers (tt->pages). So, depending on how the rest of the module uses the buffer & pages, it may be relatively easy to switch to a virtually contiguous allocation.
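Since ttm only keeps the per-page pointer array, here is a purely hypothetical sketch of what that suggested virtually-contiguous (really: per-page) variant could look like. This code does not exist anywhere; the function name and parameters are made up for illustration.

/*
 * HYPOTHETICAL sketch, not existing code: fill tt->pages one page at a
 * time. Order-0 allocations never need a contiguous physical run, so
 * vm_phys_alloc_contig() is never involved.
 */
static int
ttm_tt_populate_noncontig_sketch(struct page **pages,
    unsigned long num_pages, gfp_t gfp)
{
	unsigned long i;

	for (i = 0; i < num_pages; i++) {
		pages[i] = alloc_page(gfp);	/* single page, no contig search */
		if (pages[i] == NULL) {
			while (i-- > 0)		/* unwind on failure */
				__free_page(pages[i]);
			return (-ENOMEM);
		}
	}
	return (0);
}

The trade-off is losing whatever TLB/IOMMU mapping benefits the contiguous runs buy, which is presumably why the pool tries contiguous first.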
I also have an RX580. After upgrading from 13.2 to 14.0 I got frequent kernel panics. Related reports:

* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276985
* https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278212

I noticed that manually setting DRI=2 for amdgpu in xorg.conf made the kernel panics less frequent, but instead the system got slower and slower until it became unresponsive. If I managed to kill Xorg in time, it would work for a while again, until I had to kill Xorg again. I could not work out a safe non-accelerated fallback xorg.conf using scfb that would allow a dual-screen setup with one screen rotated: dual monitors are only possible with amdgpu loaded, and it was also not possible to disable acceleration in amdgpu while keeping xrandr (secondary screen rotation). I rolled back to 13.2. DRM 5.15 / AMDGPU / LinuxKPI makes 14.0 unreliable.
(In reply to Josef 'Jeff' Sipek from comment #0)

I reported this independently in the drm-kmod GitHub project as https://github.com/freebsd/drm-kmod/issues/302. Going to cross-link this PR there as well. I'd really like to get to the bottom of this; however, I don't expect to have the time to do so before the end of the month at the very least. drm-61-kmod exhibits the same problem, but drm-510-kmod works fine for me.

(In reply to Tomasz "CeDeROM" CEDRO from comment #3)

Please see my comment 8 in bug #278212.
Yeah, so this problem was super annoying, but thanks to the information already posted here, it seems it wasn't too hard to fix.

IIUC, the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't actually need contiguous pages; it's just an opportunistic optimization. When an allocation fails, it falls back to asking for smaller and smaller contiguous runs (eventually asking for only one page at a time). When ttm_pool_alloc_page() asks for more than one page, it passes alloc_pages() some extra flags (__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM).

What's expensive is the vm_page_reclaim_contig() call in linux_alloc_pages(). That function tries too hard to find contiguous memory (which the drm code doesn't even require), and as physical memory gets more fragmented it becomes very slow.

So, very simple fix: make linux_alloc_pages() react to one of the flags passed by the drm code:

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index 2fcc0dc05f29..58a021086c98 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define	__GFP_NOWARN		0
 #define	__GFP_HIGHMEM		0
 #define	__GFP_ZERO		M_ZERO
-#define	__GFP_NORETRY		0
 #define	__GFP_NOMEMALLOC	0
 #define	__GFP_RECLAIM		0
 #define	__GFP_RECLAIMABLE	0
@@ -58,7 +57,8 @@
 #define	__GFP_KSWAPD_RECLAIM	0
 #define	__GFP_WAIT		M_WAITOK
 #define	__GFP_DMA32		(1U << 24)	/* LinuxKPI only */
-#define	__GFP_BITS_SHIFT	25
+#define	__GFP_NORETRY		(1U << 25)	/* LinuxKPI only */
+#define	__GFP_BITS_SHIFT	26
 #define	__GFP_BITS_MASK		((1 << __GFP_BITS_SHIFT) - 1)
 #define	__GFP_NOFAIL		M_WAITOK

diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 18b90b5e3d73..71a6890a3795 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,7 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
 		page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
 		    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
 		if (page == NULL) {
-			if (flags & M_WAITOK) {
+			if ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK) {
 				int err = vm_page_reclaim_contig(req, npages, 0,
 				    pmax, PAGE_SIZE, 0);
 				if (err == ENOMEM)

Been working fine here with amdgpu for about 3 weeks. (The drm modules need to be recompiled with the modified kernel header.)
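For completeness, here is why failing fast is harmless to the caller: a paraphrase of ttm_pool_alloc()'s order-reduction loop. Identifiers and the page-array bookkeeping are simplified and not verbatim (it assumes num_pages > 0; the real code also clamps the starting order and expands each order-N block into N page pointers).

/* Paraphrased sketch of the fallback strategy, not the verbatim source. */
static int
ttm_pool_alloc_sketch(gfp_t gfp, struct page **pages, unsigned long num_pages)
{
	unsigned int order = fls(num_pages) - 1;

	while (num_pages != 0) {
		gfp_t f = gfp;
		struct page *p;

		if (order > 0)
			/* Opportunistic high-order try: allowed to fail fast. */
			f |= __GFP_NOMEMALLOC | __GFP_NORETRY |
			    __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
		p = alloc_pages(f, order);
		if (p == NULL) {
			if (order == 0)
				return (-ENOMEM);	/* genuinely out of memory */
			order--;			/* settle for less contiguity */
			continue;
		}
		*pages++ = p;	/* real code records 1 << order page pointers */
		num_pages -= 1UL << order;
		if (num_pages != 0 && order > (unsigned int)fls(num_pages) - 1)
			order = fls(num_pages) - 1;
	}
	return (0);
}

With the patch above, the __GFP_NORETRY case in linux_alloc_pages() returns NULL quickly instead of grinding through vm_page_reclaim_contig(), and this loop simply drops to a smaller order.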
Interesting find. I could never reproduce this bug on my RX550. Olivier, can you test whether the code (which looks OK to me) fixes the issue for you?
(In reply to sigsys from comment #5) I've been suffering from this issue on a Ryzen 9 4900H with embedded Renoir graphics using drm-61-kmod-6.1.92_2 on stable/14-n268738-048132192698. Playing videos using mpv easily triggered the slowdown after some time (especially 4K videos). I've implemented the suggested fix and now I cannot reproduce the behaviour anymore (tested for 2 days now). Even playing multiple 4K videos in parallel does not cause the problem. Thanks for the fix.
(In reply to sigsys from comment #5)
> IIUC the drm code (ttm_pool_alloc()) asking for contiguous pages doesn't actually need contiguous pages. It's just an opportunistic optimization.

That would be very good news (at least from the users' point of view). I have not spent time on this issue since my last posts. I had naively thought that the new DRM ports really needed contiguous allocations for whatever reason, and should probably have looked a bit further instead of assuming this would require some deep and highly time-consuming analysis.

(In reply to Emmanuel Vadot from comment #6)

Will test that soon and report.
(In reply to sigsys from comment #5)

Waiting for more people to test, but in the meantime could you add a git format-patch to this bug, please? (So with full commit message and correct authorship.)
(In reply to Emmanuel Vadot from comment #6) The patch also works well for me, no slowdowns to report after 24 hours.
(In reply to sigsys from comment #5)

Has this patch landed already? I'm eager to test on my Threadripper with a Navi 24 [Radeon PRO W6400]; it's borderline useless after ~48h of uptime and needs frequent reboots. At least it's better than before I clamped the ARC to 8GB to slow the process down.
Created attachment 255155 [details]
PR277476 fix
(In reply to Emmanuel Vadot from comment #9)

Alright, here it is. Is it already too late to have this merged into 14.2?

I'm pretty sure this patch is safe. GFP_NORETRY isn't used in-tree at all right now, and this patch makes it do pretty much what it says: it doesn't retry. You'd hope that any code using this flag would expect allocations to fail...

The problem doesn't always happen for everyone, but when it does, man, it's rough. After a week or two I was getting hangs that sometimes lasted 15 seconds. Restarting firefox would fix it for a while, but eventually the system becomes unusable.

Even if this made it into 14.2, IIUC it would take a while before the 14.X packages were compiled against the new kernel headers, but it would already be useful to have it in base so that you could get the fix by compiling drm-kmod from ports.
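To spell out the semantics for reviewers: with __GFP_NORETRY now a real bit, the guard from the diff in comment #5 reads as "reclaim only if the caller can sleep *and* did not ask to fail fast". Isolated as a tiny helper (the function wrapper is mine, for clarity; the expression is from the patch):

/*
 * True only when M_WAITOK is set and __GFP_NORETRY is clear; only then
 * is the expensive vm_page_reclaim_contig() retry attempted.
 */
static bool
may_reclaim_contig(gfp_t flags)
{
	return ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK);
}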
FWIW, I definitely ran into what sounds just like this (with several different cards, on both 515 and 61; 510 was always rock solid). After a few days, I'd sometimes get freezes lasting a minute or more. A workaround that seems to work for me has been switching from the amdgpu to the modesetting X driver; I still occasionally see little blips, but they resolve and don't seem to pile up the way they did on amdgpu, even after months of uptime.
(In reply to sigsys from comment #13) It seems manu@ is having a crash with the patch applied on 5.15. So while it seems safe, we have to rule out some possible impacts in certain situations. I'm afraid it is too late to have it merged in 14.2 anyway, so let's be sure we are not regressing anything while fixing the problem.
I never see this issue on an "AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx", but always on a 5950X + RX 5600 XT. The software configuration is identical: FreeBSD 14/stable, Xorg + the amdgpu X driver. To reduce the freezes I use:

- picom
- cpuset, to leave 8 cores free while building ports
- a script that creates a 16 GB file on tmpfs and then removes it

The script frees up ~20 GB, and there are almost no freezes as long as free memory stays above 3 GB. But after ~1 month of uptime I start to see freezes even with more than 20 GB free. Even without debugging tools, this looked like a memory fragmentation issue. :)

Is it possible to implement some memory defragmentation code in pagedaemon? vm_page_reclaim_contig() is also used by iommu, ktls and shm, so improving it would make FreeBSD better even for server roles.

(In reply to Josef 'Jeff' Sipek from comment #0)
Thanks for debugging!

(In reply to sigsys from comment #5)
Thanks for the patch. I will test it, but it will take at least 2 weeks to make sure the freezes are gone.

(In reply to Emmanuel Vadot from comment #6)
Do you use Xorg + the amdgpu X driver?
(In reply to Ivan Rozhuk from comment #16)
> Thanks for the patch. I will test it, but it will take at least 2 weeks to make sure the freezes are gone.

Yes, please do report your experience. I'll do an extra test on my side. Unless something goes wrong, I'd like to move this forward soon, and in any case before we start the release process for 14.3.
(In reply to Olivier Certner from comment #17)
> Yes, please do report your experience.

My 2 cents: I have used the patch for months, first with 6.1 back when it was the tip of drm-kmod, then with graphics/drm-66-kmod. The patch doesn't panic my desktop and resolves the issue for me. I have never used it with anything older than 6.1.
I've run with the patch (slightly massaged to fit stable/14) with amdgpu and 6.1 for a month without any hint of the freezes showing up, so it certainly feels like a fix here.
Created attachment 258607 [details]
dtrace profile

The patch did not help, at least in my case (Xorg + the amdgpu X driver). This is how it landed on my 14/stable: https://github.com/rozhuk-im/freebsd/commit/b739c10c50aa37e247dc95f7b93f6fe58d86016d

I have attached the dtrace profile output captured while the freezes happen. I do not see vm_phys_alloc_contig() below ttm_pool_alloc() here; the -O2/-O3 optimization level probably "optimized" it out (inlined it). Here are a few new things that show increased latency during the freezes (I did not collect many freezes; in some tests only a few were captured):

              kernel`lock_delay+0x12
              amdgpu.ko`amdgpu_gem_fault+0x86
              kernel`linux_cdev_pager_populate+0x128
              kernel`vm_fault_allocate+0x185
              kernel`vm_fault+0x39c
              kernel`vm_fault_trap+0x4c
              kernel`trap_pfault+0x20a
              kernel`trap+0x4a8
              kernel`0xffffffff80a11ca8
               20

dtrace -n 'fbt::amdgpu_gem_fault:entry{self->ts=timestamp}' -n 'fbt::amdgpu_gem_fault:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66064                        :tick-1sec
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |                                         1
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      1357
            4096 |@@@                                      110
            8192 |@@@                                      103
           16384 |                                         4
           32768 |                                         3
           65536 |                                         0
          131072 |                                         0
          262144 |                                         0
          524288 |                                         0
         1048576 |                                         0
         2097152 |                                         1
         4194304 |                                         3
         8388608 |                                         2
        16777216 |                                         0
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         0
       268435456 |                                         1
       536870912 |                                         1
      1073741824 |                                         1
      2147483648 |                                         0

              kernel`lock_delay+0x14
              kernel`malloc_large+0x2c
              kernel`lkpi_kmalloc_cb+0x44
              kernel`lkpi_kmalloc+0x27
              amdgpu.ko`dc_create_state+0x18
              amdgpu.ko`amdgpu_dm_atomic_commit_tail+0xd4
              drm.ko`commit_tail+0xa7
              kernel`linux_work_fn+0xed
              kernel`taskqueue_run_locked+0x187
              kernel`taskqueue_thread_loop+0xc2
              kernel`fork_exit+0x86
              kernel`0xffffffff80a12d0e
               88

dtrace -n 'fbt::dc_create_state:entry{self->ts=timestamp}' -n 'fbt::dc_create_state:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66064                        :tick-1sec
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |                                         2
           16384 |@@@@@@@@@@@@@@@@@@@@@                    1271
           32768 |@@@@@@@@@@@@@@@@@@                       1087
           65536 |                                         30
          131072 |                                         1
          262144 |                                         0
          524288 |                                         3
         1048576 |                                         4
         2097152 |                                         2
         4194304 |                                         0
         8388608 |                                         0
        16777216 |                                         0
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         1
       268435456 |                                         1
       536870912 |                                         5
      1073741824 |                                         4
      2147483648 |                                         0

              kernel`lock_delay+0x14
              kernel`free+0x9b
              amdgpu.ko`amdgpu_dm_atomic_commit_tail+0x2f9a
              drm.ko`commit_tail+0xa7
              kernel`linux_work_fn+0xed
              kernel`taskqueue_run_locked+0x187
              kernel`taskqueue_thread_loop+0xc2
              kernel`fork_exit+0x86
              kernel`0xffffffff809aaf6e
              399

dtrace -n 'fbt::amdgpu_dm_atomic_commit_tail:entry{self->ts=timestamp}' -n 'fbt::amdgpu_dm_atomic_commit_tail:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66190                        :tick-1sec
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |                                         4
           65536 |                                         6
          131072 |                                         2
          262144 |                                         0
          524288 |                                         6
         1048576 |                                         15
         2097152 |@                                        29
         4194304 |@@@                                      106
         8388608 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      1323
        16777216 |@                                        44
        33554432 |                                         1
        67108864 |                                         2
       134217728 |                                         4
       268435456 |                                         8
       536870912 |                                         5
      1073741824 |                                         5
      2147483648 |                                         0

              kernel`lock_delay+0x14
              kernel`zone_import+0xf2
              kernel`cache_alloc+0x309
              kernel`cache_alloc_retry+0x2c
              kernel`malloc+0x48
              ttm.ko`ttm_sg_tt_init+0x61
              amdgpu.ko`amdgpu_ttm_tt_create+0x4a
              ttm.ko`ttm_tt_create+0x4e
              ttm.ko`ttm_bo_validate+0x60
              ttm.ko`ttm_bo_init_reserved+0x194
              amdgpu.ko`amdgpu_bo_create+0x295
              amdgpu.ko`amdgpu_bo_create_user+0x21
              amdgpu.ko`amdgpu_gem_userptr_ioctl+0x82
              drm.ko`drm_ioctl_kernel+0xbc
              drm.ko`drm_ioctl+0x25e
              kernel`linux_file_ioctl+0x30f
              kernel`kern_ioctl+0x1b0
              kernel`sys_ioctl+0x117
              kernel`amd64_syscall+0xeb
              kernel`0xffffffff809aa81b
               46

dtrace -n 'fbt::amdgpu_ttm_tt_create:entry{self->ts=timestamp}' -n 'fbt::amdgpu_ttm_tt_create:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66190                        :tick-1sec
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |                                         4
             512 |@@@@@@@@@@                               5764
            1024 |@@@@@@@@@@@@                             6635
            2048 |@@@@@@@@@                                5087
            4096 |@@@@@@@@                                 4334
            8192 |@@                                       875
           16384 |                                         72
           32768 |                                         9
           65536 |                                         3
          131072 |                                         0

(this looks ok)

dtrace -n 'fbt::amdgpu_bo_create:entry{self->ts=timestamp}' -n 'fbt::amdgpu_bo_create:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66190                        :tick-1sec
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |                                         2
            1024 |@@@@@@                                   2303
            2048 |@@@@@@@@@@                               4190
            4096 |@@@@@@@@@@@@                             4800
            8192 |@@@@@@@@@@                               4002
           16384 |@@                                       845
           32768 |                                         124
           65536 |                                         39
          131072 |                                         20
          262144 |                                         4
          524288 |                                         9
         1048576 |                                         2
         2097152 |                                         3
         4194304 |                                         8
         8388608 |                                         5
        16777216 |                                         0
        33554432 |                                         1
        67108864 |                                         2
       134217728 |                                         1
       268435456 |                                         0
       536870912 |                                         2
      1073741824 |                                         0
      2147483648 |                                         1
      4294967296 |                                         0

dtrace -n 'fbt::add_hole:entry{self->ts=timestamp}' -n 'fbt::add_hole:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66190                        :tick-1sec
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |@@@@@@@@@@@@                             5762
             512 |@@@@@@@@@@@@@                            6548
            1024 |@@@@@@@                                  3287
            2048 |@@@@@                                    2648
            4096 |@@@                                      1508
            8192 |                                         105
           16384 |                                         11
           32768 |                                         4
           65536 |                                         1
          131072 |                                         0

(this looks ok)

dtrace -n 'fbt::ttm_pool_alloc:entry{self->ts=timestamp}' -n 'fbt::ttm_pool_alloc:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'

  0  66190                        :tick-1sec
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |@@                                       29
             512 |@@@@@@@@                                 96
            1024 |@@@@@@                                   81
            2048 |@@@@@                                    67
            4096 |@@@@@@@@                                 106
            8192 |@@@                                      33
           16384 |                                         5
           32768 |                                         1
           65536 |@                                        17
          131072 |@                                        12
          262144 |                                         3
          524288 |                                         6
         1048576 |@                                        10
         2097152 |@                                        13
         4194304 |                                         2
         8388608 |                                         2
        16777216 |                                         3
        33554432 |                                         5
        67108864 |                                         3
       134217728 |                                         4
       268435456 |@                                        7
       536870912 |                                         5
      1073741824 |                                         1
      2147483648 |                                         1
      4294967296 |                                         0

If someone has ideas, I can play more with dtrace and test other patches/settings.
(In reply to Ivan Rozhuk from comment #20)

Very important: once the patch has been applied, you have to rebuild both your kernel *and* the drm-kmod modules with the patched 'gfp.h' header. Given the previous analysis, it seems unlikely at this stage that the patch wouldn't fix what you're observing, but let's see.
(In reply to Evgenii Khramtsov from comment #18)
(In reply to fullermd from comment #19)

I hear you. I have been running stable/14 with drm-61-kmod for months, and it works like a charm here. I have no doubt that this fixes the main problem; the only pending thing was to make sure that the change causes no new crashes, as manu@ hinted at on -CURRENT. I still have to try with a recent -CURRENT (this is the "extra test" I mentioned above).
(In reply to Olivier Certner from comment #21)

That is done automatically by my build scripts. I use PORTS_MODULES+= to make sure that kernel modules from ports are rebuilt and installed automatically along with the system.

# ls /boot/kernel
...
-r--r--r--  1 root  wheel   29K Mar 12 17:26:39 2025 dtaudit.ko
-r--r--r--  1 root  wheel   19K Mar 12 17:26:39 2025 dtmalloc.ko
-r--r--r--  1 root  wheel   30K Mar 12 17:26:39 2025 dtnfscl.ko
-r--r--r--  1 root  wheel   27K Mar 12 17:26:39 2025 dtrace_test.ko
-r--r--r--  1 root  wheel  374K Mar 12 17:26:39 2025 dtrace.ko
-r--r--r--  1 root  wheel   16K Mar 12 17:26:39 2025 dtraceall.ko
...
-r--r--r--  1 root  wheel   15M Mar 12 17:26:32 2025 kernel
-r--r--r--  1 root  wheel   44K Mar 12 17:26:40 2025 kinst.ko
-r--r--r--  1 root  wheel  206K Mar 12 17:26:39 2025 krpc.ko
-r--r--r--  1 root  wheel   30K Mar 12 17:26:39 2025 ksyms.ko
-r--r--r--  1 root  wheel   21K Mar 12 17:26:39 2025 libmchain.ko
-r--r--r--  1 root  wheel   59K Mar 12 17:26:39 2025 lindebugfs.ko
-rw-r--r--  1 root  wheel  125K Mar 12 17:26:41 2025 linker.hints
-r--r--r--  1 root  wheel  165K Mar 12 17:26:39 2025 linux_common.ko
-r--r--r--  1 root  wheel  449K Mar 12 17:26:39 2025 linux.ko
-r--r--r--  1 root  wheel  414K Mar 12 17:26:39 2025 linux64.ko
-r--r--r--  1 root  wheel   46K Mar 12 17:26:39 2025 linuxkpi_hdmi.ko
-r--r--r--  1 root  wheel   57K Mar 12 17:26:39 2025 linuxkpi_video.ko
-r--r--r--  1 root  wheel  335K Mar 12 17:26:39 2025 linuxkpi.ko
...

# ls /boot/modules/
...
-r--r--r--  1 root  wheel  369K Mar 12 17:26:50 2025 amdgpu_raven_vcn_bin.ko
-r--r--r--  1 root  wheel   10M Mar 12 17:26:44 2025 amdgpu.ko
...
-r--r--r--  1 root  wheel  2.0M Mar 12 17:26:44 2025 radeonkms.ko
-r--r--r--  1 root  wheel  100K Mar 12 17:26:44 2025 ttm.ko
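For reference, that hook is a single line in /etc/make.conf or /etc/src.conf (the port origin below is just an example; use whichever drm-kmod flavor you actually build):

    PORTS_MODULES+= graphics/drm-61-kmod

With that set, the buildkernel/installkernel targets also rebuild and reinstall the listed ports' modules, so they get compiled against the patched headers automatically.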
(In reply to Olivier Certner from comment #22)
> I still have to try with a recent -CURRENT (this is the "extra test" I mentioned above).

See bug 282605 to avoid unrelated crashes on main. FWIW, my "for months" means that I've been running a main not older than a week with this patch during all the mentioned time; e.g., my main right now is as of base 717adecbbb52.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=718d1928f8748fe4429c011296f94f194d63c695

commit 718d1928f8748fe4429c011296f94f194d63c695
Author:     Mathieu <sigsys@gmail.com>
AuthorDate: 2024-11-14 00:24:02 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2025-03-25 08:41:44 +0000

    LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY

    This is to fix slowdowns with drm-kmod that get worse over time as
    physical memory becomes more fragmented (and probably also depending
    on other factors). Based on information posted in this bug report:
    https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

    By default, linux_alloc_pages() retries failed allocations by calling
    vm_page_reclaim_contig() to attempt to free contiguous physical memory
    pages. vm_page_reclaim_contig() does not always succeed and calling it
    can be very slow even when it fails. When physical memory is very
    fragmented, vm_page_reclaim_contig() can end up being called (and
    failing) after every allocation attempt. This could cause very
    noticeable graphical desktop hangs (which could last seconds).

    The drm-kmod code in question attempts to allocate multiple contiguous
    pages at once but does not actually require them to be contiguous. It
    can fall back to doing multiple smaller allocations when larger
    allocations fail. It passes alloc_pages() the __GFP_NORETRY flag in
    this case. This patch makes linux_alloc_pages() fail early (without
    retrying) when this flag is passed.

    [olce: The problem this patch fixes is longer and longer GUI freezes
    as a machine's memory gets filled and becomes fragmented, when using
    amdgpu from DRM kmod 5.15 and DRM kmod 6.1 (DRM kmod 5.10 is
    unaffected; newer Linux kernels introduced an "optimization" by which
    a pool of pages is filled preferentially with contiguous pages, which
    triggered the problem for us). The original commit message above
    evokes freezes lasting seconds, but I occasionally witnessed some
    lasting tens of minutes, rendering a machine completely useless.

    The patch has been reviewed for its potential impacts to other
    LinuxKPI parts and our existing DRM kmods' code. In particular, there
    is no other user of __GFP_NORETRY/GFP_NORETRY with Linux's
    alloc_pages*() functions in our tree or DRM kmod ports. It has also
    been tested extensively, by me for months against 14-STABLE and
    sporadically on -CURRENT on a RX580, and by several others as
    reported below and as is visible in more details in the quoted
    bugzilla PR and in the initial drm-kmod issue at
    https://github.com/freebsd/drm-kmod/issues/302, on a variety of other
    AMD GPUs (several RX580, RX570, Radeon Pro WX5100, Green Sardine
    5600G, Ryzen 9 4900H with embedded Renoir).]

    PR:             277476
    Reported by:    Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
    Reviewed by:    olce
    Tested by:      many (olce, Pierre Pronchery, Evgenii Khramtsov, chaplina, rk)
    MFC after:      2 weeks
    Relnotes:       yes
    Sponsored by:   The FreeBSD Foundation (review and part of testing)

 sys/compat/linuxkpi/common/include/linux/gfp.h | 4 ++--
 sys/compat/linuxkpi/common/src/linux_page.c    | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)