Bug 277476 - graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-x11 (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-04 14:12 UTC by Josef 'Jeff' Sipek
Modified: 2024-04-06 14:22 UTC
CC: 4 users

See Also:
Flags: linimon: maintainer-feedback? (x11)


Description Josef 'Jeff' Sipek 2024-03-04 14:12:20 UTC
Two weeks ago I replaced an ancient nvidia graphics card with an AMD RX580 to run open source drivers. Everything works fine most of the time, but occasionally the system hangs for a few seconds (usually 5-10), and the longer the system has been up, the worse it gets.

Digging into it a bit, the culprit is userspace (it always looks like X) doing an ioctl into drm, which then tries to allocate a largish piece of physically contiguous memory.  This explains why things get worse as uptime increases (free physical memory fragments over time) and when running firefox (the most memory-hungry application I use).  I know nothing about graphics cards, the software stack supporting them, or the linux kernel API compatibility layer, but it would clearly be beneficial if amdgpu/drm/whatever could make use of *virtually* contiguous pages, or some kind of allocation caching/reuse, to avoid repeatedly asking the vm code for physically contiguous ranges.

I arrived at the above conclusions through a handful of dtrace-based experiments.

While one of the "temporary hangs" was happening, the following was the most common (non-idle) profiler stack:

# dtrace -n 'profile-97{@[stack()]=count()}'
...
              kernel`vm_phys_alloc_contig+0x11d
              kernel`linux_alloc_pages+0x8f
              ttm.ko`ttm_pool_alloc+0x2cb
              ttm.ko`ttm_tt_populate+0xc5
              ttm.ko`ttm_bo_handle_move_mem+0xc3
              ttm.ko`ttm_bo_validate+0xb4
              ttm.ko`ttm_bo_init_reserved+0x199
              amdgpu.ko`amdgpu_bo_create+0x1eb
              amdgpu.ko`amdgpu_bo_create_user+0x21
              amdgpu.ko`amdgpu_gem_create_ioctl+0x1e2
              drm.ko`drm_ioctl_kernel+0xc6
              drm.ko`drm_ioctl+0x2b5
              kernel`linux_file_ioctl+0x312
              kernel`kern_ioctl+0x255
              kernel`sys_ioctl+0x123
              kernel`amd64_syscall+0x109
              kernel`0xffffffff80fe43eb

The latency of vm_phys_alloc_contig (entry to return) is bimodal: the fast mode is 1-2 microseconds, while during the "temporary hangs" most calls land in the 4-8 *milli*second buckets (the quantize values below are in nanoseconds):

# dtrace -n 'fbt::vm_phys_alloc_contig:entry{self->ts=timestamp}' -n 'fbt::vm_phys_alloc_contig:return/self->ts/{this->delta=timestamp-self->ts; @=quantize(this->delta);}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@                                        2606     
            1024 |@@@@@@@@@                                18207    
            2048 |@                                        2534     
            4096 |                                         894      
            8192 |                                         34       
           16384 |                                         78       
           32768 |                                         58       
           65536 |                                         219      
          131072 |                                         306      
          262144 |                                         310      
          524288 |                                         735      
         1048576 |                                         174      
         2097152 |@@                                       4364     
         4194304 |@@@@@@@@@@@@@@@@@@@@@@@@                 47475    
         8388608 |@                                        1546     
        16777216 |                                         2        
        33554432 |                                         0     

The number of pages requested per call (arg1 to vm_phys_alloc_contig is the page count):

# dtrace -n 'fbt::vm_phys_alloc_contig:entry/arg1>1/{@=quantize(arg1)}' -n 'tick-1sec{printa(@)}'
...
           value  ------------- Distribution ------------- count    
               1 |                                         0        
               2 |@@@                                      15       
               4 |@                                        7        
               8 |@@@                                      16       
              16 |@@                                       10       
              32 |@@                                       10       
              64 |@                                        7        
             128 |@@@                                      12       
             256 |@@@                                      12       
             512 |@@@@@@@@@@@@@@                           68       
            1024 |@@@@@@@                                  32       
            2048 |                                         0      

I did a few more dtrace experiments, but they all point to the same thing: a drm/amdgpu-related ioctl wants 4MB (1024 pages x 4KB each) of physically contiguous memory often enough to become a headache.  4MB isn't much given that the system has 32GB of RAM, but a physically contiguous allocation can take a while to satisfy.


The card:

vgapci0@pci0:1:0:0:     class=0x030000 rev=0xe7 hdr=0x00 vendor=0x1002 device=0x67df subvendor=0x1da2 subdevice=0xe353
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]'
    class      = display
    subclass   = VGA

$ pkg info|grep -i amd             
gpu-firmware-amd-kmod-aldebaran-20230625 Firmware modules for aldebaran AMD GPUs
gpu-firmware-amd-kmod-arcturus-20230625 Firmware modules for arcturus AMD GPUs
gpu-firmware-amd-kmod-banks-20230625 Firmware modules for banks AMD GPUs
gpu-firmware-amd-kmod-beige-goby-20230625 Firmware modules for beige_goby AMD GPUs
gpu-firmware-amd-kmod-bonaire-20230625 Firmware modules for bonaire AMD GPUs
gpu-firmware-amd-kmod-carrizo-20230625 Firmware modules for carrizo AMD GPUs
gpu-firmware-amd-kmod-cyan-skillfish2-20230625 Firmware modules for cyan_skillfish2 AMD GPUs
gpu-firmware-amd-kmod-dimgrey-cavefish-20230625 Firmware modules for dimgrey_cavefish AMD GPUs
gpu-firmware-amd-kmod-fiji-20230625 Firmware modules for fiji AMD GPUs
gpu-firmware-amd-kmod-green-sardine-20230625 Firmware modules for green_sardine AMD GPUs
gpu-firmware-amd-kmod-hainan-20230625 Firmware modules for hainan AMD GPUs
gpu-firmware-amd-kmod-hawaii-20230625 Firmware modules for hawaii AMD GPUs
gpu-firmware-amd-kmod-kabini-20230625 Firmware modules for kabini AMD GPUs
gpu-firmware-amd-kmod-kaveri-20230625 Firmware modules for kaveri AMD GPUs
gpu-firmware-amd-kmod-mullins-20230625 Firmware modules for mullins AMD GPUs
gpu-firmware-amd-kmod-navi10-20230625 Firmware modules for navi10 AMD GPUs
gpu-firmware-amd-kmod-navi12-20230625 Firmware modules for navi12 AMD GPUs
gpu-firmware-amd-kmod-navi14-20230625 Firmware modules for navi14 AMD GPUs
gpu-firmware-amd-kmod-navy-flounder-20230625 Firmware modules for navy_flounder AMD GPUs
gpu-firmware-amd-kmod-oland-20230625 Firmware modules for oland AMD GPUs
gpu-firmware-amd-kmod-picasso-20230625 Firmware modules for picasso AMD GPUs
gpu-firmware-amd-kmod-pitcairn-20230625 Firmware modules for pitcairn AMD GPUs
gpu-firmware-amd-kmod-polaris10-20230625 Firmware modules for polaris10 AMD GPUs
gpu-firmware-amd-kmod-polaris11-20230625 Firmware modules for polaris11 AMD GPUs
gpu-firmware-amd-kmod-polaris12-20230625 Firmware modules for polaris12 AMD GPUs
gpu-firmware-amd-kmod-raven-20230625 Firmware modules for raven AMD GPUs
gpu-firmware-amd-kmod-raven2-20230625 Firmware modules for raven2 AMD GPUs
gpu-firmware-amd-kmod-renoir-20230625 Firmware modules for renoir AMD GPUs
gpu-firmware-amd-kmod-si58-20230625 Firmware modules for si58 AMD GPUs
gpu-firmware-amd-kmod-sienna-cichlid-20230625 Firmware modules for sienna_cichlid AMD GPUs
gpu-firmware-amd-kmod-stoney-20230625 Firmware modules for stoney AMD GPUs
gpu-firmware-amd-kmod-tahiti-20230625 Firmware modules for tahiti AMD GPUs
gpu-firmware-amd-kmod-tonga-20230625 Firmware modules for tonga AMD GPUs
gpu-firmware-amd-kmod-topaz-20230625 Firmware modules for topaz AMD GPUs
gpu-firmware-amd-kmod-vangogh-20230625 Firmware modules for vangogh AMD GPUs
gpu-firmware-amd-kmod-vega10-20230625 Firmware modules for vega10 AMD GPUs
gpu-firmware-amd-kmod-vega12-20230625 Firmware modules for vega12 AMD GPUs
gpu-firmware-amd-kmod-vega20-20230625 Firmware modules for vega20 AMD GPUs
gpu-firmware-amd-kmod-vegam-20230625 Firmware modules for vegam AMD GPUs
gpu-firmware-amd-kmod-verde-20230625 Firmware modules for verde AMD GPUs
gpu-firmware-amd-kmod-yellow-carp-20230625 Firmware modules for yellow_carp AMD GPUs
suitesparse-amd-3.3.0          Symmetric approximate minimum degree
suitesparse-camd-3.3.0         Symmetric approximate minimum degree
suitesparse-ccolamd-3.3.0      Constrained column approximate minimum degree ordering
suitesparse-colamd-3.3.0       Column approximate minimum degree ordering algorithm
webcamd-5.17.1.2_1             Port of Linux USB webcam and DVB drivers into userspace
xf86-video-amdgpu-22.0.0_1     X.Org amdgpu display driver
$ pkg info|grep -i drm
drm-515-kmod-5.15.118_3        DRM drivers modules
drm-kmod-20220907_1            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20230210_1,1 Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers
Comment 1 Josef 'Jeff' Sipek 2024-03-04 14:28:43 UTC
While gathering all the dtrace data, I was so distracted I forgot to mention:

$ freebsd-version -kru
14.0-RELEASE-p5
14.0-RELEASE-p5
14.0-RELEASE-p5
Comment 2 Josef 'Jeff' Sipek 2024-04-06 14:22:49 UTC
I dug a bit more into this.  It looks like the drm code has provisions for
allocating memory via dma APIs.  The FreeBSD port doesn't implement those.

Specifically, looking at drm-kmod-drm_v5.15.25_5 source:

drivers/gpu/drm/amd/amdgpu/gmc_v*.c sets adev->need_swiotlb to
drm_need_swiotlb(...).  drm_need_swiotlb is implemented in
drivers/gpu/drm/drm_cache.c as a 'return false' on FreeBSD.
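For reference, the FreeBSD version boils down to this (paraphrased, not
the verbatim source):

	bool
	drm_need_swiotlb(int dma_bits)
	{
		/* FreeBSD port: never report that the dma/swiotlb
		 * allocation path is needed */
		return (false);
	}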

Later on, amdgpu_ttm_init calls ttm_device_init with the use_dma_alloc
argument equal to adev->need_swiotlb (IOW, false).
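Roughly (paraphrased, with the arguments not relevant here elided):

	/* amdgpu_ttm_init(), paraphrased: */
	r = ttm_device_init(&adev->mman.bdev, /* ... */,
	    adev->need_swiotlb,	/* use_dma_alloc: always false on FreeBSD */
	    /* ... */);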

Much later on, ttm_pool_alloc is called to allocate a buffer.  That in turn
calls ttm_pool_alloc_page which amounts to:

	if (!use_dma_alloc)
		/* linuxkpi: linux_alloc_pages -> vm_phys_alloc_contig,
		 * per the stack trace in the description */
		return alloc_pages(...);

	panic("ttm_pool.c: use_dma_alloc not implemented");

So, because of the 'return false' during initialization, we always call
alloc_pages (i.e., linuxkpi's linux_alloc_pages), which tries to allocate
physically contiguous memory.

As I said before, I don't know anything about the graphics stack, so it is
possible that this dma API is completely irrelevant.


Looking at ttm_pool_alloc some more, it immediately turns the physically
contiguous allocation into an array of struct page pointers (tt->pages).
So, depending on how the rest of the module uses the buffer & pages, it
may be relatively easy to switch to a virtually contiguous allocation; a
sketch of the idea follows.
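Something along these lines, perhaps (a hypothetical sketch - the function
name, signature, and error handling are mine, and I have not verified
that nothing else depends on the pages being physically contiguous):

	/*
	 * Hypothetical: fill tt->pages with order-0 allocations instead
	 * of one large physically contiguous chunk, so the VM never has
	 * to search for a contiguous run of free pages.
	 */
	static int
	ttm_pool_alloc_pages_discontig(struct ttm_tt *tt, gfp_t gfp)
	{
		unsigned long i;

		for (i = 0; i < tt->num_pages; i++) {
			tt->pages[i] = alloc_page(gfp);	/* order 0 */
			if (tt->pages[i] == NULL)
				goto unwind;
		}
		return (0);

	unwind:
		/* free everything allocated so far and report failure */
		while (i-- > 0) {
			__free_page(tt->pages[i]);
			tt->pages[i] = NULL;
		}
		return (-ENOMEM);
	}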