My Radeon RX 560 stopped working after a 12-STABLE update. It coincides with some DRM-related MFCs. I nailed it down to r345932. On r345929 (previous revision affecting 12-STABLE), the card still works. But starting from r345932, loading the amdgpu.ko module prints some DRM console messages and then the display freezes. The system is still responsive afterward but it sometimes panics with a page fault not long after.

That's with graphics/drm-fbsd12.0-kmod (rebuilt after each update while testing).

Here are logs of the kernel messages when the card does not work:

kernel: [48] <6>[drm] amdgpu kernel modesetting enabled.
kernel: [48] drmn0: <drmn> on vgapci0
kernel: [48] vgapci0: child drmn0 requested pci_enable_io
syslogd: last message repeated 1 times
kernel: [48] <6>[drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1462:0x8A91 0xCF).
kernel: [48] <6>[drm] register mmio base: 0xFE900000
kernel: [48] <6>[drm] register mmio size: 262144
kernel: [48] <6>[drm] PCI I/O BAR is not found.
kernel: [48] <6>[drm] probing gen 2 caps for device 1022:1453 = 737903/e
kernel: [48] <6>[drm] probing mlw for device 1002:67ff = 400883
kernel: [48] <6>[drm] UVD is enabled in VM mode
kernel: [48] <6>[drm] UVD ENC is enabled in VM mode
kernel: [48] <6>[drm] VCE enabled in VM mode
kernel: [48] ATOM BIOS: 113-C994LP-S01
kernel: [48] <6>[drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
kernel: [49] drmn0: successfully loaded firmware image with name: amdgpu/polaris11_mc.bin
kernel: [49] drmn0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
kernel: [49] drmn0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
kernel: [49] Successfully added WC MTRR for [0xe0000000-0xefffffff]: 0;
kernel: [49] <6>[drm] Detected VRAM RAM=4096M, BAR=256M
kernel: [49] <6>[drm] RAM width 128bits GDDR5
kernel: [49] [drm:amdgpu_ttm_global_init] Failed setting up TTM memory accounting subsystem.
kernel: [49] [drm:amdgpu_device_ip_init] sw_init of IP block <gmc_v8_0> failed -12
kernel: [49] drmn0: amdgpu_device_ip_init failed
kernel: [49] drmn0: Fatal error during GPU init
kernel: [49] <6>[drm] amdgpu: finishing device.
kernel: [49] vgapci0: child drmn0 requested pci_disable_io
syslogd: last message repeated 1 times
kernel: [49] device_attach: drmn0 attach returned 12
CC base r345932 (MFC of base r345105) committer and drm-kmod maintainer
*** Bug 237286 has been marked as a duplicate of this bug. ***
Original commit was additionally MFC'd to stable/11 in base r345931, so any fix needs merging to both stable/12 and stable/11
Hi The port needs to be updated. Latest version in git repo should be working: https://github.com/FreeBSDDesktop/kms-drm/tree/drm-v4.16-fbsd12.0
(In reply to Johannes Lundberg from comment #4) I think it should not. 4.16.g20190305 = git commit 4192575 = "lkpi: Allow recursive calls to i2c_transfer". I see no commit after this that affects driver initialization or sysctl() code. PS: https://github.com/FreeBSDDesktop/kms-drm/commit/b5ef47b82bbcd127f82b60f40b3efd3f065cf756 Probably this should be fixed in the linuxkpi code, somewhere in sysctl_handle_attr() / sysctl_root_handler_locked(): zeroing the buffer on error would prevent many bugs in other places.
https://github.com/FreeBSDDesktop/kms-drm/commit/70ef5ae8f30f8734bd8c85b41d30c23ef956d78b
It has been fixed in linuxkpi, which is why it fails. Before, it was silently failing. Drivers need to be updated to do things correctly.
(In reply to Johannes Lundberg from comment #7) OK, I will test in the next few days. I thought the main goal of linuxkpi (like webcamd) was to minimize the patch size for Linux drivers.
(In reply to Johannes Lundberg from comment #4) OK. With yesterday's update to drm-fbsd12.0-kmod, everything seems to work fine in my case now. I don't have to remove r345932 anymore. Thanks to everyone working on this stuff BTW. This seems like pretty hard work. Sure would be nice if -STABLE didn't get "ahead" of the kmod port in an incompatible way like this though. But I can imagine that it would be even more work to keep them synchronized.
(In reply to sigsys from comment #9) We try to keep them in sync but sometimes we slip... There's a lot going on, especially now with the big 5.0 update coming.
(In reply to Johannes Lundberg from comment #6)
link_elf_obj: symbol agp_bind_pages undefined
linker_load_file: /boot/modules/ttm.ko - unsupported file type
KLD amdgpu.ko: depends on ttm - not available or version mismatch
linker_load_file: /boot/modules/amdgpu.ko - unsupported file type
(In reply to rozhuk.im from comment #11) What port/package is this? What FreeBSD version?
(In reply to Johannes Lundberg from comment #12) Fresh 12.0 (svn up before build), drm-fbsd12.0-kmod-4.16.g20190424
Configs:
http://www.netlab.linkpc.net/download/software/os_cfg/FBSD/12.0/wks/ +
http://www.netlab.linkpc.net/download/software/os_cfg/FBSD/12.0/base/
I tried loading the agp module, but this did not help.
(In reply to rozhuk.im from comment #13) I think the initial commit was missing the ttm.ko file in plist but it should be fixed now. Either build from ports or wait until binary packages are updated.
(In reply to Johannes Lundberg from comment #14) It built and installed, but it cannot be loaded because:
link_elf_obj: symbol agp_bind_pages undefined
(In reply to rozhuk.im from comment #15) Do you have /boot/modules/ttm.ko? You should have it with the latest version from GitHub or ports (but maybe not binary packages yet).
(In reply to Johannes Lundberg from comment #16) Yes, I have this module. It is built and installed every time the kernel is built. I do not use the generic kernel and have many tunings for the build and the kernel config.
Workaround for me: add to the kernel config
device agp # support several AGP chipsets
I have no idea why ttm.ko fails to load even when I load agp.ko.
(In reply to rozhuk.im from comment #17) Oh, I got it. You're building agp as a module. agp needs to be a dependency of ttm. I will fix this in the next update. Thanks for finding this!
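For readers unfamiliar with how a KLD declares such a dependency, the fragment below is a minimal sketch using FreeBSD's MODULE_DEPEND macro. It is not the actual kms-drm change: the module name `ttm` is taken from the error messages in this thread, and the version numbers assume that agp declares MODULE_VERSION(agp, 1). It is kernel-only code meant to sit inside a module's source, so it does not build as a standalone program.

```c
/*
 * Illustrative fragment only, not the real kms-drm fix.
 * Assumes the module is registered under the name "ttm" and that the
 * agp module declares MODULE_VERSION(agp, 1).
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/module.h>

/*
 * Arguments: dependent module, required module, then the minimum,
 * preferred and maximum acceptable versions of the required module.
 */
MODULE_DEPEND(ttm, agp, 1, 1, 1);
```

With a declaration like this, the kernel linker should load agp.ko before ttm.ko and resolve ttm's references to agp_bind_pages against it. Without it, symbols exported by a separately loaded agp.ko are, as far as I understand the linker's behaviour, not visible to ttm.ko, which would explain why a manual kldload of agp did not help while compiling agp into the kernel did.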
(In reply to Johannes Lundberg from comment #18)
# grep -RF -e "agp_bind_pages" /usr/src/sys
/usr/src/sys/dev/agp/agp.c:agp_bind_pages(device_t dev, vm_page_t *pages, vm_size_t size,
/usr/src/sys/dev/agp/agp.c: ("agp_bind_pages: page %p hasn't been wired", m));
/usr/src/sys/dev/agp/agpvar.h:int agp_bind_pages(device_t dev, vm_page_t *pages, vm_size_t size,
/usr/src/sys/dev/drm2/ttm/ttm_agp_backend.c: ret = -agp_bind_pages(agp_be->bridge, agp_be->pages,
/usr/src/sys/dev/drm2/drmP.h:extern DRM_AGP_MEM *drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drmP.h:static inline struct agp_memory *drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drm_agpsupport.c:drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drm_agpsupport.c:EXPORT_SYMBOL(drm_agp_bind_pages);

IMHO something like this is needed in the dev/agp/agp* files:
...
extern agp_bind_pages(device_t dev, vm_page_t *pages,...
...
EXPORT_SYMBOL(agp_bind_pages);
This PR is over a year old now. Has the problem in it been fixed?
yes