Bug 237271 - Radeon video card no longer works on 12-STABLE (after r345105) / CURRENT (after r345932)
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Mark Linimon
URL:
Keywords: needs-patch, regression
Duplicates: 237286
Depends on:
Blocks:
 
Reported: 2019-04-14 03:00 UTC by sigsys
Modified: 2020-06-23 01:57 UTC
CC List: 6 users

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Description sigsys 2019-04-14 03:00:54 UTC
My Radeon RX 560 stopped working after a 12-STABLE update.  It coincides with some DRM-related MFCs.  I nailed it down to r345932.  On r345929 (previous revision affecting 12-STABLE), the card still works.  But starting from r345932, loading the amdgpu.ko module prints some DRM console messages and then the display freezes.  The system is still responsive afterward but it sometimes panics with a page fault not long after.  That's with graphics/drm-fbsd12.0-kmod (rebuilt after each update while testing).

Here are logs of the kernel messages when the card does not work:

	kernel: [48] <6>[drm] amdgpu kernel modesetting enabled.
	kernel: [48] drmn0: <drmn> on vgapci0
	kernel: [48] vgapci0: child drmn0 requested pci_enable_io
	syslogd: last message repeated 1 times
	kernel: [48] <6>[drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1462:0x8A91 0xCF).
	kernel: [48] <6>[drm] register mmio base: 0xFE900000
	kernel: [48] <6>[drm] register mmio size: 262144
	kernel: [48] <6>[drm] PCI I/O BAR is not found.
	kernel: [48] <6>[drm] probing gen 2 caps for device 1022:1453 = 737903/e
	kernel: [48] <6>[drm] probing mlw for device 1002:67ff = 400883
	kernel: [48] <6>[drm] UVD is enabled in VM mode
	kernel: [48] <6>[drm] UVD ENC is enabled in VM mode
	kernel: [48] <6>[drm] VCE enabled in VM mode
	kernel: [48] ATOM BIOS: 113-C994LP-S01
	kernel: [48] <6>[drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
	kernel: [49] drmn0: successfully loaded firmware image with name: amdgpu/polaris11_mc.bin
	kernel: [49] drmn0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
	kernel: [49] drmn0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
	kernel: [49] Successfully added WC MTRR for [0xe0000000-0xefffffff]: 0; 
	kernel: [49] <6>[drm] Detected VRAM RAM=4096M, BAR=256M
	kernel: [49] <6>[drm] RAM width 128bits GDDR5
	kernel: [49] [drm:amdgpu_ttm_global_init] Failed setting up TTM memory accounting subsystem.
	kernel: [49] [drm:amdgpu_device_ip_init] sw_init of IP block <gmc_v8_0> failed -12
	kernel: [49] drmn0: amdgpu_device_ip_init failed
	kernel: [49] drmn0: Fatal error during GPU init
	kernel: [49] <6>[drm] amdgpu: finishing device.
	kernel: [49] vgapci0: child drmn0 requested pci_disable_io
	syslogd: last message repeated 1 times
	kernel: [49] device_attach: drmn0 attach returned 12
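
Note on the error codes in the log above: the "-12" from sw_init and the "12" from device_attach appear to be the same error. Linux-derived DRM code returns negative errno values, FreeBSD attach routines report the positive value, and errno 12 is ENOMEM on both systems. A minimal userspace sketch of that conversion follows; the helper names linux_to_bsd_errno and describe_linux_ret are hypothetical and are not part of linuxkpi or the driver.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a linuxkpi function): Linux-style kernel code
 * signals failure with a negative errno, while FreeBSD attach routines
 * return the positive value. Non-negative inputs are treated as success.
 */
static int
linux_to_bsd_errno(int linux_ret)
{
	return (linux_ret < 0 ? -linux_ret : 0);
}

/* Print a decoded form of a Linux-style return value. */
static void
describe_linux_ret(int linux_ret)
{
	int error = linux_to_bsd_errno(linux_ret);

	if (error == 0)
		printf("%d: success\n", linux_ret);
	else
		printf("%d: errno %d (%s)\n", linux_ret, error,
		    strerror(error));
}
```

Calling describe_linux_ret(-12) decodes the value seen in the gmc_v8_0 sw_init message; the exact strerror() text depends on the C library.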
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-04-14 03:05:23 UTC
CC base r345932 (MFC of base r345105) committer and drm-kmod maintainer
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2019-04-15 05:21:19 UTC
*** Bug 237286 has been marked as a duplicate of this bug. ***
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2019-04-15 05:25:09 UTC
Original commit was additionally MFC'd to stable/11 in base r345931, so any fix needs merging to both stable/12 and stable/11
Comment 4 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-24 16:09:04 UTC
Hi

The port needs to be updated. Latest version in git repo should be working:

https://github.com/FreeBSDDesktop/kms-drm/tree/drm-v4.16-fbsd12.0
Comment 5 Ivan Rozhuk 2019-04-24 16:51:48 UTC
(In reply to Johannes Lundberg from comment #4)

I don't think it should need updating.

4.16.g20190305 = git commit 4192575 = "lkpi: Allow recursive calls to i2c_transfer"

I see no commit after this that affects driver initialization or sysctl() code.


PS: https://github.com/FreeBSDDesktop/kms-drm/commit/b5ef47b82bbcd127f82b60f40b3efd3f065cf756
This should probably be fixed in the linuxkpi code, somewhere in sysctl_handle_attr() / sysctl_root_handler_locked(): zeroing the buffer on error would prevent many bugs in other places.
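
A minimal sketch of the "zero the buffer on error" idea, written as plain userspace C. The names read_attr and ATTR_BUF_LEN are hypothetical, and this is not the linuxkpi sysctl_handle_attr() code; it only illustrates that a caller which ignores the return value would then see an empty string instead of stale bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ATTR_BUF_LEN 64	/* hypothetical buffer size for this sketch */

/*
 * Hypothetical attribute reader. On failure it clears the whole output
 * buffer before returning a negative errno, so a caller that forgets to
 * check the result reads "" rather than leftover data.
 */
static int
read_attr(const char *value, char *buf, size_t len)
{
	if (buf == NULL || len == 0)
		return (-EINVAL);
	if (value == NULL || strlen(value) >= len) {
		memset(buf, 0, len);	/* zero the buffer on error */
		return (value == NULL ? -ENOENT : -ENOSPC);
	}
	memcpy(buf, value, strlen(value) + 1);
	return (0);
}
```

With this pattern, a failed read_attr(NULL, buf, sizeof(buf)) leaves buf[0] == '\0' even if buf held other data before the call.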
Comment 7 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-24 17:15:25 UTC
It has been fixed in linuxkpi, which is why it fails. Before, it was silently failing. Drivers need to be updated to do things correctly.
Comment 8 Ivan Rozhuk 2019-04-24 19:40:32 UTC
(In reply to Johannes Lundberg from comment #7)

Ok, I will test in next few days.
I was thinking that the main goal of linuxkpi (like webcamd) is to minimize the patch size for Linux drivers.
Comment 9 sigsys 2019-04-25 17:40:01 UTC
(In reply to Johannes Lundberg from comment #4)
OK. With yesterday's update to drm-fbsd12.0-kmod, everything seems to work fine in my case now.  I don't have to remove r345932 anymore.

Thanks to everyone working on this stuff BTW. This seems like pretty hard work.

Sure would be nice if -STABLE didn't get "ahead" of the kmod port in an incompatible way like this though.  But I can imagine that it would be even more work to keep them synchronized.
Comment 10 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-25 17:49:26 UTC
(In reply to sigsys from comment #9)

We try to keep them in sync but sometimes we slip... There's a lot going on, especially now with the big 5.0 update coming.
Comment 11 Ivan Rozhuk 2019-04-25 21:46:23 UTC
(In reply to Johannes Lundberg from comment #6)

link_elf_obj: symbol agp_bind_pages undefined
linker_load_file: /boot/modules/ttm.ko - unsupported file type
KLD amdgpu.ko: depends on ttm - not available or version mismatch
linker_load_file: /boot/modules/amdgpu.ko - unsupported file type
Comment 12 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-25 22:09:48 UTC
(In reply to rozhuk.im from comment #11)

What port/package is this? What freebsd version?
Comment 13 Ivan Rozhuk 2019-04-25 22:40:57 UTC
(In reply to Johannes Lundberg from comment #12)

Fresh 12.0 (svn up before build), drm-fbsd12.0-kmod-4.16.g20190424
Configs:
http://www.netlab.linkpc.net/download/software/os_cfg/FBSD/12.0/wks/
+
http://www.netlab.linkpc.net/download/software/os_cfg/FBSD/12.0/base/

I tried loading the agp module, but this did not help.
Comment 14 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-25 22:50:38 UTC
(In reply to rozhuk.im from comment #13)

I think the initial commit was missing the ttm.ko file in plist but it should be fixed now. Either build from ports or wait until binary packages are updated.
Comment 15 Ivan Rozhuk 2019-04-25 22:58:46 UTC
(In reply to Johannes Lundberg from comment #14)

It built and installed, but it cannot be loaded because:
link_elf_obj: symbol agp_bind_pages undefined
Comment 16 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-25 23:05:10 UTC
(In reply to rozhuk.im from comment #15)

Do you have /boot/modules/ttm.ko? You should have it with the latest version from github or ports (but maybe not binary packages yet).
Comment 17 Ivan Rozhuk 2019-04-25 23:15:41 UTC
(In reply to Johannes Lundberg from comment #16)

Yes, I have this module.
It is built and installed every time the kernel is built.

I do not use generic kernel and have many tunings for build and kernel config.
Workaround for me: add to kernel config
device		agp			# support several AGP chipsets

I have no idea why ttm.ko fails to load even when I load agp.ko.
Comment 18 Johannes Lundberg freebsd_committer freebsd_triage 2019-04-25 23:25:43 UTC
(In reply to rozhuk.im from comment #17)

Oh I got it. You're building agp as module. agp needs to be a dependency of ttm. I will fix this in next update. Thanks for finding this!
Comment 19 Ivan Rozhuk 2019-04-25 23:34:29 UTC
(In reply to Johannes Lundberg from comment #18)
# grep -RF -e "agp_bind_pages" /usr/src/sys
/usr/src/sys/dev/agp/agp.c:agp_bind_pages(device_t dev, vm_page_t *pages, vm_size_t size,
/usr/src/sys/dev/agp/agp.c:		    ("agp_bind_pages: page %p hasn't been wired", m));
/usr/src/sys/dev/agp/agpvar.h:int agp_bind_pages(device_t dev, vm_page_t *pages, vm_size_t size,
/usr/src/sys/dev/drm2/ttm/ttm_agp_backend.c:	ret = -agp_bind_pages(agp_be->bridge, agp_be->pages,
/usr/src/sys/dev/drm2/drmP.h:extern DRM_AGP_MEM *drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drmP.h:static inline struct agp_memory *drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drm_agpsupport.c:drm_agp_bind_pages(struct drm_device *dev,
/usr/src/sys/dev/drm2/drm_agpsupport.c:EXPORT_SYMBOL(drm_agp_bind_pages);


IMHO something like the following is needed in the dev/agp/agp* files:
...
extern int agp_bind_pages(device_t dev, vm_page_t *pages,...
...
EXPORT_SYMBOL(agp_bind_pages);
Comment 20 Mark Linimon freebsd_committer freebsd_triage 2020-06-23 00:08:42 UTC
This PR is over a year old now.  Has the problem in it been fixed?
Comment 21 Ivan Rozhuk 2020-06-23 00:14:20 UTC
yes