With the default configuration of multimedia/libva-intel-driver, the HYBRID option is enabled. On my amd64 Sandy Bridge system this results in periodic system deadlocks. These occurred roughly daily from the time I installed the hybrid driver. None had occurred before the hybrid driver was added and none has occurred since I disabled this option. I might also mention that I was seeing something similar about 8 months ago and reported it in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226495. It disappeared after updates to mesa and moving to drm-stable-kmod.

Here are the logs of the events of 9-Nov, 10-Nov, and 11-Nov:

Nov 9 13:09:11 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:11 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:13 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:13 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:15 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:15 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:18 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:18 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:18 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:18 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:18 rogue kernel: [drm] GPU HANG: ecode 6:0:0x00000000, in Xorg [1454], reason: Hang on render ring, action: reset
Nov 9 13:09:18 rogue kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 9 13:09:18 rogue kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 9 13:09:18 rogue kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 9 13:09:18 rogue kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 9 13:09:18 rogue kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 9 13:09:18 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:18 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 9 13:09:18 rogue kernel: drm/i915: Resetting chip after gpu hang
Nov 9 13:09:18 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 9 13:09:18 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out

Nov 10 13:25:57 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:25:57 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:25:57 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:25:57 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:00 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:00 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:02 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:02 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:02 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:02 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:02 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:02 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:02 rogue kernel: [drm] GPU HANG: ecode 6:0:0x00000000, in Xorg [1419], reason: Hang on render ring, action: reset
Nov 10 13:26:02 rogue kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 10 13:26:02 rogue kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 10 13:26:02 rogue kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 10 13:26:02 rogue kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 10 13:26:02 rogue kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 10 13:26:02 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:02 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out
Nov 10 13:26:02 rogue kernel: drm/i915: Resetting chip after gpu hang
Nov 10 13:26:02 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.
Nov 10 13:26:02 rogue kernel: [drm:__gen6_gt_wait_for_thread_c0] GT thread status wait timed out

Nov 11 22:46:21 rogue kernel: [drm] GPU HANG: ecode 6:0:0xf4e9fffe, in Xorg [1355], reason: Hang on blitter ring, action: reset
Nov 11 22:46:21 rogue kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 11 22:46:21 rogue kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 11 22:46:21 rogue kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 11 22:46:21 rogue kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 11 22:46:21 rogue kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 11 22:46:21 rogue kernel: drm/i915: Resetting chip after gpu hang
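
For anyone comparing hang events across days, here is a minimal Python sketch that pulls the "GPU HANG" lines out of a syslog excerpt like the one above and tallies them by ring. It is not part of the original report: the function name summarize_hangs and the regular expression are illustrative assumptions based only on the message format shown in these logs, and other drm-kmod versions may word the message differently.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log excerpt in this report:
#   "<Mon> <day> <hh:mm:ss> <host> kernel: [drm] GPU HANG: ecode <code>,
#    in <proc> [<pid>], reason: Hang on <ring> ring, action: <action>"
HANG_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\[drm\] GPU HANG: ecode (?P<ecode>[^,]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: Hang on (?P<ring>\w+) ring, action: (?P<action>\w+)"
)


def summarize_hangs(lines):
    """Return one dict per GPU HANG line; all other lines are ignored."""
    hangs = []
    for line in lines:
        match = HANG_RE.match(line.strip())
        if match:
            hangs.append(match.groupdict())
    return hangs


# Sample input copied from the report (one forcewake line plus the three hangs).
sample = [
    "Nov 9 13:09:18 rogue kernel: [drm:fw_domain_wait_ack] render: timed out waiting for forcewake ack request.",
    "Nov 9 13:09:18 rogue kernel: [drm] GPU HANG: ecode 6:0:0x00000000, in Xorg [1454], reason: Hang on render ring, action: reset",
    "Nov 10 13:26:02 rogue kernel: [drm] GPU HANG: ecode 6:0:0x00000000, in Xorg [1419], reason: Hang on render ring, action: reset",
    "Nov 11 22:46:21 rogue kernel: [drm] GPU HANG: ecode 6:0:0xf4e9fffe, in Xorg [1355], reason: Hang on blitter ring, action: reset",
]

hangs = summarize_hangs(sample)
rings = Counter(h["ring"] for h in hangs)

for h in hangs:
    print(h["when"], h["proc"], h["pid"], h["ring"], h["ecode"])
print(dict(rings))
```

On the excerpt above this separates the two render-ring hangs (9-Nov, 10-Nov) from the blitter-ring hang (11-Nov), which is the kind of distinction worth stating when attaching the crash dump.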
Thanks for the report. I'm adding Jan Beich, who proposed the HYBRID option and enabled it. Unfortunately I'm unable to test on a wide range of hardware. It worked on my system and I had no objection to the plan. BTW I'm using the drm-devel-kmod package, and I'm on head. Could you test with drm-next and see if the problem is mitigated? After this report I'm inclined to disable the option by default. I'd like to get feedback about this from Jan though. Jan, can you share your thoughts on this?
(In reply to Guido Falsi from comment #1) I already dropped a note to Jan. I have been his Sandy Bridge tester and tested this for functionality, but didn't see the first hang for a bit because I was busy moving and my new home's network was not working. There is an old issue with GPU hangs on Sandy Bridge with power management enabled that was supposed to have been worked around (not fixed, as it was a hardware problem). I certainly stopped seeing it. Then came the issue early this year that looked just like what I've been seeing now with hybrid mode. It went away with an updated mesa and the move to drm-stable-kmod. I suspect that support for the hybrid driver exposed the problem again. After all, hybrid mode does not actually work on Sandy Bridge, so I suspect that no thought was given to dealing with the problem. This is really conjecture, though. I'll try to install drm-devel-kmod a bit later and see what happens, though it will be 3 or 4 days before I will feel confident of success. Failure may take far less time.
I'm OK with HYBRID disabled. It can be turned into a flavor instead, e.g., libva-intel-driver@hybrid. Giving up on drm-stable-kmod isn't a good idea (even if the graphics team plans it) as later versions aren't stable on xf86-video-intel, at least with SNA enabled. Making media_driver_data_init() return false on Skylake doesn't lead to GPU hangs. Prior to that, the hybrid driver does some initialization (e.g., intel_bufmgr_gem_*) which probably exacerbates Sandy Bridge instability on drm-stable-kmod.
In the meantime I'm going to disable the HYBRID option by default, since the risk of causing lockups for further unsuspecting users is too big. I'll look at adding it as a flavor in the next few days.
A commit references this bug:

Author: madpilot
Date: Sat Nov 17 15:33:34 UTC 2018
New revision: 485138
URL: https://svnweb.freebsd.org/changeset/ports/485138

Log:
  Disable HYBRID option by default due to lockups being reported on Sandy Bridge CPUs.

  PR: 233259
  Submitted by: rkoberman@gmail.com

Changes:
  head/multimedia/libva-intel-driver/Makefile
Kevin, do hangs from HYBRID still occur after ports r487275 or ports r487274?
(In reply to Jan Beich from comment #6) Now running 12.0-stable and the latest drm-fbsd12.0-kmod (g20181215). I have not been running with HYBRID, but will build with it tonight and see how it goes. Since the hangs were infrequent, it will take a bit of time before I can report, at least a day or two.
Passing this PR to new port maintainer.
(In reply to rkoberman from comment #7) After two weeks of HYBRID on my Sandy Bridge system 12-STABLE and mesa 19.3.1 I have had no significant issues. I have seen occasional blocks of garbage pop up. They always are the same height, probably 128 pixels, and highly variable width. The blocks appear to contain random noise. Redrawing the window with minimizing or window shading makes the blocks vanish and they only appear very rarely. While I have only seen them since installing with the HYBRID option, they have been so rare that I am not at all sure that libva-intel-driver is the cause. The prior lockup issues have not recurred.
(In reply to rkoberman from comment #9)
> I have seen occasional blocks of garbage pop up.

On modesetting(4x)? Can you try UXA on xf86-video-intel? I see rendering glitches with modesetting(4x) on Skylake myself: stutter on GL init and switching workspaces, black screen flickering on VAAPI init.

Otherwise, thanks for testing. HYBRID can probably be re-enabled after adding a warning to UPDATING, so users (on FreeBSD 11.*) can report whether HYBRID=off helps in case of stability issues. Unfortunately, we don't have telemetry in order to reduce guessing.
If you're tired of testing, a composite manager may help to clean up rendering glitches. Try installing x11-wm/compton and maybe check my config[1].

[1] https://github.com/FreeBSDDesktop/kms-drm/issues/32
"vsync" is disabled because it rarely helps but incurs performance cost
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=0fe260b63e82ede58d8e551f724f09d7fb63240e

commit 0fe260b63e82ede58d8e551f724f09d7fb63240e
Author: Jan Beich <jbeich@FreeBSD.org>
AuthorDate: 2021-08-18 21:29:20 +0000
Commit: Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-08-18 21:46:12 +0000

    multimedia/libva-intel-driver: enable HYBRID by default (again)

    Originally disabled due to GPU freezes on Sandy Bridge which disappeared after upgrading FreeBSD from 11 to 12. drm-*-kmod also had several updates since then. So, let's re-try.

    PR: 233259

 multimedia/libva-intel-driver/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)