Bug 266315

Summary: linuxkpi panic after recent updates (13.1-STABLE #0 stable/13-9cbba5950: Wed Sep 7 23:42:41 CEST 2022)
Product: Base System Reporter: jakub_lach
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open ---    
Severity: Affects Only Me CC: dufresnep, grahamperrin, lhersch, manu, moonlapse81, x11
Priority: --- Keywords: crash, needs-qa
Version: 13.1-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259670

Description jakub_lach 2022-09-09 09:09:46 UTC
Thu  8 Sep 2022 09:32:47 CEST
drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100645]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] Xorg[100645] context reset due to GPU hang
drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100645]


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x61
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8072f018
stack pointer           = 0x28:0xfffffe00baa8db60
frame pointer           = 0x28:0xfffffe00baa8dba0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (linuxkpi_short_wq_3)
trap number             = 12
panic: page fault
cpuid = 1
time = 1662695425
Uptime: 20h17m47s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1
---<<BOOT>>---
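The `time =` and `Uptime:` lines of the panic output can be combined to estimate when the machine was booted. A minimal Python sketch, assuming `time` is seconds since the Unix epoch (UTC) and that the uptime string always has the `NdNhNmNs` shape seen in this report; the helper names are illustrative only:

```python
import re
from datetime import datetime, timedelta, timezone

def parse_uptime(text):
    """Turn an uptime string such as '20h17m47s' or '1d16h37m4s' into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised uptime: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

def panic_and_boot_time(epoch_seconds, uptime_text):
    """Return (panic time, estimated boot time) as timezone-aware UTC datetimes."""
    panic = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return panic, panic - parse_uptime(uptime_text)

# Values copied from the panic output above.
panic, boot = panic_and_boot_time(1662695425, "20h17m47s")
print("panic:", panic.isoformat())
print("boot: ", boot.isoformat())
```

For the values above this gives a panic at 03:50:25 UTC on 9 September 2022 and a boot at about 07:32 UTC (09:32 CEST) on 8 September, which appears to match the date line at the top of the description.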

Recent updates - 

drm-510-kmod-5.10.113_4     (recent switch to 510 on STABLE)
libdrm-2.4.113,1
Comment 1 jakub_lach 2022-09-09 09:15:59 UTC
agp0: <Intel GM45 SVGA controller> on vgapci0

<...>

vgapci0@pci0:0:2:0:     class=0x030000 rev=0x07 hdr=0x00 vendor=0x8086 device=0x2a42 subvendor=0x17aa subdevice=0x20e4
    vendor     = 'Intel Corporation'
    device     = 'Mobile 4 Series Chipset Integrated Graphics Controller'
    class      = display
    subclass   = VGA
vgapci1@pci0:0:2:1:     class=0x038000 rev=0x07 hdr=0x00 vendor=0x8086 device=0x2a43 subvendor=0x17aa subdevice=0x20e4
    vendor     = 'Intel Corporation'
    device     = 'Mobile 4 Series Chipset Integrated Graphics Controller'
    class      = display

<...>

    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Mobile Intel(R) GM45 Express Chipset (CTG) (0x2a42)
    Version: 21.3.8
    Accelerated: yes
    Video memory: 1536MB
    Unified memory: yes
    Preferred profile: compat (0x2)
    Max core profile version: 0.0
    Max compat profile version: 2.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 2.0

<...>
                                                                                                                     
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-21.3.8                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-21.3.8               OpenGL libraries that support GLX and EGL clients

<...>

libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets
Comment 2 jakub_lach 2022-09-09 09:16:43 UTC
(In reply to jakub_lach from comment #1)

$ kldstat                                                                                                                                            
Id Refs Address                Size Name
 1   49 0xffffffff80200000   d56500 kernel
 2    1 0xffffffff80f57000    489e8 snd_hda.ko
 3    1 0xffffffff80fa0000    74650 if_em.ko
 4    1 0xffffffff81920000     3530 fdescfs.ko
 5    1 0xffffffff81924000   17f8b8 i915kms.ko
 6    1 0xffffffff81aa4000    72bd8 drm.ko
 7    4 0xffffffff81b17000    1a170 linuxkpi.ko
 8    2 0xffffffff81b32000     2220 backlight.ko
 9    2 0xffffffff81b35000     30fc linuxkpi_gplv2.ko
10    3 0xffffffff81b39000     62d8 dmabuf.ko
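The kldstat listing can be used to check which loaded image contains the faulting instruction pointer from the description. A minimal Python sketch, assuming the listing reflects the module layout at the time of the panic (it was captured after the reboot, so this is not guaranteed); the function names are illustrative:

```python
KLDSTAT = """\
Id Refs Address                Size Name
 1   49 0xffffffff80200000   d56500 kernel
 2    1 0xffffffff80f57000    489e8 snd_hda.ko
 3    1 0xffffffff80fa0000    74650 if_em.ko
 4    1 0xffffffff81920000     3530 fdescfs.ko
 5    1 0xffffffff81924000   17f8b8 i915kms.ko
 6    1 0xffffffff81aa4000    72bd8 drm.ko
 7    4 0xffffffff81b17000    1a170 linuxkpi.ko
 8    2 0xffffffff81b32000     2220 backlight.ko
 9    2 0xffffffff81b35000     30fc linuxkpi_gplv2.ko
10    3 0xffffffff81b39000     62d8 dmabuf.ko
"""

def parse_kldstat(text):
    """Return a list of (name, start, end) tuples; sizes in kldstat are hexadecimal."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        start = int(fields[2], 16)
        modules.append((fields[4], start, start + int(fields[3], 16)))
    return modules

def find_module(modules, address):
    """Return (module name, offset into module), or None if no module covers the address."""
    for name, start, end in modules:
        if start <= address < end:
            return name, address - start
    return None

modules = parse_kldstat(KLDSTAT)
hit = find_module(modules, 0xffffffff8072f018)  # instruction pointer from the description
print(hit[0], hex(hit[1]))
```

Under that assumption the address falls inside the base kernel image rather than inside i915kms.ko or linuxkpi.ko. This only narrows down where the fault was taken; it does not identify the function, for which a crash dump or symbol lookup would still be needed.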
Comment 3 jakub_lach 2022-09-10 16:52:40 UTC
Currently at 13.1-STABLE #0 stable/13-dc96fb072: Fri Sep  9 09:17:51 CEST 

Something I never saw before -

Sep 10 16:15:36 Thinkpad kernel: MCA: Bank 3, Status 0x9020004b0001010a
Sep 10 16:15:36 Thinkpad kernel: MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Sep 10 16:15:36 Thinkpad kernel: MCA: Vendor "GenuineIntel", ID 0x1067a, APIC ID 0
Sep 10 16:15:36 Thinkpad kernel: MCA: CPU 0 COR (Green) (Yellow) EN GCACHE L2 ERR error
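The Bank 3 status word can be decoded by hand to cross-check the kernel's one-line summary. A minimal Python sketch, assuming the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, cache hierarchy errors encoded as `000F 0001 RRRR TTLL`); the table and function names are illustrative, and model-specific fields are not decoded:

```python
# Bit positions of the architectural flags (assumed from the Intel SDM).
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}

REQUEST = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TRANSACTION = ["I", "D", "G"]     # instruction, data, generic
LEVEL = ["L0", "L1", "L2", "LG"]  # LG = generic level

def decode_mca_status(status):
    """Decode the flag bits and, for cache hierarchy errors, the compound error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    decoded = {"flags": flags, "mca_error_code": code, "description": None}
    # Cache hierarchy errors: 000F 0001 RRRR TTLL (the F bit is masked out here).
    if (code & 0xEF00) == 0x0100:
        request = (code >> 4) & 0xF
        transaction = (code >> 2) & 0x3
        level = code & 0x3
        if request < len(REQUEST) and transaction < len(TRANSACTION):
            decoded["description"] = (
                f"{TRANSACTION[transaction]}CACHE {LEVEL[level]} {REQUEST[request]}"
            )
    return decoded

result = decode_mca_status(0x9020004B0001010A)  # Bank 3 status from the log above
print([name for name, is_set in result["flags"].items() if is_set])
print(result["description"])
```

For the status above this reports VAL and EN set with UC clear (a corrected error) and a generic L2 cache error, which agrees with the kernel's own "COR ... EN GCACHE L2 ERR" line.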
Comment 4 jakub_lach 2022-09-14 08:23:13 UTC
(In reply to jakub_lach from comment #0)
Another one -

FreeBSD 13.1-STABLE #0 stable/13-3f4e44f38 amd64

<...>
                                           
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-21.3.8                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-21.3.8               OpenGL libraries that support GLX and EGL clients

<...>
                                           
drm-510-kmod-5.10.113_6        DRM drivers modules

<...>
                                         
drm-510-kmod-5.10.113_6        DRM drivers modules
libdrm-2.4.113,1               Userspace interface to kernel Direct Rendering Module services

<...>

libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets

drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100642]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] Xorg[100642] context reset due to GPU hang
drmn0: [drm] GPU HANG: ecode 4:1:cecffffb, in MainThread [100642]


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x61
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8072e388
stack pointer           = 0x28:0xfffffe00baa97b60
frame pointer           = 0x28:0xfffffe00baa97ba0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (linuxkpi_short_wq_1)
trap number             = 12
panic: page fault
cpuid = 0
time = 1663143125
Uptime: 22h24m1s

The hangs happened before drm-510-kmod _but_ they did not lead to panics.
Comment 5 jakub_lach 2022-09-17 07:18:09 UTC
(In reply to jakub_lach from comment #4)

FWIW, saving PDFs as files in Firefox (which triggers a dialog box) causes the hangs and panics described above in about half of the cases. Previously, it only led to hangs (as mentioned).
Comment 6 jakub_lach 2022-09-20 22:26:23 UTC
Currently on 13.1-STABLE #0 stable/13-b63021e00. After unsuccessfully trying to reproduce the panic with GENERIC with debugging on, it looks to me like the problem does not occur when the kernel is built with options COMPAT_LINUXKPI (so the linuxkpi module is not loaded).
Comment 7 jakub_lach 2022-12-31 13:29:30 UTC
(In reply to jakub_lach from comment #6)

Currently FreeBSD 13.1-STABLE #0 stable/13-1815de4fe, 

$ pkg info | grep 'kmod'                                                
drm-510-kmod-5.10.113_8        DRM drivers modules

$ pkg info | grep 'mesa'                                                
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.1_1              OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.1_1             OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel'                                               
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm'                                                 
drm-510-kmod-5.10.113_8        DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

After I compiled linuxkpi into the kernel, I sporadically get a hard restart/reset when opening dialog boxes in Firefox, leaving no trace of the restart.
Comment 8 jakub_lach 2022-12-31 13:53:09 UTC
(In reply to jakub_lach from comment #7)

Currently testing whether commenting out

#options         COMPAT_LINUXKPI

and adding

linuxkpi and linuxkpi_wlan as modules (the latter just in case - I shouldn't need it) will change the outcome.
Comment 9 jakub_lach 2023-01-03 09:44:44 UTC
I've got another hang (with the modules built outside the kernel) -

FreeBSD 13.1-STABLE #0 stable/13-03c82ccba amd64

$ kldstat                                                           
Id Refs Address                Size Name
 1   49 0xffffffff80200000   d56f90 kernel
 2    1 0xffffffff81596000    74560 if_em.ko
 3    1 0xffffffff8160c000    48b20 snd_hda.ko
 4    1 0xffffffff81920000     3530 fdescfs.ko
 5    1 0xffffffff81924000   17f8b8 i915kms.ko
 6    1 0xffffffff81aa4000    72bd8 drm.ko
 7    4 0xffffffff81b17000    1d230 linuxkpi.ko
 8    2 0xffffffff81b35000     2220 backlight.ko
 9    2 0xffffffff81b38000     30fc linuxkpi_gplv2.ko
10    3 0xffffffff81b3c000     62d8 dmabuf.ko

$ pkg info | grep 'kmod'                                            
drm-510-kmod-5.10.113_8        DRM drivers modules

$ pkg info | grep 'mesa' 
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.2_1              OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.2_1             OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel' 
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm' 
drm-510-kmod-5.10.113_8        DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

Jan  3 10:38:05 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Jan  3 10:38:05 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:ce8ffffb, in MainThread [100091]
Jan  3 10:38:05 Thinkpad kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan  3 10:38:05 Thinkpad kernel: drmn0: [drm] Xorg[100091] context reset due to GPU hang
Jan  3 10:38:05 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:ce8ffffb, in MainThread [100091]
Jan  3 10:38:05 Thinkpad kernel:
Jan  3 10:38:05 Thinkpad syslogd: last message repeated 1 times
Jan  3 10:38:05 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Jan  3 10:38:05 Thinkpad kernel: cpuid = 0; apic id = 00
Jan  3 10:38:05 Thinkpad kernel: fault virtual address  = 0x61
Jan  3 10:38:05 Thinkpad kernel: fault code             = supervisor read data, page not present
Jan  3 10:38:05 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff807331b8
Jan  3 10:38:05 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00c7046b60
Jan  3 10:38:05 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00c7046ba0
Jan  3 10:38:05 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Jan  3 10:38:05 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Jan  3 10:38:05 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Jan  3 10:38:05 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_3)
Jan  3 10:38:05 Thinkpad kernel: trap number            = 12
Jan  3 10:38:05 Thinkpad kernel: panic: page fault
Jan  3 10:38:05 Thinkpad kernel: cpuid = 0
Jan  3 10:38:05 Thinkpad kernel: time = 1672738222
Jan  3 10:38:05 Thinkpad kernel: Uptime: 1d16h37m4s
Jan  3 10:38:05 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Jan  3 10:38:05 Thinkpad kernel: Rebooting...
Jan  3 10:38:05 Thinkpad kernel: ---<<BOOT>>---
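To compare the panics in this thread side by side, the relevant fields can be pulled out of the syslog-formatted lines automatically. A minimal Python sketch, assuming the field labels are spelled exactly as in the logs quoted here; the pattern keys and the function name are illustrative:

```python
import re

# Patterns for the fields of a "Fatal trap" report as they appear in this thread.
PATTERNS = {
    "fault_address": re.compile(r"fault virtual address\s*=\s*(\S+)"),
    "instruction_pointer": re.compile(r"instruction pointer\s*=\s*(\S+)"),
    "thread": re.compile(r"current process\s*=\s*\d+ \((.+)\)"),
    "uptime": re.compile(r"Uptime:\s*(\S+)"),
}

def panic_summary(lines):
    """Collect the first match of each pattern from an iterable of log lines."""
    summary = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and key not in summary:
                summary[key] = match.group(1)
    return summary

# A few lines copied from the log above.
SAMPLE = """\
Jan  3 10:38:05 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Jan  3 10:38:05 Thinkpad kernel: fault virtual address  = 0x61
Jan  3 10:38:05 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff807331b8
Jan  3 10:38:05 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_3)
Jan  3 10:38:05 Thinkpad kernel: Uptime: 1d16h37m4s
""".splitlines()

summary = panic_summary(SAMPLE)
for key, value in summary.items():
    print(f"{key:20} {value}")
```

In the three panics quoted up to this point, the fault address (0x61) and the thread name prefix (linuxkpi_short_wq_) are the same, while the instruction pointer differs from one kernel build to the next.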
Comment 10 jakub_lach 2023-01-24 10:55:04 UTC
(In reply to jakub_lach from comment #9)

FreeBSD 13.1-STABLE #0 stable/13-c84ec3076 amd64

$ pkg info | grep 'kmod'                                            
drm-510-kmod-5.10.163          DRM drivers modules

$ pkg info | grep 'mesa' 
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.3_2              OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.3_1             OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel' 
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm'   
drm-510-kmod-5.10.163          DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

Jan 24 11:49:15 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Jan 24 11:49:15 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:cecffffb, in MainThread [100639]
Jan 24 11:49:15 Thinkpad kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan 24 11:49:15 Thinkpad kernel: drmn0: [drm] Xorg[100639] context reset due to GPU hang
Jan 24 11:49:15 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:cecffffb, in MainThread [100639]
Jan 24 11:49:15 Thinkpad kernel:
Jan 24 11:49:15 Thinkpad syslogd: last message repeated 1 times
Jan 24 11:49:15 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Jan 24 11:49:15 Thinkpad kernel: cpuid = 1; apic id = 01
Jan 24 11:49:15 Thinkpad kernel: fault virtual address  = 0x61
Jan 24 11:49:15 Thinkpad kernel: fault code             = supervisor read data, page not present
Jan 24 11:49:15 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff80736298
Jan 24 11:49:15 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00baa97b60
Jan 24 11:49:15 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00baa97ba0
Jan 24 11:49:15 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Jan 24 11:49:15 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 24 11:49:15 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Jan 24 11:49:15 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_3)
Jan 24 11:49:15 Thinkpad kernel: trap number            = 12
Jan 24 11:49:15 Thinkpad kernel: panic: page fault
Jan 24 11:49:15 Thinkpad kernel: cpuid = 1
Jan 24 11:49:15 Thinkpad kernel: time = 1674557051
Jan 24 11:49:15 Thinkpad kernel: Uptime: 3d3h49m42s
Jan 24 11:49:15 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Jan 24 11:49:15 Thinkpad kernel: Rebooting...
Jan 24 11:49:15 Thinkpad kernel: cpu_reset: Restarting BSP
Jan 24 11:49:15 Thinkpad kernel: cpu_reset_proxy: Stopped CPU 1
Jan 24 11:49:15 Thinkpad kernel: ---<<BOOT>>---
Comment 11 jakub_lach 2023-01-28 10:33:16 UTC
(In reply to jakub_lach from comment #10)

FreeBSD 13.1-STABLE #0 stable/13-df8c42f5a amd64

$ pkg info | grep 'kmod' 
drm-510-kmod-5.10.163          DRM drivers modules

$ pkg info | grep 'mesa' 
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.4                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.4               OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel' 
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_2,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm'  
drm-510-kmod-5.10.163          DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

Jan 28 11:25:15 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:fdfffffb, in MainThread [100643]
Jan 28 11:25:15 Thinkpad kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan 28 11:25:15 Thinkpad kernel: drmn0: [drm] Xorg[100643] context reset due to GPU hang
Jan 28 11:28:55 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Jan 28 11:28:55 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100643]
Jan 28 11:28:55 Thinkpad kernel:
Jan 28 11:28:55 Thinkpad syslogd: last message repeated 1 times
Jan 28 11:28:55 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Jan 28 11:28:55 Thinkpad kernel: cpuid = 0; apic id = 00
Jan 28 11:28:55 Thinkpad kernel: fault virtual address  = 0x61
Jan 28 11:28:55 Thinkpad kernel: fault code             = supervisor read data, page not present
Jan 28 11:28:55 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff80736d48
Jan 28 11:28:55 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00c5fbdb60
Jan 28 11:28:55 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00c5fbdba0
Jan 28 11:28:55 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Jan 28 11:28:55 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 28 11:28:55 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Jan 28 11:28:55 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_1)
Jan 28 11:28:55 Thinkpad kernel: trap number            = 12
Jan 28 11:28:55 Thinkpad kernel: panic: page fault
Jan 28 11:28:55 Thinkpad kernel: cpuid = 0
Jan 28 11:28:55 Thinkpad kernel: time = 1674901517
Jan 28 11:28:55 Thinkpad kernel: Uptime: 1d0h35m14s
Jan 28 11:28:55 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Jan 28 11:28:55 Thinkpad kernel: Rebooting...
Jan 28 11:28:55 Thinkpad kernel: ---<<BOOT>>---
Comment 12 jakub_lach 2023-02-07 12:26:46 UTC
(In reply to jakub_lach from comment #11)

FreeBSD 13.2-PRERELEASE #0 stable/13-3d7a88248 amd64

$ pkg info | grep 'kmod' 
drm-510-kmod-5.10.163          DRM drivers modules

$ pkg info | grep 'mesa'
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.4                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.4               OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel' 
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_3,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm' 
drm-510-kmod-5.10.163          DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

Feb  7 13:16:42 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Feb  7 13:16:42 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:cecffffb, in MainThread [100645]
Feb  7 13:16:42 Thinkpad kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
Feb  7 13:16:42 Thinkpad kernel: drmn0: [drm] Xorg[100645] context reset due to GPU hang
Feb  7 13:16:42 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:cecffffb, in MainThread [100645]
Feb  7 13:16:42 Thinkpad kernel:
Feb  7 13:16:42 Thinkpad syslogd: last message repeated 1 times
Feb  7 13:16:42 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Feb  7 13:16:42 Thinkpad kernel: cpuid = 1; apic id = 01
Feb  7 13:16:42 Thinkpad kernel: fault virtual address  = 0x61
Feb  7 13:16:42 Thinkpad kernel: fault code             = supervisor read data, page not present
Feb  7 13:16:42 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff80736578
Feb  7 13:16:42 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00c7046b60
Feb  7 13:16:42 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00c7046ba0
Feb  7 13:16:42 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Feb  7 13:16:42 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Feb  7 13:16:42 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Feb  7 13:16:42 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_3)
Feb  7 13:16:42 Thinkpad kernel: trap number            = 12
Feb  7 13:16:42 Thinkpad kernel: panic: page fault
Feb  7 13:16:42 Thinkpad kernel: cpuid = 1
Feb  7 13:16:42 Thinkpad kernel: time = 1675772031
Feb  7 13:16:42 Thinkpad kernel: Uptime: 1d23h28m43s
Feb  7 13:16:42 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Feb  7 13:16:42 Thinkpad kernel: --> Press a key on the console to reboot,
Feb  7 13:16:42 Thinkpad kernel: --> or switch off the system now.
Feb  7 13:16:42 Thinkpad kernel: Rebooting...
Feb  7 13:16:42 Thinkpad kernel: cpu_reset: Restarting BSP
Feb  7 13:16:42 Thinkpad kernel: cpu_reset_proxy: Stopped CPU 1
Feb  7 13:16:42 Thinkpad kernel: ---<<BOOT>>---
Comment 13 jakub_lach 2023-02-13 20:16:37 UTC
FreeBSD 13.2-STABLE #0 stable/13-77733aaa5 amd64

$ pkg info | grep 'kmod' 
drm-510-kmod-5.10.163          DRM drivers modules

$ pkg info | grep 'mesa'
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.5                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.5               OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel' 
devcpu-data-intel-20221108     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_3,1 X.Org legacy driver for Intel integrated graphics chipsets

$ pkg info | grep 'drm'                                             
drm-510-kmod-5.10.163          DRM drivers modules
libdrm-2.4.114,1               Userspace interface to kernel Direct Rendering Module services

Feb 13 19:58:44 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Feb 13 19:58:44 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100633]
Feb 13 19:58:44 Thinkpad kernel: drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
Feb 13 19:58:44 Thinkpad kernel: drmn0: [drm] Xorg[100633] context reset due to GPU hang
Feb 13 19:58:44 Thinkpad kernel: drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100633]
Feb 13 19:58:44 Thinkpad kernel: 
Feb 13 19:58:44 Thinkpad syslogd: last message repeated 1 times
Feb 13 19:58:44 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Feb 13 19:58:44 Thinkpad kernel: cpuid = 0; apic id = 00
Feb 13 19:58:44 Thinkpad kernel: fault virtual address  = 0x61
Feb 13 19:58:44 Thinkpad kernel: fault code             = supervisor read data, page not present
Feb 13 19:58:44 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff80736c98
Feb 13 19:58:44 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00c705ab60
Feb 13 19:58:44 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00c705aba0
Feb 13 19:58:44 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Feb 13 19:58:44 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 13 19:58:44 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Feb 13 19:58:44 Thinkpad kernel: current process                = 0 (linuxkpi_short_wq_1)
Feb 13 19:58:44 Thinkpad kernel: trap number            = 12
Feb 13 19:58:44 Thinkpad kernel: panic: page fault
Feb 13 19:58:44 Thinkpad kernel: cpuid = 0
Feb 13 19:58:44 Thinkpad kernel: time = 1676312082
Feb 13 19:58:44 Thinkpad kernel: Uptime: 10h35m44s
Feb 13 19:58:44 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Feb 13 19:58:44 Thinkpad kernel: Rebooting...
Feb 13 19:58:44 Thinkpad kernel: ---<<BOOT>>---
Comment 14 jakub_lach 2023-02-20 10:19:25 UTC
(In reply to jakub_lach from comment #13)

Another reboot - first time after updating src, no trace in messages.

FreeBSD 13.2-STABLE #0 stable/13-f3f350c5c amd64

$ pkg info | grep 'kmod'
drm-510-kmod-5.10.163_2        DRM drivers modules

$ pkg info | grep 'mesa'
mesa-demos-8.4.0_3             OpenGL demos distributed with Mesa
mesa-dri-22.3.5                OpenGL hardware acceleration drivers for DRI2+
mesa-libs-22.3.5               OpenGL libraries that support GLX and EGL clients

$ pkg info | grep 'intel'                                
devcpu-data-intel-20230214     Intel CPU microcode updates
libva-intel-driver-2.4.1_1     VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
xf86-video-intel-2.99.917.916_3,1 X.Org legacy driver for Intel integrated graphics chipsets
Comment 15 Oleh Vinichenko 2023-02-20 13:06:37 UTC
Now that I have updated to FreeBSD-CURRENT for testing purposes, I had less frequent hangs with the master branch of drm-kmod (which was linux-5.12.x).
Now, after the drm-kmod update to linux-5.15.x, the hangs and GPU resets have returned. The system freezes and only a hard reset brings it back.
The only thing I see is:
drmn0: [drm] GPU HANG: ecode 6:1:0ffe0000, in Renderer [101196]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] Renderer[101196] context reset due to GPU hang
Note that I have already reported this in:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259670

To me this looks like a grave issue that is not fully addressed, and simple updates to a newer Linux graphics stack will not help. Since it is really hard for regular users to get meaningful information or to debug this, I propose that these issues, which are quite common, be reported to the FreeBSD Foundation/core developers, as this looks like a subtle bug within the FreeBSD linuxkpi layer.
Comment 16 Emmanuel Vadot freebsd_committer freebsd_triage 2023-02-20 17:21:58 UTC
(In reply to Oleh Vinichenko from comment #15)

Do any of you have a way to reproduce this reliably?
I've never encountered these problems on any of my i915 machines (3-4 machines here).
Comment 17 Oleh Vinichenko 2023-02-20 17:34:11 UTC
The problem is that it happens out of nowhere, in a random fashion, but always when running Firefox or any application that uses the GPU. It happens on this particular box, so it might be specific to its hardware. In #259670 I provided some information, but I do not know how helpful it is, nor do I know how to debug such events. If there are steps or ideas for how to debug this, please share. It was very reliable and stable prior to drm-kmod-5.4.
Comment 18 jakub_lach 2023-02-20 17:43:37 UTC
(In reply to Emmanuel Vadot from comment #16)

Usually triggered in Firefox upon opening a dialog window (save file or similar).
Comment 19 jakub_lach 2023-02-20 17:45:49 UTC
(In reply to jakub_lach from comment #18)

Not always - usually after a longer uptime.
Comment 20 jakub_lach 2023-02-20 17:48:04 UTC
(In reply to jakub_lach from comment #19)

The hangs happened before drm-510-kmod _but_ they did not lead to panics.
Comment 21 Oleh Vinichenko 2023-02-20 17:59:46 UTC
I ran 20 random "Save Page As" operations, which saved PDFs, and it did not trigger anything.
Comment 22 Emmanuel Vadot freebsd_committer freebsd_triage 2023-02-20 18:03:14 UTC
You both have old i915 chipsets, so I'm not surprised that only you two have reported this problem.
Comment 23 Oleh Vinichenko 2023-02-20 18:15:30 UTC
Fun. Seriously.
Comment 24 jakub_lach 2023-02-20 18:19:57 UTC
(In reply to Oleh Vinichenko from comment #21)

Here it would hang on the first instance of a dialog box being spawned - or not at all (as in your case).

To me it looks state-dependent, that is, conditional on longer uptime and (maybe) heavier memory usage.
Comment 25 jakub_lach 2023-02-21 02:12:35 UTC
(In reply to Oleh Vinichenko from comment #15)

Do you run GENERIC? I wonder if there is some relation to running a custom kernel.
Comment 26 Oleh Vinichenko 2023-02-21 04:29:42 UTC
I use GENERIC.
Comment 27 Paul Dufresne 2023-02-21 17:59:20 UTC
On Linux a /sys/class/drm/card0/error is created on a GPU Hang...
Do we get that file on FreeBSD too?
Comment 28 Paul Dufresne 2023-02-21 18:09:54 UTC
Also found:
torvalds/linux:3d7cb6b0:drivers/gpu/drm/i915/i915_gpu_error.c
	}
	len = scnprintf(error->error_msg, sizeof(error->error_msg),
			"GPU HANG: ecode %d:%x:%08x",
			GRAPHICS_VER(error->i915), hung_classes,
			generate_ecode(first));
	if (first && first->context.pid) {

So the first number after ecode would be the version of the GPU (I think)... the second an indication of the cause, and the third, I would guess, the address.
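The three fields can be split out of a log line mechanically, following the `%d:%x:%08x` format string in the snippet above. A minimal Python sketch; the dictionary keys simply mirror the scnprintf arguments and make no claim about what `generate_ecode()` computes:

```python
import re

# Mirrors "GPU HANG: ecode %d:%x:%08x": decimal, hex, zero-padded 8-digit hex.
ECODE = re.compile(r"GPU HANG: ecode (\d+):([0-9a-f]+):([0-9a-f]{8})")

def parse_gpu_hang(line):
    """Return the three ecode fields of a GPU HANG line, or None if it is not one."""
    match = ECODE.search(line)
    if match is None:
        return None
    return {
        "graphics_ver": int(match.group(1)),       # GRAPHICS_VER(error->i915)
        "hung_classes": int(match.group(2), 16),   # hung_classes
        "ecode": int(match.group(3), 16),          # generate_ecode(first)
    }

# Lines copied from the description and from comment 15.
for line in (
    "drmn0: [drm] GPU HANG: ecode 4:1:ca8ffffb, in MainThread [100645]",
    "drmn0: [drm] GPU HANG: ecode 6:1:0ffe0000, in Renderer [101196]",
):
    print(parse_gpu_hang(line))
```

Read this way, the two reporters' lines differ in the first field (4 versus 6) and share the second (1).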
Comment 29 Paul Dufresne 2023-02-21 18:20:18 UTC
(In reply to Paul Dufresne from comment #27)
looks like we have no sys/class...
and I doubt that creating the directory would help the driver write the information.
Comment 30 Paul Dufresne 2023-02-21 18:39:22 UTC
Have you tried the "modesetting" driver?

Like:
paul@dufresnep:~ $ cat /usr/local/etc/X11/xorg.conf.d/20-intel.conf 
Section "Device"
    Identifier "Card"
    #Driver "vesa"
    #Driver "intel"
    #Driver "scfb"
    # does not exist: Driver "glamor"
    Driver "modesetting"
EndSection
paul@dufresnep:~ $
Comment 31 jakub_lach 2023-02-21 19:15:24 UTC
(In reply to Paul Dufresne from comment #30)

Thanks, but this is hardly a solution.

Note that I had a working, stable, accelerated intel driver for years prior to drm-510-kmod. I have already been through the old drm, newcons and the introduction of drm2/kms2.

By that point, intel was more or less stable for me. The last crashes (and corruption) I had were related to forcing the 'sna' acceleration method.

At some point I used drm-legacy-kmod, as I believed at the time that it was the last one supporting GM45. It was also stable, iirc.

I could live with some hang-ups as long as they did not crash the system (as was the case prior to drm-510-kmod).
Comment 32 Paul Dufresne 2023-02-21 21:28:33 UTC
(In reply to jakub_lach from comment #31)

>Note, that I had working/stable accelerated intel driver for years prior to drm-510-kmod.
You seem to consider modesetting as having no hardware acceleration.
Most likely, modesetting would use Glamor for hardware acceleration:
a kind of hardware acceleration that uses the card's 3D engine to achieve 2D acceleration.

Old 2D acceleration is, I believe, a bit forgotten by developers... because they prefer Glamor.

Also, and I am less sure... I think Glamor also uses the drm stack... so testing with modesetting would likely help to determine whether the problem is within the Xorg Intel driver or in the drm stack.

But yeah... frankly I think most people should just use modesetting with Glamor acceleration and avoid the Xorg intel driver.
Comment 33 jakub_lach 2023-02-21 22:27:21 UTC
(In reply to Paul Dufresne from comment #32)

I accidentally used modesetting a while ago; it was noticeably slower than intel (that's how I noticed something was not right).
Comment 34 jakub_lach 2023-02-22 11:59:33 UTC
(In reply to jakub_lach from comment #33)

FWIW, modesetting is still somewhat slower in real applications (ioquake3), though faster in glxgears (using glamor), and YouTube performance is approximately the same.

I think the last time I checked may have been pre-glamor. I will probably run modesetting now to check whether the linuxkpi problem reoccurs.
Comment 35 jakub_lach 2023-02-22 12:32:01 UTC
(In reply to jakub_lach from comment #31)

FWIW, I've noticed that intel uses 'UXA' by default here. I tried forcing SNA (as I had in the past): there was no corruption and glxgears was faster than with modesetting, but I got an instant crash when trying to run Firefox:

Feb 22 13:19:51 Thinkpad syslogd: last message repeated 1 times
Feb 22 13:25:46 Thinkpad syslogd: kernel boot file is /boot/kernel/kernel
Feb 22 13:25:46 Thinkpad kernel:
Feb 22 13:25:46 Thinkpad syslogd: last message repeated 1 times
Feb 22 13:25:46 Thinkpad kernel: Fatal trap 12: page fault while in kernel mode
Feb 22 13:25:46 Thinkpad kernel: cpuid = 1; apic id = 01
Feb 22 13:25:46 Thinkpad kernel: fault virtual address  = 0x0
Feb 22 13:25:46 Thinkpad kernel: fault code             = supervisor read data, page not present
Feb 22 13:25:46 Thinkpad kernel: instruction pointer    = 0x20:0xffffffff82055cc7
Feb 22 13:25:46 Thinkpad kernel: stack pointer          = 0x28:0xfffffe00c859ad00
Feb 22 13:25:46 Thinkpad kernel: frame pointer          = 0x28:0xfffffe00c859ad70
Feb 22 13:25:46 Thinkpad kernel: code segment           = base rx0, limit 0xfffff, type 0x1b
Feb 22 13:25:46 Thinkpad kernel:                        = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 22 13:25:46 Thinkpad kernel: processor eflags       = interrupt enabled, resume, IOPL = 0
Feb 22 13:25:46 Thinkpad kernel: current process                = 0 (i915-userptr-acquir)
Feb 22 13:25:46 Thinkpad kernel: trap number            = 12
Feb 22 13:25:46 Thinkpad kernel: panic: page fault
Feb 22 13:25:46 Thinkpad kernel: cpuid = 1
Feb 22 13:25:46 Thinkpad kernel: time = 1677068559
Feb 22 13:25:46 Thinkpad kernel: Uptime: 6m55s
Feb 22 13:25:46 Thinkpad kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Feb 22 13:25:46 Thinkpad kernel: --> Press a key on the console to reboot,
Feb 22 13:25:46 Thinkpad kernel: --> or switch off the system now.
Feb 22 13:25:46 Thinkpad kernel: Rebooting...
Feb 22 13:25:46 Thinkpad kernel: cpu_reset: Restarting BSP
Feb 22 13:25:46 Thinkpad kernel: cpu_reset_proxy: Stopped CPU 1
Comment 36 Paul Dufresne 2023-02-22 16:59:43 UTC
Sorry... I am still a bit of a newbie.

Is the information you gave from the /var/log/messages file, or did you get it with the procedure described in:
https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/

I note that:
dumpon -l
shows the device (in the swap area) where crash info is saved before rebooting after a kernel panic

savecore -C -v
(to be done after the reboot, I think)
would show whether a crash dump exists

if it exists:
savecore -v
should save the crash dump in /var/crash
while
savecore -v .
would save the crash dump in the current directory

then:
kgdb -n last
would get you into the kernel debugger, where the info you gave would be shown.

But the interesting part comes after that; read lines 41 and 53 in:
https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/

where you are shown the source code and the backtrace.
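
The steps above could be strung together roughly like this. It is only a sketch for FreeBSD: the DRY_RUN variable and the function names are made up for illustration, and it assumes a dump device was already configured before the panic.

```shell
#!/bin/sh
# Sketch of the crash-dump steps listed above (FreeBSD only).
# DRY_RUN, run() and collect_crash_dump() are illustrative names,
# not part of any existing tool.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_crash_dump() {
    run dumpon -l || return 1       # device the dump is written to at panic time
    run savecore -C -v || return 1  # after reboot: is there a dump to recover?
    run savecore -v || return 1     # save the dump (default directory: /var/crash)
    run kgdb -n last                # open the most recent dump in the kernel debugger
}
```

Running `DRY_RUN=1 collect_crash_dump` only prints the four commands, which is a safe way to review them before running the function as root.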
Comment 37 jakub_lach 2023-02-22 17:09:29 UTC
(In reply to Paul Dufresne from comment #36)

Those are from /var/log/messages. As a side note, SNA is not stable for me with intel on gen4 (GM45). I do not have debug or crash dumps enabled as of now (IIRC, compiling Firefox with debug was almost impossible for me).
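
Without a crash dump, the few useful fields can at least be pulled out of the messages file. A minimal sketch, assuming the panic lines have the "name = value" layout shown in comment #35; panic_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the key fields of "Fatal trap" reports from a syslog file.
# panic_summary is a hypothetical helper; the default path is an assumption.

panic_summary() {
    # Fields are split on " = ", so $2 is everything after the equals sign.
    awk -F' = ' '
        /fault virtual address/ { print "fault_addr=" $2 }
        /instruction pointer/   { print "ip=" $2 }
        /current process/       { print "process=" $2 }
    ' "${1:-/var/log/messages}"
}
```

The process name it prints (for example i915-userptr-acquir) is usually the most searchable part of the report.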
Comment 38 Paul Dufresne 2023-02-22 17:40:35 UTC
I think files won't appear in /var/crash until you run (after reboot):
savecore -v
Comment 39 Paul Dufresne 2023-02-22 17:50:32 UTC
I think you would need to rebuild the kernel with debug symbols, so with a file like:
root@dufresnep:/var/crash # cat /etc/make.conf
WITH_DEBUG=YES
DEBUG_FLAGS= -g -O0
root@dufresnep:/var/crash # 

Read https://docs.freebsd.org/en/books/developers-handbook/kernelbuild/

I don't know how hard that looks to you.
Comment 40 jakub_lach 2023-02-22 18:00:20 UTC
(In reply to Paul Dufresne from comment #39)

Yes, I know that I would need to (at least) enable debug and configure a dump device to debug the linuxkpi crash further. See my previous message (comment #6): last time I turned on debug with GENERIC, I couldn't replicate the crash.
Comment 41 Paul Dufresne 2023-02-22 18:33:02 UTC
Thinking aloud...
root@dufresnep:/var/crash # pkg which  /boot/modules/i915kms.ko
/boot/modules/i915kms.ko was installed by package drm-510-kmod-5.10.163_2

So maybe it is less a matter of recompiling the kernel with debug symbols, and more a matter of rebuilding the port.
Make sure that:
root@dufresnep:/var/crash # cat /etc/make.conf
WITH_DEBUG=YES
DEBUG_FLAGS= -g -O0
root@dufresnep:/var/crash # 

root@dufresnep:/var/crash # cd /usr/ports/graphics/drm-kmod
# make
# make install
Comment 42 Paul Dufresne 2023-02-22 18:47:54 UTC
This would help to figure out the problem with sna acceleration, but not so much the GPU hang, because a GPU hang has no clear point of failure...

What might help with the GPU hang is:
sysctl compat.linuxkpi.drm_debug=1

Maybe also lower:
sys.class.drm.card0.engine.rcs0.heartbeat_interval_ms

like:
sysctl sys.class.drm.card0.engine.rcs0.heartbeat_interval_ms=1000
so the heartbeat runs every second, rather than every 2.5 seconds (2500 ms)
Comment 43 Paul Dufresne 2023-02-22 23:21:06 UTC
I believe bug #268138 is the same.
Comment 44 Paul Dufresne 2023-02-23 01:25:22 UTC
(In reply to jakub_lach from comment #35)
I searched for i915-userptr-acquir
and came to this patch:
https://lore.kernel.org/lkml/20191010083524.098009692@linuxfoundation.org/

With that patch:
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190708140327.26825-1-chris@chris-wilson.co.uk
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 05ae8c4a8a1b6..9760b67dab28b 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -691,7 +691,15 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 
 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
-			set_page_dirty(page);
+			/*
+			 * As this may not be anonymous memory (e.g. shmem)
+			 * but exist on a real mapping, we have to lock
+			 * the page in order to dirty it -- holding
+			 * the page reference is not sufficient to
+			 * prevent the inode from being truncated.
+			 * Play safe and take the lock.
+			 */
+			set_page_dirty_lock(page);
 
 		mark_page_accessed(page);
 		put_page(page);
-- 
2.20.1

Need to find where the equivalent for us is and see if it is applied.
Comment 45 Jan Beich freebsd_committer freebsd_triage 2023-02-23 01:47:58 UTC
(In reply to Paul Dufresne from comment #44)
- userptr isn't supported on FreeBSD except the unstable version per https://github.com/FreeBSDDesktop/kms-drm/issues/197
- xf86-video-intel prefers the unstable version since https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/-/commit/dd66ba8e5666
- drm-515-kmod no longer supports the unstable version since https://github.com/torvalds/linux/commit/c6bcc0c2fdfd

Given the above one can try disabling userptr in xf86-video-intel e.g.,

--- src/sna/kgem.c.orig	2021-01-15 20:59:05 UTC
+++ src/sna/kgem.c
@@ -68,7 +68,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int num
 #define DBG_NO_CACHE_LEVEL 0
 #define DBG_NO_CPU 0
 #define DBG_NO_CREATE2 0
-#define DBG_NO_USERPTR 0
+#define DBG_NO_USERPTR 1
 #define DBG_NO_UNSYNCHRONIZED_USERPTR 0
 #define DBG_NO_COHERENT_MMAP_GTT 0
 #define DBG_NO_LLC 0
Comment 46 Paul Dufresne 2023-02-24 01:02:25 UTC
For me sna works flawlessly (with uxa I get: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269606 ).
On i3-8100...
root@dufresnep:/usr/home/paul # uname -v
FreeBSD 13.2-BETA2 releng/13.2-fbb102b2c GENERIC

I did not apply a patch like the one suggested in comment #45.
Comment 47 Paul Dufresne 2023-02-24 01:28:26 UTC
(In reply to Jan Beich from comment #45)
To test this you would do (as a newbie helping others):
as root (su -):
cd /usr/ports/x11-drivers/xf86-video-intel/
make clean
make fetch extract patch
cp work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c.orig
nano work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c

[on line 71, change 0 to 1]
Ctrl-O
make makepatch
make build
make deinstall
make install

exit
(to exit root)

Hope I am really helping.
Comment 48 Paul Dufresne 2023-02-24 02:01:45 UTC
Computer still working with previous patch...
But as it does not need it in my case, erasing it:
su
cd /usr/ports/x11-drivers/xf86-video-intel/
rm files/patch-src_sna_kgem.c 
make clean deinstall install
libtool --finish /usr/local/lib/xorg/modules/drivers

last step suggested by previous command output.
Comment 49 Paul Dufresne 2023-02-24 04:30:48 UTC
(In reply to Paul Dufresne from comment #47)

I did mislead you!

Don't do:
cp work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c.orig
nano work/xf86-video-intel-31486f40f8e8f8923ca0799aea84b58799754564/src/sna/kgem.c

because the .orig file already existed... so the .c file had already been patched by the port.

By doing the cp step I overwrite the .orig file, which erases the port's existing patch at the next step:
make makepatch (which compares the .c.orig file with the .c file).

Do the cp only when the .orig file does not exist.
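
The corrected rule (never overwrite an existing .orig) could be scripted roughly as below. This is a sketch, not the port's own tooling: disable_userptr is a hypothetical helper, and it assumes the define appears exactly as in the diff from comment #45.

```shell
#!/bin/sh
# Sketch: flip DBG_NO_USERPTR from 0 to 1 in kgem.c without clobbering an
# existing kgem.c.orig. disable_userptr is a hypothetical helper name.

disable_userptr() {
    src=$1
    # Keep an existing .orig: "make makepatch" diffs against it, so
    # overwriting it would drop the changes the port already carries.
    if [ ! -e "$src.orig" ]; then
        cp "$src" "$src.orig" || return 1
    fi
    # Rewrite through a temporary file to avoid sed -i portability differences.
    # The anchored pattern leaves DBG_NO_UNSYNCHRONIZED_USERPTR untouched.
    sed 's/^#define DBG_NO_USERPTR 0$/#define DBG_NO_USERPTR 1/' "$src" > "$src.tmp" &&
        mv "$src.tmp" "$src"
}
```

It would be called on the extracted kgem.c after "make fetch extract patch" and before "make makepatch".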
Comment 50 Paul Dufresne 2023-02-24 04:35:25 UTC
(In reply to Paul Dufresne from comment #48)

Don't do that!

The file was patched... and by erasing the file you also erase the previous changes for FreeBSD!

Please edit the .c file to revert your change.

Then:
make makepatch

make build
make deinstall
make install
Comment 51 Paul Dufresne 2023-02-24 15:03:05 UTC
Tried NomadBSD based on 13.1 (with the intel sna accel method).
Had to place my file in /usr/local/share/X11/xorg.conf.d/
Works well with my i3-8100...
But I believe this is not the same driver:
nomad@NomadBSD ~> pkg info xf86-video-intel
xf86-video-intel-2.99.917.916_2,1
Name           : xf86-video-intel
Version        : 2.99.917.916_2,1
Installed on   : Wed Nov 30 13:03:23 2022 EST
Origin         : x11-drivers/xf86-video-intel
Architecture   : FreeBSD:13:amd64
Prefix         : /usr/local
Categories     : x11-drivers
Licenses       : MIT
Maintainer     : x11@FreeBSD.org
WWW            : https://01.org/linuxgraphics/
Comment        : X.Org legacy driver for Intel integrated graphics chipsets

I think you said the legacy version was removed in FreeBSD.

So unsure if it is the driver version... or because my GPU is more recent that it works for me.
Comment 52 jakub_lach 2023-02-24 15:47:18 UTC
(In reply to Paul Dufresne from comment #51)

It would be the same driver if you updated your system in line with ports; however, as yours is a newer card, it might just use sna by default.

(In reply to Jan Beich from comment #45)

Currently trying to replicate with modesetting; if I try intel again (with sna), I will try that, thanks.
Comment 53 Paul Dufresne 2023-02-24 17:08:29 UTC
Trying sna on intel on another computer:
vgapci1@pci0:0:2:0:	class=0x030000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x0412 subvendor=0x103c subdevice=0x18e7
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller'
    class      = display
    subclass   = VGA
nomad@NomadBSD /v/log [1]> uname -v
FreeBSD 13.1-RELEASE-p5 NOMADBSD
Seems to work fine too.
Comment 54 Paul Dufresne 2023-02-24 17:14:21 UTC
(In reply to Paul Dufresne from comment #53)
with uxa: not testing much... but working... with:
Feb 24 12:10:33 NomadBSD kernel: drmn1: <drmn> on vgapci1
Feb 24 12:10:33 NomadBSD kernel: vgapci1: child drmn1 requested pci_enable_io
Feb 24 12:10:33 NomadBSD syslogd: last message repeated 1 times
Feb 24 12:10:33 NomadBSD kernel: [drm] Unable to create a private tmpfs mount, hugepage support will be disabled(-19).
Feb 24 12:10:33 NomadBSD kernel: [drm] Got stolen memory base rxcc200000, size 0x2000000
Feb 24 12:10:33 NomadBSD kernel: sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
Feb 24 12:10:33 NomadBSD kernel: [drm] Initialized i915 1.6.0 20200917 for drmn1 on minor 0
Feb 24 12:10:34 NomadBSD kernel: VT: Replacing driver "efifb" with new "fb".
Feb 24 12:10:34 NomadBSD kernel: start FB_INFO:
Feb 24 12:10:34 NomadBSD kernel: type=11 height=900 width=1440 depth=32
Feb 24 12:10:34 NomadBSD kernel: pbase=0xd0000000 vbase=0xfffff800d0000000
Feb 24 12:10:34 NomadBSD kernel: name=drmn1 flags=0x0 stride=5760 bpp=32
Feb 24 12:10:34 NomadBSD kernel: end FB_INFO
Feb 24 12:10:34 NomadBSD kernel: sysctl: unknown oid 'dev.hdac.3.polling' at line 45
Feb 24 12:10:34 NomadBSD kernel: 
Feb 24 12:10:34 NomadBSD kernel: acpi_hp0: <HP ACPI-WMI Mapping> on acpi_wmi0
Feb 24 12:10:34 NomadBSD kernel: acpi_hp0: HP event GUID detected, installing event handler
Feb 24 12:10:34 NomadBSD kernel: acpi_video1: <ACPI video extension> on vgapci1
Feb 24 12:10:34 NomadBSD kernel: 
Feb 24 12:10:34 NomadBSD kernel: 
Feb 24 12:10:46 NomadBSD pulseaudio[1649]: [(null)] oss-util.c: '/dev/dsp0' doesn't support full duplex
Feb 24 12:10:46 NomadBSD pulseaudio[1649]: [(null)] oss-util.c: '/dev/dsp1' doesn't support full duplex
Feb 24 12:10:46 NomadBSD pulseaudio[1649]: [(null)] oss-util.c: '/dev/dsp2' doesn't support full duplex
Feb 24 12:10:46 NomadBSD pulseaudio[1649]: [(null)] oss-util.c: '/dev/dsp4' doesn't support full duplex
Feb 24 12:10:51 NomadBSD devd[811]: notify_clients: send() failed; dropping unresponsive client
Comment 55 jakub_lach 2023-02-24 18:30:30 UTC
(In reply to Paul Dufresne from comment #54)

You should check /var/log/Xorg.0.log to see which accel method is used; Sandybridge and newer will probably default to sna.
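
A quick way to do that check, as a sketch. It assumes the UXA backend logs lines of the form "(II) UXA(0): ..." and that the SNA backend mentions "SNA" somewhere in the log; the second pattern is a guess, and accel_method is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report which intel acceleration method an Xorg log mentions.
# accel_method is a hypothetical helper; the SNA pattern is an assumption.

accel_method() {
    log=${1:-/var/log/Xorg.0.log}
    if grep -q 'UXA(' "$log"; then
        echo uxa
    elif grep -q 'SNA' "$log"; then
        echo sna
    else
        echo unknown
    fi
}
```

Called without arguments it reads /var/log/Xorg.0.log; a different log can be passed as the first argument.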
Comment 56 jakub_lach 2023-02-24 19:23:34 UTC
(In reply to Jan Beich from comment #45)

FWIW, after applying your patch (added under xf86-video-intel/files to keep it simple), Firefox runs with intel/sna (GM45). No idea about stability. Previously I've dropped sna due to corruption (I believe this is solved) and occasional hangups.

As far as glxgears goes - sna > modesetting > uxa. YouTube performance is the same, 1080p being maximum usable resolution.
Comment 57 jakub_lach 2023-02-24 19:29:54 UTC
(In reply to jakub_lach from comment #56)

FWIW, I've just triggered similar corruption with SNA (as previously reported) https://bz-attachments.freebsd.org/attachment.cgi?id=231765 though this time in the Firefox save dialog. That's still better than a hangup/crash with uxa in a similar scenario.
Comment 58 Paul Dufresne 2023-02-24 22:45:28 UTC
(In reply to jakub_lach from comment #55)

man intel (on FreeBSD) shows that UXA is default, and I can confirm that from Xorg.0.log:
[    52.583] (II) Module "dri2" already built-in
[    52.583] (II) intel(0): Allocated new frame buffer 1280x1024 stride 5120, tiled
[    52.606] (II) UXA(0): Driver registered support for the following operations:
[    52.606] (II)         solid
[    52.606] (II)         copy
[    52.606] (II)         put_image
[    52.606] (II)         get_image
[    52.606] (II) intel(0): [DRI2] Setup complete
[    52.606] (II) intel(0): [DRI2]   DRI driver: iris
[    52.606] (II) intel(0): [DRI2]   VDPAU driver: va_gl

But on Ubuntu:
https://manpages.ubuntu.com/manpages/xenial/man4/intel.4.html
default is sna, apparently

There is also "AccelMethod" "blt" (a limited UXA); the man page says:
or "blt" to disable render acceleration and only use the BLT engine

I found info on the BLT engine in section 3.2.2.2 of:
https://www.intel.ca/content/dam/doc/datasheet/atom-d2000-n2000-vol-1-datasheet.pdf

You might want to check:
      Option "RelaxedFencing" "boolean"
              This option controls whether we attempt to allocate the minimal
              amount of memory required for the buffers. The reduction in
              working set has a substantial improvement on system performance.
              However, this has been demonstrate to be buggy on older hardware
              (845-865 and 915-945, but ok on PineView and later) so on those
              chipsets defaults to off.

              Default: Enabled for G33 (includes PineView), and later, class
              machines.
This should be decided by your GPU, but you might want to make sure.
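
To force one of the accel methods mentioned above, a Device section could be generated like this. It is only a sketch: the Identifier string and the function name are made up, and only the three methods discussed in this bug are accepted.

```shell
#!/bin/sh
# Sketch: print an xorg.conf.d snippet selecting an intel AccelMethod.
# intel_accel_conf and the Identifier string are hypothetical.

intel_accel_conf() {
    method=${1:-uxa}
    case $method in
        sna|uxa|blt) ;;
        *) echo "unsupported AccelMethod: $method" >&2; return 1 ;;
    esac
    cat <<EOF
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "AccelMethod" "$method"
EndSection
EOF
}
```

The output of `intel_accel_conf sna` would be redirected into a file in your xorg.conf.d directory; after restarting X, Xorg.0.log should show which backend was picked.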
Comment 59 Paul Dufresne 2023-02-25 00:26:35 UTC
We use UXA by default because:
# XXX bug 214593: SNA crashes on pre-SandyBridge hardware
CONFIGURE_ARGS+=--with-default-accel=uxa
See bug #214593
Comment 60 Paul Dufresne 2023-02-25 14:40:50 UTC
Running with Firefox for a few hours on a fresh install of:
root@Simbad:/var/log # freebsd-version -ru
13.2-BETA3
13.2-BETA3
not yet officially released, and with a GPU similar to jakub's:
root@Simbad:/var/log # pciconf -lev
...
vgapci0@pci0:0:2:0:	class=0x030000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x2e12 subvendor=0x103c subdevice=0x3048
    vendor     = 'Intel Corporation'
    device     = '4 Series Chipset Integrated Graphics Controller'
    class      = display
    subclass   = VGA

Now using UXA; I have also been able to use sna without recompiling anything.

I am thinking about installing 13.1 to see if I can replicate this bug.
Comment 61 jakub_lach 2023-02-25 16:21:24 UTC
(In reply to jakub_lach from comment #34)

There have been no problems with modesetting here so far, apart from one message (no other effect) - 

Feb 25 09:00:33 Thinkpad kernel: drmn0: [drm] *ERROR* CPU pipe A FIFO underrun
Comment 62 jakub_lach 2023-02-25 16:24:26 UTC
(In reply to Paul Dufresne from comment #60)

The problem I was reporting with intel/uxa on GM45 is not something I could reliably replicate in a few hours. The crashes I've reported are a few days apart (see also the included uptime).
Comment 63 jakub_lach 2023-02-28 12:40:37 UTC
(In reply to jakub_lach from comment #62)

As far as I can tell, the conclusion is:

1. I cannot replicate the linuxkpi crash with modesetting.
2. The linuxkpi crash only occurs here with UXA.
3. SNA needs Jan Beich's patch from comment #45 to not crash when starting Firefox. With the patch, there were no crashes/hangs; however, the corruption appears in scenarios similar to the crashes with UXA (dialog spawn in Firefox).