Bug 220444 - x11-servers/xorg-server crashes on attempt to play a video using VDPAU
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-x11 mailing list
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2017-07-03 05:02 UTC by Mikhail Teterin
Modified: 2019-06-25 23:22 UTC (History)

See Also:
rezny: maintainer-feedback+
koobs: merge-quarterly?


Attachments
xorg.conf as requested (5.20 KB, text/plain)
2017-07-03 13:46 UTC, Mikhail Teterin
Backtrace with debug information (3.61 KB, text/plain)
2017-07-03 14:08 UTC, Mikhail Teterin
Error instead of crashing (454 bytes, patch)
2019-06-24 23:54 UTC, Mikhail Teterin

Description Mikhail Teterin freebsd_committer 2017-07-03 05:02:30 UTC
After replacing graphics/dri with graphics/mesa-dri and upgrading everything else on the system, I tried to play a video using mplayer and the "vdpau" mode:

% mplayer -vo vdpau ....

To my surprise, the entire Xorg-process crashed:

Core was generated by `/opt/bin/X -listen tcp :0 -auth /var/db/xdm/authdir/authfiles/A:0-xxxxx'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00000008026b7a6a in thr_kill () from /lib/libc.so.7
[Current thread is 1 (LWP 100800)]
(gdb) where
#0  0x00000008026b7a6a in thr_kill () from /lib/libc.so.7
#1  0x00000008026b79a6 in raise () from /lib/libc.so.7
#2  0x00000008026b6149 in abort () from /lib/libc.so.7
#3  0x000000000059f35e in OsAbort ()
#4  0x000000000047cccb in ddxGiveUp ()
#5  0x00000000005a6022 in ?? ()
#6  0x00000000005a4ae8 in FatalError ()
#7  0x000000000059d0b7 in ?? ()
#8  0x0000000802367a26 in ?? () from /lib/libthr.so.3
#9  0x000000080236711c in ?? () from /lib/libthr.so.3
#10 <signal handler called>
#11 0x0000000000000000 in ?? ()
#12 0x0000000806f87c37 in ?? () from /opt/lib/libgbm.so.1
#13 0x0000000806d59f0b in glamor_back_pixmap_from_fd () from /opt/lib/xorg/modules/libglamoregl.so
#14 0x0000000806d59fb7 in glamor_pixmap_from_fd () from /opt/lib/xorg/modules/libglamoregl.so
#15 0x00000008066fa3e0 in ?? () from /opt/lib/xorg/modules/drivers/radeon_drv.so
#16 0x000000000056d50e in ?? ()
#17 0x000000000056cd4c in ?? ()
#18 0x00000000004304fd in ?? ()
#19 0x0000000000439e57 in ?? ()
#20 0x0000000000424f8f in _start ()

Please, advise. Thanks!
Comment 1 Jan Beich freebsd_committer 2017-07-03 07:00:21 UTC
What FreeBSD version/architecture? Which GPU? Can you upload Xorg.?.log? It seems you're using Mesa implementation of VDPAU. Can you bisect port updates to track down which one caused the regression? Otherwise, provide a stacktrace after building everything, or at least mesa-* and xorg-server, with debugging symbols.
Comment 2 Mikhail Teterin freebsd_committer 2017-07-03 13:46:23 UTC
Created attachment 184034 [details]
xorg.conf as requested

> What FreeBSD version/architecture?
FreeBSD-10.3-stable, amd64

> Which GPU?
The relevant part of Xorg.log is here:

[107685.614] (--) RADEON(0): Chipset: "ATI Radeon HD 5670" (ChipID = 0x68d8)
[107685.761] (II) RADEON(0): glamor detected, initialising EGL layer.
[107685.761] (II) RADEON(0): KMS Color Tiling: enabled
[107685.761] (II) RADEON(0): KMS Color Tiling 2D: enabled
[107685.761] (==) RADEON(0): TearFree property default: auto
[107685.761] (II) RADEON(0): KMS Pageflipping: enabled
[107685.763] (II) RADEON(0): Output DisplayPort-0 using monitor section Monitor0
[107685.829] (II) RADEON(0): Output HDMI-0 has no monitor section
[107685.895] (II) RADEON(0): Output DVI-0 has no monitor section

> It seems you're using Mesa implementation of VDPAU.
Is there any other option for Radeon?

> Can you upload Xorg.?.log?
Attaching... I have not changed it in a while -- but never attempted VDPAU before either.

> Can you bisect port updates to track down which one caused the regression?
No, this is my desktop -- can't have it "under maintenance" for that long :( Besides, the x11 and graphics ports changed too much since the last time...

> Otherwise, provide a stacktrace after building everything, or at least
> mesa-* and xorg-server, with debugging symbols.
According to the stack I already have, the crash is in libgbm.so.1 -- installed by mesa-libs. I'll rebuild _that_ WITH_DEBUG and try again. According to the stack, it tries to execute a function at address 0x0 -- like something remains simply uninitialized...
Comment 3 Mikhail Teterin freebsd_committer 2017-07-03 14:08:50 UTC
Created attachment 184035 [details]
Backtrace with debug information

The crash is right here -- one of the methods (createImageFromFds) is NULL:

(gdb) frame 12
#12 0x0000000806fc9882 in gbm_dri_bo_import (gbm=0x804049600, type=21763, 
    buffer=0x7fffffffe828, usage=0) at backends/dri/gbm_dri.c:921
921           image = dri->image->createImageFromFds(dri->screen,
(gdb) p dri->image
$1 = (const __DRIimageExtension *) 0x808a5f7a0
(gdb) p *dri->image
$2 = {base = {name = 0x808713f2f "DRI_IMAGE", version = 12}, 
  createImageFromName = 0x8083c0260, 
  createImageFromRenderbuffer = 0x8083c0310, destroyImage = 0x8083c0320, 
  createImage = 0x8083c0360, queryImage = 0x8083c04c0, dupImage = 0x8083c0680, 
  validateUsage = 0x8083c06d0, createImageFromNames = 0x8083c06e0, 
  fromPlanar = 0x8083c0860, createImageFromTexture = 0x8083c08f0, 
  createImageFromFds = 0x0, createImageFromDmaBufs = 0x0, 
  blitImage = 0x8083c0aa0, getCapabilities = 0x8083c0c50, 
  mapImage = 0x8083c0c70, unmapImage = 0x8083c0d00, 
  createImageWithModifiers = 0x0}

I'll keep the core around -- in case you need me to poke at any other variables...
Comment 4 Jan Beich freebsd_committer 2017-07-03 15:28:52 UTC
(In reply to Mikhail Teterin from comment #3)
> The crash is right here -- one of the methods (createImageFromFds) is NULL:

createImageFromFds needs DRM_CAP_PRIME according to src/gallium/state_trackers/dri/dri2.c but it's only known to work on drm-next branch.

>> It seems you're using Mesa implementation of VDPAU.
> Is there any other option for Radeon?

xf86-video-amdgpu with kernel from drm-next. It should be compatible with 10.3 world/userland as long as you don't remove corresponding COMPAT_FREEBSD* in kernel config.
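The dependency on DRM_CAP_PRIME can be illustrated with a minimal sketch. It assumes the driver fills in the optional entry only when the kernel reports PRIME import support and leaves it NULL otherwise, which would match the NULL createImageFromFds seen in comment 3. Every type, constant and function name below is a hypothetical stand-in, not the real Mesa or libdrm definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PRIME import capability bit. */
#define MOCK_PRIME_CAP_IMPORT 0x1u

typedef struct mock_image { int fd; } mock_image;
typedef mock_image *(*mock_from_fds_fn)(int fd);

/* Hypothetical, cut-down version of an image extension table. */
struct mock_image_ext {
    int version;
    mock_from_fds_fn createImageFromFds; /* optional: NULL when unsupported */
};

static mock_image mock_storage;

/* Hypothetical implementation used when PRIME import is available. */
static mock_image *mock_from_fds(int fd)
{
    mock_storage.fd = fd;
    return &mock_storage;
}

/* Populate the table; the optional entry depends on the kernel capability. */
void mock_init_image_ext(struct mock_image_ext *ext, unsigned kernel_caps)
{
    assert(ext != NULL);
    ext->version = 12;
    ext->createImageFromFds =
        (kernel_caps & MOCK_PRIME_CAP_IMPORT) ? mock_from_fds : NULL;
}
```

With a table built this way, any caller that invokes createImageFromFds without first checking it for NULL jumps to address 0 on a kernel lacking the capability.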
Comment 5 Mikhail Teterin freebsd_committer 2017-07-03 15:35:42 UTC
(In reply to Jan Beich from comment #4)
> createImageFromFds needs DRM_CAP_PRIME according to src/gallium/state_trackers
> /dri/dri2.c but it's only known to work on drm-next branch.

Why, then, did the mesa-dri port even offer me the VDPAU-option? Sounds like it would not work on any official FreeBSD-version...

> xf86-video-amdgpu with kernel from drm-next

I would really prefer not to have to upgrade yet... Will it never work on 10-STABLE nor 11-STABLE?

So, have I reported a BUG, or is this a known limitation, which will not be addressed and the most I can expect is the VDPAU-option disabled to avoid crashes?..
Comment 6 Jan Beich freebsd_committer 2017-07-03 15:39:46 UTC
(In reply to Mikhail Teterin from comment #3)
> (gdb) frame 12
> #12 0x0000000806fc9882 in gbm_dri_bo_import (gbm=0x804049600, type=21763, 
>     buffer=0x7fffffffe828, usage=0) at backends/dri/gbm_dri.c:921
> 921           image = dri->image->createImageFromFds(dri->screen,

This call appeared since Mesa 17.1.0. Can you try to downgrade to Mesa 17.0.*? Alternatively, try to explicitly set Option "DRI" "2" in xorg.conf.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=a43d286ef7ff
Comment 7 Mikhail Teterin freebsd_committer 2017-07-03 15:52:13 UTC
(In reply to Jan Beich from comment #6)
> try to explicitly set Option "DRI" "2" in xorg.conf
Yep! This worked. The video plays in the "vdpau" mode now, no crashes. Thank you!

I wonder, what I am missing, however, by not using DRI3 -- previously, mplayer could play using "gl" and "xv" modes without an obvious difference. Now it can also do VDPAU -- is there some kind of hardware acceleration now, that was not used before?

Would using DRI3 offer such an acceleration even if DRI2 does not? Playing YouTube videos inside Firefox, for example, still causes the browser to use 2-3 CPUs (according to top(1)), plus the Xorg process using 20-30% (I have 4 cores).

Should Firefox be able to use vdpau as well -- and consume less CPU as a result?
Comment 8 Jan Beich freebsd_committer 2017-07-03 16:36:38 UTC
(In reply to Mikhail Teterin from comment #7)
> I wonder, what I am missing, however, by not using DRI3

Maybe see https://en.wikipedia.org/wiki/Direct_Rendering_Infrastructure#DRI3

> Now it can also do VDPAU

mplayer (unlike mpv) is too old school to support VAAPI or VDPAU via -vo opengl.
VAAPI also supports encoding but I can only confirm it works on i965 + drm-next.

> Should Firefox be able to use vdpau as well -- and consume less CPU as a result?

Not yet: https://bugzilla.mozilla.org/show_bug.cgi?id=1210729

In some cases layers.acceleration.force-enabled=true may provide better compositing performance. However, I suspect the default won't change before WebRender circa FF57+.
Comment 9 Mikhail Teterin freebsd_committer 2017-07-03 23:15:48 UTC
(In reply to Jan Beich from comment #8)
> Maybe see https://en.wikipedia.org/wiki/Direct_Rendering_Infrastructure#DRI3
Thanks! Looks like I am not missing anything -- the DRI3 may make coding things up easier for developers, but whatever already works, works just as well.

My own test with mplayer using different video-output settings (-vo foo) with the same video-file is:

vdpau:   108.792u 3.347s 9:30.43 19.6%   3934+697k 3+35io 2056pf+0w
xv:      107.773u 3.170s 9:29.83 19.4%   3936+697k 3+35io 2056pf+0w
gl_nosw: 107.862u 3.402s 9:29.85 19.5%   3935+697k 3+35io 2056pf+0w
x11:     108.078u 3.252s 9:29.82 19.5%   3943+699k 3+35io 2056pf+0w
sdl:     107.477u 3.406s 9:29.80 19.4%   3950+700k 3+35io 2056pf+0w

Thank you for your help, Jan. Maybe, DRI should be limited to 2 automatically, unless whatever functionality is necessary in kernel is detected at run-time?

> https://bugzilla.mozilla.org/show_bug.cgi?id=1210729
Neither the age of the bug, nor the treatment of the impatient comments there (labeling them as spam?) are encouraging :(

> layers.acceleration.force-enabled=true may provide better compositing
No obvious difference, unfortunately...
Comment 10 Jan Beich freebsd_committer 2017-07-04 12:08:48 UTC
(In reply to Jan Beich from comment #6)
> Can you try to downgrade to Mesa 17.0.*? 

Mikhail, you haven't provided feedback for this question. Knowing if this is a recent regression or an old bug is important if the maintainer would like to report upstream and come up with a better workaround.

For example, see bug 217664 which led to graphics/mesa-dri/files/patch-src_egl_drivers_dri2_platform__x11.c.
Comment 11 Mikhail Teterin freebsd_committer 2017-07-04 14:21:53 UTC
(In reply to Jan Beich from comment #10)
>> Can you try to downgrade to Mesa 17.0.*? 
> Mikhail, you haven't provided feedback for this question.
I'm sorry, but you indicated (in comment #6), that the alternative to downgrading is to try DRI2, which I did do -- and reported the success...

The answer to this question is "No, I can not". There are only so many times per weekend I am willing to restart my entire desktop. Also, the move from graphics/dri to graphics/mesa-dri (which, BTW, remains unmentioned in MOVED), makes the task more difficult than simply unrolling one revision at a time.

But, I thought, we've identified the true nature of the problem: VDPAU tries to exercise unimplemented DRI3 code -- unimplemented on all stock versions of FreeBSD (except, maybe, for NVidia users).

The fix would seem to be to simply check for the missing functionality at the driver-initialization time and limit the X11-server to DRI2 if xorg.conf does not provide any other value.

The unimplemented method (createImageFromFds) is not set to junk -- it is the good old NULL, which means, something set it to that value. Inserting the additional check into that code seems like an easy enough task for anyone familiar with the code.
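The initialization-time check proposed in this comment could look roughly like the sketch below. It is an illustration only: mock_pick_dri_level and its parameters are hypothetical, and the real X driver's option handling is not shown in this report. A configured level of 0 is assumed here to mean "not set in xorg.conf".

```c
#include <assert.h>

/*
 * Hypothetical helper: choose the DRI level to advertise.
 * configured_level: value from xorg.conf, 0 if the option is absent.
 * have_from_fds:    nonzero if the createImageFromFds entry is non-NULL.
 */
int mock_pick_dri_level(int configured_level, int have_from_fds)
{
    assert(configured_level >= 0);
    if (configured_level != 0)
        return configured_level;   /* an explicit setting always wins */
    return have_from_fds ? 3 : 2;  /* otherwise fall back to DRI2 */
}
```

An explicit setting is honored even when the method is missing, so this fallback alone would not protect a user who forces DRI3 by hand.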
Comment 12 Matthew Rezny freebsd_committer 2017-07-08 10:27:25 UTC
(In reply to Mikhail Teterin from comment #5)

The VDPAU option was added for users of drm-next and DragonFly as noted in pkg-help, which is accessible by pressing F1 in the ports options dialog.

The drm drivers in kernel are too old for VDPAU to work properly and unfortunately nobody has had the time to update them. There are newer drm drivers in the drm-next branch which rely on linuxkpi and thus are only available for amd64 currently.
Comment 13 Mikhail Teterin freebsd_committer 2019-06-24 03:40:20 UTC
I'm now seeing this again -- mplayer starts, the audio begins and the video window comes up. But then Xorg crashes:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) where
#0  0x00000000 in ?? ()
#1  0x28ef186d in ?? () from /opt/lib/libgbm.so.1
#2  0x28ef0f33 in gbm_bo_import () from /opt/lib/libgbm.so.1
#3  0x28ec7a1d in glamor_back_pixmap_from_fd ()
   from /opt/lib/xorg/modules/libglamoregl.so
#4  0x28ec7af1 in glamor_pixmap_from_fd ()
   from /opt/lib/xorg/modules/libglamoregl.so
#5  0x081c5646 in ?? ()
#6  0x081c4d98 in ?? ()
#7  0x081c51dc in ?? ()
#8  0x0807303b in ?? ()
#9  0x0807db97 in ?? ()
#10 0x080662fe in ?? ()
#11 0x08066164 in ?? ()
#12 0x08066038 in _start ()

Mplayer says:

Requested audio codec family [mpg123] (afm=mpg123) not available.
Enable it at compilation.
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
AUDIO: 48000 Hz, 2 ch, floatle, 384.0 kbit/12.50% (ratio: 48000->384000)
Selected audio codec: [ffmp2float] afm: ffmpeg (FFmpeg MPEG layer-1 and layer-2 audio)
==========================================================================
AO: [oss] 48000Hz 2ch s16le (2 bytes per sample)
Starting playback...
The selected video_out device is incompatible with this codec.
Try appending the scale filter to your filter list,
e.g. -vf spp,scale instead of -vf spp.
Movie-Aspect is undefined - no prescaling applied.
VO: [vdpau] 1440x1080 => 1440x1080 Planar YV12 
Movie-Aspect is 1.78:1 - prescaling to correct movie aspect.
VO: [vdpau] 1440x1080 => 1920x1080 Planar YV12 
A: 776.0 V: 775.7 A-V:  0.287 ct:  0.000   2/  2 ??% ??% ??,?% 1 0 
Select error: Bad file descriptor

The computer is an older laptop running today's 12.0/i386 with:

      drmn0: <ATI Mobility Radeon HD 4670> on vgapci0

The problem strikes regardless of whether I use the stock kernel drm2/radeonkms modules or those from drm-legacy-kmod-g20190523.

Unfortunately, the old "DRI 2" trick is not helping this time -- I dropped the following dri.conf into /etc/X11/xorg.conf.d

Section "Device"
        Option "DRI" "2"
        Identifier "Card0"
EndSection

to no avail. mpv seems to work -- but it is not using vdpau.

Please, advise...
Comment 14 Mikhail Teterin freebsd_committer 2019-06-24 23:52:24 UTC
Recompiling with debugging, I get the following details:

(gdb) bt full
#0  0x00000000 in ?? ()
No symbol table info available.
#1  0x28f2748a in gbm_dri_bo_import (gbm=0x28c001c0, type=21763, buffer=0xffbfec20, usage=0) at backends/dri/gbm_dri.c:995
        fd_data = 0xffbfec20
        stride = 0
        offset = 0
        fourcc = 875713089
        dri = 0x28c001c0
        bo = 0x28f22980 <gbm_bo_import@got.plt>
        image = 0x28f24415
        dri_use = 0
        gbm_format = 686596608

...

The line is:

      image = dri->image->createImageFromFds(dri->screen,
                                             fd_data->width,
                                             fd_data->height,
                                             fourcc,
                                             &fd_data->fd, 1,
                                             &stride, &offset,
                                             NULL);

and the createImageFromFds is NULL:

(gdb) p dri->image->createImageFromFds
$2 = (__DRIimage *(*)(__DRIscreen *, int, int, int, int *, int, int *, int *, void *)) 0x0

Seems like the same error as reported two years ago, except the old workaround (limiting DRI to 2) does not help... Please, advise.
Comment 15 Mikhail Teterin freebsd_committer 2019-06-24 23:54:27 UTC
Created attachment 205318 [details]
Error instead of crashing

This patch causes the code to return an error instead of crashing. MPlayer will now hang - but can be killed with Ctrl-C - and the server will keep running. Users will no longer suffer data-loss.
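The general shape of such a guard is sketched below. This is not the attached patch. It is a self-contained illustration with hypothetical names (mock_image_ext, mock_bo_import, mock_from_fds), and the choice of ENOSYS as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down extension table with one optional method. */
struct mock_image_ext {
    void *(*createImageFromFds)(int fd);
};

static int mock_image_storage;

/* Hypothetical working implementation, for comparison. */
void *mock_from_fds(int fd)
{
    (void)fd;
    return &mock_image_storage;
}

/* Import a buffer, failing cleanly when the optional method is missing. */
void *mock_bo_import(const struct mock_image_ext *ext, int fd)
{
    assert(ext != NULL);
    if (ext->createImageFromFds == NULL) {
        errno = ENOSYS;  /* report "not implemented" instead of calling NULL */
        return NULL;
    }
    return ext->createImageFromFds(fd);
}
```

The caller then sees an ordinary import failure, which it can turn into an error for the client rather than taking the whole server down.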
Comment 16 Jan Beich freebsd_committer 2019-06-25 11:44:02 UTC
(In reply to Mikhail Teterin from comment #14)
> Seems like the same error as reported two years ago, except the old
> workaround (limiting DRI to 2) does not help... Please, advise.

Try the patch mentioned in bug 225415 comment 22 or switch to graphics/drm-kmod.
Comment 17 Mikhail Teterin freebsd_committer 2019-06-25 23:22:32 UTC
(In reply to Jan Beich from comment #16)
> Try the patch mentioned in bug 225415 comment 22

Yes, that's the same patch I attached. It does not enable VDPAU -- it merely prevents Xorg crashing.

> or switch to graphics/drm-kmod.

graphics/drm-kmod simply means drm-legacy-kmod-g20190523 in my case, which I already tried, as I said in comment 13:

The problem strikes regardless of whether I use the stock kernel drm2/radeonkms modules or those from drm-legacy-kmod-g20190523. Here are the loaded kernel modules, according to "kldstat -v". Note, that the file paths are all from the /boot/modules (populated by port) rather than /boot/kernel:

...
 2    1 0x1cc00000    cf000 radeonkms.ko (/boot/modules/radeonkms.ko)
        Contains modules:
                 Id Name
                268 vgapci/radeonkms
                273 drmn/radeon_atom_hw_i2c
                270 radeon_iicbb/iicbb
                272 radeon_hw_i2c/iicbus
                269 drmn/radeon_iicbb
                271 drm/radeon_hw_i2c
 3    1 0x1cccf000    43000 drm2.ko (/boot/modules/drm2.ko)
        Contains modules:
                 Id Name
                266 drmn/drm_iic_dp_aux
                267 drmn
 4    1 0x1cd12000     2000 radeon_RV730_pfp_bin.ko (/boot/modules/radeon_RV730_pfp_bin.ko)
        Contains modules:
                 Id Name
                274 radeon_RV730_pfp_bin_fw
 5    1 0x1cd14000     3000 radeon_RV730_me_bin.ko (/boot/modules/radeon_RV730_me_bin.ko)
        Contains modules:
                 Id Name
                275 radeon_RV730_me_bin_fw
 6    1 0x1cd17000     3000 radeon_R700_rlc_bin.ko (/boot/modules/radeon_R700_rlc_bin.ko)
        Contains modules:
                 Id Name
                276 radeon_R700_rlc_bin_fw


Am I missing something?