Bug 133144 - [linux] linuxulator 2.6 crashes with nvidias libGL.so.1
Summary: [linux] linuxulator 2.6 crashes with nvidias libGL.so.1
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-emulation (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-28 00:00 UTC by Alexander Best
Modified: 2015-01-27 14:09 UTC

See Also:


Attachments
gridwars.kdump (32.23 KB, application/octet-stream)
2009-03-28 12:27 UTC, Alexander Best
symlinks.sh.txt (175 bytes, text/plain)
2010-01-06 16:20 UTC, Alexander Best

Description Alexander Best 2009-03-28 00:00:12 UTC
with compat.linux.osrelease=2.6.16 and linux_base-f8 almost every 3d linux application crashes when using the closed source nvidia driver. when switching to graphics/linux_dri thus replacing the nvidia linux version of libGL.so.1 the error disappears.

it seems the linuxulator 2.6 is missing a vital syscall (or doesn't fully support it) which is required by the nvidia version of libGL.so.1.

switching to compat.linux.osrelease=2.4.2 and replacing linux_base-f8 with linux_base-fc4 resolves the problem.
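
The two combinations mentioned above can be written down as a small shell sketch. The function names are made up for illustration; only the pairings (linux_base-fc4 with 2.4.2, linux_base-f8 with 2.6.16) come from this report. The sysctl line is only printed, not executed, since changing it needs root on the affected machine.

```sh
# Hypothetical helper: map a linux_base port to the osrelease value
# that was used with it in this report. Names are illustrative only.
osrelease_for_base() {
    case "$1" in
        linux_base-fc4) echo "2.4.2" ;;
        linux_base-f8)  echo "2.6.16" ;;
        *) echo "unknown base: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the sysctl command for the chosen base.
print_osrelease_cmd() {
    rel=$(osrelease_for_base "$1") || return 1
    echo "sysctl compat.linux.osrelease=$rel"
}
```

For example, `print_osrelease_cmd linux_base-fc4` prints the command for the 2.4.2 fallback.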

here are 2 excerpts from a linux_kdump:

dump from unreal tournament 2004 demo:

---
  1180 ut2004-bin RET   close 0
  1180 ut2004-bin CALL  linux_brk(0xae5c000)
  1180 ut2004-bin RET   linux_brk 182829056/0xae5c000
  1180 ut2004-bin CALL  linux_getpid
  1180 ut2004-bin RET   linux_getpid 1180/0x49c
  1180 ut2004-bin CALL  linux_getpid
  1180 ut2004-bin RET   linux_getpid 1180/0x49c
  1180 ut2004-bin CALL  linux_getpid
  1180 ut2004-bin RET   linux_getpid 1180/0x49c
  1180 ut2004-bin CALL  linux_sys_futex(0x2b406e30,0x81,0x7fffffff,0,0x49c,0x7)
  1180 ut2004-bin RET   linux_sys_futex 1
  1180 ut2004-bin PSIG  SIGSEGV caught handler=0x874bd50 mask=0x0 code=0x0
  1180 ut2004-bin CALL  linux_fstat64(0x1,0xbfbfa9e8,0x28fe8ff4)
  1180 ut2004-bin UNKNOWN(8)    1180 ut2004-bin RET   linux_fstat64 0
  1180 ut2004-bin CALL  linux_mmap2(0,0x1000,0x3,0x22,0xffffffff,0)
  1180 ut2004-bin RET   linux_mmap2 688971776/0x2910e000
  1180 ut2004-bin CALL  write(0x1,0x2910e000,0x25)
  1180 ut2004-bin GIO   fd 1 wrote 37 bytes
       "Signal: SIGSEGV [segmentation fault]
       "
  1180 ut2004-bin RET   write 37/0x25
  1180 ut2004-bin CALL  write(0x1,0x2910e000,0xa)
  1180 ut2004-bin GIO   fd 1 wrote 10 bytes
       "Aborting.
       "
  1180 ut2004-bin RET   write 10/0xa
  1180 ut2004-bin CALL  write(0x1,0x2910e000,0x1)
  1180 ut2004-bin GIO   fd 1 wrote 1 byte
       "
       "
  1180 ut2004-bin RET   write 1
  1180 ut2004-bin CALL  write(0x1,0x2910e000,0x1)
  1180 ut2004-bin GIO   fd 1 wrote 1 byte
       "
       "
  1180 ut2004-bin RET   write 1
  1180 ut2004-bin CALL  write(0x1,0x2910e000,0x31)
  1180 ut2004-bin GIO   fd 1 wrote 49 bytes
       "Crash information will be saved to your logfile.
       "
  1180 ut2004-bin RET   write 49/0x31
  1180 ut2004-bin CALL  linux_sys_futex(0x28feba34,0x81,0x7fffffff,0,0xbfbfab14,0xbfbfaaec)
  1180 ut2004-bin RET   linux_sys_futex 1
  1180 ut2004-bin CALL  linux_sys_futex(0x28e8eb48,0x81,0x7fffffff,0,0xbfbfaa30,0xbfbfa93c)
  1180 ut2004-bin RET   linux_sys_futex 1
  1180 ut2004-bin CALL  write(0x4,0x937c3c8,0xc)
---

dump from quake 4 demo:

---
  1285 quake4.x86 RET   close 0
  1285 quake4.x86 CALL  linux_getpid
  1285 quake4.x86 RET   linux_getpid 1285/0x505
  1285 quake4.x86 CALL  linux_getpid
  1285 quake4.x86 RET   linux_getpid 1285/0x505
  1285 quake4.x86 CALL  linux_getpid
  1285 quake4.x86 RET   linux_getpid 1285/0x505
  1285 quake4.x86 CALL  linux_sys_futex(0x2dbece30,0x81,0x7fffffff,0,0x505,0x7)
  1285 quake4.x86 RET   linux_sys_futex 1
  1285 quake4.x86 PSIG  SIGSEGV caught handler=0x8254b10 mask=0x0 code=0x0
  1285 quake4.x86 CALL  linux_sys_futex(0x286cd620,0x81,0x7fffffff,0,0x505,0xbfbfc51c)
  1285 quake4.x86 RET   linux_sys_futex 1
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0x22)
  1285 quake4.x86 GIO   fd 1 wrote 34 bytes
       "signal caught: Segmentation fault
       "
  1285 quake4.x86 RET   write 34/0x22
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0xa)
  1285 quake4.x86 GIO   fd 1 wrote 10 bytes
       "si_code 1
       "
  1285 quake4.x86 RET   write 10/0xa
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0x1c)
  1285 quake4.x86 GIO   fd 1 wrote 28 bytes
       "Trying to exit gracefully..
       "
  1285 quake4.x86 RET   write 28/0x1c
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0x2e)
  1285 quake4.x86 GIO   fd 1 wrote 46 bytes
       "--------------- BSE Shutdown ----------------
       "
  1285 quake4.x86 RET   write 46/0x2e
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0x2e)
  1285 quake4.x86 GIO   fd 1 wrote 46 bytes
       "---------------------------------------------
       "
  1285 quake4.x86 RET   write 46/0x2e
  1285 quake4.x86 CALL  write(0x1,0x283dd000,0x35)
  1285 quake4.x86 GIO   fd 1 wrote 53 bytes
       "WARNING: rvServerScanGUI::Clear() - invalid scanGUI

       "
  1285 quake4.x86 RET   write 53/0x35
  1285 quake4.x86 CALL  munmap(0x2d0ee000,0x101000)
  1285 quake4.x86 RET   munmap 0
  1285 quake4.x86 CALL  munmap(0x2d1ef000,0x101000)
---

for a discussion concerning this problem please take a look at the following thread:

http://lists.freebsd.org/pipermail/freebsd-current/2009-March/004563.html

i'm not sure the linux_kdump excerpts document the actual problem. if a complete dump is required (~40MB) or a different excerpt please drop me a note.
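
Since the full dump is large, an excerpt around the signal delivery can be cut out with grep. This is a minimal sketch: the function name and the default context sizes are arbitrary, and it only assumes the dump is plain linux_kdump text on stdin with `PSIG ... SIGSEGV` lines like the ones above.

```sh
# Print the lines around each SIGSEGV delivery in a linux_kdump text dump.
# Reads the dump on stdin; $1 = lines of context before, $2 = lines after.
segv_excerpt() {
    before=${1:-10}
    after=${2:-5}
    grep -B "$before" -A "$after" 'PSIG  *SIGSEGV'
}
```

Typical use would be something like `linux_kdump | segv_excerpt 20 10`, assuming linux_kdump reads ktrace.out from the current directory the way kdump does.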

i've also applied the futex patch, yet that didn't solve the issue. here's a linux_kdump from the quake 4 demo after applying the patch:

---
1837 quake4.x86 CALL  linux_sys_futex(0x2dbece30,0x81,0x7fffffff,0,0x72d,0x7)
  1837 quake4.x86 RET   linux_sys_futex 0
  1837 quake4.x86 PSIG  SIGSEGV caught handler=0x8254b10 mask=0x0 code=0x0
  1837 quake4.x86 CALL  linux_sys_futex(0x286ce620,0x81,0x7fffffff,0,0x72d,0xbfbfc4fc)
  1837 quake4.x86 RET   linux_sys_futex 0
  1837 quake4.x86 CALL  write(0x1,0x283dd000,0x22)
  1837 quake4.x86 GIO   fd 1 wrote 34 bytes
       "signal caught: Segmentation fault
       "
  1837 quake4.x86 RET   write 34/0x22
  1837 quake4.x86 CALL  write(0x1,0x283dd000,0xa)
  1837 quake4.x86 GIO   fd 1 wrote 10 bytes
       "si_code 1
       "
  1837 quake4.x86 RET   write 10/0xa
  1837 quake4.x86 CALL  write(0x1,0x283dd000,0x1c)
  1837 quake4.x86 GIO   fd 1 wrote 28 bytes
       "Trying to exit gracefully..
       "
---

cheers.
Comment 1 Alexander Best 2009-03-28 12:27:11 UTC
here's an entire linux_kdump from a little linux game called gridwars. it's a
lot smaller than those produced by unreal tournament 2004 or quake 4 so it
should be easier to find the problem.

cheers.
Comment 2 Boris B.Samorodov 2009-05-05 18:11:30 UTC
Can you test with graphics/linux-f8-dri?


WBR
-- 
bsam
Comment 3 Alexander Best 2009-05-05 18:34:44 UTC
thanks, but that's not really my goal. installing linux-f8-dri overwrites the
nvidia libraries. i'm able to run linux 3d apps after installing the linux-dri
port, but i want to run games with the nvidia libraries which are highly
optimized for nvidia graphic cards.

somebody needs to fix the linuxulator, because obviously it's buggy. at least
when emulating the 2.6 linux kernel.

cheers.
alex
Comment 4 Barbara 2009-05-10 23:30:25 UTC
I have the same problem here.
It was working on 6-STABLE and 7-STABLE using linux_base-fc4 and compat.linux.osrelease: 2.4.2.
It never worked on 7-STABLE with linux_base-fc6/linux_base-f8 and compat.linux.osrelease: 2.6.16.
And the same setup is not working on 8-CURRENT either.

I've tried with linux-enemyterritory but I'm getting:
    ...loading libGL.so.1: Received signal 11, exiting...
    Segmentation fault: 11

On dmesg I'm getting the following 2 lines:
    pid 26151 (et.x86), uid 1001: exited on signal 11
    linux_sys_futex: unknown op 800164673


As the OP said, it's working using libGL.so.1 from linux-f8-dri, but with very bad performance.


$ uname -a
FreeBSD satanasso.local.net 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sun May 10 16:18:47 CEST 2009     root@satanasso.local.net:/usr/obj/usr/src/sys/SATANASSO  i386

$ pkg_info -Ix linux nvidia
linux-enemyterritory-2.60b Wolfenstein: Enemy Territory (Linux version)
linux-f8-dri-7.0.2  Mesa libGL runtime libraries and DRI drivers (Linux Fedora
linux-f8-expat-2.0.1 Linux/i386 binary port of Expat XML-parsing library (Linux
linux-f8-fontconfig-2.4.2 An XML-based font configuration API for X Windows (Linux Fe
linux-f8-xorg-libs-7.3_2 Xorg libraries (Linux Fedora 8)
linux_base-f8-8_11  Base set of packages needed in Linux mode (for i386/amd64)
nvidia-driver-180.44 NVidia graphics card binary drivers for hardware OpenGL ren

$ sysctl -a compat
compat.linux.oss_version: 198144
compat.linux.osrelease: 2.6.16
compat.linux.osname: Linux
Comment 5 Barbara 2009-05-12 20:34:48 UTC
As asked by Chagin Dmitry...
> hmm, please, make a trace by ktrace or truss.

You can find the full dump here:
http://filebin.ca/owgdhn/l_kdmp.bz2

And these are some lines from the end:

49332 et.x86 CALL linux_getpid
49332 et.x86 RET linux_getpid 49332/0xc0b4
49332 et.x86 CALL linux_modify_ldt(0x11,0xbfbfdaf4,0x10)
49332 et.x86 RET linux_modify_ldt 666/0x29a
49332 et.x86 PSIG SIGSEGV caught handler=0x808c720 mask=0x0 code=0x0
49332 et.x86 CALL linux_fstat64(0x1,0xbfbfd13c,0x2847aff4)
49332 et.x86 UNKNOWN(8) 49332 et.x86 RET linux_fstat64 0
49332 et.x86 CALL linux_mmap2(0,0x1000,0x3,0x22,0xffffffff,0)
49332 et.x86 RET linux_mmap2 760414208/0x2d530000
49332 et.x86 CALL write(0x1,0x2d530000,0x1f)
49332 et.x86 GIO fd 1 wrote 31 bytes
"Received signal 11, exiting...
"
49332 et.x86 RET write 31/0x1f
49332 et.x86 CALL linux_sys_futex(0x2847c0b0,0x2fb18b41,0x1,0x2847b4c0,0xd,0xbfbfd81c)
49332 et.x86 RET linux_sys_futex -1 errno 38 Socket operation on non-socket
49332 et.x86 PSIG SIGSEGV SIG_DFL
49332 et.x86 NAMI "et.x86.core"


Hope it will help. Please ask if you need more info.

Thanks
Barbara
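
The second argument of the futex calls in these traces can be decoded by hand. The sketch below uses the flag and command values from Linux's futex interface (FUTEX_PRIVATE_FLAG = 128, FUTEX_CLOCK_REALTIME = 256, FUTEX_WAKE = 1, and so on); the function name is made up for illustration. With it, the 0x81 seen in the earlier traces comes out as FUTEX_WAKE with the private flag set. The 0x2fb18b41 in the last call above is the same number as the "unknown op 800164673" from dmesg (`printf '0x%x\n' 800164673`), and it does not map to any command listed here, so that particular call may simply have been made with garbage arguments.

```sh
# Decode the "op" argument of a Linux futex call as shown in linux_kdump.
# Accepts decimal or 0x-prefixed hex. Only a few common commands are named.
decode_futex_op() {
    op=$(( $1 ))
    private=$(( (op & 128) != 0 ))      # FUTEX_PRIVATE_FLAG
    cmd=$(( op & ~(128 | 256) ))        # strip PRIVATE and CLOCK_REALTIME flags
    case "$cmd" in
        0) name=FUTEX_WAIT ;;
        1) name=FUTEX_WAKE ;;
        3) name=FUTEX_REQUEUE ;;
        4) name=FUTEX_CMP_REQUEUE ;;
        5) name=FUTEX_WAKE_OP ;;
        *) name="unknown($cmd)" ;;
    esac
    echo "cmd=$name private=$private"
}
```

Usage: `decode_futex_op 0x81` or `decode_futex_op 800164673`.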
Comment 6 Alexander Best 2009-06-07 09:47:30 UTC
this problem report can be closed!

the reason all the linux 3d applications crashed was a wrong linux
library which got shipped with the nvidia freebsd driver. the fix will be in
one of the next driver releases.

for a quick fix do the following:

1. go to ftp://download.nvidia.com/XFree86/Linux-x86/ and enter the directory
which is named after the release of the nvidia drivers which you are currently
using. (`sysctl hw.nvidia.version`)
2. download the file NVIDIA-Linux-x86-XXX-pkg0.run (XXX being the release
you're running)
3. sh NVIDIA-Linux-x86-XXX-pkg0.run -x (XXX being the release you're running)
4. cp -pR NVIDIA-Linux-x86-XXX-pkg0/usr/lib/tls/libnvidia-tls.so.XXX \
/compat/linux/usr/lib (XXX being the release you're running)

this should fix the issue and let you run linux 3d apps with
compat.linux.osrelease set to 2.6.16 and a linux_base port > fc4.

for more information have a look at this thread:
http://www.nvnews.net/vbulletin/showthread.php?t=129584

cheers.
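
The four steps above can be sketched as a dry run that only prints the commands for a given driver release. This is an illustration, not a tested installer: the function name is hypothetical, fetch(1) is assumed as the download tool, and the server path is built from the description in step 1 (a directory named after the release). The release number has to be passed in by hand, for example after reading it from `sysctl hw.nvidia.version`.

```sh
# Print (do not execute) the commands for the manual libnvidia-tls fix.
# $1 = NVIDIA driver release in use, e.g. 180.44.
nvidia_tls_fix_cmds() {
    ver=$1
    pkg="NVIDIA-Linux-x86-${ver}-pkg0"
    # Assumed download tool and URL layout; adjust if the server differs.
    echo "fetch ftp://download.nvidia.com/XFree86/Linux-x86/${ver}/${pkg}.run"
    echo "sh ${pkg}.run -x"
    echo "cp -pR ${pkg}/usr/lib/tls/libnvidia-tls.so.${ver} /compat/linux/usr/lib"
}
```

Running `nvidia_tls_fix_cmds 180.44` prints three lines that can be reviewed before being executed one by one.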
Comment 7 Alexander Best 2009-06-28 00:34:23 UTC
i talked to zander who is responsible for the freebsd nvidia driver and he
said the following about this PR:

"the two libnvidia-tls libraries support different TLS models: the one
currently shipped with the NVIDIA FreeBSD graphics driver supports the
old-style TLS model, the tls/ one the new ELF TLS model. The crashes you were
seeing were not due to a problem with the Linux emulation layer. Future NVIDIA
FreeBSD graphics driver releases will automatically determine which library to
install."

so even if the modify_ldt() linux syscall isn't implemented properly, this PR
is not related to it.

oh...btw: there have been some changes to modify_ldt() in HEAD. i think
the linux test project now passes for that syscall.

cheers.
Comment 8 Gavin Atkinson freebsd_committer freebsd_triage 2009-10-31 14:57:41 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-emulation

Apparently this is actually a problem in our linuxulator, involving 
the threading model used.  Submitter will provide more details shortly.
Comment 9 Alexander Best 2009-10-31 15:48:33 UTC
it took some time to entirely identify the cause of the problems reported in
this PR. please disregard all previous comments trying to describe the problem!
they merely dealt with symptoms and not the actual cause! they're superseded
by this comment!

1. although the problem report deals with a segfault related to a linux lib
supplied with the nvidia closed source freebsd driver the problem isn't
limited to this specific linux lib.

2. the problem should occur with any linux binary/lib which was built
under/for a linux version which uses one of the old linux threading models.
this comment from http://wiki.freebsd.org/linux-kernel provides a short
description of the problem:

"Linux has gone through two threading model changes. If a Linux application or
library has been linked against the old pthreads without fast TLS support or
pthreads with internal TLS support libraries it will segfault."

a detailed description of the threading situation under linux as well as under
freebsd can be found in this thread:
http://lists.freebsd.org/pipermail/freebsd-threads/2003-June/000530.html

3. the nvidia closed source drivers are no longer suffering from the problem
described in this PR. the reason for that is that during installation of the
driver an application is run which detects the linux kernel version. the
application detects whether libnvidia-tls.so (old threading model) or
libnvidia-tls.so (new threading model) needs to be installed. the old
threading model is used on linux kernel < 2.6, the new one on >= 2.6. the
symptoms described in this PR were caused by this libnvidia-tls.so the whole
time and NOT by libGL.so (it's merely linked against libnvidia-tls.so). the
following short statement by zander@nvidia.com is added as a reference:

"the two libnvidia-tls libraries support different TLS models: the one
currently shipped with the NVIDIA FreeBSD graphics driver supports the
old-style TLS model, the tls/ one the new ELF TLS model. The crashes you were
seeing were not due to a problem with the Linux emulation layer. Future NVIDIA
FreeBSD graphics driver releases will automatically determine which library to
install."

4. right now the only way to run linux bins/libs which got built against a
linux kernel with an old threading model is to alter compat.linux.osrelease
and revert to 2.4 linux emulation mode.

5. what needs to be done to solve this PR is to determine the threading model
of a bin/lib and a) figure out a way to execute it under 2.6 linux emulation
or b) issue a warning and abort execution.

right now this PR should be considered a 2.6.16 emulation stopper and makes it
impossible to remove 2.4.2 emulation legacy code since this would prevent
certain bins/libs from running at all.

alex
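
As a starting point for point 5, one possible heuristic is to look for a PT_TLS program header, which objects using the ELF TLS model carry. This is only a sketch under that assumption: it has not been checked against the two libnvidia-tls variants, a missing PT_TLS header is a hint rather than proof of an old threading model, and the function name and labels are invented here.

```sh
# Read `readelf -lW FILE` output on stdin and report whether the object
# has a PT_TLS program header (i.e. an ELF TLS segment).
# Assumption: "no-pt-tls" is only a hint that an older TLS model is in use.
has_pt_tls() {
    if grep -q '^ *TLS '; then
        echo "elf-tls"
    else
        echo "no-pt-tls"
    fi
}
```

Usage would look like `readelf -lW /compat/linux/usr/lib/libnvidia-tls.so.XXX | has_pt_tls`, with XXX being the driver release.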
Comment 10 Barbara 2010-01-06 00:32:09 UTC
Ok, from what I've understood it should be a linuxulator problem.

Certainly it's because of my ignorance, but I'm a little confused, because from what I've tested in the past (after the post by zander on the nvidia forum) and also from what I've got from your words (@ 2,3), the tls version should work, am I wrong?
The problem I'm facing is that now it's not working, so I made some tests(*) with wolfsp (games/rtcw) and the latest version of different major versions of the nvidia driver:
180.60 -> it doesn't work, it works after replacing libnvidia-tls.so (note that it's the same major version for which zander suggested the fix)
185.18.36 -> it works, no workaround required (nvidia fixed it on new versions?)
190.53 -> it doesn't work - even replacing libnvidia-tls.so
195.22 (ports)(**) -> it doesn't work - same as above(***)

So I'm wondering why it stopped working between 185 and 190? Shouldn't it be working with the tls version?
Is it an nvidia fault that should be reported, or are versions >185 exposing new "bugs" in the linuxulator, or is it because of changes in the linuxulator having a bad impact on >185,...?

Sorry but my English is not good, so I hope I don't get misunderstood.

If you need more tests, kdump, or anything else, I will be happy to help.

Sorry again and thank you for the patience...
Barbara


(*)
# uname -a
FreeBSD satanasso.local.net 8.0-STABLE FreeBSD 8.0-STABLE #0: Fri Jan  1 18:47:59 CET 2010     root@satanasso.local.net:/usr/obj/usr/src/sys/SATANASSO  i386

# sysctl compat.linux.osrelease
compat.linux.osrelease: 2.6.16

# pkg_info -Ix linux_base
linux_base-f10-10_2

(**)
wolfsp doesn't work *anymore* also on RELENG_7, linux_base-fc-4_15, compat.linux.osrelease: 2.4.2. In July it was working.
Anyway, just to add more confusion, linux-enemyterritory is working!!!(?) (not tested on RELENG_8).

(***)
...loading libGL.so.1: QGL_Init: Can't load libGL.so.1 from /etc/ld.so.conf or current dir: /usr/local/share/rtcw/libGL.so.1: cannot open shared object file: No such file or directory
Comment 11 Alexander Best 2010-01-06 16:20:51 UTC
i remember having a similar problem a while ago. it seems some games from id
software use a hardcoded libGL.so path. please try if the attached script
solves the problem.

cheers.
alex

p.s.: please keep in mind that the nvidia driver performs some checks in
places like /compat/linux/usr/{local|X11R6} and removes any graphic libs it
finds in those locations. that way nvidia wants to make sure that no existing
graphic libs conflict with their libs. this means you have to re-run the
script every time you re-install the nvidia drivers.
Comment 12 Barbara 2010-01-06 17:32:41 UTC
> i remember having a similar problem a while ago. it seems some games from id
> software use a hardcoded libGL.so path. please try if the attached script
> solves the problem.
> 
> cheers.
> alex

Yes, I know that perfectly: http://www.freebsd.org/cgi/query-pr.cgi?pr=118230. As you can see the one reporting that was me.
Thank you anyway.
The recent answer to that pr, now more than 2 years old, has been the reason to do some tests and to report here the failures.

But that wasn't the problem, in fact rtcw and linux-enemyterritory never required that fix.

As wolfsp is working with 185.18.36 and not with 190.53, I was able to start it (on both RELENG_7 and RELENG_8) with nvidia-driver-195.22 from ports, setting the generated extension string to a pre 190 version:
$ __GL_ExtensionStringVersion=18999 wolfsp

Sorry for all the noise about that.
Maybe this should be added to rtcw pkg-message.in, I will ask the maintainer.
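
Until then, the workaround can be kept in a small shell function. This is just a sketch: the variable name and the value 18999 are the ones used above, while the function name is made up here.

```sh
# Run a command with the NVIDIA extension string limited to a pre-190
# version. The variable is set only for the command being started.
run_with_old_extensions() {
    __GL_ExtensionStringVersion=18999 "$@"
}
```

For example, `run_with_old_extensions wolfsp` starts the game with the variable set and leaves the rest of the shell session untouched.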

Anyway, doing some more tests, it seems that linux-doom3 and linux-quake4, both working in the past, are now failing on RELENG_7. But I want to check again to make sure that the ports are still installed correctly.
Then I tried installing linux-doom3 on RELENG_8 and surprisingly it works perfectly! I'll try with linux-quake4 as soon as I can.

If someone needs it I have ktrace/linux_kdump collected on RELENG_7 that I can upload on the web.

Thanks
Barbara
Comment 13 Barbara 2010-01-30 15:49:35 UTC
For anyone still interested, linux-quake4 works on RELENG_8.
It just needs some "updated" workarounds.
On RELENG_7, both linux-doom3 and linux-quake4 are working, but they need some "new and updated" workarounds too.
For details, look in my PR, ports/118230.

Best Regards
Barbara
Comment 14 Alexander Best freebsd_committer freebsd_triage 2010-08-23 23:41:43 UTC
State Changed
From-To: open->suspended

Suspend this PR for now. This can't be easily fixed. The Linux 2.6.x emulation 
layer is missing support for pre 2.6.x TLS models.
Comment 15 Johannes Jost Meixner freebsd_committer freebsd_triage 2015-01-27 14:09:24 UTC
Linuxulator 2.6 by now works fine with recent versions of x11/nvidia-driver (with LINUX option enabled.)